Section 24: How nouns are parsed


	The Naming of Cats is a difficult matter,
	It isn't just one of your holiday games;
	You may think at first I'm as mad as a hatter
	When I tell you, a cat must have THREE DIFFERENT NAMES.
	...T. S. Eliot (1888-1965), The Naming of Cats

	Bulldust, coolamon, dashiki, fizgig, grungy, jirble, pachinko, poodle-faker, sharny, taghairm
	...Catachrestic words from Chambers English Dictionary
	Suppose we have a tomato defined with `name "fried" "green" "tomato"`, but which is going to redden later and need to be referred to as "red tomato". It's perfectly straightforward to alter the `name` property of an object, which is a word array of dictionary words. For example,
	    [ Names obj i;
	        for (i=0:2*i<obj.#name:i++)
	            print (address) (obj.&name)-->i, "^";
	    ];
	prints out the list of dictionary words held in `name` for a given object. It's perfectly possible to write to this, so we could just set
	    (tomato.&name)-->1 = 'red';
	but this is not a flexible or elegant solution, and it's time to begin delving into the parser.
$/\$	Note that we can't change the size of the `name` array. To simulate a longer one, we could define the object with `name` padded out with, say, 30 copies of an 'untypeable word' (see below) such as `'blank.'`, and overwrite the spare entries as needed.
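	The padding trick might be sketched as follows. This is only an illustration under stated assumptions: the routine `AddName` is hypothetical (it is not part of the library), and only three spare entries are shown rather than 30.

```inform6
Object tomato "fried green tomato"
  with name 'fried' 'green' 'tomato' 'blank.' 'blank.' 'blank.';

! Hypothetical helper: overwrite the first spare 'blank.' entry with w.
! Returns true on success, false if no spare entry remains.
[ AddName obj w i n;
    n = (obj.#name)/2;                  ! number of words in the array
    for (i=0: i<n: i++)
        if ((obj.&name)-->i == 'blank.') {
            (obj.&name)-->i = w;
            rtrue;
        }
    rfalse;
];
```

	A call such as `AddName(tomato, 'red');` would then make "red" one of the tomato's names without disturbing the existing ones.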
	The Inform parser is designed to be as "open-access" as possible, because a parser cannot ever be general enough for every game without being highly modifiable. The first thing it does is to read in text from the keyboard and break it up into a stream of words: so the text "wizened man, eat the grey bread" becomes `wizened` / `man` / `,` / `eat` / `the` / `grey` / `bread` and these words are numbered from 1. At all times the parser keeps a "word number" marker to keep its place along this line, and this is held in the variable `wn`. The routine `NextWord()` returns the word at the current position of the marker, and moves it forward, i.e. adds 1 to `wn`. For instance, the parser may find itself at word 6 and trying to match "grey bread" as the name of an object. Calling `NextWord()` gives the value `'grey'` and calling it again gives `'bread'`. Note that if the player had mistyped "grye bread", "grye" being a word which isn't mentioned anywhere in the program or created by the library, `NextWord()` returns 0 for 'misunderstood word'. Writing something like `if (w=='grye') ...` somewhere in the program makes Inform put "grye" into the dictionary automatically.
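	Since `NextWord()` always moves the marker, a routine that only wants to look ahead has to put `wn` back afterwards. A minimal sketch, assuming nothing beyond `wn` and `NextWord()` as described above (the name `PeekWord` is hypothetical, not a library routine):

```inform6
! Hypothetical helper: return the word at the marker without moving it.
[ PeekWord old w;
    old = wn;          ! remember where the marker is
    w = NextWord();    ! read the word; this advances wn
    wn = old;          ! restore the marker
    return w;
];
```

	With the marker at word 6 of the example line, `PeekWord()` would give `'grey'` and leave `wn` at 6, so a following `NextWord()` would give `'grey'` again.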
$/\$	Remember that the game's dictionary only has 9-character resolution. (And only 6 if Inform has been told to compile an early-model story file: see Section 31.) Thus the values of `'polyunsaturate'` and `'polyunsaturated'` are equal. Also, upper case and lower case letters are considered the same. Words are permitted to contain numerals or symbols (but not at present to contain accented characters).

$/\/\$	A dictionary word can even contain spaces, full stops or commas. If so it is 'untypeable'. For instance, `'in,out'` is an untypeable word because if the player does type it then the parser cuts it into three, never checking the dictionary for the entire word. Thus the constant `'in,out'` can never be anything that `NextWord` returns. This can actually be useful (as it was in Section 16).

$/\$	It can also be useful to check for numbers. The library routine `TryNumber(wordnum)` tries to parse the word at `wordnum` as a number (recognising decimal numbers and English ones from "one" to "twenty"), returning -1000 if it fails altogether, or else the number. Values exceeding 10000 are rounded down to 10000.
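	By way of illustration, a routine might distinguish the three outcomes of `TryNumber` like this. The routine name `DescribeNumberAt` and its messages are made up for the example; only `TryNumber` and its return conventions come from the text above.

```inform6
! Hypothetical routine: report what TryNumber makes of word number wordnum.
[ DescribeNumberAt wordnum n;
    n = TryNumber(wordnum);
    if (n == -1000) "That is not a number the library recognises.";
    if (n == 10000) "That is ten thousand or more.";
    print_ret "That is the number ", n, ".";
];
```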

$/\/\$	Sometimes there is no alternative but to actually look at the player's text one character at a time (for instance, to check a 20-digit phone number). The routine `WordAddress(wordnum)` returns a byte array of the characters in the word, and `WordLength(wordnum)` tells you how many characters there are in it. Thus in the above example,
	    thetext = WordAddress(4);
	    print WordLength(4), " ", (char) thetext->0, (char) thetext->2;
	prints the text "3 et".
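	The phone-number case might be approached along these lines: a sketch, assuming only `WordAddress` and `WordLength` as described above. The routine name `IsAllDigits` is hypothetical.

```inform6
! Hypothetical helper: true if word number wordnum consists only of digits.
[ IsAllDigits wordnum addr len i c;
    addr = WordAddress(wordnum);
    len  = WordLength(wordnum);
    if (len == 0) rfalse;               ! an empty word is not a number
    for (i=0: i<len: i++) {
        c = addr->i;                    ! the i-th character typed
        if (c < '0' || c > '9') rfalse;
    }
    rtrue;
];
```

	A 20-digit number could then be accepted with a test such as `if (WordLength(wn) == 20 && IsAllDigits(wn)) ...`, which sidesteps the 10000 ceiling of `TryNumber`.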
	An object can affect how its name is parsed by giving a `parse_name` routine. This is expected to try to match as many words as possible starting from the current position of `wn`, reading them in one at a time using the `NextWord()` routine. Thus it must not stop just because the first word makes sense, but must keep reading and find out how many words in a row make sense. It should return: 0 if the text didn't make any sense at all, k if k words in a row of the text seem to refer to the object, or -1 to tell the parser it doesn't want to decide after all. The word marker `wn` can be left anywhere afterwards. For example:
	    Object -> thing "weird thing"
	      with parse_name [ i;
	               while (NextWord()=='weird' or 'thing') i++;
	               return i;
	           ];
	This definition duplicates (very nearly) the effect of having defined:
	    Object -> thing "weird thing"
	      with name "weird" "thing";
	which isn't very useful. But the tomato can now be coded up with
	    parse_name [ i j;
	        if (self has general) j='red'; else j='green';
	        while (NextWord()=='tomato' or 'fried' or j) i++;
	        return i;
	    ],
	so that "green" only applies until its `general` attribute has been set, whereupon "red" does.
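	The -1 return value is not used in the examples above. One possible use, sketched here with a made-up lamp object, is to leave matters to the ordinary `name` words until something about the object changes:

```inform6
! Hypothetical object: "glowing" is understood only once general is set.
Object lamp "brass lamp"
  with name 'brass' 'lamp',
       parse_name [ n;
           if (self hasnt general) return -1;  ! decline: parser falls back on name
           while (NextWord() == 'glowing' or 'brass' or 'lamp') n++;
           return n;
       ];
```

	The choice of `general` as the switch simply follows the tomato example; any other test would do as well.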
	EXERCISE 56:
	Rewrite this to insist that the adjectives must come before the noun, which must be present.
	EXERCISE 57:
	Create a musician called Princess who, when kissed, is transformed into "`/?%?/` (the artiste formerly known as Princess)".
	EXERCISE 58:
	(Cf. 'Cafè Inform'.) Construct a drinks machine capable of serving cola, coffee or tea, using only one object for the buttons and one for the possible drinks.
$/\$	`parse_name` is also used to spot plurals: see Section 25.
	Suppose that an object doesn't have a `parse_name` routine, or that it has but it returned -1. The parser then looks at the `name` words. It recognises any arrangement of some or all of these words as a match (the more words, the better). Thus "fried green tomato" is understood, as are "fried tomato" and "green tomato". On the other hand, so are "fried green" and "green green tomato green fried green". This method is quick and good at understanding a wide variety of sensible inputs, though bad at throwing out foolish ones. However, you can affect this by using the `ParseNoun` entry point. This is called with one argument, the object in question, and should work exactly as if it were a `parse_name` routine: i.e., returning -1, 0 or the number of words matched as above. Remember that it is called very often and should not be horribly slow. For example, the following duplicates what the parser usually does:
	    [ ParseNoun obj n;
	        while (IsAWordIn(NextWord(),obj,name) == 1) n++;
	        return n;
	    ];
	    [ IsAWordIn w obj prop k l m;
	        k=obj.&prop; l=(obj.#prop)/2;
	        for (m=0:m<l:m++)
	            if (w==k-->m) rtrue;
	        rfalse;
	    ];
	In this example `IsAWordIn` just checks to see if `w` is one of the entries in the word array `obj.&prop`.
$??/\$	EXERCISE 59:
	Many adventure-game parsers split object names into 'adjectives' and 'nouns', so that only the pattern <0 or more adjectives> <1 or more nouns> is recognised. Implement this.
$??/\$	EXERCISE 60:
	During debugging it sometimes helps to be able to refer to objects by their internal numbers, so that "put object 31 on object 5" would work. Implement this.
$??/\$	EXERCISE 61:
	How could the word "`#`" be made a wild-card, meaning "match any single object"?
$??/\/\$	EXERCISE 62:
	And how could "`*`" be a wild-card for "match any collection of objects"?
$??/\/\$	EXERCISE 63:
	There is no problem with calling a container "hole in wall", because the parser will understand "put apple in hole in wall" as "put (apple) in (hole in wall)". But create a fly in amber, so that "put fly in amber in hole in wall" works properly and isn't misinterpreted as "put (fly) in (amber in hole in wall)". (Warning: you may need to know about the `BeforeParsing` entry point (see Section 26) and the format of the `parse` buffer (see Section 27).)
	REFERENCES: Straightforward `parse_name` examples are the chess-pieces object and the kittens class of 'Alice Through The Looking-Glass'. Lengthier ones are found in 'Balances', especially in the white cubes class.
