
24. How nouns are parsed


The Naming of Cats is a difficult matter,
It isn't just one of your holiday games;
You may think at first I'm as mad as a hatter
When I tell you, a cat must have THREE DIFFERENT NAMES.

...T. S. Eliot (1888--1965), The Naming of Cats

Bulldust, coolamon, dashiki, fizgig, grungy, jirble, pachinko, poodle-faker, sharny, taghairm

...Catachrestic words from Chambers English Dictionary

Suppose we have a tomato defined with
    name "fried" "green" "tomato",
but which is going to redden later and will then need to be referred to as "red tomato". It's perfectly straightforward to alter the name property of an object, which is a word array of dictionary words. For example,
[ Names obj i;
  for (i=0:2*i<obj.#name:i++) print (address) (obj.&name)-->i, "^";
];
prints out the list of dictionary words held in name for a given object. It's perfectly possible to write to this, so we could just set
  (tomato.&name)-->1 = 'red';
but this is not a flexible or elegant solution, and it's time to begin delving into the parser.

/\ Note that we can't change the size of the name array. To simulate this, we could define the object with name set to, say, 30 copies of an 'untypeable word' (see below) such as 'blank.'.
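The padding trick in the note above might be sketched as follows. This is only an illustration: the routine name AddName is hypothetical (it is not part of the library), and three spare slots are used rather than 30 to keep the example short.

```inform6
! A sketch, assuming 'blank.' is used as the untypeable padding word.
Object tomato "tomato"
  with name "fried" "green" "tomato" "blank." "blank." "blank.";

! AddName is a hypothetical helper: it overwrites the first spare
! slot in obj's name array with the dictionary word w.
[ AddName obj w   i;
  for (i=0: 2*i<obj.#name: i++)
      if ((obj.&name)-->i == 'blank.') {
          (obj.&name)-->i = w; rtrue;
      }
  rfalse;    ! no spare slots left
];
```

With this in place, AddName(tomato, 'red'); would make "red" one of the tomato's name words, provided a spare slot remains.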

The Inform parser is designed to be as "open-access" as possible, because a parser cannot ever be general enough for every game without being highly modifiable. The first thing it does is to read in text from the keyboard and break it up into a stream of words: so the text "wizened man, eat the grey bread" becomes

wizened / man / , / eat / the / grey / bread
and these words are numbered from 1. At all times the parser keeps a "word number" marker to keep its place along this line, and this is held in the variable wn. The routine NextWord() returns the word at the current position of the marker, and moves it forward, i.e. adds 1 to wn. For instance, the parser may find itself at word 6 and trying to match "grey bread" as the name of an object. Calling NextWord() gives the value 'grey' and calling it again gives 'bread'.
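Since NextWord() always advances wn, a routine that only wants to look at the next word has to step the marker back afterwards. A minimal sketch, in which PeekWord is a hypothetical name and not a library routine:

```inform6
! PeekWord (hypothetical): return the word at wn without consuming it.
[ PeekWord w;
  w = NextWord();   ! reads the word at wn and adds 1 to wn
  wn--;             ! step back so that the word is still unread
  return w;
];
```

With wn at 6 in the example above, PeekWord() would give 'grey' and leave wn at 6, so that a following NextWord() gives 'grey' again.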

Note that if the player had mistyped "grye bread", "grye" being a word which isn't mentioned anywhere in the program or created by the library, NextWord() returns 0 for 'misunderstood word'. Writing something like if (w=='grye') ... somewhere in the program makes Inform put "grye" into the dictionary automatically.

/\ Remember that the game's dictionary only has 9-character resolution. (And only 6 if Inform has been told to compile an early-model story file: see Section 31.) Thus the values of 'polyunsaturate' and 'polyunsaturated' are equal. Also, upper case and lower case letters are considered the same. Words are permitted to contain numerals or symbols (but not at present to contain accented characters).

/\/\ A dictionary word can even contain spaces, full stops or commas. If so it is 'untypeable'. For instance, 'in,out' is an untypeable word because if the player does type it then the parser cuts it into three, never checking the dictionary for the entire word. Thus the constant 'in,out' can never be anything that NextWord returns. This can actually be useful (as it was in Section 16).

/\ It can also be useful to check for numbers. The library routine TryNumber(wordnum) tries to parse the word at wordnum as a number (recognising decimal numbers and English ones from "one" to "twenty"), returning -1000 if it fails altogether, or else the number. Values exceeding 10000 are rounded down to 10000.
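The three kinds of result from TryNumber might be told apart as in the following sketch. The routine name DescribeNumberAt and its messages are invented for illustration only.

```inform6
! DescribeNumberAt (hypothetical): report what TryNumber makes of a word.
[ DescribeNumberAt wordnum   n;
  n = TryNumber(wordnum);
  if (n == -1000) "That word is not a number I recognise.";
  if (n == 10000) "That is 10000, or something larger rounded down.";
  print_ret "That is the number ", n, ".";
];
```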

/\/\ Sometimes there is no alternative but to actually look at the player's text one character at a time (for instance, to check a 20-digit phone number). The routine WordAddress(wordnum) returns a byte array of the characters in the word, and WordLength(wordnum) tells you how many characters there are in it. Thus in the above example,
    thetext = WordAddress(4);
    print WordLength(4), " ", (char) thetext->0, (char) thetext->2;
prints the text "3 et".
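As a further sketch of character-by-character inspection, the following routine tests whether a word consists entirely of digits, as might be wanted for the phone number mentioned above. IsAllDigits is a hypothetical name, not a library routine.

```inform6
! IsAllDigits (hypothetical): true if every character of the word
! numbered wordnum is a decimal digit.
[ IsAllDigits wordnum   addr len i c;
  addr = WordAddress(wordnum);
  len = WordLength(wordnum);
  if (len == 0) rfalse;
  for (i=0: i<len: i++) {
      c = addr->i;                     ! the i-th character of the word
      if (c < '0' || c > '9') rfalse;
  }
  rtrue;
];
```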

An object can affect how its name is parsed by giving a parse_name routine. This is expected to try to match as many words as possible starting from the current position of wn, reading them in one at a time using the NextWord() routine. Thus it must not stop just because the first word makes sense, but must keep reading and find out how many words in a row make sense. It should return:

0 if the text didn't make any sense at all,
k if k words in a row of the text seem to refer to the object, or
-1 to tell the parser it doesn't want to decide after all.
The word marker wn can be left anywhere afterwards. For example:
Object -> thing "weird thing"
  with parse_name
       [ i; while (NextWord()=='weird' or 'thing') i++;
            return i;
       ];
This definition duplicates (very nearly) the effect of having defined:
Object -> thing "weird thing"
  with name "weird" "thing";
Which isn't very useful. But the tomato can now be coded up with
       parse_name
       [ i j; if (self has general) j='red'; else j='green';
            while (NextWord()=='tomato' or 'fried' or j) i++;
            return i;
       ],
so that "green" only applies until its general attribute has been set, whereupon "red" does.
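The third return value, -1, can be illustrated with a variation on the same tomato. This is only a sketch of one possible arrangement, not the coding given above: here the ordinary name words are kept for the green tomato, and parse_name takes over only once general has been set.

```inform6
! A sketch: defer to the name property until the tomato has reddened.
Object tomato "tomato"
  with name "fried" "green" "tomato",
       parse_name
       [ i;
            if (self hasnt general) return -1;  ! let the parser use name
            while (NextWord()=='fried' or 'red' or 'tomato') i++;
            return i;
       ];
```

Because -1 is returned before any words are read, the parser treats the green tomato exactly as though no parse_name routine had been given.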

EXERCISE 56:
Rewrite this to insist that the adjectives must come before the noun, which must be present.

EXERCISE 57:
Create a musician called Princess who, when kissed, is transformed into "/?%?/ (the artiste formerly known as Princess)".

EXERCISE 58:
(Cf. 'Cafè Inform'.) Construct a drinks machine capable of serving cola, coffee or tea, using only one object for the buttons and one for the possible drinks.

/\ parse_name is also used to spot plurals: see Section 25.

Suppose that an object doesn't have a parse_name routine, or that it has but it returned -1. The parser then looks at the name words. It recognises any arrangement of some or all of these words as a match (the more words, the better). Thus "fried green tomato" is understood, as are "fried tomato" and "green tomato". On the other hand, so are "fried green" and "green green tomato green fried green". This method is quick and good at understanding a wide variety of sensible inputs, though bad at throwing out foolish ones.

However, you can affect this by using the ParseNoun entry point. This is called with one argument, the object in question, and should work exactly as if it were a parse_name routine: i.e., returning -1, 0 or the number of words matched as above. Remember that it is called very often and should not be horribly slow. For example, the following duplicates what the parser usually does:

[ ParseNoun obj n;
  while (IsAWordIn(NextWord(),obj,name) == 1) n++; return n;
];
[ IsAWordIn w obj prop   k l m;
  k=obj.&prop; l=(obj.#prop)/2;
  for (m=0:m<l:m++)
      if (w==k-->m) rtrue;
  rfalse;
];
In this example IsAWordIn just checks to see if w is one of the entries in the word array obj.&prop.

/\ EXERCISE 59:
Many adventure-game parsers split object names into 'adjectives' and 'nouns', so that only the pattern <0 or more adjectives> <1 or more nouns> is recognised. Implement this.

/\ EXERCISE 60:
During debugging it sometimes helps to be able to refer to objects by their internal numbers, so that "put object 31 on object 5" would work. Implement this.

/\ EXERCISE 61:
How could the word "#" be made a wild-card, meaning "match any single object"?

/\/\ EXERCISE 62:
And how could "*" be a wild-card for "match any collection of objects"?

/\/\ EXERCISE 63:
There is no problem with calling a container "hole in wall", because the parser will understand "put apple in hole in wall" as "put (apple) in (hole in wall)". But create a fly in amber, so that "put fly in amber in hole in wall" works properly and isn't misinterpreted as "put (fly) in (amber in hole in wall)". (Warning: you may need to know about the BeforeParsing entry point (see Section 26) and the format of the parse buffer (see Section 27).)

*REFERENCES:
Straightforward parse_name examples are the chess-pieces object and the kittens class of 'Alice Through The Looking-Glass'. Lengthier ones are found in 'Balances', especially in the white cubes class.

Mechanically translated to HTML from third edition as revised 16 May 1997. Copyright © Graham Nelson 1993, 1994, 1995, 1996, 1997: all rights reserved.