As I've described it so far, tokenisation is a breeze. You have an array of characters, which you scan sequentially. Depending on what you see first, you'll either have a string, an integer, a word or punctuation. You read in the whole unit, create a node for it, ignore white space and move on to the next unit. Easy!
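The scan just described fits in a few lines. The following is a minimal Python sketch, assuming ASCII input, double-quoted strings, and words that are runs of letters and digits. `Token` and `tokenise` are hypothetical names, not taken from any particular MUD.

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str   # "string", "integer", "word" or "punct"
    text: str

def tokenise(line: str) -> list[Token]:
    """Scan one input line left to right, producing one token per unit."""
    tokens: list[Token] = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1                            # white space only separates units
        elif ch == '"':
            end = line.find('"', i + 1)       # a string runs to the closing quote...
            if end == -1:
                end = n                       # ...or to end of line if unterminated
            tokens.append(Token("string", line[i + 1:end]))
            i = end + 1
        elif ch.isdigit():
            j = i
            while j < n and line[j].isdigit():
                j += 1
            tokens.append(Token("integer", line[i:j]))
            i = j
        elif ch.isalpha():
            j = i
            while j < n and line[j].isalnum():
                j += 1
            tokens.append(Token("word", line[i:j].upper()))  # words are case-folded
            i = j
        else:
            tokens.append(Token("punct", ch))  # any other single character
            i += 1
    return tokens
```

For example, `tokenise('WRITE "none of the above" ON BALLOT PAPER')` yields a word, a string and three more words; note that the string keeps its original case while words are upper-cased.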
Well, almost, but not quite. There's a slight blur between integers, punctuation and words. Some pieces of punctuation can map directly onto words (see the discussion on modes which follows), and some words can map directly onto integers (e.g. TEN is 10). OK, I'm sure you can handle that now that you know in advance to program for it, but there's a further problem: some words can carry implicit punctuation effects. When you look up a word, you therefore ought to check whether it affects the remainder of the sentence. In particular, you should look out for enquoting verbs.
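One way to handle the blur between words and integers is to resolve it at the moment a word is looked up. In this Python sketch, `NUMBER_WORDS` and `classify` are hypothetical: a number word comes back as an integer token, so later stages see the same kind of token whether the player typed TEN or the digits 10.

```python
# Hypothetical vocabulary: words that stand directly for integers.
NUMBER_WORDS = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5,
                "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9, "TEN": 10}

def classify(word: str) -> tuple[str, object]:
    """Look up one scanned word, returning (kind, value)."""
    key = word.upper()
    if key in NUMBER_WORDS:
        return "integer", NUMBER_WORDS[key]   # TEN becomes the integer 10
    return "word", key

print([classify(w) for w in "drop ten coins".split()])
# [('word', 'DROP'), ('integer', 10), ('word', 'COINS')]
```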
Enquoting verbs are verbs that implicitly quote the remainder of a sentence. A normal verb is an action, for example LAUGH. Verbs can take parameters (unfortunately called objects in grammatical terms), for example EAT CHEESE. Some verbs can take strings as parameters, for example WRITE "none of the above" ON BALLOT PAPER. For a few of the verbs that take a string as a parameter, though, you really don't want to have to type the quotes every time, for example SAY "hidey hi!". People would much rather type it without the quotes. If they did, however, the remainder of the sentence would be taken as regular input, and sadly SAY HIDEY HI! doesn't parse: it's not proper English (or proper anything else!).
By making SAY an enquoting verb, however, the problem disappears. If the next symbol following SAY isn't a string (i.e. doesn't begin with a quotation mark), then the tokeniser assumes it has seen one anyway and enquotes the rest of the line. Occasionally you do need to put the quotation marks in (e.g. SAY "hello" TO BILL), but if there's only one parameter you don't. You can also implement verbs that enquote their second parameter, e.g. TELL BILL TO "go west"; they're not much harder to do. Apart from their effects on tokenisation, either kind of enquoting verb is just the same as any normal verb.
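The enquoting check slots into the scanner immediately after each word is read. The Python sketch below is illustrative rather than a reference design, and its tables are hypothetical: `ENQUOTE_FIRST` holds verbs that enquote everything after them, and `ENQUOTE_AFTER` maps a verb that enquotes its second parameter to an assumed marker word (here TO) after which enquoting starts. To keep it short, a "word" is any run of non-space, non-quote characters, so punctuation is not split off.

```python
# Hypothetical verb tables for illustration.
ENQUOTE_FIRST = {"SAY", "SHOUT"}    # enquote everything after the verb
ENQUOTE_AFTER = {"TELL": "TO"}      # enquote everything after the marker word

def tokenise_line(line: str) -> list[tuple[str, str]]:
    """Split a line into ("word", ...) and ("string", ...) tokens, enquoting as needed."""
    tokens: list[tuple[str, str]] = []
    pending = None                  # marker word that will trigger enquoting, if any
    i, n = 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
            continue
        if line[i] == '"':          # an explicitly quoted string
            end = line.find('"', i + 1)
            end = n if end == -1 else end
            tokens.append(("string", line[i + 1:end]))
            i = end + 1
            continue
        j = i                       # a word: any run of non-space, non-quote characters
        while j < n and not line[j].isspace() and line[j] != '"':
            j += 1
        word = line[i:j].upper()
        tokens.append(("word", word))
        i = j
        # One-symbol lookahead: does this word change how the rest is tokenised?
        trigger = word in ENQUOTE_FIRST or word == pending
        if word == pending:
            pending = None
        if trigger:
            rest = line[i:].strip()
            if rest and not rest.startswith('"'):   # no string follows, so assume one
                tokens.append(("string", rest))
                break
        if word in ENQUOTE_AFTER:
            pending = ENQUOTE_AFTER[word]
    return tokens
```

With these tables, `SAY hidey hi!` becomes the word SAY plus the string "hidey hi!", while `SAY "hello" TO BILL` is left alone because a string already follows the verb.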
Because each time you look up a symbol you have to check whether it affects the remainder of the parse, it's a one-symbol lookahead system. Computer languages are typically designed so that you can parse a whole program by deciding what to do next at any point solely on the basis of the symbol you're looking at. As we'll discover, though, things do get a bit harder for MUDs at the grammatical (rather than the word) level.
Modes
Sometimes, you want to talk directly to the tokeniser to tell it how you want stuff tokenised. If it parsed such input like a normal command, it wouldn't know it was supposed to do something special as a result. More importantly, it might not be able to parse it at all!
What I've been describing so far is command mode. This is where what you type is considered to be a command. For MUDs, command mode is the mode in which people spend most of their time.
There are, however, other modes you can have. The convention that has evolved is to put special characters at the beginning of a line to tell the tokeniser to operate in a different mode for that line, with a further option to stay in that mode until further notice. For example, @ might mean "coding mode for this line only" and /@ might mean "coding mode for this and subsequent lines".
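Supporting both the one-line and the sticky form of a flag needs only a little state. This Python sketch assumes the @ and /@ convention from the example above, plus a hypothetical /> to switch back to command mode; `ModeTracker`, `LINE_FLAGS` and `SWITCH_PREFIX` are illustrative names.

```python
# Hypothetical flag tables following the "@" and "/@" example.
LINE_FLAGS = {"@": "coding", ">": "command"}   # mode for this line only
SWITCH_PREFIX = "/"                             # "/" + flag makes the mode stick

class ModeTracker:
    def __init__(self, default: str = "command"):
        self.current = default                  # mode used when a line has no flag

    def route(self, line: str) -> tuple[str, str]:
        """Return (mode, body) for one input line, updating the sticky mode if asked."""
        if line.startswith(SWITCH_PREFIX) and line[1:2] in LINE_FLAGS:
            self.current = LINE_FLAGS[line[1]]  # this line and all later ones
            return self.current, line[2:].lstrip()
        if line[:1] in LINE_FLAGS:
            return LINE_FLAGS[line[0]], line[1:].lstrip()   # this line only
        return self.current, line
```

After `route("@dig north")` the tracker is still in command mode; after `route("/@")` every unflagged line is routed to coding mode until `/>` arrives.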
Here are some examples of common modes and the start-of-line flags they tend to use (some of which conflict with others):
> / . Command mode: input is a direct command to the game.
@ Coding mode: input is an admin command for programming the game.
" ' ` Conversation mode: input is a parameter to the SAY command.
; : Acting mode: input is a parameter to the POSE/EMOTE/ACT command.
? Help mode: input is a parameter to the HELP command.
/ \ $ Switch mode: input is for the tokeniser itself.
Command mode is the default in text MUDs. Conversation mode is the default in graphical MUDs.
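Most of the flags in the list above are shorthand for an ordinary verb with an enquoted parameter, so one way to implement them is to rewrite the line before normal parsing. This Python sketch uses a hypothetical, non-conflicting subset of the flags (it leaves out /, which both command mode and switch mode claim), picks EMOTE to stand for POSE/EMOTE/ACT, and lets the caller choose whether unflagged input defaults to command mode (text MUDs) or conversation mode (graphical MUDs). `expand` and `FLAG_VERBS` are illustrative names.

```python
from typing import Optional

# Hypothetical subset of the start-of-line flags; "/" is omitted because
# command mode and switch mode both claim it.
FLAG_VERBS = {'"': "SAY", "'": "SAY", "`": "SAY",
              ":": "EMOTE", ";": "EMOTE", "?": "HELP"}
COMMAND_FLAGS = {">", "."}

def expand(line: str, default_mode: str = "command") -> tuple[Optional[str], str]:
    """Map one input line to (verb, text); verb None means parse text as a command."""
    flag = line[:1]
    if flag in FLAG_VERBS:
        return FLAG_VERBS[flag], line[1:].strip()   # text becomes the verb's parameter
    if flag in COMMAND_FLAGS:
        return None, line[1:].strip()               # explicit command mode
    if default_mode == "conversation":
        return "SAY", line.strip()                  # graphical-MUD default
    return None, line.strip()                       # text-MUD default
```

Returning the verb and its text separately, rather than building a quoted string, avoids having to escape any quotation marks the player typed.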