This page is out of date

The source code is processed in the following stages:

1. If necessary, convert the source code into the Unicode UTF-16 format, normalized form C.
2. Split the source code into tokens using the lexer grammar and lexer semantics.
3. Parse the resulting sequence of tokens using the parser grammar and evaluate it using the parser semantics [To be provided].
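
As an illustration of the first stage only, the following Python sketch normalizes a string to form C and then lists its UTF-16 code units. The function name to_utf16_nfc is hypothetical, and the sketch assumes the source text has already been decoded into a Python string; it is not part of the specification.

```python
import unicodedata


def to_utf16_nfc(source):
    """Normalize source to form C and return its UTF-16 code units as ints."""
    normalized = unicodedata.normalize("NFC", source)
    # Encode big-endian without a byte-order mark, then regroup into 16-bit units.
    data = normalized.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
```

For example, the two-character sequence "e" followed by U+0301 (combining acute accent) composes into the single code unit 0xE9, while a character outside the Basic Multilingual Plane becomes a surrogate pair.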

Lexing

Processing stage 2 is done as follows:

1. Let tokens be an empty array of Token metalanguage records. (As defined in the lexer semantics, a Token can be either an identifier, a keyword, a punctuation symbol, a number, a number with a unit, a string, or the end token.)
2. Let input be the input sequence of Unicode characters. Append a special placeholder End to the end of input.
3. Let regExpMayFollow be a Boolean variable. Initialize it to true.
4. Apply the lexer grammar to parse the longest possible prefix of input. If regExpMayFollow is true, use the start symbol NextToken^re. If regExpMayFollow is false, use the start symbol NextToken^div. The result of the parse should be a parse tree T. If the parse failed, return a syntax error.
5. Compute the action Token on T to obtain a Token t. If t is the end token, return the tokens array and go to the parse stage.
6. Append t to the end of the tokens array.
7. Compute the action RegExpMayFollow on T to obtain a Boolean value and assign that value to the regExpMayFollow variable.
8. Remove the characters matched by T from input, leaving only the yet-unparsed suffix of input.
9. Go to step 4.
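
The loop above can be sketched in Python as follows. This is a toy under stated assumptions, not the real lexer grammar: the token patterns, the "regexp" token kind, the END sentinel character, and the rule used for regexp_may_follow (a regular expression cannot follow a number, an identifier, a regular expression, or a closing parenthesis) are all invented for illustration. The sketch also tries its rules in a fixed order instead of performing a true longest-prefix parse.

```python
import re

END = "\uffff"  # hypothetical stand-in for the End placeholder

# Toy rules as (token kind, pattern) pairs. Assumptions, not the real grammar.
SKIP = re.compile(r"\s*")
SHARED = [
    ("end", re.compile(END)),
    ("number", re.compile(r"\d+")),
    ("identifier", re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")),
]
REGEXP = ("regexp", re.compile(r"/(?:[^/\\\n]|\\.)+/[a-z]*"))
PUNCTUATION = ("punctuation", re.compile(r"[-+*/=(){};,]"))


def parse_next_token(text, regexp_may_follow):
    """Step 4: match one token at the start of text, or raise SyntaxError."""
    skipped = SKIP.match(text).end()
    # The regexp rule is only available when a regular expression may follow;
    # otherwise "/" can only be read as punctuation (division).
    rules = SHARED + ([REGEXP] if regexp_may_follow else []) + [PUNCTUATION]
    for kind, pattern in rules:
        m = pattern.match(text, skipped)
        if m:
            return kind, m.group(), m.end()
    raise SyntaxError(f"no token matches at {text[skipped:skipped + 10]!r}")


def lex(source):
    tokens = []                        # step 1
    text = source + END                # step 2
    regexp_may_follow = True           # step 3
    while True:
        kind, lexeme, length = parse_next_token(text, regexp_may_follow)  # step 4
        if kind == "end":              # step 5
            return tokens
        tokens.append((kind, lexeme))  # step 6
        # Step 7 (toy rule): no regexp after an operand or a closing parenthesis.
        operand = kind in ("number", "identifier", "regexp")
        regexp_may_follow = not (operand or lexeme == ")")
        text = text[length:]           # step 8
        # Step 9: the loop returns to step 4.
```

With these toy rules, the slashes in "a / b / c" are lexed as two division symbols because each follows an identifier, whereas in "x = /ab+c/g" the slash follows "=" and starts a regular expression.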

If an implementation encounters an error while lexing, it is permitted to either report the error immediately or defer it until the affected token would actually be used by the parser. This flexibility allows an implementation to do lexing at the same time it parses the source program.
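
One way an implementation might take advantage of this flexibility is to produce tokens on demand, so that a lexical error surfaces only when the parser asks for the offending token. The Python generator below is a minimal sketch of that idea; the name lazy_tokens and the tiny token pattern are assumptions made for illustration and do not come from the lexer grammar.

```python
import re

WHITESPACE = re.compile(r"\s*")
# Toy token pattern (an assumption): numbers, identifiers, a few punctuators.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[-+*/=();]")


def lazy_tokens(source):
    """Yield token strings on demand; an error is raised only when reached."""
    pos = 0
    while True:
        pos = WHITESPACE.match(source, pos).end()
        if pos == len(source):
            return
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        yield m.group()
        pos = m.end()
```

Creating the generator for "a = 1 # b" raises nothing, and the first three tokens can be consumed normally; the error for "#" appears only when a fourth token is requested. A parser that stops early for its own reasons would never see it.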

To be provided: the mapping from Token structures to parser grammar terminals (obvious, but it still needs to be written down).

Parsing

To be provided

Waldemar Horwat
Last modified Sunday, April 30, 2000