April 2002 Draft
JavaScript 2.0
Formal Description
Stages
Thursday, October 18, 2001
The source code is processed in the following stages:
1. If necessary, convert the source code into the Unicode UTF-16 format, normalized form C.
2. Remove any Unicode format control characters (category Cf) from the source code.
3. Simultaneously split the source code into input elements using the lexical grammar and semantics and parse it using the syntactic grammar to obtain a parse tree P.
4. Evaluate P using the syntactic semantics by computing the action Eval on it.
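As an illustration only, the first two stages can be sketched in Python with the standard `unicodedata` module. The function name `preprocess` is hypothetical and not part of this specification, and the sketch works on Python's code-point strings, so the conversion to UTF-16 code units is assumed rather than modeled.

```python
import unicodedata


def preprocess(source: str) -> str:
    """Sketch of stages 1 and 2: normalize to form C, then drop Cf characters.

    Assumption: `source` is already a decoded Python string; UTF-16
    encoding of the result is not modeled here.
    """
    # Stage 1: Unicode Normalization Form C (canonical composition).
    normalized = unicodedata.normalize("NFC", source)
    # Stage 2: remove every character whose general category is Cf
    # (format control), for example U+200D ZERO WIDTH JOINER.
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")
```

For example, `preprocess("e\u0301")` yields the single precomposed character U+00E9, and a zero-width joiner embedded in an identifier is removed before lexing begins.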
Lexing and Parsing
Processing stage 3 is done as follows:
1. Let inputElements be an empty array of input elements (syntactic grammar terminals and line breaks).
2. Let input be the input sequence of Unicode characters. Append a special placeholder End to the end of input.
3. Let state be a variable that holds one of the constants re, div, or unit. Initialize it to re.
4. Apply the lexical grammar to parse the longest possible prefix of input. Use the start symbol NextInputElement^re, NextInputElement^div, or NextInputElement^unit depending on whether state is re, div, or unit, respectively. The result of the parse should be a lexical grammar parse tree T. If the parse failed, return a syntax error.
5. Compute the action InputElement on T to obtain an InputElement e.
6. If e is the end input element, go to step 15.
7. Remove the characters matched by T from input, leaving only the yet-unlexed suffix of input.
8. Interpret e as a syntactic grammar terminal or line break as follows:
   - A lineBreak is interpreted as a line break, which is not a terminal itself but indicates one or more line breaks between two terminals. It prevents the syntactic grammar from matching any productions that have a [no line break] annotation in the place where the lineBreak occurred.
   - An identifier s is interpreted as the terminal Identifier. Applying the semantic action Name to the Identifier returns the identifier’s String s.
   - A keyword s is interpreted as the reserved word, future reserved word, or non-reserved word terminal corresponding to the keyword’s String s.
   - A punctuator s is interpreted as the punctuation token or future punctuation token terminal corresponding to the punctuator’s String s.
   - A number x is interpreted as the terminal Number. Applying the semantic action Eval to the Number returns the number’s Float64 x.
   - A string s is interpreted as the terminal String. Applying the semantic action Eval to the String returns the string’s String s.
   - A regularExpression z is interpreted as the terminal RegularExpression.
9. Append the resulting terminal or line break to the end of the inputElements array.
10. If the inputElements array forms a valid prefix of the context-free language defined by the syntactic grammar, go to step 13.
11. If e is not a lineBreak but the previous element of the inputElements array is a lineBreak, then insert a VirtualSemicolon terminal between that lineBreak and e in the inputElements array.
12. If the inputElements array still does not form a valid prefix of the context-free language defined by the syntactic grammar, signal a syntax error and stop.
13. If e is a Number, then set state to unit. Otherwise, if the inputElements array followed by the terminal / forms a valid prefix of the context-free language defined by the syntactic grammar, then set state to div; otherwise, set state to re.
14. Go to step 4.
15. If the inputElements array does not form a valid sentence of the context-free language defined by the syntactic grammar, signal a syntax error and stop.
16. Return the parse tree obtained by the syntactic grammar’s derivation of the sentence formed by the inputElements array.
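The interplay between the lexer state and the syntactic grammar may be easier to follow in code. The Python sketch below walks through the sixteen steps above, counted in order, against a deliberately tiny toy grammar. Everything in it is an assumption made for illustration: the toy grammar, the regular-expression lexer, and the names `lex_and_parse`, `run_dfa`, and `lexical_patterns` are hypothetical and are not the JavaScript 2.0 grammars or actions. The unit state is treated like div, and the function returns a flat list of terminal kinds in place of a parse tree.

```python
import re

# Toy syntactic grammar (an assumption, not the JavaScript 2.0 grammar):
#   Program   -> Statement*   (the final semicolon may be omitted)
#   Statement -> Operand ("/" Operand)* (";" | VirtualSemicolon)
#   Operand   -> Identifier | Number | RegularExpression
OPERANDS = {"Identifier", "Number", "RegularExpression"}


def run_dfa(elements):
    """Return the recognizer state, or None if elements is not a valid prefix."""
    state = "start"
    for kind, _ in elements:
        if kind == "lineBreak":  # line breaks are not terminals
            continue
        if state in ("start", "after_slash") and kind in OPERANDS:
            state = "after_operand"
        elif state == "after_operand" and kind == "/":
            state = "after_slash"
        elif state == "after_operand" and kind in (";", "VirtualSemicolon"):
            state = "start"
        else:
            return None
    return state


def is_valid_prefix(elements):
    return run_dfa(elements) is not None


def is_valid_sentence(elements):
    return run_dfa(elements) in ("start", "after_operand")


def lexical_patterns(state):
    """Toy lexical grammar: only the meaning of "/" depends on state."""
    if state == "re":
        slash = ("RegularExpression", r"/[^/\n]+/")
    else:  # div and unit are treated alike in this sketch
        slash = ("/", r"/")
    return [("lineBreak", r"\n\s*"), ("Number", r"\d+"),
            ("Identifier", r"[A-Za-z_]\w*"), (";", r";"), slash]


def lex_and_parse(source):
    input_elements = []                       # step 1
    remaining = source                        # step 2: empty string stands in for End
    state = "re"                              # step 3
    while True:
        remaining = remaining.lstrip(" \t")   # toy whitespace handling
        if not remaining:                     # step 6: end input element
            break
        matches = []                          # step 4: try each lexical pattern
        for kind, pattern in lexical_patterns(state):
            m = re.match(pattern, remaining)
            if m:
                matches.append((kind, m.group()))
        if not matches:
            raise SyntaxError(f"cannot lex {remaining!r}")
        e = max(matches, key=lambda km: len(km[1]))    # step 5: longest prefix
        remaining = remaining[len(e[1]):]              # step 7
        input_elements.append(e)                       # steps 8 and 9
        if not is_valid_prefix(input_elements):        # step 10
            if (e[0] != "lineBreak" and len(input_elements) >= 2
                    and input_elements[-2][0] == "lineBreak"):
                # step 11: insert between the lineBreak and e
                input_elements.insert(-1, ("VirtualSemicolon", ""))
            if not is_valid_prefix(input_elements):    # step 12
                raise SyntaxError(f"unexpected {e[1]!r}")
        if e[0] == "Number":                           # step 13
            state = "unit"
        elif is_valid_prefix(input_elements + [("/", "/")]):
            state = "div"
        else:
            state = "re"
    if not is_valid_sentence(input_elements):          # step 15
        raise SyntaxError("unexpected end of input")
    # step 16 stand-in: terminal kinds instead of a parse tree
    return [kind for kind, _ in input_elements if kind != "lineBreak"]
```

In this toy setting, `lex_and_parse("a /b/ c")` lexes both slashes as division punctuators because a `/` is grammatically possible after an operand, whereas `lex_and_parse("a; /b/")` lexes `/b/` as a single RegularExpression because no `/` terminal can follow `;`. With `"a / 2\nb"`, the identifier `b` is not a valid continuation, so a VirtualSemicolon is inserted after the line break.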