JavaScript 2.0 Lexical Semantics

April 2002 Draft

Thursday, February 7, 2002

The lexical semantics describe the actions the lexer takes in order to transform an input stream of Unicode characters into a stream of tokens. For convenience, the lexical grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word 98 rtf file.

The start symbols are: NextInputElement^unit if the previous input element was a number; NextInputElement^re if the previous input element was not a number and a / should be interpreted as a regular expression; and NextInputElement^div if the previous input element was not a number and a / should be interpreted as a division or division-assignment operator.
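The choice of start symbol can be sketched as a small decision function. This is a hypothetical Python illustration, not part of the specification. The names `StartSymbol` and `choose_start_symbol` are invented. Whether a / at this point would begin a regular expression is assumed to be supplied by the parser, which the sketch does not model.

```python
from enum import Enum


class StartSymbol(Enum):
    """The three lexer start symbols (hypothetical Python names)."""
    UNIT = "NextInputElement^unit"
    RE = "NextInputElement^re"
    DIV = "NextInputElement^div"


def choose_start_symbol(previous_was_number: bool, slash_is_regexp: bool) -> StartSymbol:
    """Pick the start symbol for lexing the next input element.

    previous_was_number: the previous input element was a Number token.
    slash_is_regexp: assumed to come from the parser; True when a '/' here
    would begin a regular expression rather than a division operator.
    """
    if previous_was_number:
        # After a number the unit start symbol is used, whatever '/' would mean.
        return StartSymbol.UNIT
    return StartSymbol.RE if slash_is_regexp else StartSymbol.DIV
```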

Semantics

tag lineBreak;

tag endOfInput;

tuple Keyword

name: String

end tuple;

tuple Punctuator

name: String

end tuple;

tuple Identifier

name: String

end tuple;

tuple Number

value: Float64

end tuple;

tuple RegularExpression

body: String,

flags: String

end tuple;

Token = Keyword ∪ Punctuator ∪ Identifier ∪ Number ∪ String ∪ RegularExpression;

InputElement = {lineBreak, endOfInput} ∪ Token;

tag syntaxError;

SemanticException = {syntaxError};
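One way to model these semantic types in Python is sketched below. It is an illustration only, and it makes three assumptions. First, Keyword, Punctuator and Identifier each carry a single name string, as the `name: id` and `name: “!”` constructions later in this section indicate. Second, a string-literal token is a plain Python str. Third, the two tags are enum members. All Python names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Tag(Enum):
    """The tags lineBreak and endOfInput."""
    LINE_BREAK = "lineBreak"
    END_OF_INPUT = "endOfInput"


@dataclass(frozen=True)
class Keyword:
    name: str


@dataclass(frozen=True)
class Punctuator:
    name: str


@dataclass(frozen=True)
class Identifier:
    name: str


@dataclass(frozen=True)
class Number:
    value: float  # a Python float is an IEEE 754 double, like Float64


@dataclass(frozen=True)
class RegularExpression:
    body: str
    flags: str


# A string literal's token is modeled as a plain Python str.
Token = Union[Keyword, Punctuator, Identifier, Number, str, RegularExpression]
InputElement = Union[Tag, Token]
```

Frozen dataclasses give value equality, so two tokens compare equal exactly when their kind and fields match.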

Unicode Character Classes

Syntax

UnicodeCharacter ⇒ Any Unicode character

UnicodeInitialAlphabetic ⇒ Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)

UnicodeAlphanumeric ⇒ Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)

WhiteSpaceCharacter ⇒

«TAB» | «VT» | «FF» | «SP» | «u00A0»

| «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»

| «u2008» | «u2009» | «u200A» | «u200B»

| «u3000»

LineTerminator ⇒ «LF» | «CR» | «u2028» | «u2029»

ASCIIDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

DecimalValue[ASCIIDigit]: Integer = digitValue(ASCIIDigit);

Comments

Syntax

LineComment ⇒ / / LineCommentCharacters

LineCommentCharacters ⇒

«empty»

| LineCommentCharacters NonTerminator

NonTerminator ⇒ UnicodeCharacter except LineTerminator

SingleLineBlockComment ⇒ / * BlockCommentCharacters * /

BlockCommentCharacters ⇒

«empty»

| BlockCommentCharacters NonTerminatorOrSlash

| PreSlashCharacters /

PreSlashCharacters ⇒

«empty»

| BlockCommentCharacters NonTerminatorOrAsteriskOrSlash

| PreSlashCharacters /

NonTerminatorOrSlash ⇒ NonTerminator except /

NonTerminatorOrAsteriskOrSlash ⇒ NonTerminator except * | /

MultiLineBlockComment ⇒ / * MultiLineBlockCommentCharacters BlockCommentCharacters * /

MultiLineBlockCommentCharacters ⇒

BlockCommentCharacters LineTerminator

| MultiLineBlockCommentCharacters BlockCommentCharacters LineTerminator

White Space

Syntax

WhiteSpace ⇒

«empty»

| WhiteSpace WhiteSpaceCharacter

| WhiteSpace SingleLineBlockComment

Line Breaks

Syntax

LineBreak ⇒

LineTerminator

| LineComment LineTerminator

| MultiLineBlockComment

LineBreaks ⇒

LineBreak

| LineBreaks WhiteSpace LineBreak
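A block comment lands in one of two places. A SingleLineBlockComment is part of WhiteSpace, while a MultiLineBlockComment counts as a LineBreak. The only difference is whether a line terminator occurs inside the comment.

Below is a minimal Python sketch of that test. The helper name `classify_block_comment` is hypothetical. The function takes an already delimited comment and does not lex the surrounding text.

```python
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}  # LF, CR, LS, PS


def classify_block_comment(text: str) -> str:
    """Classify a complete '/* ... */' comment.

    Returns "whitespace" for a SingleLineBlockComment (no line terminator
    inside) and "lineBreak" for a MultiLineBlockComment.
    """
    if not (text.startswith("/*") and text.endswith("*/")) or len(text) < 4:
        raise ValueError("not a complete block comment")
    body = text[2:-2]
    if "*/" in body:
        # BlockCommentCharacters cannot contain the closing delimiter.
        raise ValueError("comment closes before its end")
    return "lineBreak" if any(c in LINE_TERMINATORS for c in body) else "whitespace"
```

One consequence: a block comment that spans lines separates tokens the same way a newline does, while a one-line block comment behaves like a space.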

Input Elements

Syntax

{re, div, unit}

NextInputElement^re ⇒ WhiteSpace InputElement^re

NextInputElement^div ⇒ WhiteSpace InputElement^div

NextInputElement^unit ⇒

[lookahead∉{ContinuingIdentifierCharacter, \}] WhiteSpace InputElement^div

| [lookahead∉{_}] IdentifierName

Semantics

Lex[NextInputElement]: InputElement;

Lex[NextInputElement^re ⇒ WhiteSpace InputElement^re] = Lex[InputElement^re];

Lex[NextInputElement^div ⇒ WhiteSpace InputElement^div] = Lex[InputElement^div];

Lex[NextInputElement^unit ⇒ [lookahead∉{ContinuingIdentifierCharacter, \}] WhiteSpace InputElement^div] = Lex[InputElement^div];

Lex[NextInputElement^unit ⇒ [lookahead∉{_}] IdentifierName] = LexName[IdentifierName];
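After a number, the two alternatives separate two cases:

- Ordinary continuation: white space, then an InputElement^div.
- A name written immediately after the number: its lexed value is the name string itself.

A character that is a ContinuingIdentifierCharacter but may not start an IdentifierName (a digit, or _) matches neither alternative.

The simplified Python sketch below assumes ASCII-only identifier characters and does not handle \ escapes. It uses the invented helper `lex_after_number` and the exception `LexError` (standing in for syntaxError).

```python
from typing import Optional, Tuple


class LexError(Exception):
    """Stands in for the spec's syntaxError."""


def _is_initial(c: str) -> bool:
    # ASCII-only approximation of InitialIdentifierCharacter.
    return c.isascii() and (c.isalpha() or c in "$_")


def _is_continuing(c: str) -> bool:
    # ASCII-only approximation of ContinuingIdentifierCharacter.
    return c.isascii() and (c.isalnum() or c in "$_")


def lex_after_number(src: str, pos: int) -> Tuple[Optional[str], int]:
    """Sketch of NextInputElement^unit, starting right after a number at pos.

    Returns (name, end) when an IdentifierName immediately follows the number,
    or (None, pos) when lexing should go on with WhiteSpace InputElement^div.
    """
    c = src[pos:pos + 1]  # "" at end of input
    if c == "" or not (_is_continuing(c) or c == "\\"):
        # First alternative: lookahead not in {ContinuingIdentifierCharacter, \}.
        return None, pos
    if c == "\\":
        raise NotImplementedError("identifier escapes are outside this sketch")
    if c == "_" or not _is_initial(c):
        # A digit or '_' right after a number matches neither alternative.
        raise LexError(f"unexpected {c!r} directly after a number")
    end = pos + 1
    while end < len(src) and _is_continuing(src[end]):
        end += 1
    return src[pos:end], end
```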

Syntax

InputElement^re ⇒

LineBreaks

| IdentifierOrKeyword

| Punctuator

| NumericLiteral

| StringLiteral

| RegExpLiteral

| EndOfInput

InputElement^div ⇒

LineBreaks

| IdentifierOrKeyword

| Punctuator

| DivisionPunctuator

| NumericLiteral

| StringLiteral

| EndOfInput

EndOfInput ⇒

End

| LineComment End

Semantics

Lex[InputElement]: InputElement;

Lex[InputElement ⇒ LineBreaks] = lineBreak;

Lex[InputElement ⇒ IdentifierOrKeyword] = Lex[IdentifierOrKeyword];

Lex[InputElement ⇒ Punctuator] = Lex[Punctuator];

Lex[InputElement^div ⇒ DivisionPunctuator] = Lex[DivisionPunctuator];

Lex[InputElement ⇒ NumericLiteral] = Lex[NumericLiteral];

Lex[InputElement ⇒ StringLiteral] = Lex[StringLiteral];

Lex[InputElement^re ⇒ RegExpLiteral] = Lex[RegExpLiteral];

Lex[InputElement ⇒ EndOfInput] = endOfInput;

Keywords and Identifiers

Syntax

IdentifierName ⇒

InitialIdentifierCharacterOrEscape

| NullEscapes InitialIdentifierCharacterOrEscape

| IdentifierName ContinuingIdentifierCharacterOrEscape

| IdentifierName NullEscape

NullEscapes ⇒

NullEscape

| NullEscapes NullEscape

NullEscape ⇒ \ _

InitialIdentifierCharacterOrEscape ⇒

InitialIdentifierCharacter

| \ HexEscape

InitialIdentifierCharacter ⇒ UnicodeInitialAlphabetic | $ | _

ContinuingIdentifierCharacterOrEscape ⇒

ContinuingIdentifierCharacter

| \ HexEscape

ContinuingIdentifierCharacter ⇒ UnicodeAlphanumeric | $ | _

Semantics

LexName[IdentifierName]: String;

LexName[IdentifierName ⇒ InitialIdentifierCharacterOrEscape] = [LexChar[InitialIdentifierCharacterOrEscape]];

LexName[IdentifierName ⇒ NullEscapes InitialIdentifierCharacterOrEscape] = [LexChar[InitialIdentifierCharacterOrEscape]];

LexName[IdentifierName₀ ⇒ IdentifierName₁ ContinuingIdentifierCharacterOrEscape] = LexName[IdentifierName₁] ⊕ [LexChar[ContinuingIdentifierCharacterOrEscape]];

LexName[IdentifierName₀ ⇒ IdentifierName₁ NullEscape] = LexName[IdentifierName₁];

ContainsEscapes[IdentifierName]: Boolean;

ContainsEscapes[IdentifierName ⇒ InitialIdentifierCharacterOrEscape] = ContainsEscapes[InitialIdentifierCharacterOrEscape];

ContainsEscapes[IdentifierName ⇒ NullEscapes InitialIdentifierCharacterOrEscape] = true;

ContainsEscapes[IdentifierName₀ ⇒ IdentifierName₁ ContinuingIdentifierCharacterOrEscape] = ContainsEscapes[IdentifierName₁] or ContainsEscapes[ContinuingIdentifierCharacterOrEscape];

ContainsEscapes[IdentifierName₀ ⇒ IdentifierName₁ NullEscape] = true;

LexChar[InitialIdentifierCharacterOrEscape]: Character;

LexChar[InitialIdentifierCharacterOrEscape ⇒ InitialIdentifierCharacter] = InitialIdentifierCharacter;

LexChar[InitialIdentifierCharacterOrEscape ⇒ \ HexEscape]

begin

if isInitialIdentifierCharacter(LexChar[HexEscape]) then

return LexChar[HexEscape]

else throw syntaxError

end if

end;

ContainsEscapes[InitialIdentifierCharacterOrEscape]: Boolean;

ContainsEscapes[InitialIdentifierCharacterOrEscape ⇒ InitialIdentifierCharacter] = false;

ContainsEscapes[InitialIdentifierCharacterOrEscape ⇒ \ HexEscape] = true;

LexChar[ContinuingIdentifierCharacterOrEscape]: Character;

LexChar[ContinuingIdentifierCharacterOrEscape ⇒ ContinuingIdentifierCharacter] = ContinuingIdentifierCharacter;

LexChar[ContinuingIdentifierCharacterOrEscape ⇒ \ HexEscape]

begin

if isContinuingIdentifierCharacter(LexChar[HexEscape]) then

return LexChar[HexEscape]

else throw syntaxError

end if

end;

ContainsEscapes[ContinuingIdentifierCharacterOrEscape]: Boolean;

ContainsEscapes[ContinuingIdentifierCharacterOrEscape ⇒ ContinuingIdentifierCharacter] = false;

ContainsEscapes[ContinuingIdentifierCharacterOrEscape ⇒ \ HexEscape] = true;
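The LexName and ContainsEscapes rules, together with the escape checks in LexChar, can be transcribed into a single left-to-right pass. The Python sketch below is an illustration under stated assumptions:

- It takes an already delimited identifier name.
- It approximates the Unicode character classes with str.isalpha() and str.isdecimal().
- Its names (`lex_identifier_name`, `LexError`) are hypothetical.

```python
from typing import Tuple


class LexError(Exception):
    """Stands in for the spec's syntaxError."""


HEX_DIGITS = "0123456789abcdefABCDEF"


def _is_initial(c: str) -> bool:
    # Approximates InitialIdentifierCharacter with str.isalpha().
    return c.isalpha() or c in "$_"


def _is_continuing(c: str) -> bool:
    # Approximates ContinuingIdentifierCharacter: letters, decimal digits, $ and _.
    return c.isalpha() or c.isdecimal() or c in "$_"


def lex_identifier_name(text: str) -> Tuple[str, bool]:
    """Return (LexName, ContainsEscapes) for a complete IdentifierName."""
    name = []
    contains_escapes = False
    i = 0
    while i < len(text):
        c = text[i]
        if c == "\\":
            contains_escapes = True
            kind = text[i + 1:i + 2]
            if kind == "_":
                i += 2  # NullEscape: contributes no character
                continue
            if kind not in ("x", "u"):
                raise LexError("expected a \\_, \\x or \\u escape")
            n = 2 if kind == "x" else 4
            digits = text[i + 2:i + 2 + n]
            if len(digits) != n or any(d not in HEX_DIGITS for d in digits):
                raise LexError("malformed hex escape")
            c = chr(int(digits, 16))
            i += 2 + n
        else:
            i += 1
        # The first real character must be initial; later ones only continuing.
        allowed = _is_initial(c) if not name else _is_continuing(c)
        if not allowed:
            raise LexError(f"{c!r} is not allowed at this position")
        name.append(c)
    if not name:
        raise LexError("an identifier name needs at least one character")
    return "".join(name), contains_escapes
```

As in the grammar, an escape may only produce a character that would be legal at that position, so `\u0031a` is rejected because it would start with a digit.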

reservedWords: String[] = [“abstract”, “as”, “break”, “case”, “catch”, “class”, “const”, “continue”, “debugger”, “default”, “delete”, “do”, “else”, “enum”, “export”, “extends”, “false”, “final”, “finally”, “for”, “function”, “goto”, “if”, “implements”, “import”, “in”, “instanceof”, “interface”, “is”, “namespace”, “native”, “new”, “null”, “package”, “private”, “protected”, “public”, “return”, “static”, “super”, “switch”, “synchronized”, “this”, “throw”, “throws”, “transient”, “true”, “try”, “typeof”, “use”, “var”, “volatile”, “while”, “with”];

nonReservedWords: String[] = [“exclude”, “get”, “include”, “named”, “set”];

keywords: String[] = reservedWords ⊕ nonReservedWords;

proc member(id: String, list: String[]): Boolean

if list = [] then return false end if;

if id = list[0] then return true end if;

return member(id, list[1 ...])

end proc;

Syntax

IdentifierOrKeyword ⇒ IdentifierName

Semantics

Lex[IdentifierOrKeyword ⇒ IdentifierName]: InputElement

begin

id: String ← LexName[IdentifierName];

if member(id, keywords) and not ContainsEscapes[IdentifierName] then

return Keyword⟨name: id⟩

else return Identifier⟨name: id⟩

end if

end;
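The keyword check can be mirrored directly in a hypothetical Python sketch. Here `name` and `contains_escapes` are assumed to be the already computed LexName and ContainsEscapes values, and a (kind, name) pair stands in for the Keyword and Identifier tuples. `member` is written as a loop instead of the recursion above, which avoids Python's recursion limit and gives the same result.

```python
from typing import List, Tuple

RESERVED_WORDS: List[str] = [
    "abstract", "as", "break", "case", "catch", "class", "const", "continue",
    "debugger", "default", "delete", "do", "else", "enum", "export", "extends",
    "false", "final", "finally", "for", "function", "goto", "if", "implements",
    "import", "in", "instanceof", "interface", "is", "namespace", "native",
    "new", "null", "package", "private", "protected", "public", "return",
    "static", "super", "switch", "synchronized", "this", "throw", "throws",
    "transient", "true", "try", "typeof", "use", "var", "volatile", "while",
    "with",
]
NON_RESERVED_WORDS: List[str] = ["exclude", "get", "include", "named", "set"]
KEYWORDS: List[str] = RESERVED_WORDS + NON_RESERVED_WORDS  # list concatenation


def member(ident: str, words: List[str]) -> bool:
    """Same result as the recursive proc member, written as a loop."""
    for word in words:
        if word == ident:
            return True
    return False


def lex_identifier_or_keyword(name: str, contains_escapes: bool) -> Tuple[str, str]:
    """Sketch of Lex[IdentifierOrKeyword]; (kind, name) stands in for the tuples."""
    if member(name, KEYWORDS) and not contains_escapes:
        return ("Keyword", name)
    return ("Identifier", name)
```

A keyword spelled with an escape, such as `wh\u0069le`, therefore yields an identifier rather than the keyword.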

Punctuators

Syntax

Punctuator ⇒

!

| ! =

| ! = =

| %

| % =

| &

| & &

| & & =

| & =

| (

| )

| *

| * =

| +

| + +

| + =

| ,

| -

| - -

| - =

| .

| . . .

| :

| : :

| ;

| <

| < <

| < < =

| < =

| =

| = =

| = = =

| >

| > =

| > >

| > > =

| > > >

| > > > =

| ?

| [

| ]

| ^

| ^ =

| ^ ^

| ^ ^ =

| {

| |

| | =

| | |

| | | =

| }

| ~

DivisionPunctuator ⇒

/ [lookahead∉{/, *}]

| / =

Semantics

Lex[Punctuator]: Token;

Lex[Punctuator ⇒ !] = Punctuator⟨name: “!”⟩;

Lex[Punctuator ⇒ ! =] = Punctuator⟨name: “!=”⟩;

Lex[Punctuator ⇒ ! = =] = Punctuator⟨name: “!==”⟩;

Lex[Punctuator ⇒ %] = Punctuator⟨name: “%”⟩;

Lex[Punctuator ⇒ % =] = Punctuator⟨name: “%=”⟩;

Lex[Punctuator ⇒ &] = Punctuator⟨name: “&”⟩;

Lex[Punctuator ⇒ & &] = Punctuator⟨name: “&&”⟩;

Lex[Punctuator ⇒ & & =] = Punctuator⟨name: “&&=”⟩;

Lex[Punctuator ⇒ & =] = Punctuator⟨name: “&=”⟩;

Lex[Punctuator ⇒ (] = Punctuator⟨name: “(”⟩;

Lex[Punctuator ⇒ )] = Punctuator⟨name: “)”⟩;

Lex[Punctuator ⇒ *] = Punctuator⟨name: “*”⟩;

Lex[Punctuator ⇒ * =] = Punctuator⟨name: “*=”⟩;

Lex[Punctuator ⇒ +] = Punctuator⟨name: “+”⟩;

Lex[Punctuator ⇒ + +] = Punctuator⟨name: “++”⟩;

Lex[Punctuator ⇒ + =] = Punctuator⟨name: “+=”⟩;

Lex[Punctuator ⇒ ,] = Punctuator⟨name: “,”⟩;

Lex[Punctuator ⇒ -] = Punctuator⟨name: “-”⟩;

Lex[Punctuator ⇒ - -] = Punctuator⟨name: “--”⟩;

Lex[Punctuator ⇒ - =] = Punctuator⟨name: “-=”⟩;

Lex[Punctuator ⇒ .] = Punctuator⟨name: “.”⟩;

Lex[Punctuator ⇒ . . .] = Punctuator⟨name: “...”⟩;

Lex[Punctuator ⇒ :] = Punctuator⟨name: “:”⟩;

Lex[Punctuator ⇒ : :] = Punctuator⟨name: “::”⟩;

Lex[Punctuator ⇒ ;] = Punctuator⟨name: “;”⟩;

Lex[Punctuator ⇒ <] = Punctuator⟨name: “<”⟩;

Lex[Punctuator ⇒ < <] = Punctuator⟨name: “<<”⟩;

Lex[Punctuator ⇒ < < =] = Punctuator⟨name: “<<=”⟩;

Lex[Punctuator ⇒ < =] = Punctuator⟨name: “<=”⟩;

Lex[Punctuator ⇒ =] = Punctuator⟨name: “=”⟩;

Lex[Punctuator ⇒ = =] = Punctuator⟨name: “==”⟩;

Lex[Punctuator ⇒ = = =] = Punctuator⟨name: “===”⟩;

Lex[Punctuator ⇒ >] = Punctuator⟨name: “>”⟩;

Lex[Punctuator ⇒ > =] = Punctuator⟨name: “>=”⟩;

Lex[Punctuator ⇒ > >] = Punctuator⟨name: “>>”⟩;

Lex[Punctuator ⇒ > > =] = Punctuator⟨name: “>>=”⟩;

Lex[Punctuator ⇒ > > >] = Punctuator⟨name: “>>>”⟩;

Lex[Punctuator ⇒ > > > =] = Punctuator⟨name: “>>>=”⟩;

Lex[Punctuator ⇒ ?] = Punctuator⟨name: “?”⟩;

Lex[Punctuator ⇒ [] = Punctuator⟨name: “[”⟩;

Lex[Punctuator ⇒ ]] = Punctuator⟨name: “]”⟩;

Lex[Punctuator ⇒ ^] = Punctuator⟨name: “^”⟩;

Lex[Punctuator ⇒ ^ =] = Punctuator⟨name: “^=”⟩;

Lex[Punctuator ⇒ ^ ^] = Punctuator⟨name: “^^”⟩;

Lex[Punctuator ⇒ ^ ^ =] = Punctuator⟨name: “^^=”⟩;

Lex[Punctuator ⇒ {] = Punctuator⟨name: “{”⟩;

Lex[Punctuator ⇒ |] = Punctuator⟨name: “|”⟩;

Lex[Punctuator ⇒ | =] = Punctuator⟨name: “|=”⟩;

Lex[Punctuator ⇒ | |] = Punctuator⟨name: “||”⟩;

Lex[Punctuator ⇒ | | =] = Punctuator⟨name: “||=”⟩;

Lex[Punctuator ⇒ }] = Punctuator⟨name: “}”⟩;

Lex[Punctuator ⇒ ~] = Punctuator⟨name: “~”⟩;

Lex[DivisionPunctuator]: Token;

Lex[DivisionPunctuator ⇒ / [lookahead∉{/, *}]] = Punctuator⟨name: “/”⟩;

Lex[DivisionPunctuator ⇒ / =] = Punctuator⟨name: “/=”⟩;
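The grammar lists punctuators of different lengths that share prefixes, for example >, >>, >>> and >>>=. The Python sketch below assumes the usual longest-match rule when several punctuators could start at the same position. That rule, the function name `lex_punctuator` and its `div_allowed` flag are assumptions of the sketch rather than definitions from this section.

```python
from typing import List, Optional, Tuple

# Punctuator spellings from the grammar above.
PUNCTUATORS: List[str] = [
    "!", "!=", "!==", "%", "%=", "&", "&&", "&&=", "&=", "(", ")",
    "*", "*=", "+", "++", "+=", ",", "-", "--", "-=", ".", "...",
    ":", "::", ";", "<", "<<", "<<=", "<=", "=", "==", "===",
    ">", ">=", ">>", ">>=", ">>>", ">>>=", "?", "[", "]",
    "^", "^=", "^^", "^^=", "{", "|", "|=", "||", "||=", "}", "~",
]
DIVISION_PUNCTUATORS: List[str] = ["/", "/="]


def lex_punctuator(src: str, pos: int, div_allowed: bool) -> Optional[Tuple[str, int]]:
    """Return (spelling, end) for the longest punctuator at pos, or None.

    div_allowed should be True under InputElement^div, where / and /= are
    punctuators; under InputElement^re a / begins a regular expression instead.
    """
    candidates = PUNCTUATORS + (DIVISION_PUNCTUATORS if div_allowed else [])
    best = max((p for p in candidates if src.startswith(p, pos)), key=len, default=None)
    if best is None:
        return None
    if best == "/" and src[pos + 1:pos + 2] in ("/", "*"):
        # lookahead not in {/, *}: "//" and "/*" start comments, not division.
        return None
    return best, pos + len(best)
```

Because .. is not in the list, the sketch lexes `..` as one . followed by another.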

Numeric Literals

Syntax

NumericLiteral ⇒

DecimalLiteral

| HexIntegerLiteral [lookahead∉{HexDigit}]

Semantics

Lex[NumericLiteral]: Token;

Lex[NumericLiteral ⇒ DecimalLiteral] = Number⟨value: realToFloat64(LexNumber[DecimalLiteral])⟩;

Lex[NumericLiteral ⇒ HexIntegerLiteral [lookahead∉{HexDigit}]] = Number⟨value: realToFloat64(LexNumber[HexIntegerLiteral])⟩;

Syntax

DecimalLiteral ⇒

Mantissa

| Mantissa LetterE SignedInteger

LetterE ⇒ E | e

Mantissa ⇒

DecimalIntegerLiteral

| DecimalIntegerLiteral .

| DecimalIntegerLiteral . Fraction

| . Fraction

DecimalIntegerLiteral ⇒

0

| NonZeroDecimalDigits

NonZeroDecimalDigits ⇒

NonZeroDigit

| NonZeroDecimalDigits ASCIIDigit

NonZeroDigit ⇒ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Fraction ⇒ DecimalDigits

Semantics

LexNumber[DecimalLiteral]: Rational;

LexNumber[DecimalLiteral ⇒ Mantissa] = LexNumber[Mantissa];

LexNumber[DecimalLiteral ⇒ Mantissa LetterE SignedInteger] = LexNumber[Mantissa]×10^{LexNumber[SignedInteger]};

LexNumber[Mantissa]: Rational;

LexNumber[Mantissa ⇒ DecimalIntegerLiteral] = LexNumber[DecimalIntegerLiteral];

LexNumber[Mantissa ⇒ DecimalIntegerLiteral .] = LexNumber[DecimalIntegerLiteral];

LexNumber[Mantissa ⇒ DecimalIntegerLiteral . Fraction] = LexNumber[DecimalIntegerLiteral] + LexNumber[Fraction];

LexNumber[Mantissa ⇒ . Fraction] = LexNumber[Fraction];

LexNumber[DecimalIntegerLiteral]: Integer;

LexNumber[DecimalIntegerLiteral ⇒ 0] = 0;

LexNumber[DecimalIntegerLiteral ⇒ NonZeroDecimalDigits] = LexNumber[NonZeroDecimalDigits];

LexNumber[NonZeroDecimalDigits]: Integer;

LexNumber[NonZeroDecimalDigits ⇒ NonZeroDigit] = DecimalValue[NonZeroDigit];

LexNumber[NonZeroDecimalDigits₀ ⇒ NonZeroDecimalDigits₁ ASCIIDigit] = 10×LexNumber[NonZeroDecimalDigits₁] + DecimalValue[ASCIIDigit];

DecimalValue[NonZeroDigit]: Integer = digitValue(NonZeroDigit);

LexNumber[Fraction ⇒ DecimalDigits]: Rational = LexNumber[DecimalDigits]/10^{NDigits[DecimalDigits]};

Syntax

SignedInteger ⇒

DecimalDigits

| + DecimalDigits

| - DecimalDigits

Semantics

LexNumber[SignedInteger]: Integer;

LexNumber[SignedInteger ⇒ DecimalDigits] = LexNumber[DecimalDigits];

LexNumber[SignedInteger ⇒ + DecimalDigits] = LexNumber[DecimalDigits];

LexNumber[SignedInteger ⇒ - DecimalDigits] = −LexNumber[DecimalDigits];

Syntax

DecimalDigits ⇒

ASCIIDigit

| DecimalDigits ASCIIDigit

Semantics

LexNumber[DecimalDigits]: Integer;

LexNumber[DecimalDigits ⇒ ASCIIDigit] = DecimalValue[ASCIIDigit];

LexNumber[DecimalDigits₀ ⇒ DecimalDigits₁ ASCIIDigit] = 10×LexNumber[DecimalDigits₁] + DecimalValue[ASCIIDigit];

NDigits[DecimalDigits]: Integer;

NDigits[DecimalDigits ⇒ ASCIIDigit] = 1;

NDigits[DecimalDigits₀ ⇒ DecimalDigits₁ ASCIIDigit] = NDigits[DecimalDigits₁] + 1;
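The LexNumber rules compute an exact rational value, and only Lex[NumericLiteral] rounds it with realToFloat64. The sketch below mirrors that split in Python with fractions.Fraction. It recognizes a complete DecimalLiteral with a regular expression written for this illustration. It relies on Python's float() of a Fraction rounding to the nearest double. Values beyond the Float64 range are left to Python's float() and may not be handled the way realToFloat64 handles them. All names are hypothetical.

```python
import re
from fractions import Fraction

# DecimalLiteral: Mantissa, optionally followed by LetterE SignedInteger.
_DECIMAL_LITERAL = re.compile(
    r"(?:(?P<int>0|[1-9][0-9]*)(?:\.(?P<frac1>[0-9]*))?|\.(?P<frac2>[0-9]+))"
    r"(?:[eE](?P<exp>[+-]?[0-9]+))?"
)


def lex_decimal_literal(text: str) -> Fraction:
    """Exact value of a complete DecimalLiteral (LexNumber[DecimalLiteral])."""
    m = _DECIMAL_LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a DecimalLiteral: {text!r}")
    # DecimalIntegerLiteral, or 0 for the '. Fraction' form.
    value = Fraction(int(m.group("int") or "0"))
    digits = m.group("frac1") or m.group("frac2") or ""
    if digits:
        # LexNumber[Fraction] = LexNumber[DecimalDigits] / 10^NDigits[DecimalDigits]
        value += Fraction(int(digits), 10 ** len(digits))
    exponent = int(m.group("exp") or "0")  # LexNumber[SignedInteger]
    return value * Fraction(10) ** exponent


def real_to_float64(x: Fraction) -> float:
    """Round an exact rational to the nearest double."""
    return float(x)
```

For example, `0.1` lexes to exactly 1/10, and rounding to binary happens once, at the end. Leading zeros such as `01` are rejected, as DecimalIntegerLiteral requires.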

Syntax

HexIntegerLiteral ⇒

0 LetterX HexDigit

| HexIntegerLiteral HexDigit

LetterX ⇒ X | x

HexDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

LexNumber[HexIntegerLiteral]: Integer;

LexNumber[HexIntegerLiteral ⇒ 0 LetterX HexDigit] = HexValue[HexDigit];

LexNumber[HexIntegerLiteral₀ ⇒ HexIntegerLiteral₁ HexDigit] = 16×LexNumber[HexIntegerLiteral₁] + HexValue[HexDigit];

HexValue[HexDigit]: Integer = digitValue(HexDigit);
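LexNumber[HexIntegerLiteral] is a left-to-right Horner evaluation, value = 16×value + digit. Below is a hypothetical Python transcription. The realToFloat64 step is approximated by Python's int-to-float conversion, which rounds to nearest with ties to even.

```python
HEX_VALUES = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}


def lex_hex_integer_literal(text: str) -> int:
    """Exact integer value of a complete HexIntegerLiteral such as '0x1F'."""
    if len(text) < 3 or text[0] != "0" or text[1] not in "xX":
        raise ValueError(f"not a HexIntegerLiteral: {text!r}")
    value = 0
    for ch in text[2:]:
        if ch not in HEX_VALUES:
            raise ValueError(f"not a hex digit: {ch!r}")
        # LexNumber[HexIntegerLiteral₀] = 16×LexNumber[HexIntegerLiteral₁] + HexValue[HexDigit]
        value = 16 * value + HEX_VALUES[ch]
    return value
```

The integer stays exact until conversion. For instance, `0x20000000000001` is 2^53 + 1, which is not representable as a double and rounds to 2^53.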

String Literals

Syntax

θ ∈ {single, double}

StringLiteral ⇒

' StringChars^single '

| " StringChars^double "

Semantics

Lex[StringLiteral]: Token;

Lex[StringLiteral ⇒ ' StringChars^single '] = LexString[StringChars^single];

Lex[StringLiteral ⇒ " StringChars^double "] = LexString[StringChars^double];

Syntax

StringChars^θ ⇒

«empty»

| StringChars^θ StringChar^θ

| StringChars^θ NullEscape

StringChar^θ ⇒

LiteralStringChar^θ

| \ StringEscape

LiteralStringChar^single ⇒ UnicodeCharacter except ' | \ | LineTerminator

LiteralStringChar^double ⇒ UnicodeCharacter except " | \ | LineTerminator

Semantics

LexString[StringChars^θ]: String;

LexString[StringChars^θ ⇒ «empty»] = “”;

LexString[StringChars₀^θ ⇒ StringChars₁^θ StringChar^θ] = LexString[StringChars₁^θ] ⊕ [LexChar[StringChar^θ]];

LexString[StringChars₀^θ ⇒ StringChars₁^θ NullEscape] = LexString[StringChars₁^θ];

LexChar[StringChar^θ]: Character;

LexChar[StringChar^θ ⇒ LiteralStringChar^θ] = LiteralStringChar^θ;

LexChar[StringChar^θ ⇒ \ StringEscape] = LexChar[StringEscape];

Syntax

StringEscape ⇒

ControlEscape

| ZeroEscape

| HexEscape

| IdentityEscape

IdentityEscape ⇒ NonTerminator except _ | UnicodeAlphanumeric

Semantics

LexChar[StringEscape]: Character;

LexChar[StringEscape ⇒ ControlEscape] = LexChar[ControlEscape];

LexChar[StringEscape ⇒ ZeroEscape] = LexChar[ZeroEscape];

LexChar[StringEscape ⇒ HexEscape] = LexChar[HexEscape];

LexChar[StringEscape ⇒ IdentityEscape] = IdentityEscape;

Syntax

ControlEscape ⇒

b

| f

| n

| r

| t

| v

Semantics

LexChar[ControlEscape]: Character;

LexChar[ControlEscape ⇒ b] = ‘«BS»’;

LexChar[ControlEscape ⇒ f] = ‘«FF»’;

LexChar[ControlEscape ⇒ n] = ‘«LF»’;

LexChar[ControlEscape ⇒ r] = ‘«CR»’;

LexChar[ControlEscape ⇒ t] = ‘«TAB»’;

LexChar[ControlEscape ⇒ v] = ‘«VT»’;

Syntax

ZeroEscape ⇒ 0 [lookahead∉{ASCIIDigit}]

Semantics

LexChar[ZeroEscape ⇒ 0 [lookahead∉{ASCIIDigit}]]: Character = ‘«NUL»’;

Syntax

HexEscape ⇒

x HexDigit HexDigit

| u HexDigit HexDigit HexDigit HexDigit

Semantics

LexChar[HexEscape]: Character;

LexChar[HexEscape ⇒ x HexDigit₁ HexDigit₂] = codeToCharacter(16×HexValue[HexDigit₁] + HexValue[HexDigit₂]);

LexChar[HexEscape ⇒ u HexDigit₁ HexDigit₂ HexDigit₃ HexDigit₄] = codeToCharacter(4096×HexValue[HexDigit₁] + 256×HexValue[HexDigit₂] + 16×HexValue[HexDigit₃] + HexValue[HexDigit₄]);
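The string-literal rules can be combined into a Python sketch of LexString. It assumes it is called at an opening quote and approximates UnicodeAlphanumeric with str.isalpha() and str.isdecimal(). The names `lex_string_literal` and `LexError` (standing in for syntaxError) are invented.

As in the grammar, three cases behave differently from some other lexers:

- An escaped letter or digit other than the listed ones is rejected rather than passed through.
- `\0` followed by a digit is an error.
- `\_` contributes nothing to the value.

```python
from typing import Tuple


class LexError(Exception):
    """Stands in for the spec's syntaxError."""


LINE_TERMINATORS = "\n\r\u2028\u2029"
CONTROL_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
HEX_DIGITS = "0123456789abcdefABCDEF"
ASCII_DIGITS = "0123456789"


def lex_string_literal(src: str, pos: int) -> Tuple[str, int]:
    """Lex the StringLiteral that starts at src[pos]; return (value, end)."""
    quote = src[pos:pos + 1]
    if quote not in ("'", '"'):
        raise LexError("a string literal starts with ' or \"")
    out = []
    i = pos + 1
    while True:
        c = src[i:i + 1]
        if c == "" or c in LINE_TERMINATORS:
            raise LexError("unterminated string literal")
        if c == quote:
            return "".join(out), i + 1
        if c != "\\":
            out.append(c)  # LiteralStringChar
            i += 1
            continue
        e = src[i + 1:i + 2]
        nxt = src[i + 2:i + 3]
        if e in CONTROL_ESCAPES:
            out.append(CONTROL_ESCAPES[e])  # ControlEscape
            i += 2
        elif e == "0" and (nxt == "" or nxt not in ASCII_DIGITS):
            out.append("\0")  # ZeroEscape: \0 not followed by a digit
            i += 2
        elif e in ("x", "u"):
            n = 2 if e == "x" else 4
            digits = src[i + 2:i + 2 + n]
            if len(digits) != n or any(d not in HEX_DIGITS for d in digits):
                raise LexError("malformed hex escape")
            out.append(chr(int(digits, 16)))  # HexEscape
            i += 2 + n
        elif e == "_":
            i += 2  # NullEscape contributes no character
        elif e != "" and e not in LINE_TERMINATORS and not (e.isalpha() or e.isdecimal()):
            out.append(e)  # IdentityEscape
            i += 2
        else:
            raise LexError(f"invalid escape sequence \\{e}")
```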