April 2002 Draft
JavaScript 2.0
Formal Description
Lexical Semantics

Thursday, February 7, 2002

The lexical semantics describe the actions the lexer takes to transform an input stream of Unicode characters into a stream of tokens. For convenience, the lexical grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word 98 rtf file.

The start symbols are: NextInputElement^unit if the previous input element was a number; NextInputElement^re if the previous input element was not a number and a / should be interpreted as the start of a regular expression; and NextInputElement^div if the previous input element was not a number and a / should be interpreted as a division or division-assignment operator.
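
As an illustration only, the choice among the three start symbols can be sketched in Python. The function name and its two boolean parameters are assumptions made for this sketch; how the caller decides whether a / begins a regular expression is outside the scope of this section.

```python
def start_symbol(previous_was_number: bool, slash_is_regexp: bool) -> str:
    """Return which start symbol (unit, re, or div) applies to the next input element."""
    # A preceding number always selects the unit variant, whatever / would mean.
    if previous_was_number:
        return "unit"
    # Otherwise the interpretation of / selects between re and div.
    return "re" if slash_is_regexp else "div"
```

For example, `start_symbol(False, True)` selects the `re` variant, under which a leading / is lexed as a regular expression literal.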

Semantics

tag lineBreak;
tag endOfInput;
tuple Keyword
name: String
end tuple;
tuple Punctuator
name: String
end tuple;
tuple Identifier
name: String
end tuple;
tuple Number
value: Float64
end tuple;
tuple RegularExpression
body: String,
flags: String
end tuple;
Token = Keyword ∪ Punctuator ∪ Identifier ∪ Number ∪ String ∪ RegularExpression;
InputElement = {lineBreak, endOfInput} ∪ Token;
tag syntaxError;
SemanticException = {syntaxError};
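
The tuple and tag definitions above map naturally onto small immutable record types. The following Python sketch is one possible modelling, not part of the specification; the class names `InputTag` and `LexSyntaxError` are hypothetical, and Float64 is assumed to correspond to Python's `float`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class InputTag(Enum):
    """The two tags that are input elements but not tokens."""
    LINE_BREAK = "lineBreak"
    END_OF_INPUT = "endOfInput"


class LexSyntaxError(Exception):
    """Stands in for the syntaxError semantic exception."""


@dataclass(frozen=True)
class Keyword:
    name: str


@dataclass(frozen=True)
class Punctuator:
    name: str


@dataclass(frozen=True)
class Identifier:
    name: str


@dataclass(frozen=True)
class Number:
    value: float


@dataclass(frozen=True)
class RegularExpression:
    body: str
    flags: str


# A plain str plays the role of the String member of Token.
Token = Union[Keyword, Punctuator, Identifier, Number, str, RegularExpression]
InputElement = Union[InputTag, Token]
```

Because each tuple is a distinct class, a `Keyword` and an `Identifier` with the same name compare unequal, which mirrors the disjoint union in the definition of Token.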

Unicode Character Classes

Syntax

UnicodeCharacter ⇒ Any Unicode character
UnicodeInitialAlphabetic ⇒ Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)
UnicodeAlphanumeric ⇒ Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
WhiteSpaceCharacter ⇒
   «TAB» | «VT» | «FF» | «SP» | «u00A0»
|  «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»
|  «u2008» | «u2009» | «u200A» | «u200B»
|  «u3000»
LineTerminator ⇒ «LF» | «CR» | «u2028» | «u2029»
ASCIIDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

DecimalValue[ASCIIDigit]: Integer = digitValue(ASCIIDigit);
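
A minimal Python sketch of the white space, line terminator, and ASCII digit classes listed above. The helper names are hypothetical; the code points are taken directly from the productions.

```python
# «TAB», «VT», «FF», «SP», U+00A0, U+2000 through U+200B, and U+3000.
WHITE_SPACE = frozenset(
    "\t\x0b\x0c \u00a0"
    + "".join(chr(c) for c in range(0x2000, 0x200C))
    + "\u3000"
)

# «LF», «CR», U+2028, U+2029.
LINE_TERMINATORS = frozenset("\n\r\u2028\u2029")


def is_white_space(ch: str) -> bool:
    return ch in WHITE_SPACE


def is_line_terminator(ch: str) -> bool:
    return ch in LINE_TERMINATORS


def decimal_value(ch: str) -> int:
    """Value of a single ASCII digit, in the role of DecimalValue."""
    if len(ch) != 1 or ch not in "0123456789":
        raise ValueError("not an ASCII digit: %r" % ch)
    return ord(ch) - ord("0")
```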

Comments

Syntax

LineComment ⇒ / / LineCommentCharacters
LineCommentCharacters ⇒
   «empty»
|  LineCommentCharacters NonTerminator
NonTerminator ⇒ UnicodeCharacter except LineTerminator
SingleLineBlockComment ⇒ / * BlockCommentCharacters * /
BlockCommentCharacters ⇒
   «empty»
|  BlockCommentCharacters NonTerminatorOrSlash
|  PreSlashCharacters /
PreSlashCharacters ⇒
   «empty»
|  BlockCommentCharacters NonTerminatorOrAsteriskOrSlash
|  PreSlashCharacters /
NonTerminatorOrSlash ⇒ NonTerminator except /
NonTerminatorOrAsteriskOrSlash ⇒ NonTerminator except * | /
MultiLineBlockComment ⇒ / * MultiLineBlockCommentCharacters BlockCommentCharacters * /
MultiLineBlockCommentCharacters ⇒
   BlockCommentCharacters LineTerminator
|  MultiLineBlockCommentCharacters BlockCommentCharacters LineTerminator

White Space

Syntax

WhiteSpace ⇒
   «empty»
|  WhiteSpace WhiteSpaceCharacter
|  WhiteSpace SingleLineBlockComment

Line Breaks

Syntax

LineBreak ⇒
   LineTerminator
|  LineComment LineTerminator
|  MultiLineBlockComment
LineBreaks ⇒
   LineBreak
|  LineBreaks WhiteSpace LineBreak
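
The grammar above treats a block comment as white space when it stays on one line and as a line break when it spans a line terminator. The Python sketch below illustrates that distinction; the function name is hypothetical, and it assumes the caller has already isolated one complete comment (it does not check for an embedded */).

```python
LINE_TERMINATORS = frozenset("\n\r\u2028\u2029")


def block_comment_kind(comment: str) -> str:
    """Classify a complete /* ... */ comment as 'WhiteSpace' or 'LineBreak'."""
    if len(comment) < 4 or not comment.startswith("/*") or not comment.endswith("*/"):
        raise ValueError("not a block comment")
    body = comment[2:-2]
    # Any line terminator inside the body makes this a MultiLineBlockComment.
    if any(ch in LINE_TERMINATORS for ch in body):
        return "LineBreak"
    return "WhiteSpace"
```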

Input Elements

Syntax

ν ∈ {re, div, unit}
NextInputElement^re ⇒ WhiteSpace InputElement^re
NextInputElement^div ⇒ WhiteSpace InputElement^div
NextInputElement^unit ⇒
   [lookahead∉{ContinuingIdentifierCharacter, \}] WhiteSpace InputElement^div
|  [lookahead∉{_}] IdentifierName

Semantics

Lex[NextInputElement^ν]: InputElement;
Lex[NextInputElement^re ⇒ WhiteSpace InputElement^re] = Lex[InputElement^re];
Lex[NextInputElement^div ⇒ WhiteSpace InputElement^div] = Lex[InputElement^div];
Lex[NextInputElement^unit ⇒ [lookahead∉{ContinuingIdentifierCharacter, \}] WhiteSpace InputElement^div] = Lex[InputElement^div];
Lex[NextInputElement^unit ⇒ [lookahead∉{_}] IdentifierName] = LexName[IdentifierName];

Syntax

InputElement^re ⇒
   LineBreaks
|  IdentifierOrKeyword
|  Punctuator
|  NumericLiteral
|  StringLiteral
|  RegExpLiteral
|  EndOfInput
InputElement^div ⇒
   LineBreaks
|  IdentifierOrKeyword
|  Punctuator
|  DivisionPunctuator
|  NumericLiteral
|  StringLiteral
|  EndOfInput
EndOfInput ⇒
   End
|  LineComment End

Semantics

Lex[InputElement^ν]: InputElement;
Lex[InputElement^ν ⇒ LineBreaks] = lineBreak;
Lex[InputElement^ν ⇒ IdentifierOrKeyword] = Lex[IdentifierOrKeyword];
Lex[InputElement^ν ⇒ Punctuator] = Lex[Punctuator];
Lex[InputElement^div ⇒ DivisionPunctuator] = Lex[DivisionPunctuator];
Lex[InputElement^ν ⇒ NumericLiteral] = Lex[NumericLiteral];
Lex[InputElement^ν ⇒ StringLiteral] = Lex[StringLiteral];
Lex[InputElement^re ⇒ RegExpLiteral] = Lex[RegExpLiteral];
Lex[InputElement^ν ⇒ EndOfInput] = endOfInput;

Keywords and Identifiers

Syntax

IdentifierName ⇒
   InitialIdentifierCharacterOrEscape
|  NullEscapes InitialIdentifierCharacterOrEscape
|  IdentifierName ContinuingIdentifierCharacterOrEscape
|  IdentifierName NullEscape
NullEscapes ⇒
   NullEscape
|  NullEscapes NullEscape
NullEscape ⇒ \ _
InitialIdentifierCharacterOrEscape ⇒
   InitialIdentifierCharacter
|  \ HexEscape
InitialIdentifierCharacter ⇒ UnicodeInitialAlphabetic | $ | _
ContinuingIdentifierCharacterOrEscape ⇒
   ContinuingIdentifierCharacter
|  \ HexEscape
ContinuingIdentifierCharacter ⇒ UnicodeAlphanumeric | $ | _

Semantics

LexName[IdentifierName]: String;
LexName[IdentifierName ⇒ InitialIdentifierCharacterOrEscape] = [LexChar[InitialIdentifierCharacterOrEscape]];
LexName[IdentifierName ⇒ NullEscapes InitialIdentifierCharacterOrEscape] = [LexChar[InitialIdentifierCharacterOrEscape]];
LexName[IdentifierName_0 ⇒ IdentifierName_1 ContinuingIdentifierCharacterOrEscape] = LexName[IdentifierName_1] ⊕ [LexChar[ContinuingIdentifierCharacterOrEscape]];
LexName[IdentifierName_0 ⇒ IdentifierName_1 NullEscape] = LexName[IdentifierName_1];
ContainsEscapes[IdentifierName]: Boolean;
ContainsEscapes[IdentifierName ⇒ InitialIdentifierCharacterOrEscape] = ContainsEscapes[InitialIdentifierCharacterOrEscape];
ContainsEscapes[IdentifierName ⇒ NullEscapes InitialIdentifierCharacterOrEscape] = true;
ContainsEscapes[IdentifierName_0 ⇒ IdentifierName_1 ContinuingIdentifierCharacterOrEscape] = ContainsEscapes[IdentifierName_1] or ContainsEscapes[ContinuingIdentifierCharacterOrEscape];
ContainsEscapes[IdentifierName ⇒ IdentifierName NullEscape] = true;
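LexChar[InitialIdentifierCharacterOrEscape]: Character;
LexChar[InitialIdentifierCharacterOrEscape ⇒ InitialIdentifierCharacter] = InitialIdentifierCharacter;
LexChar[InitialIdentifierCharacterOrEscape ⇒ \ HexEscape]
begin
ch: Character ← LexChar[HexEscape];
if ch is a valid InitialIdentifierCharacter then return ch
else throw syntaxError
end if
end;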
ContainsEscapes[InitialIdentifierCharacterOrEscape]: Boolean;
ContainsEscapes[InitialIdentifierCharacterOrEscape ⇒ InitialIdentifierCharacter] = false;
ContainsEscapes[InitialIdentifierCharacterOrEscape ⇒ \ HexEscape] = true;
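LexChar[ContinuingIdentifierCharacterOrEscape]: Character;
LexChar[ContinuingIdentifierCharacterOrEscape ⇒ ContinuingIdentifierCharacter] = ContinuingIdentifierCharacter;
LexChar[ContinuingIdentifierCharacterOrEscape ⇒ \ HexEscape]
begin
ch: Character ← LexChar[HexEscape];
if ch is a valid ContinuingIdentifierCharacter then return ch
else throw syntaxError
end if
end;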
ContainsEscapes[ContinuingIdentifierCharacterOrEscape]: Boolean;
ContainsEscapes[ContinuingIdentifierCharacterOrEscape ⇒ ContinuingIdentifierCharacter] = false;
ContainsEscapes[ContinuingIdentifierCharacterOrEscape ⇒ \ HexEscape] = true;
reservedWords: String[] = [“abstract”, “as”, “break”, “case”, “catch”, “class”, “const”, “continue”, “debugger”, “default”, “delete”, “do”, “else”, “enum”, “export”, “extends”, “false”, “final”, “finally”, “for”, “function”, “goto”, “if”, “implements”, “import”, “in”, “instanceof”, “interface”, “is”, “namespace”, “native”, “new”, “null”, “package”, “private”, “protected”, “public”, “return”, “static”, “super”, “switch”, “synchronized”, “this”, “throw”, “throws”, “transient”, “true”, “try”, “typeof”, “use”, “var”, “volatile”, “while”, “with”];
nonReservedWords: String[] = [“exclude”, “get”, “include”, “named”, “set”];
keywords: String[] = reservedWords ⊕ nonReservedWords;
proc member(id: String, list: String[]): Boolean
if list = [] then return false end if;
if id = list[0] then return true end if;
return member(id, list[1 ...])
end proc;

Syntax

IdentifierOrKeyword ⇒ IdentifierName

Semantics

Lex[IdentifierOrKeyword ⇒ IdentifierName]: InputElement
begin
id: String ← LexName[IdentifierName];
if member(id, keywords) and not ContainsEscapes[IdentifierName] then
return Keyword⟨name: id⟩
else return Identifier⟨name: id⟩
end if
end;
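
The rule above says that a name is a keyword only if it appears in the keyword list and was written without escapes. A minimal Python sketch of that decision follows; the function name `classify` and the tuple-based return value are assumptions of this sketch, and the word lists are copied from the semantics above.

```python
RESERVED_WORDS = (
    "abstract as break case catch class const continue debugger default "
    "delete do else enum export extends false final finally for function "
    "goto if implements import in instanceof interface is namespace native "
    "new null package private protected public return static super switch "
    "synchronized this throw throws transient true try typeof use var "
    "volatile while with"
).split()
NON_RESERVED_WORDS = ["exclude", "get", "include", "named", "set"]
KEYWORDS = frozenset(RESERVED_WORDS + NON_RESERVED_WORDS)


def classify(name: str, contains_escapes: bool) -> tuple:
    """Return ('Keyword', name) or ('Identifier', name).

    `name` is the already-decoded identifier name; `contains_escapes`
    reports whether the source spelling used any escape.
    """
    if name in KEYWORDS and not contains_escapes:
        return ("Keyword", name)
    return ("Identifier", name)
```

Under this sketch a name spelled with an escape is always an identifier, even when its decoded form matches a keyword.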

Punctuators

Syntax

Punctuator ⇒
   !
|  ! =
|  ! = =
|  %
|  % =
|  &
|  & &
|  & & =
|  & =
|  (
|  )
|  *
|  * =
|  +
|  + +
|  + =
|  ,
|  -
|  - -
|  - =
|  .
|  . . .
|  :
|  : :
|  ;
|  <
|  < <
|  < < =
|  < =
|  =
|  = =
|  = = =
|  >
|  > =
|  > >
|  > > =
|  > > >
|  > > > =
|  ?
|  [
|  ]
|  ^
|  ^ =
|  ^ ^
|  ^ ^ =
|  {
|  |
|  | =
|  | |
|  | | =
|  }
|  ~
DivisionPunctuator ⇒
   / [lookahead∉{/, *}]
|  / =

Semantics

Lex[Punctuator]: Token;
Lex[Punctuator ⇒ !] = Punctuator⟨name: “!”⟩;
Lex[Punctuator ⇒ ! =] = Punctuator⟨name: “!=”⟩;
Lex[Punctuator ⇒ ! = =] = Punctuator⟨name: “!==”⟩;
Lex[Punctuator ⇒ %] = Punctuator⟨name: “%”⟩;
Lex[Punctuator ⇒ % =] = Punctuator⟨name: “%=”⟩;
Lex[Punctuator ⇒ &] = Punctuator⟨name: “&”⟩;
Lex[Punctuator ⇒ & &] = Punctuator⟨name: “&&”⟩;
Lex[Punctuator ⇒ & & =] = Punctuator⟨name: “&&=”⟩;
Lex[Punctuator ⇒ & =] = Punctuator⟨name: “&=”⟩;
Lex[Punctuator ⇒ (] = Punctuator⟨name: “(”⟩;
Lex[Punctuator ⇒ )] = Punctuator⟨name: “)”⟩;
Lex[Punctuator ⇒ *] = Punctuator⟨name: “*”⟩;
Lex[Punctuator ⇒ * =] = Punctuator⟨name: “*=”⟩;
Lex[Punctuator ⇒ +] = Punctuator⟨name: “+”⟩;
Lex[Punctuator ⇒ + +] = Punctuator⟨name: “++”⟩;
Lex[Punctuator ⇒ + =] = Punctuator⟨name: “+=”⟩;
Lex[Punctuator ⇒ ,] = Punctuator⟨name: “,”⟩;
Lex[Punctuator ⇒ -] = Punctuator⟨name: “-”⟩;
Lex[Punctuator ⇒ - -] = Punctuator⟨name: “--”⟩;
Lex[Punctuator ⇒ - =] = Punctuator⟨name: “-=”⟩;
Lex[Punctuator ⇒ .] = Punctuator⟨name: “.”⟩;
Lex[Punctuator ⇒ . . .] = Punctuator⟨name: “...”⟩;
Lex[Punctuator ⇒ :] = Punctuator⟨name: “:”⟩;
Lex[Punctuator ⇒ : :] = Punctuator⟨name: “::”⟩;
Lex[Punctuator ⇒ ;] = Punctuator⟨name: “;”⟩;
Lex[Punctuator ⇒ <] = Punctuator⟨name: “<”⟩;
Lex[Punctuator ⇒ < <] = Punctuator⟨name: “<<”⟩;
Lex[Punctuator ⇒ < < =] = Punctuator⟨name: “<<=”⟩;
Lex[Punctuator ⇒ < =] = Punctuator⟨name: “<=”⟩;
Lex[Punctuator ⇒ =] = Punctuator⟨name: “=”⟩;
Lex[Punctuator ⇒ = =] = Punctuator⟨name: “==”⟩;
Lex[Punctuator ⇒ = = =] = Punctuator⟨name: “===”⟩;
Lex[Punctuator ⇒ >] = Punctuator⟨name: “>”⟩;
Lex[Punctuator ⇒ > =] = Punctuator⟨name: “>=”⟩;
Lex[Punctuator ⇒ > >] = Punctuator⟨name: “>>”⟩;
Lex[Punctuator ⇒ > > =] = Punctuator⟨name: “>>=”⟩;
Lex[Punctuator ⇒ > > >] = Punctuator⟨name: “>>>”⟩;
Lex[Punctuator ⇒ > > > =] = Punctuator⟨name: “>>>=”⟩;
Lex[Punctuator ⇒ ?] = Punctuator⟨name: “?”⟩;
Lex[Punctuator ⇒ [] = Punctuator⟨name: “[”⟩;
Lex[Punctuator ⇒ ]] = Punctuator⟨name: “]”⟩;
Lex[Punctuator ⇒ ^] = Punctuator⟨name: “^”⟩;
Lex[Punctuator ⇒ ^ =] = Punctuator⟨name: “^=”⟩;
Lex[Punctuator ⇒ ^ ^] = Punctuator⟨name: “^^”⟩;
Lex[Punctuator ⇒ ^ ^ =] = Punctuator⟨name: “^^=”⟩;
Lex[Punctuator ⇒ {] = Punctuator⟨name: “{”⟩;
Lex[Punctuator ⇒ |] = Punctuator⟨name: “|”⟩;
Lex[Punctuator ⇒ | =] = Punctuator⟨name: “|=”⟩;
Lex[Punctuator ⇒ | |] = Punctuator⟨name: “||”⟩;
Lex[Punctuator ⇒ | | =] = Punctuator⟨name: “||=”⟩;
Lex[Punctuator ⇒ }] = Punctuator⟨name: “}”⟩;
Lex[Punctuator ⇒ ~] = Punctuator⟨name: “~”⟩;
Lex[DivisionPunctuator]: Token;
Lex[DivisionPunctuator ⇒ / [lookahead∉{/, *}]] = Punctuator⟨name: “/”⟩;
Lex[DivisionPunctuator ⇒ / =] = Punctuator⟨name: “/=”⟩;
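
Many punctuators are prefixes of longer ones, so a scanner has to choose among them. The Python sketch below assumes the usual longest-match rule, which this section does not itself state; the function name `match_punctuator` is hypothetical, and the division punctuators are deliberately left out because they depend on the lexer's start symbol.

```python
PUNCTUATORS = (
    "! != !== % %= & && &&= &= ( ) * *= + ++ += , - -- -= . ... : :: ; "
    "< << <<= <= = == === > >= >> >>= >>> >>>= ? [ ] ^ ^= ^^ ^^= { | |= "
    "|| ||= } ~"
).split()

# Try longer punctuators first so that ">>>=" wins over ">>>", ">>", and ">".
_BY_LENGTH = sorted(PUNCTUATORS, key=len, reverse=True)


def match_punctuator(src: str, pos: int = 0):
    """Return the longest punctuator starting at src[pos], or None."""
    for candidate in _BY_LENGTH:
        if src.startswith(candidate, pos):
            return candidate
    return None
```

Note that two dots in a row match only a single `.`, because `..` is not a punctuator while `...` is.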

Numeric Literals

Syntax

NumericLiteral ⇒
   DecimalLiteral
|  HexIntegerLiteral [lookahead∉{HexDigit}]

Semantics

Lex[NumericLiteral]: Token;
Lex[NumericLiteral ⇒ DecimalLiteral] = Number⟨value: realToFloat64(LexNumber[DecimalLiteral])⟩;
Lex[NumericLiteral ⇒ HexIntegerLiteral [lookahead∉{HexDigit}]] = Number⟨value: realToFloat64(LexNumber[HexIntegerLiteral])⟩;

Syntax

DecimalLiteral ⇒
   Mantissa
|  Mantissa LetterE SignedInteger
LetterE ⇒ E | e
Mantissa ⇒
   DecimalIntegerLiteral
|  DecimalIntegerLiteral .
|  DecimalIntegerLiteral . Fraction
|  . Fraction
DecimalIntegerLiteral ⇒
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits ⇒
   NonZeroDigit
|  NonZeroDecimalDigits ASCIIDigit
NonZeroDigit ⇒ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Fraction ⇒ DecimalDigits

Semantics

LexNumber[DecimalLiteral]: Rational;
LexNumber[DecimalLiteral ⇒ Mantissa] = LexNumber[Mantissa];
LexNumber[DecimalLiteral ⇒ Mantissa LetterE SignedInteger] = LexNumber[Mantissa] × 10^LexNumber[SignedInteger];
LexNumber[Mantissa]: Rational;
LexNumber[Mantissa ⇒ DecimalIntegerLiteral] = LexNumber[DecimalIntegerLiteral];
LexNumber[Mantissa ⇒ DecimalIntegerLiteral .] = LexNumber[DecimalIntegerLiteral];
LexNumber[Mantissa ⇒ DecimalIntegerLiteral . Fraction] = LexNumber[DecimalIntegerLiteral] + LexNumber[Fraction];
LexNumber[Mantissa ⇒ . Fraction] = LexNumber[Fraction];
LexNumber[DecimalIntegerLiteral]: Integer;
LexNumber[DecimalIntegerLiteral ⇒ 0] = 0;
LexNumber[DecimalIntegerLiteral ⇒ NonZeroDecimalDigits] = LexNumber[NonZeroDecimalDigits];
LexNumber[NonZeroDecimalDigits]: Integer;
LexNumber[NonZeroDecimalDigits ⇒ NonZeroDigit] = DecimalValue[NonZeroDigit];
LexNumber[NonZeroDecimalDigits_0 ⇒ NonZeroDecimalDigits_1 ASCIIDigit] = 10×LexNumber[NonZeroDecimalDigits_1] + DecimalValue[ASCIIDigit];
DecimalValue[NonZeroDigit]: Integer = digitValue(NonZeroDigit);
LexNumber[Fraction ⇒ DecimalDigits]: Rational = LexNumber[DecimalDigits]/10^NDigits[DecimalDigits];

Syntax

SignedInteger ⇒
   DecimalDigits
|  + DecimalDigits
|  - DecimalDigits

Semantics

LexNumber[SignedInteger]: Integer;
LexNumber[SignedInteger ⇒ DecimalDigits] = LexNumber[DecimalDigits];
LexNumber[SignedInteger ⇒ + DecimalDigits] = LexNumber[DecimalDigits];
LexNumber[SignedInteger ⇒ - DecimalDigits] = –LexNumber[DecimalDigits];

Syntax

DecimalDigits ⇒
   ASCIIDigit
|  DecimalDigits ASCIIDigit

Semantics

LexNumber[DecimalDigits]: Integer;
LexNumber[DecimalDigits ⇒ ASCIIDigit] = DecimalValue[ASCIIDigit];
LexNumber[DecimalDigits_0 ⇒ DecimalDigits_1 ASCIIDigit] = 10×LexNumber[DecimalDigits_1] + DecimalValue[ASCIIDigit];
NDigits[DecimalDigits]: Integer;
NDigits[DecimalDigits ⇒ ASCIIDigit] = 1;
NDigits[DecimalDigits_0 ⇒ DecimalDigits_1 ASCIIDigit] = NDigits[DecimalDigits_1] + 1;
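
The decimal literal semantics compute an exact rational value and only then convert it to Float64. The Python sketch below follows the same steps with `fractions.Fraction`; the function name `lex_decimal` and the use of a regular expression to split the literal are assumptions of this sketch, and `float()` is assumed to play the role of realToFloat64.

```python
import re
from fractions import Fraction

# Mantissa alternatives: "int", "int.", "int.frac", or ".frac", then an optional exponent.
_DECIMAL = re.compile(
    r"(?:(0|[1-9][0-9]*)(?:\.([0-9]*))?|\.([0-9]+))(?:[eE]([+-]?[0-9]+))?"
)


def lex_decimal(text: str) -> Fraction:
    """Exact rational value of a DecimalLiteral, before rounding to Float64."""
    m = _DECIMAL.fullmatch(text)
    if m is None:
        raise ValueError("not a decimal literal: %r" % text)
    int_part, frac_after_int, frac_only, exponent = m.groups()
    frac_digits = frac_only if frac_only is not None else (frac_after_int or "")
    value = Fraction(int(int_part or "0"))
    if frac_digits:
        # Fraction = digits / 10^NDigits
        value += Fraction(int(frac_digits), 10 ** len(frac_digits))
    if exponent is not None:
        value *= Fraction(10) ** int(exponent)
    return value
```

For instance, `lex_decimal("1.25")` is exactly 5/4, and `float(lex_decimal("0.1"))` gives the nearest double to one tenth. A literal such as `01` is rejected, since DecimalIntegerLiteral allows a leading 0 only on its own.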

Syntax

HexIntegerLiteral ⇒
   0 LetterX HexDigit
|  HexIntegerLiteral HexDigit
LetterX ⇒ X | x
HexDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

LexNumber[HexIntegerLiteral]: Integer;
LexNumber[HexIntegerLiteral ⇒ 0 LetterX HexDigit] = HexValue[HexDigit];
LexNumber[HexIntegerLiteral_0 ⇒ HexIntegerLiteral_1 HexDigit] = 16×LexNumber[HexIntegerLiteral_1] + HexValue[HexDigit];
HexValue[HexDigit]: Integer = digitValue(HexDigit);
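
The left-recursive hex rule amounts to a left-to-right fold over the digits. A minimal Python sketch, with the hypothetical function name `lex_hex_integer`:

```python
_HEX_DIGITS = "0123456789abcdef"


def lex_hex_integer(text: str) -> int:
    """Integer value of a HexIntegerLiteral such as 0x1F."""
    if len(text) < 3 or text[0] != "0" or text[1] not in "xX":
        raise ValueError("not a hex integer literal: %r" % text)
    value = 0
    for ch in text[2:]:
        digit = _HEX_DIGITS.find(ch.lower())
        if digit < 0:
            raise ValueError("not a hex digit: %r" % ch)
        # Mirrors 16 × LexNumber[prefix] + HexValue[digit].
        value = 16 * value + digit
    return value
```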

String Literals

Syntax

θ ∈ {single, double}
StringLiteral ⇒
   ' StringChars^single '
|  " StringChars^double "

Semantics

Lex[StringLiteral]: Token;
Lex[StringLiteral ⇒ ' StringChars^single '] = LexString[StringChars^single];
Lex[StringLiteral ⇒ " StringChars^double "] = LexString[StringChars^double];

Syntax

StringChars^θ ⇒
   «empty»
|  StringChars^θ StringChar^θ
|  StringChars^θ NullEscape
StringChar^θ ⇒
   LiteralStringChar^θ
|  \ StringEscape
LiteralStringChar^single ⇒ UnicodeCharacter except ' | \ | LineTerminator
LiteralStringChar^double ⇒ UnicodeCharacter except " | \ | LineTerminator

Semantics

LexString[StringChars^θ]: String;
LexString[StringChars^θ ⇒ «empty»] = “”;
LexString[StringChars^θ_0 ⇒ StringChars^θ_1 StringChar^θ] = LexString[StringChars^θ_1] ⊕ [LexChar[StringChar^θ]];
LexString[StringChars^θ_0 ⇒ StringChars^θ_1 NullEscape] = LexString[StringChars^θ_1];
LexChar[StringChar^θ]: Character;
LexChar[StringChar^θ ⇒ LiteralStringChar^θ] = LiteralStringChar^θ;
LexChar[StringChar^θ ⇒ \ StringEscape] = LexChar[StringEscape];

Syntax

StringEscape ⇒
   ControlEscape
|  ZeroEscape
|  HexEscape
|  IdentityEscape
IdentityEscape ⇒ NonTerminator except _ | UnicodeAlphanumeric

Semantics

LexChar[StringEscape]: Character;
LexChar[StringEscape ⇒ ControlEscape] = LexChar[ControlEscape];
LexChar[StringEscape ⇒ ZeroEscape] = LexChar[ZeroEscape];
LexChar[StringEscape ⇒ HexEscape] = LexChar[HexEscape];
LexChar[StringEscape ⇒ IdentityEscape] = IdentityEscape;

Syntax

ControlEscape ⇒
   b
|  f
|  n
|  r
|  t
|  v

Semantics

LexChar[ControlEscape]: Character;
LexChar[ControlEscape ⇒ b] = ‘«BS»’;
LexChar[ControlEscape ⇒ f] = ‘«FF»’;
LexChar[ControlEscape ⇒ n] = ‘«LF»’;
LexChar[ControlEscape ⇒ r] = ‘«CR»’;
LexChar[ControlEscape ⇒ t] = ‘«TAB»’;
LexChar[ControlEscape ⇒ v] = ‘«VT»’;

Syntax

ZeroEscape ⇒ 0 [lookahead∉{ASCIIDigit}]

Semantics

LexChar[ZeroEscape ⇒ 0 [lookahead∉{ASCIIDigit}]]: Character = ‘«NUL»’;

Syntax

HexEscape ⇒
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit

Semantics

LexChar[HexEscape]: Character;
LexChar[HexEscape ⇒ x HexDigit_1 HexDigit_2] = codeToCharacter(16×HexValue[HexDigit_1] + HexValue[HexDigit_2]);
LexChar[HexEscape ⇒ u HexDigit_1 HexDigit_2 HexDigit_3 HexDigit_4] = codeToCharacter(4096×HexValue[HexDigit_1] + 256×HexValue[HexDigit_2] + 16×HexValue[HexDigit_3] + HexValue[HexDigit_4]);
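
Taken together, the string escape rules can be sketched as a single decoding pass over the text between the quotes. This Python version is illustrative only: the function name `decode_string_body` is hypothetical, Python's `str.isalnum` is used as an approximation of UnicodeAlphanumeric, and the caller is assumed to have already removed the surrounding quotes.

```python
CONTROL_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
ASCII_DIGITS = frozenset("0123456789")
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")
LINE_TERMINATORS = frozenset("\n\r\u2028\u2029")


def decode_string_body(body: str) -> str:
    """Decode the characters between the quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc == "_":
            i += 2  # NullEscape contributes no character
        elif esc in CONTROL_ESCAPES:
            out.append(CONTROL_ESCAPES[esc])
            i += 2
        elif esc == "0" and body[i + 2:i + 3] not in ASCII_DIGITS:
            out.append("\0")  # ZeroEscape: 0 not followed by an ASCII digit
            i += 2
        elif esc == "x" or esc == "u":
            width = 2 if esc == "x" else 4
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("malformed hex escape")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        elif esc not in LINE_TERMINATORS and not esc.isalnum():
            out.append(esc)  # IdentityEscape
            i += 2
        else:
            raise ValueError("invalid escape: \\%s" % esc)
    return "".join(out)
```

In this sketch an escaped letter that is not one of the recognised escapes, such as `\q`, is an error, because IdentityEscape excludes alphanumeric characters.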

Regular Expression Literals

Syntax

RegExpLiteral ⇒ RegExpBody RegExpFlags
RegExpFlags ⇒
   «empty»
|  RegExpFlags ContinuingIdentifierCharacterOrEscape
|  RegExpFlags NullEscape
RegExpBody ⇒ / [lookahead∉{*}] RegExpChars /
RegExpChars ⇒
   RegExpChar
|  RegExpChars RegExpChar
RegExpChar ⇒
   OrdinaryRegExpChar
|  \ NonTerminator
OrdinaryRegExpChar ⇒ NonTerminator except \ | /

Semantics

Lex[RegExpLiteral ⇒ RegExpBody RegExpFlags]: Token = RegularExpression⟨body: LexString[RegExpBody], flags: LexString[RegExpFlags]⟩;
LexString[RegExpFlags]: String;
LexString[RegExpFlags ⇒ «empty»] = “”;
LexString[RegExpFlags_0 ⇒ RegExpFlags_1 ContinuingIdentifierCharacterOrEscape] = LexString[RegExpFlags_1] ⊕ [LexChar[ContinuingIdentifierCharacterOrEscape]];
LexString[RegExpFlags_0 ⇒ RegExpFlags_1 NullEscape] = LexString[RegExpFlags_1];
LexString[RegExpBody ⇒ / [lookahead∉{*}] RegExpChars /]: String = LexString[RegExpChars];
LexString[RegExpChars]: String;
LexString[RegExpChars ⇒ RegExpChar] = LexString[RegExpChar];
LexString[RegExpChars_0 ⇒ RegExpChars_1 RegExpChar] = LexString[RegExpChars_1] ⊕ LexString[RegExpChar];
LexString[RegExpChar]: String;
LexString[RegExpChar ⇒ OrdinaryRegExpChar] = [OrdinaryRegExpChar];
LexString[RegExpChar ⇒ \ NonTerminator] = [‘\’, NonTerminator];
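
A regular expression literal is split into a body, in which backslash escapes are preserved verbatim, and a run of flag characters. The Python sketch below follows the grammar above under two simplifying assumptions: escapes inside the flags are not handled, and `str.isalnum` approximates UnicodeAlphanumeric. The function name `lex_regexp` is hypothetical.

```python
LINE_TERMINATORS = frozenset("\n\r\u2028\u2029")


def lex_regexp(src: str, pos: int = 0) -> tuple:
    """Return (body, flags) for the regular expression literal at src[pos]."""
    if not src.startswith("/", pos) or src.startswith("/*", pos):
        raise ValueError("not a regular expression literal")
    i = pos + 1
    body = []
    while True:
        if i >= len(src) or src[i] in LINE_TERMINATORS:
            raise ValueError("unterminated regular expression")
        ch = src[i]
        if ch == "/":
            break
        if ch == "\\":
            # A backslash and the non-terminator after it are both kept in the body.
            if i + 1 >= len(src) or src[i + 1] in LINE_TERMINATORS:
                raise ValueError("unterminated regular expression")
            body.append(src[i:i + 2])
            i += 2
        else:
            body.append(ch)
            i += 1
    if not body:
        # RegExpChars needs at least one character; "//" begins a line comment.
        raise ValueError("empty regular expression body")
    i += 1  # skip the closing /
    start = i
    while i < len(src) and (src[i].isalnum() or src[i] in "$_"):
        i += 1
    return "".join(body), src[start:i]
```

As in the grammar, the body ends at the first unescaped /, so this sketch gives no special treatment to a / inside a character class.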

Waldemar Horwat
Last modified Thursday, February 7, 2002