July 2000 Draft
JavaScript 2.0
Formal Description
Lexer Semantics

Monday, December 6, 1999

The lexer semantics describe the actions the lexer takes in order to transform an input stream of Unicode characters into a stream of tokens. For convenience, the lexer grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word 98 rtf file.

The start symbols are: NextTokenunit if the previous token was a number; NextTokenre if the previous token was not a number and a / should be interpreted as a regular expression; and NextTokendiv if the previous token was not a number and a / should be interpreted as a division or division-assignment operator.
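As a rough illustration of how a caller might choose among these start symbols, the Python sketch below maps two pieces of parser state to the corresponding value of τ. The function name and its two boolean parameters are assumptions made for this example; the draft does not prescribe how the caller tracks this state.

```python
def choose_start_symbol(previous_was_number: bool, slash_starts_regexp: bool) -> str:
    """Return "unit", "re", or "div", naming the NextToken start symbol to use."""
    if previous_was_number:
        # After a number the unit grammar applies, however / would otherwise be read.
        return "unit"
    if slash_starts_regexp:
        # A / at this point begins a regular expression literal.
        return "re"
    # A / at this point is a division or division-assignment operator.
    return "div"
```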

Semantics

type SemanticException = oneof {syntaxError}

Unicode Character Classes

Syntax

UnicodeCharacter ⇒ Any Unicode character
UnicodeInitialAlphabetic ⇒ Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)
UnicodeAlphanumeric ⇒ Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
WhiteSpaceCharacter 
   «TAB» | «VT» | «FF» | «SP» | «u00A0»
|  «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»
|  «u2008» | «u2009» | «u200A» | «u200B»
|  «u3000»
LineTerminator ⇒ «LF» | «CR» | «u2028» | «u2029»
ASCIIDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

action DecimalValue[ASCIIDigit] : Integer = digitValue(ASCIIDigit)

Comments

Syntax

LineComment ⇒ / / LineCommentCharacters
LineCommentCharacters 
   «empty»
|  LineCommentCharacters NonTerminator
NonTerminator ⇒ UnicodeCharacter except LineTerminator
SingleLineBlockComment ⇒ / * BlockCommentCharacters * /
BlockCommentCharacters 
   «empty»
|  BlockCommentCharacters NonTerminatorOrSlash
|  PreSlashCharacters /
PreSlashCharacters 
   «empty»
|  BlockCommentCharacters NonTerminatorOrAsteriskOrSlash
|  PreSlashCharacters /
NonTerminatorOrSlash ⇒ NonTerminator except /
NonTerminatorOrAsteriskOrSlash ⇒ NonTerminator except * | /
MultiLineBlockComment ⇒ / * MultiLineBlockCommentCharacters BlockCommentCharacters * /
MultiLineBlockCommentCharacters 
   BlockCommentCharacters LineTerminator
|  MultiLineBlockCommentCharacters BlockCommentCharacters LineTerminator

White space

Syntax

WhiteSpace 
   «empty»
|  WhiteSpace WhiteSpaceCharacter
|  WhiteSpace SingleLineBlockComment

Line breaks

Syntax

LineBreak 
   LineTerminator
|  LineComment LineTerminator
|  MultiLineBlockComment
LineBreaks 
   LineBreak
|  LineBreaks WhiteSpace LineBreak
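
In the grammar above, a block comment with no line terminator counts as white space, while one that contains a line terminator counts as a line break. A minimal Python sketch of that distinction follows; the function name and the returned labels are assumptions made for illustration, and the input is assumed to be exactly one complete block comment.

```python
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}

def classify_block_comment(comment: str) -> str:
    """Classify one complete /* ... */ comment as "white space" or "line break"."""
    if len(comment) < 4 or not comment.startswith("/*") or not comment.endswith("*/"):
        raise ValueError("not a block comment")
    body = comment[2:-2]
    if "*/" in body:
        # The grammar never lets / directly follow * inside the comment body.
        raise ValueError("comment ends before the end of the text")
    if any(ch in LINE_TERMINATORS for ch in body):
        return "line break"   # MultiLineBlockComment
    return "white space"      # SingleLineBlockComment
```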

Tokens

Syntax

τ ∈ {re, div, unit}
NextTokenre ⇒ WhiteSpace Tokenre
NextTokendiv ⇒ WhiteSpace Tokendiv
NextTokenunit 
   [lookahead∉{OrdinaryContinuingIdentifierCharacter, \}] WhiteSpace Tokendiv
|  [lookahead∉{_}] IdentifierName
|  _ IdentifierName

Semantics

action Token[NextTokenτ] : Token

Token[NextTokenre ⇒ WhiteSpace Tokenre] = Token[Tokenre]

Token[NextTokendiv ⇒ WhiteSpace Tokendiv] = Token[Tokendiv]

Token[NextTokenunit ⇒ [lookahead∉{OrdinaryContinuingIdentifierCharacter, \}] WhiteSpace Tokendiv]
  = Token[Tokendiv]

Token[NextTokenunit ⇒ [lookahead∉{_}] IdentifierName] = string Name[IdentifierName]

Token[NextTokenunit ⇒ _ IdentifierName] = string Name[IdentifierName]

Syntax

Tokenre 
   LineBreaks
|  IdentifierOrReservedWord
|  Punctuator
|  NumericLiteral
|  StringLiteral
|  RegExpLiteral
|  EndOfInput
Tokendiv 
   LineBreaks
|  IdentifierOrReservedWord
|  Punctuator
|  DivisionPunctuator
|  NumericLiteral
|  StringLiteral
|  EndOfInput
EndOfInput 
   End
|  LineComment End

Semantics

type RegExp = tuple {reBody: String; reFlags: String}

type Quantity = tuple {amount: Double; unit: String}

type Token
  = oneof {
           lineBreak;
           identifier: String;
           keyword: String;
           punctuator: String;
           number: Double;
           string: String;
           regularExpression: RegExp;
           end}

action Token[Tokenτ] : Token

Token[Tokenτ ⇒ LineBreaks] = lineBreak

Token[Tokenτ ⇒ IdentifierOrReservedWord] = Token[IdentifierOrReservedWord]

Token[Tokenτ ⇒ Punctuator] = punctuator Punctuator[Punctuator]

Token[Tokendiv ⇒ DivisionPunctuator] = punctuator Punctuator[DivisionPunctuator]

Token[Tokenτ ⇒ NumericLiteral] = number DoubleValue[NumericLiteral]

Token[Tokenτ ⇒ StringLiteral] = string StringValue[StringLiteral]

Token[Tokenre ⇒ RegExpLiteral] = regularExpression REValue[RegExpLiteral]

Token[Tokenτ ⇒ EndOfInput] = end

Keywords and identifiers

Syntax

IdentifierName 
   InitialIdentifierCharacter
|  IdentifierName ContinuingIdentifierCharacter
InitialIdentifierCharacter 
   OrdinaryInitialIdentifierCharacter
|  \ HexEscape
OrdinaryInitialIdentifierCharacter ⇒ UnicodeInitialAlphabetic | $ | _
ContinuingIdentifierCharacter 
   OrdinaryContinuingIdentifierCharacter
|  \ HexEscape
OrdinaryContinuingIdentifierCharacter ⇒ UnicodeAlphanumeric | $ | _

Semantics

action Name[IdentifierName] : String

Name[IdentifierName ⇒ InitialIdentifierCharacter]
  = [CharacterValue[InitialIdentifierCharacter]]

Name[IdentifierName ⇒ IdentifierName1 ContinuingIdentifierCharacter]
  = Name[IdentifierName1] ⊕ [CharacterValue[ContinuingIdentifierCharacter]]

action ContainsEscapes[IdentifierName] : Boolean

ContainsEscapes[IdentifierName ⇒ InitialIdentifierCharacter]
  = ContainsEscapes[InitialIdentifierCharacter]

ContainsEscapes[IdentifierName ⇒ IdentifierName1 ContinuingIdentifierCharacter]
  = ContainsEscapes[IdentifierName1] or ContainsEscapes[ContinuingIdentifierCharacter]

action CharacterValue[InitialIdentifierCharacter] : Character

CharacterValue[InitialIdentifierCharacter ⇒ OrdinaryInitialIdentifierCharacter]
  = OrdinaryInitialIdentifierCharacter

CharacterValue[InitialIdentifierCharacter ⇒ \ HexEscape]
  = if isOrdinaryInitialIdentifierCharacter(CharacterValue[HexEscape])
     then CharacterValue[HexEscape]
     else throw syntaxError

action ContainsEscapes[InitialIdentifierCharacter] : Boolean

ContainsEscapes[InitialIdentifierCharacter ⇒ OrdinaryInitialIdentifierCharacter] = false

ContainsEscapes[InitialIdentifierCharacter ⇒ \ HexEscape] = true

action CharacterValue[ContinuingIdentifierCharacter] : Character

CharacterValue[ContinuingIdentifierCharacter ⇒ OrdinaryContinuingIdentifierCharacter]
  = OrdinaryContinuingIdentifierCharacter

CharacterValue[ContinuingIdentifierCharacter ⇒ \ HexEscape]
  = if isOrdinaryContinuingIdentifierCharacter(CharacterValue[HexEscape])
     then CharacterValue[HexEscape]
     else throw syntaxError

action ContainsEscapes[ContinuingIdentifierCharacter] : Boolean

ContainsEscapes[ContinuingIdentifierCharacter ⇒ OrdinaryContinuingIdentifierCharacter]
  = false

ContainsEscapes[ContinuingIdentifierCharacter ⇒ \ HexEscape] = true
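
The Name and ContainsEscapes actions can be pictured as a single left-to-right pass over the identifier text. The Python sketch below is one possible reading, not the draft's own algorithm: it approximates UnicodeInitialAlphabetic and UnicodeAlphanumeric with `str.isalpha` and `str.isalnum` (an assumption), and the function name is hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def identifier_name(text: str):
    """Return (name, contains_escapes) for text that is one whole identifier."""
    name = []
    contains_escapes = False
    i = 0
    while i < len(text):
        if text[i] == "\\":
            # \ HexEscape: x followed by 2 hex digits, or u followed by 4.
            width = {"x": 2, "u": 4}.get(text[i + 1:i + 2])
            digits = text[i + 2:i + 2 + width] if width else ""
            if width is None or len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise SyntaxError("malformed hex escape")
            ch = chr(int(digits, 16))
            contains_escapes = True
            i += 2 + width
        else:
            ch = text[i]
            i += 1
        # The escaped character must itself be a legal identifier character.
        if name:
            ok = ch.isalnum() or ch in "$_"
        else:
            ok = ch.isalpha() or ch in "$_"
        if not ok:
            raise SyntaxError("illegal identifier character")
        name.append(ch)
    if not name:
        raise SyntaxError("empty identifier")
    return "".join(name), contains_escapes
```

For instance, the text `\u0069f` yields the name `if` with the escape flag set.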

reservedWords : String[]
  = [“abstract”,
      “break”,
      “case”,
      “catch”,
      “class”,
      “const”,
      “continue”,
      “debugger”,
      “default”,
      “delete”,
      “do”,
      “else”,
      “enum”,
      “eval”,
      “export”,
      “extends”,
      “false”,
      “final”,
      “finally”,
      “for”,
      “function”,
      “goto”,
      “if”,
      “implements”,
      “import”,
      “in”,
      “instanceof”,
      “native”,
      “new”,
      “null”,
      “package”,
      “private”,
      “protected”,
      “public”,
      “return”,
      “static”,
      “super”,
      “switch”,
      “synchronized”,
      “this”,
      “throw”,
      “throws”,
      “transient”,
      “true”,
      “try”,
      “typeof”,
      “var”,
      “volatile”,
      “while”,
      “with”]

nonReservedWords : String[]
  = [“box”,
      “constructor”,
      “field”,
      “get”,
      “language”,
      “local”,
      “method”,
      “override”,
      “set”,
      “version”]

keywords : String[] = reservedWords ⊕ nonReservedWords

member(id: String, list: String[]) : Boolean
  = if |list| = 0
     then false
     else if id = list[0]
     then true
     else member(id, list[1 ...])

Syntax

IdentifierOrReservedWord ⇒ IdentifierName

Semantics

action Token[IdentifierOrReservedWord] : Token

Token[IdentifierOrReservedWord ⇒ IdentifierName]
  = let id: String = Name[IdentifierName]
     in if member(id, keywords) and not ContainsEscapes[IdentifierName]
         then keyword id
         else identifier id
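
The rule above means that an identifier spelled with an escape is never treated as a keyword, even if its name matches one. A small Python sketch of the same decision follows. The word lists are abbreviated subsets chosen for the example, and the `classify` function and its tuple result are assumptions, not part of the draft.

```python
# Abbreviated subsets for the example; the draft's full lists are longer.
RESERVED_WORDS = ["break", "case", "if", "var", "while"]
NON_RESERVED_WORDS = ["get", "set", "version"]
KEYWORDS = RESERVED_WORDS + NON_RESERVED_WORDS

def member(id_: str, words: list) -> bool:
    """Recursive list membership, mirroring the draft's member function."""
    if len(words) == 0:
        return False
    if id_ == words[0]:
        return True
    return member(id_, words[1:])

def classify(name: str, contains_escapes: bool):
    """Return a ("keyword", name) or ("identifier", name) pair."""
    if member(name, KEYWORDS) and not contains_escapes:
        return ("keyword", name)
    return ("identifier", name)
```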

Punctuators

Syntax

Punctuator 
   !
|  ! =
|  ! = =
|  #
|  %
|  % =
|  &
|  & &
|  & & =
|  & =
|  (
|  )
|  *
|  * =
|  +
|  + +
|  + =
|  ,
|  -
|  - -
|  - =
|  - >
|  .
|  . .
|  . . .
|  :
|  : :
|  ;
|  <
|  < <
|  < < =
|  < =
|  =
|  = =
|  = = =
|  >
|  > =
|  > >
|  > > =
|  > > >
|  > > > =
|  ?
|  @
|  [
|  ]
|  ^
|  ^ =
|  ^ ^
|  ^ ^ =
|  {
|  |
|  | =
|  | |
|  | | =
|  }
|  ~
DivisionPunctuator 
   / [lookahead∉{/, *}]
|  / =

Semantics

action Punctuator[Punctuator] : String

Punctuator[Punctuator ⇒ !] = “!”

Punctuator[Punctuator ⇒ ! =] = “!=”

Punctuator[Punctuator ⇒ ! = =] = “!==”

Punctuator[Punctuator ⇒ #] = “#”

Punctuator[Punctuator ⇒ %] = “%”

Punctuator[Punctuator ⇒ % =] = “%=”

Punctuator[Punctuator ⇒ &] = “&”

Punctuator[Punctuator ⇒ & &] = “&&”

Punctuator[Punctuator ⇒ & & =] = “&&=”

Punctuator[Punctuator ⇒ & =] = “&=”

Punctuator[Punctuator ⇒ (] = “(”

Punctuator[Punctuator ⇒ )] = “)”

Punctuator[Punctuator ⇒ *] = “*”

Punctuator[Punctuator ⇒ * =] = “*=”

Punctuator[Punctuator ⇒ +] = “+”

Punctuator[Punctuator ⇒ + +] = “++”

Punctuator[Punctuator ⇒ + =] = “+=”

Punctuator[Punctuator ⇒ ,] = “,”

Punctuator[Punctuator ⇒ -] = “-”

Punctuator[Punctuator ⇒ - -] = “--”

Punctuator[Punctuator ⇒ - =] = “-=”

Punctuator[Punctuator ⇒ - >] = “->”

Punctuator[Punctuator ⇒ .] = “.”

Punctuator[Punctuator ⇒ . .] = “..”

Punctuator[Punctuator ⇒ . . .] = “...”

Punctuator[Punctuator ⇒ :] = “:”

Punctuator[Punctuator ⇒ : :] = “::”

Punctuator[Punctuator ⇒ ;] = “;”

Punctuator[Punctuator ⇒ <] = “<”

Punctuator[Punctuator ⇒ < <] = “<<”

Punctuator[Punctuator ⇒ < < =] = “<<=”

Punctuator[Punctuator ⇒ < =] = “<=”

Punctuator[Punctuator ⇒ =] = “=”

Punctuator[Punctuator ⇒ = =] = “==”

Punctuator[Punctuator ⇒ = = =] = “===”

Punctuator[Punctuator ⇒ >] = “>”

Punctuator[Punctuator ⇒ > =] = “>=”

Punctuator[Punctuator ⇒ > >] = “>>”

Punctuator[Punctuator ⇒ > > =] = “>>=”

Punctuator[Punctuator ⇒ > > >] = “>>>”

Punctuator[Punctuator ⇒ > > > =] = “>>>=”

Punctuator[Punctuator ⇒ ?] = “?”

Punctuator[Punctuator ⇒ @] = “@”

Punctuator[Punctuator ⇒ [] = “[”

Punctuator[Punctuator ⇒ ]] = “]”

Punctuator[Punctuator ⇒ ^] = “^”

Punctuator[Punctuator ⇒ ^ =] = “^=”

Punctuator[Punctuator ⇒ ^ ^] = “^^”

Punctuator[Punctuator ⇒ ^ ^ =] = “^^=”

Punctuator[Punctuator ⇒ {] = “{”

Punctuator[Punctuator ⇒ |] = “|”

Punctuator[Punctuator ⇒ | =] = “|=”

Punctuator[Punctuator ⇒ | |] = “||”

Punctuator[Punctuator ⇒ | | =] = “||=”

Punctuator[Punctuator ⇒ }] = “}”

Punctuator[Punctuator ⇒ ~] = “~”

action Punctuator[DivisionPunctuator] : String

Punctuator[DivisionPunctuator ⇒ / [lookahead∉{/, *}]] = “/”

Punctuator[DivisionPunctuator ⇒ / =] = “/=”

Numeric literals

Syntax

NumericLiteral 
   DecimalLiteral
|  HexIntegerLiteral [lookahead∉{HexDigit}]

Semantics

action DoubleValue[NumericLiteral] : Double

DoubleValue[NumericLiteral ⇒ DecimalLiteral]
  = rationalToDouble(RationalValue[DecimalLiteral])

DoubleValue[NumericLiteral ⇒ HexIntegerLiteral [lookahead∉{HexDigit}]]
  = rationalToDouble(IntegerValue[HexIntegerLiteral])

expt(base: Rational, exponent: Integer) : Rational
  = if exponent = 0
     then 1
     else if exponent < 0
     then 1/expt(base, -exponent)
     else base*expt(base, exponent - 1)
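
The expt helper works on exact rationals, so no rounding happens until rationalToDouble is applied. A direct transcription into Python, using `fractions.Fraction` as an assumed stand-in for the Rational type, might look like this:

```python
from fractions import Fraction

def expt(base: Fraction, exponent: int) -> Fraction:
    """Exact integer power of a rational, following the recursive definition."""
    if exponent == 0:
        return Fraction(1)
    if exponent < 0:
        # A negative exponent is the reciprocal of the positive power.
        return 1 / expt(base, -exponent)
    return base * expt(base, exponent - 1)
```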

Syntax

DecimalLiteral 
   Mantissa
|  Mantissa LetterE SignedInteger
LetterE ⇒ E | e
Mantissa 
   DecimalIntegerLiteral
|  DecimalIntegerLiteral .
|  DecimalIntegerLiteral . Fraction
|  . Fraction
DecimalIntegerLiteral 
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits 
   NonZeroDigit
|  NonZeroDecimalDigits ASCIIDigit
NonZeroDigit ⇒ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Fraction ⇒ DecimalDigits

Semantics

action RationalValue[DecimalLiteral] : Rational

RationalValue[DecimalLiteral ⇒ Mantissa] = RationalValue[Mantissa]

RationalValue[DecimalLiteral ⇒ Mantissa LetterE SignedInteger]
  = RationalValue[Mantissa]*expt(10, IntegerValue[SignedInteger])

action RationalValue[Mantissa] : Rational

RationalValue[Mantissa ⇒ DecimalIntegerLiteral] = IntegerValue[DecimalIntegerLiteral]

RationalValue[Mantissa ⇒ DecimalIntegerLiteral .] = IntegerValue[DecimalIntegerLiteral]

RationalValue[Mantissa ⇒ DecimalIntegerLiteral . Fraction]
  = IntegerValue[DecimalIntegerLiteral] + RationalValue[Fraction]

RationalValue[Mantissa ⇒ . Fraction] = RationalValue[Fraction]

action IntegerValue[DecimalIntegerLiteral] : Integer

IntegerValue[DecimalIntegerLiteral ⇒ 0] = 0

IntegerValue[DecimalIntegerLiteral ⇒ NonZeroDecimalDigits]
  = IntegerValue[NonZeroDecimalDigits]

action IntegerValue[NonZeroDecimalDigits] : Integer

IntegerValue[NonZeroDecimalDigits ⇒ NonZeroDigit] = DecimalValue[NonZeroDigit]

IntegerValue[NonZeroDecimalDigits ⇒ NonZeroDecimalDigits1 ASCIIDigit]
  = 10*IntegerValue[NonZeroDecimalDigits1] + DecimalValue[ASCIIDigit]

action DecimalValue[NonZeroDigit] : Integer = digitValue(NonZeroDigit)

action RationalValue[Fraction] : Rational

RationalValue[Fraction ⇒ DecimalDigits]
  = IntegerValue[DecimalDigits]/expt(10, NDigits[DecimalDigits])

Syntax

SignedInteger 
   DecimalDigits
|  + DecimalDigits
|  - DecimalDigits

Semantics

action IntegerValue[SignedInteger] : Integer

IntegerValue[SignedInteger ⇒ DecimalDigits] = IntegerValue[DecimalDigits]

IntegerValue[SignedInteger ⇒ + DecimalDigits] = IntegerValue[DecimalDigits]

IntegerValue[SignedInteger ⇒ - DecimalDigits] = -IntegerValue[DecimalDigits]

Syntax

DecimalDigits 
   ASCIIDigit
|  DecimalDigits ASCIIDigit

Semantics

action IntegerValue[DecimalDigits] : Integer

IntegerValue[DecimalDigits ⇒ ASCIIDigit] = DecimalValue[ASCIIDigit]

IntegerValue[DecimalDigits ⇒ DecimalDigits1 ASCIIDigit]
  = 10*IntegerValue[DecimalDigits1] + DecimalValue[ASCIIDigit]

action NDigits[DecimalDigits] : Integer

NDigits[DecimalDigits ⇒ ASCIIDigit] = 1

NDigits[DecimalDigits ⇒ DecimalDigits1 ASCIIDigit] = NDigits[DecimalDigits1] + 1
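
Taken together, the actions above compute the exact rational value of a decimal literal: the integer part, plus the fraction digits divided by 10 raised to the number of fraction digits, all scaled by a power of ten from the exponent. The Python sketch below follows that recipe with string splitting rather than a grammar walk. The function name is hypothetical, the input is assumed to be a well-formed DecimalLiteral, and `fractions.Fraction` stands in for Rational.

```python
from fractions import Fraction

def rational_value(literal: str) -> Fraction:
    """Exact value of a well-formed DecimalLiteral such as "1.5e3" or ".25"."""
    mantissa, _, exponent = literal.lower().partition("e")
    integer_part, _, fraction_part = mantissa.partition(".")
    # IntegerValue[DecimalIntegerLiteral]; absent when the literal starts with "."
    value = Fraction(int(integer_part)) if integer_part else Fraction(0)
    if fraction_part:
        # IntegerValue[DecimalDigits] / expt(10, NDigits[DecimalDigits])
        value += Fraction(int(fraction_part), 10 ** len(fraction_part))
    if exponent:
        # Multiply by expt(10, IntegerValue[SignedInteger]); the sign may be + or -.
        value *= Fraction(10) ** int(exponent)
    return value
```

Converting the result with `float(...)` would then play the role of rationalToDouble in this sketch.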

Syntax

HexIntegerLiteral 
   0 LetterX HexDigit
|  HexIntegerLiteral HexDigit
LetterX ⇒ X | x
HexDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

action IntegerValue[HexIntegerLiteral] : Integer

IntegerValue[HexIntegerLiteral ⇒ 0 LetterX HexDigit] = HexValue[HexDigit]

IntegerValue[HexIntegerLiteral ⇒ HexIntegerLiteral1 HexDigit]
  = 16*IntegerValue[HexIntegerLiteral1] + HexValue[HexDigit]

action HexValue[HexDigit] : Integer = digitValue(HexDigit)
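
The left-recursive HexIntegerLiteral rule amounts to folding the digits from left to right, multiplying the running value by 16 at each step. A short Python sketch of that fold follows; the function name is an assumption made for the example.

```python
def hex_integer_value(literal: str) -> int:
    """Value of a literal of the form 0x... or 0X... with at least one hex digit."""
    if len(literal) < 3 or literal[0] != "0" or literal[1] not in "xX":
        raise ValueError("not a hex integer literal")
    value = 0
    for digit in literal[2:]:
        # 16*IntegerValue[HexIntegerLiteral1] + HexValue[HexDigit]
        value = 16 * value + int(digit, 16)
    return value
```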

String literals

Syntax

θ ∈ {single, double}
StringLiteral 
   ' StringCharssingle '
|  " StringCharsdouble "

Semantics

action StringValue[StringLiteral] : String

StringValue[StringLiteral ⇒ ' StringCharssingle '] = StringValue[StringCharssingle]

StringValue[StringLiteral ⇒ " StringCharsdouble "] = StringValue[StringCharsdouble]

Syntax

StringCharsθ 
   «empty»
|  StringCharsθ StringCharθ
StringCharθ 
   LiteralStringCharθ
|  \ StringEscape
LiteralStringCharsingle ⇒ UnicodeCharacter except ' | \ | LineTerminator
LiteralStringChardouble ⇒ UnicodeCharacter except " | \ | LineTerminator

Semantics

action StringValue[StringCharsθ] : String

StringValue[StringCharsθ ⇒ «empty»] = “”

StringValue[StringCharsθ ⇒ StringCharsθ1 StringCharθ]
  = StringValue[StringCharsθ1] ⊕ [CharacterValue[StringCharθ]]

action CharacterValue[StringCharθ] : Character

CharacterValue[StringCharθ ⇒ LiteralStringCharθ] = LiteralStringCharθ

CharacterValue[StringCharθ ⇒ \ StringEscape] = CharacterValue[StringEscape]

Syntax

StringEscape 
   ControlEscape
|  ZeroEscape
|  HexEscape
|  IdentityEscape
IdentityEscape ⇒ NonTerminator except UnicodeAlphanumeric

Semantics

action CharacterValue[StringEscape] : Character

CharacterValue[StringEscape ⇒ ControlEscape] = CharacterValue[ControlEscape]

CharacterValue[StringEscape ⇒ ZeroEscape] = CharacterValue[ZeroEscape]

CharacterValue[StringEscape ⇒ HexEscape] = CharacterValue[HexEscape]

CharacterValue[StringEscape ⇒ IdentityEscape] = IdentityEscape

Syntax

ControlEscape 
   b
|  f
|  n
|  r
|  t
|  v

Semantics

action CharacterValue[ControlEscape] : Character

CharacterValue[ControlEscape ⇒ b] = ‘«BS»’

CharacterValue[ControlEscape ⇒ f] = ‘«FF»’

CharacterValue[ControlEscape ⇒ n] = ‘«LF»’

CharacterValue[ControlEscape ⇒ r] = ‘«CR»’

CharacterValue[ControlEscape ⇒ t] = ‘«TAB»’

CharacterValue[ControlEscape ⇒ v] = ‘«VT»’

Syntax

ZeroEscape ⇒ 0 [lookahead∉{ASCIIDigit}]

Semantics

action CharacterValue[ZeroEscape] : Character

CharacterValue[ZeroEscape ⇒ 0 [lookahead∉{ASCIIDigit}]] = ‘«NUL»’
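
The control and zero escapes reduce to a small lookup table plus one lookahead check. The Python sketch below illustrates them; the function name and the convention of returning `None` when neither rule applies are assumptions made for this example.

```python
CONTROL_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

def control_or_zero_escape(rest: str):
    """Decode the escape at the start of rest, the text following a backslash.

    Returns the character for a ControlEscape or ZeroEscape, or None if
    neither rule matches.
    """
    if not rest:
        return None
    first = rest[0]
    if first in CONTROL_ESCAPES:
        return CONTROL_ESCAPES[first]
    # ZeroEscape: a 0 that is not followed by another ASCII digit.
    if first == "0" and not (len(rest) > 1 and rest[1] in "0123456789"):
        return "\0"
    return None
```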

Syntax

HexEscape 
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit

Semantics

action CharacterValue[HexEscape] : Character

CharacterValue[HexEscape ⇒ x HexDigit1 HexDigit2]
  = codeToCharacter(16*HexValue[HexDigit1] + HexValue[HexDigit2])

CharacterValue[HexEscape ⇒ u HexDigit1 HexDigit2 HexDigit3 HexDigit4]
  = codeToCharacter(
         4096*HexValue[HexDigit1] + 256*HexValue[HexDigit2] + 16*HexValue[HexDigit3] +
         HexValue[HexDigit4])
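
The two HexEscape forms differ only in the number of digits, and the weighted sum in the second rule equals a left-to-right fold in base 16. A minimal Python sketch follows, with `chr` standing in for codeToCharacter (an assumption) and a hypothetical function name:

```python
def hex_escape_value(escape: str) -> str:
    """Decode "xHH" or "uHHHH", the text following the backslash."""
    expected = {"x": 2, "u": 4}.get(escape[:1])
    digits = escape[1:]
    if expected is None or len(digits) != expected:
        raise ValueError("not a hex escape")
    code = 0
    for digit in digits:
        # Same result as 4096*h1 + 256*h2 + 16*h3 + h4 for the u form.
        code = 16 * code + int(digit, 16)
    return chr(code)
```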

Regular expression literals

Syntax

RegExpLiteral ⇒ RegExpBody RegExpFlags
RegExpFlags 
   «empty»
|  RegExpFlags ContinuingIdentifierCharacter
RegExpBody ⇒ / [lookahead∉{*}] RegExpChars /
RegExpChars 
   RegExpChar
|  RegExpChars RegExpChar
RegExpChar 
   OrdinaryRegExpChar
|  \ NonTerminator
OrdinaryRegExpChar ⇒ NonTerminator except \ | /

Semantics

action REValue[RegExpLiteral] : RegExp

REValue[RegExpLiteral ⇒ RegExpBody RegExpFlags]
  = reBody REBody[RegExpBody], reFlags REFlags[RegExpFlags]

action REFlags[RegExpFlags] : String

REFlags[RegExpFlags ⇒ «empty»] = “”

REFlags[RegExpFlags ⇒ RegExpFlags1 ContinuingIdentifierCharacter]
  = REFlags[RegExpFlags1] ⊕ [CharacterValue[ContinuingIdentifierCharacter]]

action REBody[RegExpBody] : String

REBody[RegExpBody ⇒ / [lookahead∉{*}] RegExpChars /] = REBody[RegExpChars]

action REBody[RegExpChars] : String

REBody[RegExpChars ⇒ RegExpChar] = REBody[RegExpChar]

REBody[RegExpChars ⇒ RegExpChars1 RegExpChar]
  = REBody[RegExpChars1] ⊕ REBody[RegExpChar]

action REBody[RegExpChar] : String

REBody[RegExpChar ⇒ OrdinaryRegExpChar] = [OrdinaryRegExpChar]

REBody[RegExpChar ⇒ \ NonTerminator] = [‘\’, NonTerminator]
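
Note that the body keeps each backslash together with the character it escapes, so the lexer only locates the closing / and leaves interpretation of the pattern to later stages. The Python sketch below scans a literal that begins at the start of the given text. It is a simplified reading under stated assumptions: the function name is hypothetical, flag characters are approximated with `str.isalnum` plus `$` and `_`, and hex escapes inside the flags are not handled.

```python
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}

def scan_regexp(source: str):
    """Return (body, flags) for a regular expression literal at the start of source."""
    if len(source) < 2 or source[0] != "/" or source[1] == "*":
        raise SyntaxError("not a regular expression literal")
    i = 1
    body = []
    while True:
        if i >= len(source) or source[i] in LINE_TERMINATORS:
            raise SyntaxError("unterminated regular expression")
        ch = source[i]
        if ch == "/":
            break
        if ch == "\\":
            # \ NonTerminator: keep the backslash and the next character verbatim.
            if i + 1 >= len(source) or source[i + 1] in LINE_TERMINATORS:
                raise SyntaxError("unterminated regular expression")
            body.append(source[i:i + 2])
            i += 2
        else:
            body.append(ch)
            i += 1
    if not body:
        raise SyntaxError("empty regular expression body")
    i += 1  # skip the closing /
    j = i
    while j < len(source) and (source[j].isalnum() or source[j] in "$_"):
        j += 1
    return "".join(body), source[i:j]
```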


Waldemar Horwat
Last modified Monday, December 6, 1999