July 2000 Draft
JavaScript 2.0
Formal Description
Lexer Grammar

Monday, December 6, 1999

This LALR(1) grammar describes the lexer syntax of the JavaScript 2.0 proposal. See also the description of the grammar notation.

This document is also available as a Word 98 RTF file.

The start symbol depends on the previous token: NextTokenunit if the previous token was a number; NextTokenre if the previous token was not a number and a / should be interpreted as the start of a regular expression; and NextTokendiv if the previous token was not a number and a / should be interpreted as a division or division-assignment operator.
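The choice of start symbol can be sketched as a small function. This is only an illustration: the function name and its two boolean parameters are hypothetical, and how the parser decides whether a / begins a regular expression is outside the lexer grammar and is simply passed in here.

```python
def start_symbol(prev_is_number: bool, slash_is_regexp: bool) -> str:
    """Return the name of the lexer start symbol for the next token.

    prev_is_number: True if the previous token was a number.
    slash_is_regexp: True if a '/' here should begin a regular expression
        (assumed to be supplied by the parser; ignored after a number).
    """
    if prev_is_number:
        return "NextTokenunit"
    return "NextTokenre" if slash_is_regexp else "NextTokendiv"
```

For example, after a number the result is NextTokenunit regardless of the second argument.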

Unicode Character Classes

UnicodeCharacter  Any Unicode character
UnicodeInitialAlphabetic  Any Unicode initial alphabetic character (includes ASCII A-Z and a-z)
UnicodeAlphanumeric  Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
WhiteSpaceCharacter 
   «TAB» | «VT» | «FF» | «SP» | «u00A0»
|  «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»
|  «u2008» | «u2009» | «u200A» | «u200B»
|  «u3000»
LineTerminator  «LF» | «CR» | «u2028» | «u2029»
ASCIIDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Comments

LineComment  / / LineCommentCharacters
LineCommentCharacters 
   «empty»
|  LineCommentCharacters NonTerminator
NonTerminator  UnicodeCharacter except LineTerminator
SingleLineBlockComment  / * BlockCommentCharacters * /
BlockCommentCharacters 
   «empty»
|  BlockCommentCharacters NonTerminatorOrSlash
|  PreSlashCharacters /
PreSlashCharacters 
   «empty»
|  BlockCommentCharacters NonTerminatorOrAsteriskOrSlash
|  PreSlashCharacters /
NonTerminatorOrSlash  NonTerminator except /
NonTerminatorOrAsteriskOrSlash  NonTerminator except * | /
MultiLineBlockComment  / * MultiLineBlockCommentCharacters BlockCommentCharacters * /
MultiLineBlockCommentCharacters 
   BlockCommentCharacters LineTerminator
|  MultiLineBlockCommentCharacters BlockCommentCharacters LineTerminator

White space

WhiteSpace 
   «empty»
|  WhiteSpace WhiteSpaceCharacter
|  WhiteSpace SingleLineBlockComment

Line breaks

LineBreak 
   LineTerminator
|  LineComment LineTerminator
|  MultiLineBlockComment
LineBreaks 
   LineBreak
|  LineBreaks WhiteSpace LineBreak
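The productions above treat a block comment as white space when it contains no line terminator and as a line break when it does. A minimal sketch of that classification follows; the function name and return shape are hypothetical, and a non-greedy regular expression is used in place of the PreSlashCharacters productions on the assumption that both simply forbid */ inside the comment body.

```python
import re

LINE_TERMINATORS = "\n\r\u2028\u2029"
# Shortest match from "/*" to the first "*/"; DOTALL lets "." cross lines.
BLOCK_COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)


def classify_block_comment(text: str, pos: int = 0):
    """Return (kind, end_index) for a block comment at pos, or None."""
    m = BLOCK_COMMENT.match(text, pos)
    if m is None:
        return None  # no comment here, or the comment is unterminated
    body = m.group(1)
    if any(ch in LINE_TERMINATORS for ch in body):
        return "MultiLineBlockComment", m.end()
    return "SingleLineBlockComment", m.end()
```

A lexer built on this would fold a SingleLineBlockComment into WhiteSpace and report a MultiLineBlockComment as a LineBreak.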

Tokens

  {re, div, unit}
NextTokenre  WhiteSpace Tokenre
NextTokendiv  WhiteSpace Tokendiv
NextTokenunit 
   [lookahead∉{OrdinaryContinuingIdentifierCharacter, \}] WhiteSpace Tokendiv
|  [lookahead∉{_}] IdentifierName
|  _ IdentifierName
Tokenre 
   LineBreaks
|  IdentifierOrReservedWord
|  Punctuator
|  NumericLiteral
|  StringLiteral
|  RegExpLiteral
|  EndOfInput
Tokendiv 
   LineBreaks
|  IdentifierOrReservedWord
|  Punctuator
|  DivisionPunctuator
|  NumericLiteral
|  StringLiteral
|  EndOfInput
EndOfInput 
   End
|  LineComment End
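As an illustration of NextTokenunit, the sketch below names the alternative selected by the single character that follows a number. It reads the lookahead brackets as exclusions (the next character must not be one of those listed). The function name is hypothetical, and str.isalnum() is only an approximation of UnicodeAlphanumeric.

```python
def unit_alternative(next_char: str) -> str:
    """Name the NextTokenunit alternative chosen by one character of lookahead.

    next_char is the character right after the number, or "" at end of input.
    """
    if next_char == "_":
        # Third alternative: the underscore is consumed before the unit name.
        return "_ IdentifierName"
    # str.isalnum() stands in for UnicodeAlphanumeric (an approximation).
    if next_char == "\\" or next_char == "$" or next_char.isalnum():
        # Second alternative: a unit name directly follows the number.
        return "IdentifierName"
    # First alternative: an ordinary token follows, lexed as with NextTokendiv.
    return "WhiteSpace Tokendiv"
```

The sketch only selects an alternative; whether the following characters actually form an IdentifierName is still checked by that production.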

Keywords and identifiers

IdentifierName 
   InitialIdentifierCharacter
|  IdentifierName ContinuingIdentifierCharacter
InitialIdentifierCharacter 
   OrdinaryInitialIdentifierCharacter
|  \ HexEscape
OrdinaryInitialIdentifierCharacter  UnicodeInitialAlphabetic | $ | _
ContinuingIdentifierCharacter 
   OrdinaryContinuingIdentifierCharacter
|  \ HexEscape
OrdinaryContinuingIdentifierCharacter  UnicodeAlphanumeric | $ | _
IdentifierOrReservedWord  IdentifierName
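IdentifierName can be approximated with a regular expression. In this sketch the names are hypothetical, and Python's \w and [^\W\d] classes are assumed to be close enough to UnicodeAlphanumeric and UnicodeInitialAlphabetic for illustration; they are not exact equivalents.

```python
import re

# \ HexEscape: backslash, then xHH or uHHHH.
HEX_ESCAPE = r"\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4})"

IDENTIFIER_NAME = re.compile(
    # InitialIdentifierCharacter: letter, $, _ or an escaped character.
    r"(?:[^\W\d]|\$|" + HEX_ESCAPE + r")"
    # ContinuingIdentifierCharacter: additionally allows digits.
    r"(?:\w|\$|" + HEX_ESCAPE + r")*"
)


def match_identifier_name(text: str, pos: int = 0):
    """Return the IdentifierName starting at pos, or None if there is none."""
    m = IDENTIFIER_NAME.match(text, pos)
    return m.group(0) if m else None
```

The function returns the raw source text, escapes included; it does not decode \x or \u sequences.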

Punctuators

Punctuator 
   !
|  ! =
|  ! = =
|  #
|  %
|  % =
|  &
|  & &
|  & & =
|  & =
|  (
|  )
|  *
|  * =
|  +
|  + +
|  + =
|  ,
|  -
|  - -
|  - =
|  - >
|  .
|  . .
|  . . .
|  :
|  : :
|  ;
|  <
|  < <
|  < < =
|  < =
|  =
|  = =
|  = = =
|  >
|  > =
|  > >
|  > > =
|  > > >
|  > > > =
|  ?
|  @
|  [
|  ]
|  ^
|  ^ =
|  ^ ^
|  ^ ^ =
|  {
|  |
|  | =
|  | |
|  | | =
|  }
|  ~
DivisionPunctuator 
   / [lookahead∉{/, *}]
|  / =
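A scanner for these two productions might look like the sketch below. It assumes the usual longest-match rule, which this section does not state, and it reads the DivisionPunctuator lookahead as excluding / and * so that comments are not mistaken for division. The function names are hypothetical.

```python
PUNCTUATORS = [
    "!", "!=", "!==", "#", "%", "%=", "&", "&&", "&&=", "&=", "(", ")",
    "*", "*=", "+", "++", "+=", ",", "-", "--", "-=", "->", ".", "..", "...",
    ":", "::", ";", "<", "<<", "<<=", "<=", "=", "==", "===", ">", ">=",
    ">>", ">>=", ">>>", ">>>=", "?", "@", "[", "]", "^", "^=", "^^", "^^=",
    "{", "|", "|=", "||", "||=", "}", "~",
]
# Try longer spellings first so that ">>>=" wins over ">>" and ">".
BY_LENGTH = sorted(PUNCTUATORS, key=len, reverse=True)


def match_punctuator(text: str, pos: int = 0):
    """Return the longest Punctuator starting at pos, or None."""
    for p in BY_LENGTH:
        if text.startswith(p, pos):
            return p
    return None


def match_division_punctuator(text: str, pos: int = 0):
    """Return "/" or "/=" at pos, or None (including at a comment start)."""
    if not text.startswith("/", pos):
        return None
    following = text[pos + 1:pos + 2]  # "" at end of input
    if following == "=":
        return "/="
    if following in ("/", "*"):
        return None  # "//" or "/*" begins a comment, not a division
    return "/"
```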

Numeric literals

NumericLiteral 
   DecimalLiteral
|  HexIntegerLiteral [lookahead∉{HexDigit}]
DecimalLiteral 
   Mantissa
|  Mantissa LetterE SignedInteger
LetterE  E | e
Mantissa 
   DecimalIntegerLiteral
|  DecimalIntegerLiteral .
|  DecimalIntegerLiteral . Fraction
|  . Fraction
DecimalIntegerLiteral 
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits 
   NonZeroDigit
|  NonZeroDecimalDigits ASCIIDigit
NonZeroDigit  1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Fraction  DecimalDigits
SignedInteger 
   DecimalDigits
|  + DecimalDigits
|  - DecimalDigits
DecimalDigits 
   ASCIIDigit
|  DecimalDigits ASCIIDigit
HexIntegerLiteral 
   0 LetterX HexDigit
|  HexIntegerLiteral HexDigit
LetterX  X | x
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f
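The NumericLiteral productions translate fairly directly into a regular expression. The sketch below is one possible rendering; the names are hypothetical. The greedy hexadecimal branch consumes every available hex digit, which has the same effect as requiring that no HexDigit follows the literal.

```python
import re

NUMERIC_LITERAL = re.compile(
    r"""
      0[xX][0-9A-Fa-f]+                    # HexIntegerLiteral
    | (?: (?:0|[1-9][0-9]*) (?:\.[0-9]*)?  # DecimalIntegerLiteral, optional . Fraction
        | \.[0-9]+                         # . Fraction
      )
      (?:[eE][+-]?[0-9]+)?                 # LetterE SignedInteger
    """,
    re.VERBOSE,
)


def match_numeric_literal(text: str, pos: int = 0):
    """Return the NumericLiteral starting at pos, or None if there is none."""
    m = NUMERIC_LITERAL.match(text, pos)
    return m.group(0) if m else None
```

Note that a leading 0 followed by further decimal digits matches only the 0, since DecimalIntegerLiteral allows 0 only on its own.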

String literals

  {single, double}
StringLiteral 
   ' StringCharssingle '
|  " StringCharsdouble "
StringChars 
   «empty»
|  StringChars StringChar
StringChar 
   LiteralStringChar
|  \ StringEscape
LiteralStringCharsingle  UnicodeCharacter except ' | \ | LineTerminator
LiteralStringChardouble  UnicodeCharacter except " | \ | LineTerminator
StringEscape 
   ControlEscape
|  ZeroEscape
|  HexEscape
|  IdentityEscape
IdentityEscape  NonTerminator except UnicodeAlphanumeric
ControlEscape 
   b
|  f
|  n
|  r
|  t
|  v
ZeroEscape  0 [lookahead∉{ASCIIDigit}]
HexEscape 
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit
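The following sketch walks a quoted string and applies the StringEscape alternatives. The grammar defines only the syntax; the character values produced here (for example \0 as U+0000 and \n as a line feed) follow common JavaScript convention and are an assumption, as are the function name and the use of str.isalnum() for UnicodeAlphanumeric.

```python
CONTROL = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
HEX_DIGITS = set("0123456789abcdefABCDEF")
LINE_TERMINATORS = "\n\r\u2028\u2029"


def decode_string_literal(src: str) -> str:
    """Decode a complete '...' or "..." literal, raising ValueError if malformed."""
    quote = src[:1]
    if quote not in ("'", '"') or len(src) < 2 or src[-1] != quote:
        raise ValueError("not a quoted string literal")
    out, i, end = [], 1, len(src) - 1  # end indexes the closing quote
    while i < end:
        c = src[i]
        if c == quote or c in LINE_TERMINATORS:
            raise ValueError("unescaped quote or line terminator")
        if c != "\\":
            out.append(c)  # LiteralStringChar
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= end:
            raise ValueError("backslash escapes the closing quote")
        e = src[i]
        if e in CONTROL:  # ControlEscape
            out.append(CONTROL[e])
            i += 1
        elif e == "0":  # ZeroEscape: must not be followed by an ASCII digit
            if src[i + 1] in "0123456789":
                raise ValueError("digit after \\0")
            out.append("\0")
            i += 1
        elif e in "xu":  # HexEscape: xHH or uHHHH
            n = 2 if e == "x" else 4
            digits = src[i + 1:i + 1 + n]
            if i + 1 + n > end or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("malformed hex escape")
            out.append(chr(int(digits, 16)))
            i += 1 + n
        elif e.isalnum() or e in LINE_TERMINATORS:
            raise ValueError("invalid escape")
        else:  # IdentityEscape
            out.append(e)
            i += 1
    return "".join(out)
```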

Regular expression literals

RegExpLiteral  RegExpBody RegExpFlags
RegExpFlags 
   «empty»
|  RegExpFlags ContinuingIdentifierCharacter
RegExpBody  / [lookahead∉{*}] RegExpChars /
RegExpChars 
   RegExpChar
|  RegExpChars RegExpChar
RegExpChar 
   OrdinaryRegExpChar
|  \ NonTerminator
OrdinaryRegExpChar  NonTerminator except \ | /
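A sketch of RegExpLiteral as a Python regular expression follows. It reads the lookahead in RegExpBody as excluding *, so that /* is left to the comment rules. The names are hypothetical, and \w is used as an approximation of UnicodeAlphanumeric in the flags.

```python
import re

_LT = r"\n\r\u2028\u2029"  # LineTerminator characters, for use inside [...]
_HEX_ESCAPE = r"\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4})"

REGEXP_LITERAL = re.compile(
    r"/(?!\*)"                                      # opening /, not followed by *
    r"((?:[^\\/" + _LT + r"]|\\[^" + _LT + r"])+)"  # one or more RegExpChar
    r"/"                                            # closing /
    r"((?:\w|\$|" + _HEX_ESCAPE + r")*)"            # RegExpFlags
)


def match_regexp_literal(text: str, pos: int = 0):
    """Return (body, flags) for a RegExpLiteral at pos, or None."""
    m = REGEXP_LITERAL.match(text, pos)
    return (m.group(1), m.group(2)) if m else None
```

Because RegExpChars requires at least one character, // never matches and remains a line comment.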

Waldemar Horwat
Last modified Monday, December 6, 1999