JavaScript 2.0 Notation

July 2000 Draft

Friday, July 7, 2000

Character Notation

This proposal uses the following conventions to denote literal characters:

Printable ASCII literal characters (values 20 through 7E hexadecimal) are in a blue monospaced font. Other characters are denoted by enclosing their four-digit hexadecimal Unicode value between «u and ». For example, the non-breakable space character would be denoted in this document as «u00A0». A few of the common control characters are represented by name:

Abbreviation	Unicode Value
`«NUL»`	`«u0000»`
`«BS»`	`«u0008»`
`«TAB»`	`«u0009»`
`«LF»`	`«u000A»`
`«VT»`	`«u000B»`
`«FF»`	`«u000C»`
`«CR»`	`«u000D»`
`«SP»`	`«u0020»`

A space character is denoted in this document either by a blank space where it's obvious from the context or by «SP» where the space might be confused with some other notation.
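The conventions above can be sketched as a small function. This is only an illustration: the names `NAMED`, `denote`, and `denote_text` are hypothetical, the sketch always writes a space as «SP» rather than as a blank, and it rejects characters whose value does not fit in four hexadecimal digits.

```python
# Named control characters, taken from the table above.
NAMED = {
    0x0000: "NUL", 0x0008: "BS", 0x0009: "TAB", 0x000A: "LF",
    0x000B: "VT", 0x000C: "FF", 0x000D: "CR", 0x0020: "SP",
}

def denote(ch: str) -> str:
    """Render one character in the notation described above."""
    code = ord(ch)
    if code > 0xFFFF:
        # Assumption of this sketch: only four-digit values are handled.
        raise ValueError("value does not fit in four hexadecimal digits")
    if code in NAMED:
        return f"«{NAMED[code]}»"
    if 0x21 <= code <= 0x7E:
        return ch  # printable ASCII is shown literally
    return f"«u{code:04X}»"

def denote_text(text: str) -> str:
    """Render every character of text and join the results."""
    return "".join(denote(ch) for ch in text)
```

For instance, `denote("\u00a0")` yields `«u00A0»`, matching the non-breakable space example above.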

Grammar Notation

Each LR(1) parser grammar and lexer grammar rule consists of a nonterminal, a ⇒, and one or more expansions of the nonterminal separated by vertical bars (|). The expansions are usually listed on separate lines but may be listed on the same line if they are short. An empty expansion is denoted as «empty».

Consider the sample rule:

SampleList ⇒
    «empty»
  | ... Identifier
  | SampleListPrefix
  | SampleListPrefix , ... Identifier

This rule states that the nonterminal SampleList can represent one of four kinds of sequences of input tokens:

It can represent nothing (indicated by the «empty» alternative);
It can represent the token ... followed by some expansion of the nonterminal Identifier;
It can represent an expansion of the nonterminal SampleListPrefix;
It can represent an expansion of the nonterminal SampleListPrefix followed by the tokens , and ... and an expansion of the nonterminal Identifier.

Input tokens are characters (and the special End placeholder) in the lexer grammar and lexer tokens in the parser grammar. Spaces separate input tokens and nonterminals from each other. An input token that consists of a space character is denoted as «SP». Other non-ASCII or non-printable characters are also denoted using « and », as described in the character notation section.
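As a rough illustration of how a rule breaks down into a nonterminal and its expansions, the sketch below reads the sample rule into a small data structure. The names `Rule` and `parse_rule` are hypothetical, and the ASCII string `=>` is an assumed stand-in for the separator between the nonterminal and its expansions. The sketch splits on every vertical bar, so it would not handle a grammar in which | is itself an input token.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    nonterminal: str
    expansions: list  # each expansion is a list of symbols; [] stands for «empty»

def parse_rule(text: str, arrow: str = "=>") -> Rule:
    """Split a rule into its nonterminal and its space-separated expansions."""
    head, body = text.split(arrow, 1)
    expansions = []
    for alternative in body.split("|"):
        symbols = alternative.split()  # spaces separate tokens and nonterminals
        expansions.append([] if symbols == ["«empty»"] else symbols)
    return Rule(head.strip(), expansions)

SAMPLE = """SampleList =>
    «empty»
  | ... Identifier
  | SampleListPrefix
  | SampleListPrefix , ... Identifier"""

rule = parse_rule(SAMPLE)
```

Here `rule.expansions` holds four entries, one for each kind of sequence listed above, with the empty list representing the «empty» alternative.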

Lookahead Constraints

If the phrase "[lookahead∉ set]" appears in the expansion of a production, it indicates that the production may not be used if the immediately following input terminal is a member of the given set. That set can be written as a list of terminals enclosed in curly braces. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all terminals to which that nonterminal could expand.

For example, given the rules

DecimalDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

DecimalDigits ⇒
    DecimalDigit
  | DecimalDigits DecimalDigit

the rule

LookaheadExample ⇒
    n [lookahead∉ {1, 3, 5, 7, 9}] DecimalDigits
  | DecimalDigit [lookahead∉ {DecimalDigit}]

matches either the letter n followed by one or more decimal digits the first of which is even, or a decimal digit not followed by another decimal digit.
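The behavior just described can be sketched as a hand-written matcher, reading each lookahead constraint as "the next input terminal must not be in the set". The function name `match_lookahead_example` is hypothetical, and consuming as many digits as possible for DecimalDigits is a choice made for this sketch. It is not how the grammar is actually compiled into parse tables.

```python
DECIMAL_DIGITS = set("0123456789")
ODD_DIGITS = set("13579")

def match_lookahead_example(text: str, pos: int = 0):
    """Return the end index of a LookaheadExample match at pos, or None."""
    first = text[pos:pos + 1]    # "" when pos is at the end of the input
    nxt = text[pos + 1:pos + 2]  # the terminal after the first one, or ""
    # Alternative 1: n, then a terminal outside {1, 3, 5, 7, 9}, then DecimalDigits.
    if first == "n" and nxt not in ODD_DIGITS:
        end = pos + 1
        while end < len(text) and text[end] in DECIMAL_DIGITS:
            end += 1
        if end > pos + 1:  # DecimalDigits needs at least one digit
            return end
    # Alternative 2: a DecimalDigit not followed by another DecimalDigit.
    if first in DECIMAL_DIGITS and nxt not in DECIMAL_DIGITS:
        return pos + 1
    return None
```

With this sketch, `match_lookahead_example("n24")` consumes all three characters, while `"n35"` and `"77"` do not match.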

These lookahead constraints do not make the grammars more theoretically powerful than LR(1), but they do allow these grammars to be written more simply. The semantic engine compiles grammars with lookahead constraints into parse tables that have the same format as those produced from ordinary LR(1) or LALR(1) grammars.

Parametrized Rules

Many rules in the grammars occur in groups of analogous rules. Rather than list them individually, these groups have been summarized using the shorthand illustrated by the example below:

Metadefinitions such as

α ∈ {normal, initial}
β ∈ {allowIn, noIn}

introduce grammar arguments α and β. If these arguments later parametrize the nonterminal on the left side of a rule, that rule is implicitly replicated into a set of rules in each of which a grammar argument is consistently substituted by one of its variants. For example, the sample rule

AssignmentExpression^{α,β} ⇒
    ConditionalExpression^{α,β}
  | LeftSideExpression^α = AssignmentExpression^{normal,β}
  | LeftSideExpression^α CompoundAssignment AssignmentExpression^{normal,β}

expands into the following four rules:

AssignmentExpression^{normal,allowIn} ⇒
    ConditionalExpression^{normal,allowIn}
  | LeftSideExpression^normal = AssignmentExpression^{normal,allowIn}
  | LeftSideExpression^normal CompoundAssignment AssignmentExpression^{normal,allowIn}

AssignmentExpression^{normal,noIn} ⇒
    ConditionalExpression^{normal,noIn}
  | LeftSideExpression^normal = AssignmentExpression^{normal,noIn}
  | LeftSideExpression^normal CompoundAssignment AssignmentExpression^{normal,noIn}

AssignmentExpression^{initial,allowIn} ⇒
    ConditionalExpression^{initial,allowIn}
  | LeftSideExpression^initial = AssignmentExpression^{normal,allowIn}
  | LeftSideExpression^initial CompoundAssignment AssignmentExpression^{normal,allowIn}

AssignmentExpression^{initial,noIn} ⇒
    ConditionalExpression^{initial,noIn}
  | LeftSideExpression^initial = AssignmentExpression^{normal,noIn}
  | LeftSideExpression^initial CompoundAssignment AssignmentExpression^{normal,noIn}

AssignmentExpression^{normal,allowIn} is now an unparametrized nonterminal and is processed normally by the grammar.

Some of the expanded rules (such as the fourth one in the example above) may be unreachable from the grammar's starting nonterminal; these are ignored.
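The replication step can be sketched with a Cartesian product over the variants of each grammar argument. In this sketch the placeholder names `alpha` and `beta` stand for the two grammar arguments, the names `ARGUMENTS`, `TEMPLATE`, and `expand` are hypothetical, and substitution is done by naive string replacement, which only works because no nonterminal or variant name contains a placeholder name.

```python
from itertools import product

# The two grammar arguments and their variants, as in the metadefinitions above.
ARGUMENTS = {"alpha": ["normal", "initial"], "beta": ["allowIn", "noIn"]}

# The sample rule: the first entry is the left side, the rest are expansions.
TEMPLATE = [
    "AssignmentExpression^{alpha,beta}",
    "ConditionalExpression^{alpha,beta}",
    "LeftSideExpression^alpha = AssignmentExpression^{normal,beta}",
    "LeftSideExpression^alpha CompoundAssignment AssignmentExpression^{normal,beta}",
]

def expand(template, arguments):
    """Replicate a parametrized rule once per combination of argument variants."""
    names = list(arguments)
    rules = []
    for values in product(*(arguments[name] for name in names)):
        rule = []
        for line in template:
            # Substitute each argument consistently throughout the rule.
            for name, value in zip(names, values):
                line = line.replace(name, value)
            rule.append(line)
        rules.append(rule)
    return rules

rules = expand(TEMPLATE, ARGUMENTS)
```

The result contains four rules in the same order as the expansion listed above. Pruning rules that are unreachable from the starting nonterminal is not attempted here.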

Special Lexer Rules

A few lexer rules have too many expansions to be practically listed. These are specified by descriptive text instead of a list of expansions after the ⇒.

Some lexer rules contain the metaword except. These rules match any expansion that is listed before the except but that does not match any expansion after the except. All of these rules ultimately expand into single characters. For example, the rule below matches any single UnicodeCharacter except the * and / characters:

NonAsteriskOrSlash ⇒ UnicodeCharacter except * | /
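Because such rules expand into single characters, an except rule amounts to a set difference. A minimal sketch, assuming that UnicodeCharacter accepts any single character (its definition is not given in this section) and using hypothetical function names:

```python
def is_unicode_character(ch: str) -> bool:
    # Assumption of this sketch: UnicodeCharacter is any single character.
    return len(ch) == 1

def is_non_asterisk_or_slash(ch: str) -> bool:
    """Match what is listed before the except, minus what is listed after it."""
    return is_unicode_character(ch) and ch not in {"*", "/"}
```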

Informal Grammar Syntax

A few parts of the main body of this proposal still use an informal syntax to describe language constructs, although this syntax is being phased out. An example is the following:

VersionsAndRenames ⇒
    [< VersionRange [: Identifier] , ... , VersionRange [: Identifier] >]

VersionRange ⇒
    Version
  | [Version] .. [Version]

VersionsAndRenames and VersionRange are the names of the grammar rules. The black square brackets represent optional items, and the black ... together with its neighbors represents optional repetition of zero or more items, so a VersionsAndRenames can have zero or more sets of VersionRange [: Identifier] separated by commas. A black | indicates that either its left or right alternative may be present, but not both; |'s have the lowest metasymbol precedence. Syntactic tokens to be typed literally are in a bold blue monospaced font. Grammar nonterminals are in green italic and correspond to the nonterminals in the parser grammar or lexer grammar.

Waldemar Horwat
Last modified Friday, July 7, 2000