You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.




ECMAScript 4 Netscape Proposal

Monday, June 30, 2003

A multi-page version of this document is also available.


This document is Netscape’s proposal to the ECMA TC39TG1 working group for the ECMAScript Edition 4 language. This proposal is being updated continuously to match issues resolved in the committee discussions. The TC39TG1 working group’s current schedule calls for a release of an ECMAScript Edition 4 standard sometime in 2002.

See also a draft of the specification under construction in PDF format.

JavaScript 2.0 is a slight superset of ECMAScript Edition 4. JavaScript 2.0 contains features that were considered for ECMAScript Edition 4 but for which more experience is desirable before standardization occurs.

Contents

Changes

The following are recent major changes in this document:

Date Revisions
Jun 30, 2003
Jun 4, 2003
  • Split the super semantic field into super for classes and archetype for prototype-based objects.
  • Implemented simple prototypes on most objects, including class objects themselves, for compatibility with JavaScript 1.5.
  • Changed semantics so that the enumerable attribute now inherits when overriding an instance property. This was the original intent, although it was ambiguous in the written description of the attribute.
  • Put back a very simple package definition/import mechanism. Moved the extended import directive with the include/exclude selection mechanism to the rationale.
  • Added semantics for the library classes Object, Void, Null, Boolean, GeneralNumber (except for number formatting), Number, float, sbyte, byte, short, ushort, int, uint, long, ulong, Character, String (except for regular expressions), and Namespace.
  • Added numerous semantic utility functions to support the above classes.
  • Added \U escapes to the lexical grammar (but not yet the semantics).
  • Removed include and exclude keywords.
  • Removed some obsolete text from the description of Types; however, that page still needs a little work as it’s not as up-to-date as the semantics.
May 22, 2003
  • Renamed uses of the word “member” to “property” in the semantics. Cleaned up many other names without changing the meaning of the semantics.
  • Renamed Character to Char16, as well as related semantic names. Updated the description of Char16.
  • Implemented conversions of strings to numbers in the semantics. Appended an extra start symbol to the lexical grammar and semantics to assist with these conversions.
  • Eliminated the class Prototype, merging it with the class Object. This required modifying the rules on which objects can be used as prototypes — only direct instances of Object or other prototype-based classes can now be used as prototypes.
  • Made Object be dynamic, which required a slight change to the rules of inheritance of the dynamic attribute — it is now inherited from any superclass except Object.
  • Fixed instanceof when the second operand is a class such as RegExp and Array.
  • Implemented simple prototypes on primitive classes for compatibility with ECMAScript Edition 3.
  • Removed SystemFrame, which wasn’t needed any more.
  • Fixed errors in grammar rules for private.
  • Implemented final and static as attributes instead of syntax.
  • Removed obsolete semantic utilities and added a few helpers for constructing the standard classes.
  • Defined the global object, namespaces, attributes, and classes.
  • Changed the syntax of semantic record constructors to use single angle brackets instead of double angle brackets.
May 2, 2003
  • Updated the semantics. Simplified some of the object data structures. Merged uninitialised with none. Defined class initialization. Defined getters and setters. Implemented instanceof.
  • Defined error classes and replaced error tags with these classes. Revised which error gets reported in many cases.
  • Renamed class semantic variables from names such as objectClass to Object.
  • Removed package, import, and export directives. The description of packages and imports remains in the packages page, but is non-normative and there are no detailed semantics for these constructs.
  • The return statement cannot return a value from a constructor or a setter.
  • A getter must return a value; it cannot fall off the end of the function.
Apr 2, 2003
  • Removed multiple constructors from the body of the proposal and moved them to the rationale for future consideration.
  • Removed the constructor attribute.
  • Replaced the invoke superconstructor case with the ... function call syntax for passing an array of arguments (similar to what apply does) to any function or constructor.
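As a rough analogy only, present-day JavaScript has a similar facility in its spread syntax, which was standardized long after this proposal. The sketch below is not the proposal's syntax or semantics, and the names sum3, Point, and args are hypothetical. It shows an array of arguments being passed through apply and through ..., to both a function and a constructor.

```javascript
// Modern JavaScript analogy; NOT ES4 proposal syntax.
// sum3, Point and args are hypothetical names used for illustration.
function sum3(a, b, c) {
  return a + b + c;
}

const args = [1, 2, 3];

// Older style: apply takes a `this` value plus an array of arguments.
const viaApply = sum3.apply(null, args);

// Spread style: the array elements become individual arguments.
const viaSpread = sum3(...args);

// The same form works when calling a constructor.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
}
const p = new Point(...[4, 5]);
```

Unlike apply, the spread form needs no this argument, which is the convenience the entry above describes.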
Mar 24, 2003
  • Updated the semantics. Implemented is, catch, for, and for-in. Implemented instance methods on classes.
  • Disallowed initializers on for-in target variables. These are allowed by the grammar for convenience but prohibited by the semantics’ validation.
  • Fixed a major const definition design flaw in the semantics that led to race conditions and the possibility of definitions not being set up properly or initialized several times. Replaced the design, which relied on error catching to detect non-constant expressions at compile time, with one that lazily evaluates forward-referenced const expressions.
  • The prototype attribute now applies only to functions to make them into prototype-based functions/constructors. Removed its usage to make extractable class members.
  • Removed the old usage of the word “generic”.
  • Changed the enumerability defaults to match ECMAScript Edition 3.
  • Removed the static path analysis from the calling a superconstructor section.
  • Added invoke case to calling a superconstructor to allow passing a variable number of arguments to a superconstructor. (invoke is the same as apply except that it doesn’t take a this parameter.)
  • Required initializers for instance members to be compile-time constant expressions to simplify the semantics and avoid issues about how many times they are evaluated.
  • Minor editorial fixes.
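The idea of lazily evaluating forward-referenced const expressions, mentioned under Mar 24, 2003, can be sketched in present-day JavaScript. This is only an illustration of the general technique (evaluate on first use, cache the result, detect cycles), not the proposal's formal semantics. The helper defineLazyConsts and the constant names are hypothetical.

```javascript
// Illustrative sketch only; defineLazyConsts is a hypothetical helper,
// not part of the ES4 proposal or of any library.
function defineLazyConsts(initializers) {
  const cache = new Map();      // name -> computed value
  const inProgress = new Set(); // names whose initializer is running
  const counts = {};            // how many times each initializer ran

  function get(name) {
    if (cache.has(name)) return cache.get(name);
    if (!Object.prototype.hasOwnProperty.call(initializers, name)) {
      throw new ReferenceError(name + " is not defined");
    }
    if (inProgress.has(name)) {
      throw new Error("circular const definition: " + name);
    }
    inProgress.add(name);
    try {
      counts[name] = (counts[name] || 0) + 1;
      const value = initializers[name](get); // may reference later consts
      cache.set(name, value);                // each const is set only once
      return value;
    } finally {
      inProgress.delete(name);
    }
  }

  return { get, counts };
}

const consts = defineLazyConsts({
  area: (get) => get("width") * get("height"), // forward references
  width: () => 4,
  height: (get) => get("width") + 1,
});

const area = consts.get("area");
```

Here width is requested twice but its initializer runs once, which is the "initialized several times" problem that a lazy, cached design avoids.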
Feb 17, 2003
  • Updated the semantics. Reorganized the property read, write, and delete code and indirected it through six customizable internal methods.
  • Merged packages with global objects in the semantics.
  • Introduced ilong and iulong notation in the semantics.
  • Introduced ns::id notation in the semantics.
  • Merged the concepts of inaccessible and uninitialized variables in the semantics.
  • Fixed the handling of const variable initializers in the semantics.
  • Implemented with statements and array initializers and updated object initializers in the semantics. These resulted in slight grammar changes.
Jan 29, 2003
  • Removed named function parameters from the body of the proposal and moved them to the rationale for future consideration.
  • Significantly simplified the semantics by removing named function parameters.
  • Class constructors no longer take named parameters to initialize instance members.
  • Allowed dynamic properties with namespaces to better integrate with E4X.
  • Removed typed arrays from the body of the proposal and moved them to the rationale for future consideration.
Jan 24, 2003
  • Updated the semantics.
  • Simplified the override logic by introducing the restriction that an instance variable cannot override a getter or a setter. The reverse is still allowed — a getter may override either a getter or a virtual instance variable, and a setter may override either a setter or a virtual instance variable. Also, the overriding definition may no longer override two different base members or add namespaces not mentioned in the base member.
  • An overriding method definition may no longer change the signature.
  • Allowed run-time namespace expressions in qualified identifiers to harmonize this proposal with E4X.
  • Fixed typos and anachronisms.
Jan 13, 2003
Nov 19, 2002
  • Updated the semantics.
  • Reorganized the grammar productions for function definitions and try statements without materially changing the language. This helped better factor the semantics for these constructs.
Oct 29, 2002
  • Changed numeric arithmetic contagion rules to try to return a long or ulong value when at least one operand is a long or ulong; if that’s not possible, the result overflows to a Number. Division will return a Number if that result would be more precise than a long or ulong.
  • Removed floating-point float arithmetic operations other than unary negation; these operations will always return a Number (or possibly a long or ulong if the other operand is a long or ulong).
  • Made === ignore differences in numeric types, so 5.0 === 5L will now be true.
  • Reduced repetitive semantics: when all actions A on a nonterminal N merely call A on the nonterminals in N’s grammar expansions, the actions are now abbreviated.
  • Simplified the lexer and stages descriptions.
  • Fixed several typos.
Sep 25, 2002
  • Fixed Number and float functionality.
  • Added conversions from numbers to strings.
  • Fixed broken links.
Sep 20, 2002
  • Operator overriding has been deferred until a future version of the language. Removed it from the proposal and moved it to the rationale section.
  • Removed the .() operator again.
  • Unit support has been deferred until a future version of the language. Removed it from the proposal and moved it to the rationale section.
  • Added semantics for float support.
  • Added long, ulong, and float literals.
  • Removed the concept of indexable members. All public members are now accessible via [].
  • Revised the lexical and syntactic grammar and semantics.
Jun 18, 2002
  • Attributes are now idempotent.
  • Removed the abstract attribute.
  • The enumerable attribute implies public.
  • Namespaces and classes defined as members of another class must specify the static attribute.
  • Added requirement that each variable referenced in a const initializer be resolvable to a scope.
  • Restricted class and namespace definitions to only global, package, or class scopes or blocks nested inside these scopes.
  • Corrected wording discrepancy: the superclass of a class must be a compile-time constant expression without forward references.
May 17, 2002
  • Removed class extensions from the proposal and added them to the rationale.
  • Since class extensions are now gone, made unit lookup look in the unit namespace instead of the Unit class.
  • Simplified the discussion of definition scopes, since no scope attributes remain in the proposal.
  • Deleted wrap pragma for wraparound machine integer arithmetic.
  • Deleted the compile attribute. Constants defined using const automatically become compile-time constants when used in compile-time constant expressions.
  • Defined explicit coercions for integral machine integers.
  • Removed restriction that top-level definitions in a package can only be placed in the namespace public or in namespaces defined within that package.
  • Added interfaces to the rationale.
Apr 22, 2002
  • Corrected and clarified the rules for when a variable or a constant may be accessed. Except for type- and attribute-less definitions, constants and variables are accessible only after being defined. Constants may be written at most once. Type expressions must be compile-time constant expressions.
  • A constant and a setter with the same qualified name may not be defined in the same scope.
  • Hoisting does not occur into class scope (this possibility is now excluded by the definition of a regional scope).
  • explicit is now an attribute instead of a specialized namespace.
  • Corrected the interaction of overriding and namespace defaulting in a definition. Defined the namespace defaulting and overriding rules.
  • Replaced mayOverride by override(undefined). Added regular rules for the behavior of overriding and the override attribute.
  • Allowed uninitialized semantic record fields.
  • Continuing to write the formal semantics. Added semantics for constant and variable declarations.
Mar 4, 2002
  • Restricted pragma arguments to be literal booleans, numbers, or strings. Also, strict mode changes now affect the semicolon after the pragma.
  • Removed class declarations without a body.
  • Removed include/exclude clauses from use directives. These now apply only to import directives and now indicate which top-level properties are shared. Simplified the name lookup algorithms to not account for includes/excludes on individual use directives.
  • Made other minor grammar changes to share productions and their semantics without affecting the language.
  • Added for each to the list of computation steps.
  • Added abbreviated notation for copying most of the fields from an existing tuple or record into a newly constructed one.
  • Continuing to write the formal semantics. Made major changes and additions to the formal semantics for evaluation of expressions and statements. Reorganized object layout and unified frames with container objects. Defined package objects. Added a phase parameter to distinguish compile-time constant expressions from runtime expressions.
Jan 4, 2002
  • Removed attributes from pragmas. Although somewhat useful, this made parsing dependent on attribute evaluation, and I'd rather not have such a dependency in the language.
Dec 20, 2001
  • Regularized the grammar of statements and directives. Separated annotated blocks into groups of either directives or substatements, which eliminated the troublesome situation of a definition being located in a substatement that happens to be a group. The attributes on a group of substatements are now restricted to only true and false.
Dec 19, 2001
  • Removed the local attribute, which was no longer necessary for anything.
  • Added the compile attribute to explicitly mark which const definitions must be compile-time constants. This became necessary because the semantics of compile-time const definitions are subtly different from those of regular const definitions, and it became unwieldy to try to guess which one a const definition is, based on how it is used.
  • Revamped the description of compile-time constant expressions. Removed the references to dominators. There are now two kinds of compile-time constant expressions: ones that allow forward references and ones that don’t.
Dec 17, 2001
  • Disallowed nested labels with the same name in the same function in the semantics.
  • Renamed GoContinue to Continue, GoBreak to Break, GoThrow to ThrownValue, and GoReturn to ReturnedValue in the semantics.
  • Pragmas may now take attributes (true and false only).
  • Fixed outdated text in the description of function parameters.
Dec 6, 2001
  • Reverted to the old definition of the Integer type; it’s back to being the set of integral IEEE doubles.
  • Deleted the double type and made Number act as double once again.
  • Renamed Character to char.
  • Rewrote the machine types section once again. Renamed int8, uint8, int16, uint16, int32, uint32, int64, and uint64 to sbyte, byte, short, ushort, int, uint, long, and ulong respectively. The first six of these are now subtypes of Integer; long and ulong are now disjoint from Number.
  • Renamed “coercion” to “implicit coercion” and “cast” to “explicit coercion”.
  • Added special case to the is operator to treat –0.0 as though it were +0.0.
  • Made the as operator support implicit coercions.
  • Added implicit coercions to the concept of a type.
Nov 26, 2001
  • Added support for comments inside functions in the semantics. Removed invariant and steps and turned them into comments.
  • Temporary variables in the semantics can now be declared without assigning a value to them.
  • Added more syntactic semantics.
  • Removed the #, ->, .., and @ tokens.
  • Made the Character type distinct from the set of single-character Strings.
  • Made double be a subtype of Number, which is now the union of numeric types. Made Integer distinct from double and accept unlimited-precision integers. Unified the integral machine types with Integer.
  • Made integral conversions that can result in loss of range error-checked by default. Added the wrap pragma to convert the behavior to wraparound.
Oct 26, 2001
Oct 18, 2001
Oct 16, 2001
Oct 3, 2001
  • Updated semantic notation to combine action procedures for multiple expansions of a nonterminal into one action procedure. Also deleted the notation for simple procedures to reduce confusion.
  • Reformatted the formal semantics and incorporated the above notation changes.
  • Added indexes to the rtf files.
Sep 26, 2001
  • Added more semantics.
  • delete now takes a PostfixExpression instead of a PostfixExpressionOrSuper.
Sep 24, 2001
Aug 22, 2001
Aug 17, 2001
Aug 15, 2001
Aug 10, 2001
Jul 24, 2001
  • null is now a valid String value, distinct from the empty string.
  • Removed the const Array and const t types. Added the StaticArray[t], DynamicArray[t], ConstArray, and ConstArray[t] types.
  • Removed the t[] notation. This is now reserved for fixed-size machine array types.
  • Removed the const operator.
  • Instance member initializers are re-evaluated each time a new instance is created.
  • Toughened the rules for argument type matching in overrides.
  • Changed the variable lookup rules to make a class’s instance members visible but inaccessible in static methods.
  • Removed the empty argument list case from the PragmaExpr and UseDirective grammars.
  • Removed the * wildcard case from IncludesExcludes.
  • Regularized the ImportDirective grammar.
  • Enabled true and false attributes on use namespace directives. Restricted import directive attributes to only true and false.
  • Simplified various grammar productions and renamed grammar nonterminals without affecting the language. Folded the Definition production into the Directive productions.
  • Specified extent of pragmas.
Jun 29, 2001
  • Renamed instanceof to is to allow it to have the new, more sensible semantics without breaking ECMAScript Edition 3 programs. instanceof is not part of ECMAScript Edition 4 but can be supported for legacy programs. is is a new reserved word.
  • Removed the [no line break] constraints after use.
  • Updated the compatibility section.
Jun 15, 2001
  • Renamed the machine types byte, ubyte, short, ushort, int, uint, long, ulong to int8, uint8, int16, uint16, int32, uint32, int64, and uint64 respectively.
  • Renamed Tuple to const Array.
  • Removed the nonindexable and nonenumerable visibility modifier attributes. Updated the definitions of the indexable and enumerable attributes.
  • Removed the indexable attribute keyword, but the indexable concept still exists because it is needed for compatibility with ECMAScript Edition 3.
  • Changed the default for public definitions to nonindexable and non-enumerable.
  • Forbade conflicting or repeated attributes.
  • The default member modifier attribute for const definitions is now final instead of static.
  • Removed the requirement to use this. to initialize instance constants from a constructor.
  • Setters can no longer return a value; the result of an assignment expression is now always the value of its right side.
  • Only classes with at least one member with the prototype attribute will themselves have a predefined prototype global member.
  • Renamed language directives to pragmas and removed the list of JavaScript versions.
  • Simplified the syntax of pragmas and changed the syntax of pragmas, namespace uses, and imports to make them consistent. Eliminated noninsertable semicolons.
  • Removed the parenthesized expression format of FieldName.
  • Noted that an implementation does not have to support package circularities but may do so as an extension.
  • Interchanged the names of the Definition and AnnotatedDefinition nonterminals.
  • Renamed the parser grammar to the syntactic grammar and the lexer grammar to the lexical grammar for consistency with ECMAScript Edition 3.
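The setter rule listed under Jun 15, 2001 (an assignment expression always yields its right side) matches how present-day JavaScript behaves, so it can be illustrated there. The sketch below is an analogy and not ES4 proposal syntax. The object box and its property v are hypothetical.

```javascript
// Modern JavaScript analogy; box and v are hypothetical names.
const box = {
  _v: 0,
  set v(x) {
    this._v = x * 2;
    return "ignored"; // a setter's return value is discarded
  },
  get v() {
    return this._v;
  },
};

// The assignment expression evaluates to its right-hand side (21),
// regardless of what the setter stored or returned.
const result = (box.v = 21);
const stored = box.v; // the setter stored 42
```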
Apr 11, 2001
  • Reincarnated the @ operator as the as operator, which is now a new reserved word and has a lower precedence to match instanceof. The as operator now returns null if the type doesn’t match but the destination type contains null. as does not do any coercions.
  • Removed the | syntax for indicating named parameters. All optional parameters are now also named.
  • The arguments local variable is now supported only for unchecked functions.
  • Modified the syntax of the semantics to better match the ECMAScript Edition 3 style. Partially updated the notation page.
  • Fixed a bug in the unit semantics.
  • Note: Not all of the decisions reached in the last two ECMA TC39TG1 meetings have been integrated into this document yet.
Mar 9, 2001
  • Added links to the preliminary draft of the specification in PDF format.
  • Minor editorial changes.
Mar 2, 2001
Feb 28, 2001
  • Created the syntactic semantics. It’s still a work in progress.
  • Updated the execution stages page.
  • Added mutable cells and the associated operators to the semantic notation.
  • Renamed the Double semantic type to Float64 (this affects the semantics only, not the ECMAScript language).
  • Added many Float64 manipulation functions.
  • Made minor stylistic changes throughout the document.
Feb 22, 2001
  • Renamed type None to Never.
  • Renamed the empty escape from \Q to \_.
Feb 8, 2001
  • Added generic class members.
  • Extended the prototype attribute to allow any value for the this parameter and cause an instance member to appear in the class’s prototype global property. This makes prototype methods appear to be "intentionally generic" as per ECMAScript Edition 3.
Feb 6, 2001
  • Updated the definition extent model as discussed at the January TC39TG1 meeting. Now definitions are local by default but hoist if necessary. Removed the regional and global attributes.
  • Updated the definition conflict rules: added the rule about non-interfering local definitions and removed the rule about permitting re-execution of a const definition, since that is now impossible.
  • Removed the paragraph about ECMAScript Edition 4 being firmly in the dynamic camp from the introduction, since the examples given — dynamically defining classes and functions by placing them inside an if statement — no longer apply. Some of this can still be done using boolean attributes, but these attributes must be compile-time expressions.
  • Changed the data structures returned by methods of the for-in iteration protocol to be objects with named properties value and state rather than arrays or tuples with numbered properties 0 and 1. This allows the two properties to be declared using different types.
  • Specified a default superconstructor call if a constructor contains no other superconstructor calls.
  • Specified that object construction follows the Java model rather than the C++ model — when a constructor calls a virtual method on an object o under construction, it sees the most derived method even if the class in which that method is located has not yet run its constructor on o.
  • Added vector comprehensions [g(a) | a ∈ u] and [g(a) | a ∈ u and c(a)] to the semantic notation.
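The Java-style construction model listed under Feb 6, 2001 happens to be the one that present-day JavaScript classes follow, so it can be illustrated there. This is an analogy rather than ES4 proposal code. Base, Derived, and describe are hypothetical names.

```javascript
// Modern JavaScript analogy; Base, Derived and describe are hypothetical.
class Base {
  constructor() {
    // Calls the most derived describe(), even though the Derived
    // constructor body has not run yet.
    this.log = this.describe();
  }
  describe() {
    return "base";
  }
}

class Derived extends Base {
  constructor() {
    super();               // Base's constructor runs first
    this.name = "derived"; // assigned only after super() returns
  }
  describe() {
    return "derived sees name=" + this.name;
  }
}

const d = new Derived();
// d.log is "derived sees name=undefined": the override ran before
// Derived had initialized name.
```

Under the C++ model the base constructor would instead have called Base's own describe.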
Jan 31, 2001
  • Eliminated namespace inheritance to simplify the proposal. This is unnecessary given the ability to use const to combine several namespace attributes.
  • Removed property lookups using a class before the :: (x.C::n and super x.C::n where C is a class). This feature did not add much useful functionality.
Jan 25, 2001
  • Minor wording changes.
  • Brought back the .() operator.
  • Updated semantic notation — revised description and usage of and added unique id’s.
Jan 11, 2001
  • In strict mode the default scope is local everywhere.
  • As in C++, for statements now form their own scopes. This affects local definitions only.
  • catch clauses now form their own scopes.
  • try statements no longer allow annotations on any of their constituent blocks.
  • Split the notion of statements into statements and directives. Directives (including most definitions) can only be at the top level of a block, while statements can be anywhere. This avoids the problem of conditional definitions as substatements of a compound statement.
  • Reorganized the statement grammar around the distinction between statements and directives. Added the Substatement nonterminal.
  • Revised the discussion of annotated blocks.
  • Revised the description of scopes.
  • Added an optional const attribute to function parameters, which makes them read-only.
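The scoping of for statements and catch clauses described under Jan 11, 2001 resembles what let and catch bindings do in present-day JavaScript. The sketch below is an analogy under that assumption and does not use ES4 proposal syntax. The function scopes and its variables are hypothetical.

```javascript
// Modern JavaScript analogy; scopes() is a hypothetical function.
function scopes() {
  const seen = [];

  // Each for statement has its own scope, so both loops may declare i.
  for (let i = 0; i < 2; i++) seen.push(i);
  for (let i = 10; i < 12; i++) seen.push(i);

  const e = "outer";
  try {
    throw new Error("boom");
  } catch (e) {
    // The catch clause has its own scope; this e shadows the outer one.
    seen.push(e.message);
  }
  seen.push(e); // the outer e is untouched

  return seen;
}

const seen = scopes();
```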
Jan 9, 2001
  • Removed the attribute weak.
  • Made minor clarifications without affecting the content.
  • Added rationale for the types Object and None.
  • Added attribute-style alternative to the type syntax rationale. Feedback on this style would be appreciated.
  • Revamped variable lookup to search for instance members inside instance methods without the need for the this. prefix. Also moved the description of the lookup of static members inside a class’s scope from the class section to the variable lookup section.
  • Simplified the package referencing rules. Also fixed the examples there to not use a package as a :: qualifier, since one is not allowed there by the name lookup rules.
  • Changed explicit from a specialized attribute into a namespace similar to internal. Removed implicit. This simplified the name lookup rules.
  • Greatly simplified use’s included and excluded name possibilities to be either * or a list of identifiers.
  • Generalized the iteration protocol to allow its methods to return objects of any type as long as they have the properties 0 and 1. Stated that the values returned by the iteration protocol expire when the next iteration occurs.
  • Added rationale for not overriding !, ||, ^^, &&, and ?:.
Dec 21, 2000
Dec 20, 2000
  • Made minor wording fixes in the packages section.
  • Simplified the grammar by making the const operator have the same precedence as the other unary operators. When used as a type constructor, the const operator now applies to the array type instead of the element type. The const operator can no longer be used to emulate the const statement.
  • Simplified the IncludesExcludes grammar without changing its behavior in any significant way by merging it with the expression grammar.
Dec 18, 2000
  • Changed the syntax for defining an operator override from function operator "op" to operator function "op". operator is now an attribute instead of a keyword.
  • Renamed operator overloading to operator overriding. Moved the operators page from the libraries to the core section.
  • Removed the Boolean return type restriction from the <, <=, ==, ===, and in operators.
  • Restricted the === operator to take operands of the same class.
  • Modified single and double operator dispatch rules so that null is considered to be a member of only the types Null and Object. Without this restriction dispatch becomes ambiguous. Note that ordinary method dispatch already has this restriction built-in.
  • Specified the built-in operator definitions.
Dec 2, 2000
  • Revamped the syntax for operator overriding based on feedback from the last ECMA meeting. There is now a special syntax to do this.
  • Modified the syntax for accessing a superclass’s method to conform to the new operator overriding scheme. The super::id syntax is no longer supported. super is now an operator modifier that alters the behavior of another operator such as . (property lookup) or any of the overridable operators such as +.
  • Updated the property lookup section to accommodate the new super syntax.
Nov 29, 2000
  • Revamped property lookup. For an unqualified reference x.n, instance name lookup previously found the most derived property. Now it first looks for the least derived property and picks the most derived overload of that property. The new definition handles private property access correctly and allows property access to be optimized to a simple offset lookup when the static type of the instance is known.
  • For simplicity removed the C::n (where C is a class) case of variable lookup. Use C.n instead.
  • Added descriptions of instances, properties, and property names.
  • Stated that the activation frame of a class contains aliases to the superclass’s global members.
  • Cleaned up nomenclature: the names defined by a class are members, while the bindings inside an object are properties. Members cause properties to be constructed when a class instance is created.
Nov 20, 2000
  • Renamed the s regular expression flag from simple to span.
  • Removed the /UnitProduct production from the unit grammar.
  • Allowed white space in the unit grammar. White space can now also be used to indicate implicit multiplication there.
  • Fixed the product rule in the unit grammar.
  • Renamed expt to pow in unit expressions.
Nov 4, 2000
  • Added attributes true and false.
  • Replaced the qualified attribute by the include and exclude syntax in import and use namespace statements. Added nonreserved words include and exclude.
  • Imported definitions appear in the global scope instead of the scope of the import statement; however, the implicit use still applies to scope of the import statement.
  • Predefined type names are constants in the global scope instead of a scope enclosing the global scope.
  • Defined activation frames and qualified names.
  • Blocks with attributes are not scopes.
  • Eliminated the notion of the static and dynamic extent of a definition. Rewrote and corrected the definition extent and name lookup rules in terms of scopes and activation frames only.
  • Simplified name lookup rules, making them independent of the order in which namespaces are used.
  • Relaxed the compile-time constant rules to permit expressions that either return a known value or signal an error. Without this change packages and namespaces cannot work.
  • A class can be used instead of a namespace before the :: during name lookup. This limits the search to members of that class or its superclasses.
  • Eliminated the a::b::c syntactic sugar. In the rare cases where it’s still needed, use (a.b)::c.
  • Defining b::n when there is already a definition a::n in the same scope causes an error if both namespaces a and b are used at the point of the definition of b::n.
  • Renamed scope to regional. scope blocks can now be either local or regional blocks. Defined the notion of a regional scope.
  • Changed unit lookup to look in the Unit class instead of prefixing unit_ to each unit name and looking for a global symbol. Added the unit attribute.
  • In strict mode the default scope is local only inside functions.
  • Strict mode is optional.
Oct 27, 2000
  • Expanded the qualified attribute to optionally take a list of symbols to exclude.
  • Definitions of top-level entities in a package can only be placed in the namespace public or in namespaces defined within that package.
  • Extensively modified and updated the description of units. Allowed multiple units on the same expression in order to support combining number class units (such as "decimal") with units of measure. Defined the grammar and semantics of unit expressions.
Oct 10, 2000
  • Removed the primitive attribute.
  • Defined methods for overriding the for-in operator.
  • Added the s (simple) flag to regular expressions. This flag makes . match every character.
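Present-day JavaScript has an s flag with the behavior described above (. also matches line terminators), so the effect can be shown directly. That flag was standardized much later, and this sketch makes no claim about the proposal's exact semantics. The variable names are hypothetical.

```javascript
// Modern JavaScript's s (dotAll) flag, shown as an analogy.
const text = "a\nb";

// Without the flag, . does not match the newline.
const withoutS = /a.b/.test(text);

// With the flag, . matches every character, including the newline.
const withS = /a.b/s.test(text);
```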
Oct 9, 2000
  • Made classof, eval, and include no longer be reserved words.
  • Combined the import statement with an automatic use namespace of selected namespaces.
  • Changed classof x to x.class.
  • Fixed minor editing errors.
Sep 23, 2000
  • Defined coercions of undefined to any type except None.
  • Renamed the none and void types to None and Void. Now all predefined type names start with upper case letters.
  • Resurrected the ECMAScript Edition 3 void operator and added the classof operator.
  • Made default constructors accept named arguments.
  • Modified the syntax of class extensions and moved their description to the definitions page.
  • Changed the meaning of a::b::c to be (a.b)::c rather than the intersection of the a::c and b::c sets. This means that an identifier may be qualified by only one namespace.
  • Updated the compatibility, namespace, operator overloading, and versioning pages.
Sep 22, 2000
Sep 21, 2000
  • Updated list of reserved words.
  • Updated concept page.
  • Renamed volatile to virtual, keeping volatile’s old semantics.
  • Revamped list and descriptions of attributes. Added a number of new attributes.
  • Disallowed qualifiers in names of literal object fields and names of arguments. Allowed parenthesized indirect expressions that evaluate to strings in those cases.
  • Added const operator, array types, and const array types.
  • Changed the handling of the rest parameter to only accept named arguments if it’s preceded by a |.
Sep 18, 2000 Integrated many changes based on recent discussions with Herman and others:
  • Renamed any to Object.
  • Added notion of live and dead statements.
  • Required attributes to be compile-time constants.
  • Reinstated parenthesized expressions before :: (except in attributes).
  • Changed package names in package definitions to be identifiers or dotted identifier lists. This required renaming the package attribute to internal in order to keep the grammar LR(1)-parsable.
  • Renamed import to qualified import and use import to import.
  • Allowed super.foo as an abbreviation for this.super::foo.
  • Removed obj.(expr) member access syntax.
  • Removed definitions that define qualified identifiers.
  • Required const declarations to have initializers.
  • Each block is its own scope in strict mode.
  • Specified the means of defining parameters that take named arguments. Rewrote description of argument passing.
  • Cleaned up significant portions of the grammar.
Aug 21, 2000
  • Significantly reworked syntax of attributes and expressions to allow attributes to take arguments. This required making constructor, namespace, and use into reserved words.
  • Required parentheses after eval.
  • Removed the syntaxes for placing a parenthesized expression before :: and for unquoted package names due to grammar conflicts.
  • Removed the attribute keyword; attributes are now defined using the const syntax.
  • Removed compile blocks.
  • Added the \Q escape, which turns into nothing and is useful for using identifiers that would otherwise be reserved words.
Aug 17, 2000 First version; split off from the JavaScript 2.0 proposal.

Introduction

Wednesday, September 4, 2002


ECMAScript 4 is the next major step in the evolution of the ECMAScript language. ECMAScript 4 incorporates the following features in addition to those already found in ECMAScript 3:

  • Class definition syntax, both static and dynamic
  • Packages, including a namespace and versioning mechanism
  • Types for program and interface documentation
  • Invariant declarations such as const and final
  • private, internal, public, and user-defined access controls
  • Machine types such as int for more faithful communication with other programming languages

These facilities reinforce each other while remaining fairly small and simple. Unlike in Java, the philosophy behind them is to provide the minimal necessary facilities that other parties can use to write packages that specialize the language for particular domains rather than define these packages as part of the language core.

The versioning and access control mechanisms make the language suitable for programming-in-the-large.


Motivation

Wednesday, September 4, 2002

Goals

The main goals of ECMAScript 4 are:

  • Making the language suitable for writing modular and object-oriented applications
  • Making it possible and easy to write robust and secure applications
  • Improving upon ECMAScript’s facilities for interfacing with a variety of other languages and environments
  • Improving ECMAScript’s suitability for writing applications for which performance matters
  • Simplifying the language where possible
  • Keeping the language implementation compact and flexible

The following are specifically not goals of ECMAScript 4:

  • Making ECMAScript 4 suitable for all programming tasks
  • Making ECMAScript 4 similar to any existing programming language

ECMAScript is not currently an all-purpose programming language. Its strengths are its quick execution from source (thus enabling it to be distributed in web pages in source form), its dynamism, and its interfaces to Java and other environments. ECMAScript 4 is intended to improve upon these strengths, while adding others such as the abilities to reliably compose ECMAScript programs out of components and libraries and to write object-oriented programs. On the other hand, it is not our intent to have ECMAScript 4 supplant languages such as C++ and Java, which will still be more suitable for writing many kinds of applications, including very large, performance-critical, and low-level ones.

Rationale

The proposed features are derived from the goals above. Consider, for example, the goals of writing modular and robust applications.

To achieve modularity we would like some kind of a library mechanism. The proposed package mechanism serves this purpose, but by itself it would not be enough. Unlike existing ECMAScript programs which tend to be monolithic, packages and their clients are often written by different people at different times. Once we introduce packages, we encounter the problems of the author of a package not having access to all of its clients, or the author of a client not having access to all versions of the library it needs. If we add packages to the language without solving these problems, we will never be able to achieve robustness, so we must address these problems by creating facilities for defining abstractions between packages and clients.

To create these abstractions we make the language more disciplined by adding optional types and type-checking. We also introduce a coherent and disciplined syntax for defining classes and hierarchies and versioning of classes. Unlike ECMAScript 3, the author of a class can guarantee invariants concerning its instances and can control access to its instances, making the package author’s job tractable. The class syntax is also much more self-documenting than in ECMAScript 3, making it easier to understand and use ECMAScript 4 code. Defining subclasses is easy in ECMAScript 4, while doing it robustly in ECMAScript 3 is quite difficult.

To make packages work we need to make the language more robust in other areas as well. It would not be good if one package redefined Object.toString or added methods to the Array prototype and thereby corrupted another package. We can simplify the language by eliminating many idioms like these (except when running legacy programs, which would not use packages) and provide better alternatives instead. This has the added advantage of speeding up the language’s implementation by eliminating thread synchronization points. Making the standard packages robust can also significantly reduce the memory requirements and improve speed on servers by allowing packages to be shared among many different requests rather than having to start with a clean set of packages for each request because some other request might have modified some property.

ECMAScript 4 should interface with other languages even better than ECMAScript 3 does. If the goal of integration is achieved, the user of an abstraction should not have to care much about whether the abstraction is written in ECMAScript, Java, or another language. It should also be possible to make ECMAScript abstractions that appear native to Java or other language users.

In order to achieve seamless interfacing with other languages, ECMAScript should provide equivalents for the fundamental data types of those languages. Details such as syntax do not have to be the same, but the concepts should be there. ECMAScript 3 lacks support for integers, making it hard to interface with a Java method that expects a long.

ECMAScript is appearing in a number of different application domains, many of which are evolving. Rather than support all of these domains in the core ECMAScript, ECMAScript 4 should provide flexible facilities that allow these application domains to define their own, evolving standards that are convenient to use without requiring continuous changes to the core of ECMAScript. ECMAScript 4 partially addresses this goal by letting user programs define facilities such as getters and setters — facilities that could only be done by the core of the language in ECMAScript 3.


Notation

Wednesday, June 11, 2003

Character Notation

This proposal uses the following conventions to denote literal characters:

Printable ASCII literal characters (values 20 through 7E hexadecimal) are in a blue monospaced font. Unicode characters in the Basic Multilingual Plane (code points from 0000 to FFFF hexadecimal) are denoted by enclosing their four-digit hexadecimal Unicode code points between «u and ». Supplementary Unicode characters (code points from 10000 to 10FFFF hexadecimal) are denoted by enclosing their eight-digit hexadecimal Unicode code points between «U and ». For example, the non-breakable space character would be denoted in this document as «u00A0», and the character with the code point 1234F hexadecimal would be denoted as «U0001234F». A few of the common control characters are represented by name:

Abbreviation   Unicode Value
«NUL» «u0000»
«BS» «u0008»
«TAB» «u0009»
«LF» «u000A»
«VT» «u000B»
«FF» «u000C»
«CR» «u000D»
«SP» «u0020»

A space character is denoted in this document either by a blank space where it’s obvious from the context or by «SP» where the space might be confused with some other notation.

Grammar Notation

Each LR(1) syntactic grammar and lexical grammar rule consists of a nonterminal, a ⇒, and one or more expansions of the nonterminal separated by vertical bars (|). The expansions are usually listed on separate lines but may be listed on the same line if they are short. An empty expansion is denoted as «empty».

Consider the sample rule:

SampleList ⇒
   «empty»
|  ... Identifier
|  SampleListPrefix
|  SampleListPrefix , ... Identifier

This rule states that the nonterminal SampleList can represent one of four kinds of sequences of input tokens:

  • It can represent nothing (indicated by the «empty» alternative);
  • It can represent the token ... followed by some expansion of the nonterminal Identifier;
  • It can represent an expansion of the nonterminal SampleListPrefix;
  • It can represent an expansion of the nonterminal SampleListPrefix followed by the tokens , and ... and an expansion of the nonterminal Identifier.

Input tokens are characters (and the special End placeholder) in the lexical grammar and lexer tokens in the syntactic grammar. Input tokens to be typed literally are in a bold blue monospaced font. Spaces separate input tokens and nonterminals from each other. An input token that consists of a space character is denoted as «SP». Other non-ASCII or non-printable characters are denoted by also using « and », as described in the character notation section.

Lookahead Constraints

If the phrase “[lookahead∉ set]” appears in the expansion of a production, it indicates that the production may not be used if the immediately following input terminal is a member of the given set. That set can be written as a list of terminals enclosed in curly braces. For convenience, set can also be written as a nonterminal, in which case it represents the set of all terminals to which that nonterminal could expand.

For example, given the rules

DecimalDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
DecimalDigits ⇒
   DecimalDigit
|  DecimalDigits DecimalDigit

the rule

LookaheadExample ⇒
   n [lookahead∉ {1, 3, 5, 7, 9}] DecimalDigits
|  DecimalDigit [lookahead∉ {DecimalDigit}]

matches either the letter n followed by one or more decimal digits the first of which is even, or a decimal digit not followed by another decimal digit.
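
As a rough illustration, the LookaheadExample rule can be approximated with a JavaScript regular expression, whose negative lookahead (?!...) plays the role of the lookahead constraint. The function name matchLookaheadExample is hypothetical and is not part of this proposal; it returns the matched prefix of its input, or null when neither expansion applies.

```javascript
// Hypothetical helper (not part of the proposal): returns the prefix of
// `input` matched by LookaheadExample, or null if neither expansion applies.
function matchLookaheadExample(input) {
  // First expansion: n, then digits whose first digit is not 1, 3, 5, 7, or 9.
  // Second expansion: one digit that is not followed by another digit.
  var pattern = /^(?:n(?![13579])[0-9]+|[0-9](?![0-9]))/;
  var m = pattern.exec(input);
  return m === null ? null : m[0];
}

console.log(matchLookaheadExample("n24")); // n24
console.log(matchLookaheadExample("n14")); // null
console.log(matchLookaheadExample("7"));   // 7
console.log(matchLookaheadExample("72"));  // null
```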

These lookahead constraints do not make the grammars more theoretically powerful than LR(1), but they do allow these grammars to be written more simply. The semantic engine compiles grammars with lookahead constraints into parse tables that have the same format as those produced from ordinary LR(1) or LALR(1) grammars.

Line Break Constraints

If the phrase “[no line break]” appears in the expansion of a production, it indicates that this production cannot be used if there is a line break following the last terminal matched by the grammar. Line break constraints are only present in the syntactic grammar.

Parametrized Rules

Many rules in the grammars occur in groups of analogous rules. Rather than list them individually, these groups have been summarized using the shorthand illustrated by the example below:

Metadefinitions such as

   α ∈ {normal, initial}
   β ∈ {allowIn, noIn}

introduce grammar arguments α and β (written after a ^ where they parametrize a nonterminal). If these arguments later parametrize the nonterminal on the left side of a rule, that rule is implicitly replicated into a set of rules in each of which a grammar argument is consistently substituted by one of its variants. For example, the sample rule

AssignmentExpression^α,β ⇒
   ConditionalExpression^α,β
|  LeftSideExpression^α = AssignmentExpression^normal,β
|  LeftSideExpression^α CompoundAssignment AssignmentExpression^normal,β

expands into the following four rules:

AssignmentExpression^normal,allowIn ⇒
   ConditionalExpression^normal,allowIn
|  LeftSideExpression^normal = AssignmentExpression^normal,allowIn
|  LeftSideExpression^normal CompoundAssignment AssignmentExpression^normal,allowIn
AssignmentExpression^normal,noIn ⇒
   ConditionalExpression^normal,noIn
|  LeftSideExpression^normal = AssignmentExpression^normal,noIn
|  LeftSideExpression^normal CompoundAssignment AssignmentExpression^normal,noIn
AssignmentExpression^initial,allowIn ⇒
   ConditionalExpression^initial,allowIn
|  LeftSideExpression^initial = AssignmentExpression^normal,allowIn
|  LeftSideExpression^initial CompoundAssignment AssignmentExpression^normal,allowIn
AssignmentExpression^initial,noIn ⇒
   ConditionalExpression^initial,noIn
|  LeftSideExpression^initial = AssignmentExpression^normal,noIn
|  LeftSideExpression^initial CompoundAssignment AssignmentExpression^normal,noIn

AssignmentExpression^normal,allowIn is now an unparametrized nonterminal and processed normally by the grammar.

Some of the expanded rules (such as the fourth one in the example above) may be unreachable from the grammar’s starting nonterminal; these are ignored.
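
The replication step can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the rule is held as a plain string, ALPHA and BETA are placeholder spellings for the two grammar arguments, "=>" stands in for the rule arrow, and the function name expandRule is hypothetical. Unreachable rules are not pruned here.

```javascript
// Hypothetical sketch: expand a parametrized rule into its unparametrized
// variants. ALPHA and BETA are placeholder spellings chosen for this example.
var grammarArguments = [
  { name: "ALPHA", variants: ["normal", "initial"] },
  { name: "BETA", variants: ["allowIn", "noIn"] }
];

var sampleRule =
  "AssignmentExpression^ALPHA,BETA => ConditionalExpression^ALPHA,BETA" +
  " | LeftSideExpression^ALPHA = AssignmentExpression^normal,BETA" +
  " | LeftSideExpression^ALPHA CompoundAssignment AssignmentExpression^normal,BETA";

function expandRule(rule, args) {
  var leftSide = rule.split(" => ")[0];
  var results = [rule];
  for (var i = 0; i < args.length; i++) {
    // Only arguments that parametrize the left side cause replication.
    if (leftSide.indexOf(args[i].name) === -1) continue;
    var next = [];
    for (var j = 0; j < results.length; j++) {
      for (var k = 0; k < args[i].variants.length; k++) {
        // Substitute the argument consistently throughout the rule.
        next.push(results[j].split(args[i].name).join(args[i].variants[k]));
      }
    }
    results = next;
  }
  return results;
}

var expanded = expandRule(sampleRule, grammarArguments);
for (var n = 0; n < expanded.length; n++) {
  console.log(expanded[n]);
}
```

The outer loop runs over the grammar arguments and the inner loops over their variants, so the four rules come out in the same order as the listing above.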

Special Lexical Rules

A few lexical rules have too many expansions to be practically listed. These are specified by descriptive text instead of a list of expansions after the ⇒.

Some lexical rules contain the metaword except. These rules match any expansion that is listed before the except but that does not match any expansion after the except. All of these rules ultimately expand into single characters. For example, the rule below matches any single UnicodeCharacter except the * and / characters:

NonAsteriskOrSlash ⇒ UnicodeCharacter except * | /

Core Language

Wednesday, August 14, 2002


This chapter presents an informal description of the core language. The exact syntax and semantics are specified in the formal description. Libraries are also specified in a separate library chapter.


Concepts

Wednesday, September 4, 2002

Values

A value is an entity that can be stored in a variable, passed to a function, or returned from a function. Sample values include:

  • undefined
  • null
  • 5 (a number)
  • true (a boolean)
  • "Kilopi" (a string)
  • [1, 5, false] (a three-element array)
  • {a:3, b:7} (an object with two properties)
  • function (x) {return x*x} (a function)
  • String (a class, a function, and a type)

Types

A type t represents three things:

  • A possibly infinite set of values S
  • A partial mapping I from the set of all values to the set S
  • A partial mapping E from the set of all values to the set S

The set S indicates which values are considered to be members of type t. We write v ∈ t to indicate that value v is a member of type t. The mapping I indicates how values may be implicitly coerced to type t. For each value v already in S, the mapping I must map v to itself. The mapping E indicates how values may be explicitly coerced to type t. For each value v in the domain of I, E must map v to the same value as I maps v. In other words, any implicit coercion is also an explicit coercion but not vice versa.

A value can be a member of multiple sets, and, in general, a value belongs to more than one type. Thus, it is generally not useful to ask about the type of a value; one may ask instead whether a value belongs to some given type. There can also exist two different types with the same set of values but different coercion mappings.

On the other hand, a variable does have a particular type. If we declare a variable x of type t, then whatever value is held in x is guaranteed to be a member of type t, and we can assign any value of type t to x. We may also be able to assign a value v ∉ t to x if type t’s mapping I specifies an implicit coercion for value v; in this case the coerced value is stored in x.
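
The following JavaScript sketch models a type as a membership test (the set S) together with the two partial mappings I and E, and a typed variable that stores the implicitly coerced value. Everything here is an assumption made for illustration: the names makeType, ToyInteger, and declareVariable are hypothetical, and the coercions chosen for ToyInteger are invented rather than taken from this proposal.

```javascript
// Sentinel meaning "v is outside the domain of this partial mapping".
var NO_COERCION = {};

// Hypothetical model of a type t: `contains` tests membership in the set S;
// `implicitExtra` and `explicitExtra` extend the mappings I and E beyond S.
function makeType(name, contains, implicitExtra, explicitExtra) {
  function implicit(v) {
    if (contains(v)) return v;            // I maps every member of S to itself
    var r = implicitExtra(v);
    if (r === NO_COERCION) throw new TypeError("no implicit coercion to " + name);
    return r;
  }
  function explicit(v) {
    if (contains(v)) return v;
    var r = implicitExtra(v);             // E agrees with I on I's domain
    if (r === NO_COERCION) r = explicitExtra(v);
    if (r === NO_COERCION) throw new TypeError("no explicit coercion to " + name);
    return r;
  }
  return { name: name, contains: contains, implicit: implicit, explicit: explicit };
}

// Toy type (coercion choices invented for illustration): integral numbers.
var ToyInteger = makeType(
  "ToyInteger",
  function (v) { return typeof v === "number" && isFinite(v) && Math.floor(v) === v; },
  function (v) { return typeof v === "boolean" ? (v ? 1 : 0) : NO_COERCION; },
  function (v) {
    if (typeof v !== "number" || !isFinite(v)) return NO_COERCION;
    return v < 0 ? Math.ceil(v) : Math.floor(v);  // truncate toward zero
  }
);

// A variable of type t stores the implicitly coerced value on assignment.
function declareVariable(type) {
  var value;
  return {
    set: function (v) { value = type.implicit(v); },
    get: function () { return value; }
  };
}

var x = declareVariable(ToyInteger);
x.set(5);     // already a member of the type: stored unchanged
x.set(true);  // not a member, but implicitly coercible: 1 is stored
console.log(x.get());                  // 1
console.log(ToyInteger.explicit(3.9)); // 3
```

Assigning 3.9 to x would throw in this model, because that value is only in the domain of the explicit mapping.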

Every type represents some set of values but not every set of values is represented by some type (this is required for logical consistency — there are uncountably infinitely many sets of values but only countably infinitely many types).

Every type is also itself a value and can be stored in a variable, passed to a function, or returned from a function.

Type Hierarchy

If type a’s set of values is a subset of type b’s set of values, then we say that type a is a subtype of type b. We denote this as a ⊆ b.

Subtyping is transitive, so if a ⊆ b and b ⊆ c, then a ⊆ c. Subtyping is also reflexive: a ⊆ a. Also, if v ∈ t and t ⊆ s, then v ∈ s.

The set of all values is represented by the type Object, which is the supertype of all types. A variable with type Object can hold any value.

The set of no values is represented by the type Never, which is the subtype of all types. A function with the return type Never cannot return.
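
A small finite model makes these relations concrete. In the sketch below each type is just a JavaScript Set of values; ObjectT stands in for Object over a deliberately tiny universe, NeverT for Never, and all of the names are hypothetical. Real types may of course have infinitely many members.

```javascript
// Hypothetical finite model: a type is represented only by its set of values.
function isSubtype(a, b) {
  var ok = true;
  a.forEach(function (v) { if (!b.has(v)) ok = false; });
  return ok;
}

var ObjectT = new Set([undefined, null, 1, 2, "s"]); // stands in for all values
var NumberT = new Set([1, 2]);
var OneT = new Set([1]);
var NeverT = new Set();                              // the set of no values

console.log(isSubtype(OneT, NumberT) && isSubtype(NumberT, ObjectT)); // true
console.log(isSubtype(OneT, ObjectT));  // true (transitivity)
console.log(isSubtype(OneT, OneT));     // true (reflexivity)
console.log(isSubtype(NeverT, OneT));   // true (Never is a subtype of every type)
console.log(isSubtype(NumberT, OneT));  // false
```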

Classes

A class is a template for creating similar values, often called objects or instances. These instances generally share characteristics such as common methods and properties.

Every class is also a type and a value. When used as a type, a class represents the set of all possible instances of that class.

A class C can be derived from a superclass S. Class C can then inherit characteristics of class S. Every instance of C is also an instance of S, but not vice versa, which, by the definition of subtyping above, implies that C ⊆ S when we consider C and S as types.
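
The instance relationship can be observed with the prototype-based idiom that existing JavaScript engines already run. This is an analogy rather than the class syntax of this proposal; the constructors S and C below simply mirror the names used in the paragraph above.

```javascript
// Prototype-based analogue of "class C derived from superclass S".
function S() {}
function C() { S.call(this); }
C.prototype = Object.create(S.prototype);
C.prototype.constructor = C;

var c = new C();
var s = new S();
console.log(c instanceof C); // true
console.log(c instanceof S); // true: every instance of C is an instance of S
console.log(s instanceof C); // false: but not vice versa
```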

The subclass relation imposes a hierarchy relation on the set of all classes. ECMAScript 4 currently does not support multiple inheritance, although this is a possible future direction. If multiple inheritance were allowed, the subclass relation would impose a partial order on the set of all classes.

Members

A class typically contains a set of members, which can be variables, functions, etc. Members are classified as either instance or global members. Instance members become properties of instances of the class. Global members become properties (sometimes called class properties) of the class object itself; a class has only one global member with a given name.

Members can have attributes which modify their behavior.

Instances

An instance (sometimes called an object) contains a set of properties. An instance belongs to a particular class and must have properties for the instance members defined in that class and its ancestors; these bindings are made when the instance is created. An instance may also have additional dynamic properties, which can be added and deleted any time after the instance has been created.

Unless specified otherwise, each separately created instance has a distinct identity. The === operator returns false when applied to two different instances or true when applied to the same instance (with the exception of NaN, which always compares unequal to itself).
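
The same identity rule is already observable in existing JavaScript, as this short example shows:

```javascript
var a = { n: 1 };
var b = { n: 1 };   // same contents, but a separately created instance
var sameAsA = a;

console.log(a === b);       // false: two different instances
console.log(a === sameAsA); // true: the same instance
console.log(NaN === NaN);   // false: NaN compares unequal to itself
```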

Instances provide the appearance of lasting forever, although an implementation is expected to garbage-collect them when they are no longer reachable.

Properties

A property is a runtime binding of a property name to a value. The values of some properties can change. Properties can be fixed or dynamic. Fixed properties are declared as members of a class definition and are created at the time the object is constructed. Dynamic properties can be added to an object at any time after the object was created. All ECMAScript 3 properties are dynamic.

Properties can have attributes. Fixed properties inherit their attributes from the corresponding members.

Property Names

A property name identifies a property of an instance and consists of a namespace N, an identifier id, and a class C. There is no language syntax for fully specifying a property name; in this specification property names are denoted using the notation N::id_C, with the class C written as a subscript. If the property is fixed, then C is the class in which the corresponding member is defined. If the property is dynamic, then C is the instance’s most derived class.

An instance can contain at most one property for each property name. An instance can contain multiple properties with the same namespace N and name id but different classes C; in this case the property with the most derived class C is said to override the others. The overridden properties are still present in the instance and can be accessed using the super operator.
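
A minimal sketch of this bookkeeping, assuming a property name is stored as a (namespace, identifier, class) triple: the names ClassInfo, Instance, define, and lookupFrom are hypothetical, namespaces are represented by plain strings for brevity, and the way a super-style access picks its starting class is a simplification.

```javascript
// Hypothetical model, for illustration only: a property name is the triple
// (namespace N, identifier id, class C). Namespaces are plain strings here.
function ClassInfo(name, superclass) {
  this.name = name;
  this.superclass = superclass || null;
}

function Instance(cls) {
  this.cls = cls;
  this.properties = [];
}

// At most one property per (N, id, C) triple.
Instance.prototype.define = function (ns, id, cls, value) {
  for (var i = 0; i < this.properties.length; i++) {
    var p = this.properties[i];
    if (p.ns === ns && p.id === id && p.cls === cls) {
      throw new Error("duplicate property " + ns + "::" + id + " in " + cls.name);
    }
  }
  this.properties.push({ ns: ns, id: id, cls: cls, value: value });
};

// Search from class `start` up through its ancestors; the first hit is the
// property defined in the most derived class at or above `start`.
Instance.prototype.lookupFrom = function (ns, id, start) {
  for (var c = start; c !== null; c = c.superclass) {
    for (var i = 0; i < this.properties.length; i++) {
      var p = this.properties[i];
      if (p.ns === ns && p.id === id && p.cls === c) return p.value;
    }
  }
  throw new ReferenceError(ns + "::" + id + " not found");
};

var Base = new ClassInfo("Base");
var Derived = new ClassInfo("Derived", Base);
var obj = new Instance(Derived);
obj.define("public", "describe", Base, "Base version");
obj.define("public", "describe", Derived, "Derived version");

// Ordinary access sees the overriding property.
console.log(obj.lookupFrom("public", "describe", obj.cls));            // Derived version
// A super-style access starts one class higher and reaches the overridden one.
console.log(obj.lookupFrom("public", "describe", obj.cls.superclass)); // Base version
```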

Scopes

A scope represents one of the following delimited portions of ECMAScript source code. Some scopes are further distinguished as being regional scopes, as indicated in the table below.

Nonterminal          Regional   Description
Program              yes        The top level of a program
PackageDefinition    yes        A package definition
ClassDefinition      yes        A class definition
FunctionDefinition   yes        A function definition
FunctionExpression   yes        A function expression
Block                no*        A block (but not a group of directives prefixed with attributes)
ForStatement         no*        A for statement
CatchClause          no*        A catch clause of a try statement

*These three scopes become regional scopes if the next outer scope is a class scope.

A scope is a static entity that does not change while an ECMAScript program is running (except that if the program calls eval then new ECMAScript source code will be created which may share existing scopes or create its own scopes). A scope other than the top level of a program (the global scope) is always contained inside another scope. If two scopes overlap, one must be contained entirely within the other, so scopes form a hierarchy.

Scope information is used at run time to help with variable and property lookups and visibility checks.

A scope should not be confused with an activation frame. A scope should also not be confused with a namespace.

Activation Frames

An activation frame contains a set of runtime bindings of all qualified names defined in a scope to values. A new activation frame comes into existence each time the scope is entered. A function closure captures a reference to the activation frame in which it was created. Activation frames provide the appearance of lasting forever, although an implementation is expected to discard them after the scope is exited and any closures that captured the activation frame have been garbage-collected.
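
Closures in existing JavaScript already behave this way, which makes for a compact demonstration. In the example below each call to the illustrative function makeCounter enters the function’s scope anew and so creates a fresh activation frame; the returned closure keeps that frame alive.

```javascript
function makeCounter() {
  var count = 0;                 // a binding in this call's activation frame
  return function () {           // the closure captures that frame
    count += 1;
    return count;
  };
}

var first = makeCounter();
var second = makeCounter();      // a second, independent activation frame

first();
first();
console.log(first());  // 3
console.log(second()); // 1
```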

The values of some bindings in an activation frame can change. The bindings’ values begin in an uninitialized state. It is an error to read a binding in an uninitialized state.

Activation frame bindings can have attributes which modify their behavior.

Qualified Names

A qualified name consists of a namespace N and an identifier id. In this specification qualified names are denoted using the notation N::id. An activation frame can contain at most one binding for each qualified name.

Namespaces

A namespace parametrizes names. A namespace attribute N may be attached to the declaration of any name or property p. That namespace then acts like a lock on accesses to p: another piece of code may access p only by qualifying it with that namespace using N::p or by executing a use namespace(N) directive in a scope surrounding the access of p. Unlike in C++, a namespace is not a scope and does not contain any names or properties itself; a namespace only modifies the accessibility and visibility of names or properties attached to activation frames, classes, or objects.

A namespace is a value that can be passed as the first operand of the :: operator.

public is the default namespace for declarations; a use namespace(public) directive is implicit around the global scope. Each package has a predefined, anonymous internal namespace and each class has a predefined, anonymous private namespace; these provide access control. User-defined namespaces may also be used for more flexible access control.
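
The lock-like behavior of namespaces can be modeled in ordinary JavaScript. This is a sketch under assumptions, not the proposal’s lookup algorithm: Namespace, Frame, and their methods are hypothetical names, the list passed to get stands in for the namespaces opened by use namespace directives, and treating an ambiguous unqualified access as an error is a choice made for this example.

```javascript
// Hypothetical model: namespaces are opaque values, and a frame maps
// qualified names N::id to values.
function Namespace(label) { this.label = label; }
var publicNamespace = new Namespace("public");

function Frame() { this.bindings = []; }

Frame.prototype.find = function (ns, id) {
  for (var i = 0; i < this.bindings.length; i++) {
    var b = this.bindings[i];
    if (b.ns === ns && b.id === id) return b;
  }
  return null;
};

// At most one binding per qualified name.
Frame.prototype.define = function (ns, id, value) {
  if (this.find(ns, id) !== null) throw new Error("duplicate " + ns.label + "::" + id);
  this.bindings.push({ ns: ns, id: id, value: value });
};

// Qualified access, written N::id in the language.
Frame.prototype.getQualified = function (ns, id) {
  var b = this.find(ns, id);
  if (b === null) throw new ReferenceError(ns.label + "::" + id + " not found");
  return b.value;
};

// Unqualified access: only public and the namespaces in use unlock a binding.
Frame.prototype.get = function (id, usedNamespaces) {
  var open = [publicNamespace].concat(usedNamespaces || []);
  var matches = [];
  for (var i = 0; i < this.bindings.length; i++) {
    var b = this.bindings[i];
    if (b.id === id && open.indexOf(b.ns) !== -1) matches.push(b);
  }
  if (matches.length === 0) throw new ReferenceError(id + " is not accessible");
  if (matches.length > 1) throw new Error(id + " is ambiguous"); // assumption
  return matches[0].value;
};

var secret = new Namespace("secret");
var frame = new Frame();
frame.define(publicNamespace, "greeting", "hello");
frame.define(secret, "key", 42);

console.log(frame.get("greeting"));             // hello
console.log(frame.getQualified(secret, "key")); // 42
console.log(frame.get("key", [secret]));        // 42
```

Calling frame.get("key") without the secret namespace in use throws, which is the sense in which the namespace acts as a lock.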


Lexer

Wednesday, June 4, 2003

This section presents an informal overview of the ECMAScript 4 lexer. See the stages and lexical semantics sections in the formal description chapter for the details.

Changes since ECMAScript 3

The ECMAScript 4 lexer behaves in the same way as the ECMAScript 3 lexer except for the following:

  • There are additional punctuators and reserved words.
  • The lexer recognizes several nonreserved words that have special meanings in some contexts but can be used as identifiers.
  • Only semicolon insertion on line breaks is handled by the lexer; the ECMAScript 4 parser allows semicolons to be omitted before a closing }. In addition, the ECMAScript 4 parser allows semicolons to be omitted before the else of an if-else statement and before the while of a do-while statement.
  • Semicolon insertion on line breaks is disabled in strict mode.
  • [no line break] restrictions in grammar productions are ignored in strict mode.

Source Code

ECMAScript 4 source text consists of a sequence of UTF-16 Unicode version 2.1 or later characters normalized to Unicode Normalization Form C (canonical composition), as described in Unicode Technical Report #15.

Comments and White Space

Comments and white space behave just like in ECMAScript 3.

Punctuators

The following ECMAScript 3 punctuation tokens are recognized in ECMAScript 4:

!   !=   !==   %   %=   &   &&   &=   (   )   *   *=   +   ++   +=   ,   -   --   -=   .   /   /=   :   ;   <   <<   <<=   <=   =   ==   ===   >   >=   >>   >>=   >>>   >>>=   ?   [   ]   ^   ^=   {   |   |=   ||   }   ~

The following punctuation tokens are new in ECMAScript 4:

&&=   ...   ::   ^^   ^^=   ||=

Keywords

The following reserved words are used in ECMAScript 4:

as   break   case   catch   class   const   continue   default   delete   do   else   export   extends   false   finally   for   function   if   import   in   instanceof   is   namespace   new   null   package   private   public   return   super   switch   this   throw   true   try   typeof   use   var   void   while   with

The following reserved words are reserved for future expansion:

abstract   debugger   enum   goto   implements   interface   native   protected   synchronized   throws   transient   volatile

The following words have special meaning in some contexts in ECMAScript 4 but are not reserved and may be used as identifiers:

get   set

Any of the above keywords may be used as an identifier by including a \_ escape anywhere within the identifier, which strips it of any keyword meanings. The two, four, and eight-digit hexadecimal escapes \xdd, \udddd, and \Udddddddd may also be used in identifiers; these strip the identifier of any keyword meanings as well.

Changes from ECMAScript 3

The following words were reserved in ECMAScript 3 but are not reserved in ECMAScript 4:

boolean   byte   char   double   final   float   int   long   short   static

The following words were not reserved in ECMAScript 3 but are reserved in ECMAScript 4:

as   is   namespace   use

Semicolon Insertion

The ECMAScript 4 syntactic grammar explicitly makes semicolons optional in the following situations:

  • Before any }
  • Before the else of an if-else statement
  • Before the while of a do-while statement (but not before the while of a while statement)
  • Before the end of the program

Semicolons are optional in these situations even if they would construct empty statements. Strict mode has no effect on semicolon insertion in the above cases.

In addition, sometimes line breaks in the input stream are turned into VirtualSemicolon tokens. Specifically, if the first through the nth tokens of an ECMAScript program are grammatically valid but the first through the n+1st tokens are not, and there is a line break (or a comment including a line break) between the nth and the n+1st tokens, then the parser tries to parse the program again after inserting a VirtualSemicolon token between the nth and the n+1st tokens. This kind of VirtualSemicolon insertion does not occur in strict mode.

See also the semicolon insertion syntax rationale.
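
Two of these cases, insertion at a line break and omission before a closing }, are inherited from ECMAScript 3 and can be tried in any existing JavaScript engine. The example below uses the standard Function constructor to parse small programs from strings; it does not model the strict mode of this proposal or the additional else and do-while cases.

```javascript
// A line break separates the statements, so semicolons are inserted.
var withLineBreak = new Function("var a = 1\nvar b = 2\nreturn a + b");
console.log(withLineBreak()); // 3

// The semicolon before a closing } may be omitted.
var beforeBrace = new Function("if (true) { return 'ok' }");
console.log(beforeBrace()); // ok

// The same tokens on one line: no line break, so no semicolon is inserted
// and the program is rejected.
var rejected = false;
try {
  new Function("var a = 1 var b = 2");
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true
```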

Numeric Literals

The syntax for numeric literals is the same as in ECMAScript 3, with the addition of long, ulong, and float numeric literals. The rules for numeric literals are as follows:

  • A numeric literal without a suffix is converted to an IEEE double-precision floating-point number.
  • A numeric literal with the suffix l or L is interpreted as a long value and must be a decimal or hexadecimal constant without an exponent or decimal point and be in the range of 0 through 2^63; furthermore, if the value is exactly 2^63 then the literal can only be used as the operand of the - unary negation operator.
  • A numeric literal with the suffix ul, uL, Ul, or UL is interpreted as a ulong value and must be a decimal or hexadecimal constant without an exponent or decimal point and be in the range of 0 through 2^64−1.
  • A numeric literal with the suffix f or F is interpreted as a float value and must be a decimal constant. Hexadecimal float constants are not permitted because the suffix would be interpreted as a hexadecimal digit.

The suffix must be adjacent to the number with no intervening white space. A number may not be followed by an identifier without intervening white space.
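
The suffix rules can be summarized as a small classifier. The sketch below is an assumption-laden illustration rather than the proposal’s lexer: classifyNumericLiteral is a hypothetical name, long and ulong values are represented with JavaScript BigInt, float values are rounded with Math.fround, and the restriction that 2^63 may appear only as the operand of unary negation is not enforced.

```javascript
// Hypothetical helper, not part of the proposal: classify a numeric literal
// by its suffix and apply the range rules listed above.
var DECIMAL = /^(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?$/;
var HEX = /^0[xX][0-9a-fA-F]+$/;
var DECIMAL_INTEGER = /^[0-9]+$/;

function classifyNumericLiteral(text) {
  // long and ulong: integer constant followed by l, L, ul, uL, Ul, or UL.
  var m = /^(.*?)([uU]?[lL])$/.exec(text);
  if (m !== null) {
    if (!HEX.test(m[1]) && !DECIMAL_INTEGER.test(m[1])) {
      throw new SyntaxError("long literals need an integer constant: " + text);
    }
    var unsigned = m[2].length === 2;
    var value = BigInt(m[1]);
    // 2^63 itself is accepted here; the extra rule that it may only be the
    // operand of unary negation is not checked in this sketch.
    var limit = unsigned ? 2n ** 64n - 1n : 2n ** 63n;
    if (value > limit) throw new RangeError("literal out of range: " + text);
    return { type: unsigned ? "ulong" : "long", value: value };
  }
  // float: decimal constant followed by f or F. In a hexadecimal constant a
  // trailing f is just another digit, so such literals fall through to double.
  if (/[fF]$/.test(text) && !HEX.test(text)) {
    var body = text.slice(0, -1);
    if (!DECIMAL.test(body)) throw new SyntaxError("bad float literal: " + text);
    return { type: "float", value: Math.fround(Number(body)) };
  }
  // No suffix: IEEE double precision.
  if (!HEX.test(text) && !DECIMAL.test(text)) {
    throw new SyntaxError("bad numeric literal: " + text);
  }
  return { type: "double", value: Number(text) };
}

console.log(classifyNumericLiteral("42").type);     // double
console.log(classifyNumericLiteral("42L").type);    // long
console.log(classifyNumericLiteral("0xFFul").type); // ulong
console.log(classifyNumericLiteral("1.5f").type);   // float
console.log(classifyNumericLiteral("0x1F").type);   // double
```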

Regular Expression Literals

Regular expression literals begin with a slash (/) character not immediately followed by another slash (two slashes start a line comment). Like in ECMAScript 3, regular expression literals are ambiguous with the division (/) or division-assignment (/=) tokens. The lexer treats a / or /= as a division or division-assignment token if either of these tokens would be allowed by the syntactic grammar as the next token; otherwise, the lexer treats a / or /= as starting a regular expression.

This unfortunate dependence of lexical parsing on grammatical parsing is inherited from ECMAScript 3. See the regular expression syntax rationale for a discussion of the issues.


Expressions

Monday, June 30, 2003

Most of the behavior of expressions is the same as in ECMAScript 3. Differences are highlighted below.

  β ∈ {allowIn, noIn}

Identifiers

Identifier ⇒
   Identifier
|  get
|  set

The above keywords are not reserved and may be used as identifiers.

Qualified Identifiers

SimpleQualifiedIdentifier ⇒
ExpressionQualifiedIdentifier ⇒ ParenExpression :: Identifier
QualifiedIdentifier ⇒

As in ECMAScript Edition 3, an identifier evaluates to an internal data structure called a reference. However, ECMAScript 4 references can be qualified by a qualifier, which, in the general syntax, is a ParenExpression that evaluates to a namespace. For convenience, if the ParenExpression consists of a single identifier, the parentheses may be omitted: (a)::m may be written as a::m.

The reserved words public and private may also be used as qualifiers. public evaluates to the public namespace. internal (which is not a reserved word) evaluates to the containing package’s anonymous namespace. private can only be used inside a class and evaluates to the containing class’s anonymous namespace.

See the name lookup section for more information on the :: operator.

Primary Expressions

PrimaryExpression 
   null
|  true
|  false
|  Number
|  String
|  this
|  RegularExpression
ReservedNamespace 
   public
|  private
ParenExpression  ( AssignmentExpressionallowIn )
ParenListExpression 
|  ( ListExpressionallowIn , AssignmentExpressionallowIn )

public evaluates to the public namespace. private can be used only inside a class and evaluates to that class’s private namespace.

this may only be used in methods, constructors, or functions with the prototype attribute set.

Function Expressions

FunctionExpression 
   function FunctionCommon
|  function Identifier FunctionCommon

A FunctionExpression creates and returns an anonymous function.

Object Literals

ObjectLiteral  { FieldList }
FieldList 
   «empty»
NonemptyFieldList 
LiteralField  FieldName : AssignmentExpressionallowIn
FieldName 
|  String
|  Number

The ParenExpression is evaluated at run time and its result coerced to a qualified name.

Array Literals

ArrayLiteral  [ ElementList ]
ElementList 
   «empty»
|  , ElementList
LiteralElement  AssignmentExpressionallowIn

Super Expressions

SuperExpression 
   super
|  super ParenExpression

super, which may only be used inside a class C, can be applied to a subexpression that evaluates to an instance v of C. That subexpression can be either a ParenExpression or omitted, in which case it defaults to this.

As specified in the grammar below, the SuperExpression must be embedded as the left operand of a . (property lookup) or [] (indexing) operator. super changes the behavior of the operator in which it is embedded by limiting its property search to definitions inherited from class C’s superclass. See property lookup.

Postfix Expressions

PostfixExpression 
AttributeExpression 
FullPostfixExpression 
|  PostfixExpression [no line break] ++
|  PostfixExpression [no line break] --
FullNewExpression  new FullNewSubexpression Arguments
FullNewSubexpression 
ShortNewExpression  new ShortNewSubexpression
ShortNewSubexpression 

A SimpleQualifiedIdentifier or ExpressionQualifiedIdentifier expression id resolves to the binding of id in the innermost enclosing scope that has a visible binding of id. If a qualifier q is present before the id, then the QualifiedIdentifier expression resolves to the binding of id in the innermost enclosing scope that has a visible binding of id in the namespace q.

Property Operators

PropertyOperator 
Brackets 
   [ ]
|  [ ListExpressionallowIn ]
Arguments 
   ( )
ExpressionsWithRest 
RestExpression  ... AssignmentExpressionallowIn

The . operator accepts a QualifiedIdentifier as the second operand and performs a property lookup.

The grammar allows the [] operator to take multiple arguments. However, all built-in objects take at most one argument. Implementation-defined host objects may take more arguments.

For most objects other than arrays and some host objects, the expression o[m] explicitly coerces m to a qualified name q and returns the result of o.q. See property lookup.

An argument list may contain a final argument preceded by the ... token. That argument must be an Array and cannot be null. The elements of that array become additional arguments to the function, following any arguments that precede the ... token; the array itself is not passed as an argument. The array must not contain holes.
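The behavior of a trailing ... argument can be approximated with the spread syntax of present-day JavaScript. The helper callWithRest below is hypothetical and exists only to make the two checks explicit; JavaScript's own spread would accept holes and turn them into undefined, so the hole check is added by hand.

```javascript
// Hypothetical helper: call fn with the leading arguments followed by the
// elements of restArray, enforcing the restrictions described above.
function callWithRest(fn, leadingArgs, restArray) {
  // The argument after ... must be an Array and cannot be null.
  if (!Array.isArray(restArray)) {
    throw new TypeError("the argument after ... must be a non-null Array");
  }
  // The array must not contain holes.
  for (let i = 0; i < restArray.length; i++) {
    if (!(i in restArray)) {
      throw new TypeError("the array after ... must not contain holes");
    }
  }
  // The elements become additional arguments; the array itself is not passed.
  return fn(...leadingArgs, ...restArray);
}
```

For example, callWithRest((a, b, c) => a + b + c, [1], [2, 3]) calls the function with the three arguments 1, 2, and 3.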

Unary Operators

UnaryExpression 
|  delete PostfixExpression
|  void UnaryExpression
|  typeof UnaryExpression
|  - NegatedMinLong

The typeof operator returns a string as in ECMAScript 3. There is no way to query the most specific class of an object — all one can ask is whether an object is a member of a specific class.

Multiplicative Operators

MultiplicativeExpression 

Additive Operators

AdditiveExpression 

Bitwise Shift Operators

ShiftExpression 

Relational Operators

RelationalExpressionallowIn 
|  RelationalExpressionallowIn instanceof ShiftExpression
RelationalExpressionnoIn 

The expression a is b takes an expression that must evaluate to a type as its second operand b. When a is not null, the expression a is b returns true if a is a member of type b and false otherwise; this is equivalent to testing whether a can be stored in a variable of type b without coercion. When a is null, a is b behaves analogously to method dispatch and returns true if b is either Object or Null and false otherwise. As a special case, when a is –0.0 and b is sbyte, byte, short, ushort, int, or uint, a is b returns true.

The expression a as b returns a if a is a member of type b. Otherwise, if a can be implicitly coerced to type b, then the result is the result of that implicit coercion. Otherwise, a as b returns null if null is a member of type b or throws an exception otherwise. In any case b must evaluate to a type.

The instanceof operator behaves in the same way as in ECMAScript 3 — a instanceof b follows a’s prototype chain.
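The difference between is and as can be sketched with a toy model in present-day JavaScript. Everything in the block is an assumption made for illustration: the type descriptors, their has and coerce functions, and in particular which values each sample type admits. Only the branching inside isOp and asOp follows the two paragraphs above.

```javascript
// Hypothetical type descriptors. `has` tests membership; `coerce` reports whether
// an implicit coercion exists. The membership choices are modeling assumptions.
const noCoercion = () => ({ ok: false });
const ObjectT = { name: "Object", has: v => v !== undefined, coerce: noCoercion };
const NullT = { name: "Null", has: v => v === null, coerce: noCoercion };
const StringT = { name: "String", has: v => v === null || typeof v === "string", coerce: noCoercion };
const IntT = {
  name: "int",
  // Number.isInteger(-0) is true, which happens to match the -0.0 special case.
  has: v => Number.isInteger(v) && v >= -(2 ** 31) && v <= 2 ** 31 - 1,
  coerce: noCoercion,
};

// a is b
function isOp(a, type) {
  if (a === null) return type === ObjectT || type === NullT;
  return type.has(a);
}

// a as b
function asOp(a, type) {
  if (type.has(a)) return a;                 // already a member
  const coerced = type.coerce(a);
  if (coerced.ok) return coerced.value;      // implicit coercion applies
  if (type.has(null)) return null;           // null is a member of the type
  throw new TypeError("cannot convert value to " + type.name);
}
```

In this model isOp(null, StringT) is false even though null can be stored in a StringT variable, while asOp(5, StringT) yields null and asOp("x", IntT) throws because null is not a member of IntT.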

Equality Operators

EqualityExpression 
   RelationalExpression
|  EqualityExpression == RelationalExpression
|  EqualityExpression != RelationalExpression
|  EqualityExpression === RelationalExpression
|  EqualityExpression !== RelationalExpression

Binary Bitwise Operators

BitwiseAndExpression 
   EqualityExpression
|  BitwiseAndExpression & EqualityExpression
BitwiseXorExpression 
   BitwiseAndExpression
|  BitwiseXorExpression ^ BitwiseAndExpression
BitwiseOrExpression 
   BitwiseXorExpression
|  BitwiseOrExpression | BitwiseXorExpression

Binary Logical Operators

LogicalAndExpression 
   BitwiseOrExpression
|  LogicalAndExpression && BitwiseOrExpression
LogicalXorExpression 
   LogicalAndExpression
|  LogicalXorExpression ^^ LogicalAndExpression

The ^^ operator is a logical exclusive-or operator. It evaluates both operands. If they both convert to true or both convert to false, then ^^ returns false; otherwise ^^ returns the unconverted value of whichever argument converted to true.
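Present-day JavaScript has no ^^ operator, but its behavior as described above can be sketched as a function. The name logicalXor is hypothetical, and JavaScript's Boolean conversion is assumed to stand in for the proposal's conversion to Boolean.

```javascript
// Hypothetical model of a ^^ b. Both operands are already evaluated when the
// function is called, matching the statement that ^^ evaluates both operands.
function logicalXor(a, b) {
  const aTrue = Boolean(a);
  const bTrue = Boolean(b);
  if (aTrue === bTrue) return false;  // both true or both false
  return aTrue ? a : b;               // the unconverted operand that was true
}
```

Unlike && and ||, this operation cannot short-circuit, because the result always depends on both operands.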

LogicalOrExpression 
   LogicalXorExpression
|  LogicalOrExpression || LogicalXorExpression

Conditional Operator

ConditionalExpression 
   LogicalOrExpression
|  LogicalOrExpression ? AssignmentExpression : AssignmentExpression
NonAssignmentExpression 
   LogicalOrExpression
|  LogicalOrExpression ? NonAssignmentExpression : NonAssignmentExpression

Assignment Operators

AssignmentExpression 
   ConditionalExpression
|  PostfixExpression = AssignmentExpression
CompoundAssignment 
   *=
|  /=
|  %=
|  +=
|  -=
|  <<=
|  >>=
|  >>>=
|  &=
|  ^=
|  |=
LogicalAssignment 
   &&=
|  ^^=
|  ||=

Comma Expressions

ListExpression 
   AssignmentExpression
|  ListExpression , AssignmentExpression

Type Expressions

TypeExpression  NonAssignmentExpression

Compile-Time Constant Expressions

A compile-time constant expression is an expression that either produces an error or evaluates to a value that can be determined at compile time.

The reason that a compile-time constant expression is not guaranteed to always evaluate successfully at run time is that global name lookup cannot be guaranteed to succeed. It is possible for a program to import a package P that defines a global constant P::A that can be accessed as A and then dynamically define another top-level variable Q::A that collides with A. It does not appear to be practical to restrict compile-time constant expressions to only qualified names to eliminate the possibility of such collisions.

A compile-time expression can consist of the following:

  • null, numeric, boolean, and string constants.
  • Uses of the operators + (unary and binary), - (unary and binary), ~, !, *, /, %, <<, >>, >>>, <, >, <=, >=, is, as, in, instanceof, ==, !=, ===, !==, &, ^, |, &&, ^^, ||, ?:, and the comma operator (,), as long as they are used only on numbers, booleans, strings, null, or undefined.
  • References to compile-time constants, subject to the restrictions below. The references may be qualified, but the qualifiers themselves have to be compile-time constant expressions.
  • Lookup of properties of compile-time expressions.
  • Calls to pure functions using compile-time expressions as arguments.

A pure function cannot have any read or write side effects or create any objects. A pure function’s result depends only on its arguments. An ECMAScript host embedding may define some pure functions. Currently there is no way for a script to define any such functions, but a future language extension may permit that.

A reference R to a definition D of a compile-time constant is allowed inside a compile-time constant expression as long as the conditions below are met. If D was imported from another package, then the location of D is considered to be the location of the import directive. If D is visible to R by virtue of an intervening use directive U, then the conditions below have to be satisfied both with respect to D and R and with respect to U and R.

  • D is visible to R.
  • There does not exist any scope between D’s scope and R’s scope that contains any declarations that can shadow D.
  • D was not hidden or made inaccessible by a conflict arising from another use directive between D and R.

Some compile-time constant expressions only allow references to definitions that are textually prior to the point of the reference. Other compile-time constant expressions allow forward references to later compile-time constant definitions.

Restrictions

ECMAScript 4 imposes the following restrictions:

  • TypeExpressions must be compile-time constant expressions that evaluate to types. Except for the TypeExpression specifying the superclass of a class, these expressions can contain forward references.
  • Attributes must be compile-time constant expressions that evaluate to attributes. These expressions cannot contain forward references when used as attributes of statements or directives.
  • const definitions defining constants that are used in compile-time constant expressions must have initializers that are themselves compile-time constant expressions. These initializers cannot contain forward references.
  • Default parameter values must be compile-time constant expressions. These expressions can contain forward references.
  • Initializers for instance members must be compile-time constant expressions. These expressions can contain forward references.
  • Each live import, class, and namespace directive must dominate the end of the program or package. This restriction limits these statements to the top level of the program, a top-level block, or a top-level conditional whose condition is known at compile time.
  • Each live declaration of a class member must dominate the end of the class definition.

A statement A dominates statement B if any of the following conditions are met:

  • A and B are the same statement.
  • A and B are in the same block, with A before B and no case or default labels between them.
  • Statement B is enclosed inside statement C and A dominates C.
  • Statement A is enclosed inside a block C, C is not prefixed by an attribute that evaluates to false, and C dominates B.

Note that the above definition is conservative. If statement A dominates statement B, then it is guaranteed that, if B is executed then A must have been executed earlier; however, there may be some other statements A' that also are guaranteed to have been executed before B but which do not dominate B by the above definition.
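The four conditions can be modeled on a simple statement tree. The sketch below is written in present-day JavaScript and rests on several assumptions: the node representation (kind is "block", "stmt", or "label"), the helper names, and one reading of the last condition, namely that it applies only when B lies outside block C. Without that reading, every statement in a block would dominate every other statement in the same block.

```javascript
// Build a tree node; kind is "block", "stmt", or "label" (a case/default label).
function node(kind, children = [], attrFalse = false) {
  const n = { kind, children, attrFalse, parent: null };
  for (const child of children) child.parent = n;
  return n;
}

// True when `inner` is `outer` itself or is nested anywhere inside it.
function contains(outer, inner) {
  for (let n = inner; n; n = n.parent) if (n === outer) return true;
  return false;
}

// Second condition: same block, a before b, no case/default label in between.
function precedesInBlock(a, b) {
  const block = a.parent;
  if (!block || block !== b.parent || block.kind !== "block") return false;
  const i = block.children.indexOf(a);
  const j = block.children.indexOf(b);
  return i < j && !block.children.slice(i + 1, j).some(s => s.kind === "label");
}

function dominates(a, b) {
  if (a === b) return true;                             // same statement
  if (precedesInBlock(a, b)) return true;               // earlier in the same block
  if (b.parent && dominates(a, b.parent)) return true;  // b enclosed in a dominated statement
  const c = a.parent;                                   // a enclosed in a block that dominates b
  return Boolean(c) && c.kind === "block" && !c.attrFalse &&
         !contains(c, b) && dominates(c, b);
}
```

Representing the end of the program as a final marker statement in the top-level block lets the same function check whether a directive dominates the end of the program.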

A statement A is dead if any of the following conditions are met:

  • A is prefixed by an attribute that evaluates to false.
  • There exists a break, continue, return, or throw statement B such that statements A and B are in the same block with B before A and no case or default labels between them.
  • A is enclosed inside statement B and B is dead.

Note that the above definition is conservative. If a statement is dead, then it is guaranteed that it cannot be executed; however, there may be statements that cannot be executed that are not dead by the above definition.

A statement is live if it is not dead.
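The definition of a dead statement can be sketched in the same style. The tree representation, the stmt helper, and the kind names below are assumptions made for this present-day JavaScript illustration; they are not part of the proposal.

```javascript
const JUMP_KINDS = new Set(["break", "continue", "return", "throw"]);

// Build a tree node; kind is "block", "label" (a case/default label),
// one of the jump kinds above, or "stmt" for any other statement.
function stmt(kind, children = [], attrFalse = false) {
  const n = { kind, children, attrFalse, parent: null };
  for (const child of children) child.parent = n;
  return n;
}

function isDead(a) {
  if (a.attrFalse) return true;                 // prefixed by a false attribute
  const p = a.parent;
  if (!p) return false;
  if (p.kind === "block") {
    // Look backwards for a jump statement, stopping at the nearest label.
    for (let k = p.children.indexOf(a) - 1; k >= 0; k--) {
      const earlier = p.children[k];
      if (earlier.kind === "label") break;
      if (JUMP_KINDS.has(earlier.kind)) return true;
    }
  }
  return isDead(p);                             // enclosed in a dead statement
}

function isLive(a) {
  return !isDead(a);
}
```

As with the definition itself, this is conservative: a statement that can never run for other reasons, such as a loop body after an infinite loop, is still reported as live.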


Core Language
Statements

Wednesday, June 4, 2003

Statements

Most of the behavior of statements is the same as in ECMAScript 3. Differences are highlighted below.

  ω ∈ {abbrev, noShortIf, full}
Statement 
   ExpressionStatement Semicolon
|  SuperStatement Semicolon
|  Block
|  LabeledStatement
|  IfStatement
|  DoStatement Semicolon
|  WhileStatement
|  ForStatement
|  WithStatement
|  ContinueStatement Semicolon
|  BreakStatement Semicolon
|  ReturnStatement Semicolon
|  ThrowStatement Semicolon
Substatement 
|  Statement
|  SimpleVariableDefinition Semicolon
|  Attributes [no line break] { Substatements }
Substatements 
   «empty»
SubstatementsPrefix 
   «empty»
Semicolonabbrev 
   ;
|  VirtualSemicolon
|  «empty»
SemicolonnoShortIf 
   ;
|  VirtualSemicolon
|  «empty»
Semicolonfull 
   ;
|  VirtualSemicolon

A Substatement is a statement directly contained by one of the compound statements label:, if, switch, while, do while, for, or with (but not a block). A substatement cannot be a directive except that, in non-strict mode only, it can be a var definition without attributes or types.

A substatement can also consist of one or more attributes applied to a group of substatements enclosed in braces. The attributes must evaluate to either true or false. The braces do not form a scope in this case.

The Semicolon productions allow both grammatical and line-break semicolon insertion.

Empty Statement

EmptyStatement  ;

Expression Statement

ExpressionStatement  [lookahead∉{function, {}] ListExpressionallowIn

Super Statement

SuperStatement  super Arguments

The super statement calls the superclass’s constructor. It can only be used inside a class’s constructor.

Block Statement

Block  { Directives }

A block groups statements and forms a scope.

Labeled Statements

LabeledStatement  Identifier : Substatement

If Statement

IfStatementabbrev 
|  if ParenListExpression SubstatementnoShortIf else Substatementabbrev
IfStatementfull 
|  if ParenListExpression SubstatementnoShortIf else Substatementfull
IfStatementnoShortIf  if ParenListExpression SubstatementnoShortIf else SubstatementnoShortIf

The semicolon is optional before the else.

Switch Statement

SwitchStatement  switch ParenListExpression { CaseElements }
CaseElements 
   «empty»
CaseElementsPrefix 
   «empty»
CaseElement 
   Directive
CaseLabel 
   case ListExpressionallowIn :
|  default :

Do-While Statement

DoStatement  do Substatementabbrev while ParenListExpression

The semicolon is optional before the closing while.

While Statement

WhileStatement  while ParenListExpression Substatement

For Statements

ForStatement 
   for ( ForInitializer ; OptionalExpression ; OptionalExpression ) Substatement
|  for ( ForInBinding in ListExpressionallowIn ) Substatement
ForInitializer 
   «empty»
|  Attributes [no line break] VariableDefinitionnoIn
ForInBinding 
|  Attributes [no line break] VariableDefinitionKind VariableBindingnoIn
OptionalExpression 
   ListExpressionallowIn
|  «empty»

A for statement forms a scope. Any definitions in it (including the ForInitializer and ForInBinding) are visible inside the for statement and its substatement, but not outside the for statement. However, a var definition inside a for statement may be hoisted to the nearest enclosing regional scope.

With Statement

WithStatement  with ParenListExpression Substatement

Continue and Break Statements

ContinueStatement 
   continue
|  continue [no line break] Identifier
BreakStatement 
   break
|  break [no line break] Identifier

Return Statement

ReturnStatement 
   return
|  return [no line break] ListExpressionallowIn

A return statement can only be used inside a function or constructor. The return statement cannot have an expression if used inside a constructor or a setter.

Throw Statement

ThrowStatement  throw [no line break] ListExpressionallowIn

Try Statement

TryStatement 
   try Block CatchClauses
|  try Block CatchClausesOpt finally Block
CatchClausesOpt 
   «empty»
CatchClauses 
CatchClause  catch ( Parameter ) Block

Each CatchClause forms a scope. The Parameter, if any, is defined as a local variable visible only within the CatchClause.

The Blocks following try and finally are also scopes like other Block statements.

Directives

Directive 
|  Statement
|  AnnotatableDirective
|  Attributes [no line break] AnnotatableDirective
|  Attributes [no line break] { Directives }
|  Pragma Semicolon
AnnotatableDirective 
   VariableDefinitionallowIn Semicolon
|  NamespaceDefinition Semicolon
|  ImportDirective Semicolon
|  UseDirective Semicolon
Directives 
   «empty»
DirectivesPrefix 
   «empty»

Attributes can be applied to a group of directives by following them with a {, the directives, and a }. The attributes apply to all of the enclosed directives. The attribute true is ignored. The attribute false causes all of the enclosed directives to be omitted. When used this way, the braces do not form a block or a scope.

Annotated groups are useful to define several items without having to repeat attributes for each one. For example,

class foo {
  var z:Integer;
  public var a;
  private var b;
  private function f() {}
  private function g(x:Integer):Boolean {}
}

is equivalent to:

class foo {
  var z:Integer;
  public var a;
  private {
    var b;
    function f() {}
    function g(x:Integer):Boolean {}
  }
}

Programs

Program 

Core Language
Definitions

Thursday, May 22, 2003

Introduction

Definitions are directives that introduce new constants, variables, functions, classes, namespaces, and packages. All definitions except those of packages can be preceded by zero or more attributes. In non-strict mode there must not be any line breaks between the attributes or after the last attribute.

Attributes

Attributes 
   Attribute
AttributeCombination  Attribute [no line break] Attributes
Attribute 
|  true
|  false

An attribute is an expression (usually just an identifier) that modifies a definition’s meaning. Attributes can specify a definition’s scope, namespace, semantics, and other hints. An ECMAScript program may also define and subsequently use its own attributes. An attribute can be a qualified identifier (as long as it does not start with a left parenthesis), a dotted expression, or a function call expression, but it must be a compile-time constant.

The table below summarizes the predefined attributes.

Category | Attributes | Behavior
Namespace | private, internal, public | Makes the definition visible only in the enclosing class’s private namespace (private), the enclosing package’s private namespace (internal), or anywhere (public).
Visibility Modifier | enumerable | This definition can be seen using a for-in statement.
Visibility Modifier | explicit | This top-level definition is not shared via an import directive.
Class Modifier | final | This class cannot be subclassed. Can be used only on classes.
Class Modifier | dynamic | Direct instances of this class can contain dynamic properties. Can be used only on classes.
Member Modifier | static, virtual, final | The definition creates a global member (static) or instance member (virtual or final) of the enclosing class. If defining an instance member, the definition can (virtual) or cannot (final) be overridden in subclasses. Can be used only on class members.
Member Modifier | override, override(true), override(false), override(undefined) | Assertion that the definition overrides (override or override(true)), may override (override(undefined)), or does not override (override(false)) a member of a superclass. Can be used only on class members. Controls errors only.
Conditional | true, false | The definition or directive is (true) or is not (false) processed.
Miscellaneous | prototype | Allows a function to access this and be used as a prototype-based constructor.
Miscellaneous | unused | Assertion that the definition is not used.

Multiple conflicting attributes cannot be used in the same definition, so virtual final private is an error. The attributes true and false do not conflict. Specifying an attribute more than once has the same effect as specifying it once.

Namespace Attributes

Namespace attributes control the definition’s visibility. User-defined attributes provide a finer grain of visibility control.

Every package P has a predefined, anonymous namespace PackageInternalP. That namespace is attached to all definitions with the internal attribute in that package. Package P’s scope includes an implicit use namespace(PackageInternalP) definition around the package that grants access to these definitions from within the package only.

Every class C has a predefined, anonymous namespace ClassInternalC. That namespace is attached to all definitions with the private attribute in that class. Class C’s scope includes an implicit use namespace(ClassInternalC) definition around the class that grants access to these definitions from within that class only. private can only be used inside a class.

Namespace attributes, including user-defined namespaces, are additive; if several are given for a definition, then that definition is put into each of the designated namespaces. Thus, a single definition may define a name in two or more namespaces namespace1 and namespace2 by listing the namespaces as attributes: namespace1 namespace2 var x. Such multiple definitions are aliases of each other; there is only one storage location x.
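The aliasing described above, several qualified names sharing one storage location, can be pictured with a small model in present-day JavaScript. The frame representation (a Map keyed by "namespace::name" strings) and the helper name defineInNamespaces are assumptions made for this sketch only.

```javascript
// Hypothetical model: a frame maps qualified names to shared storage slots.
function defineInNamespaces(frame, namespaces, name, initialValue) {
  const slot = { value: initialValue };  // the single storage location
  for (const ns of namespaces) {
    frame.set(ns + "::" + name, slot);   // every qualified name points at the same slot
  }
  return slot;
}

// Models: namespace1 namespace2 var x
const frame = new Map();
defineInNamespaces(frame, ["namespace1", "namespace2"], "x", 0);

// A write through one alias is visible through the other.
frame.get("namespace1::x").value = 42;
console.log(frame.get("namespace2::x").value);  // prints 42
```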

A definition of a name id is always put into the namespaces explicitly specified in the definition’s attributes. In addition, the definition may be placed in additional namespaces according to the rules below:

  • If the definition explicitly specifies one or more namespaces N:
    • If the definition defines an instance member in a class C and one of C’s ancestors contains an instance member M with name id and at least one of the namespaces in N, then the definition will override M and be placed in all of M’s namespaces. It is an error if N contains another namespace that is not in M’s namespaces.
    • Otherwise, the definition is put into each of the namespaces in N; however, to avoid confusion the override(false) or override(undefined) attribute is required on the definition if an inherited instance member with the name id is visible at the point of the definition. This restriction prevents one from accidentally defining a private instance member with the same name as a public instance member in a superclass (by the rules of member lookup, such a private member would be shadowed by the existing public member for all unqualified accesses).
  • If the definition does not explicitly specify any namespaces:
    • If the definition is in a class C and one of C’s ancestors contains a visible member M with name id, then the definition will override M and be placed in all of M’s namespaces. It is an error to attempt to override two different members with one definition.
    • Otherwise, the definition is put into the public namespace.

Visibility Modifier Attributes

Visibility modifier attributes control the definition’s visibility in several special cases.

enumerable

An enumerable definition can be seen by the for-in iteration statement. A non-enumerable definition cannot be seen by such a statement. enumerable only applies to public definitions.

The default for dynamic properties and class properties is enumerable. The default for instance properties is non-enumerable. There is no way to make a user-defined dynamic or class property non-enumerable.

explicit

explicit is used to add definitions to a package P without having them conflict with definitions in other packages that import package P. explicit prevents the definition from being accessed as a top-level variable when a package is imported. The definition can still be accessed as a property of an object. For example,

package My.P1 {
  const c1 = 5;
  explicit const c2 = 7;
}

package My.P2 {
  import P = My.P1;  // Imports My.P1 without qualification
  c1;                // OK; evaluates to 5
  c2;                // Error: c2 not defined because explicit variables are not shared
  P.c2;              // OK: explicit properties are visible
}

Class Modifier Attributes

Class modifier attributes apply to the definition of a class C itself. They may only be used on definitions of classes.

final

If a class C is defined using the final attribute, then any attempt to define a subclass of C signals an error.

Note that final is also a member modifier — when used on a class member, final makes that member nonoverridable. If final is used on a class member that is itself a class, then it acts like a class modifier instead of a member modifier — it prevents the inner class from being subclassed.

dynamic

Direct instances of a dynamic class C can contain dynamic properties. Other instances cannot contain dynamic properties. A class is dynamic if it has the dynamic attribute or it has a dynamic ancestor other than Object.

Member Modifier Attributes

Member modifier attributes modify a class member definition’s semantics with respect to a class hierarchy. They may only be used on a definition of a member M of a class C. They cannot be used on definitions that, for example, create local variables inside a function.

static, virtual, and final

The static attribute makes M be a global member of C.

The virtual and final attributes make M be an instance member of C.

The final attribute prevents subclasses from defining their own members with the name M (unless they can’t see this M, in which case they can define an independent M). virtual allows subclasses to override M.

The default setting for the definition of a member M of a class named C is:

Default Attribute | Kind of Member M
none (the function is treated specially as a class constructor) | function C
virtual | function F where the name F differs from C
final | var and const definitions
none (static attribute must be specified explicitly) | class and namespace definitions

These attributes may not be used on an export definition, since export reuses the original member’s setting.

Note that final is also a class modifier: when used on a class member M that is itself a class, final prevents M from being subclassed rather than making M a nonoverridable instance member of C.

override

The override attribute reports errors; it has no other effect on the behavior of the program. The override attribute can only be used on definitions in a class and describes the programmer’s intent to either override or not override a member from a superclass. If the actual behavior, as defined by the namespace defaulting and overriding rules, differs, then an error is signaled.

The table below describes the behavior when a definition of a member M with name id is placed in a class C:

Override attribute given:
Situation | None | override or override(true) | override(undefined) | override(false)
M overrides a member in some superclass according to the namespace defaulting and overriding rules | Error | OK | OK | Error
M does not override anything but there exists an ancestor of C with a member with name id visible at the point of definition of M | Error | Error | OK | OK
M does not override anything and no ancestor of C has a member with name id visible at the point of definition of M | OK | Error | OK | OK

The middle case arises for example when an ancestor of a class C defines a public member named X and class C attempts to define a private member named X.
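The override table can be transcribed directly into a lookup. The sketch below uses present-day JavaScript; the situation keys (overrides, nameVisibleInAncestor, nameNotVisible) and the function name checkOverride are hypothetical labels for the three rows, while the outcomes are copied from the table.

```javascript
// Rows: what the definition actually does. Columns: the override attribute given.
const OVERRIDE_OUTCOMES = {
  overrides: {
    "none": "Error", "override": "OK", "override(undefined)": "OK", "override(false)": "Error",
  },
  nameVisibleInAncestor: {
    "none": "Error", "override": "Error", "override(undefined)": "OK", "override(false)": "OK",
  },
  nameNotVisible: {
    "none": "OK", "override": "Error", "override(undefined)": "OK", "override(false)": "OK",
  },
};

function checkOverride(situation, attribute = "none") {
  // override(true) means the same as plain override.
  const column = attribute === "override(true)" ? "override" : attribute;
  const row = OVERRIDE_OUTCOMES[situation];
  if (!row || !(column in row)) {
    throw new Error("unknown situation or attribute: " + situation + ", " + attribute);
  }
  return row[column];
}
```

The lookup makes one property of the table easy to see: override(undefined) never produces an error, so it is the attribute to use when a definition should be accepted whether or not it overrides something.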

Conditional Attributes

An attribute whose value is true causes the definition or directive to be evaluated normally. An attribute whose value is false causes the definition or directive to be skipped; the remaining attributes and the body of the definition or directive are not evaluated. These are useful for turning definitions on and off based on configuration settings, such as:

const debug = true;
const nondebug = !debug;

debug var nCalls = 0;
debug function checkConsistency() {...}

Miscellaneous Attributes

prototype

The prototype attribute can only be used on a function. A function with this attribute treats this in the same manner as ECMAScript 3 and defines its own prototype-based class as in ECMAScript 3. By default, the prototype attribute is set on any unchecked function. It can be set explicitly on other functions as long as they are not getters, setters, or constructors.

unused

The unused attribute is a hint that the definition is not referenced anywhere. Referencing the definition will generate an error.

User-Defined Attributes

A user-defined attribute may be defined using a const definition or other definitions that define constants. All attributes must be compile-time constants. For example:

const ipriv = internal static;
explicit namespace Version1;
explicit namespace Version2;
internal const Version1and2 = Version1 Version2;

class C {
  ipriv var x;                          // Same as internal static var x;
  Version1and2 var simple;              // Same as Version1 Version2 var simple;
  Version2 var complicated;
  ipriv const a:Array = new Array(10);

  private var i;
  for (i = 0; i != 10; i++) a[i] = i;
}

Definition Scope

A definition applies to the innermost enclosing scope except when it is hoisted. If that scope is a class, the definition appears as a member of that class. If that scope is a package, the definition appears as a member of that package.

Scope Hoisting

For compatibility with ECMAScript 3, in some cases a definition’s scope is hoisted to the innermost regional scope R instead of the innermost scope S. This happens only when all of the conditions below are met:

  • The definition is a var definition.
  • The definition does not specify a type.
  • The definition has no attributes other than true.
  • The regional scope R is not a class.
  • Strict mode is not in effect.

When a definition of n is hoisted, the effect is as though n were declared (but not initialized) at the top of the regional scope R.

Definitions not meeting the above criteria are not hoisted. However, an inner non-hoisted definition of name n in scope S within regional scope R prevents n from being referenced or defined in any scope within R but outside S; see definition conflicts.
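The five hoisting conditions amount to a single predicate. The sketch below is present-day JavaScript with an assumed representation: a definition is an object with kind, type (undefined when no type is given), and attributes (the already evaluated attribute values). The function name isHoisted and these field names are hypothetical.

```javascript
// Returns true when a definition's scope is hoisted to the regional scope.
function isHoisted(def, regionalScopeKind, strictMode) {
  return def.kind === "var" &&                          // a var definition
    def.type === undefined &&                           // with no type
    def.attributes.every(value => value === true) &&    // no attributes other than true
    regionalScopeKind !== "class" &&                    // regional scope is not a class
    !strictMode;                                        // strict mode is not in effect
}
```

For example, a plain var i inside a for statement in a non-strict function is hoisted under this model, whereas var i:Integer or const i is not.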

Extent

A definition extends an activation frame with one or more bindings of qualified names to values. The bindings are generally visible from the activation frame’s scope. However, a definition may be invisible or partially invisible inside its scope either because it is shadowed by a more local definition or it uses a namespace that is not used. The name lookup rules specify the detailed behavior of accessing activation frame bindings.

Each definition or declaration D of a name n applies to some scope S using the rules above. Any of S’s activation frames will contain a binding for n as soon as S is entered. That binding starts in the following state:

  • In non-strict mode, a var n definition D without a type or attributes is initialized to the value undefined upon entry into S.
  • In non-strict mode, a function n definition D without types or attributes is initialized to its closure upon entry into S instead of at the time D is executed.
  • A class definition D of a name n in scope S binds n to an opaque value V of type Type upon entry into S. V may be used as a type to declare variables, but all other operations are prohibited. V becomes the actual class object at the time the definition is executed, after which point instances of V may be created and subclasses may be derived from V.
  • All other definitions produce bindings in the uninitialized state upon entry into the scope S. The bindings are initialized at the time the definition is executed.

Accessing an activation frame binding in the uninitialized state is an error. If this happens, implementations are encouraged to throw an exception, but may return a value V if they can prove that the definition would assign the value V to the binding.
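Modern JavaScript's let and const bindings offer a close analogue of the uninitialized state. The sketch below is present-day JavaScript, not the proposal's syntax, and readTooEarly is a hypothetical name. Where the proposal only encourages an exception, modern engines always throw one.

```javascript
// Modern JavaScript analogue (not ES4 proposal syntax).
function readTooEarly() {
  let outcome;
  try {
    outcome = value;        // the binding exists in this scope but is still uninitialized
  } catch (e) {
    outcome = e.name;       // modern engines throw a ReferenceError here
  }
  const value = 1;          // the definition is executed only now
  return [outcome, value];
}

const early = readTooEarly();  // ["ReferenceError", 1]
```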

Definition Conflicts

In general, it is not legal to rebind the same name in the same namespace within an activation frame A. There are two exceptions:

  • Multiple var definitions (which may be hoisted) are allowed in a regional scope as long as all such definitions have no type and no attributes and strict mode is not in effect.
  • A getter and a setter with the same name may be defined independently.

In addition, if a name n is defined in a scope S inside regional scope R, then it is not permitted to access a definition of n made outside of R from anywhere inside R. Also, two nested scopes S1 and S2 located inside the same regional scope R cannot both define n (S1, S2, and R may be the same scope). In either of these situations, n may be hoisted; if hoisting is not allowed, an error occurs. For example,

const b:Integer = 1;

function f(c:Boolean):Integer {
  const a = b;  // Error: b is defined inside the local scope below, which prevents accesses to global b
                // from anywhere inside the regional scope
  if (c) {
    const b:Integer = a + 10;  // OK to hide the global b from here.
    return b;
  }
  return a;
}

function g(c:Boolean):Integer {
  const b = 3;  // OK to hide the global b from here.
  if (c) {
    const b:Integer = 10;  // Error: can’t redefine b inside the same regional scope.
    return b;
  }
  return b;
}

function h(c:Boolean):Integer {
  if (c) {
    const b:Integer = 10;  // OK to hide the global b from here.
    return b;
  } else {
    const b:Integer = 42;  // OK: Two independent local definitions of b.
    return b;
  }
}

To help catch accidental redefinitions, binding a qualified name q::n in activation frame A when there is already a binding r::n in A causes an error if both namespaces q and r are used at the point of the definition of q::n and the bindings are not aliases of each other. This prevents the same name from being used for both public and private variables in the same class. Two bindings sharing the same name but with different namespaces may still be introduced into an activation frame, but only by code that does not use one or both of the namespaces.

Examples

In the example below the comments indicate the scope and namespace of each definition:

var a0;                  // Public global variable
internal const a1 = true;// Package-visible global variable
private var a2;          // Error: private can only be used inside a class
public var a3 = b1;      // Public global variable

if (a1) {
  var b0;                // Local to this block
  var b1;                // Hoisted to the global level because of the reference to b1 in the definition of a3
}

if (a1) {
  var b0;                // Local to this block
}

public function F() {    // Public global function
  var c0;                // Local to this function
  internal var c1;       // Local to this function  (may generate a style warning)
  public var c2;         // Local to this function  (may generate a style warning)
}

class C {                // Public global class
  var e0;                // Public class instance variable
  private var e1;        // Class-visible class instance variable
  internal var e2;       // Package-visible class instance variable
  public var e3;         // Public class instance variable
  static var e4;         // Public class-global variable
  private static var e5; // Class-visible class-global variable
  internal static var e6;// Package-visible class-global variable
  public static var e7;  // Public class-global variable

  if (a1) {
    var f0;              // Local to this block
    private var f1;      // Local to this block  (may generate a style warning)
  }
  public function I() {} // Public class method
}

Discussion

Should we have a protected Attribute? It has been omitted for now to keep the language simple, but there does not appear to be any fundamental reason why it could not be supported. If we do support it, it might be better to choose the C++ protected concept (visible only in class and subclasses); the Java protected concept (visible in class, subclasses, and the original class’s package) could be represented as internal protected.


Variables

Wednesday, June 4, 2003

Variable Definitions

VariableDefinition ⇒ VariableDefinitionKind VariableBindingList
VariableDefinitionKind ⇒
   var
|  const
VariableBindingList ⇒
   VariableBinding
|  VariableBindingList , VariableBinding
VariableBinding ⇒ TypedIdentifier VariableInitialisation
VariableInitialisation ⇒
   «empty»
|  = VariableInitializer
VariableInitializer ⇒
   AssignmentExpression
TypedIdentifier ⇒
   Identifier
|  Identifier : TypeExpression

A SimpleVariableDefinition represents the subset of VariableDefinition expansions that may be used when the variable definition is used as a Substatement instead of a Directive in non-strict mode. In strict mode variable definitions may not be used as substatements.

SimpleVariableDefinition ⇒ var UntypedVariableBindingList
UntypedVariableBindingList ⇒
   UntypedVariableBinding
|  UntypedVariableBindingList , UntypedVariableBinding
UntypedVariableBinding ⇒ Identifier VariableInitialisation^allowIn

A variable defined with var can be modified, while one defined with const is read-only after its value is set. Identifier is the name of the variable and TypeExpression is its type. Identifier can be any non-reserved identifier. TypeExpression must be a compile-time expression that evaluates to a type t other than Never. TypeExpression may contain forward references to compile-time constants defined later in the program.

If provided, AssignmentExpression gives the variable’s initial value v. If AssignmentExpression is not provided in a var definition, then undefined is assumed. If the variable being defined is not an instance member of a class, then the AssignmentExpression is evaluated at the time the variable definition is evaluated. The resulting value is then implicitly coerced to the variable’s type t and stored in the variable. If the variable is defined using var, any values subsequently assigned to the variable are also implicitly coerced to type t at the time of each such assignment. If the variable is an instance member of a class, then the AssignmentExpression is evaluated each time an instance of the class is constructed.

Reading or writing a variable before its definition is evaluated signals an error except when the variable definition has no attributes and no type and strict mode is not in effect; in that case, the variable may be read or written prior to its definition being evaluated and its initial value is undefined.

Multiple variables separated by commas can be defined in the same VariableDefinition. The values of earlier variables are available in the AssignmentExpressions of later variables.

If omitted, TypeExpression defaults to type Object. Thus, the definition

var a, b=3, c:Integer=7, d, e:Integer, f:Number=c;

is equivalent to:

var a:Object = undefined;
var b:Object = 3;
var c:Integer = 7;
var d:Object = undefined;
var e:Integer = undefined;   // Implicitly coerced to NaN
var f:Number = c;            // 7

The compiler might issue a warning for a VariableDefinition that contains an untyped variable prior to a typed variable to remind programmers that the type of d is Object rather than Integer in var d, e:Integer.

Constant Definitions

const means that assignments to Identifier are not allowed after the constant has been set. Once defined, a constant cannot be redefined in the same scope, even if the redefinition would be to the same value. If the VariableBinding in a const declaration does not contain an initializer, then the constant may be written once after it is defined. Any attempt to read the constant prior to writing its value will result in an error. For example:

function f(x) {return x+c}

f(3);           // Error: c’s value is not defined
const c = 5;
f(3);           // Returns 8
const c = 5;    // Error: redefining c

Just like any other definition, a constant may be rebound after leaving its scope. For example, the following is legal; j is local to the block, so a new j binding is created each time through the loop:

var k = 0;
for (var i = 0; i < 10; i++) {
  const j = i;
  k += j;
}
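The loop above carries over to present-day JavaScript almost unchanged. The sketch below adds a hypothetical readers array of closures to make it visible that each iteration gets its own binding of j.

```javascript
// Modern JavaScript; `readers` and `captured` are names added for this illustration.
let k = 0;
const readers = [];
for (let i = 0; i < 10; i++) {
  const j = i;              // a fresh binding of j on every iteration
  readers.push(() => j);    // each closure captures its own j
  k += j;
}
const captured = readers.map((read) => read());
// k is 45 and captured is [0, 1, 2, ..., 9]
```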

A const definition defines a compile-time constant if it has an initializer and that initializer is a compile-time constant expression which may contain forward references to other compile-time constants.

In order for the compiler to be able to distinguish const definitions that define run-time constants from ones that define compile-time constants, it must be able to resolve each variable referenced in a const initializer to a scope. Because of this, a const initializer may only refer to variables declared at compile time; referring to a dynamically created variable not declared at compile time results in a compile-time error. It is also an error to attempt to create a dynamic variable that changes the resolution of a variable in a const initializer. For example:

const a = 5;      // OK: a is a compile-time constant with the value 5
const b = a + c;  // OK: b is a compile-time constant with the value 7
const c = 2;      // OK: c is a compile-time constant with the value 2
const d = e;      // OK: d is a run-time constant that starts as undefined
var e = 2.718281828459045;
f = "Run time";
const g = f;      // Error: f is not declared at compile time
const h = uint;   // OK: h is a compile time constant that holds the type uint
this.ulong = 15;  // OK: creates the global property ulong that shadows the system ulong type
this.uint = 15;   // Error: can’t create the global property uint because it would
                  //   change the resolution of uint in the definition of h

Inside a class, const preceded by final defines an instance member. Preceding it with virtual would also define an instance member, but is only useful if one wants subclasses to be able to override the constant. Precede it with static to define a global member. The default is final.

If const is declaring an instance member m of a class, then the initializer is evaluated each time an instance of the class is constructed. If absent, then the member’s property may be written exactly once, cannot be re-written after it has been written, and must be written before it can be read. For example:

class C {
  static const red = 0xFF0000;      // Defines static constant C.red
  static const green = 0x00FF00;    // Defines static constant C.green
  static const blue = 0x0000FF;     // Defines static constant C.blue
  static const infrared;            // Defines uninitialized static constant C.infrared

  const myColor;                    // Defines instance constant C::myColor with value set by the constructor
  final const yourColor;            // Defines instance constant C::yourColor with value set by the constructor
  const ourColor = 0;               // Defines instance constant C::ourColor that is always zero (not very useful)
  virtual const theirColor = 0;     // Defines instance constant C::theirColor that can be overridden by subclasses

  function C(x:int) {
    myColor = x;                    // Sets this instance’s myColor
    ourColor = x;                   // Error: ourColor is already set to 0
    myColor = x;                    // Error: myColor can be set only once
    var a = [x, this];
    a[1].yourColor = x;             // Sets this instance’s yourColor
  }
}
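The write-once behaviour of an uninitialized instance constant can be approximated in present-day JavaScript with an accessor property. This is only a sketch: defineWriteOnce and Paint are hypothetical names, and the error types are arbitrary stand-ins for whatever errors an implementation of the proposal would raise.

```javascript
// Modern JavaScript sketch; defineWriteOnce is a hypothetical helper, not part of any API.
function defineWriteOnce(target, name) {
  let written = false;
  let hidden;
  Object.defineProperty(target, name, {
    get() {
      if (!written) throw new Error(name + " was read before it was written");
      return hidden;
    },
    set(value) {
      if (written) throw new Error(name + " can be set only once");
      hidden = value;
      written = true;
    },
  });
}

class Paint {
  constructor(color) {
    defineWriteOnce(this, "myColor");
    this.myColor = color;   // first write succeeds
  }
}

const paint = new Paint(0xff0000);
let secondWrite = "allowed";
try {
  paint.myColor = 0x00ff00; // second write is rejected
} catch (e) {
  secondWrite = "rejected";
}
```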

Getters and Setters

A definition var x:t = v internally creates a hidden variable and defines a getter and a setter to access that variable:

  • Evaluate t, which should evaluate to a type.
  • Create an anonymous variable α.
  • Implicitly coerce undefined to type t (such a coercion must exist for every type) and assign the result to α.
  • Define a getter function get x():t {return α}.
  • Define a setter function set x(a:t):Void {α = a}.
  • Evaluate v, implicitly coerce it to type t, and assign the result to α.

A definition const x:t = v internally creates a hidden variable, defines a getter to access that variable, and defines a setter that rejects every write:

  • Evaluate t, which should evaluate to a type.
  • Create an anonymous variable α.
  • Define a getter function get x():t {return α}.
  • Define a setter function set x(a:t):Never {throw ConstWriteError}.
  • Evaluate v, implicitly coerce it to type t, and assign the result to α.
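The two expansions above can be modelled in present-day JavaScript with accessor properties. In this sketch defineVar, defineConst and the ConstWriteError class are hypothetical names, and the built-in Number function stands in for implicit coercion to a numeric type t (it maps undefined to NaN, matching the Integer example earlier). The sketch does not model a const without an initializer.

```javascript
// Modern JavaScript sketch; all helper names here are assumptions made for illustration.
class ConstWriteError extends Error {}

function defineVar(target, name, coerce, initialValue) {
  let hidden = coerce(undefined);            // hidden variable starts as undefined coerced to t
  Object.defineProperty(target, name, {
    get() { return hidden; },                // getter returns the hidden variable
    set(a) { hidden = coerce(a); },          // setter coerces every assigned value
  });
  target[name] = initialValue;               // evaluate v, coerce it, store it
}

function defineConst(target, name, coerce, initialValue) {
  const hidden = coerce(initialValue);       // evaluate v and coerce it once
  Object.defineProperty(target, name, {
    get() { return hidden; },
    set(a) { throw new ConstWriteError(name + " is read-only"); },
  });
}

const scope = {};
defineVar(scope, "x", Number, 7);
scope.x = 12;                                // goes through the setter
defineVar(scope, "e", Number, undefined);    // scope.e is NaN
defineConst(scope, "limit", Number, 3);
```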

This relationship between a variable and its getter and setter is normally transparent but can be exploited occasionally. For instance, a variable can be declared that is private for writing but public for reading:

private var name:String;
public export get name;
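A comparable effect is available in present-day JavaScript by pairing a private field with a public getter. The sketch below is an analogue only; Account and rename are made-up names, and the proposal's export syntax has no direct modern equivalent.

```javascript
// Modern JavaScript analogue; the class and method names are made up for the example.
class Account {
  #name;                               // writable only from inside the class
  constructor(name) { this.#name = name; }
  get name() { return this.#name; }    // readable from anywhere
  rename(newName) { this.#name = newName; }
}

const account = new Account("Ada");
account.rename("Grace");
const visibleName = account.name;      // "Grace"
```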

A subclass may override a variable’s getter or setter. To do this, the original variable has to be declared non-final because variables are final by default:

class C {
  virtual var x:Integer;
  var y:Integer;
}

class D extends C {
  override function set x(a:Integer):Void {y = a*2}
}

var c = new C;
c.x = 5;
c.x;       // Returns 5
c.y;       // Returns NaN (the default value for an Integer variable)
var d = new D;
d.x = 5;
d.x;       // Returns NaN
d.y;       // Returns 10

Functions

Monday, April 28, 2003

Syntax

FunctionDefinition ⇒ function FunctionName FunctionCommon
FunctionName ⇒
   Identifier
|  get [no line break] Identifier
|  set [no line break] Identifier
FunctionCommon ⇒ ( Parameters ) Result Block

Like other definitions, a function definition may be preceded by one or more attributes, which affect the function’s scope, namespace, and semantics. Every function (except a getter or a setter) is also a value and has type Function.

Unless a function f is defined with the prototype attribute (either explicitly or by default because f is unchecked), that function does not define a class, f’s name cannot be used in a new expression, and f cannot refer to this unless f is an instance method or constructor of a class.

A FunctionDefinition can specify a function, getter (if its name is preceded by get), or setter (if its name is preceded by set).

Parameters give the names and the types of the function’s parameters. Result gives the type of the function’s result. The Block contains the function body and is evaluated only when the function is called.

Parameter Declarations

A function may take zero or more parameters and an optional rest parameter. Optional parameters may follow but not precede required parameters (this condition is not in the grammar but is checked by the formal semantics).

Parameters ⇒
   «empty»
|  NonemptyParameters
NonemptyParameters ⇒
   ParameterInit
|  ParameterInit , NonemptyParameters
|  RestParameter

Individual parameters have the forms:

Parameter ⇒ ParameterAttributes TypedIdentifier^allowIn
ParameterAttributes ⇒
   «empty»
|  const
ParameterInit ⇒
   Parameter
|  Parameter = AssignmentExpression^allowIn
RestParameter ⇒
   ...
|  ... ParameterAttributes Identifier

The TypeExpression gives the parameter’s type and defaults to type Object. The TypeExpression must evaluate to a type other than Never.

If a Parameter is followed by a =, then that parameter is optional. If a function call does not provide an argument for an optional parameter, then that parameter is set to the value of its AssignmentExpression, implicitly coerced to the parameter’s type if necessary. The AssignmentExpression must be a compile-time constant.

If a Parameter is prefixed with const, then the parameter is declared using const instead of var. The effect is that the parameter’s value cannot be changed from within the function. Without the const, the function can change the parameter’s value, which, however, has no effect on the argument.

If a function call does not provide an argument for a required Parameter, then an error occurs unless the function is unchecked, in which case the parameter gets the value undefined, implicitly coerced to the parameter’s type if necessary.

The parameters’ Identifiers are local variables with types given by the corresponding TypeExpressions inside the function’s Block. Code in the Block may read and write these variables. Arguments are passed by value, so writes to these variables do not affect the passed arguments’ values in the caller.

Attempting to define a function with two different parameters with the same name is an error.
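Optional parameters and pass-by-value semantics survive in present-day JavaScript in a similar form. The sketch below is modern JavaScript, not the proposal's typed syntax, and label is a hypothetical function. One known difference: modern JavaScript allows any expression as a default and evaluates it at call time, while the proposal requires a compile-time constant.

```javascript
// Modern JavaScript analogue (default parameters instead of the proposal's typed syntax).
function label(text, prefix = "> ") {
  text = text.toUpperCase();   // rebinding the parameter does not touch the caller's variable
  return prefix + text;
}

const original = "note";
const withDefault = label(original);        // "> NOTE"
const withPrefix = label(original, "# ");   // "# NOTE"
// original is still "note"
```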

Rest Parameter

If the ... is present, the function accepts arguments not matched by any of the other listed parameters. If a parameter is given after the ..., then that parameter’s identifier is bound to an array of all remaining arguments. That identifier is declared as a local var or const using the type Array. The remaining arguments are stored as elements of the rest array with numeric indices starting from 0.

Each unchecked function also has a predefined const arguments local variable which holds an array (of type Array) of all arguments passed to this function.
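Present-day JavaScript adopted a very similar rest parameter, written with the ... directly in front of the identifier. The sketch below is modern JavaScript and describe is a made-up function name.

```javascript
// Modern JavaScript analogue of a rest parameter.
function describe(first, ...rest) {
  // rest is a real Array holding every argument after the first, indexed from 0
  return { first, restIsArray: Array.isArray(rest), rest };
}

const described = describe("a", "b", "c");  // described.rest is ["b", "c"]
const noExtras = describe("a");             // noExtras.rest is []
```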

Result Type

Result ⇒
   «empty»
|  : TypeExpression^allowIn

The function’s result type is TypeExpression, which defaults to type Object if not given. The TypeExpression must evaluate to a type.

If the function does not return a useful value, it’s good practice to set TypeExpression to Void to document this fact. If the function cannot return at all (it either always falls into an infinite loop or throws an exception), then it’s good practice to set TypeExpression to Never to document this fact; this also lets the compiler know that code after a call to this function is unreachable, which can help cut down on spurious warnings.

Evaluation Order

A function’s parameter and result TypeExpressions are evaluated at the time the function definition or declaration is executed. These types are then saved for use in argument and result coercions at the time the function is called.

The static and dynamic extent of a parameter includes all subsequent parameters’ and the result type’s TypeExpressions and AssignmentExpressions. However, the case where a subsequent parameter’s or the result type’s TypeExpression or AssignmentExpression references a prior parameter is reserved for a future language extension. For now, an implementation should raise an error in this case:

const t = Integer;
function choice(a:Boolean, t:Type, c:t, d:t):t {
  return a ? c : d;
}

This definition of choice should (for now) be an error and not:

function choice(a:Boolean, t:Type, c:Integer, d:Integer):Integer {
  return a ? c : d;
}

The intent is that a future language extension might make the first definition of choice legal and permit calls to it like choice(true,String,"Be","Not Be"), which would return "Be".

When a function is called, the following list indicates the order of evaluation of the various expressions in a FunctionDefinition. These steps are taken only after all of the argument names and values have been evaluated.

  1. If the function is unchecked, bind the arguments local variable to an array of all arguments and their names.
  2. Get the saved type t that was the result of evaluating the first parameter’s TypeExpression at the time the function was defined.
  3. If the first parameter is required and no argument has been supplied for it, then raise an error unless the function is unchecked, in which case let undefined be the first parameter’s value.
  4. If the first parameter is optional and there is an argument remaining, use the value of the argument. If there are no remaining arguments, then evaluate the first parameter’s AssignmentExpression and let it be the first parameter’s value.
  5. Implicitly coerce the argument (or default) value to type t and bind the parameter’s Identifier to the result.
  6. Repeat steps 2-5 for each additional parameter.
  7. If there is a RestParameter with an Identifier, bind that Identifier to an array of the remaining arguments using indices starting from 0.
  8. If there is no RestParameter and any arguments remain, raise an error unless the function is unchecked.
  9. Evaluate the body.
  10. Get the saved type r that was the result of evaluating the result TypeExpression at the time the function was defined.
  11. Implicitly coerce the result to type r and return it.
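The argument-binding part of this list (steps 1 through 8) can be sketched as a small model in present-day JavaScript. Everything here is an assumption made for illustration: bindArguments, the shape of the params records, and the use of TypeError are not taken from the proposal, and the model tracks argument values only, not argument names.

```javascript
// Hypothetical model of the binding steps; none of these names come from the proposal.
// params: [{ name, coerce, optional, defaultValue }]
// rest:   null when there is no RestParameter, else { name } (name may be null)
function bindArguments(params, rest, unchecked, args) {
  const bindings = {};
  if (unchecked) bindings.arguments = [...args];             // step 1
  let next = 0;
  for (const p of params) {                                  // steps 2-6
    let value;
    if (next < args.length) {
      value = args[next++];                                  // an argument was supplied
    } else if (p.optional) {
      value = p.defaultValue;                                // step 4: fall back to the default
    } else if (unchecked) {
      value = undefined;                                     // step 3: unchecked functions allow this
    } else {
      throw new TypeError("missing argument for " + p.name); // step 3: checked functions do not
    }
    bindings[p.name] = p.coerce(value);                      // step 5
  }
  const remaining = args.slice(next);
  if (rest !== null) {
    if (rest.name !== null) bindings[rest.name] = remaining; // step 7
  } else if (remaining.length > 0 && !unchecked) {
    throw new TypeError("too many arguments");               // step 8
  }
  return bindings;
}

const params = [
  { name: "a", coerce: Number, optional: false },
  { name: "b", coerce: String, optional: true, defaultValue: "x" },
];
const bound = bindArguments(params, { name: "more" }, false, [3, "y", 4, 5]);
// bound is { a: 3, b: "y", more: [4, 5] }
```

Storing a precomputed defaultValue is a simplification that relies on the requirement that defaults be compile-time constants.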

Getters and Setters

If a FunctionName contains the keyword get or set, then the defined function is a getter or a setter.

A getter must not take any parameters. Unlike an ordinary function, a getter is invoked by merely mentioning its name without an Arguments list in any expression except as the destination of an assignment. For example, the following code returns the string “<2,3,1>”:

var x:Integer = 0;
function get serialNumber():Integer {return ++x}

var y = serialNumber;
return "<" + serialNumber + "," + serialNumber + "," + y + ">";

A getter must either evaluate a return statement or throw an exception; it cannot fall off the end without returning a value.

A setter must take exactly one required parameter. Unlike an ordinary function, a setter is invoked by merely mentioning its name (without an Arguments list) on the left side of an assignment or as the target of a mutator such as ++ or --. The setter should not return a value and should be declared as returning type Void or Never. The result of an assignment expression is the argument passed to the setter. For example, the following code returns the string “<1,2,42,43>”:

var x:Integer = 0;
function get serialNumber():Integer {return ++x}
function set serialNumber(n:Integer):Void {x=n}

var s = "<" + serialNumber + "," + serialNumber;
s += "," + (serialNumber = 42);
return s + "," + serialNumber + ">";

A setter cannot return a value; it may invoke a return statement as long as that statement does not supply an expression.
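The serial-number example can be reproduced in present-day JavaScript, with one structural change: modern accessors must be properties of an object, so the sketch below hangs them on a hypothetical counter object instead of defining them as free-standing functions.

```javascript
// Modern JavaScript analogue: accessors live on an object, here `counter`.
let x = 0;
const counter = {
  get serialNumber() { return ++x; },
  set serialNumber(n) { x = n; },
};

let s = "<" + counter.serialNumber + "," + counter.serialNumber;  // "<1,2"
s += "," + (counter.serialNumber = 42);   // the assignment evaluates to 42
s += "," + counter.serialNumber + ">";    // the getter now yields 43
// s is "<1,2,42,43>"
```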

A setter can have the same name as a getter in the same lexical scope. A getter or setter cannot be extracted from its variable, so the notion of the type of a getter or setter is vacuous; a getter or setter can only be called.

Contrast the following:

var x:Integer = 0;
function f():Integer {return ++x}
function g():Function {return f}
function get h():Function {return f}

f;     // Evaluates to function f
g;     // Evaluates to function g
h;     // Evaluates to function f (not h)
f();   // Evaluates to 1
g();   // Evaluates to function f
h();   // Evaluates to 2
g()(); // Evaluates to 3

See also the discussion of getter and setter syntax.

Unchecked Functions

An unchecked function relaxes argument checking. Unchecked function definitions are provided for compatibility with ECMAScript 3.

A function definition is unchecked if all of the following are true:

  • strict mode is disabled at the point of the function definition;
  • the function is not a class member;
  • the function has no optional or rest parameters;
  • none of the function’s parameters has a declared type;
  • the function does not have a declared return type;
  • the function is not a getter or setter.

An unchecked function also has the prototype attribute set by default.
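Ordinary (non-arrow) functions in present-day JavaScript still behave much like the unchecked functions described here: missing arguments become undefined, surplus arguments are accepted, and an arguments object is available. The sketch below is modern JavaScript and pair is a made-up name.

```javascript
// Modern JavaScript: argument counts are not checked for ordinary functions.
function pair(a, b) {
  return [a, b, arguments.length];
}

const tooFew = pair(1);         // [1, undefined, 1]
const tooMany = pair(1, 2, 3);  // [1, 2, 3]
```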


Classes

Monday, April 28, 2003

Class Definitions

Classes are defined using the class keyword.

ClassDefinition ⇒ class Identifier Inheritance Block
Inheritance ⇒
   «empty»
|  extends TypeExpression^allowIn

Like other definitions, a class definition may be preceded by one or more attributes, which affect the class’s scope, namespace, and semantics. Every class is also a value and has type Type.

A class definition may only be located at a scope that allows class definitions, defined as follows:

  • The global scope allows class definitions
  • A package scope allows class definitions
  • A class scope allows class definitions
  • If a scope X allows class definitions and a block B is directly inside scope X, then B’s scope also allows class definitions

According to these rules, a class may not be defined inside a function or a compound statement other than a block. If a class B is defined as a member of another class A, then B must be declared static.

Superclasses

A class may have a superclass specified by its extends clause. If omitted, the superclass defaults to Object. The superclass TypeExpression must be a compile-time constant expression without forward references.

A class is a subtype of its superclass.

Body

When a ClassDefinition is evaluated, the following steps take place:

  1. Create a new type t and bind the class’s QualifiedIdentifier to the constant t.
  2. The TypeExpression, if any, in the extends clause is evaluated, and t is made a subtype of its superclass. Any static members of t’s superclass are also defined as properties of the object t.
  3. A new, anonymous namespace for holding the class’s private members is constructed and used for the lexical extent of the Block.
  4. Block is evaluated using a new activation frame initialized with alias bindings for all most derived global members of the superclass. Any static and constructor members defined for Block’s activation frame are added as properties of the object t as they are being defined; these may hide static members inherited from superclasses.
  5. If Block is evaluated successfully (without throwing an exception), all instance members defined for Block’s top-level scope (along with those inherited from superclasses) are collected to make a template for creating instances of type t.

A ClassDefinition’s Block is evaluated just like any other Block, so it can contain expressions, statements, loops, etc. Such statements that do not contain declarations do not contribute members to the class being declared, but they are evaluated when the class is declared.

Instance Members

A class C’s instance member id becomes a separate property of each instance of C. If c is an instance of C, then such a property can be accessed using the expression c.id. Instance members are inherited from the superclass.

If present, an initializer for a var or const instance member must be a compile-time constant expression.

Methods

A function instance member is called a method. A method may use this to refer to the object on which it was called. The value of this will always be an instance of the class or one of its subclasses. A method may not change the value of this.

A method is not in itself a value and has no type. There is no way to extract an undispatched method from a class. The . operator produces a function (more specifically, a closure) that is already dispatched and has this bound to the left operand of the . operator.

A method is called by combining the . operator with a function call. For example:

class C {
  var x:Integer = 3;
  function m() {return x}
  function n(x) {return x+4}
}

var c = new C;
c.m();                 // returns 3
c.n(7);                // returns 11
var f:Function = c.m;  // f is a zero-argument function with this bound to c
f();                   // returns 3
c.x = 8;
f();                   // returns 8
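Present-day JavaScript differs on this point: extracting c.m yields an unbound function, so the proposal's already-dispatched closure has to be requested explicitly. The sketch below is modern JavaScript and uses Function.prototype.bind to emulate the behaviour described above.

```javascript
// Modern JavaScript analogue; bind() stands in for the proposal's automatic binding.
class C {
  x = 3;
  m() { return this.x; }
}

const c = new C();
const f = c.m.bind(c);   // emulate the proposal's already-dispatched closure
const first = f();       // 3
c.x = 8;
const second = f();      // 8, because f still refers to the same instance
```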

Overriding

A class C may override a method m defined in its superclass s. To do this, C should define a method m' with the same name as m and use the override attribute in the definition of m'. Overriding a method without using the override attribute or using the override attribute when not overriding a method results in an error intended to catch misspelled method names.

The overriding method m' must have the same set of parameters that the overridden method m has.

Let p be any parameter. If m' does not specify a type for p, it inherits the type of p from m. If m' does specify a type for p, it must be the same type as that for p in m. If p is optional in m, then it must also be optional in m' with the same parameter name; however, the default value may differ.

If omitted, the return type of m' defaults to the return type of m. If supplied, the return type of m' must be the same as the return type of m.

A final method cannot be overridden (or further overridden) in the subclasses in which it is visible.

The overriding method m' is put in the same namespaces as the overridden method m.

Method m' may call method m using the super operator: either super.m(args) or super this.m(args).

A method may only override another method. An instance variable may only override another instance variable. A getter may override a getter or an instance variable. A setter may override a setter or an instance variable.
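The super.m(args) form exists in present-day JavaScript as well. The sketch below is modern JavaScript with made-up class names. Modern JavaScript has no override attribute, so the misspelling check described above is not available there.

```javascript
// Modern JavaScript analogue of calling the overridden method through super.
class Shape {
  describe() { return "shape"; }
}

class Circle extends Shape {
  describe() { return "circle, a " + super.describe(); }  // reaches Shape's describe
}

const description = new Circle().describe();  // "circle, a shape"
```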

Static Members

A class C’s static member id becomes a property of the class object C. This member can be accessed using the expression C.id. static members are inherited from the superclass.

Inherited static variables have only one global value, not one value per subclass. For example, if class C has a static variable v and class D inherits from C, then v can be read or written either as C.v or as D.v; it’s the same variable rather than two separate variables.
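The single shared variable described above can be emulated in present-day JavaScript with a static accessor pair over one private storage cell. This is a sketch only. A plain static field would not behave this way in modern JavaScript, because assigning D.v would create a separate property on D.

```javascript
// Modern JavaScript sketch: one storage cell, reachable through C and through D.
class C {
  static #v = 1;                       // the single shared storage cell
  static get v() { return C.#v; }
  static set v(n) { C.#v = n; }
}
class D extends C {}

D.v = 2;             // writes through the inherited accessor
const viaC = C.v;    // 2
const viaD = D.v;    // 2
```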

Each instance member o named n of class C (other than members that are setters without a corresponding getter) also causes a global member g named n to be defined in C. That global member is currently inaccessible and reserved for a future language extension.

Constructors

A constructor is a function that creates a new instance of a class C. A constructor is defined as a method with the name C without any of the attributes static, virtual, or final. A constructor is invoked using the expression new C or new C(args).

A constructor can refer to its class’s instance variables via this. If a class C inherits from class B, then when B’s constructor is called while creating an instance of C, B’s constructor will be able to call virtual methods of class C on the partially constructed instance. Likewise, B’s constructor could store this into a global variable v and some other function could call a method of C on the partially constructed object v. Class C’s methods can be assured that they are only called on fully initialized instances of C only if neither C nor any of its ancestors contains a constructor that exhibits either of the behaviors above.

A constructor may invoke a return statement as long as that statement does not supply a value; a constructor cannot return a value. The newly created object is returned automatically. A constructor’s return type must be omitted. A constructor always returns a new instance.

A class named C must not define a static member with the name C in any namespace; such usage is reserved for a future extension.

If a class C does not define a constructor or a static function with the name C, a default constructor is automatically defined; that constructor takes the arguments that C’s superclass’s constructor takes, calls that superconstructor with those arguments, and initializes C’s new instance members to their default values.

Calling a Superconstructor

Let C be a class and B its superclass. C’s constructor must call B’s constructor before it accesses this or super or before it returns. The call can be either explicit or implicit; if C’s constructor does not contain any calls to B’s constructor, then a call to B’s constructor with no arguments is automatically inserted as the first statement of C’s constructor. C’s constructor does not have to call B’s constructor when it exits by throwing an exception. C’s constructor may not call B’s constructor again after it already called it.

C’s constructor calls B’s constructor using the statement super(args). This must be a complete statement; it means something different if it is a subexpression of a larger expression. It is not possible to skip class hierarchy levels while constructing an object — if C’s superclass is B and B’s superclass is A, then C’s constructor cannot directly call A’s constructor.
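Present-day JavaScript enforces a similar ordering rule. The sketch below is modern JavaScript with made-up class names. Two differences are worth noting: an explicitly written constructor in modern JavaScript never gets a super() call inserted automatically, and the automatically supplied default constructor forwards all of its arguments to the superclass.

```javascript
// Modern JavaScript analogue of the superconstructor rules.
class B {
  constructor(n = 0) { this.n = n; }
}
class C extends B {
  constructor(n) {
    super(n);                   // must run before `this` is touched
    this.doubled = this.n * 2;
  }
}
class Hasty extends B {
  constructor() {
    this.early = true;          // touching `this` before super() throws a ReferenceError
    super();
  }
}
class Plain extends B {}        // default constructor forwards its arguments to B

const made = new C(4);          // made.doubled is 8
let hastyError = null;
try { new Hasty(); } catch (e) { hastyError = e.name; }
const plain = new Plain(7);     // plain.n is 7
```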


Namespaces

Wednesday, September 4, 2002

Namespace Definition

NamespaceDefinition ⇒ namespace Identifier

A namespace definition defines a new namespace named Identifier.

A namespace definition may only be located at a scope that allows class definitions. If a namespace is defined as a member of a class, then the namespace must be declared static.

Use Directive

UseDirective ⇒ use namespace ParenListExpression

A use namespace directive makes the contents of each namespace in the comma-separated list ParenListExpression accessible without a qualifier.

use namespace directives are lexically scoped and their effect does not extend past the end of the enclosing block, directive group, or substatement group. A use namespace directive may be preceded by attributes; however, all such attributes must evaluate to either true or false.

Name Lookup

The following paragraphs describe what happens when a name is looked up. See also the description of how namespace attributes affect name definitions.

Properties

Conceptually, an instance x is a collection of properties. All properties have property names q::n_C, where q is a namespace, n is an identifier, and the subscript C is a class. There may be several aliases that refer to the same property (due to either multiple namespace attributes or aliases introduced with the export definition), but a property name q::n_C can refer to at most one property of an instance.

An instance x can have several properties q::n_C with the same namespace q and name n but different classes C. In the following descriptions, q::n denotes the most derived of these properties, which is the one with the most derived class C.

A property reference can be either unqualified or qualified and is looked up according to the rules below. There are two rules for each kind of lookup, depending on whether the left operand of the . operator is a SuperExpression or not. x is an expression that evaluates to an instance, Σ is the set of all scopes enclosing the property reference, and Q is the set of all namespaces q that are used by the scopes in Σ.

Qualified reference x.q::n where q is a namespace

Select x’s most derived property q::n. Signal an error if there is no such property.

Qualified reference super x.q::n where q is a namespace

This form may only be used for references in the scope of a class C other than Object. Let S be C’s superclass. Among all of x’s properties q::n_A select the one whose class A is most derived but still either S or an ancestor of S. Signal an error if there is no such property.

Unqualified reference x.n

Let A be the least derived (closest to Object) class such that x contains at least one property named q::n_A where q is any element of Q; signal an error if x has no such properties. Let Q' be the set of all namespaces q such that q is in Q and x contains the property named q::n_A. Let P be the set of all most derived properties q::n of x such that q is in Q'. If P has only one element p or if all of P’s elements are aliases of one property p, select p; otherwise signal an error.

Unqualified reference super x.n

This form may only be used for references in the scope of a class C other than Object. Let S be C’s superclass. Let A be the least derived (closest to Object) class such that x contains at least one property named q::n_A where q is any element of Q; signal an error if x has no such properties or if A is not S or an ancestor of S. Let Q' be the set of all namespaces q such that q is in Q and x contains the property named q::n_A. For each q in Q' let B_q be the most derived class such that B_q is S or an ancestor of S and x contains the property q::n_(B_q); for each q in Q' let p_q be the property q::n_(B_q). Let P be the set of all such properties p_q. If P has only one element p or if all of P’s elements are aliases of one property p, select p; otherwise signal an error.

Dynamic reference x[s]

Let s evaluate to a string n. Get the property x.public::n. Signal an error if there is no such property.

Dynamic reference super x[s]

Let s evaluate to a string n. Get the property super x.public::n. Signal an error if there is no such property.

Note that the only way to access an overridden method is to use super. This is by design to prevent security attacks.
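The rule for an unqualified reference x.n can be modeled with a small sketch. The representation is hypothetical and chosen only for illustration: each property is a record with a namespace ns, an identifier name, a class depth (0 for Object, larger for more derived classes), and an id that is shared by aliases of the same property. The code is present-day JavaScript rather than ECMAScript 4.

```javascript
// Hypothetical model of the unqualified lookup x.n described above.
// props: array of { ns, name, depth, id }; usedNamespaces: the set Q.
function lookupUnqualified(props, name, usedNamespaces) {
  const visible = props.filter(
    (p) => p.name === name && usedNamespaces.has(p.ns)
  );
  if (visible.length === 0) {
    throw new Error("no such property: " + name);
  }
  // A: the least derived class holding a visible property named n.
  const depthA = Math.min(...visible.map((p) => p.depth));
  // Q': namespaces in Q that name the property in class A.
  const qPrime = new Set(
    visible.filter((p) => p.depth === depthA).map((p) => p.ns)
  );
  // P: for each q in Q', the most derived property q::n.
  const selected = [...qPrime].map((q) =>
    visible
      .filter((p) => p.ns === q)
      .reduce((best, p) => (p.depth > best.depth ? p : best))
  );
  // All elements of P must be aliases of a single property.
  const ids = new Set(selected.map((p) => p.id));
  if (ids.size !== 1) {
    throw new Error("ambiguous reference: " + name);
  }
  return selected[0];
}

const props = [
  { ns: "public", name: "f", depth: 1, id: 1 }, // defined in a base class
  { ns: "public", name: "f", depth: 2, id: 2 }, // override in a derived class
  { ns: "v2", name: "f", depth: 2, id: 3 },     // unrelated f added in the derived class
];
const hit = lookupUnqualified(props, "f", new Set(["public", "v2"]));
// hit.id is 2: only public is in Q', so v2::f added further down the
// hierarchy does not make the reference ambiguous.
```

Anchoring the choice of namespaces at the least derived class means that a definition added later in a derived class under another namespace cannot change which property an existing unqualified reference selects.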

Variables

Conceptually, all variables (which for the purpose of this section also include constants, functions, classes, and such) have qualified names q::n, where q is a namespace and n an identifier. There may be several aliases that refer to the same variable (due to either multiple namespace attributes or aliases introduced with the export definition), but there cannot be two different variables defined using the same qualified name in the same scope.

A variable reference can be either unqualified or qualified and is looked up as follows:

Qualified reference q::n where q is a namespace

Let Σ be the set of all scopes enclosing the qualified reference q::n. Search the scopes in Σ, starting from the innermost one and continuing outwards until a value is found or all scopes have been examined. If no binding has been found after all scopes have been examined, signal an error. For each scope S in Σ, do the following:

  1. If S’s activation frame currently has a binding for q::n, select the value to which q::n is bound.
  2. Otherwise, if S is the scope of the definition of a class C and C or any of its ancestors contains a global member named q::n, then select C’s most derived property q::n. Signal an error if there is no such property.
  3. Otherwise, if S is the top-level scope of a constructor or instance method defined in class C and C or any of its ancestors contains an instance member named q::n, then let this be the value of this corresponding to S’s activation frame and select this’s most derived property q::n. Signal an error if there is no such property.
  4. Otherwise, if S is the scope of the definition of a class C and C or any of its ancestors contains an instance member named q::n, then signal an error.

Unqualified reference n

Let Σ be the set of all scopes enclosing the unqualified reference n. Search the scopes in Σ, starting from the innermost one and continuing outwards until a value is found or all scopes have been examined. If no binding has been found after all scopes have been examined, signal an error. For each scope S in Σ, do the following:

  1. If S’s activation frame A currently has a binding for a variable V, V has a name q::n for some namespace q that is used by a scope in Σ, and that use includes n, then select the value to which q::n is bound in A. If A contains more than one such variable V (not counting aliases of the same variable), signal an error.
  2. Otherwise, if S is the scope of the definition of a class C and C or any of its ancestors contains a global member named q::n for some namespace q that is used by a scope in Σ, then select C’s most derived property q::n. Signal an error if there is no such property or if there are multiple such properties (not counting aliases of the same property) for different q’s.
  3. Otherwise, if S is the top-level scope of a constructor or instance method defined in class C and C or any of its ancestors contains an instance member named q::n for some namespace q that is used by a scope in Σ, then select this’s most derived property q::n using the value of this corresponding to S’s activation frame. Signal an error if there is no such property or if there are multiple such properties (not counting aliases of the same property) for different q’s.
  4. Otherwise, if S is the scope of the definition of a class C and C or any of its ancestors contains an instance member named q::n for some namespace q that is used by a scope in Σ, then signal an error.

 


Packages

Wednesday, June 4, 2003

Packages were originally part of the ECMAScript Edition 4 proposal but have been removed due to time constraints. If implemented, packages and import directives might be defined as described below.

Defining Packages

Packages are an abstraction mechanism for grouping and distributing related code. Packages are designed to be linked at run time to allow a program to take advantage of packages written elsewhere or provided by the embedding environment. ECMAScript 4 offers a number of facilities to make packages robust for dynamic linking:

  • Selected package contents can be protected from outside reference.
  • Classes can maintain invariants that cannot be violated by code outside the class and/or package.
  • Function arguments and data structure references can be type-checked to limit the kinds of unexpected inputs the package’s code can experience.
  • Packages can export multiple namespaces, allowing graceful upgrades to packages without changing the code that uses them.

A package is defined using the following syntax:

PackageDefinition ⇒ package PackageNameOpt Block
PackageNameOpt ⇒
   «empty»
|  PackageName
PackageName ⇒
   String
|  PackageIdentifiers
PackageIdentifiers ⇒
   Identifier
|  PackageIdentifiers . Identifier

When a package is defined, it may, but is not required to, be given a PackageName, which is either a string or a series of dot-separated identifiers. It is implementation-defined what the restrictions, if any, are on naming packages to avoid clashes with other packages that may be present.

The Block contains the body of a package P. The Block is evaluated at the time package P is loaded. Any public top-level definitions are available to other packages that import package P. Any public class member definitions are available to all other packages, regardless of whether they import package P. Top-level and class definitions defined by P in another namespace N are available to other packages only if they use namespace N or qualify the access with namespace N.

A package is loaded (its body is evaluated) when the package is first imported or invoked directly (if, for example, the package is on an HTML web page). Some standard packages are loaded when the ECMAScript engine first starts up. When a package is loaded, its statements are evaluated in order, which may cause other packages to be loaded along the way when import directives are encountered. Circularities are not allowed in the graph of package imports.

Two attempts to load the same package in the same environment result in sharing of that package. What constitutes an environment is necessarily application-dependent. However, if package P1 loads packages P2 and P3, both of which load package P4, then P4 is loaded only once and thereafter its code and data are shared by P2 and P3.

JavaScript does not support package definition circularities (two packages A and B that each import the other), although an implementation may provide such a facility as an extension.

Importing Packages

A package P can reference another package Q via an import directive:

ImportDirective ⇒
   import PackageName
|  import Identifier = PackageName

An import directive may be preceded by attributes; however, all such attributes must evaluate to either true or false.

There are two ways an import directive can name a package to be imported:

  • The PackageName may be PackageIdentifiers. In this case, the system looks for a package with that exact PackageIdentifiers on its implementation-defined search path.
  • The PackageName may be a literal string. In this case, the system interprets the contents of the string in an implementation-defined manner in order to locate the package. Specific ECMAScript 4 embeddings should define the manner in which the contents of the string are interpreted. For example, a browser embedding may be defined to interpret the string as a URI and look for a package at the location given by that URI.

An import directive does the following:

  • Locate the target package specified by PackageName. If the package has not yet been loaded, then load it and wait until the target package’s Block is done evaluating. If loading the target package causes an import of the current package then throw a package circularity exception.
  • Let P be the target package object.
  • If Identifier is given, const-bind it to P in the current scope.
  • For each non-explicit top-level definition N::n (n in namespace N) in P, bind an alias N::n to P’s N::n in the global scope unless N::n is already defined in the global scope.

If package P has a top-level definition n and package Q imports P using import PkgP = P, then package Q can refer to n as either n or PkgP.n. The shorter form n is not available if it conflicts with some other n. If package P has an explicit top-level definition n and package Q imports P, then package Q can refer to that n only as PkgP.n.
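The loading behavior described in this section (load on first import, share on later imports, reject circular imports) can be sketched with a small loader. This is a hypothetical model in present-day JavaScript, not an API defined by the proposal: makeLoader, importPackage, and the definitions table are illustrative names, and the aliasing of top-level definitions into the importer’s global scope is not modeled.

```javascript
// Hypothetical loader. `definitions` maps a package name to a function that
// evaluates the package body and returns its top-level definitions.
function makeLoader(definitions) {
  const loaded = new Map();  // packages whose Block has finished evaluating
  const loading = new Set(); // packages whose Block is currently being evaluated

  function importPackage(name) {
    if (loaded.has(name)) {
      return loaded.get(name); // already loaded: shared, not evaluated again
    }
    if (loading.has(name)) {
      throw new Error("package circularity: " + name);
    }
    loading.add(name);
    try {
      // Evaluating the body may trigger further imports.
      const pkg = definitions[name](importPackage);
      loaded.set(name, pkg);
      return pkg;
    } finally {
      loading.delete(name);
    }
  }
  return importPackage;
}

let p4Loads = 0;
const importPackage = makeLoader({
  P4: () => {
    p4Loads += 1;
    return { n: 4 };
  },
  P2: (imp) => ({ p4: imp("P4") }),
  P3: (imp) => ({ p4: imp("P4") }),
  P1: (imp) => ({ p2: imp("P2"), p3: imp("P3") }),
  A: (imp) => imp("B"),
  B: (imp) => imp("A"),
});

const p1 = importPackage("P1");
// p4Loads is 1 and p1.p2.p4 === p1.p3.p4: P4 is loaded once and shared.
// importPackage("A") throws, because loading A imports B, which imports A.
```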


Pragmas

Tuesday, January 28, 2003

Pragmas allow a script writer to select program modes and options. Pragmas are lexically scoped and their effect does not extend past the end of the enclosing block, directive group, or substatement group.

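Pragma ⇒ use PragmaItems
PragmaItems ⇒
   PragmaItem
|  PragmaItems , PragmaItem
PragmaItem ⇒
   PragmaExpr
|  PragmaExpr ?
PragmaExpr ⇒
   Identifier
|  Identifier ( PragmaArgument )
PragmaArgument ⇒
   true
|  false
|  Number
|  - Number
|  - NegatedMinLong
|  String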

The keyword use is followed by one or more PragmaItems, each of which consists of an identifier, an optional argument, and an optional ?.

The pragma identifiers below are currently defined. Implementations may define additional identifiers that have meaning as pragmas.

Identifier   Meaning
ecmascript(n)   Error if version n of ECMAScript is not supported; otherwise recommends but does not require that the implementation only support ECMAScript version n features.
strict, strict(true)   Strict mode
strict(false)   Nonstrict mode (default)

The pragma takes effect starting with the directive after the pragma and continues either until the end of the enclosing block or statement group or until overridden by another pragma in the same block or statement group, whichever comes first. If a pragma references the same identifier several times, the last reference takes precedence. The semicolon insertion rule changes implied by the strict pragma apply to the semicolon, if any, ending the use directive that contains that pragma.

If an implementation does not recognize a pragma identifier, then if the PragmaItem ends with a ? then that PragmaItem is ignored; if the PragmaItem does not end with a ? then an error occurs.
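The handling of unrecognized and repeated pragma identifiers can be sketched as follows. This is a hypothetical helper in present-day JavaScript: the item shape { id, arg, optional }, the function name applyPragmas, and the choice to record a missing argument as true (generalizing strict being the same as strict(true)) are all assumptions made for illustration.

```javascript
// Hypothetical model of one `use` directive. Each item is { id, arg, optional },
// where `optional` records a trailing `?` on the PragmaItem.
function applyPragmas(items, knownIdentifiers) {
  const settings = {};
  for (const item of items) {
    if (!knownIdentifiers.has(item.id)) {
      if (item.optional) {
        continue; // unrecognized but marked with ?: ignored
      }
      throw new Error("unrecognized pragma: " + item.id);
    }
    // A later reference to the same identifier overrides an earlier one.
    settings[item.id] = item.arg === undefined ? true : item.arg;
  }
  return settings;
}

const known = new Set(["ecmascript", "strict"]);
const settings = applyPragmas(
  [
    { id: "strict" },
    { id: "fastmath", optional: true }, // hypothetical vendor pragma, skipped
    { id: "strict", arg: false },
  ],
  known
);
// settings.strict is false: the last reference takes precedence.
```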

Strict Mode

Many parts of ECMAScript 4 are relaxed or unduly convoluted due to compatibility requirements with ECMAScript 3. Strict mode sacrifices some of this compatibility for simplicity and additional error checking. Strict mode is intended to be used in newly written ECMAScript 4 programs, although existing ECMAScript 3 programs may be retrofitted.

The opposite of strict mode is nonstrict mode, which is the default. A program can readily mix strict and nonstrict portions.

Strict mode has the following effects:

  • Line-break semicolon insertion is turned off. (Grammatical semicolon insertion remains turned on.)
  • [no line break] restrictions in grammar productions are ignored. Line breaks can be placed anywhere between input tokens.
  • Variables must be declared.
  • Definition scopes are not hoisted.
  • var and function declarations without attributes or types are initialized at the beginning of a scope.
  • FunctionDefinitions define constants rather than variables.
  • Calls to functions defined under strict mode are checked for the correct number of arguments except in functions that explicitly allow a variable number of arguments. (The mode of the call site does not matter.)
  • Implementations may choose to disable other compatibility extensions such as support for octal literals. These are not officially part of ECMAScript 4 but most implementations support these in nonstrict mode for compatibility with older programs.

An implementation does not have to implement strict mode; however, implementations are encouraged to do so.

See also the rationale.


Libraries

Monday, December 11, 2000


This chapter presents the libraries that accompany the core language.

For the time being, only the libraries new to ECMAScript 4 are described. The basic libraries such as String, Array, etc. carry over from ECMAScript 3.


Types

Wednesday, June 4, 2003

Predefined Types

The following types are predefined in ECMAScript 4:

Type Set of Values
Never No values
Void undefined
Null null
Boolean   true and false
Integer Double-precision IEEE floating-point numbers that are mathematical integers, including +0.0, –0.0, +∞, –∞, and NaN
Number Double-precision IEEE floating-point numbers, including +0.0, –0.0, +∞, –∞, and NaN
char  Single 16-bit Unicode characters
String Immutable strings of Unicode characters, including null
Function All functions, including null
Array null as well as all arrays
Type All types, including null
Object All values, including null and undefined

Unlike in ECMAScript 3, there is no distinction between objects and primitive values. All values can have methods. Values of some classes are sealed, which disallows addition of dynamic properties. User-defined classes can be made to behave like primitives by using the class modifier final.

The above type names are not reserved words. They can be used as names of local variables or class members. However, they are defined as constants in the global scope, so a package cannot use them to name global variables.

Object is the supertype of all types. Never is the subtype of all types. Never is useful to describe the return type of a function that cannot return normally because it either falls into an infinite loop or always throws an exception. Never cannot be used as the type of a variable or parameter. Void is useful to describe the return type of a function that can return but that does not produce a useful value. See rationale.

A literal number is a member of the type Number; if that literal has an integral value, then it is also a member of type Integer. A literal string is a member of the type String. There are no literals of type char; a char value can be constructed by an explicit or implicit conversion.

An object created with the expression new f where f is a function has the type Object.

User-Defined Types

Any class defined using the class declaration is also a type that denotes the set of all of its and its descendants’ instances. These include the predefined classes, so Object, Date, etc. are all types. null is an instance of a user-defined class. undefined is never an instance of a user-defined class.

Meaning of Types

Types are generally used to restrict the set of objects that can be held in a variable or passed as a function argument. For example, the declaration

var x:Integer;

restricts the values that can be held in variable x to be integers.

A type declaration does not affect the semantics of reading the variable or accessing one of its properties. Thus, as long as expression new MyType() returns a value of type MyType, the following two code snippets are equivalent:

var x:MyType = new MyType();
x.foo();

var x = new MyType();
x.foo();

This equivalence always holds, even if these snippets are inside the declaration of class MyType and foo is a private field of that class. As a corollary, adding true type annotations does not change the meaning of a program.

Type Expressions

The language cannot syntactically distinguish type expressions from value expressions, so a type expression can be any compile-time constant expression that evaluates to a type.

A type is also a value (whose type is Type) and can be used in expressions, assigned to variables, passed to functions, etc. For example, the code

const R:Type = Number;
function abs_val(x:R):R {
  return x<0 ? -x : x;
}

is equivalent to:

function abs_val(x:Number):Number {
  return x<0 ? -x : x;
}

Implicit Coercions

Implicit coercions can take place in the following situations:

  • Assigning a value v to a variable of type t
  • Declaring an uninitialized variable of type t, in which case undefined is implicitly coerced to type t
  • Passing an argument v to a function whose corresponding parameter has type t
  • Returning a result v from a function declared to return a value of type t

In any of these cases, if v ∈ t, then v is passed unchanged. If v ∉ t, then if t defines an implicit mapping for value v then that mapped v is used; otherwise an error occurs.

Explicit Coercions

An explicit coercion performs more aggressive transformations than an implicit coercion. To invoke an explicit coercion, use the type as a function, passing it the value as an argument:

type(value)

For example, Integer(258.1) returns the integer 258, and String(2+2==4) returns the string "true".

If value is already a member of type, the explicit coercion returns value unchanged. If value can be implicitly coerced to type, the explicit coercion returns the result of the implicit coercion. Otherwise, the explicit coercion uses type’s explicit mapping.


Machine Types

Tuesday, March 4, 2003

Purpose

Machine types are low-level numeric types for use in ECMAScript 4 programs. These types provide Java-style integer operations that are useful for communicating between ECMAScript 4 and other programming languages. These types are not intended to replace Number and Integer for general-purpose scripting.

Contents

The following low-level numeric types are available:

Type     Suffix   Values
sbyte    (none)   Integer values between –128 and 127 inclusive, excluding –0.0
byte     (none)   Integer values between 0 and 255 inclusive, excluding –0.0
short    (none)   Integer values between –32768 and 32767 inclusive, excluding –0.0
ushort   (none)   Integer values between 0 and 65535 inclusive, excluding –0.0
int      (none)   Integer values between –2147483648 and 2147483647 inclusive, excluding –0.0
uint     (none)   Integer values between 0 and 4294967295 inclusive, excluding –0.0
long     L        Long integer values between –9223372036854775808 and 9223372036854775807 inclusive
ulong    UL       Long integer values between 0 and 18446744073709551615 inclusive
float    F        Single-precision IEEE floating-point numbers, including positive and negative zeroes, infinities, and NaN

The above type names are not reserved words.

8, 16, and 32-bit Integers

The first six types sbyte, byte, short, ushort, int, and uint are all proper subtypes of Integer, which is itself a subtype of Number. A particular number is a member of multiple types. For example, 3.0 is a member of sbyte, byte, short, ushort, int, uint, Integer, Number, and Object, while –2000.0 is a member of short, int, Integer, Number, and Object. ECMAScript does not distinguish between the literals 3 and 3.0 in any way.

All arithmetic operations and comparisons on sbyte, byte, short, ushort, int, and uint values treat them just like they would any other Number values — the operations are performed using full IEEE double-precision arithmetic.

Implicit Coercions

There are no predefined implicit coercions from values of type sbyte, byte, short, ushort, int, or uint other than the coercions predefined on the type Number. The following predefined implicit coercions are applicable when the destination type is sbyte, byte, short, ushort, int, or uint:

  • undefined → +0.0
  • –0.0 → +0.0
  • long and ulong values within range of the destination type T are converted to equivalent values of type T
  • finite integral float values within range of the destination type T are converted to equivalent values of type T

Note that there are no implicit coercions from +∞, –∞, or NaN to sbyte, byte, short, ushort, int, or uint.

Explicit Coercions

There are no predefined explicit coercions from values of type sbyte, byte, short, ushort, int, or uint other than the coercions predefined on the type Number. The predefined explicit coercions below are applicable when the destination type T is sbyte, byte, short, ushort, int, or uint. The notation |T| represents the range of the type T, where |sbyte| = |byte| = 256, |short| = |ushort| = 65536, and |int| = |uint| = 2^32.

  • undefined → +0.0
  • A long or ulong value x is converted to the one value y of type T that satisfies x = y (mod |T|)
  • float values are first converted to equivalent Number values and then converted as below
  • A Number value is first converted to an Integer value x by truncating towards zero if necessary. Then, if x is –0.0, +∞, –∞, or NaN, it is converted to +0.0; otherwise, x is converted to the one value y of type T that satisfies x = y (mod |T|)
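The explicit Number coercion in the last rule above can be sketched as a small function. This is an illustrative model in present-day JavaScript; coerceToMachineInt and its bits and signed parameters are hypothetical names, not part of the proposal.

```javascript
// Sketch of the explicit coercion from a Number to one of the six small
// integer types. `bits` is 8, 16, or 32; `signed` selects sbyte/short/int.
function coerceToMachineInt(value, bits, signed) {
  const x = Math.trunc(value);      // truncate towards zero
  if (!Number.isFinite(x)) {
    return 0;                       // infinities and NaN become +0.0
  }
  const range = 2 ** bits;          // |T|
  // Representative of x modulo |T| in 0 .. |T|-1; this also maps -0.0 to +0.0.
  let y = ((x % range) + range) % range;
  if (signed && y >= range / 2) {
    y -= range;                     // move into the signed range of T
  }
  return y;
}

// coerceToMachineInt(258.1, 8, false) is 2      (byte)
// coerceToMachineInt(200, 8, true) is -56       (sbyte)
// coerceToMachineInt(-1, 16, false) is 65535    (ushort)
```

For 32 bits the signed and unsigned results agree with the present-day JavaScript idioms value | 0 and value >>> 0, which apply the same truncate-then-wrap rule.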

64-bit Integers

The types long and ulong represent signed and unsigned 64-bit integers. long and ulong literals are written with the suffix L or UL and no exponent or decimal point. Literal values of type long are written as –9223372036854775808L through 9223372036854775807L. Literal values of type ulong are written as 0UL through 18446744073709551615UL.

The types long and ulong are disjoint from Number, so 5L and 5 are different objects, although they compare == and === to each other. 5L and 5UL are also different objects, although they compare == and === to each other.

Negation, addition, subtraction, multiplication, and modulo (%) on long and ulong values are exact, and long and ulong values may be mixed in an expression. The type of the result depends on the mathematical result x:

  • If –9223372036854775808 ≤ x ≤ –1, then the result has type long.
  • If 0 ≤ x ≤ 9223372036854775807, then the result has type ulong if at least one operand has type ulong; otherwise, the result has type long.
  • If 9223372036854775808 ≤ x ≤ 18446744073709551615, then the result has type ulong.
  • Otherwise, the result is the closest representable Number using the IEEE round-to-nearest mode.
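The case analysis above can be sketched with arbitrary-precision integers. This is an illustrative model using present-day JavaScript BigInt values to stand for the exact mathematical result; resultType and anyUlongOperand are hypothetical names.

```javascript
const LONG_MIN = -(2n ** 63n);
const LONG_MAX = 2n ** 63n - 1n;
const ULONG_MAX = 2n ** 64n - 1n;

// x is the exact mathematical result; anyUlongOperand says whether at least
// one operand had type ulong.
function resultType(x, anyUlongOperand) {
  if (x >= LONG_MIN && x <= -1n) return "long";
  if (x >= 0n && x <= LONG_MAX) return anyUlongOperand ? "ulong" : "long";
  if (x > LONG_MAX && x <= ULONG_MAX) return "ulong";
  return "Number"; // out of both ranges: rounded to the closest Number
}

// resultType(5n - 7n, true) is "long": a negative result is never a ulong.
// resultType(LONG_MAX + 1n, false) is "ulong", even with two long operands.
// resultType(ULONG_MAX + 1n, true) is "Number".
```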

Division involving two long or ulong operands returns the most precise quotient available from among the possible long, ulong, and Number values. In some cases the quotient will be a long or ulong; in other cases the quotient will be a Number. See the semantics for the details.

Division and modulo on long and ulong values can produce the Number values positive or negative infinity or NaN when the divisor is zero.

Addition, subtraction, multiplication, division, and modulo mixing a long or ulong operand with a Number (or any subtype of Number) or float operand first checks whether the Number or float operand is an exact integer (including either +0.0 or –0.0 but not infinities or NaN). If it is, then the computation uses the integral semantics above. If not, then the long or ulong operand is coerced to a Number and the operation is done using Number arithmetic.

The bitwise operations &, |, and ^ are 64 bits wide if at least one operand is a long or ulong, in which case the other operand is truncated to an integer and treated modulo 2^64 if necessary. The result is a ulong if at least one operand is a ulong; otherwise, the result is a long.

The bitwise shifts <<, >>, and >>> are 64 bits wide if the first operand is a long or ulong. The result is a ulong if the first operand is a ulong; otherwise, the result is a long. >> copies the most significant bit and >>> shifts in zero bits regardless of whether the first operand is a long or ulong.
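The two right shifts can be modeled on 64-bit patterns as follows. This is a sketch in present-day JavaScript using BigInt; shiftRight64 and its parameters are hypothetical names, and the shift count is assumed to lie between 0 and 63 because the text above does not say how larger counts are treated.

```javascript
// x: the first operand as a BigInt; n: shift count as a BigInt, assumed 0..63;
// isUlong: whether the first operand (and thus the result) is a ulong;
// logical: true for >>> (shift in zero bits), false for >> (copy the top bit).
function shiftRight64(x, n, isUlong, logical) {
  // Reinterpret the 64-bit pattern as unsigned for >>> and as signed for >>,
  // regardless of the operand's declared type.
  const pattern = logical ? BigInt.asUintN(64, x) : BigInt.asIntN(64, x);
  const shifted = pattern >> n;
  // The result takes the type of the first operand.
  return isUlong ? BigInt.asUintN(64, shifted) : BigInt.asIntN(64, shifted);
}

// shiftRight64(-1n, 60n, false, true) is 15n: >>> on a long shifts in zeroes.
// shiftRight64(2n ** 63n, 60n, true, false) is 18446744073709551608n:
// >> on a ulong still copies the most significant bit.
```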

Comparisons mixing a long or ulong operand with a Number (or any subtype of Number) or float operand compare exact mathematical values without any coercions.

Implicit Coercions

The following predefined implicit coercions are applicable when the destination type is long:

  • undefined → 0L
  • ulong values between 0UL and 9223372036854775807UL are converted to equivalent long values
  • Finite Integer values between –9223372036854775808 and 9223372036854775807 are converted to equivalent long values
  • Finite integral float values between –9223372036854775808F and 9223372036854775807F are converted to equivalent long values

The following predefined implicit coercions are applicable when the destination type is ulong:

  • undefined → 0UL
  • long values between 0L and 9223372036854775807L are converted to equivalent ulong values
  • Finite Integer values between –0.0 and 18446744073709551615 are converted to equivalent ulong values
  • Finite integral float values between –0.0F and 18446744073709551615F are converted to equivalent ulong values

Note that there are no implicit coercions from NaN or positive or negative infinity to long or ulong.

A long or ulong value can be implicitly coerced to type Number, Integer, or float. The result is the closest representable Number or float value using the same rounding as when a string is converted to a number. If the source is 0L or 0UL then the result is +0.0 or +0.0F.

Explicit Coercions

The predefined explicit coercions below are applicable when the destination type T is long or ulong.

  • undefined → 0L or 0UL
  • A long or ulong value x is converted to the one value y of type T that satisfies x = y (mod 2^64)
  • float values are first converted to equivalent Number values and then converted as below
  • A Number value is first converted to an Integer value x by truncating towards zero if necessary. Then, if x is –0.0, +∞, –∞, or NaN, it is converted to 0L or 0UL; otherwise, x is converted to the one value y of type T that satisfies x = y (mod 2^64).
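These explicit coercions amount to wrapping modulo 2^64, which present-day JavaScript exposes for BigInt values through BigInt.asIntN and BigInt.asUintN. The sketch below uses them as a stand-in for long and ulong; the helper names toLong, toUlong, and numberToLong are hypothetical.

```javascript
// Wrap an exact integer (a BigInt) into the long or ulong range.
function toLong(x) {
  return BigInt.asIntN(64, x);
}
function toUlong(x) {
  return BigInt.asUintN(64, x);
}

// Explicit coercion from a Number: truncate towards zero, map the
// infinities and NaN to zero, then wrap modulo 2^64.
function numberToLong(value) {
  const x = Math.trunc(value);
  if (!Number.isFinite(x)) {
    return 0n;
  }
  return toLong(BigInt(x)); // BigInt(-0) is 0n, so -0.0 also becomes zero
}

// toUlong(-1n) is 18446744073709551615n
// toLong(2n ** 63n) is -9223372036854775808n
// numberToLong(-1.9) is -1n
```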

A long or ulong value x can be explicitly coerced to type Number, Integer, float, or String. Explicit coercions to Number, Integer, and float are the same as the implicit coercions. Explicit coercions to type String produce x as a string of decimal digits. Negative values have a minus sign prepended. Zero produces the string "0"; all other values produce strings starting with a non-zero digit.

Single-Precision Floats

The type float represents single-precision IEEE floating-point numbers. float literals are written with the suffix F. float infinities and NaN are separate from Number infinities and NaN.

The type float is disjoint from Number, so 5F and 5 are different objects, although they compare == to each other.

Negating a float value returns a float value. All other arithmetic first converts the float value to the corresponding Number value. The bitwise operations &, |, ^, <<, >>, and >>> coerce any float operands to type Number before proceeding.

Implicit Coercions

The following predefined implicit coercions are applicable when the destination type is float:

  • undefined → float(NaN)
  • Number values (including NaN and the infinities) are converted to the closest representable float values using the IEEE round-to-nearest mode
  • long and ulong values are converted to the closest representable float values (excluding –0.0F)

A float value can be implicitly coerced to type Number. The result is the equivalent Number value.
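The Number to float rounding can be approximated in present-day JavaScript with Math.fround, which rounds a Number to the nearest single-precision value and returns it as a Number. The sketch below is only an analogy: present-day JavaScript has no separate float type, and toFloat is a hypothetical name.

```javascript
// Round a Number to the closest single-precision value (round-to-nearest).
function toFloat(value) {
  return Math.fround(value);
}

// toFloat(5) === 5                  small integers are exactly representable
// toFloat(0.1) !== 0.1              0.1 rounds to a nearby single-precision value
// toFloat(16777217) === 16777216    2^24 + 1 is not representable in single precision
// toFloat(1e40) === Infinity        too large for single precision
// toFloat(undefined) is NaN         matching undefined → float(NaN) above
```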


Formal Description

Wednesday, August 14, 2002


This chapter presents the formal syntax and semantics of ECMAScript 4. The syntax notation and semantic notation sections explain the notation used for this description. A simple metalanguage based on a typed lambda calculus is used to specify the semantics.

The syntax and semantic sections are available in both HTML 4.0 and Microsoft Word RTF formats. In the HTML versions each use of a grammar nonterminal or metalanguage value, type, or field is hyperlinked to its definition, making the HTML version preferred for browsing. On the other hand, the RTF version looks much better when printed. The fonts, colors, and other formatting of the various grammar and semantic elements are all encoded as CSS (in HTML) or Word (in RTF) styles and can be altered if desired.

The syntax and semantics sections are machine-generated from code supplied to a small engine that can type-check and execute the semantics directly. This engine is in the CVS tree at mozilla/js2/semantics; the input files are at mozilla/js2/semantics/JS20.


Semantic Notation

Friday, June 13, 2003

The semantics of ECMAScript 4 are written in Algol-like pseudocode. The following sections define the notation used to write the semantics’ concepts, expressions, procedures, and actions.

Operators

The table below summarizes operators used in expressions in this document. The operators are listed in groups in order from the highest precedence (tightest-binding) to the lowest precedence (loosest-binding). Other than the relational operators, operators in the same group have the same precedence and associate left-to-right, so, for example, 7–3+2–1 means ((7–3)+2)–1 instead of 7–(3+(2–1)) or (7–(3+2))–1. As is traditional in mathematics, the relational operators cascade, so

a = b ≠ c < d

means

a = b and b ≠ c and c < d

Parentheses are used to override precedences or clarify expressions.
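Python's comparison operators happen to cascade in the same way, so the expansion can be checked mechanically. The sketch below is only an illustration in Python, not part of the semantic notation; the function names cascade and expanded are hypothetical.

```python
def cascade(a, b, c, d):
    """Cascaded form: a = b ≠ c < d."""
    return a == b != c < d


def expanded(a, b, c, d):
    """Expanded form: a = b and b ≠ c and c < d."""
    return (a == b) and (b != c) and (c < d)
```

Both functions agree on every input, because each middle operand takes part in the comparison on its left and the comparison on its right.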

Expressions used in describing algorithms may perform side effects. Except for and, or, and ?:, the operators compute all of their operands left-to-right, and if computation of any operand throws an exception e, then the operator immediately propagates e without computing any following operands.

Group   Operator   Description

Nonassociative
(x)   Return x. Parentheses are used to override operator precedence.
{x1, x2, ... , xn}   Set or semantic domain with the elements x1, x2, ... , xn
{f(x) | x ∈ A}
{f(x) | x ∈ A such that predicate(x)}   Set comprehension
[x0, x1, ... , xn–1]   List with the elements x0, x1, ... , xn–1
[f(x) | x ∈ u]
[f(x) | x ∈ u such that predicate(x)]   List comprehension
Name⟨label1: x1, ... , labeln: xn⟩
Name   Tuple constructor
|x|   Absolute value of a number x, cardinality of a set x, or length of a list x
⌊x⌋   Floor of x
⌈x⌉   Ceiling of x
Action[nonterminal_i]   This notation is used inside an action for a grammar production that has nonterminal nonterminal on the production’s left or right side. Return the value of action Action invoked on the ith instance of nonterminal nonterminal on the left or right side of the production. The subscript i can be omitted if it is 1 and there is only one instance of nonterminal nonterminal on the production’s right side.
nonterminal_i   This notation is used inside an action for a grammar production that has nonterminal nonterminal on the production’s left or right side. Furthermore, every complete expansion of grammar nonterminal nonterminal expands into a single character. Return the character to which the ith instance of nonterminal nonterminal on the right side of the production expands. The subscript i can be omitted if there is only one instance of nonterminal nonterminal in the production. If the subscript is omitted and nonterminal nonterminal appears on the left side of the production, then this expression returns the single character to which this whole production expands.

Suffix
i_long   Convert integer i to a Long
i_ulong   Convert integer i to a ULong
x_f32   Convert real number x to the “closest” Float32 value by calling realToFloat32(x)
x_f64   Convert real number x to the “closest” Float64 value by calling realToFloat64(x)
x^y   x raised to the yth power
u[i]   ith element of list u
u[i ... j]
u[i ...]   Slice of list u
u[i \ x]   List element substitution
a.label   Field named label of tuple or record a
T{}   Semantic domain of all sets whose elements are members of semantic domain T
T[]   Semantic domain of all lists whose elements are members of semantic domain T
f(x1, ..., xn)   Procedure call

Prefix
new Name⟨⟨label1: x1, ... , labeln: xn⟩⟩   Record constructor
–x   Real number negation
min A   Smallest element of a set
max A   Largest element of a set

Factor
x × y   Real number product
x / y   Real number quotient (y must not be zero)
x mod y   Real number remainder (y must not be zero)
A ∩ B   Set intersection
T1 × T2 × ... × Tn → T
() → T
T1 × T2 × ... × Tn → ()
() → ()   Semantic domain of procedures

Term
x + y   Real number addition
x – y   Real number subtraction or set difference
u ⊕ v   List concatenation
A ∪ B   Set union

Relational
x = y
x ≠ y   Equality and inequality predicates on tags, real numbers, sets, booleans, characters, lists, strings, tuples, and records. Values of differing kinds (such as the boolean true and the character ‘A’) are always considered unequal.
x < y
x ≤ y
x > y
x ≥ y   Order predicates on real numbers, characters, and strings.
x ∈ A
x ∉ A   Set membership predicates
A ⊆ B   Subset predicate
A ⊂ B   Proper subset predicate

Negation
not a   Logical negation

Conjunction
a and b   Short-circuiting logical conjunction

Disjunction
a or b   Short-circuiting logical disjunction
a xor b   Logical exclusive or

Conditional
a ? x : y   Conditional
some x ∈ A satisfies predicate(x)
every x ∈ A satisfies predicate(x)   Set or list quantifiers

Semantic Domains

Semantic domains describe the possible values that a variable might take on in an algorithm. The algorithms are constructed in a way that ensures that these constraints are always met, regardless of any valid or invalid programmer or user input or actions.

A semantic domain can be intuitively thought of as a set of possible values, and, in fact, any set of values explicitly described in this document is also a semantic domain. Nevertheless, semantic domains have a more precise mathematical definition in domain theory (see for example [Schmidt86]) that allows one to define semantic domains recursively without encountering paradoxes such as trying to define a set A whose members include all functions mapping values from A to Integer. The problem with an ordinary definition of such a set A is that the cardinality of the set of all functions mapping A to Integer is always strictly greater than the cardinality of A, leading to a contradiction. Domain theory uses a least fixed point construction to allow A to be defined as a semantic domain without encountering problems.

Semantic domains have names in Capitalized Red Small Caps. Such a name is to be considered distinct from a tag or regular variable with the same name, so Undefined, undefined, and undefined are three different and independent entities.

A variable v is constrained using the notation

v: T

where T is a semantic domain. This constraint indicates that the value of v will always be a member of the semantic domain T. These declarations are informative (they may be dropped without affecting the semantics’ correctness) but useful in understanding the semantics. For example, when the semantics state that x: Integer then one does not have to worry about what happens when x has the value true or +∞f64.

The constraints can be proven statically. The ECMAScript semantics have been machine-checked to ensure that every constraint holds.

Tags

Tags are computational tokens with no internal structure. Tags are written using a dark red font. Two tags are equal if and only if they have the same name.

Each tag that does not also name a tuple or a record is defined using the notation:

tag name;

In the HTML version of the semantics, each use of a tag’s name is linked back to its definition.

Booleans

The tags true and false represent booleans. Boolean is the two-element semantic domain {true, false}.

Let a and b be booleans and x and y any values. In addition to = and ≠, the following operations can be done on them:

Notation   Description
not a true if a is false; false if a is true
a and b If a is false, return false without computing b; if a is true, return the value of b
a or b If a is false, return the value of b; if a is true, return true without computing b
a xor b true if a is true and b is false or a is false and b is true; false otherwise. a xor b is equivalent to a ≠ b
a ? x : y If a is true, compute and return the value of x; if a is false, compute and return the value of y

Note that the and, or, and ?: operators short-circuit. These are the only operators that do not always compute all of their operands.

Sets

A set is an unordered, possibly infinite collection of elements. Each element may occur at most once in a set. There must be an equivalence relation = defined on all pairs of the set’s elements. Elements of a set may themselves be sets.

A set is denoted by enclosing a comma-separated list of values inside braces:

{element1, element2, ... , elementn}

The empty set is written as {}. Any duplicate elements are included only once in the set.

For example, the set {3, 0, 10, 11, 12, 13, –5} contains seven integers.

Sets of either integers or characters can be abbreviated using the ... range operator, which generates inclusive ranges of integers or character code points. For example, the above set can also be written as {0, –5, 3 ... 3, 10 ... 13}.

If the beginning of the range is equal to the end of the range, then the range consists of only one element: {7 ... 7} is the same as {7}. If the end of the range is one less than the beginning, then the range contains no elements: {7 ... 6} is the same as {}. The end of the range is never more than one less than the beginning.

A set can also be written using the set comprehension notation

{f(x) | x ∈ A}

which denotes the set of the results of computing expression f on all elements x of set A. A predicate can be added:

{f(x) | x ∈ A such that predicate(x)}

denotes the set of the results of computing expression f on all elements x of set A that satisfy the predicate expression. There can also be more than one free variable x and set A, in which case all combinations of free variables’ values are considered. For example,

{x | x ∈ Integer such that x^2 < 10} = {–3, –2, –1, 0, 1, 2, 3};
{x^2 | x ∈ {–5, –1, 1, 2, 4}} = {1, 4, 16, 25};
{x×10 + y | x ∈ {1, 2, 4}, y ∈ {3, 5}} = {13, 15, 23, 25, 43, 45}.
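The three examples can be reproduced with Python set comprehensions. This is a sketch only: Python sets are finite, so the first example searches an arbitrarily chosen finite window of integers (an assumption of this illustration) that is known to contain every solution.

```python
# {x | x ∈ Integer such that x^2 < 10}, restricted to a finite search window.
small_squares = {x for x in range(-100, 101) if x**2 < 10}

# {x^2 | x ∈ {–5, –1, 1, 2, 4}}: the value 1 arises twice but is kept once.
squares = {x**2 for x in {-5, -1, 1, 2, 4}}

# {x×10 + y | x ∈ {1, 2, 4}, y ∈ {3, 5}}: all combinations of x and y.
combos = {x * 10 + y for x in {1, 2, 4} for y in {3, 5}}
```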

The same notation is used for operations on sets and on semantic domains. Let A and B be sets (or semantic domains) and x and y be values. The following operations can be done on them:

Notation   Description
x ∈ A   true if x is an element of A and false if not
x ∉ A   false if x is an element of A and true if not
|A|   The number of elements in A (only used on finite sets)
min A   The value m that satisfies both m ∈ A and for all elements x ∈ A, x ≥ m (only used on nonempty, finite sets whose elements have a well-defined order relation)
max A   The value m that satisfies both m ∈ A and for all elements x ∈ A, x ≤ m (only used on nonempty, finite sets whose elements have a well-defined order relation)
A ∩ B   The intersection of A and B (the set or semantic domain of all values that are present both in A and in B)
A ∪ B   The union of A and B (the set or semantic domain of all values that are present in at least one of A or B)
A – B   The difference of A and B (the set or semantic domain of all values that are present in A but not B)
A = B   true if A and B are equal and false otherwise. A and B are equal if every element of A is also in B and every element of B is also in A.
A ≠ B   false if A and B are equal and true otherwise
A ⊆ B   true if A is a subset of B and false otherwise. A is a subset of B if every element of A is also in B. Every set is a subset of itself. The empty set {} is a subset of every set.
A ⊂ B   true if A is a proper subset of B and false otherwise. A ⊂ B is equivalent to A ⊆ B and A ≠ B.

If T is a semantic domain, then T{} is the semantic domain of all sets whose elements are members of T. For example, if T = {1,2,3}, then T{} = {{}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}. The empty set {} is a member of T{} for any semantic domain T.
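For a finite T, the domain T{} can be enumerated directly. The following sketch uses a hypothetical helper all_subsets and represents each member as a frozenset, because Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import chain, combinations


def all_subsets(t):
    """Return T{} for a finite set t: the set of all subsets of t."""
    items = list(t)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


domain = all_subsets({1, 2, 3})
```

A set of n elements yields 2^n subsets, so domain has eight members here, including the empty set.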

In addition to the above, the some and every quantifiers can be used on sets (see also lists). The quantifier

some x ∈ A satisfies predicate(x)

returns true if there exists at least one element x in set A such that predicate(x) computes to true. If there is no such element x, then the some quantifier’s result is false. If the some quantifier returns true, then variable x is left bound to any element of A for which predicate(x) computes to true; if there is more than one such element x, then one of them is chosen arbitrarily. For example,

some x ∈ {3, 16, 19, 26} satisfies x mod 10 = 6

evaluates to true and leaves x set to either 16 or 26. Other examples include:

(some x ∈ {3, 16, 19, 26} satisfies x mod 10 = 7) = false;
(some x ∈ {} satisfies x mod 10 = 7) = false;
(some x ∈ {“Hello”} satisfies true) = true and leaves x set to the string “Hello”;
(some x ∈ {} satisfies true) = false.

The quantifier

every x ∈ A satisfies predicate(x)

returns true if there exists no element x in set A such that predicate(x) computes to false. If there is at least one such element x, then the every quantifier’s result is false. As a degenerate case, the every quantifier is always true if the set A is empty. For example,

(every x ∈ {3, 16, 19, 26} satisfies x mod 10 = 6) = false;
(every x ∈ {6, 26, 96, 106} satisfies x mod 10 = 6) = true;
(every x ∈ {} satisfies x mod 10 = 6) = true.
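The behavior of the two quantifiers can be sketched in Python. The helper names some and every are hypothetical, and returning the bound element alongside the boolean is an assumption of this sketch that stands in for "leaves x bound".

```python
def some(collection, predicate):
    """Return (True, x) for an element x satisfying predicate, else (False, None).

    For a set the chosen x is arbitrary; for a list it is the first match.
    """
    for x in collection:
        if predicate(x):
            return True, x
    return False, None


def every(collection, predicate):
    """Return True if no element of collection fails predicate."""
    return all(predicate(x) for x in collection)
```

For example, some({3, 16, 19, 26}, lambda x: x % 10 == 6) returns True together with either 16 or 26, and every applied to an empty set returns True.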

Real Numbers

Numbers written in plain font are exact mathematical real numbers. Numbers can be written with or without a decimal point. Integers preceded with “0x” are hexadecimal (base 16). 4294967296, 4294967296.000, 0x100000000, and 2^32 are all the same number. 0.1 is the exact value 1/10.

Integer is the semantic domain of all integers {... –3, –2, –1, 0, 1, 2, 3 ...}. 3.0, 3, 0xFF, and –10^100 are all integers.

Rational is the semantic domain of all rational numbers. Every integer is also a rational number: Integer ⊂ Rational. 3, 1/3, 7.5, –12/7, and 2^–5 are examples of rational numbers.

Real is the semantic domain of all real numbers. Every rational number is also a real number: Rational ⊂ Real. π is an example of a real number slightly larger than 3.14.

Let x and y be real numbers. The following operations can be done on them and always produce exact results:

Notation   Description
–x   Negation
|x|   Absolute value
x + y   Sum
x – y   Difference
x × y   Product
x / y   Quotient (y must not be zero)
x^y   x raised to the yth power (used only when either x≠0 and y is an integer or x is any number and y>0)
⌊x⌋   Floor of x, which is the unique integer i such that i ≤ x < i+1. ⌊π⌋ = 3, ⌊–3.5⌋ = –4, and ⌊7⌋ = 7.
⌈x⌉   Ceiling of x, which is the unique integer i such that i–1 < x ≤ i. ⌈π⌉ = 4, ⌈–3.5⌉ = –3, and ⌈7⌉ = 7.
x mod y   x modulo y, which is defined as x – y×⌊x/y⌋. y must not be zero. 10 mod 7 = 3, and –1 mod 7 = 6.
log10(x)   The exact base-10 logarithm of x (x will always be greater than zero)

Real numbers can be compared using =, ≠, <, ≤, >, and ≥. The result is either true or false.
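Exact rational arithmetic makes the floor, ceiling, and mod definitions easy to check. The sketch below uses Python's fractions module as a stand-in for exact rationals (irrational reals such as π are outside its scope), and real_mod is a hypothetical name.

```python
import math
from fractions import Fraction


def real_mod(x, y):
    """x mod y, computed from the definition x – y×⌊x/y⌋; y must not be zero."""
    x, y = Fraction(x), Fraction(y)
    return x - y * math.floor(x / y)


floor_example = math.floor(Fraction(-7, 2))   # floor of –3.5
ceiling_example = math.ceil(Fraction(-7, 2))  # ceiling of –3.5
```

Because the definition uses floor rather than truncation, the result of mod takes the sign of y, which is why –1 mod 7 is 6 rather than –1.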

Bitwise Integer Operators

The four procedures below perform bitwise operations on integers. The integers are treated as though they were written in infinite-precision two’s complement binary notation, with each 1 bit representing true and 0 bit representing false.

More precisely, any integer x can be represented as an infinite sequence of bits a_i where the index i ranges over the nonnegative integers and every a_i ∈ {0, 1}. The sequence is traditionally written in reverse order:

..., a_4, a_3, a_2, a_1, a_0

The unique sequence corresponding to an integer x is generated by the formula

a_i = ⌊x / 2^i⌋ mod 2

If x is zero or positive, then its sequence will have infinitely many consecutive leading 0’s, while a negative integer x will generate a sequence with infinitely many consecutive leading 1’s. For example, 6 generates the sequence ...0...0000110, while –6 generates ...1...1111010.

The logical and, or, and xor operations below operate on corresponding elements of the sequences a_i and b_i generated by the two parameters x and y. The result is another infinite sequence of bits c_i. The result of the operation is the unique integer z that generates the sequence c_i. For example, anding corresponding elements of the sequences generated by 6 and –6 yields the sequence ...0...0000010, which is the sequence generated by the integer 2. Thus, bitwiseAnd(6, –6) = 2.

Procedure   Description
bitwiseAnd(x: Integer, y: Integer): Integer   The bitwise and of x and y
bitwiseOr(x: Integer, y: Integer): Integer   The bitwise or of x and y
bitwiseXor(x: Integer, y: Integer): Integer   The bitwise xor of x and y
bitwiseShift(x: Integer, count: Integer): Integer   Shift x to the left by count bits. If count is negative, shift x to the right by –count bits. Bits shifted out of the right end are lost; bits shifted in at the right end are zero. bitwiseShift(x, count) is exactly equivalent to ⌊x × 2^count⌋.
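Python integers already behave as infinite-precision two's complement values, so its built-in operators can serve as a reference when checking these definitions. The helpers bit and bitwise_shift below are hypothetical names for this sketch; they are written from the floor-based formulas rather than from Python's own shift operators.

```python
def bit(x, i):
    """Bit i of integer x, computed as floor(x / 2^i) mod 2."""
    return (x // 2**i) % 2


def bitwise_shift(x, count):
    """floor(x × 2^count); a negative count shifts to the right."""
    if count >= 0:
        return x * 2**count
    return x // 2**(-count)


low_bits_of_6 = [bit(6, i) for i in range(8)]         # least significant first
low_bits_of_minus_6 = [bit(-6, i) for i in range(8)]  # least significant first
```

Reading the two lists in reverse gives the low eight bits 00000110 and 11111010, matching the sequences quoted for 6 and –6.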

Characters

Characters enclosed in single quotes ‘ and ’ represent Unicode characters with code points ranging from 0000 to 10FFFF hexadecimal. Even though Unicode does not define characters for some of these code points, in this specification any of these 1114112 code points is considered to be a valid character. Examples of characters include ‘A’, ‘b’, ‘«LF»’, ‘«uFFFF»’, ‘«U00010000»’, and ‘«U0010FFFF»’ (see also the notation for non-ASCII characters).

Unicode has the notion of code points, which are numerical indices of characters in the Unicode character table, as well as code units, which are numerical values for storing characters in a particular representation. ECMAScript is designed to make it appear that strings are represented in the UTF-16 representation, which means that a code unit is a 16-bit value (an implementation may store strings in other formats such as UTF-8, but it must make it appear for indexing and character extraction purposes as if strings were sequences of 16-bit code units). For convenience this specification does not distinguish between code units and code points in the range from 0000 to FFFF hexadecimal.

Char16 is the semantic domain of the 65536 Unicode characters in the set {‘«u0000»’ ... ‘«uFFFF»’}. These characters form Unicode’s Basic Multilingual Plane. These characters have code points between 0000 and FFFF hexadecimal. Code units are also represented by values in the Char16 semantic domain.

SupplementaryChar is the semantic domain of the 1048576 Unicode characters in the set {‘«U00010000»’ ... ‘«U0010FFFF»’}. These are Unicode’s supplementary characters with code points between 10000 and 10FFFF hexadecimal. Since these characters are not members of the Char16 domain, they cannot be stored directly in strings of Char16 code units. Instead, wherever necessary the semantic algorithms convert supplementary characters into pairs of surrogate code units before storing them into strings. The first surrogate code unit h is in the set {‘«uD800»’ ... ‘«uDBFF»’} and the second surrogate code unit l is in the set {‘«uDC00»’ ... ‘«uDFFF»’}; together they encode the supplementary character with the code point value 0x10000 + (char16ToInteger(h) – 0xD800)×0x400 + char16ToInteger(l) – 0xDC00.
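The surrogate encoding and its inverse can be sketched with integer arithmetic. The function names to_surrogates and from_surrogates are hypothetical, and code units are represented here as plain integers rather than Char16 values.

```python
def to_surrogates(code_point):
    """Split a supplementary code point into (high, low) surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + offset // 0x400, 0xDC00 + offset % 0x400


def from_surrogates(high, low):
    """Inverse mapping: 0x10000 + (high – 0xD800)×0x400 + low – 0xDC00."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + (high - 0xD800) * 0x400 + low - 0xDC00
```

The first supplementary code point 0x10000 maps to the pair (0xD800, 0xDC00) and the last, 0x10FFFF, maps to (0xDBFF, 0xDFFF), so the two surrogate ranges are used exactly.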

Char21 is the semantic domain of all 1114112 Unicode characters {‘«u0000»’ ... ‘«U0010FFFF»’}.

Char21 = Char16 ∪ SupplementaryChar

Characters can be compared using =, ≠, <, ≤, >, and ≥. These operators compare code point values, so ‘A’ = ‘A’, ‘A’ < ‘B’, ‘A’ < ‘a’, and ‘«uFFFF»’ < ‘«U00010000»’ are all true.

Character Conversions

The following procedures convert between characters and integers:

Procedure   Description
char16ToInteger(c: Char16): {0 ... 0xFFFF}   The number of character c’s Unicode code point or code unit
char21ToInteger(c: Char21): {0 ... 0x10FFFF}   The number of character c’s Unicode code point
integerToChar16(i: {0 ... 0xFFFF}): Char16   The character whose Unicode code point or code unit number is i
integerToSupplementaryChar(i: {0x10000 ... 0x10FFFF}): SupplementaryChar   The character whose Unicode code point number is i
integerToChar21(i: {0 ... 0x10FFFF}): Char21   The character whose Unicode code point number is i

The procedure digitValue is defined as follows:

proc digitValue(c: {‘0’ ... ‘9’, ‘A’ ... ‘Z’, ‘a’ ... ‘z’}): {0 ... 35}
  case c of
    {‘0’ ... ‘9’} do return char16ToInteger(c) – char16ToInteger(‘0’);
    {‘A’ ... ‘Z’} do return char16ToInteger(c) – char16ToInteger(‘A’) + 10;
    {‘a’ ... ‘z’} do return char16ToInteger(c) – char16ToInteger(‘a’) + 10
  end case
end proc;
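A direct Python transcription of digitValue might look as follows. This is a sketch: digit_value is a hypothetical name, ord stands in for char16ToInteger, and the ValueError for characters outside the parameter's domain is an assumption, since the pseudocode never calls the procedure with such input.

```python
def digit_value(c):
    """Map '0'-'9', 'A'-'Z', 'a'-'z' to 0..35; letter case is ignored."""
    if "0" <= c <= "9":
        return ord(c) - ord("0")
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 10
    if "a" <= c <= "z":
        return ord(c) - ord("a") + 10
    raise ValueError(f"not a digit character: {c!r}")
```

Upper-case and lower-case letters map to the same values, so both 'F' and 'f' yield 15.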

Lists

A finite ordered list of zero or more elements is written by listing the elements inside bold brackets:

[element0, element1, ... , elementn–1]

For example, the following list contains four strings:

[“parsley”, “sage”, “rosemary”, “thyme”]

The empty list is written as [].

Unlike a set, the elements of a list are indexed by integers starting from 0. A list can contain duplicate elements.

A list can also be written using the list comprehension notation

[f(x) | x ∈ u]

which denotes the list [f(u[0]), f(u[1]), ... , f(u[|u|–1])] whose elements consist of the results of applying expression f to each corresponding element of list u. x is the name of the parameter in expression f. A predicate can be added:

[f(x) | x ∈ u such that predicate(x)]

denotes the list of the results of computing expression f on all elements x of list u that satisfy the predicate expression. The results are listed in the same order as the elements x of list u. For example,

[x^2 | x ∈ [–1, 1, 2, 3, 4, 2, 5]] = [1, 1, 4, 9, 16, 4, 25]
[x+1 | x ∈ [–1, 1, 2, 3, 4, 5, 3, 10] such that x mod 2 = 1] = [0, 2, 4, 6, 4]

Let u = [e0, e1, ... , en–1] and v = [f0, f1, ... , fm–1] be lists, e be an element, i and j be integers, and x be a value. The operations below can be done on lists. The operations are meaningful only when their preconditions are met; the semantics never use the operations below without meeting their preconditions.

Notation   Precondition   Description
|u|   (none)   The length n of the list
u[i]   0 ≤ i < |u|   The ith element ei.
u[i ... j]   0 ≤ i ≤ j+1 ≤ |u|   The list slice [ei, ei+1, ... , ej] consisting of all elements of u between the ith and the jth, inclusive. The result is the empty list [] if j=i–1.
u[i ...]   0 ≤ i ≤ |u|   The list slice [ei, ei+1, ... , en–1] consisting of all elements of u between the ith and the end. The result is the empty list [] if i=n.
u[i \ x]   0 ≤ i < |u|   The list [e0, ... , ei–1, x, ei+1, ... , en–1] with the ith element replaced by the value x and the other elements unchanged
u ⊕ v   (none)   The concatenated list [e0, e1, ... , en–1, f0, f1, ... , fm–1]
repeat(e, i)   i ≥ 0   The list [e, e, ... , e] of length i containing i identical elements e
u = v   (none)   true if the lists u and v are equal and false otherwise. Lists u and v are equal if they have the same length and all of their corresponding elements are equal.
u ≠ v   (none)   false if the lists u and v are equal and true otherwise.

Lists are functional — there is no notation for modifying a list in place.
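The list operations translate naturally into Python expressions that build new lists instead of mutating their arguments. In this sketch the helper names list_slice, substitute, and repeat are hypothetical, and the preconditions are enforced with assertions even though the semantics never violate them.

```python
def list_slice(u, i, j):
    """Elements i through j inclusive; requires 0 <= i <= j+1 <= len(u)."""
    assert 0 <= i <= j + 1 <= len(u)
    return u[i:j + 1]


def substitute(u, i, x):
    """A new list with element i replaced by x; u itself is left unchanged."""
    assert 0 <= i < len(u)
    return u[:i] + [x] + u[i + 1:]


def repeat(e, i):
    """A list of i copies of e; requires i >= 0."""
    assert i >= 0
    return [e] * i


herbs = ["parsley", "sage", "rosemary", "thyme"]
middle = list_slice(herbs, 1, 2)
swapped = substitute(herbs, 0, "basil")
```

Note that the slice bounds are inclusive at both ends, unlike Python's native slices, so list_slice(herbs, 2, 1) is the empty list.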

If T is a semantic domain, then T[] is the semantic domain of all lists whose elements are members of T. The empty list [] is a member of T[] for any semantic domain T.

In addition to the above, the some and every quantifiers can be used on lists just as on sets:

some x ∈ u satisfies predicate(x)
every x ∈ u satisfies predicate(x)

These quantifiers’ behavior on lists is analogous to that on sets, except that, if the some quantifier returns true then it leaves variable x set to the first element of list u that satisfies condition predicate(x). For example,

some x ∈ [3, 36, 19, 26] satisfies x mod 10 = 6

evaluates to true and leaves x set to 36.

Strings

A list of Char16 code units is called a string. In addition to the normal list notation, for notational convenience a string can also be written as zero or more characters enclosed in double quotes (see also the notation for non-ASCII characters). Thus,

“Wonder«LF»”

is equivalent to:

[‘W’, ‘o’, ‘n’, ‘d’, ‘e’, ‘r’, ‘«LF»’]

The empty string is written as “”.

A string holds code units, not code points. Supplementary Unicode characters are represented as pairs of surrogate code units when stored in strings.

In addition to all of the other list operations, <, ≤, >, and ≥ are defined on strings. A string x is less than string y when y is not the empty string and either x is the empty string, the first code unit of x is less than the first code unit of y, or the first code unit of x is equal to the first code unit of y and the rest of string x is less than the rest of string y.

Note that these relations compare code units, not code points, which can produce unexpected effects if a string contains supplementary characters expanded into pairs of surrogates. For example, even though ‘«uFFFF»’ < ‘«U00010000»’, the supplementary character ‘«U00010000»’ is represented in a string as “«uD800»«uDC00»”, and, by the above rules, “«uFFFF»” > “«uD800»«uDC00»”.
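The difference between code-unit and code-point ordering can be demonstrated in Python, whose own strings compare by code point. This is a sketch under stated assumptions: code_units and string_less are hypothetical helpers, and strings are modeled as lists of integer UTF-16 code units.

```python
def code_units(s):
    """Return the UTF-16 code units of a Python string as a list of integers."""
    data = s.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data), 2)]


def string_less(x, y):
    """x < y by the recursive rule for strings, applied to code-unit lists."""
    if not y:
        return False
    if not x:
        return True
    if x[0] != y[0]:
        return x[0] < y[0]
    return string_less(x[1:], y[1:])


supplementary = code_units("\U00010000")   # stored as a surrogate pair
bmp_last = code_units("\uFFFF")
```

Here string_less(supplementary, bmp_last) is true even though the code point 0x10000 is greater than 0xFFFF, because the leading surrogate 0xD800 is less than 0xFFFF.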

String is the semantic domain of all strings. String = Char16[].

Tuples

A tuple is an immutable aggregate of values comprised of a name and zero or more labeled fields.

The pseudocode defines each tuple and describes its fields. A tuple definition has the form

tuple Name
  label1: T1,
  ... ,
  labeln: Tn
end tuple;

and defines tuples with name Name to have n fields with semantic domains T1 through Tn respectively. In the HTML version of the semantics, each use of a tuple’s Name is linked back to its definition.

After Name is defined, the notation

Name⟨label1: v1, ... , labeln: vn⟩

represents a tuple with name Name and values v1 through vn for fields labeled label1 through labeln respectively. Each value vi is a member of the corresponding semantic domain Ti. When most of the fields are copied from an existing tuple a, this notation can be abbreviated as

Name⟨labeli1: vi1, ... , labelik: vik, other fields from a⟩

which represents a tuple with name Name and values vi1 through vik for fields labeled labeli1 through labelik respectively and the values of correspondingly labeled fields from a for all other fields.

If a is the tuple Name⟨label1: v1, ... , labeln: vn⟩, then

a.labeli

returns the ith field’s value vi. Tuples are functional — there is no notation for modifying a tuple in place. In the HTML version of the semantics, each use of labeli is linked back to a’s type.

The equality operators = and ≠ may be used to compare tuples. Tuples are equal when they have the same name and their corresponding fields’ values are equal.

The notation

Name

represents the semantic domain of all tuples with name Name.

Records

A record is a mutable aggregate of values similar to a tuple but with different equality behavior.

A record is comprised of a name and an address. The address points to a mutable data structure comprised of zero or more labeled fields. The address acts as the record’s serial number — every record allocated by new (see below) gets a different address, including records created by identical expressions or even the same expression used twice.

The pseudocode defines each record and describes its fields. A record definition has the form

record Name
  label1: T1,
  ... ,
  labeln: Tn
end record;

and defines records with name Name to have n fields with semantic domains T1 through Tn respectively. In the HTML version of the semantics, each use of a record’s Name is linked back to its definition.

After Name is defined, the expression

new Name⟨⟨label1: v1, ... , labeln: vn⟩⟩

creates a record with name Name and a new address. The fields labeled label1 through labeln at that address are initialized with values v1 through vn respectively. Each value vi is a member of the corresponding semantic domain Ti. A labelk: vk pair may be omitted from a new expression, which indicates that the initial value of field labelk does not matter because the semantics will always explicitly write a value into that field before reading it.

When most of the fields are copied from an existing record a, the new expression can be abbreviated as

new Name⟨⟨labeli1: vi1, ... , labelik: vik, other fields from a⟩⟩

which represents a record b with name Name and a new address. The fields labeled labeli1 through labelik at that address are initialized with values vi1 through vik respectively; the other fields at that address are initialized with the values of correspondingly labeled fields from a’s address.

If a is a record with name Name, then

a.labeli

returns the current value v of the ith field at a’s address. That field may be set to a new value w, which must be a member of the semantic domain Ti, using the assignment

a.labeli ← w

after which a.labeli will evaluate to w. Any record with a different address is unaffected by the assignment. In the HTML version of the semantics, each use of labeli is linked back to a’s type.

The equality operators = and ≠ may be used to compare records. Records are equal if and only if they have the same address.

The notation

Name

represents the infinite semantic domain of all records that have name Name and all addresses.
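The contrast between tuple equality (by value) and record equality (by address) resembles the contrast between frozen value classes and ordinary mutable objects in Python. The sketch below is an analogy only; the class names PointTuple and PointRecord and their fields are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PointTuple:
    """Tuple-like: immutable, equal when all fields are equal."""
    x: int
    y: int


@dataclass(eq=False)
class PointRecord:
    """Record-like: mutable, equal only to itself (identity comparison)."""
    x: int
    y: int


t1 = PointTuple(1, 2)
t2 = replace(t1, x=9)      # like "x: 9, other fields from t1"

r1 = PointRecord(1, 2)
r2 = PointRecord(1, 2)     # same field values, but a different address
r1.x = 5                   # like the assignment r1.x ← 5; r2 is unaffected
```

Two PointTuple values with equal fields compare equal, while r1 and r2 never do, regardless of what their fields hold.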

ECMAScript Numeric Types

ECMAScript does not support exact real numbers as one of the programmer-visible data types. Instead, ECMAScript numbers have finite range and precision. The semantic domain of all programmer-visible numbers representable in ECMAScript is GeneralNumber, defined as the union of four basic numeric semantic domains Long, ULong, Float32, and Float64:

GeneralNumber = Long ∪ ULong ∪ Float32 ∪ Float64

The four basic numeric semantic domains are all disjoint from each other and from the semantic domains Integer, Rational, and Real.

The semantic domain FiniteGeneralNumber is the subtype of all finite values in GeneralNumber:

FiniteGeneralNumber = Long ∪ ULong ∪ FiniteFloat32 ∪ FiniteFloat64

Signed Long Integers

Programmer-visible signed 64-bit long integers are represented by the semantic domain Long. These are wrapped in a tuple to keep them disjoint from members of the semantic domains ULong, Float32, and Float64.

tuple Long
  value: {–2^63 ... 2^63 – 1}
end tuple

Shorthand Notation

In this specification, when i is an integer between –2^63 and 2^63 – 1, the notation i_long indicates the result of Long⟨value: i⟩, which is the integer i wrapped in a Long tuple.

Unsigned Long Integers

Programmer-visible unsigned 64-bit long integers are represented by the semantic domain ULong. These are wrapped in a tuple to keep them disjoint from members of the semantic domains Long, Float32, and Float64.

tuple ULong
  value: {0 ... 2^64 – 1}
end tuple

Shorthand Notation

In this specification, when i is an integer between 0 and 2^64 – 1, the notation i_ulong indicates the result of ULong⟨value: i⟩, which is the integer i wrapped in a ULong tuple.
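Wrapping the integers in distinct tuples is what keeps Long, ULong, and Integer disjoint. A Python sketch of the same idea follows; the classes mirror the tuple definitions, but raising ValueError for out-of-range values is an assumption of this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Long:
    value: int

    def __post_init__(self):
        if not -2**63 <= self.value <= 2**63 - 1:
            raise ValueError("outside the Long range")


@dataclass(frozen=True)
class ULong:
    value: int

    def __post_init__(self):
        if not 0 <= self.value <= 2**64 - 1:
            raise ValueError("outside the ULong range")


five_long = Long(5)
five_ulong = ULong(5)
```

Although both wrap the integer 5, five_long, five_ulong, and the bare integer 5 are three different values, just as 5_long, 5_ulong, and 5 are in the notation.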

Single-Precision Floating-Point Numbers

Float32 is the semantic domain of all representable single-precision floating-point IEEE 754 values, with all not-a-number values considered indistinguishable from each other. Float32 is the union of the following semantic domains:

Float32 = FiniteFloat32 ∪ {+∞f32, –∞f32, NaNf32};
FiniteFloat32 = NonzeroFiniteFloat32 ∪ {+zerof32, –zerof32}

The non-zero finite values are wrapped in a tuple to keep them disjoint from members of the semantic domains Long, ULong, and Float64. The remaining values are the tags +zerof32 (positive zero), –zerof32 (negative zero), +∞f32 (positive infinity), –∞f32 (negative infinity), and NaNf32 (not a number).

tuple NonzeroFiniteFloat32
  value: NormalizedFloat32Values ∪ DenormalizedFloat32Values
end tuple

There are 4261412864 (that is, 2^32–2^25) normalized values:

NormalizedFloat32Values = {s×m×2^e | s ∈ {–1, 1}, m ∈ {2^23 ... 2^24–1}, e ∈ {–149 ... 104}}

m is called the significand.

There are also 16777214 (that is, 2^24–2) denormalized non-zero values:

DenormalizedFloat32Values = {s×m×2^–149 | s ∈ {–1, 1}, m ∈ {1 ... 2^23–1}}

m is called the significand.
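The two counts follow from multiplying the sizes of the ranges for the sign, the significand, and the exponent. The arithmetic can be checked with a few lines of Python; the variable names are hypothetical.

```python
signs = 2                                   # s is –1 or 1
normalized_significands = 2**24 - 2**23     # m from 2^23 through 2^24 – 1
exponents = 104 - (-149) + 1                # e from –149 through 104

normalized_count = signs * normalized_significands * exponents
denormalized_count = signs * (2**23 - 1)    # m from 1 through 2^23 – 1

largest_finite = (2**24 - 1) * 2**104       # largest normalized magnitude
```

Each normalized value has exactly one representation because m is confined to a single power-of-two interval, so no value is counted twice.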

Members of the semantic domain NonzeroFiniteFloat32 with value greater than zero are called positive finite. The remaining members of NonzeroFiniteFloat32 are called negative finite.

Since floating-point numbers are either tags or tuples wrapping rational numbers, the notation = and ≠ may be used to compare them. Note that = is false for different tags, so +zerof32 ≠ –zerof32 but NaNf32 = NaNf32. The ECMAScript x == y and x === y operators have different behavior for Float32 values, defined by isEqual and isStrictlyEqual.

Shorthand Notation

In this specification, when x is a real number, the notation x_f32 indicates the result of realToFloat32(x), which is the “closest” Float32 value as defined below. Thus, 3.4 is a Real number, while 3.4_f32 is a Float32 value (whose exact value is actually 3.400000095367431640625). The positive finite Float32 values range from (10^–45)_f32 to (3.4028235 × 10^38)_f32.

Conversion

The procedure realToFloat32 converts a real number x into the applicable element of Float32 as follows:

proc realToFloat32(x: Real): Float32
  s: Rational{} ← NormalizedFloat32Values ∪ DenormalizedFloat32Values ∪ {–2^128, 0, 2^128};
  Let a: Rational be the element of s closest to x (i.e. such that |a–x| is as small as possible). If two elements of s are equally close, let a be the one with an even significand; for this purpose –2^128, 0, and 2^128 are considered to have even significands.
  if a = 2^128 then return +∞f32
  elsif a = –2^128 then return –∞f32
  elsif a ≠ 0 then return NonzeroFiniteFloat32⟨value: a⟩
  elsif x < 0 then return –zerof32
  else return +zerof32
  end if
end proc

This procedure corresponds exactly to the behavior of the IEEE 754 “round to nearest” mode.
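The rounding rule can be sketched in Python with exact rationals. This is an illustration under stated assumptions, not the proposal's own code: real_to_float32 is a hypothetical name, the input must be rational, nonzero finite results are returned as bare Fraction values instead of NonzeroFiniteFloat32 tuples, and the tags are represented by strings. Rather than searching the whole set s, the sketch picks the exponent first and then rounds the scaled significand half to even, which selects the same element.

```python
from fractions import Fraction

MIN_EXP = -149   # smallest exponent; also the scale of denormalized values
MAX_EXP = 104    # largest exponent of a normalized value


def real_to_float32(x):
    """Round an exact rational to the nearest Float32, ties to even significand."""
    x = Fraction(x)
    if x == 0:
        return "+zero"
    negative = x < 0
    mag = abs(x)
    # Find the exponent e (clamped to MIN_EXP..MAX_EXP) for which the scaled
    # significand mag / 2^e falls below 2^24.
    e = MIN_EXP
    while e < MAX_EXP and mag >= Fraction(2) ** (e + 24):
        e += 1
    m = round(mag / Fraction(2) ** e)   # round() on a Fraction is half-to-even
    if m == 0:
        return "-zero" if negative else "+zero"
    if e == MAX_EXP and m >= 2**24:     # rounded up to 2^128
        return "-infinity" if negative else "+infinity"
    value = m * Fraction(2) ** e
    return -value if negative else value
```

For instance, real_to_float32("3.4") yields exactly 3.400000095367431640625, the value quoted for 3.4_f32.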

The procedure truncateFiniteFloat32 truncates a FiniteFloat32 value to an integer, rounding towards zero:

proc truncateFiniteFloat32(x: FiniteFloat32): Integer
if x ∈ {+zerof32, –zerof32} then return 0 end if;
r: Rational ← x.value;
if r > 0 then return ⌊r⌋ else return ⌈r⌉ end if
end proc
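A minimal sketch of the truncation step, assuming the finite value is modelled as an exact `Fraction` and both zero tags are modelled as `Fraction(0)`; the name `truncate_finite` is hypothetical.

```python
import math
from fractions import Fraction

def truncate_finite(value: Fraction) -> int:
    """Round an exact rational toward zero."""
    if value == 0:
        return 0
    # Floor for positive values, ceiling for negative ones.
    return math.floor(value) if value > 0 else math.ceil(value)
```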

Arithmetic

The following table defines negation of Float32 values using IEEE 754 rules. Note that exprf32 indicates the result of realToFloat32(expr).

float32Negate(x: Float32): Float32
x                Result
–∞f32            +∞f32
negative finite  (–x.value)f32
–zerof32         +zerof32
+zerof32         –zerof32
positive finite  (–x.value)f32
+∞f32            –∞f32
NaNf32           NaNf32

Double-Precision Floating-Point Numbers

Float64 is the semantic domain of all representable double-precision floating-point IEEE 754 values, with all not-a-number values considered indistinguishable from each other. Float64 is the union of the following semantic domains:

Float64 = FiniteFloat64 ∪ {+∞f64, –∞f64, NaNf64};
FiniteFloat64 = NonzeroFiniteFloat64 ∪ {+zerof64, –zerof64}

The non-zero finite values are wrapped in a tuple to keep them disjoint from members of the semantic domains Long, ULong, and Float32. The remaining values are the tags +zerof64 (positive zero), –zerof64 (negative zero), +∞f64 (positive infinity), –∞f64 (negative infinity), and NaNf64 (not a number).

tuple NonzeroFiniteFloat64
value: NormalizedFloat64Values ∪ DenormalizedFloat64Values
end tuple

There are 18428729675200069632 (that is, 2⁶⁴–2⁵⁴) normalized values:

NormalizedFloat64Values = {s×m×2ᵉ | s ∈ {–1, 1}, m ∈ {2⁵² ... 2⁵³–1}, e ∈ {–1074 ... 971}}

m is called the significand.

There are also 9007199254740990 (that is, 2⁵³–2) denormalized non-zero values:

DenormalizedFloat64Values = {s×m×2⁻¹⁰⁷⁴ | s ∈ {–1, 1}, m ∈ {1 ... 2⁵²–1}}

m is called the significand.
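The two counts quoted above follow directly from the ranges of the sign, significand, and exponent. The following arithmetic check is only an illustration; the variable names are made up for this example.

```python
signs = 2                                        # s is -1 or 1
normalized_significands = (2**53 - 1) - 2**52 + 1  # m from 2^52 through 2^53-1
exponents = 971 - (-1074) + 1                    # e from -1074 through 971
normalized_count = signs * normalized_significands * exponents

denormalized_significands = (2**52 - 1) - 1 + 1  # m from 1 through 2^52-1
denormalized_count = signs * denormalized_significands

print(normalized_count, denormalized_count)
```

There are 2046 exponents, and 2 × 2⁵² × 2046 = 2⁵³ × (2048 – 2) = 2⁶⁴ – 2⁵⁴.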

Members of the semantic domain NonzeroFiniteFloat64 with value greater than zero are called positive finite. The remaining members of NonzeroFiniteFloat64 are called negative finite.

Since floating-point numbers are either tags or tuples wrapping rational numbers, the notation = and ≠ may be used to compare them. Note that = is false for different tags, so +zerof64 ≠ –zerof64 but NaNf64 = NaNf64. The ECMAScript x == y and x === y operators have different behavior for Float64 values, defined by isEqual and isStrictlyEqual.

Shorthand Notation

In this specification, when x is a real number, the notation xf64 indicates the result of realToFloat64(x), which is the “closest” Float64 value as defined below. Thus, 3.4 is a Real number, while 3.4f64 is a Float64 value (whose exact value is actually 3.399999999999999911182158029987476766109466552734375). The positive finite Float64 values range from (5 × 10⁻³²⁴)f64 to (1.7976931348623157 × 10³⁰⁸)f64.
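These figures can be checked in any environment whose floating-point type is IEEE 754 double precision. The sketch below assumes that Python's `float` is such a type, which holds on mainstream platforms but is not guaranteed by the language.

```python
import sys
from decimal import Decimal
from fractions import Fraction

# Exact decimal expansion of the double nearest to 3.4.
exact_3_4 = Decimal(3.4)

# Smallest and largest positive finite doubles as exact rationals.
smallest = Fraction(5e-324)
largest = Fraction(sys.float_info.max)

print(exact_3_4)
print(smallest == Fraction(1, 2**1074), largest == (2**53 - 1) * 2**971)
```

The smallest value is the denormalized number with significand 1, and the largest is the normalized number with significand 2⁵³–1 and exponent 971.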

Conversion

The procedure realToFloat64 converts a real number x into the applicable element of Float64 as follows:

proc realToFloat64(x: Real): Float64
s: Rational{} ← NormalizedFloat64Values ∪ DenormalizedFloat64Values ∪ {–2¹⁰²⁴, 0, 2¹⁰²⁴};
Let a: Rational be the element of s closest to x (i.e. such that |a–x| is as small as possible). If two elements of s are equally close, let a be the one with an even significand; for this purpose –2¹⁰²⁴, 0, and 2¹⁰²⁴ are considered to have even significands.
if a = 2¹⁰²⁴ then return +∞f64
elsif a = –2¹⁰²⁴ then return –∞f64
elsif a ≠ 0 then return NonzeroFiniteFloat64〈value: a〉
elsif x < 0 then return –zerof64
else return +zerof64
end if
end proc

This procedure corresponds exactly to the behavior of the IEEE 754 “round to nearest” mode.

The procedure float32ToFloat64 converts a Float32 number x into the corresponding Float64 number as defined by the following table:

proc float32ToFloat64(x: Float32): Float64
x                               Result
–∞f32                           –∞f64
–zerof32                        –zerof64
+zerof32                        +zerof64
+∞f32                           +∞f64
NaNf32                          NaNf64
Any NonzeroFiniteFloat32 value  NonzeroFiniteFloat64〈value: x.value〉

The procedure truncateFiniteFloat64 truncates a FiniteFloat64 value to an integer, rounding towards zero:

proc truncateFiniteFloat64(x: FiniteFloat64): Integer
if x ∈ {+zerof64, –zerof64} then return 0 end if;
r: Rational ← x.value;
if r > 0 then return ⌊r⌋ else return ⌈r⌉ end if
end proc

Arithmetic

The following tables define procedures that perform common arithmetic on Float64 values using IEEE 754 rules. Note that exprf64 indicates the result of realToFloat64(expr).

float64Abs(x: Float64): Float64
x                Result
–∞f64            +∞f64
negative finite  (–x.value)f64
–zerof64         +zerof64
+zerof64         +zerof64
positive finite  x
+∞f64            +∞f64
NaNf64           NaNf64

float64Negate(x: Float64): Float64
x                Result
–∞f64            +∞f64
negative finite  (–x.value)f64
–zerof64         +zerof64
+zerof64         –zerof64
positive finite  (–x.value)f64
+∞f64            –∞f64
NaNf64           NaNf64

float64Add(x: Float64, y: Float64): Float64
x \ y            –∞f64     negative finite         –zerof64  +zerof64  positive finite         +∞f64     NaNf64
–∞f64            –∞f64     –∞f64                   –∞f64     –∞f64     –∞f64                   NaNf64    NaNf64
negative finite  –∞f64     (x.value + y.value)f64  x         x         (x.value + y.value)f64  +∞f64     NaNf64
–zerof64         –∞f64     y                       –zerof64  +zerof64  y                       +∞f64     NaNf64
+zerof64         –∞f64     y                       +zerof64  +zerof64  y                       +∞f64     NaNf64
positive finite  –∞f64     (x.value + y.value)f64  x         x         (x.value + y.value)f64  +∞f64     NaNf64
+∞f64            NaNf64    +∞f64                   +∞f64     +∞f64     +∞f64                   +∞f64     NaNf64
NaNf64           NaNf64    NaNf64                  NaNf64    NaNf64    NaNf64                  NaNf64    NaNf64
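A few cells of the addition table can be spot-checked against a host implementation of IEEE 754 arithmetic. The sketch below assumes that Python's `float` is an IEEE 754 double; the helper `tag` and its result strings are hypothetical names that mirror the row and column labels.

```python
import math

def tag(v: float) -> str:
    """Classify a float the way the table rows and columns do."""
    if math.isnan(v):
        return "NaN"
    if math.isinf(v):
        return "+inf" if v > 0 else "-inf"
    if v == 0:
        # copysign exposes the sign of a zero, which == cannot see.
        return "+zero" if math.copysign(1.0, v) > 0 else "-zero"
    return "positive finite" if v > 0 else "negative finite"

spot_checks = [
    tag(-0.0 + 0.0),            # row -zero, column +zero
    tag(-0.0 + -0.0),           # row -zero, column -zero
    tag(math.inf + -math.inf),  # row +inf, column -inf
    tag(1.5 + -1.5),            # finite operands whose exact sum is 0
    tag(-math.inf + 1.0),       # row -inf, column positive finite
]
print(spot_checks)
```

The fourth check illustrates that an exact sum of 0 becomes +zerof64, because realToFloat64(0) takes the final else branch.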

float64Subtract(x: Float64, y: Float64): Float64
x \ y            –∞f64     negative finite         –zerof64  +zerof64  positive finite         +∞f64     NaNf64
–∞f64            NaNf64    –∞f64                   –∞f64     –∞f64     –∞f64                   –∞f64     NaNf64
negative finite  +∞f64     (x.value – y.value)f64  x         x         (x.value – y.value)f64  –∞f64     NaNf64
–zerof64         +∞f64     (–y.value)f64           +zerof64  –zerof64  (–y.value)f64           –∞f64     NaNf64
+zerof64         +∞f64     (–y.value)f64           +zerof64  +zerof64  (–y.value)f64           –∞f64     NaNf64
positive finite  +∞f64     (x.value – y.value)f64  x         x         (x.value – y.value)f64  –∞f64     NaNf64
+∞f64            +∞f64     +∞f64                   +∞f64     +∞f64     +∞f64                   NaNf64    NaNf64
NaNf64           NaNf64    NaNf64                  NaNf64    NaNf64    NaNf64                  NaNf64    NaNf64

float64Multiply(x: Float64, y: Float64): Float64
x \ y            –∞f64     negative finite         –zerof64  +zerof64  positive finite         +∞f64     NaNf64
–∞f64            +∞f64     +∞f64                   NaNf64    NaNf64    –∞f64                   –∞f64     NaNf64
negative finite  +∞f64     (x.value × y.value)f64  +zerof64  –zerof64  (x.value × y.value)f64  –∞f64     NaNf64
–zerof64         NaNf64    +zerof64                +zerof64  –zerof64  –zerof64                NaNf64    NaNf64
+zerof64         NaNf64    –zerof64                –zerof64  +zerof64  +zerof64                NaNf64    NaNf64
positive finite  –∞f64     (x.value × y.value)f64  –zerof64  +zerof64  (x.value × y.value)f64  +∞f64     NaNf64
+∞f64            –∞f64     –∞f64                   NaNf64    NaNf64    +∞f64                   +∞f64     NaNf64
NaNf64           NaNf64    NaNf64                  NaNf64    NaNf64    NaNf64                  NaNf64    NaNf64

float64Divide(x: Float64, y: Float64): Float64
x \ y            –∞f64     negative finite         –zerof64  +zerof64  positive finite         +∞f64     NaNf64
–∞f64            NaNf64    +∞f64                   +∞f64     –∞f64     –∞f64                   NaNf64    NaNf64
negative finite  +zerof64  (x.value / y.value)f64  +∞f64     –∞f64     (x.value / y.value)f64  –zerof64  NaNf64
–zerof64         +zerof64  +zerof64                NaNf64    NaNf64    –zerof64                –zerof64  NaNf64
+zerof64         –zerof64  –zerof64                NaNf64    NaNf64    +zerof64                +zerof64  NaNf64
positive finite  –zerof64  (x.value / y.value)f64  –∞f64     +∞f64     (x.value / y.value)f64  +zerof64  NaNf64
+∞f64            NaNf64    –∞f64                   –∞f64     +∞f64     +∞f64                   NaNf64    NaNf64
NaNf64           NaNf64    NaNf64                  NaNf64    NaNf64    NaNf64                  NaNf64    NaNf64

float64Remainder(x: Float64, y: Float64): Float64
x \ y            –∞f64, +∞f64  positive or negative finite                           –zerof64, +zerof64  NaNf64
–∞f64            NaNf64        NaNf64                                                NaNf64              NaNf64
negative finite  x             float64Negate(float64Remainder(float64Negate(x), y))  NaNf64              NaNf64
–zerof64         –zerof64      –zerof64                                              NaNf64              NaNf64
+zerof64         +zerof64      +zerof64                                              NaNf64              NaNf64
positive finite  x             (x.value – |y.value|×⌊x.value/|y.value|⌋)f64          NaNf64              NaNf64
+∞f64            NaNf64        NaNf64                                                NaNf64              NaNf64
NaNf64           NaNf64        NaNf64                                                NaNf64              NaNf64

Note that float64Remainder(float64Negate(x), y) always produces the same result as float64Negate(float64Remainder(x, y)). Also, float64Remainder(x, float64Negate(y)) always produces the same result as float64Remainder(x, y).
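The remainder table and the two identities can be exercised with a short sketch. It assumes Python's `float` is an IEEE 754 double and uses exact `Fraction` arithmetic for the finite case; the name `float64_remainder` is chosen for this example and the code is not taken from the proposal.

```python
import math
from fractions import Fraction

def float64_remainder(x: float, y: float) -> float:
    """Truncating remainder laid out like the table: result takes the sign of x."""
    if math.isnan(x) or math.isnan(y) or math.isinf(x) or y == 0:
        return math.nan
    if math.isinf(y) or x == 0:
        return x                      # x unchanged, including the sign of a zero
    if x < 0:
        # negative finite row: negate, take the remainder, negate back
        return -float64_remainder(-x, y)
    xv, yv = Fraction(x), abs(Fraction(y))
    # x - |y| * floor(x / |y|), computed exactly and then converted
    return float(xv - yv * (xv // yv))

r1 = float64_remainder(7.5, 2.0)
r2 = float64_remainder(-7.5, 2.0)
r3 = float64_remainder(7.5, -2.0)
r4 = float64_remainder(-4.0, 2.0)
```

For finite nonzero operands the results agree with `math.fmod`, which follows the same sign convention. Note that `math.fmod` raises an exception for some of the cases that the table maps to NaNf64, so the sketch handles those cases itself.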

Procedures

A procedure is a function that receives zero or more arguments, performs computations, and optionally returns a result. Procedures may perform side effects. In this document the word procedure is used to refer to internal algorithms; the word function is used to refer to the programmer-visible function ECMAScript construct.

A procedure is denoted as:

proc f(param1: T1, ... , paramn: Tn): T
step1;
step2;
... ;
stepm
end proc;

If the procedure does not return a value, the : T on the first line is omitted.

f is the procedure’s name, param1 through paramn are the procedure’s parameters, T1 through Tn are the parameters’ respective semantic domains, T is the semantic domain of the procedure’s result, and step1 through stepm describe the procedure’s computation steps, which may produce side effects and/or return a result. If T is omitted, the procedure does not return a result. When the procedure is called with argument values v1 through vn, the procedure’s steps are performed and the result, if any, returned to the caller.

A procedure’s steps can refer to the parameters param1 through paramn; each reference to a parameter parami evaluates to the corresponding argument value vi. Procedure parameters are statically scoped. Arguments are passed by value.

Operations

The only operation done on a procedure f is calling it using the f(arg1, ..., argn) syntax. f is computed first, followed by the argument expressions arg1 through argn, in left-to-right order. If the result of computing f or any of the argument expressions throws an exception e, then the call immediately propagates e without computing any following argument expressions. Otherwise, f is invoked using the provided arguments and the resulting value, if any, returned to the caller.
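The evaluation order described above can be illustrated with a small trace. Python happens to use the same order (callee first, then arguments left to right), so the sketch below simply records what gets computed; `call_demo` and `traced` are hypothetical helper names.

```python
def call_demo():
    log = []

    def traced(name, value=None, fail=False):
        """Record that an expression was computed, then return or raise."""
        log.append(name)
        if fail:
            raise ValueError(name)
        return value

    try:
        # f is computed first, then arg1, arg2, arg3 in order.
        traced("f", lambda a, b, c: a + b + c)(
            traced("arg1", 1),
            traced("arg2", fail=True),   # raises: arg3 is never computed
            traced("arg3", 3),
        )
    except ValueError as e:
        log.append("propagated " + str(e))
    return log

trace = call_demo()
print(trace)
```

The trace contains no entry for arg3, which shows that the exception is propagated without computing the remaining argument expressions.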

Procedures are never compared using =, ≠, or any of the other comparison operators.

Semantic Domains of Procedures

The semantic domain of procedures that take n parameters in semantic domains T1 through Tn respectively and produce a result in semantic domain T is written as T1 × T2 × ... × Tn → T. If n = 0, this semantic domain is written as () → T. If the procedure does not produce a result, the semantic domain of procedures is written either as T1 × T2 × ... × Tn → () or as () → ().

Computation Steps

Computation steps in procedures are described using a mixture of English and formal notation. The various kinds of formal steps are described in this section. Multiple steps are separated by semicolons and performed in order unless an earlier step exits via a return or propagates an exception.

Informal steps state invariants and provide comments.

nothing

A nothing step performs no operation.

note Comment

A note step performs no operation. It provides an informative comment about the algorithm. If Comment is an expression, then the note step is an informative comment that asserts that the expression, if evaluated at this point, would be guaranteed to evaluate to true.

expression

A computation step may consist of an expression. The expression is computed and its value, if any, ignored.

v: T ← expression
v ← expression

An assignment step is indicated using the assignment operator ←. This step computes the value of expression and assigns the result to the temporary variable or mutable global v. If this is the first time the temporary variable is referenced in a procedure, the variable’s semantic domain T is listed; any value stored in v is guaranteed to be a member of the semantic domain T.

v: T

This step declares v to be a temporary variable with semantic domain T without assigning anything to the variable. v will not be read unless some other step first assigns a value to it.

Action[nonterminali] ← expression

Inside an action, the assignment operator can also be used to define the result of another action Action[nonterminali] applied to the current expansion of nonterminali, which must appear on the current production’s left or right side. Such an assignment is done only when the value of Action[nonterminali] is not defined explicitly via an action. The value of Action[nonterminali] is set at most once and never modified afterwards. If the same nonterminal is expanded several times while parsing a source program, all such expansions are treated independently.

Temporary variables are local to the procedures that define them (including any nested procedures). Each time a procedure is called it gets a new set of temporary variables.

a.label ← expression

This form of assignment sets the value of field label of record a to the value of expression.

if expression1 then step; step; ...; step
elsif expression2 then step; step; ...; step
...
elsif expressionn then step; step; ...; step
else step; step; ...; step
end if

An if step computes expression1, which will evaluate to either true or false. If it is true, the first list of steps is performed. Otherwise, expression2 is computed and tested, and so on. If no expression evaluates to true, the list of steps following the else is performed. The else clause may be omitted, in which case no action is taken when no expression evaluates to true.

case expression of
T1 do step; step; ...; step;
T2 do step; step; ...; step;
...;
Tn do step; step; ...; step
else step; step; ...; step
end case

A case step computes expression, which will evaluate to a value v. If v ∈ T1, then the first list of steps is performed. Otherwise, if v ∈ T2, then the second list of steps is performed, and so on. If v is not a member of any Ti, the list of steps following the else is performed. The else clause may be omitted, in which case v will always be a member of some Ti.

while expression do
step;
step;
...;
step
end while

A while step computes expression, which will evaluate to either true or false. If it is false, no action is taken. If it is true, the list of steps is performed and then expression is computed and tested again. This repeats until expression returns false (or until the procedure exits via a return or an exception is propagated out).

for each x ∈ expression do
step;
step;
...;
step
end for each

A for each step computes expression, which will evaluate to either a set or a list A. The list of steps is performed repeatedly with variable x bound to each element of A. If A is a list, x is bound to each of its elements in order; if A is a set, the order in which x is bound to its elements is arbitrary. The repetition ends after x has been bound to all elements of A (or when either the procedure exits via a return or an exception is propagated out).

return expression

A return step computes expression to obtain a value v and returns from the enclosing procedure with the result v. No further steps in the enclosing procedure are performed. The expression may be omitted, in which case the enclosing procedure returns with no result.

Exceptions

throw expression

A throw step computes expression to obtain a value v and begins propagating exception v outwards, exiting partially performed steps and procedure calls until the exception is caught by a catch step. Unless the enclosing procedure catches this exception, no further steps in the enclosing procedure are performed.

try
step;
step;
...;
step
catch v: T do
step;
step;
...;
step
end try

A try step performs the first list of steps. If they complete normally (or if they return), then the try step is done. If any of the steps propagates out an exception e, then if e ∈ T, then exception e stops propagating, variable v is bound to the value e, and the second list of steps is performed. If e ∉ T, then exception e keeps propagating out.

A try step does not intercept exceptions that may be propagated out of its second list of steps.
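The three behaviours of a try step (catching an exception in T, letting other exceptions pass, and not intercepting exceptions raised by the handler) can be sketched as follows. The class names and the helper `try_step` are hypothetical, and Python exception classes stand in for semantic domains.

```python
class SyntaxErrorTag(Exception):
    """Stand-in for an exception value in the domain T."""

class RangeErrorTag(Exception):
    """Stand-in for an exception value outside T."""

def try_step(body, domain, handler):
    """Run body(); if it raises an exception in `domain`, run handler(e)."""
    try:
        return body()
    except domain as e:
        # Exceptions raised by handler(e) are not caught by this try.
        return handler(e)

def raise_(exc):
    raise exc

def outcome(body, handler):
    """Report what escapes from a try step that catches SyntaxErrorTag."""
    try:
        return try_step(body, SyntaxErrorTag, handler)
    except Exception as e:
        return "escaped " + type(e).__name__

caught = outcome(lambda: raise_(SyntaxErrorTag()), lambda e: "handled")
passed = outcome(lambda: raise_(RangeErrorTag()), lambda e: "handled")
rethrown = outcome(lambda: raise_(SyntaxErrorTag()), lambda e: raise_(RangeErrorTag()))
```

One difference from the notation is that a return inside the first list of steps returns from the enclosing procedure, whereas this helper only returns the value of `body()`.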

Nested Procedures

An inner proc may be nested as a step inside an outer proc. In this case the inner procedure is a closure and can access the parameters and temporaries of the outer procedure.

Semantic Actions

Semantic actions tie the grammar and the semantics together. A semantic action ascribes semantic meaning to a grammar production.

To illustrate the use of semantic actions, we shall look at an example, followed by a description of the notation for specifying semantic actions.

Example

Consider the following grammar, with the start nonterminal Numeral:

Digit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Digits ⇒
   Digit
|  Digits Digit
Numeral ⇒
   Digits
|  Digits # Digits

This grammar defines the syntax of an acceptable input: “37”, “33#4” and “30#2” are acceptable syntactically, while “1a” is not. However, the grammar does not indicate what these various inputs mean. That is the job of the semantics, which are defined in terms of actions on the parse tree of grammar rule expansions. Consider the following sample set of actions defined on this grammar, with a starting Numeral action called (in this example) Value:

tag syntaxError;
SemanticException = {syntaxError};
Value[Digit]: Integer = digitValue(Digit);
DecimalValue[Digits]: Integer;
DecimalValue[Digits ⇒ Digit] = Value[Digit];
DecimalValue[Digits0 ⇒ Digits1 Digit] = 10×DecimalValue[Digits1] + Value[Digit];
proc BaseValue[Digits] (base: Integer): Integer
[Digits ⇒ Digit] do
d: Integer ← Value[Digit];
if d < base then return d else throw syntaxError end if;
[Digits0 ⇒ Digits1 Digit] do
d: Integer ← Value[Digit];
if d < base then return base×BaseValue[Digits1](base) + d
else throw syntaxError
end if
end proc;
Value[Numeral]: Integer;
Value[Numeral ⇒ Digits] = DecimalValue[Digits];
Value[Numeral ⇒ Digits1 # Digits2]
begin
base: Integer ← DecimalValue[Digits2];
if base ≥ 2 and base ≤ 10 then return BaseValue[Digits1](base)
else throw syntaxError
end if
end;

Action names are written in violet cursive type. The definition

Value[Numeral]: Integer;

states that the action Value can be applied to any expansion of the nonterminal Numeral, and the result is an Integer. This action either maps an input to an integer or throws an exception. The code above throws the exception syntaxError when presented with the input “30#2”.

There are two definitions of the Value action on Numeral, one for each grammar production that expands Numeral:

Value[Numeral ⇒ Digits] = DecimalValue[Digits];
Value[Numeral ⇒ Digits1 # Digits2]
begin
base: Integer ← DecimalValue[Digits2];
if base ≥ 2 and base ≤ 10 then return BaseValue[Digits1](base)
else throw syntaxError
end if
end;

Each definition of an action is allowed to perform actions on the terminals and nonterminals on the right side of the expansion. For example, Value applied to the first Numeral production (the one that expands Numeral into Digits) simply applies the DecimalValue action to the expansion of the nonterminal Digits and returns the result. On the other hand, Value applied to the second Numeral production (the one that expands Numeral into Digits # Digits) performs a computation using the results of the DecimalValue and BaseValue applied to the two expansions of the Digits nonterminals. In this case there are two identical nonterminals Digits on the right side of the expansion, so subscripts are used to indicate on which the actions DecimalValue and BaseValue are performed.

The definition

proc BaseValue[Digits] (base: Integer): Integer
[Digits ⇒ Digit] do
d: Integer ← Value[Digit];
if d < base then return d else throw syntaxError end if;
[Digits0 ⇒ Digits1 Digit] do
d: Integer ← Value[Digit];
if d < base then return base×BaseValue[Digits1](base) + d
else throw syntaxError
end if
end proc;

states that the action BaseValue can be applied to any expansion of the nonterminal Digits, and the result is a procedure that takes one Integer argument base and returns an Integer. The procedure’s body is comprised of independent cases for each production that expands Digits. When the procedure is called, the case corresponding to the expansion of the nonterminal Digits is evaluated.

The Value action on Digit illustrates the direct use of a nonterminal in a semantic expression: digitValue(Digit). Using the nonterminal Digit in this way refers to the character into which the Digit grammar rule expands.

We can fully evaluate the semantics on our sample inputs to get the following results:

Input    Semantic Result
37 37
33#4 15
30#2 throw syntaxError
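The three results can be reproduced with a direct transcription of the actions into recursive functions. This is only a sketch: it assumes the input has already been accepted by the grammar (digits with at most one #), it works on strings instead of parse trees, and the Python names are invented for this example.

```python
class NumeralSyntaxError(Exception):
    """Stand-in for the syntaxError tag."""

def decimal_value(digits: str) -> int:
    # Digits => Digit          : Value[Digit]
    # Digits => Digits1 Digit  : 10 * DecimalValue[Digits1] + Value[Digit]
    if len(digits) == 1:
        return int(digits)
    return 10 * decimal_value(digits[:-1]) + int(digits[-1])

def base_value(digits: str, base: int) -> int:
    # Both productions first check the last digit against the base.
    d = int(digits[-1])
    if d >= base:
        raise NumeralSyntaxError(digits)
    if len(digits) == 1:
        return d
    return base * base_value(digits[:-1], base) + d

def value(numeral: str) -> int:
    # Numeral => Digits            : DecimalValue[Digits]
    # Numeral => Digits1 # Digits2 : BaseValue[Digits1](DecimalValue[Digits2])
    if "#" not in numeral:
        return decimal_value(numeral)
    digits1, digits2 = numeral.split("#")
    base = decimal_value(digits2)
    if 2 <= base <= 10:
        return base_value(digits1, base)
    raise NumeralSyntaxError(numeral)

print(value("37"), value("33#4"))
```

Calling `value("30#2")` raises `NumeralSyntaxError`, because the digit 3 is not less than the base 2.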

Abbreviated Actions

In some cases all the actions named A for a nonterminal N’s rule are repetitive, merely calling A on the nonterminals on the right side of the expansions of N in the grammar. In these cases the semantics of action A are abbreviated, as illustrated by the example below.

Given the grammar rule

Expression ⇒
   Subexpression
|  Expression * Subexpression
|  Subexpression + Subexpression
|  this

the notation

Validate[Expression] (cxt: Context, env: Environment) propagates the call to Validate to nonterminals in the expansion of Expression.

is an abbreviation for the following:

proc Validate[Expression] (cxt: Context, env: Environment)
[Expression ⇒ Subexpression] do Validate[Subexpression](cxt, env);
[Expression0 ⇒ Expression1 * Subexpression] do
Validate[Expression1](cxt, env);
Validate[Subexpression](cxt, env);
[Expression ⇒ Subexpression1 + Subexpression2] do
Validate[Subexpression1](cxt, env);
Validate[Subexpression2](cxt, env);
[Expression ⇒ this] do nothing
end proc;

Note that:

  • The expanded calls to Validate get the same arguments cxt and env passed in to the call to Validate on Expression.
  • When an expansion of Expression has more than one nonterminal on its right side, Validate is called on all of the nonterminals in left-to-right order.
  • When an expansion of Expression has no nonterminals on its right side, Validate does nothing.
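The three points above amount to a left-to-right walk over the nonterminal children of a parse tree node. The following sketch assumes a hypothetical `Node` type that stores only nonterminal children (terminals such as * or this are omitted) and records the visit order in place of real validation work.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    nonterminal: str
    children: list = field(default_factory=list)  # nonterminal children only

def validate(node: Node, cxt, env, visited: list) -> None:
    """Propagate the call, with the same cxt and env, to every child in order."""
    visited.append(node.nonterminal)
    for child in node.children:       # no children: nothing further happens
        validate(child, cxt, env, visited)

# Parse tree for an input of the form  Subexpression * Subexpression
tree = Node("Expression0", [
    Node("Expression1", [Node("SubexpressionA")]),
    Node("SubexpressionB"),
])
order = []
validate(tree, cxt="cxt", env="env", visited=order)
print(order)
```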

The propagation notation is also used when the actions return a value. In this case each expansion must have exactly one nonterminal. For example, given the grammar rule

Id ⇒
   SimpleId
|  ComplexId

the notation

Eval[Id] (env: Environment, phase: Phase): Multiname propagates the call to Eval to nonterminals in the expansion of Id.

is an abbreviation for the following:

proc Eval[Id] (env: Environment, phase: Phase): Multiname
[Id ⇒ SimpleId] do return Eval[SimpleId](env, phase);
[Id ⇒ ComplexId] do return Eval[ComplexId](env, phase)
end proc;

Semantic Definition Summary

The following notation is used to define top-level semantic entities:

Name = expression;

This notation defines Name to be a shorthand for the semantic domain expression. In the HTML version of the semantics, each use of Name is linked back to this definition.

name: T = expression;

This notation defines name to be a constant value given by the result of computing expression. The value is guaranteed to be a member of the semantic domain T. In the HTML version of the semantics, each use of name is linked back to this definition.

name: T ← expression;

This notation defines name to be a mutable global value. Its initial value is the result of computing expression, but it may be subsequently altered using an assignment. The value is guaranteed to be a member of the semantic domain T. In the HTML version of the semantics, each use of name is linked back to this definition.

proc f(param1: T1, ... , paramn: Tn): T
step1;
step2;
... ;
stepm
end proc;

This notation defines f to be a procedure.

tag name;

This notation defines a tag.

tuple Name
label1: T1,
... ,
labeln: Tn
end tuple;

This notation defines a tuple.

record Name
label1: T1,
... ,
labeln: Tn
end record;

This notation defines a record.

Action[nonterminal]: T;

This notation states that action Action can be performed on nonterminal nonterminal and returns a value that is a member of the semantic domain T. The action’s value is either defined using the notation Action[nonterminal ⇒ expansion] = expression below or set as a side effect of computing another action via an action assignment.

Action[nonterminal ⇒ expansion] = expression;

This notation specifies the value that action Action on nonterminal nonterminal computes in the case where nonterminal nonterminal expands to the given expansion. expansion can contain zero or more terminals and nonterminals (as well as other notations allowed on the right side of a grammar production). Furthermore, the terminals and nonterminals of expansion can be subscripted to allow them to be unambiguously referenced by action references or nonterminal references inside expression.

Action[nonterminal ⇒ expansion]: T = expression;

This notation combines the above two — it specifies the semantic domain of the action as well as its value.

Action[nonterminal ⇒ expansion]
begin
step1;
step2;
... ;
stepm
end;

This notation is used when the computation of the action is too complex for an expression. Here the steps to compute the action are listed as step1 through stepm. A return step produces the value of the action.

proc Action[nonterminal ⇒ expansion] (param1: T1, ... , paramn: Tn): T
step1;
step2;
... ;
stepm
end proc;

This notation is used only when Action returns a procedure when applied to nonterminal nonterminal with a single expansion expansion. Here the steps of the procedure are listed as step1 through stepm.

proc Action[nonterminal] (param1: T1, ... , paramn: Tn): T
[nonterminal ⇒ expansion1] do
step;
... ;
step;
[nonterminal ⇒ expansion2] do
step;
... ;
step;
...;
[nonterminal ⇒ expansionn] do
step;
... ;
step
end proc;

This notation is used only when Action returns a procedure when applied to nonterminal nonterminal with several expansions expansion1 through expansionn. The procedure is comprised of a series of cases, one for each expansion. Only the steps corresponding to the expansion found by the grammar parser used are evaluated.

Action[nonterminal] (param1: T1, ... , paramn: Tn) propagates the call to Action to every nonterminal in the expansion of nonterminal.

This notation is an abbreviation stating that calling Action on nonterminal causes Action to be called with the same arguments on every nonterminal on the right side of the appropriate expansion of nonterminal.


Stages

Tuesday, October 15, 2002

The source code is processed in the following stages:

  1. If necessary, convert the source code into the Unicode UTF-16 format, normalized form C.
  2. Remove any Unicode format control characters (category Cf) from the source code.
  3. Simultaneously split the source code into input elements using the lexical grammar and semantics and parse it using the syntactic grammar to obtain a parse tree P.
  4. Evaluate P using the syntactic semantics by computing the action Eval on it.
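The first two stages can be sketched with the Unicode database that ships with Python. This is an assumption-laden illustration: `preprocess` is a hypothetical name, Python strings are sequences of code points and not UTF-16 code units, and the set of category Cf characters depends on the Unicode version of the host library.

```python
import unicodedata

def preprocess(source: str) -> str:
    """Stages 1 and 2: normalize to form C, then drop format-control characters."""
    normalized = unicodedata.normalize("NFC", source)
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")

# "e" followed by a combining acute accent composes to a single character.
composed = preprocess("e\u0301")
# ZERO WIDTH JOINER, BYTE ORDER MARK and SOFT HYPHEN are all category Cf.
stripped = preprocess("a\u200db\ufeffc\u00add")
```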

Lexing and Parsing

Processing stage 3 is done as follows:

  1. Let inputElements be an empty array of input elements (syntactic grammar terminals and line breaks).
  2. Let input be the input sequence of Unicode characters. Append a special placeholder End to the end of input.
  3. Let state be a variable that holds one of the constants re, div, or num. Initialize it to re.
  4. Apply the lexical grammar to parse the longest possible prefix of input. Use the start symbol NextInputElementre, NextInputElementdiv, or NextInputElementnum depending on whether state is re, div, or num, respectively. The result of the parse should be a lexical grammar parse tree T. If the parse failed, return a syntax error.
  5. Compute the action InputElement on T to obtain an InputElement e.
  6. If e is the endOfInput input element, go to step 15.
  7. Remove the characters matched by T from input, leaving only the yet-unlexed suffix of input.
  8. Interpret e as a syntactic grammar terminal or line break as follows:
    • A lineBreak is interpreted as a line break, which is not a terminal itself but indicates one or more line breaks between two terminals. It prevents the syntactic grammar from matching any productions that have a [no line break] annotation in the place where the lineBreak occurred.
    • An Identifier s is interpreted as the terminal Identifier. Applying the semantic action Name to the Identifier returns the String value s.name.
    • A Keyword s is interpreted as the reserved word, future reserved word, or non-reserved word terminal corresponding to the Keyword’s String s.
    • A Punctuator s is interpreted as the punctuation token or future punctuation token terminal corresponding to the Punctuator’s String s.
    • A NumberToken x is interpreted as the terminal Number. Applying the semantic action Value to the Number returns the GeneralNumber value x.value.
    • A negatedMinLong, which results from a numeric long token with the value 2⁶³, is interpreted as the terminal NegatedMinLong.
    • A StringToken s is interpreted as the terminal String. Applying the semantic action Value to the String returns the String value s.value.
    • A RegularExpression z is interpreted as the terminal RegularExpression.
  9. Append the resulting terminal or line break to the end of the inputElements array.
  10. If the inputElements array forms a valid prefix of the context-free language defined by the syntactic grammar, go to step 13.
  11. If e is not a lineBreak but the previous element of the inputElements array is a lineBreak, then insert a VirtualSemicolon terminal between that lineBreak and e in the inputElements array.
  12. If the inputElements array still does not form a valid prefix of the context-free language defined by the syntactic grammar, signal a syntax error and stop.
  13. If e is a Number or NegatedMinLong, then set state to num. Otherwise, if the inputElements array followed by the terminal / forms a valid prefix of the context-free language defined by the syntactic grammar, then set state to div; otherwise, set state to re.
  14. Go to step 4.
  15. If the inputElements array does not form a valid sentence of the context-free language defined by the syntactic grammar, signal a syntax error and stop.
  16. Return the parse tree obtained by the syntactic grammar’s derivation of the sentence formed by the inputElements array.

Lexical Grammar

Monday, June 30, 2003

This LALR(1) grammar describes the lexical syntax of the ECMAScript 4 proposal. See also the description of the grammar notation.

This document is also available as a Word RTF file.

The lexer’s start symbols are: NextInputElementnum if the previous input element was a number; NextInputElementre if the previous input element was not a number and a / should be interpreted as a regular expression; and NextInputElementdiv if the previous input element was not a number and a / should be interpreted as a division or division-assignment operator.

In addition to the above, the start symbol StringNumericLiteral is used by the syntactic semantics for string-to-number conversions and the start symbol StringDecimalLiteral is used by the syntactic semantics for implementing the parseFloat function.

Unicode Character Classes

UnicodeCharacter ⇒ Any Unicode character
UnicodeInitialAlphabetic ⇒ Any character in category Lu (uppercase letter), Ll (lowercase letter), Lt (titlecase letter), Lm (modifier letter), Lo (other letter), or Nl (letter number) in the Unicode Character Database
UnicodeAlphanumeric ⇒ Any character in category Lu (uppercase letter), Ll (lowercase letter), Lt (titlecase letter), Lm (modifier letter), Lo (other letter), Nd (decimal number), Nl (letter number), Mn (non-spacing mark), Mc (combining spacing mark), or Pc (connector punctuation) in the Unicode Character Database
WhiteSpaceCharacter ⇒
   «TAB» | «VT» | «FF» | «SP» | «u00A0»
|  «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»
|  «u2008» | «u2009» | «u200A» | «u200B»
|  «u3000»
LineTerminator ⇒ «LF» | «CR» | «u0085» | «u2028» | «u2029»
ASCIIDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Comments

LineComment ⇒ / / LineCommentCharacters
LineCommentCharacters ⇒
   «empty»
NonTerminator ⇒ UnicodeCharacter except LineTerminator
SingleLineBlockComment ⇒ / * BlockCommentCharacters * /
BlockCommentCharacters ⇒
   «empty»
PreSlashCharacters ⇒
   «empty»
NonTerminatorOrSlash ⇒ NonTerminator except /
NonTerminatorOrAsteriskOrSlash ⇒ NonTerminator except * | /
MultiLineBlockComment ⇒ / * MultiLineBlockCommentCharacters BlockCommentCharacters * /
MultiLineBlockCommentCharacters ⇒

White Space

WhiteSpace ⇒
   «empty»

Line Breaks

LineBreak ⇒
LineBreaks ⇒
   LineBreak

Input Elements

  {re, div, num}
NextInputElementre ⇒ WhiteSpace InputElementre
NextInputElementdiv ⇒ WhiteSpace InputElementdiv
NextInputElementnum ⇒ [lookahead∉{ContinuingIdentifierCharacter, \}] WhiteSpace InputElementdiv
InputElementre ⇒
InputElementdiv ⇒
EndOfInput ⇒
   End
|  LineComment End

Keywords and Identifiers

IdentifierOrKeyword ⇒ IdentifierName
NullEscapes ⇒
NullEscape ⇒ \ _
InitialIdentifierCharacterOrEscape ⇒
|  \ HexEscape
InitialIdentifierCharacter ⇒ UnicodeInitialAlphabetic | $ | _
ContinuingIdentifierCharacterOrEscape ⇒
|  \ HexEscape
ContinuingIdentifierCharacter ⇒ UnicodeAlphanumeric | $ | _

Punctuators

Punctuator ⇒
   !
|  ! =
|  ! = =
|  %
|  % =
|  &
|  & &
|  & & =
|  & =
|  (
|  )
|  *
|  * =
|  +
|  + +
|  + =
|  ,
|  -
|  - -
|  - =
|  .
|  . . .
|  :
|  : :
|  ;
|  <
|  < <
|  < < =
|  < =
|  =
|  = =
|  = = =
|  >
|  > =
|  > >
|  > > =
|  > > >
|  > > > =
|  ?
|  [
|  ]
|  ^
|  ^ =
|  ^ ^
|  ^ ^ =
|  {
|  |
|  | =
|  | |
|  | | =
|  }
|  ~
DivisionPunctuator 
   / [lookahead{/*}]
|  / =

Numeric Literals

NumericLiteral 
   DecimalLiteralnoLeadingZeros
|  DecimalLiteralnoLeadingZeros LetterF
IntegerLiteral 
   DecimalIntegerLiteralnoLeadingZeros
LetterF  F | f
LetterL  L | l
LetterU  U | u
  {noLeadingZerosallowLeadingZeros}
DecimalLiteral 
   Mantissa
|  Mantissa LetterE SignedInteger
LetterE  E | e
Mantissa 
   DecimalIntegerLiteral
|  DecimalIntegerLiteral .
|  DecimalIntegerLiteral . Fraction
|  . Fraction
DecimalIntegerLiteralnoLeadingZeros 
   0
DecimalIntegerLiteralallowLeadingZeros  DecimalDigits
NonZeroDecimalDigits 
NonZeroDigit  1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Fraction  DecimalDigits
SignedInteger  OptionalSign DecimalDigits
OptionalSign 
   «empty»
|  +
|  -
DecimalDigits 
HexIntegerLiteral 
   0 LetterX HexDigit
LetterX  X | x
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

String Literals

  {singledouble}
StringLiteral 
   ' StringCharssingle '
|  " StringCharsdouble "
StringChars 
   «empty»
|  StringChars StringChar
|  StringChars NullEscape
StringChar 
   LiteralStringChar
|  \ StringEscape
LiteralStringCharsingle  UnicodeCharacter except ' | \ | LineTerminator
LiteralStringChardouble  UnicodeCharacter except " | \ | LineTerminator
StringEscape 
IdentityEscape  NonTerminator except _ | UnicodeAlphanumeric
ControlEscape 
   b
|  f
|  n
|  r
|  t
|  v
ZeroEscape  0 [lookahead{ASCIIDigit}]
HexEscape 
   x HexDigit HexDigit

Regular Expression Literals

RegExpLiteral  RegExpBody RegExpFlags
RegExpFlags 
   «empty»
RegExpBody  / [lookahead{*}] RegExpChars /
RegExpChars 
RegExpChar 
OrdinaryRegExpChar  NonTerminator except \ | /

String-to-Number Conversion

SignedDecimalLiteral 
   OptionalSign DecimalLiteralallowLeadingZeros
|  OptionalSign I n f i n i t y
|  N a N
StringWhiteSpace 
   «empty»
WhiteSpaceOrLineTerminatorChar  WhiteSpaceCharacter | LineTerminator

parseFloat Conversion

StringDecimalLiteral  StringWhiteSpace SignedDecimalLiteral

ECMAScript 4 Netscape Proposal
Formal Description
Lexical Semantics

Monday, June 30, 2003

The lexical semantics describe the actions the lexer takes in order to transform an input stream of Unicode characters into a stream of tokens. For convenience, the lexical grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word RTF file.

The lexer’s start symbols are: NextInputElementnum if the previous input element was a number; NextInputElementre if the previous input element was not a number and a / should be interpreted as a regular expression; and NextInputElementdiv if the previous input element was not a number and a / should be interpreted as a division or division-assignment operator.

In addition to the above, the start symbol StringNumericLiteral is used by the syntactic semantics for string-to-number conversions and the start symbol StringDecimalLiteral is used by the syntactic semantics for implementing the parseFloat function.
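The choice among the three lexer start symbols described above can be sketched as a small function. This is only an illustration: the function name `choose_start_symbol` and its two boolean parameters are hypothetical, and the decision of whether a / begins a regular expression is assumed to be supplied by the caller.

```python
def choose_start_symbol(prev_was_number: bool, slash_is_regexp: bool) -> str:
    """Return the name of the lexer start symbol for the next input element."""
    # A number as the previous input element always selects the "num" variant.
    if prev_was_number:
        return "NextInputElementnum"
    # Otherwise the caller's interpretation of a following / decides.
    if slash_is_regexp:
        return "NextInputElementre"
    return "NextInputElementdiv"
```

Note that the flag for / is ignored when the previous input element was a number, matching the order in which the three cases are stated.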

Semantics

tag lineBreak;
tag endOfInput;
tuple Keyword
name: String
end tuple;
tuple Punctuator
name: String
end tuple;
tuple Identifier
name: String
end tuple;
tuple NumberToken
end tuple;
tag negatedMinLong;
tuple StringToken
value: String
end tuple;
tuple RegularExpression
body: String,
flags: String
end tuple;
Token = Keyword ∪ Punctuator ∪ Identifier ∪ NumberToken ∪ {negatedMinLong} ∪ StringToken ∪ RegularExpression;
InputElement = {lineBreak, endOfInput} ∪ Token;
tag syntaxError;
tag rangeError;
SemanticException = {syntaxError, rangeError};

Unicode Character Classes

Syntax

UnicodeCharacter  Any Unicode character
UnicodeInitialAlphabetic  Any character in category Lu (uppercase letter), Ll (lowercase letter), Lt (titlecase letter), Lm (modifier letter), Lo (other letter), or Nl (letter number) in the Unicode Character Database
UnicodeAlphanumeric  Any character in category Lu (uppercase letter), Ll (lowercase letter), Lt (titlecase letter), Lm (modifier letter), Lo (other letter), Nd (decimal number), Nl (letter number), Mn (non-spacing mark), Mc (combining spacing mark), or Pc (connector punctuation) in the Unicode Character Database
WhiteSpaceCharacter 
   «TAB» | «VT» | «FF» | «SP» | «u00A0»
|  «u2000» | «u2001» | «u2002» | «u2003» | «u2004» | «u2005» | «u2006» | «u2007»
|  «u2008» | «u2009» | «u200A» | «u200B»
|  «u3000»
LineTerminator  «LF» | «CR» | «u0085» | «u2028» | «u2029»
ASCIIDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

DecimalValue[ASCIIDigit]: Integer = digitValue(ASCIIDigit);

Comments

Syntax

LineComment  / / LineCommentCharacters
LineCommentCharacters 
   «empty»
NonTerminator  UnicodeCharacter except LineTerminator
SingleLineBlockComment  / * BlockCommentCharacters * /
BlockCommentCharacters 
   «empty»
PreSlashCharacters 
   «empty»
NonTerminatorOrSlash  NonTerminator except /
NonTerminatorOrAsteriskOrSlash  NonTerminator except * | /
MultiLineBlockComment  / * MultiLineBlockCommentCharacters BlockCommentCharacters * /
MultiLineBlockCommentCharacters 

White Space

Syntax

WhiteSpace 
   «empty»

Line Breaks

Syntax

LineBreak 
LineBreaks 
   LineBreak

Input Elements

Syntax

  {redivnum}
NextInputElementre  WhiteSpace InputElementre
NextInputElementdiv  WhiteSpace InputElementdiv
NextInputElementnum  [lookahead{ContinuingIdentifierCharacter\}] WhiteSpace InputElementdiv

Semantics

Lex[NextInputElement]: InputElement;
Lex[NextInputElementre ⇒ WhiteSpace InputElementre] = Lex[InputElementre];
Lex[NextInputElementdiv ⇒ WhiteSpace InputElementdiv] = Lex[InputElementdiv];
Lex[NextInputElementnum ⇒ [lookahead∉{ContinuingIdentifierCharacter, \}] WhiteSpace InputElementdiv] = Lex[InputElementdiv];

Syntax

InputElementre 
InputElementdiv 
EndOfInput 
   End
|  LineComment End

Semantics

Lex[InputElement]: InputElement;
Lex[InputElement  LineBreaks] = lineBreak;
Lex[InputElement  IdentifierOrKeyword] = Lex[IdentifierOrKeyword];
Lex[InputElement  Punctuator] = Lex[Punctuator];
Lex[InputElementdiv  DivisionPunctuator] = Lex[DivisionPunctuator];
Lex[InputElement  NumericLiteral] = Lex[NumericLiteral];
Lex[InputElement  StringLiteral] = Lex[StringLiteral];
Lex[InputElementre  RegExpLiteral] = Lex[RegExpLiteral];
Lex[InputElement  EndOfInput] = endOfInput;

Keywords and Identifiers

Syntax

IdentifierOrKeyword  IdentifierName

Semantics

Lex[IdentifierOrKeyword ⇒ IdentifierName]: InputElement
begin
id: String ← Lex[IdentifierName];
if id ∈ {“abstract”, “as”, “break”, “case”, “catch”, “class”, “const”, “continue”, “debugger”, “default”, “delete”, “do”, “else”, “enum”, “export”, “extends”, “false”, “finally”, “for”, “function”, “get”, “goto”, “if”, “implements”, “import”, “in”, “instanceof”, “interface”, “is”, “namespace”, “native”, “new”, “null”, “package”, “private”, “protected”, “public”, “return”, “set”, “super”, “switch”, “synchronized”, “this”, “throw”, “throws”, “transient”, “true”, “try”, “typeof”, “use”, “var”, “volatile”, “while”, “with”} and not ContainsEscapes[IdentifierName] then
return Keyword⟨name: id⟩
else return Identifier⟨name: id⟩
end if
end;
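As a rough illustration of the rule above, the following sketch classifies an already-lexed name as a keyword or an identifier. The function name `classify` and the tuple-based return value are assumptions made for the example; the keyword list is the one given in the semantics.

```python
from typing import Tuple

# Keyword set copied from the Lex[IdentifierOrKeyword] action.
KEYWORDS = frozenset("""
abstract as break case catch class const continue debugger default delete do
else enum export extends false finally for function get goto if implements
import in instanceof interface is namespace native new null package private
protected public return set super switch synchronized this throw throws
transient true try typeof use var volatile while with
""".split())


def classify(name: str, contains_escapes: bool) -> Tuple[str, str]:
    """Return ("Keyword", name) or ("Identifier", name).

    contains_escapes plays the role of ContainsEscapes[IdentifierName]:
    a name spelled with any escape is never treated as a keyword.
    """
    if name in KEYWORDS and not contains_escapes:
        return ("Keyword", name)
    return ("Identifier", name)
```

Under this rule a name whose characters spell `while` but which was written with an escape is returned as an identifier, not as the keyword.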

Syntax

NullEscapes 
NullEscape  \ _
InitialIdentifierCharacterOrEscape 
|  \ HexEscape
InitialIdentifierCharacter  UnicodeInitialAlphabetic | $ | _
ContinuingIdentifierCharacterOrEscape 
|  \ HexEscape
ContinuingIdentifierCharacter  UnicodeAlphanumeric | $ | _

Semantics

ContainsEscapes[IdentifierName]: Boolean;
ContainsEscapes[IdentifierName ⇒ InitialIdentifierCharacterOrEscape] = ContainsEscapes[InitialIdentifierCharacterOrEscape];
ContainsEscapes[IdentifierName ⇒ NullEscapes InitialIdentifierCharacterOrEscape] = true;
ContainsEscapes[IdentifierName0 ⇒ IdentifierName1 ContinuingIdentifierCharacterOrEscape] = ContainsEscapes[IdentifierName1] or ContainsEscapes[ContinuingIdentifierCharacterOrEscape];
ContainsEscapes[IdentifierName ⇒ IdentifierName NullEscape] = true;
Lex[InitialIdentifierCharacterOrEscape ⇒ \ HexEscape]
begin
ch: Char16 ← Lex[HexEscape];
if the nonterminal InitialIdentifierCharacter can expand into [ch] then
return ch
else throw syntaxError
end if
end;
ContainsEscapes[InitialIdentifierCharacterOrEscape ⇒ InitialIdentifierCharacter] = false;
ContainsEscapes[InitialIdentifierCharacterOrEscape ⇒ \ HexEscape] = true;
Lex[ContinuingIdentifierCharacterOrEscape ⇒ \ HexEscape]
begin
ch: Char16 ← Lex[HexEscape];
if the nonterminal ContinuingIdentifierCharacter can expand into [ch] then
return ch
else throw syntaxError
end if
end;
ContainsEscapes[ContinuingIdentifierCharacterOrEscape ⇒ ContinuingIdentifierCharacter] = false;
ContainsEscapes[ContinuingIdentifierCharacterOrEscape ⇒ \ HexEscape] = true;

Punctuators

Syntax

Punctuator 
   !
|  ! =
|  ! = =
|  %
|  % =
|  &
|  & &
|  & & =
|  & =
|  (
|  )
|  *
|  * =
|  +
|  + +
|  + =
|  ,
|  -
|  - -
|  - =
|  .
|  . . .
|  :
|  : :
|  ;
|  <
|  < <
|  < < =
|  < =
|  =
|  = =
|  = = =
|  >
|  > =
|  > >
|  > > =
|  > > >
|  > > > =
|  ?
|  [
|  ]
|  ^
|  ^ =
|  ^ ^
|  ^ ^ =
|  {
|  |
|  | =
|  | |
|  | | =
|  }
|  ~
DivisionPunctuator 
   / [lookahead{/*}]
|  / =

Semantics

Lex[Punctuator ⇒ !] = Punctuator⟨name: “!”⟩;
Lex[Punctuator ⇒ ! =] = Punctuator⟨name: “!=”⟩;
Lex[Punctuator ⇒ ! = =] = Punctuator⟨name: “!==”⟩;
Lex[Punctuator ⇒ %] = Punctuator⟨name: “%”⟩;
Lex[Punctuator ⇒ % =] = Punctuator⟨name: “%=”⟩;
Lex[Punctuator ⇒ &] = Punctuator⟨name: “&”⟩;
Lex[Punctuator ⇒ & &] = Punctuator⟨name: “&&”⟩;
Lex[Punctuator ⇒ & & =] = Punctuator⟨name: “&&=”⟩;
Lex[Punctuator ⇒ & =] = Punctuator⟨name: “&=”⟩;
Lex[Punctuator ⇒ (] = Punctuator⟨name: “(”⟩;
Lex[Punctuator ⇒ )] = Punctuator⟨name: “)”⟩;
Lex[Punctuator ⇒ *] = Punctuator⟨name: “*”⟩;
Lex[Punctuator ⇒ * =] = Punctuator⟨name: “*=”⟩;
Lex[Punctuator ⇒ +] = Punctuator⟨name: “+”⟩;
Lex[Punctuator ⇒ + +] = Punctuator⟨name: “++”⟩;
Lex[Punctuator ⇒ + =] = Punctuator⟨name: “+=”⟩;
Lex[Punctuator ⇒ ,] = Punctuator⟨name: “,”⟩;
Lex[Punctuator ⇒ -] = Punctuator⟨name: “-”⟩;
Lex[Punctuator ⇒ - -] = Punctuator⟨name: “--”⟩;
Lex[Punctuator ⇒ - =] = Punctuator⟨name: “-=”⟩;
Lex[Punctuator ⇒ .] = Punctuator⟨name: “.”⟩;
Lex[Punctuator ⇒ . . .] = Punctuator⟨name: “...”⟩;
Lex[Punctuator ⇒ :] = Punctuator⟨name: “:”⟩;
Lex[Punctuator ⇒ : :] = Punctuator⟨name: “::”⟩;
Lex[Punctuator ⇒ ;] = Punctuator⟨name: “;”⟩;
Lex[Punctuator ⇒ <] = Punctuator⟨name: “<”⟩;
Lex[Punctuator ⇒ < <] = Punctuator⟨name: “<<”⟩;
Lex[Punctuator ⇒ < < =] = Punctuator⟨name: “<<=”⟩;
Lex[Punctuator ⇒ < =] = Punctuator⟨name: “<=”⟩;
Lex[Punctuator ⇒ =] = Punctuator⟨name: “=”⟩;
Lex[Punctuator ⇒ = =] = Punctuator⟨name: “==”⟩;
Lex[Punctuator ⇒ = = =] = Punctuator⟨name: “===”⟩;
Lex[Punctuator ⇒ >] = Punctuator⟨name: “>”⟩;
Lex[Punctuator ⇒ > =] = Punctuator⟨name: “>=”⟩;
Lex[Punctuator ⇒ > >] = Punctuator⟨name: “>>”⟩;
Lex[Punctuator ⇒ > > =] = Punctuator⟨name: “>>=”⟩;
Lex[Punctuator ⇒ > > >] = Punctuator⟨name: “>>>”⟩;
Lex[Punctuator ⇒ > > > =] = Punctuator⟨name: “>>>=”⟩;
Lex[Punctuator ⇒ ?] = Punctuator⟨name: “?”⟩;
Lex[Punctuator ⇒ [] = Punctuator⟨name: “[”⟩;
Lex[Punctuator ⇒ ]] = Punctuator⟨name: “]”⟩;
Lex[Punctuator ⇒ ^] = Punctuator⟨name: “^”⟩;
Lex[Punctuator ⇒ ^ =] = Punctuator⟨name: “^=”⟩;
Lex[Punctuator ⇒ ^ ^] = Punctuator⟨name: “^^”⟩;
Lex[Punctuator ⇒ ^ ^ =] = Punctuator⟨name: “^^=”⟩;
Lex[Punctuator ⇒ {] = Punctuator⟨name: “{”⟩;
Lex[Punctuator ⇒ |] = Punctuator⟨name: “|”⟩;
Lex[Punctuator ⇒ | =] = Punctuator⟨name: “|=”⟩;
Lex[Punctuator ⇒ | |] = Punctuator⟨name: “||”⟩;
Lex[Punctuator ⇒ | | =] = Punctuator⟨name: “||=”⟩;
Lex[Punctuator ⇒ }] = Punctuator⟨name: “}”⟩;
Lex[Punctuator ⇒ ~] = Punctuator⟨name: “~”⟩;
Lex[DivisionPunctuator ⇒ / [lookahead∉{/, *}]] = Punctuator⟨name: “/”⟩;
Lex[DivisionPunctuator ⇒ / =] = Punctuator⟨name: “/=”⟩;

Numeric Literals

Syntax

NumericLiteral 
   DecimalLiteralnoLeadingZeros
|  DecimalLiteralnoLeadingZeros LetterF
IntegerLiteral 
   DecimalIntegerLiteralnoLeadingZeros
LetterF  F | f
LetterL  L | l
LetterU  U | u

Semantics

Lex[NumericLiteral ⇒ DecimalLiteralnoLeadingZeros] = NumberToken⟨value: (Lex[DecimalLiteralnoLeadingZeros])f64⟩;
Lex[NumericLiteral ⇒ HexIntegerLiteral] = NumberToken⟨value: (Lex[HexIntegerLiteral])f64⟩;
Lex[NumericLiteral ⇒ DecimalLiteralnoLeadingZeros LetterF] = NumberToken⟨value: (Lex[DecimalLiteralnoLeadingZeros])f32⟩;
Lex[NumericLiteral ⇒ IntegerLiteral LetterL]
begin
i: Integer ← Lex[IntegerLiteral];
if i ≤ 2⁶³ – 1 then return NumberToken⟨value: (i)long⟩
elsif i = 2⁶³ then return negatedMinLong
else throw rangeError
end if
end;
Lex[NumericLiteral ⇒ IntegerLiteral LetterU LetterL]
begin
i: Integer ← Lex[IntegerLiteral];
if i ≤ 2⁶⁴ – 1 then return NumberToken⟨value: (i)ulong⟩
else throw rangeError
end if
end;
Lex[IntegerLiteral ⇒ DecimalIntegerLiteralnoLeadingZeros] = Lex[DecimalIntegerLiteralnoLeadingZeros];
Lex[IntegerLiteral ⇒ HexIntegerLiteral] = Lex[HexIntegerLiteral];
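The range checks for the L and UL suffixes above can be sketched as follows. The names `lex_long_literal`, `lex_ulong_literal`, `RangeError` and the string tag `NEGATED_MIN_LONG` are hypothetical stand-ins for the semantic notation; the thresholds are the ones in the semantics (2⁶³ – 1 and 2⁶⁴ – 1).

```python
class RangeError(Exception):
    """Stands in for the rangeError semantic exception."""


NEGATED_MIN_LONG = "negatedMinLong"  # stands in for the negatedMinLong tag


def lex_long_literal(i: int):
    """Value of an integer literal with an L suffix."""
    if i <= 2**63 - 1:
        return ("long", i)
    if i == 2**63:
        # One past the largest long: returned as a distinct token, not a number.
        return NEGATED_MIN_LONG
    raise RangeError(i)


def lex_ulong_literal(i: int):
    """Value of an integer literal with a UL suffix."""
    if i <= 2**64 - 1:
        return ("ulong", i)
    raise RangeError(i)
```

The separate negatedMinLong token presumably lets a later stage accept 2⁶³ only where it is negated, since that magnitude fits in a long only as a negative value; that reading is an inference from the token's name, not something stated in this section.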

Syntax

  {noLeadingZerosallowLeadingZeros}
DecimalLiteral 
   Mantissa
|  Mantissa LetterE SignedInteger
LetterE  E | e
Mantissa 
   DecimalIntegerLiteral
|  DecimalIntegerLiteral .
|  DecimalIntegerLiteral . Fraction
|  . Fraction
DecimalIntegerLiteralnoLeadingZeros 
   0
DecimalIntegerLiteralallowLeadingZeros  DecimalDigits
NonZeroDecimalDigits 
NonZeroDigit  1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Fraction  DecimalDigits

Semantics

Lex[DecimalLiteral]: Rational;
Lex[DecimalLiteral ⇒ Mantissa] = Lex[Mantissa];
Lex[DecimalLiteral ⇒ Mantissa LetterE SignedInteger] = Lex[Mantissa] × 10^Lex[SignedInteger];
Lex[Mantissa]: Rational;
Lex[Mantissa ⇒ DecimalIntegerLiteral] = Lex[DecimalIntegerLiteral];
Lex[Mantissa ⇒ DecimalIntegerLiteral .] = Lex[DecimalIntegerLiteral];
Lex[Mantissa ⇒ DecimalIntegerLiteral . Fraction] = Lex[DecimalIntegerLiteral] + Lex[Fraction];
Lex[Mantissa ⇒ . Fraction] = Lex[Fraction];
Lex[DecimalIntegerLiteral]: Integer;
Lex[DecimalIntegerLiteralnoLeadingZeros ⇒ 0] = 0;
Lex[DecimalIntegerLiteralnoLeadingZeros ⇒ NonZeroDecimalDigits] = Lex[NonZeroDecimalDigits];
Lex[DecimalIntegerLiteralallowLeadingZeros ⇒ DecimalDigits] = Lex[DecimalDigits];
Lex[NonZeroDecimalDigits ⇒ NonZeroDigit] = DecimalValue[NonZeroDigit];
Lex[NonZeroDecimalDigits0 ⇒ NonZeroDecimalDigits1 ASCIIDigit] = 10 × Lex[NonZeroDecimalDigits1] + DecimalValue[ASCIIDigit];
Lex[Fraction ⇒ DecimalDigits]: Rational = Lex[DecimalDigits] / 10^NDigits[DecimalDigits];

Syntax

SignedInteger  OptionalSign DecimalDigits
OptionalSign 
   «empty»
|  +
|  -

Semantics

Lex[SignedInteger ⇒ OptionalSign DecimalDigits]: Integer = Lex[OptionalSign] × Lex[DecimalDigits];
Lex[OptionalSign]: {–1, 1};
Lex[OptionalSign ⇒ «empty»] = 1;
Lex[OptionalSign ⇒ +] = 1;
Lex[OptionalSign ⇒ -] = –1;

Syntax

DecimalDigits 

Semantics

Lex[DecimalDigits ⇒ ASCIIDigit] = DecimalValue[ASCIIDigit];
Lex[DecimalDigits0 ⇒ DecimalDigits1 ASCIIDigit] = 10 × Lex[DecimalDigits1] + DecimalValue[ASCIIDigit];
NDigits[DecimalDigits ⇒ ASCIIDigit] = 1;
NDigits[DecimalDigits0 ⇒ DecimalDigits1 ASCIIDigit] = NDigits[DecimalDigits1] + 1;
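The decimal-literal actions above compute an exact rational value before any conversion to a floating-point type. A minimal sketch of that arithmetic, using Python's `fractions.Fraction` as the rational type, is shown below. The function names and the convention of passing digit strings that have already been split apart are assumptions made for the example.

```python
from fractions import Fraction


def lex_decimal_digits(digits: str) -> int:
    """Left-to-right accumulation: 10 * value-so-far + next digit."""
    value = 0
    for d in digits:
        value = 10 * value + (ord(d) - ord("0"))
    return value


def lex_fraction(digits: str) -> Fraction:
    """Digits after the point, divided by 10 to the number of digits."""
    return Fraction(lex_decimal_digits(digits), 10 ** len(digits))


def lex_decimal_literal(int_digits: str = "", frac_digits: str = "",
                        sign: int = 1, exp_digits: str = "") -> Fraction:
    """Mantissa times 10 to the signed exponent, as an exact rational."""
    mantissa = Fraction(lex_decimal_digits(int_digits)) if int_digits else Fraction(0)
    if frac_digits:
        mantissa += lex_fraction(frac_digits)
    if exp_digits:
        mantissa *= Fraction(10) ** (sign * lex_decimal_digits(exp_digits))
    return mantissa
```

For instance, `lex_decimal_literal("1", "5", -1, "2")` models the literal `1.5e-2`. Keeping the value rational until the end means rounding happens once, when the token's numeric type is chosen.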

Syntax

HexIntegerLiteral 
   0 LetterX HexDigit
LetterX  X | x
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

Lex[HexIntegerLiteral ⇒ 0 LetterX HexDigit] = HexValue[HexDigit];
Lex[HexIntegerLiteral0 ⇒ HexIntegerLiteral1 HexDigit] = 16 × Lex[HexIntegerLiteral1] + HexValue[HexDigit];

String Literals

Syntax

  {singledouble}
StringLiteral 
   ' StringCharssingle '
|  " StringCharsdouble "

Semantics

Lex[StringLiteral ⇒ ' StringCharssingle '] = StringToken⟨value: Lex[StringCharssingle]⟩;
Lex[StringLiteral ⇒ " StringCharsdouble "] = StringToken⟨value: Lex[StringCharsdouble]⟩;

Syntax

StringChars 
   «empty»
|  StringChars StringChar
|  StringChars NullEscape
StringChar 
   LiteralStringChar
|  \ StringEscape
LiteralStringCharsingle  UnicodeCharacter except ' | \ | LineTerminator
LiteralStringChardouble  UnicodeCharacter except " | \ | LineTerminator

Semantics

Lex[StringChars]: String;
Lex[StringChars ⇒ «empty»] = “”;
Lex[StringChars0 ⇒ StringChars1 StringChar] = Lex[StringChars1] ⊕ [Lex[StringChar]];
Lex[StringChars0 ⇒ StringChars1 NullEscape] = Lex[StringChars1];
Lex[StringChar]: Char16;
Lex[StringChar ⇒ LiteralStringChar] = LiteralStringChar;
Lex[StringChar ⇒ \ StringEscape] = Lex[StringEscape];

Syntax

StringEscape 
IdentityEscape  NonTerminator except _ | UnicodeAlphanumeric

Semantics

Lex[StringEscape  ControlEscape] = Lex[ControlEscape];
Lex[StringEscape  ZeroEscape] = Lex[ZeroEscape];
Lex[StringEscape  HexEscape] = Lex[HexEscape];
Lex[StringEscape  IdentityEscape] = IdentityEscape;

Syntax

ControlEscape 
   b
|  f
|  n
|  r
|  t
|  v

Semantics

Lex[ControlEscape  b] = ‘«BS»’;
Lex[ControlEscape  f] = ‘«FF»’;
Lex[ControlEscape  n] = ‘«LF»’;
Lex[ControlEscape  r] = ‘«CR»’;
Lex[ControlEscape  t] = ‘«TAB»’;
Lex[ControlEscape  v] = ‘«VT»’;

Syntax

ZeroEscape  0 [lookahead{ASCIIDigit}]

Semantics

Lex[ZeroEscape  0 [lookahead{ASCIIDigit}]]: Char16 = ‘«NUL»’;

Syntax

HexEscape 
   x HexDigit HexDigit

Semantics

Lex[HexEscape ⇒ x HexDigit1 HexDigit2] = integerToChar16(16 × HexValue[HexDigit1] + HexValue[HexDigit2]);
Lex[HexEscape ⇒ u HexDigit1 HexDigit2 HexDigit3 HexDigit4] = integerToChar16(4096 × HexValue[HexDigit1] + 256 × HexValue[HexDigit2] + 16 × HexValue[HexDigit3] + HexValue[HexDigit4]);
Lex[HexEscape ⇒ U HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit] = ????;
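The two defined hexadecimal escape forms can be illustrated with the sketch below. The function name `lex_hex_escape` is hypothetical, and the input is assumed to be the escape text after the backslash (for example `x41`). The eight-digit U form is rejected here only because the semantics above leave its value unspecified.

```python
def hex_value(digit: str) -> int:
    """HexValue of a single hexadecimal digit (either case)."""
    return int(digit, 16)


def lex_hex_escape(escape: str) -> str:
    """Return the character denoted by an x or u escape body."""
    kind, digits = escape[0], escape[1:]
    if kind == "x" and len(digits) == 2:
        code = 16 * hex_value(digits[0]) + hex_value(digits[1])
    elif kind == "u" and len(digits) == 4:
        code = (4096 * hex_value(digits[0]) + 256 * hex_value(digits[1])
                + 16 * hex_value(digits[2]) + hex_value(digits[3]))
    else:
        # Includes the U form, whose value the source leaves open.
        raise ValueError("unsupported escape: " + escape)
    return chr(code)
```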

Regular Expression Literals

Syntax

RegExpLiteral  RegExpBody RegExpFlags
RegExpFlags 
   «empty»
RegExpBody  / [lookahead{*}] RegExpChars /
RegExpChars 
RegExpChar 
OrdinaryRegExpChar  NonTerminator except \ | /

Semantics

Lex[RegExpLiteral ⇒ RegExpBody RegExpFlags]: Token = RegularExpression⟨body: Lex[RegExpBody], flags: Lex[RegExpFlags]⟩;
Lex[RegExpFlags ⇒ «empty»] = “”;
Lex[RegExpFlags0 ⇒ RegExpFlags1 ContinuingIdentifierCharacterOrEscape] = Lex[RegExpFlags1] ⊕ [Lex[ContinuingIdentifierCharacterOrEscape]];
Lex[RegExpFlags0 ⇒ RegExpFlags1 NullEscape] = Lex[RegExpFlags1];
Lex[RegExpBody ⇒ / [lookahead∉{*}] RegExpChars /]: String = Lex[RegExpChars];
Lex[RegExpChars ⇒ RegExpChar] = Lex[RegExpChar];
Lex[RegExpChars0 ⇒ RegExpChars1 RegExpChar] = Lex[RegExpChars1] ⊕ Lex[RegExpChar];
Lex[RegExpChar ⇒ OrdinaryRegExpChar] = [OrdinaryRegExpChar];
Lex[RegExpChar ⇒ \ NonTerminator] = [‘\’, NonTerminator];

String-to-Number Conversion

Syntax

SignedDecimalLiteral 
   OptionalSign DecimalLiteralallowLeadingZeros
|  OptionalSign I n f i n i t y
|  N a N
StringWhiteSpace 
   «empty»
WhiteSpaceOrLineTerminatorChar  WhiteSpaceCharacter | LineTerminator

Semantics

tag +zero;
tag –zero;
tag +∞;
tag –∞;
tag NaN;
ExtendedRational = Rational ∪ {+zero, –zero, +∞, –∞, NaN};
Lex[SignedDecimalLiteral ⇒ OptionalSign DecimalLiteralallowLeadingZeros] = combineWithSign(Lex[OptionalSign], Lex[DecimalLiteralallowLeadingZeros]);
Lex[SignedDecimalLiteral ⇒ OptionalSign I n f i n i t y] = Lex[OptionalSign] > 0 ? +∞ : –∞;
Lex[SignedDecimalLiteral ⇒ N a N] = NaN;
proc combineWithSign(sign: {–1, 1}, q: Rational): ExtendedRational
if q ≠ 0 then return sign × q
elsif sign > 0 then return +zero
else return –zero
end if
end proc;
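combineWithSign exists because an exact rational cannot distinguish positive from negative zero. A minimal sketch, assuming `Fraction` for rationals and plain strings for the two zero tags (both choices are illustrative, not part of the proposal):

```python
from fractions import Fraction

POSITIVE_ZERO = "+zero"  # stands in for the +zero tag
NEGATIVE_ZERO = "-zero"  # stands in for the -zero tag


def combine_with_sign(sign: int, q: Fraction):
    """Apply a sign of 1 or -1 to q, keeping the sign of a zero result."""
    if q != 0:
        return sign * q
    return POSITIVE_ZERO if sign > 0 else NEGATIVE_ZERO
```

So a string such as `-0.0` yields the negative-zero tag instead of collapsing to the rational 0.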

parseFloat Conversion

Syntax

StringDecimalLiteral  StringWhiteSpace SignedDecimalLiteral

Semantics


ECMAScript 4 Netscape Proposal
Formal Description
Regular Expression Grammar

Monday, June 9, 2003

This LR(1) grammar describes the regular expression syntax of the ECMAScript 4 proposal. See also the description of the grammar notation.

This document is also available as a Word RTF file.

Unicode Character Classes

UnicodeCharacter  Any Unicode character
UnicodeAlphanumeric  Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
LineTerminator  «LF» | «CR» | «u0085» | «u2028» | «u2029»

Regular Expression Definitions

Regular Expression Patterns

RegularExpressionPattern  Disjunction

Disjunctions

Disjunction 

Alternatives

Alternative 
   «empty»

Terms

Term 
   Assertion
|  Atom
Quantifier 
QuantifierPrefix 
   *
|  +
|  ?
|  { DecimalDigits }
|  { DecimalDigits , }
DecimalDigits 
DecimalDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Assertions

Assertion 
   ^
|  $
|  \ b
|  \ B

Atoms

Atom 
|  .
|  \ AtomEscape
|  ( Disjunction )
|  ( ? : Disjunction )
|  ( ? = Disjunction )
|  ( ? ! Disjunction )
PatternCharacter  UnicodeCharacter except ^ | $ | \ | . | * | + | ? | ( | ) | [ | ] | { | } | |

Escapes

NullEscape  \ _
AtomEscape 
CharacterEscape 
ControlLetter 
   A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
|  a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
IdentityEscape  UnicodeCharacter except _ | UnicodeAlphanumeric
ControlEscape 
   f
|  n
|  r
|  t
|  v

Decimal Escapes

DecimalEscape  DecimalIntegerLiteral [lookahead{DecimalDigit}]
DecimalIntegerLiteral 
   0
NonZeroDecimalDigits 
NonZeroDigit  1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Hexadecimal Escapes

HexEscape 
   x HexDigit HexDigit
HexDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Character Class Escapes

CharacterClassEscape 
   s
|  S
|  d
|  D
|  w
|  W

User-Specified Character Classes

CharacterClass 
   [ [lookahead{^}] ClassRanges ]
|  [ ^ ClassRanges ]
ClassRanges 
   «empty»
  {dashnoDash}
NonemptyClassRanges 
   ClassAtomdash
|  ClassAtom NonemptyClassRangesnoDash
|  ClassAtom - ClassAtomdash ClassRanges

Character Class Range Atoms

ClassAtom 
   ClassCharacter
|  \ ClassEscape
ClassCharacterdash  UnicodeCharacter except \ | ]
ClassCharacternoDash  ClassCharacterdash except -
ClassEscape 
|  b

ECMAScript 4 Netscape Proposal
Formal Description
Regular Expression Semantics

Monday, June 9, 2003

The regular expression semantics describe the actions the regular expression engine takes in order to transform a regular expression pattern into a function for matching against input strings. For convenience, the regular expression grammar is repeated here. See also the description of the semantic notation.

This document is also available as a Word RTF file.

Case-insensitive matches are not implemented in the semantics below.

Semantics

tag syntaxError;
SemanticException = {syntaxError};

Unicode Character Classes

Syntax

UnicodeCharacter  Any Unicode character
UnicodeAlphanumeric  Any Unicode alphabetic or decimal digit character (includes ASCII 0-9, A-Z, and a-z)
LineTerminator  «LF» | «CR» | «u0085» | «u2028» | «u2029»

Semantics

lineTerminators: Char16{} = {‘«LF»’, ‘«CR»’, ‘«u2028»’, ‘«u2029»’};
reWhitespaces: Char16{} = {‘«FF»’, ‘«LF»’, ‘«CR»’, ‘«TAB»’, ‘«VT»’, ‘ ’};
reDigits: Char16{} = {‘0’ ... ‘9’};
reWordCharacters: Char16{} = {‘0’ ... ‘9’, ‘A’ ... ‘Z’, ‘a’ ... ‘z’, ‘_’};

Regular Expression Definitions

Semantics

tuple REInput
str: String,
ignoreCase: Boolean,
multiline: Boolean,
span: Boolean
end tuple;

Field str is the input string. ignoreCase, multiline, and span are the corresponding regular expression flags.

tag undefined;
Capture = String ∪ {undefined};
tuple REMatch
endIndex: Integer,
captures: Capture[]
end tuple;
tag failure;
REResult = REMatch ∪ {failure};

A REMatch holds an intermediate state during the pattern-matching process. endIndex is the index of the next input character to be matched by the next component in a regular expression pattern. If we are at the end of the pattern, endIndex is one plus the index of the last matched input character. captures is a zero-based array of the strings captured so far by capturing parentheses.

Continuation = REMatch → REResult;

A Continuation is a function that attempts to match the remaining portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. If a match is possible, it returns a REMatch result that contains the final state; if no match is possible, it returns a failure result.

Matcher = REInput × REMatch × Continuation → REResult;

A Matcher is a function that attempts to match a middle portion of the pattern against the input string, starting at the intermediate state given by its REMatch argument. Since the remainder of the pattern heavily influences whether (and how) a middle portion will match, we must pass in a Continuation function that checks whether the rest of the pattern matched. If the continuation returns failure, the matcher function may call it repeatedly, trying various alternatives at pattern choice points.

The REInput parameter contains the input string and is merely passed down to subroutines.

An Integer → Matcher is a function executed at the time the regular expression is compiled that returns a Matcher for a part of the pattern. The Integer parameter contains the number of capturing left parentheses seen so far in the pattern and is used to assign static, consecutive numbers to capturing parentheses.

proc characterSetMatcher(acceptanceSet: Char16{}, invert: Boolean): Matcher
proc m(t: REInput, x: REMatch, c: Continuation): REResult
i: Integer ← x.endIndex;
s: String ← t.str;
if i = |s| then return failure
elsif s[i] ∈ acceptanceSet xor invert then
return c(REMatch⟨endIndex: i + 1, captures: x.captures⟩)
else return failure
end if
end proc;
return m
end proc;

characterSetMatcher returns a Matcher that matches a single input string character. If invert is false, the match succeeds if the character is a member of the acceptanceSet set of characters (possibly ignoring case). If invert is true, the match succeeds if the character is not a member of the acceptanceSet set of characters (possibly ignoring case).

proc characterMatcher(ch: Char16): Matcher
return characterSetMatcher({ch}, false)
end proc;

characterMatcher returns a Matcher that matches a single input string character. The match succeeds if the character is the same as ch (possibly ignoring case).
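The Matcher and Continuation types and the two character matchers can be sketched in Python as closures. This is an illustration under simplifying assumptions: the input is passed as a plain string instead of an REInput tuple, `None` stands in for the failure tag, and case-insensitive matching is not modelled (the semantics here do not implement it either).

```python
from typing import Callable, NamedTuple, Optional, Set, Tuple


class REMatch(NamedTuple):
    end_index: int                       # index of the next character to match
    captures: Tuple[Optional[str], ...]  # captured substrings so far


FAILURE = None  # stands in for the failure tag
Continuation = Callable[[REMatch], Optional[REMatch]]
Matcher = Callable[[str, REMatch, Continuation], Optional[REMatch]]


def character_set_matcher(acceptance_set: Set[str], invert: bool) -> Matcher:
    def m(s: str, x: REMatch, c: Continuation) -> Optional[REMatch]:
        i = x.end_index
        if i == len(s):
            return FAILURE
        # Membership "xor" invert: succeed when exactly one of them holds.
        if (s[i] in acceptance_set) != invert:
            return c(REMatch(i + 1, x.captures))
        return FAILURE
    return m


def character_matcher(ch: str) -> Matcher:
    return character_set_matcher({ch}, False)


def success_continuation(x: REMatch) -> Optional[REMatch]:
    """Continuation used at the end of a pattern: accept the state as final."""
    return x
```

A matcher never returns a result of its own; on success it hands the advanced state to the continuation, which is what lets the rest of the pattern veto the match.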

Regular Expression Patterns

Syntax

RegularExpressionPattern  Disjunction

Semantics

Execute[RegularExpressionPattern ⇒ Disjunction]: REInput × Integer → REResult
begin
m1: Matcher ← GenMatcher[Disjunction](0);
proc e(t: REInput, index: Integer): REResult
x: REMatch ← REMatch⟨endIndex: index, captures: repeat(undefined, CountParens[Disjunction])⟩;
return m1(t, x, successContinuation)
end proc;
return e
end;
proc successContinuation(x: REMatch): REResult
return x
end proc;

Disjunctions

Syntax

Disjunction 

Semantics

proc GenMatcher[Disjunction] (parenIndex: Integer): Matcher
[Disjunction ⇒ Alternative] do return GenMatcher[Alternative](parenIndex);
[Disjunction0 ⇒ Alternative | Disjunction1] do
m1: Matcher ← GenMatcher[Alternative](parenIndex);
m2: Matcher ← GenMatcher[Disjunction1](parenIndex + CountParens[Alternative]);
proc m3(t: REInput, x: REMatch, c: Continuation): REResult
y: REResult ← m1(t, x, c);
case y of
REMatch do return y;
{failure} do return m2(t, x, c)
end case
end proc;
return m3
end proc;
CountParens[Disjunction]: Integer;
CountParens[Disjunction ⇒ Alternative] = CountParens[Alternative];
CountParens[Disjunction0 ⇒ Alternative | Disjunction1] = CountParens[Alternative] + CountParens[Disjunction1];

Alternatives

Syntax

Alternative 
   «empty»

Semantics

proc GenMatcher[Alternative] (parenIndex: Integer): Matcher
[Alternative ⇒ «empty»] do
proc m(t: REInput, x: REMatch, c: Continuation): REResult
return c(x)
end proc;
return m;
[Alternative0 ⇒ Alternative1 Term] do
m1: Matcher ← GenMatcher[Alternative1](parenIndex);
m2: Matcher ← GenMatcher[Term](parenIndex + CountParens[Alternative1]);
proc m3(t: REInput, x: REMatch, c: Continuation): REResult
proc d(y: REMatch): REResult
return m2(t, y, c)
end proc;
return m1(t, x, d)
end proc;
return m3
end proc;
CountParens[Alternative]: Integer;
CountParens[Alternative ⇒ «empty»] = 0;
CountParens[Alternative0 ⇒ Alternative1 Term] = CountParens[Alternative1] + CountParens[Term];
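The way disjunctions and alternatives combine matchers, and how backtracking falls out of the continuation style, can be seen in a stripped-down sketch. It is not the proposal's algorithm verbatim: the match state is reduced to a bare index (no captures, no parenthesis numbering), `None` stands in for failure, and the helper names `char`, `seq` and `alt` are hypothetical.

```python
FAILURE = None  # stands in for the failure tag


def char(ch):
    """Matcher for one literal character."""
    def m(s, i, c):
        return c(i + 1) if i < len(s) and s[i] == ch else FAILURE
    return m


def seq(m1, m2):
    """Alternative ⇒ Alternative Term: run m1, then m2, then the continuation."""
    def m3(s, i, c):
        def d(j):
            return m2(s, j, c)
        return m1(s, i, d)
    return m3


def alt(m1, m2):
    """Disjunction ⇒ Alternative | Disjunction: try m1; on failure try m2."""
    def m3(s, i, c):
        y = m1(s, i, c)
        if y is not FAILURE:
            return y
        return m2(s, i, c)
    return m3


# Pattern (a|ab)c built by hand.
pattern = seq(alt(char("a"), seq(char("a"), char("b"))), char("c"))
result = pattern("abc", 0, lambda i: i)
```

Matching `abc` first tries the alternative `a`, whose continuation then fails on `c`; because that failure propagates back through `alt`, the second alternative `ab` is tried and the overall match succeeds. This is why the continuation is passed into both branches of a disjunction.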

Terms

Syntax

Term 
   Assertion
|  Atom

Semantics

proc GenMatcher[Term] (parenIndex: Integer): Matcher
[Term ⇒ Assertion] do
proc m(t: REInput, x: REMatch, c: Continuation): REResult
if TestAssertion[Assertion](t, x) then return c(x) else return failure end if
end proc;
return m;
[Term ⇒ Atom] do return GenMatcher[Atom](parenIndex);
[Term ⇒ Atom Quantifier] do
m: Matcher ← GenMatcher[Atom](parenIndex);
min: Integer ← Minimum[Quantifier];
max: Limit ← Maximum[Quantifier];
greedy: Boolean ← Greedy[Quantifier];
if max ≠ +∞ then if max < min then throw syntaxError end if end if;
return repeatMatcher(m, min, max, greedy, parenIndex, CountParens[Atom])
end proc;
CountParens[Term]: Integer;
CountParens[Term ⇒ Assertion] = 0;
CountParens[Term ⇒ Atom] = CountParens[Atom];
CountParens[Term ⇒ Atom Quantifier] = CountParens[Atom];

Syntax

Quantifier 
QuantifierPrefix 
   *
|  +
|  ?
|  { DecimalDigits }
|  { DecimalDigits , }
DecimalDigits 
DecimalDigit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

tag +∞;
Limit = Integer ∪ {+∞};
proc resetParens(x: REMatch, p: Integer, nParens: Integer): REMatch
captures: Capture[] ← x.captures;
i: Integer ← p;
while i < p + nParens do captures ← captures[i \ undefined]; i ← i + 1 end while;
return REMatch⟨endIndex: x.endIndex, captures: captures⟩
end proc;
proc repeatMatcher(body: Matcher, min: Integer, max: Limit, greedy: Boolean, parenIndex: Integer, nBodyParens: Integer): Matcher
proc m(t: REInput, x: REMatch, c: Continuation): REResult
if max = 0 then return c(x) end if;
proc d(y: REMatch): REResult
if min = 0 and y.endIndex = x.endIndex then return failure end if;
newMin: Integer ← min;
if min ≠ 0 then newMin ← min – 1 end if;
newMax: Limit ← max;
if max ≠ +∞ then newMax ← max – 1 end if;
m2: Matcher ← repeatMatcher(body, newMin, newMax, greedy, parenIndex, nBodyParens);
return m2(t, y, c)
end proc;
xr: REMatch ← resetParens(x, parenIndex, nBodyParens);
if min ≠ 0 then return body(t, xr, d)
elsif greedy then
z: REResult ← body(t, xr, d);
case z of
REMatch do return z;
{failure} do return c(x)
end case
else
z: REResult ← c(x);
case z of
REMatch do return z;
{failure} do return body(t, xr, d)
end case
end if
end proc;
return m
end proc;
Minimum[Quantifier]: Integer;
Minimum[Quantifier ⇒ QuantifierPrefix] = Minimum[QuantifierPrefix];
Minimum[Quantifier ⇒ QuantifierPrefix ?] = Minimum[QuantifierPrefix];
Maximum[Quantifier]: Limit;
Maximum[Quantifier ⇒ QuantifierPrefix] = Maximum[QuantifierPrefix];
Maximum[Quantifier ⇒ QuantifierPrefix ?] = Maximum[QuantifierPrefix];
Greedy[Quantifier]: Boolean;
Greedy[Quantifier ⇒ QuantifierPrefix] = true;
Greedy[Quantifier ⇒ QuantifierPrefix ?] = false;
Minimum[QuantifierPrefix ⇒ *] = 0;
Minimum[QuantifierPrefix ⇒ +] = 1;
Minimum[QuantifierPrefix ⇒ ?] = 0;
Minimum[QuantifierPrefix ⇒ { DecimalDigits }] = IntegerValue[DecimalDigits];
Minimum[QuantifierPrefix ⇒ { DecimalDigits , }] = IntegerValue[DecimalDigits];
Minimum[QuantifierPrefix ⇒ { DecimalDigits1 , DecimalDigits2 }] = IntegerValue[DecimalDigits1];
Maximum[QuantifierPrefix ⇒ *] = +∞;
Maximum[QuantifierPrefix ⇒ +] = +∞;
Maximum[QuantifierPrefix ⇒ ?] = 1;
Maximum[QuantifierPrefix ⇒ { DecimalDigits }] = IntegerValue[DecimalDigits];
Maximum[QuantifierPrefix ⇒ { DecimalDigits , }] = +∞;
Maximum[QuantifierPrefix ⇒ { DecimalDigits1 , DecimalDigits2 }] = IntegerValue[DecimalDigits2];
IntegerValue[DecimalDigits]: Integer;
IntegerValue[DecimalDigits ⇒ DecimalDigit] = DecimalValue[DecimalDigit];
IntegerValue[DecimalDigits0 ⇒ DecimalDigits1 DecimalDigit] = 10 × IntegerValue[DecimalDigits1] + DecimalValue[DecimalDigit];
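The control flow of repeatMatcher, including the greedy and non-greedy orderings and the guard against empty iterations, can be sketched as follows. As a simplification the match state is a bare index, so resetParens and the parenthesis parameters are omitted; `None` stands in for failure and `math.inf` for the unbounded limit. The helper `char` is hypothetical.

```python
import math

FAILURE = None  # stands in for the failure tag


def char(ch):
    """Matcher for one literal character."""
    def m(s, i, c):
        return c(i + 1) if i < len(s) and s[i] == ch else FAILURE
    return m


def repeat_matcher(body, lo, hi, greedy):
    def m(s, i, c):
        if hi == 0:
            return c(i)

        def d(j):
            # An optional iteration that consumed nothing would loop forever.
            if lo == 0 and j == i:
                return FAILURE
            new_lo = lo - 1 if lo != 0 else 0
            new_hi = hi - 1 if hi != math.inf else hi
            return repeat_matcher(body, new_lo, new_hi, greedy)(s, j, c)

        if lo != 0:
            return body(s, i, d)          # mandatory iteration
        if greedy:
            z = body(s, i, d)             # try one more iteration first
            return z if z is not FAILURE else c(i)
        z = c(i)                          # try stopping first
        return z if z is not FAILURE else body(s, i, d)
    return m


def ident(i):
    return i


greedy_star = repeat_matcher(char("a"), 0, math.inf, True)
lazy_star = repeat_matcher(char("a"), 0, math.inf, False)
```

The only difference between the greedy and non-greedy cases is which of the two options (iterate again, or continue with the rest of the pattern) is attempted first; the other remains available as a fallback.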

Assertions

Syntax

Assertion 
   ^
|  $
|  \ b
|  \ B

Semantics

proc TestAssertion[Assertion] (t: REInput, x: REMatch): Boolean
[Assertion ⇒ ^] do
return x.endIndex = 0 or (t.multiline and t.str[x.endIndex – 1] ∈ lineTerminators);
[Assertion ⇒ $] do
return x.endIndex = |t.str| or (t.multiline and t.str[x.endIndex] ∈ lineTerminators);
[Assertion ⇒ \ b] do return atWordBoundary(x.endIndex, t.str);
[Assertion ⇒ \ B] do return not atWordBoundary(x.endIndex, t.str)
end proc;
proc atWordBoundary(i: Integer, s: String): Boolean
return inWord(i – 1, s) xor inWord(i, s)
end proc;
proc inWord(i: Integer, s: String): Boolean
if i = –1 or i = |s| then return false else return s[i] ∈ reWordCharacters end if
end proc;
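The word-boundary test used by \b and \B translates almost directly. The sketch below mirrors inWord and atWordBoundary, using the reWordCharacters set defined earlier; the Python names are illustrative.

```python
import string

# Same members as reWordCharacters: 0-9, A-Z, a-z and underscore.
RE_WORD_CHARACTERS = frozenset(string.digits + string.ascii_letters + "_")


def in_word(i: int, s: str) -> bool:
    """True when position i holds a word character; both ends count as non-word."""
    if i == -1 or i == len(s):
        return False
    return s[i] in RE_WORD_CHARACTERS


def at_word_boundary(i: int, s: str) -> bool:
    """True when exactly one of the characters on either side of i is a word character."""
    return in_word(i - 1, s) != in_word(i, s)
```

Treating positions -1 and |s| as non-word is what makes the start and end of a string count as boundaries whenever the adjacent character is a word character.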

Atoms

Syntax

Atom 
|  .
|  \ AtomEscape
|  ( Disjunction )
|  ( ? : Disjunction )
|  ( ? = Disjunction )
|  ( ? ! Disjunction )
PatternCharacter  UnicodeCharacter except ^ | $ | \ | . | * | + | ? | ( | ) | [ | ] | { | } | |

Semantics

proc GenMatcher[Atom] (parenIndex: Integer): Matcher
  [Atom ⇒ PatternCharacter] do return characterMatcher(PatternCharacter);
  [Atom ⇒ .] do
    proc m1(t: REInput, x: REMatch, c: Continuation): REResult
      a: Char16{} ← t.span ? {} : lineTerminators;
      m2: Matcher ← characterSetMatcher(a, true);
      return m2(t, x, c)
    end proc;
    return m1;
  [Atom ⇒ NullEscape] do
    proc m(t: REInput, x: REMatch, c: Continuation): REResult
      return c(x)
    end proc;
    return m;
  [Atom ⇒ \ AtomEscape] do return GenMatcher[AtomEscape](parenIndex);
  [Atom ⇒ CharacterClass] do
    a: Char16{} ← AcceptanceSet[CharacterClass];
    return characterSetMatcher(a, Invert[CharacterClass]);
  [Atom ⇒ ( Disjunction )] do
    m1: Matcher ← GenMatcher[Disjunction](parenIndex + 1);
    proc m2(t: REInput, x: REMatch, c: Continuation): REResult
      proc d(y: REMatch): REResult
        ref: Capture ← t.str[x.endIndex ... y.endIndex – 1];
        updatedCaptures: Capture[] ← y.captures[parenIndex \ ref];
        return c(REMatch⟨endIndex: y.endIndex, captures: updatedCaptures⟩)
      end proc;
      return m1(t, x, d)
    end proc;
    return m2;
  [Atom ⇒ ( ? : Disjunction )] do return GenMatcher[Disjunction](parenIndex);
  [Atom ⇒ ( ? = Disjunction )] do
    m1: Matcher ← GenMatcher[Disjunction](parenIndex);
    proc m2(t: REInput, x: REMatch, c: Continuation): REResult
      y: REResult ← m1(t, x, successContinuation);
      case y of
        REMatch do return c(REMatch⟨endIndex: x.endIndex, captures: y.captures⟩);
        {failure} do return failure
      end case
    end proc;
    return m2;
  [Atom ⇒ ( ? ! Disjunction )] do
    m1: Matcher ← GenMatcher[Disjunction](parenIndex);
    proc m2(t: REInput, x: REMatch, c: Continuation): REResult
      case m1(t, x, successContinuation) of
        REMatch do return failure;
        {failure} do return c(x)
      end case
    end proc;
    return m2
end proc;
CountParens[Atom]: Integer;
CountParens[Atom ⇒ PatternCharacter] = 0;
CountParens[Atom ⇒ .] = 0;
CountParens[Atom ⇒ NullEscape] = 0;
CountParens[Atom ⇒ \ AtomEscape] = 0;
CountParens[Atom ⇒ CharacterClass] = 0;
CountParens[Atom ⇒ ( Disjunction )] = CountParens[Disjunction] + 1;
CountParens[Atom ⇒ ( ? : Disjunction )] = CountParens[Disjunction];
CountParens[Atom ⇒ ( ? = Disjunction )] = CountParens[Disjunction];
CountParens[Atom ⇒ ( ? ! Disjunction )] = CountParens[Disjunction];
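
The continuation-passing structure of the capturing-group and lookahead matchers can be illustrated informally in Python. This is a sketch under stated assumptions, not the proposal's definition: failure is modelled as None, the input is a plain string rather than an REInput, character_matcher is assumed to be case-sensitive, success_continuation is assumed to return its argument, and sequence is a hypothetical helper used only to build test patterns.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class REMatch(NamedTuple):
    end_index: int
    captures: Tuple[Optional[str], ...]


REResult = Optional[REMatch]                      # None models failure
Continuation = Callable[[REMatch], REResult]
Matcher = Callable[[str, REMatch, Continuation], REResult]


def success_continuation(x: REMatch) -> REResult:
    # Assumption: the success continuation simply returns the current state.
    return x


def character_matcher(ch: str) -> Matcher:
    # Assumption: case-sensitive comparison of a single character.
    def m(s, x, c):
        i = x.end_index
        if i < len(s) and s[i] == ch:
            return c(REMatch(i + 1, x.captures))
        return None
    return m


def sequence(m1: Matcher, m2: Matcher) -> Matcher:
    # Hypothetical helper: run m1, then m2 as part of m1's continuation.
    return lambda s, x, c: m1(s, x, lambda y: m2(s, y, c))


def capturing_group(inner: Matcher, paren_index: int) -> Matcher:
    # ( Disjunction ): record the matched substring at captures[paren_index].
    def m(s, x, c):
        def d(y):
            ref = s[x.end_index:y.end_index]
            updated = y.captures[:paren_index] + (ref,) + y.captures[paren_index + 1:]
            return c(REMatch(y.end_index, updated))
        return inner(s, x, d)
    return m


def positive_lookahead(inner: Matcher) -> Matcher:
    # ( ? = Disjunction ): keep the captures but restore the original end index.
    def m(s, x, c):
        y = inner(s, x, success_continuation)
        if y is None:
            return None
        return c(REMatch(x.end_index, y.captures))
    return m


def negative_lookahead(inner: Matcher) -> Matcher:
    # ( ? ! Disjunction ): succeed, unchanged, only if the inner matcher fails.
    def m(s, x, c):
        if inner(s, x, success_continuation) is not None:
            return None
        return c(x)
    return m
```

Because a lookahead calls its inner matcher with the success continuation, no backtracking into the lookahead occurs once it has produced a result.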

Escapes

Syntax

NullEscape ⇒ \ _
AtomEscape ⇒
   DecimalEscape
|  CharacterEscape
|  CharacterClassEscape

Semantics

proc GenMatcher[AtomEscape] (parenIndex: Integer): Matcher
  [AtomEscape ⇒ DecimalEscape] do
    n: Integer ← EscapeValue[DecimalEscape];
    if n = 0 then return characterMatcher(‘«NUL»’)
    elsif n > parenIndex then throw syntaxError
    else return backreferenceMatcher(n)
    end if;
  [AtomEscape ⇒ CharacterEscape] do
    return characterMatcher(CharacterValue[CharacterEscape]);
  [AtomEscape ⇒ CharacterClassEscape] do
    return characterSetMatcher(AcceptanceSet[CharacterClassEscape], false)
end proc;
proc backreferenceMatcher(n: Integer): Matcher
  proc m(t: REInput, x: REMatch, c: Continuation): REResult
    ref: Capture ← nthBackreference(x, n);
    case ref of
      String do
        i: Integer ← x.endIndex;
        s: String ← t.str;
        j: Integer ← i + |ref|;
        if j ≤ |s| and s[i ... j – 1] = ref then
          return c(REMatch⟨endIndex: j, captures: x.captures⟩)
        else return failure
        end if;
      {undefined} do return c(x)
    end case
  end proc;
  return m
end proc;
proc nthBackreference(x: REMatch, n: Integer): Capture
  return x.captures[n – 1]
end proc;
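
The backreference matcher can be illustrated informally in Python. This is a sketch and not part of the proposal: failure and undefined are both modelled as None, the input is a plain string, and the comparison is assumed to be case-sensitive.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class REMatch(NamedTuple):
    end_index: int
    captures: Tuple[Optional[str], ...]


def nth_backreference(x: REMatch, n: int) -> Optional[str]:
    """Backreferences are numbered from 1; the captures vector is indexed from 0."""
    return x.captures[n - 1]


def backreference_matcher(n: int) -> Callable:
    def m(s: str, x: REMatch, c: Callable[[REMatch], Optional[REMatch]]):
        ref = nth_backreference(x, n)
        if ref is None:
            # An undefined capture matches without consuming any input.
            return c(x)
        i = x.end_index
        j = i + len(ref)
        # Assumption: exact (case-sensitive) comparison with the captured text.
        if j <= len(s) and s[i:j] == ref:
            return c(REMatch(j, x.captures))
        return None
    return m
```

For the input "abab" with the first capture already set to "ab" and the current position at index 2, the matcher for \1 advances the end index to 4.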

Syntax

CharacterEscape ⇒
   ControlEscape
|  c ControlLetter
|  HexEscape
|  IdentityEscape
ControlLetter ⇒
   A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
|  a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
IdentityEscape ⇒ UnicodeCharacter except _ | UnicodeAlphanumeric
ControlEscape ⇒
   f
|  n
|  r
|  t
|  v

Semantics

CharacterValue[CharacterEscape]: Char16;
CharacterValue[CharacterEscape ⇒ ControlEscape] = CharacterValue[ControlEscape];
CharacterValue[CharacterEscape ⇒ c ControlLetter] = integerToChar16(bitwiseAnd(char16ToInteger(ControlLetter), 31));
CharacterValue[CharacterEscape ⇒ HexEscape] = CharacterValue[HexEscape];
CharacterValue[CharacterEscape ⇒ IdentityEscape] = IdentityEscape;
CharacterValue[ControlEscape]: Char16;
CharacterValue[ControlEscape ⇒ f] = ‘«FF»’;
CharacterValue[ControlEscape ⇒ n] = ‘«LF»’;
CharacterValue[ControlEscape ⇒ r] = ‘«CR»’;
CharacterValue[ControlEscape ⇒ t] = ‘«TAB»’;
CharacterValue[ControlEscape ⇒ v] = ‘«VT»’;
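
As an informal illustration (not part of the proposal), the control escapes and the control-letter rule can be written in Python. The names CONTROL_ESCAPES and control_letter_value are hypothetical.

```python
import string

# CharacterValue[ControlEscape]: the five single-letter control escapes.
CONTROL_ESCAPES = {"f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def control_letter_value(letter: str) -> str:
    """CharacterValue for c ControlLetter: keep the low five bits of the letter's code."""
    if len(letter) != 1 or letter not in string.ascii_letters:
        raise ValueError(f"not a ControlLetter: {letter!r}")
    return chr(ord(letter) & 31)
```

Masking with 31 maps upper-case and lower-case letters to the same control character, so \cJ and \cj both denote a line feed.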

Decimal Escapes

Syntax

DecimalEscape ⇒ DecimalIntegerLiteral [lookahead∉{DecimalDigit}]
DecimalIntegerLiteral ⇒
   0
|  NonZeroDecimalDigits
NonZeroDecimalDigits ⇒
   NonZeroDigit
|  NonZeroDecimalDigits DecimalDigit
NonZeroDigit ⇒ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Semantics

EscapeValue[DecimalEscape ⇒ DecimalIntegerLiteral [lookahead∉{DecimalDigit}]]: Integer = IntegerValue[DecimalIntegerLiteral];
IntegerValue[DecimalIntegerLiteral ⇒ 0] = 0;
IntegerValue[DecimalIntegerLiteral ⇒ NonZeroDecimalDigits] = IntegerValue[NonZeroDecimalDigits];
IntegerValue[NonZeroDecimalDigits]: Integer;
IntegerValue[NonZeroDecimalDigits ⇒ NonZeroDigit] = DecimalValue[NonZeroDigit];
IntegerValue[NonZeroDecimalDigits0 ⇒ NonZeroDecimalDigits1 DecimalDigit] = 10×IntegerValue[NonZeroDecimalDigits1] + DecimalValue[DecimalDigit];

Hexadecimal Escapes

Syntax

HexEscape ⇒
   x HexDigit HexDigit
|  u HexDigit HexDigit HexDigit HexDigit
HexDigit ⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | a | b | c | d | e | f

Semantics

CharacterValue[HexEscape]: Char16;
CharacterValue[HexEscape ⇒ x HexDigit1 HexDigit2] = integerToChar16(16×HexValue[HexDigit1] + HexValue[HexDigit2]);
CharacterValue[HexEscape ⇒ u HexDigit1 HexDigit2 HexDigit3 HexDigit4] = integerToChar16(4096×HexValue[HexDigit1] + 256×HexValue[HexDigit2] + 16×HexValue[HexDigit3] + HexValue[HexDigit4]);
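
The two CharacterValue rules for hexadecimal escapes can be illustrated informally in Python. This is a sketch and not part of the proposal; hex_escape_value is a hypothetical name, and its argument is the escape without the leading backslash.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_value(digit: str) -> int:
    """HexValue[HexDigit]: upper-case and lower-case digits have the same value."""
    return HEX_DIGITS.index(digit.lower())


def hex_escape_value(escape: str) -> str:
    """CharacterValue[HexEscape] for 'xHH' (two digits) or 'uHHHH' (four digits)."""
    expected = {"x": 2, "u": 4}.get(escape[:1])
    digits = escape[1:]
    if expected is None or len(digits) != expected:
        raise ValueError(f"not a HexEscape: {escape!r}")
    if any(d.lower() not in HEX_DIGITS for d in digits):
        raise ValueError(f"not a HexEscape: {escape!r}")
    value = 0
    for d in digits:
        # Horner form of 4096*h1 + 256*h2 + 16*h3 + h4 (or 16*h1 + h2).
        value = 16 * value + hex_value(d)
    return chr(value)
```

For example, the escapes x41 and u0041 both denote the character A.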

Character Class Escapes

Syntax

CharacterClassEscape ⇒
   s
|  S
|  d
|  D
|  w
|  W

Semantics

AcceptanceSet[CharacterClassEscape]: Char16{};
AcceptanceSet[CharacterClassEscape ⇒ s] = reWhitespaces;
AcceptanceSet[CharacterClassEscape ⇒ S] = {‘«NUL»’ ... ‘«uFFFF»’} – reWhitespaces;
AcceptanceSet[CharacterClassEscape ⇒ d] = reDigits;
AcceptanceSet[CharacterClassEscape ⇒ D] = {‘«NUL»’ ... ‘«uFFFF»’} – reDigits;
AcceptanceSet[CharacterClassEscape ⇒ w] = reWordCharacters;
AcceptanceSet[CharacterClassEscape ⇒ W] = {‘«NUL»’ ... ‘«uFFFF»’} – reWordCharacters;
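
As an informal illustration (not part of the proposal), the acceptance sets can be built in Python as explicit sets over the 16-bit character range. The contents of RE_WHITESPACES, RE_DIGITS and RE_WORD_CHARACTERS are assumptions; the proposal defines reWhitespaces, reDigits and reWordCharacters elsewhere.

```python
import string

ALL_CHAR16 = frozenset(chr(code) for code in range(0x10000))

# Assumptions: illustrative contents only; see the proposal for the real sets.
RE_WHITESPACES = frozenset("\t\n\v\f\r \u00a0\u2028\u2029")
RE_DIGITS = frozenset(string.digits)
RE_WORD_CHARACTERS = frozenset(string.ascii_letters + string.digits + "_")

BASE_SETS = {"s": RE_WHITESPACES, "d": RE_DIGITS, "w": RE_WORD_CHARACTERS}


def acceptance_set(escape: str) -> frozenset:
    """AcceptanceSet[CharacterClassEscape] for one of s, S, d, D, w, W."""
    if escape.lower() not in BASE_SETS or len(escape) != 1:
        raise ValueError(f"not a CharacterClassEscape: {escape!r}")
    base = BASE_SETS[escape.lower()]
    # An upper-case escape denotes the complement within the 16-bit range.
    return ALL_CHAR16 - base if escape.isupper() else base
```

Each upper-case escape accepts exactly the 16-bit characters that its lower-case counterpart rejects.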

User-Specified Character Classes

Syntax

CharacterClass ⇒
   [ [lookahead∉{^}] ClassRanges ]
|  [ ^ ClassRanges ]
ClassRanges ⇒
   «empty»
|  NonemptyClassRangesdash
δ ∈ {dash, noDash}
NonemptyClassRangesδ ⇒
   ClassAtomdash
|  ClassAtomδ NonemptyClassRangesnoDash
|  ClassAtomδ - ClassAtomdash ClassRanges
|  NullEscape ClassRanges

Semantics

AcceptanceSet[CharacterClass]: Char16{};
AcceptanceSet[CharacterClass ⇒ [ [lookahead∉{^}] ClassRanges ]] = AcceptanceSet[ClassRanges];
AcceptanceSet[CharacterClass ⇒ [ ^ ClassRanges ]] = AcceptanceSet[ClassRanges];
Invert[CharacterClass ⇒ [ [lookahead∉{^}] ClassRanges ]] = false;
Invert[CharacterClass ⇒ [ ^ ClassRanges ]] = true;
AcceptanceSet[ClassRanges]: Char16{};
AcceptanceSet[ClassRanges ⇒ «empty»] = {};
AcceptanceSet[ClassRanges ⇒ NonemptyClassRangesdash] = AcceptanceSet[NonemptyClassRangesdash];
AcceptanceSet[NonemptyClassRangesδ]: Char16{};
AcceptanceSet[NonemptyClassRangesδ ⇒ ClassAtomdash] = AcceptanceSet[ClassAtomdash];
AcceptanceSet[NonemptyClassRangesδ0 ⇒ ClassAtomδ NonemptyClassRangesnoDash1] = AcceptanceSet[ClassAtomδ] ∪ AcceptanceSet[NonemptyClassRangesnoDash1];
AcceptanceSet[NonemptyClassRangesδ ⇒ ClassAtomδ1 - ClassAtomdash2 ClassRanges] = characterRange(AcceptanceSet[ClassAtomδ1], AcceptanceSet[ClassAtomdash2]) ∪ AcceptanceSet[ClassRanges];
AcceptanceSet[NonemptyClassRangesδ ⇒ NullEscape ClassRanges] = AcceptanceSet[ClassRanges];
proc characterRange(low: Char16{}, high: Char16{}): Char16{}
  if |low| ≠ 1 or |high| ≠ 1 then throw syntaxError end if;
  l: Char16 ← the one element of low;
  h: Char16 ← the one element of high;
  if l ≤ h then return {l ... h} else throw syntaxError end if
end proc;
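
The characterRange procedure can be illustrated informally in Python. This is a sketch and not part of the proposal; RegExpSyntaxError is a hypothetical exception standing in for syntaxError.

```python
class RegExpSyntaxError(Exception):
    """Hypothetical stand-in for the proposal's syntaxError."""


def character_range(low: set, high: set) -> set:
    """Return every character from the single element of low to that of high."""
    # Each endpoint must denote exactly one character, so a class escape
    # such as \d cannot be used as a range endpoint.
    if len(low) != 1 or len(high) != 1:
        raise RegExpSyntaxError("range endpoint must be a single character")
    (lo,) = low
    (hi,) = high
    if lo <= hi:
        return {chr(code) for code in range(ord(lo), ord(hi) + 1)}
    raise RegExpSyntaxError("range endpoints are out of order")
```

The acceptance set of a class such as [a-e_] is then the union character_range({"a"}, {"e"}) | {"_"}.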

Character Class Range Atoms

Syntax

ClassAtomδ ⇒
   ClassCharacterδ
|  \ ClassEscape
ClassCharacterdash ⇒ UnicodeCharacter except \ | ]
ClassCharacternoDash ⇒ ClassCharacterdash except -
ClassEscape ⇒
   DecimalEscape
|  b
|  CharacterEscape
|  CharacterClassEscape

Semantics

AcceptanceSet[ClassAtomδ]: Char16{};
AcceptanceSet[ClassAtomδ ⇒ ClassCharacterδ] = {ClassCharacterδ};
AcceptanceSet[ClassAtomδ ⇒ \ ClassEscape] = AcceptanceSet[ClassEscape];
AcceptanceSet[ClassEscape]: Char16{};
AcceptanceSet[ClassEscape ⇒ DecimalEscape]
begin
  if EscapeValue[DecimalEscape] = 0 then return {‘«NUL»’}
  else throw syntaxError
  end if
end;
AcceptanceSet[ClassEscape ⇒ b] = {‘«BS»’};
AcceptanceSet[ClassEscape ⇒ CharacterEscape] = {CharacterValue[CharacterEscape]};
AcceptanceSet[ClassEscape ⇒ CharacterClassEscape] = AcceptanceSet[CharacterClassEscape];

Formal Description
Syntactic Grammar

This LALR(1) grammar describes the syntax of the ECMAScript 4 proposal. The starting nonterminal is Program. See also the description of the grammar notation.

This document is also available as a Word RTF file.

Terminals

General tokens: Identifier NegatedMinLong Number RegularExpression String VirtualSemicolon

Punctuation tokens: ! != !== % %= & && &&= &= ( ) * *= + ++ += , - -- -= . ... / /= : :: ; < << <<= <= = == === > >= >> >>= >>> >>>= ? [ ] ^ ^= ^^ ^^= { | |= || ||= } ~

Reserved words: as break case catch class const continue default delete do else extends false finally for function if import in instanceof is namespace new null package private public return super switch this throw true try typeof use var void while with

Future reserved words: abstract debugger enum export goto implements interface native protected synchronized throws transient volatile

Non-reserved words: get set

Expressions

β ∈ {allowIn, noIn}

Identifiers

Identifier ⇒
   Identifier
|  get
|  set

Qualified Identifiers

SimpleQualifiedIdentifier 
ExpressionQualifiedIdentifier ⇒ ParenExpression :: Identifier
QualifiedIdentifier 

Primary Expressions

PrimaryExpression ⇒
   null
|  true
|  false
|  Number
|  String
|  this
|  RegularExpression