Simple Packages in JavaScript
Norris Boyd
Why "simple" packages?
There are a large number of proposals on the table for the next version of the ECMA standard. Among these are ideas for packages, classes, and static types. Using these features should allow JavaScript programmers to write larger programs in a way that is robust and maintainable.
However, the question arises: Is it possible to define a simple packages
feature that will provide a large amount of utility to JavaScript programmers
in the short term? I believe so. This document describes a small set of
additional features for JavaScript that provide for many of the modularity
benefits that we expect from the eventual complete packages design.
Defining a package
A package is defined using a PackageDeclaration at the start, followed by normal JavaScript source (with a few restrictions detailed below). The PackageDeclaration has the following syntax:
The syntax of function definitions would be broadened to include a public attribute that will be used to indicate which functions may be accessed by importers of the package.
So a simple, complete package definition might look like
package pkg;
public function increment(x) { return x+1; }
Packages can only contain function and variable declarations (and no
variable initialization). This heavy restriction allows us to dodge the
potential for problems of timing the execution of global code.
Using a package
To use a package, just begin the script with an import declaration. The syntax of the import declaration is
DottedList ::= Identifier
| DottedList . Identifier
When an import declaration is encountered, the engine first checks to see if it has already loaded the package. If so, it can just import the names from that package. Otherwise, it locates the source of the package, creates a new object to use as the top-level object for the package, and executes the source of the package. Since the package is restricted from having top-level code, executing the source will merely initialize the package's top-level object with all the top-level functions and variables. It also creates a list of all the public functions in the package.
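The loading steps above can be sketched in present-day JavaScript. This is only an illustration under stated assumptions: `sources`, `packageCache`, `loadCount`, and `loadPackage` are hypothetical names, and a package's source is modeled as a function that fills in a fresh top-level object and returns the names of its public functions. It is not the engine's actual implementation.

```javascript
// Hypothetical sketch: every name here is illustrative, not part of the proposal.
// A package "source" is modeled as a function that populates a fresh
// top-level object and returns the names of its public functions.
const sources = {
  pkg(top) {
    top.count = 0;
    top.g = function () { return top.count++; };
    top.incr = function () { return top.g(); };
    top.getDate = function () { return new Date(); };
    return ["incr", "getDate"];
  },
};

// Cache of already-loaded packages, keyed by package name.
const packageCache = new Map();
let loadCount = 0; // counts how often a package source is executed

function loadPackage(name) {
  // Step 1: reuse the package if it has already been loaded.
  if (packageCache.has(name)) {
    return packageCache.get(name);
  }
  // Step 2: locate the source of the package.
  if (!Object.prototype.hasOwnProperty.call(sources, name)) {
    throw new Error("Cannot locate package: " + name);
  }
  // Step 3: create a new top-level object and execute the source against it.
  const top = {};
  const publicNames = sources[name](top);
  loadCount++;
  // Step 4: remember the top-level object and the list of public functions.
  const loaded = { top, publicNames };
  packageCache.set(name, loaded);
  return loaded;
}

const first = loadPackage("pkg");
const second = loadPackage("pkg"); // served from the cache, source not rerun
```

In this model a second import of the same package returns the cached top-level object, so any state held in the package is shared by all importers.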
If DottedList ends with the package name, then all public names are imported into the top-level object of the importing script. Otherwise, DottedList must end with a package name followed by the name of a public function in the package, and just that name is imported. In either case, if any name being imported is already used for a function or variable, an error is reported. When the name is imported, a read-only, permanent property of the importer's top-level object is created and initialized with a closure of the associated function object in the package's top-level object. Issue: is the closure really needed?
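As a rough sketch of the import step described above, the following uses standard property attributes to model a read-only, permanent property. The names `importNames`, `examplePackageTop`, and `exampleImporterTop` are hypothetical, plain objects stand in for top-level objects, and the forwarding function is only one assumed reading of "a closure of the associated function object".

```javascript
// Hypothetical sketch of the import step; all names are illustrative.
function importNames(importerTop, packageTop, names) {
  // Report a conflict before importing anything, so a failed import
  // leaves the importer's top-level object untouched.
  for (const name of names) {
    if (Object.prototype.hasOwnProperty.call(importerTop, name)) {
      throw new Error("Import conflict: " + name + " is already defined");
    }
  }
  for (const name of names) {
    const target = packageTop[name];
    Object.defineProperty(importerTop, name, {
      // The wrapper forwards the call to the function in the package's
      // top-level object (an assumption about what the closure does).
      value: function (...args) { return target.apply(packageTop, args); },
      writable: false,     // read-only
      configurable: false, // permanent: cannot be deleted or redefined
      enumerable: true,
    });
  }
}

const examplePackageTop = { increment(x) { return x + 1; } };
const exampleImporterTop = {};
importNames(exampleImporterTop, examplePackageTop, ["increment"]);
```

Importing the same name a second time throws, which corresponds to the error reported when an imported name is already in use.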
When an imported function is called, the closure in turn calls the function
object in the package's top-level object. The scope chain thus consists
of the activation object of the function followed by the top-level object
of the package. This means that any global objects other than public functions
will be visible to functions inside the package but cannot be seen by functions
outside the package unless returned from function calls.
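The visibility rule can be approximated in ordinary JavaScript with a function scope standing in for the package's top-level object. This is an analogy rather than the proposed mechanism, and `definePackage`, `leak`, `importerScope`, and `visibleNames` are hypothetical names chosen for the illustration.

```javascript
// Hypothetical model: the enclosing function scope plays the role of the
// package's top-level object; the returned object holds the public names.
function definePackage() {
  let count = 0;                    // package-level variable, not public
  function g() { return count++; }  // package-level function, not public
  function incr() { return g(); }   // public: can see g and count
  function leak() { return g; }     // public: hands a non-public function out
  return { incr, leak };
}

const importerScope = definePackage();
const visibleNames = Object.keys(importerScope); // only the public names
```

Here `g` and `count` are reachable from `incr`, but an importer can only get at `g` because `leak` returns it, which matches the "unless returned from function calls" case.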
Standard objects
ECMA defines several standard objects and functions (Function, isNaN, Date.prototype, etc.) that are properties of the top-level object or are reachable from the top-level. The requirement of separate namespaces for packages in turn leads to the requirement for several "global" objects in the ECMA sense, which are now called "top-level" objects since they are now no longer global. (This situation has existed forever in the browser embedding, where each window object is a top-level object.)
However, now that we have language constructs that result in several top-level objects, we need to rationalize the behavior of these standard objects so that they can be shared across multiple top-level objects. The only other option is to have a separate copy of the standard objects for each top-level scope. But then we run into the series of problems that Mike McCabe was unable to work around in his implementation of errors as exceptions in order to make instanceof and equality work as expected. Also, if scripts begin to make heavy use of the package mechanism, then we'll have a large number of duplicate copies of the built-in objects consuming the memory resources of the system. In fact, by defining packages as a global repository of code and state we fix the problem that plagues people trying to write JavaScript applications with multiple pages: where to put the functionality and state shared across multiple windows.
I believe, therefore, that we need to define the standard objects as
being sealed with all properties ReadOnly. By "sealed" I mean that all
properties are DontDelete and it is not possible to add additional properties.
Would this impose a burden on existing scripts? We could come up with some
way to impose this restriction selectively through versioning or by detecting
the presence of import or package declarations. However, I believe this
approach could be problematic (what if some scripts on the page are "old-style"
and others are "new-style": we can't substitute out the global objects).
We could also make the change more globally. I've already implemented this
restriction on a private copy of 4.5 and haven't been able to find any
sites that depend on modifying the standard objects.
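In later versions of the language, the combination described here (no new properties, DontDelete, and ReadOnly) corresponds closely to what `Object.freeze` does to an object. The sketch below applies it to a stand-in object rather than to the real built-ins, so that it has no side effects; `sharedStandard`, `originalSubstring`, `tryMutations`, and `sealedFailures` are hypothetical names.

```javascript
// Stand-in for a shared standard object; the real built-ins are left alone.
const sharedStandard = { substring() { return "stand-in"; } };
const originalSubstring = sharedStandard.substring;

// Object.freeze makes the object non-extensible and makes every own
// property non-writable and non-configurable.
Object.freeze(sharedStandard);

function tryMutations(obj) {
  "use strict"; // in strict code, rejected changes throw instead of being ignored
  const failures = [];
  try { obj.x = 87; } catch (e) { failures.push("add"); }
  try { obj.substring = 88; } catch (e) { failures.push("write"); }
  try { delete obj.substring; } catch (e) { failures.push("delete"); }
  return failures;
}

// Records which kinds of change were rejected.
const sealedFailures = tryMutations(sharedStandard);
```

In non-strict code the same assignments are silently ignored rather than reported, so whether a script sees an error depends on how the engine chooses to surface the failure.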
Example session
The following sessions were run against a build of the Java implementation of our JavaScript engine (which we can't publish yet, unfortunately) in which I've made changes for simple packages. The implementation is not complete; in particular, error handling is often not implemented.
I've implemented a single set of standard objects that are shared across all packages. These standard objects are sealed and have readonly properties.
E:\src\ns\js\rhino> cat pkg.js
package pkg;
var count = 0;
public function incr() { return g(); }
function g() { return count++; }
public function getDate() { return new Date(); }

E:\src\ns\js\rhino> rhino
js import pkg.incr from "file:///E:/src/ns/js/rhino/pkg.js";
js incr()
0
js incr()
1
js typeof getDate
undefined
js typeof g
undefined
js quit()

E:\src\ns\js\rhino> rhino
js import pkg from "file:///E:/src/ns/js/rhino/pkg.js";
js incr()
0
js incr()
1
js typeof getDate
function
js typeof g
undefined
js var d = getDate()
js d
Tue Jan 26 15:05:26 GMT-0800 (PST) 1999
js d instanceof Date
true
js quit()
Now we compile the package into a Java class and import the class.
Here's an example of a non-URI form of the from clause. There's
nothing special about the generated Java class except that it implements
the Script interface, so it would also be possible to write a Java class
and make it appear as a JavaScript package. I would expect that XPConnect
objects could be loaded in a similar way.

E:\src\ns\js\rhino> jsc pkg.js
E:\src\ns\js\rhino> rhino
js import pkg from "classpath:pkg";
js incr()
0
js incr()
1
js typeof g
undefined
js typeof getDate
function
js getDate() instanceof Date
true
js quit()
E:\src\ns\js\rhino> rhino
js String.x = 87
js: Cannot add a property to a sealed object.
js String.prototype.x = 879
js: Cannot add a property to a sealed object.
js String.prototype.substring.x = 8374
js: Cannot add a property to a sealed object.
js String.prototype = 7
7
js String.prototype
js String.prototype.substring = 88
88
js String.prototype.substring
function substring() { [native code] }
js quit()
Possible extensions
Several possible extensions come to mind to augment the simple scheme outlined above.
Versioning
It would be easy to add support for versioning by allowing multiple version names to be defined and then importing only the names corresponding to the appropriate version. Additional lookups could be performed as they are today, and then when classes are available, the special versioned member lookup could be added. Dynamic properties could continue to be looked up without regard to version. Otherwise, you'd have to specify a version corresponding to the dynamic property when you created it, which would mean that you'd have to declare the dynamic property in some way, which would mean that it is no longer dynamic.
Top-level renaming
I initially implemented top-level renaming so that name conflicts could be resolved at import time. However, Waldemar Horwat pointed out that top-level renaming makes top-level scripts less like classes, so I've disabled the feature. Currently there's no way to resolve name conflicts. We could also add some object that works like LiveConnect's Packages object so that it is possible to walk down into the exported properties of a package. However, Waldemar has pointed out that he's considering a possibly different syntax, analogous to the :: operator in C++, to perform this sort of operation.
Top-level scripts
Currently top-level scripts of packages are executed when the import
declaration is processed. However, I've made no attempt to deal with possible
cycles in a package loading graph and, as stated above, think the simplest
thing would be to disallow top-level scripts. One possible extension would
be to define top-level scripts as executing when the first function is
called, but this doesn't appear to have a good migration path to classes.
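The deferred-execution extension mentioned above could look roughly like the following sketch, in which a package's top-level script runs once, just before the first call to any of its public functions. The names `deferTopLevel`, `lazyLog`, `lazyState`, and `lazyPkg` are hypothetical, and the sketch does not address cycles in the package loading graph.

```javascript
// Hypothetical sketch: wrap each public function so the package's
// top-level script runs exactly once, before the first call.
function deferTopLevel(topLevelScript, publicFunctions) {
  let initialized = false;
  const wrapped = {};
  for (const [name, fn] of Object.entries(publicFunctions)) {
    wrapped[name] = function (...args) {
      if (!initialized) {
        initialized = true; // set first so re-entrant calls do not rerun it
        topLevelScript();
      }
      return fn.apply(this, args);
    };
  }
  return wrapped;
}

const lazyLog = [];
const lazyState = { count: undefined };
const lazyPkg = deferTopLevel(
  () => { lazyLog.push("top-level ran"); lazyState.count = 10; },
  { incr() { return lazyState.count++; } }
);
```

Nothing runs at import time in this model; the initialization cost and any side effects are paid on the first call instead.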
Acknowledgements
Brendan Eich's original design of import/export in JavaScript 1.2 had
the idea of names imported from multiple global objects.
Waldemar Horwat's ideas on language futures are the inspiration for
versioning and the syntax, and he has given good feedback on what not
to do.