You are currently viewing a snapshot of taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to, please file a bug.

blue sky: extension

transformation services
May 22nd
Submitted by Pat Gunn <> to Extension.
The ability to preprocess pages before they are displayed would be a welcome addition to Mozilla's capabilities. This could serve a variety of purposes, from providing a basis for machine translation to HTML filtering. While some of these features are already possible to some degree using some of the undocumented tools in Mozilla, adopting the transformation services (TS) described in this document would provide an easy, flexible, and consistent means of doing this.

TS is based on the idea of a very simple, open API and a set of modules that users may install and configure through the preferences panels. These modules would receive the webpage before it is fully parsed and transform it as they are programmed, passing the transformed webpage either to the next module (they may be chained) or to the parsing/rendering engine. Naturally, users may want to run more than one module at a time, perhaps one that acts as an HTML filter to remove hostile tags (like BLINK and EMBED) and another that acts as a simple language translation engine. Similarly, users may wish for modules to be applied to only some webpages, perhaps those in a foreign language or with a hostile PICS rating, and not others. Properly designed and integrated, such a system should not slow browsing significantly, provided that the plugins are not doing something computationally expensive (e.g. language translation or PDF->HTML conversion).
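As a rough illustration of the chaining idea, here is a minimal sketch in Python. The proposal does not define a concrete API, so everything here is an assumption: the `Module` signature (page source in, transformed source out), the function names, and the toy one-word "translation" are all hypothetical.

```python
import re
from typing import Callable, Iterable

# Hypothetical module signature: receives page source, returns transformed source.
Module = Callable[[str], str]


def strip_hostile_tags(source: str) -> str:
    """Remove BLINK and EMBED tags, keeping any text between them."""
    return re.sub(r"</?(?:blink|embed)\b[^>]*>", "", source, flags=re.IGNORECASE)


def toy_translate(source: str) -> str:
    """Stand-in for a translation engine: replaces a single French word."""
    return re.sub(r"\bbonjour\b", "hello", source, flags=re.IGNORECASE)


def run_chain(source: str, modules: Iterable[Module]) -> str:
    """Pass the page through each module in order, then hand back the result."""
    for module in modules:
        source = module(source)
    return source


page = "<p><blink>bonjour</blink></p>"
print(run_chain(page, [strip_hostile_tags, toy_translate]))
```

Because each module only sees and returns a string, the order of the list is the only coupling between modules, which is what would let a user reorder them freely in a preferences panel.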

Plugins should be highly controllable using the preferences panel, with the user able to specify whether a plugin should be active for all pages, only for pages in a certain language or with a certain PICS rating, or only on command (perhaps using aurora or a menu). This raises a concern about the time spent on the page before it displays, as the browser may need to determine the language or PICS rating of the page before it parses it fully. Furthermore, JavaScript-generated pages are a major concern. Ideally, the metacode for the browser receiving a webpage would be:

  • Retrieve page
  • Preparse:
    • Get PICS rating (if any)
    • Get Language declaration (if any)
    • Resolve JavaScript's document.write() calls, making the page static (is this possible?)
  • Pass the page to TS Layer:
    • Pass through preferred ordering of modules, running any that are:
      • set to run for all PICS levels
      • set to run for this PICS level (if any)
      • set to run for all Languages
      • set to run for this Language
      • explicitly told by the user to run for this page
  • Pass the page to the parser
  • Pass the page to the display engine
Since for the majority of users the time for a webpage to display is I/O bound rather than CPU bound, TS should not be a major performance concern for simple plugins.
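The metacode above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Rule` record, its field names, and the idea of reading the language from the `lang` attribute of the `html` tag are all hypothetical, and the PICS rating is simply passed in rather than extracted from the page. Resolving document.write() is left out entirely.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional


@dataclass
class Rule:
    """Hypothetical per-module preferences, in the user's preferred order."""
    name: str
    transform: Callable[[str], str]
    any_pics: bool = False
    pics_levels: FrozenSet[str] = frozenset()
    any_language: bool = False
    languages: FrozenSet[str] = frozenset()


def declared_language(source: str) -> Optional[str]:
    """Preparse step: read the lang attribute of the <html> tag, if any."""
    match = re.search(
        r"<html\b[^>]*\blang\s*=\s*[\"']?([A-Za-z-]+)", source, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def should_run(rule: Rule, pics: Optional[str], language: Optional[str],
               requested: FrozenSet[str]) -> bool:
    """A module runs if any one of the conditions in the list applies."""
    return (
        rule.any_pics
        or (pics is not None and pics in rule.pics_levels)
        or rule.any_language
        or (language is not None and language in rule.languages)
        or rule.name in requested
    )


def process_page(source: str, rules: Iterable[Rule], pics: Optional[str] = None,
                 requested: FrozenSet[str] = frozenset()) -> str:
    """TS layer: run the selected modules in order, then return the page
    that would be handed to the parser."""
    language = declared_language(source)
    for rule in rules:
        if should_run(rule, pics, language, requested):
            source = rule.transform(source)
    return source


rules = [
    Rule("shout", str.upper, languages=frozenset({"fr"})),
    Rule("mark", lambda s: s + "<!--ts-->"),
]
print(process_page('<html lang="fr">salut</html>', rules))
print(process_page("<html>hi</html>", rules, requested=frozenset({"mark"})))
```

Note that the selection checks are cheap set lookups, so the cost of the TS layer in this sketch is dominated by whatever the selected modules themselves do.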

The API of TS should be as simple as possible, with the full HTML/XML source being passed to the module. The module may defer the handling of the webpage to another piece of software, if the platform permits, or load files needed to perform specific transformation tasks. For example, on Unix and possibly other systems, a module might pass the document to a Perl or shell script and receive the resulting document back. Another plugin might be a dejargonizing filter that loads a dictionary file to do the translation. Yet another filter might convert the document to another format (RDF?) in preparation for advanced machine translation between human languages, pass the prepared document to another process, and receive the translated document back.
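A module that hands the document to an external script might look something like the sketch below. The helper name `external_module` and the convention of sending the page on standard input and reading the result from standard output are assumptions made for illustration; the proposal does not specify how the document would be passed. The demo uses the Python interpreter itself as the external program so that it runs anywhere, but a Perl or shell script could be substituted.

```python
import subprocess
import sys
from typing import Callable, Sequence


def external_module(command: Sequence[str],
                    timeout: float = 10.0) -> Callable[[str], str]:
    """Build a module that pipes the page through an external command.

    Hypothetical convention: the page source is written to the command's
    stdin, and whatever the command prints to stdout is the new page.
    """
    def transform(source: str) -> str:
        result = subprocess.run(
            list(command),
            input=source,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise if the external script fails
        )
        return result.stdout

    return transform


# Demo: an "external script" that upper-cases whatever it receives.
shout = external_module(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)
print(shout("<p>hi</p>"))
```

In this sketch a failing or slow script raises an exception rather than silently returning the original page; for a filter that removes hostile content, showing the unfiltered page on failure would arguably be the worse outcome, though a real design would need to decide this per module.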

Given these factors, it should be made evident to the user, especially to users of HTTPS, that they are not receiving the true page. Adding a third state to the lock logo in Mozilla, perhaps a red X, would indicate that a filter has been used in rendering the current page.
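The three-state indicator could be modelled along these lines. The state names and the rule that any applied filter overrides the usual lock display are assumptions for the sake of the sketch, not something the proposal spells out.

```python
from enum import Enum


class LockState(Enum):
    """Hypothetical states for the lock logo."""
    OPEN = "open"          # plain HTTP, page shown as received
    LOCKED = "locked"      # HTTPS, page shown as received
    FILTERED = "filtered"  # at least one TS module altered the page


def lock_state(is_https: bool, modules_applied: int) -> LockState:
    """Pick the indicator; a filtered page is flagged regardless of protocol."""
    if modules_applied > 0:
        return LockState.FILTERED
    return LockState.LOCKED if is_https else LockState.OPEN


print(lock_state(is_https=True, modules_applied=2).value)
```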

Because plugins could be written in any programming language to let users do almost anything imaginable to enhance their web experience, and because the interface to that functionality is so simple and lightweight, TS would be a popular and useful addition to Mozilla's already wide array of capabilities.