Transformation services
May 22nd
Submitted by Pat Gunn <pgunn01@ibm.net> to Extension.
Transformation services (TS) are based on a very simple, open API and on modules that users install and configure through the preferences panels. Each module receives the webpage before it is fully parsed and transforms it as programmed, passing the result either to the next module (modules may be chained) or to the parsing/rendering engine. Users will naturally want to run more than one module at a time, perhaps one that acts as an HTML filter to remove hostile tags (like BLINK and EMBED) and another that acts as a simple language translation engine. Similarly, users may want modules applied to only some webpages, perhaps those in a foreign language or with a hostile PICS rating, and not to others. Properly designed and integrated, such a system should not slow browsing significantly, provided that the plugins are not doing something computationally expensive (e.g. language translation or PDF->HTML conversion).
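The chaining idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Module`, `strip_hostile_tags`, `translate_greeting`, and `run_chain` are hypothetical and not part of any Mozilla API, and the regular expression is a crude stand-in for a real tag filter.

```python
import re
from typing import Callable, Iterable

# Hypothetical module type: takes the raw page source, returns transformed source.
Module = Callable[[str], str]


def strip_hostile_tags(source: str) -> str:
    """Remove opening and closing BLINK and EMBED tags, keeping any inner text."""
    return re.sub(r"</?(?:blink|embed)\b[^>]*>", "", source, flags=re.IGNORECASE)


def translate_greeting(source: str) -> str:
    """Toy stand-in for a translation module: replaces one French word."""
    return source.replace("Bonjour", "Hello")


def run_chain(source: str, modules: Iterable[Module]) -> str:
    """Pass the page through each module in order; each sees the previous output."""
    for module in modules:
        source = module(source)
    return source


page = "<p><blink>Bonjour</blink> <embed src='x.mid'></p>"
filtered = run_chain(page, [strip_hostile_tags, translate_greeting])
```

Because every module has the same string-in, string-out shape, the order of the list is the only thing that defines the chain.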
Plugins should be highly controllable through the preferences panel: the user should be able to specify whether a plugin is active for all pages, only for pages in a certain language or with a certain PICS rating, or only on command (perhaps through Aurora or a menu). This raises a concern about the time spent on a page before it displays, as the browser may need to determine the page's language or PICS rating before it parses the page fully. Furthermore, JavaScript-generated pages are a major concern. Ideally, the metacode for the browser receiving a webpage would be:
- Retrieve page
- Preparse:
  - Get PICS rating (if any)
  - Get Language declaration (if any)
  - Resolve JavaScript's document.write(), staticising the pages (is this possible?)
- Pass the page to TS Layer:
  - Pass through preferred ordering of modules, running any that are:
    - set to run for all PICS levels
    - set to run for this PICS level (if any)
    - set to run for all Languages
    - set to run for this Language
    - explicitly told by the user to run for this page
- Pass the page to the parser
- Pass the page to the display engine
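The module-selection step in the metacode above might look like the following Python sketch. It assumes the listed conditions are alternatives (a module runs if any one of them holds), and every name here (`ModuleConfig`, `should_run`, `ts_layer`) is hypothetical rather than taken from an existing interface. Retrieval, preparsing, parsing, and display are left out.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass
class ModuleConfig:
    """Hypothetical per-module preferences, as the user might set them."""
    name: str
    transform: Callable[[str], str]
    all_pics: bool = False
    pics_levels: FrozenSet[str] = frozenset()
    all_languages: bool = False
    languages: FrozenSet[str] = frozenset()


def should_run(cfg: ModuleConfig, pics: Optional[str], language: Optional[str],
               forced: FrozenSet[str]) -> bool:
    """True if any one of the activation conditions from the metacode holds."""
    return (
        cfg.all_pics
        or (pics is not None and pics in cfg.pics_levels)
        or cfg.all_languages
        or (language is not None and language in cfg.languages)
        or cfg.name in forced  # explicitly requested by the user for this page
    )


def ts_layer(source: str, modules: List[ModuleConfig],
             pics: Optional[str] = None, language: Optional[str] = None,
             forced: FrozenSet[str] = frozenset()) -> Tuple[str, List[str]]:
    """Run the selected modules in list order (the user's preferred ordering).

    Returns the transformed source and the names of the modules that ran.
    """
    applied = []
    for cfg in modules:
        if should_run(cfg, pics, language, forced):
            source = cfg.transform(source)
            applied.append(cfg.name)
    return source, applied
```

Returning the list of applied modules alongside the page gives the browser what it needs to tell the user that the page was altered.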
The API of TS should be as simple as possible, with the full HTML/XML source being passed to the module. A module may preprocess the page and hand it off to another piece of software, if the platform permits, or load the files it needs for a specific transformation task. For example, on Unix and possibly other systems, a module might pass the document to a Perl or shell script and receive the resulting document back. Another plugin might be a dejargonizing filter that loads a dictionary file to do the translation. Yet another filter might convert the document to another format (RDF?) in preparation for advanced machine translation between human languages, pass the prepared document to another process, and receive the translated document back.
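Two of the module styles described above can be sketched in Python: one that defers to an external program over standard input and output, and one that applies a word list. This is a rough illustration, not a proposed interface. The helper names `external_module` and `dejargonizer` are hypothetical, the glossary is passed in as a dict rather than loaded from a file, and `EXAMPLE_COMMAND` uses the Python interpreter as a stand-in for the Perl or shell script the text mentions.

```python
import re
import subprocess
import sys
from typing import Callable, Dict, List

# Hypothetical external filter: a child process that upper-cases its input.
EXAMPLE_COMMAND = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def external_module(command: List[str]) -> Callable[[str], str]:
    """Build a module that pipes the page to `command` and reads the result back."""
    def transform(source: str) -> str:
        result = subprocess.run(
            command, input=source, capture_output=True, text=True, check=True
        )
        return result.stdout
    return transform


def dejargonizer(glossary: Dict[str, str]) -> Callable[[str], str]:
    """Build a module that replaces whole words found in `glossary`."""
    if not glossary:
        return lambda source: source  # nothing to replace
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, glossary)) + r")\b")
    return lambda source: pattern.sub(lambda m: glossary[m.group(1)], source)
```

Both helpers return a plain string-to-string function, so the browser would not need to know whether a module does its work in-process or in another program.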
Given these factors, it should be made evident to users, especially users of HTTPS, that they are not seeing the true page. Adding a third state to the lock logo in Mozilla, perhaps a red X, would indicate that a filter has been used in rendering the current page.
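As a small sketch of the three-state indicator, assuming the browser knows which modules touched the page, the logic could be as simple as the following. `LockState` and `lock_state` are hypothetical names, and letting the filtered state take priority over the HTTPS state is an assumption of this sketch.

```python
from enum import Enum
from typing import Sequence


class LockState(Enum):
    INSECURE = "open lock"
    SECURE = "closed lock"
    FILTERED = "red X"  # proposed third state: page was altered by a module


def lock_state(is_https: bool, applied_modules: Sequence[str]) -> LockState:
    """Pick the lock icon; any applied module overrides the HTTPS state."""
    if applied_modules:
        return LockState.FILTERED
    return LockState.SECURE if is_https else LockState.INSECURE
```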
Because plugins could be written that let users do almost anything imaginable, in any programming language, to enhance their web experience, and because the interface to that functionality is so simple and lightweight, transformation services would be a popular and useful addition to Mozilla's already wide array of capabilities.