Seamonkey Localization and Leveraging Tools
Document History
- 03/29/99: Second pass revision. Post to the groups to collect comments and feedback.
- 03/25/99: Move "XUL Localizability dependency" to XUL Localizability issues; start to work on the full spec of L10n Tools.
- 03/18/99: Finishing up the first draft.
Introduction
The quality of a localized software product relies on collective efforts from various sources, including good internationalization planning and support, localizability considerations in the software design and implementation phases, and a set of state-of-the-art localization tools. A good set of localization tools not only increases the quality of the localization but also reduces the related cost. The localization and leveraging tools we created for earlier generations of Communicator clients set a good example of this concept.
Seamonkey is our next-generation browsing and messaging client, whose internal architecture is undergoing drastic restructuring. Most of the components, including the user interface, will be rewritten using new technologies such as AOM, XUL, and XPCOM-style modularization. Consequently, the tools for earlier client releases become practically insufficient and potentially inefficient. To solve this problem, we need to review our existing tools and revamp them to fulfill the requirements of the new-generation client products.
In this document, the author will analyze the requirements of the new tools and examine our existing tools for their capacity, then come up with an outline of how to develop localization and leveraging tools for our next-generation client product, Seamonkey.
Finally, like many other documents, this document will be updated as we collect more information and the client modules evolve.
Design Principles
Before jumping to the design and implementation of the tools, it is a good idea to list the principles we intend to follow throughout the design and implementation process:
- Portability. Seamonkey is designed to be platform independent. There is no reason to develop platform-dependent tools for a platform-independent product.
- Consistency. It would be a localization nightmare if we needed a different set of tools for each individual component of the client product. Needing different sets of tools for different components indicates that our client localizability solutions are neither consistent nor developer friendly. For localizers, multiple file formats reduce productivity, too. Therefore, we shall advocate a single localizability solution across all client modules and components wherever feasible.
- Validability. Normally, localization work is performed at a remote site. The translators need to be able to validate the results in their local environment. Sending translations back and forth between us and the localizers is time consuming, and data is prone to being misplaced or lost.
- Leveragibility. Localization is costly. The ability to leverage existing translations is essential to bring down the cost. In addition, it keeps translations consistent across releases.
- Flat file. Ideally, we want developers to put localizable resources in flat files; where that is not feasible, we shall convert them to flat files.
Strategies
With the principles in mind, here are our design and implementation strategies:
- Centralized. We shall reduce the number of different file formats we need to process and localize. In client development, we want to adopt the same localizability solution whenever applicable. For client localization, we want to reduce the variety of file formats the translators need to deal with. Therefore, we need to adopt a single file format as the center of localization. Other file formats need to be converted to this format before being sent to localization vendors.
- Incremental. Given the time constraints, we want to develop the tools incrementally: begin with basic features, then gradually migrate to the full-blown version. This strategy also fits the nature of the software development cycle: incremental evolution and iterative refinement.
- Componentized. A good software system shall be highly componentized. The concepts of data abstraction and information hiding shall be applied to identify our problem domain and system functionality. Properties and behaviors of each component will be analyzed and identified so that they are highly cohesive. Relationships and interfaces between components will be clearly defined so that they are loosely coupled.
- Pluggable. If our system is highly componentized, individual modules shall be pluggable. This will give us a great advantage of leveraging existing tool components while keeping the flexibility of adapting to new technology.
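As a rough illustration of the Componentized and Pluggable strategies, the Python sketch below keeps one reader per file format behind a common interface, so a new format can be plugged in without touching the callers. The names `register_reader`, `read_resources`, and the format key `"properties"` are hypothetical and not part of any existing tool. The property reader handles only simple `key=value` lines, not the full Java property syntax.

```python
from typing import Callable, Dict, List, Tuple

# A resource is an (id, text) pair; every reader returns a list of them.
Resource = Tuple[str, str]
Reader = Callable[[str], List[Resource]]

# Hypothetical registry: format name -> reader plug-in.
_READERS: Dict[str, Reader] = {}


def register_reader(fmt: str) -> Callable[[Reader], Reader]:
    """Decorator that plugs a reader in under a format name."""
    def decorator(func: Reader) -> Reader:
        _READERS[fmt] = func
        return func
    return decorator


@register_reader("properties")
def read_properties(text: str) -> List[Resource]:
    """Read simple key=value lines; '#' and '!' start comment lines.

    Simplification: no line continuations or escape sequences.
    """
    resources = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#!":
            continue
        key, sep, value = stripped.partition("=")
        if sep:
            resources.append((key.strip(), value.strip()))
    return resources


def read_resources(fmt: str, text: str) -> List[Resource]:
    """Dispatch to whichever reader is registered for fmt."""
    if fmt not in _READERS:
        raise ValueError(f"no reader registered for format {fmt!r}")
    return _READERS[fmt](text)
```

A command line tool and a GUI tool could both call `read_resources` and stay unaware of which formats are installed; supporting another format would mean registering one more reader.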
Localization work in Seamonkey
In Mozilla 5.0, L10n work can be divided into three categories:
- XUL-based user interfaces. The adopted localizability solution for XUL-based UI is the language-specific DTD approach. All localizable resources needed in XUL will be declared as general entities in the external DTD and substituted by the XUL parser before the XUL is converted into a DOM tree. Resource strings in this type of DTD file need to be extracted into an intermediate file format, say the L10n file, that the localization vendors can easily deal with.
- Base component modules and non-XUL-describable UI such as native widgets. The "nsIStringBundle" is our 5.0 equivalent of XP_GetString(). It provides a COMified interface to retrieve resource strings from Java-property-like files. Resource strings in such files need to be extracted into the so-called L10n file for the vendors as well.
- Legacy code. Seamonkey contains some 4.x modules which use XP_GetString() to get resource strings from resource files. To localize the legacy code, we can either create a set of C wrappers around nsStringBundle() or use the 4.x XP_GetString() approach directly. The former is preferred.
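To make the DTD extraction step concrete, here is a minimal Python sketch that pulls general entity names and values out of DTD text. It is only an illustration under stated assumptions: it uses a regular expression rather than a real XML parser, it skips parameter entities and external (SYSTEM) entities, and the function name `extract_entities` and the sample entity names are hypothetical. Since the L10n file format is not yet defined, the result is kept as in-memory pairs.

```python
import re
from typing import List, Tuple

# Matches <!ENTITY name "value"> or <!ENTITY name 'value'>.
# Parameter entities (<!ENTITY % ...>) and SYSTEM entities do not match.
ENTITY_RE = re.compile(
    r"""<!ENTITY\s+([A-Za-z_:][\w.:-]*)\s+(?:"([^"]*)"|'([^']*)')\s*>"""
)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def extract_entities(dtd_text: str) -> List[Tuple[str, str]]:
    """Return (entity name, entity value) pairs in declaration order."""
    # Drop comments first so commented-out declarations are ignored.
    text = COMMENT_RE.sub("", dtd_text)
    pairs = []
    for match in ENTITY_RE.finditer(text):
        name = match.group(1)
        # Exactly one of the two value groups is set, depending on the quote.
        value = match.group(2) if match.group(2) is not None else match.group(3)
        pairs.append((name, value))
    return pairs


if __name__ == "__main__":
    sample = (
        "<!-- File menu -->\n"
        '<!ENTITY fileMenu.label "File">\n'
        "<!ENTITY quit.label 'Quit'>\n"
    )
    for name, value in extract_entities(sample):
        print(name, "=", value)
```

The same name/value shape would also fit strings read from property files, which is what lets both sources feed a single intermediate format.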
Plan
Based on what we have established so far, the author would like to lay out the plan, by priority, for localization and leveraging tools as below.
- Define the intermediate file format, say L10n, as the center of the localization and leveraging work.
- Write tools to convert files between the L10n format and all existing file formats such as the property file and DTD file in Seamonkey or the DOG file in legacy code. These tools shall be highly modularized and pluggable so that they can be built into either command line tools or graphical user interface based tools.
- Write tools to collect resource strings from all L10n files, organize them into a sortable data structure. The purpose of this is to compare translations between modules or UI components so that we can leverage them among different components, different releases and even different products. The ability of collecting and comparing also allows us to increase the consistency and quality of translations.
- Build graphical user interface based tools on top of the command line tools. Since modularization and pluggability are our design and implementation strategies, we shall be able to reuse functions developed for the command line tools and construct the user interface on top of them.
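The third item of the plan, collecting resource strings into a sortable data structure and comparing translations, might look roughly like the Python sketch below. The `Record` fields and the function names are assumptions made for illustration only; the real attributes would depend on what the L10n format ends up carrying.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Record:
    """One collected resource string (hypothetical field set)."""
    component: str    # application component the string came from
    res_id: str       # resource ID or entity name
    english: str      # original English text
    translation: str  # localized text


def sort_records(records: Iterable[Record], by: str) -> List[Record]:
    """Sort by the named attribute, using component and ID as tie-breakers."""
    return sorted(records, key=lambda r: (getattr(r, by), r.component, r.res_id))


def index_by_english(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records by their English text for lookup and comparison."""
    index: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        index[record.english].append(record)
    return dict(index)


def inconsistent_translations(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Find English strings that were translated differently in different places."""
    result = {}
    for english, group in index_by_english(records).items():
        translations = sorted({r.translation for r in group})
        if len(translations) > 1:
            result[english] = translations
    return result
```

Grouping by English text serves both goals named in the plan: an existing translation can be found and reused for another component, and strings that were translated inconsistently can be flagged for review.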
Tasks breakdown and estimated resources allocation
The plan laid out above will be executed in an incremental fashion, phase by phase, as listed below.
- Prerequisite. Define a flat file format which will be used to store translations. So far, two file formats are under discussion: the Java property file format and the 4.x DOG file format. The final decision is pending input from our localization vendors and OEM customers.
Java property file
Pros
- Industry standard. The Java property file format is the central file format of localization for Java applications. Localization vendors are likely to be more comfortable with this file format.
- It's the file format being used by the nsStringBundle interface in Seamonkey. Another standalone module also uses a similar file format.
Cons
- Some feedback suggests that translators might accidentally damage the keys in the property file.
DOG file
Pros
- It's the central file format for 4.x localization. All 4.x localization and leveraging tools work with DOG files. We can save lots of implementation efforts by adopting DOG format.
Cons
- It's a proprietary file format; we might be the only company that uses this format.
- The property file used by nsStringBundle and other modules needs to be converted into DOG format, although this should not be a significant effort.
- Phase 1 (entry level). Command line localization tools needed for each of the first two categories:
- DTD files (for XUL)
- DTD files-> L10n files. A DTD parser to extract entity name and value pairs into L10n files, which will be sent to vendors for localization. Entity values containing markup or URLs can be dealt with separately.
- L10n files-> DTD files. A flat file parser to extract result strings and replace English strings in DTD with localized ones.
- Property files (for nsStringBundle)
- Property file-> L10n files.
- L10n files -> property files.
- Legacy code. Create a set of C wrappers around nsStringBundle() or use the 4.x XP_GetString() approach. The former is preferred.
- Phase 2 (intermediate level).
- A localization tool (and a potential leveraging tool) to
- Collect all localization results from the L10n files generated from phase 1.
- Sort localization results by different attributes such as
- by English text
- by UI components (menu item...)
- by application component (navigator window, mail/news windows)
- Dump collected translations, ordered by the sort attribute (mostly ID), to a single L10n file.
- Leveraging tool
- Match/find a localized resource by a given "English string".
- Automated batch job to leverage existing translations.
- Phase 3 (advanced level)
- GUI-based cross-platform tool (possibly browser-based) to give a WYSIWYG effect. The translators can verify the results as they progress.
- The translators can leverage/import existing results from other components.
- Leveraging 4.x results.
- Dump 4.x results to L10n files for 5.0 use.
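The Phase 1 step "L10n files-> DTD files" needs to put localized strings back into the DTD without disturbing anything else in the file. The Python sketch below shows one possible way to do that: it rewrites only the quoted entity values and copies every other character, including comments and whitespace, through unchanged. The function name `apply_translations` and the dictionary input are assumptions, since the L10n format is still to be defined. It is regex based, so a declaration inside a comment would also be rewritten, and `&` or `%` in a translation are not escaped.

```python
import re
from typing import Dict

# Groups: 1 = "<!ENTITY" + space, 2 = name, 3 = space, 4 = quote char,
#         5 = current value, 6 = trailing space + ">".
ENTITY_RE = re.compile(
    r"""(<!ENTITY\s+)([A-Za-z_:][\w.:-]*)(\s+)(["'])(.*?)\4(\s*>)""",
    re.DOTALL,
)


def apply_translations(dtd_text: str, translations: Dict[str, str]) -> str:
    """Replace entity values by name; untranslated entities stay as they are."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(2)
        if name not in translations:
            return match.group(0)
        quote = match.group(4)
        value = translations[name]
        # Keep the literal well-formed if the translation contains the
        # quote character that delimits it: use a character reference.
        char_ref = "&#34;" if quote == '"' else "&#39;"
        value = value.replace(quote, char_ref)
        return (match.group(1) + name + match.group(3)
                + quote + value + quote + match.group(6))

    return ENTITY_RE.sub(replace, dtd_text)


if __name__ == "__main__":
    source = (
        "<!-- File menu -->\n"
        '<!ENTITY fileMenu.label   "File">\n'
        '<!ENTITY quit.label "Quit">\n'
    )
    print(apply_translations(source, {"fileMenu.label": "Fichier"}))
```

Because untouched text is copied verbatim, a diff between the English and localized DTD shows only the changed values, which should also make the result easier to validate.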
Tasks breakdown & Estimated Time Table *
Task | ID | Predecessors | Start | Finish | Status | Ownership |
Seamonkey L10n/leveraging Tools spec. - first draft | 03/17/99 | 03/19/99 | collected feedback from bobj: 1) need more discussion on how to locate the DTD file; 2) our long-term plan for the DTD parser shall be based on the xpat library. | tao | ||
Seamonkey L10n/leveraging Tools spec. - second draft | 03/26/99 | 03/30/99 | tao | |||
L10n and Leveraging Tools | ----- | ----- | ||||
Define a flat file format, *.l10n, which will be used to store translation. | XL13 | 03/19/99 | 03/23/99 | 3 days; need to decide what type of files for localization vendors to work on; pending on inputs from l10n vendors, Dublin, and IBM. | tao | |
L10n Tools - phase 1 (XUL) - use xpat to extract entity names/values from DTD | ----- | ----- | = | |||
L10n Tools - phase 1 (XUL) 1). Learn xpat | XL6 | 3 days | ||||
L10n Tools - phase 1 (XUL) 2). Write a standalone parser that lives in the client | XL6 | 5 days | ||||
L10n Tools - phase 1 (XUL) 3). Dump entity names/values into the *.l10n file | XL10 | XL13 | 2 days | |||
L10n Tools - phase 1 (XUL) 4). XP testing on parser | XL6 | XL6, XL10 | 2 days | |||
L10n Tools - phase 1 (XUL) 5). Extract L10n results from *.l10n file | XL11 | XL13 | 2 days (might be able to use existing L10n/Leveraging tools) | |||
L10n Tools - phase 1 (XUL) 6). Replace the associated entity names/values in DTD | XL11 | 3 days (need to preserve the original position and context of the DTD) | ||||
L10n Tools - phase 1 (property file) 1) Extract resource id and value pair from property file | 1.5 days | |||||
L10n Tools - phase 1 (property file) 2) Dump resource ID and US strings to *.l10n file | 1 day | |||||
L10n Tools - phase 1 (property file) 3) Extract L10n results from *.l10n file and replace the associated resource ID/value pairs in property file | 3 days (need to preserve the original position and context of the property file) | |||||
L10n Tools - phase 2 | ----- | ----- | = | |||
L10n Tools - phase 2. 1). Tool to collect localization results from *.l10n file | 3 days | |||||
L10n Tools - phase 2. 2). Make localization results sortable. | 4 days | |||||
L10n Tools - phase 2. 3). Sort localization results by unique resource ID | XL9 | 1 day | ||||
L10n Tools - phase 2. 4). Sort localization results by English text. | XL11 | 1 day | ||||
L10n Tools - phase 2. 5). Sort localization results by UI components (menu item...). | XL7 | 1 day | ||||
L10n Tools - phase 2. 6). (optional) Sort localization results by application component (navigator window, mail/news windows) | 1 day | |||||
Leveraging tool - 1). Match/find a localized resource by a given "English string". | 1 day | |||||
Leveraging tool - 2). Automated batch job to leverage existing translations. | 3 days | |||||
L10n Tools - phase 3 | ||||||
L10n Tools - phase 1 (legacy code) | ----- | ----- | ||||
Need a set of C wrappers around nsIStringBundle so that non-C++ components can use property files. | 10 days | |||||
Investigate if we need to leverage 4.x translation results. | XL17 | 3 days | ||||
Leveraging tool - collect all translations from existing DOG files and build a table for matching. | XL17 | 5 days | ||||
Leveraging tool - build a bridge to import translation from 4.x to 5.0. | XL17 | 5 days | ||||
JavaScript L12y | XL20 | 20 days | tao | |||
L10n validator | XL21 | 15 days | ||||
Pseudo localization | XL18 | 5 days | mantse |
+ M4: 04/06/99; M5: 04/27/99; M6: 05/18/99.
! XL18, XL20, XL21 might need to be reassigned or swapped with some lower priority items.