
Seamonkey Localization and Leveraging Tools

by Tao Cheng < tao@netscape.com >

Document History

03/29/99: Second pass revision. Posted to the groups to collect comments and feedback.
03/25/99: Move "XUL Localizability dependency" to XUL Localizability issues; start to work on the full spec of L10n Tools.
03/18/99: Finishing up the first draft.

Introduction

The quality of a software product's localization depends on collective effort from several sources: good internationalization planning and support, attention to localizability during the software design and implementation phases, and a set of state-of-the-art localization tools. A good set of localization tools not only increases the quality of the localization but also reduces its cost. The localization and leveraging tools we created for earlier generations of Communicator clients set a good example of this.

Seamonkey is our next-generation browsing and messaging client, whose internal architecture is undergoing drastic restructuring. Most of the components, including the user interface, will be rewritten using new technologies such as AOM, XUL, and XPCOM-style modularization. Consequently, the tools for earlier client releases become practically insufficient and potentially inefficient. To solve this problem, we need to review our existing tools and revamp them to fulfill the requirements of the new-generation client products.

In this document, the author will analyze the requirements of the new tools and examine the capacity of our existing tools, then come up with an outline of how to develop localization and leveraging tools for our next-generation client product, Seamonkey.

Finally, like many other documents, this document will be updated as we collect more information and the client modules evolve.

Design Principles

Before jumping to the design and implementation of the tools, it is a good idea to list the principles we intend to follow along the design and implementation process:

Portability. Seamonkey is designed to be platform independent. There is no reason to develop platform-dependent tools for a platform-independent product.
Consistency. It would be a localization nightmare if we needed a different set of tools for each individual component of the client product. Needing different sets of tools for different components would indicate that our client localizability solutions are neither consistent nor developer friendly. For localizers, multiple file formats reduce productivity, too. Therefore, we shall advocate a single localizability solution across all client modules and components wherever feasible.
Validability. Normally, localization work is performed at a remote site. The translators need to be able to validate the results in their local environment. Sending translations back and forth between us and the localizers is time consuming and risks misplacing or losing data.
Leveragibility. Localization is costly. The ability to leverage existing translations is essential to bring down the cost. In addition, it keeps translations consistent among releases.
Flat file. Ideally, we want developers to put localizable resources in flat files; where that is not feasible, we shall convert the resources to flat files.

Strategies

With the principles in mind, here are our design and implementation strategies:

Centralized. We shall minimize the number of different file formats we need to process and localize. In client development, we want to adopt the same localizability solution wherever applicable. For client localization, we want to reduce the variety of file formats the translators need to deal with. Therefore, we need to adopt a single file format as the center of localization. Other file formats need to be converted to this format before being sent to localization vendors.
Incremental. Given the time constraints, we want to develop the tools incrementally: begin with basic features, then gradually migrate to the full-blown version. This strategy also fits the nature of the software development cycle: incremental evolution and iterative refinement.
Componentized. A good software system shall be highly componentized. The concepts of data abstraction and information hiding shall be applied to identify our problem domain and system functionality. The properties and behaviors of each component will be analyzed and identified so that each component is highly cohesive. Relationships and interfaces between components will be clearly defined so that the components are loosely coupled.
Pluggable. If our system is highly componentized, individual modules shall be pluggable. This gives us the great advantage of leveraging existing tool components while keeping the flexibility to adapt to new technology.

Localization work in Seamonkey

In Mozilla 5.0, L10n work can be divided into three categories:

XUL-based User Interfaces. The adopted localizability solution for XUL-based UI is the language-specific DTD approach. All localizable resources needed in XUL will be declared as general entities in the external DTD and substituted by the XUL parser before the XUL is converted into a DOM tree. Resource strings in this type of DTD file need to be extracted into an intermediate file format, say an L10n file, that the localization vendors can easily deal with.

Base component modules and UI that cannot be described in XUL, such as native widgets. "nsIStringBundle" is our 5.0 equivalent of XP_GetString(). It provides a COMified interface to retrieve resource strings from Java-property-like files. Resource strings in such files need to be extracted into the so-called L10n file for the vendors as well.

Legacy code. Seamonkey contains some 4.x modules which use XP_GetString() to get resource strings from resource files. To localize the legacy code, we can either create a set of C wrappers around nsStringBundle() or use the 4.x XP_GetString() approach directly. The former is preferred.

Plan

Based on what we have established so far, the author would like to lay out the plan, by priority, for localization and leveraging tools as below.

Define the intermediate file format, say L10n, as the center of the localization and leveraging work.
Write tools to convert files between the L10n format and all existing file formats such as the property file and DTD file in Seamonkey or the DOG file in legacy code. These tools shall be highly modularized and pluggable so that they can be built into either command line tools or graphical user interface based tools.
Write tools to collect resource strings from all L10n files and organize them into a sortable data structure. The purpose of this is to compare translations between modules or UI components so that we can leverage them among different components, different releases, and even different products. The ability to collect and compare also allows us to increase the consistency and quality of translations.
Build graphical user interface based tools on top of the command line tools. Since modularization and pluggability have been our design and implementation strategies, we shall be able to reuse the functions developed for the command line tools and construct the user interface on top of them.
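
The pluggable conversion step in the plan can be pictured roughly as follows. This is only a sketch under stated assumptions: the L10n format is not yet defined in this document, so the "id, TAB, text" line layout is hypothetical, as are the names CONVERTERS, converter, read_properties, and to_l10n. The property reader handles only simple "key=value" lines.

```python
import os
from typing import Callable, Dict, List, Tuple

Pairs = List[Tuple[str, str]]

# Registry of source-format readers, keyed by file extension (hypothetical design).
CONVERTERS: Dict[str, Callable[[str], Pairs]] = {}


def converter(extension: str):
    """Decorator that plugs a reader for one source format into the registry."""
    def register(func):
        CONVERTERS[extension] = func
        return func
    return register


@converter(".properties")
def read_properties(text: str) -> Pairs:
    """Read simple 'key=value' lines; comments and blank lines are skipped."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "!")) and "=" in line:
            key, value = line.split("=", 1)
            pairs.append((key.strip(), value.strip()))
    return pairs


def to_l10n(filename: str, text: str) -> str:
    """Dispatch on the file extension and emit assumed 'id<TAB>text' lines."""
    ext = os.path.splitext(filename)[1]
    if ext not in CONVERTERS:
        raise ValueError(f"no converter registered for {ext!r}")
    return "".join(f"{key}\t{value}\n" for key, value in CONVERTERS[ext](text))
```

A reader for another format would be added by decorating one more function, without touching to_l10n; a command line tool or a GUI could both call the same entry point.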

Tasks breakdown and estimated resources allocation

The plan laid out above will be executed in an incremental fashion, phase by phase, as listed below.

Prerequisite. Define a flat file format which will be used to store translations. So far, two file formats are under discussion: the Java property file format and the 4.x DOG file format. The final decision is pending input from our localization vendors and OEM customers.
Java property file
Pros
- Industry standard. The Java property file format is the central file format of localization for Java applications. Localization vendors are more likely to be comfortable with this file format.
- It's the file format being used by the nsStringBundle interface in Seamonkey. Another standalone module also uses a similar file format.
Cons
- Some feedback suggests that translators might accidentally damage the keys in the property file.
DOG file
Pros
- It's the central file format for 4.x localization. All 4.x localization and leveraging tools work with DOG files. We can save a lot of implementation effort by adopting the DOG format.
Cons
- It's a proprietary file format; we might be the only company that uses this format.
- The property files used by nsStringBundle and other modules need to be converted into DOG format, although this shall not be a significant effort.
Phase 1 (entry level). Command line localization tools needed for each of the first two categories:

DTD files (for XUL)

DTD files -> L10n files. A DTD parser extracts entity name and value pairs into L10n files, which will be sent to vendors for localization. Entity values containing markup or URLs can be dealt with separately.
L10n files -> DTD files. A flat file parser extracts the localized strings and replaces the English strings in the DTD with them.
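
As an illustration of the two DTD directions, here is a minimal sketch. It is an assumption-laden stand-in, not the planned parser: it uses a regular expression rather than a real XML parser, it only recognizes general entities with quoted literal values, it would also match declarations that appear inside comments, and it does not escape quote characters in translations. The names ENTITY_RE, extract_entities, and merge_translations are hypothetical.

```python
import re

# Matches simple general-entity declarations such as:
#   <!ENTITY fileMenu.label "File">
# Parameter entities (<!ENTITY % ...>) and external entities are not matched.
ENTITY_RE = re.compile(
    r"""<!ENTITY\s+(?P<name>[\w.:-]+)\s+(?P<quote>["'])(?P<value>.*?)(?P=quote)\s*>""",
    re.DOTALL,
)


def extract_entities(dtd_text):
    """Return (entity name, entity value) pairs in document order."""
    return [(m.group("name"), m.group("value")) for m in ENTITY_RE.finditer(dtd_text)]


def merge_translations(dtd_text, translations):
    """Replace only the entity values; comments and layout stay untouched."""
    def substitute(match):
        name = match.group("name")
        if name not in translations:
            return match.group(0)  # untranslated entities are left as they are
        whole = match.group(0)
        start, end = match.span("value")
        offset = match.start()
        return whole[: start - offset] + translations[name] + whole[end - offset:]

    return ENTITY_RE.sub(substitute, dtd_text)
```

Because the merge step rewrites only the characters between the quotes, the original position and context of each declaration in the DTD are preserved.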

Property files (for nsStringBundle)

Property files -> L10n files.
L10n files -> property files.
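
A rough sketch of the property file round trip follows. It assumes the simplest subset of the Java property syntax: one "key=value" pair per line, with "#" or "!" comment lines, and no line continuations or Unicode escapes. The function names parse_properties and apply_translations are hypothetical.

```python
def parse_properties(text):
    """Return (key, value) pairs from simple 'key=value' lines, in file order."""
    pairs = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#!":
            continue  # skip blank lines and comments
        key, sep, value = stripped.partition("=")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def apply_translations(text, translations):
    """Rewrite values of translated keys; other lines are copied verbatim.

    Rewritten lines are normalized to 'key=value' with no padding.
    """
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        key, sep, _ = stripped.partition("=")
        key = key.strip()
        if stripped and stripped[0] not in "#!" and sep and key in translations:
            newline = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            out.append(f"{key}={translations[key]}{newline}")
        else:
            out.append(line)
    return "".join(out)
```

Keys are never taken from the translated file in this sketch; they come from the original property file, which limits the damage if a translator accidentally edits a key.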

Legacy code. Create a set of C wrappers around nsStringBundle() or use the 4.x XP_GetString() approach. The former is preferred.

Phase 2 (intermediate level).

A localization tool (and a potential leveraging tool) to

Collect all localization results from the L10n files generated from phase 1.
Sort localization results by different attributes such as

by English text
by UI components (menu item...)
by application component (navigator window, mail/news windows)

Dump collected translations, sorted by the chosen attribute (mostly ID), to a single L10n file.
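
The collect, sort, and dump steps above could look something like this sketch. The record fields, the Entry class, and the tab-separated dump layout are assumptions made for illustration; the document does not yet fix which attributes an L10n file carries.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class Entry:
    """One collected localization result (field names are hypothetical)."""
    resource_id: str
    english: str
    translation: str
    ui_component: str    # e.g. "menu item"
    app_component: str   # e.g. "navigator", "mail/news"


def sort_entries(entries, keys=("resource_id",)):
    """Sort by one or more attributes; the default is the resource ID."""
    return sorted(entries, key=attrgetter(*keys))


def dump(entries):
    """Serialize entries as assumed 'id<TAB>english<TAB>translation' lines."""
    return "".join(
        f"{e.resource_id}\t{e.english}\t{e.translation}\n" for e in entries
    )
```

Sorting by English text places identical source strings from different components next to each other, which makes inconsistent translations easy to spot.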

Leveraging tool

Match/find a localized resource by a given "English string".
Automated batch job to leverage existing translations.
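
One possible shape for the leveraging step is an exact-match lookup keyed by the English string, sketched below. The names build_memory and leverage are hypothetical, and the policy of skipping English strings that have more than one known translation is an assumption of this sketch, not a decision recorded in this document.

```python
from collections import defaultdict


def build_memory(pairs):
    """Map each English string to the set of translations seen for it."""
    memory = defaultdict(set)
    for english, translation in pairs:
        memory[english].add(translation)
    return memory


def leverage(new_strings, memory):
    """Batch-apply existing translations to new resources.

    new_strings maps resource ID -> English text. Returns a dict of
    leveraged translations and a list of IDs that still need a translator
    (no match, or several conflicting matches).
    """
    translated, pending = {}, []
    for resource_id, english in new_strings.items():
        candidates = memory.get(english, set())
        if len(candidates) == 1:
            translated[resource_id] = next(iter(candidates))
        else:
            pending.append(resource_id)
    return translated, pending
```

The pending list is what would be sent to the localization vendors; everything in the translated dict is reused at no additional cost.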

Phase 3 (advanced level)

GUI-based cross-platform tool (might be browser-based) to give a WYSIWYG effect. The translators can verify the results as they progress.
The translators can leverage/import existing results from other components.
Leveraging 4.x results.

Dump 4.x results to L10n files for 5.0 use.

Tasks breakdown & Estimated Time Table *

Task	ID	Predecessors	Start	Finish	Status	Ownership
Seamonkey L10n/leveraging Tools spec. - first draft			03/17/99	03/19/99	collected feedback from bobj: 1) need more discussion on how to locate the DTD file; 2) our long-term plan for the DTD parser shall be based on the xpat library.	tao
Seamonkey L10n/leveraging Tools spec. - second draft			03/26/99	03/30/99		tao
L10n and Leveraging Tools			-----	-----
Define a flat file format, *.l10n, which will be used to store translations.	XL13		03/19/99	03/23/99	3 days; need to decide what type of files the localization vendors will work on; pending input from l10n vendors, Dublin, and IBM.	tao
L10n Tools - phase 1 (XUL) - use xpat to extract entity names/values from DTD			-----	-----	=
L10n Tools - phase 1 (XUL) 1). Learn xpat	XL6				3 days
L10n Tools - phase 1 (XUL) 2). Write a standalone parser that lives in the client	XL6				5 days
L10n Tools - phase 1 (XUL) 3). Dump entity names/values into the *.l10n file	XL10	XL13			2 days
L10n Tools - phase 1 (XUL) 4). XP testing on parser	XL6	XL6, XL10			2 days
L10n Tools - phase 1 (XUL) 5). Extract L10n results from *.l10n file	XL11	XL13			2 days (might be able to use existing L10n/Leveraging tools)
L10n Tools - phase 1 (XUL) 6). Replace the associated entity names/values in DTD	XL11				3 days (need to preserve the original position and context of the DTD)
L10n Tools - phase 1 (property file) 1) Extract resource id and value pair from property file					1.5 days
L10n Tools - phase 1 (property file) 2) Dump resource ID and US strings to *.l10n file					1 day
L10n Tools - phase 1 (property file) 3) Extract L10n results from *.l10n file and replace the associated resource IDs/values in property file					3 days (need to preserve the original position and context of the property file)
L10n Tools - phase 2			-----	-----	=
L10n Tools - phase 2. 1). Tool to collect localization results from *.l10n file					3 days
L10n Tools - phase 2. 2). Make localization results sortable.					4 days
L10n Tools - phase 2. 3). Sort localization results by unique resource ID	XL9				1 day
L10n Tools - phase 2. 4). Sort localization results by English text.	XL11				1 day
L10n Tools - phase 2. 5). Sort localization results by UI components (menu item...).	XL7				1 day
L10n Tools - phase 2. 6). (optional) Sort localization results by application component (navigator window, mail/news windows)					1 day
Leveraging tool - 1). Match/find a localized resource by a given "English string".					1 day
Leveraging tool - 2). Automated batch job to leverage existing translations.					3 days
L10n Tools - phase 3
L10n Tools - phase 1 (legacy code)			-----	-----
Need a set of C wrappers around nsIStringBundle so that non C++ components can use property file.					10 days
Investigate if we need to leverage 4.x translation results.	XL17				3 days
Leveraging tool - collect all translations from existing DOG files and build a table for matching.	XL17				5 days
Leveraging tool - build a bridge to import translation from 4.x to 5.0.	XL17				5 days
JavaScript L12y	XL20				20 days	tao
L10n validator	XL21				15 days
Pseudo localization	XL18				5 days	mantse

* finish date is estimated.
+ M4: 04/06/99; M5: 04/27/99; M6: 05/18/99.
! XL18, XL20, XL21 might need to be reassigned or swapped with some lower priority items.