XUL Localizability issues
Document History
- 06/22/99:
- Move obsolete history entries to the bottom.
- Clean up doc content.
Goals
This document serves the following purposes:
- Identify the Internationalization and Localization requirements in the Seamonkey project.
- Discuss the XUL localizability issues in Seamonkey (5.0).
- Record the proposed solutions, and their pros and cons.
- Keep track of the status of the related issues.
- Make the solution feature-complete by the first beta so that it can be tested.
Localizability issues
Historically, we encountered some difficulties in localizing Web-based documents:
- Locale-sensitive data is hard to separate from the rest of the document. Localizable resources reside in the same file as language- and culture-neutral data, so localizing Web-based documents often means going through the whole set of documents to identify and translate localizable resources.
- Markup tags are often translated or damaged in the process of localization. Validating the localization results is difficult, if not impossible.
It is the desire of the Mozilla Internationalization group to address these issues in the XUL world.
Criteria of the solution
Before embarking on the solution-seeking journey, let's lay out the criteria we intend to meet:
- Standards compliant. We shall avoid reinventing the wheel. Both existing internationalization solutions and new proposals shall be taken into consideration. In the decision-making process, we will favor standards-compliant approaches over proprietary ones.
- Simple. The localizability solution shall be easy to implement and shall integrate seamlessly with core development. A localizability solution often needs support from peer development groups; should localization enabling become a burden or even hinder the development process, the proposed solution would be less likely to be accepted by the supporting module owners.
- Leveragible. Leveraging can be defined as reusing the localization results of previous releases in the current release. A full-scale localization of a huge software product is not only time-consuming but also costly. It requires collaborative effort from localization engineers, translation vendors, and internationalization QA throughout the whole process. It is neither economical nor efficient to repeat the whole process from scratch for each release. Leveraging also helps improve the consistency of translation between releases.
- Portable. The final solution shall be achievable on all platforms, including Unix, Windows, Mac, and others. We do not want one solution for one platform and a different solution for another. The majority of Mozilla modules are platform-independent, and so shall be the localizability solution.
- Extensible. Technology evolves. A solution adopted now might not be valid in the future; we therefore want it to be flexible enough for future extension.
- Separable. To simplify the localization task, localizable resources shall be placed into external files so that the translators can concentrate on localization instead of language or culture neutral resources.
- Consistent. If possible, we shall seek a scheme that works across modules instead of within the XUL component only. We don't want distinct solutions for different modules.
- Dynamic binding. Some of the resources requiring translation may be dynamic, usually because they require string composition, e.g., "Installing item 5 of 10".
- Validatable. The result of localization shall be easily validated. Translators need to be able to verify the result as they progress.
- Parseable. It should be possible to unambiguously and automatically determine which embedded items contain localizable text, and which items need not be localized.
- Invisible (Internationalization). As much as possible, the standard tools that create the English UI should emit files that are ready to be localized, without requiring additional processing. We want the localizability solution to be part of the standard XUL authoring process rather than an internationalization-specific step. Localizability shall not be the only reason we adopt the solution.
- Efficient. The implementation of the solution shall not cause any noticeable performance degradation or memory bloat.
XUL Localizability dependency
XUL localizability depends on the following items:
- XUL Coding Style Guidelines: posted to the newsgroup.
XUL/XML parser dependency
The proposed XUL localizability solution requires two features from the XML parser:
- General entities substitution. The proposal requires all localizable resources to be declared as general entities in a language-specific external DTD.
The location of this DTD is determined by the combination of the system (or application) locale and the URI referenced by the SYSTEM identifier in the doctype declaration. The current solution is to take advantage of the chrome type URLs described in Configurable Chrome.
In XUL, the system ID is written as
<!DOCTYPE xui SYSTEM "chrome://navigator/locale/toolbar.dtd">
Assuming the base part of the chrome URL is converted into "resource:/navigator" and the current system locale is "fr", the parser will fetch
"resource:/navigator/locale/fr/toolbar.dtd"
for entity declarations.
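The lookup above can be sketched in C. This is a minimal illustration under stated assumptions, not the actual chrome registry code: the helper name chrome_to_locale_path is hypothetical, and it assumes the chrome URL has the form chrome://<package>/locale/<file> and that the base part maps to resource:/<package> exactly as in the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrites "chrome://<package>/locale/<file>" into
 * "resource:/<package>/locale/<locale>/<file>", following the example above.
 * Returns 0 on success, -1 if the URL has another shape or `out` is too small. */
int chrome_to_locale_path(const char *chrome_url, const char *locale,
                          char *out, size_t out_size)
{
    static const char scheme[] = "chrome://";
    static const char segment[] = "/locale/";

    assert(chrome_url != NULL && locale != NULL && out != NULL);

    if (strncmp(chrome_url, scheme, sizeof scheme - 1) != 0)
        return -1;
    const char *package = chrome_url + sizeof scheme - 1;

    /* The package name is everything before the "/locale/" segment. */
    const char *seg = strstr(package, segment);
    if (seg == NULL || seg == package)
        return -1;
    const char *file = seg + sizeof segment - 1;

    /* Insert the locale name between "locale/" and the file name. */
    int n = snprintf(out, out_size, "resource:/%.*s/locale/%s/%s",
                     (int)(seg - package), package, locale, file);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, calling chrome_to_locale_path("chrome://navigator/locale/toolbar.dtd", "fr", buf, sizeof buf) fills buf with "resource:/navigator/locale/fr/toolbar.dtd". Falling back to a default locale when the file is missing is deliberately left out of this sketch.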
To sum up, XUL localizability depends on the following XML parser features:
- Internal and external general entity support.
- External DTD support.
Candidates of the final solution
- XUL + language-specific DTD. (adopted)
Description:
In this approach, we declare general (text) entities for all locale sensitive resources in an external DTD (Document Type Definition) subset and use XML entity reference, "&entity;", to reference them:
- Put all localizable resources in a language DTD file. Examples of such resources are text strings, customizable icons, and URLs. Most of them can be described by text (parsed) entities.
- Locale-insensitive resources shall not be placed in this DTD file.
- Use SYSTEM identifier to reference this DTD file.
- Use chrome type URLs to locate the external DTD subset.
- Put format strings, such as "Item %d of %d", in text entities and compute the value in the application code such as MailCore or BrowserCore.
- To dynamically switch languages, we need to reload the XUL and its DTD (probably from a remote host). This is because once the DOM tree is created, the entities and DTDs have already been processed.
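The point about format strings can be illustrated with a small C sketch of what application code might do once it has the expanded entity text, e.g. "Installing item %d of %d". The helper compose_progress is hypothetical; it checks that the translated string contains exactly two %d conversions before handing it to snprintf, because a non-literal format string would otherwise be trusted blindly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counts "%d" conversions in a translated format string. "%%" is allowed as a
 * literal percent sign; any other conversion makes the string invalid (-1). */
static int count_int_conversions(const char *fmt)
{
    int count = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;                      /* look at the conversion character */
        if (*p == '%')
            continue;             /* "%%": literal percent sign */
        if (*p != 'd')
            return -1;            /* unsupported, or '%' at end of string */
        count++;
    }
    return count;
}

/* Hypothetical helper: fills `out` from a localized "Item %d of %d"-style
 * string taken from an entity value. Returns 0 on success, -1 if the string
 * does not contain exactly two %d conversions or `out` is too small. */
int compose_progress(const char *localized_fmt, int item, int total,
                     char *out, size_t out_size)
{
    assert(localized_fmt != NULL && out != NULL);

    if (count_int_conversions(localized_fmt) != 2)
        return -1;

    int n = snprintf(out, out_size, localized_fmt, item, total);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

One design consequence: translations that need to reorder the numbers would require positional conversions such as %2$d (a POSIX extension), which this sketch deliberately rejects.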
Sample XUL: toolbar.xul
<!DOCTYPE xui SYSTEM "chrome://navigator/locale/toolbar.dtd">
<xul:toolbar>
  &txtContentData;
  <button cmd="nsCmd:BrowserBack" style="background-color:rgb(192,192,192);">
    <img src="chrome://navigator/locale/TB_Back.gif"/>
    &txtBack;
  </button>
  <button cmd="nsCmd:BrowserForward" style="background-color:rgb(192,192,192);">
    <img src="chrome://navigator/locale/TB_Forward.gif"/>
    &txtForward;
  </button>
  <button cmd="nsCmd:BrowserWizard" style="background-color:rgb(192,192,192);">
    <img src="&iconWizard;"/>
    &txtWizard;
  </button>
</xul:toolbar>
Sample DTD: toolbar.dtd
<!ENTITY txtContentData "Random content data">
<!ENTITY txtBack "Back to &#37;s">
<!ENTITY txtForward "Forward">
<!ENTITY iconWizard "chrome://navigator/locale/TB_Wizard.gif">
<!ENTITY txtWizard "Wizard">
Pros:
- Already standard compliant; no new syntax names or tags need to be introduced.
- Text replacement can be in either content or attribute values (but not in the attribute names).
Cons:
- The language-specific DTD file is not a flat file. A DTD parser is needed to extract localizable resources into a flat file for localizers.
- Two file formats to deal with: the property file and the DTD file.
- Hard to group text entities by UI component. Note that the XUL Coding Style Guidelines recommend that XUL writers name entities after the target widgets.
- Information about text entities is lost after parsing.
- When switching languages, we need to reload the XUL and its DTD (probably from a remote host) and reconstruct the DOM tree.
In the example of a dialog UI, if we used entities and DTDs, we would have to tear down the whole DOM tree and the dialog that sits on top of it, and then rebuild a new DOM tree and dialog. This would be wasteful, since our layout manager can resize elements dynamically; we could instead "edit" the DOM tree and have the dialogs redraw themselves automatically.
However, we can live with this performance cost, since users are unlikely to switch languages at runtime very often.
- "%" used in format strings, such as "%d out of %d" for dynamic string binding, needs to be escaped, because "%" starts a parameter-entity reference inside an entity value. For example, use the numeric character reference (NCR) "&#37;" to escape "%".
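The first and last cons meet in the extraction tool: flattening a language DTD for localizers means recognizing entity declarations and undoing the numeric character reference &#37; used to escape "%". The following C function is a hypothetical sketch that handles only the simple shape used in toolbar.dtd (one declaration per line, double-quoted value); a real tool would need a proper DTD parser.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: parses one line of the form
 *     <!ENTITY name "value">
 * into `name` and `value`, turning the "&#37;" escape back into '%'.
 * Returns 1 on success, 0 if the line is not a general entity declaration,
 * and -1 if it is malformed or a buffer is too small.
 * Assumed limitations: one declaration per line, double-quoted values only,
 * and no other character or entity references are decoded. */
int parse_entity_line(const char *line, char *name, size_t name_size,
                      char *value, size_t value_size)
{
    static const char keyword[] = "<!ENTITY";
    static const char pct_ref[] = "&#37;";
    const char *p = line;
    size_t len = 0;

    assert(line && name && value && name_size > 0 && value_size > 0);

    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, keyword, sizeof keyword - 1) != 0)
        return 0;
    p += sizeof keyword - 1;
    if (!isspace((unsigned char)*p))
        return 0;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '%')
        return 0;                 /* parameter entity, not UI text */

    /* Entity name: up to the next whitespace character. */
    while (p[len] != '\0' && !isspace((unsigned char)p[len]))
        len++;
    if (len == 0 || len >= name_size)
        return -1;
    memcpy(name, p, len);
    name[len] = '\0';
    p += len;

    while (isspace((unsigned char)*p))
        p++;
    if (*p++ != '"')
        return -1;

    /* Copy the literal value up to the closing quote. */
    len = 0;
    while (*p != '"') {
        if (*p == '\0' || len + 1 >= value_size)
            return -1;
        if (strncmp(p, pct_ref, sizeof pct_ref - 1) == 0) {
            value[len++] = '%';
            p += sizeof pct_ref - 1;
        } else {
            value[len++] = *p++;
        }
    }
    value[len] = '\0';
    return 1;
}
```

A driver would read the DTD line by line and print each successful result as name=value, giving localizers a flat list such as txtBack=Back to %s.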
- Resource ID + String Resource Manager. (ruled out due to technical difficulty)
Description:
- In the XUL file, assign a widget ID, e.g. widgetID, to each UI element and a resource tag, e.g. resTag, to each localizable attribute of the widget. Then, during widget initialization, call a C function, say gettext(widgetID, resTag, default_string), to retrieve the resource from a Java-like property file. For example, for a label widget with widget ID 345 whose text is "label string", the call to retrieve the localized text would be gettext(345, RES_TEXT, "label string").
- The English version of property file will be automatically generated during XUL to DOM conversion. The front-end developers can easily update a UI element's attribute/resource without leaving the XUL file.
- All localizable resources are uniquely identified by the combination of widget ID and resource tag.
- In function "gettext()", if the property file does not exist or the combination of widget ID and resource tag does not resolve to a resource string, the "default_string" will be returned instead.
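The lookup-with-fallback behavior of gettext() can be sketched in C. The sketch is hypothetical: the name res_gettext (chosen to avoid clashing with the GNU gettext() function), the in-memory table, and res_set_table are assumptions; only the (widget ID, resource tag, default string) shape and the RES_TEXT/RES_IMG values come from the sample definitions in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Resource tags from the sample definitions in this section. */
#define RES_TEXT 0x1234
#define RES_IMG  0x1235

/* One entry loaded from a property file (hypothetical in-memory form). */
struct resource_entry {
    int widget_id;
    int res_tag;
    const char *value;
};

/* Currently loaded property file; NULL/0 means "no property file". */
static const struct resource_entry *current_table = NULL;
static size_t current_count = 0;

void res_set_table(const struct resource_entry *table, size_t count)
{
    assert(table != NULL || count == 0);
    current_table = table;
    current_count = count;
}

/* Returns the resource for (widget_id, res_tag), or default_string when no
 * property file is loaded or the pair does not resolve to a resource. */
const char *res_gettext(int widget_id, int res_tag, const char *default_string)
{
    for (size_t i = 0; i < current_count; i++) {
        const struct resource_entry *e = &current_table[i];
        if (e->widget_id == widget_id && e->res_tag == res_tag)
            return e->value;
    }
    return default_string;
}
```

Before any property file is loaded, res_gettext(8001, RES_TEXT, "Back to %s") simply returns its default string, which is what lets developers work without property files.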
Sample XUL: toolbar.xul
<!DOCTYPE xui SYSTEM "toolbar.dtd" [
  <!-- L10N-PTY type of data: file format can be found at http://www.netscape.com/PropertyFile -->
  <!NOTATION L10N-PTY SYSTEM "http://www.netscape.com/PropertyFile">
  <!ENTITY JFile SYSTEM "http://www.home.org/l10n.property" NDATA L10N-PTY>
]>
<xul:toolbar>
  <label widgetID="8000">Random content data</label>
  <button widgetID="8001"
          cmd="nsCmd:BrowserBack"
          style="background-color:rgb(192,192,192);"
          img="resource:/res/toolbar/TB_Back.gif">Back to %s</button>
  <button widgetID="8002"
          cmd="nsCmd:BrowserForward"
          style="background-color:rgb(192,192,192);"
          img="resource:/res/toolbar/TB_Forward.gif">Forward</button>
  <button widgetID="8003"
          cmd="nsCmd:BrowserWizard"
          style="background-color:rgb(192,192,192);"
          img="resource:/res/toolbar/TB_Wizard.gif">Wizard</button>
</xul:toolbar>
Sample property file: property.toolbar
8000: Random content data
8001.img: resource:/res/toolbar/TB_Back.gif
8001: Back to %s
8002.img: resource:/res/toolbar/TB_Forward.gif
8002: Forward
8003.img: resource:/res/toolbar/TB_Wizard.gif
8003: Wizard
Sample resource tag definitions:
#define RES_TEXT 0x1234
#define RES_IMG 0x1235
To get the text string for the "Back" button's label, we call
gettext(8001, RES_TEXT, "Back to %s")
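Reading the sample property file requires splitting each key into a widget ID and a resource tag. Below is a hypothetical C sketch; it assumes, based only on the sample above, that a bare numeric key denotes RES_TEXT and a ".img" suffix denotes RES_IMG, and it does not attempt the structured-resource extensions mentioned among the cons.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Resource tags from the sample definitions in this section. */
#define RES_TEXT 0x1234
#define RES_IMG  0x1235

/* Hypothetical parser for one line of the sample property file:
 *     "8001: Back to %s"                               -> 8001, RES_TEXT
 *     "8001.img: resource:/res/toolbar/TB_Back.gif"    -> 8001, RES_IMG
 * On success stores the IDs, points *value at the text after ": ", and
 * returns 0. Returns -1 for anything else (comments, unknown suffixes). */
int parse_property_line(const char *line, int *widget_id, int *res_tag,
                        const char **value)
{
    char *end;
    long id;

    assert(line && widget_id && res_tag && value);

    id = strtol(line, &end, 10);
    if (end == line || id <= 0 || id > INT_MAX)
        return -1;

    if (*end == '.') {
        /* ".img" is the only suffix that appears in the sample. */
        if (strncmp(end, ".img", 4) != 0)
            return -1;
        *res_tag = RES_IMG;
        end += 4;
    } else {
        *res_tag = RES_TEXT;
    }

    if (*end != ':')
        return -1;
    end++;
    while (*end == ' ')
        end++;

    *widget_id = (int)id;
    *value = end;
    return 0;
}
```

Each parsed (widget ID, tag, value) triple is exactly the key that a gettext()-style lookup would search on.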
Pros:
- All localizable resources are uniquely identified by the combination of widgetID and the resource tags. The application/front-end developers can easily update a UI element's attribute/resource.
- Core development work will not be blocked by the gettext() implementation. However, we shall ask the UI developers to put English strings, localization notes, and comments in the property file.
- The fallback mechanism allows the developers to work without the presence of property files.
- The English version of property file can be automatically generated during XUL to DOM conversion.
- Provide fallback mechanism to default strings.
- The property file is flat and in clear text; easy to localize and leverage.
- The implementation of the nsStringBundle interface is nearly finished. The basic facilities for parsing the property file and retrieving text are ready to check in.
- Consistent with the scheme in "String Resources"; only one file format to deal with.
- Resources are grouped by widgets. This also makes the property file more readable.
- Easy to leverage the property file. All resources are identified and ready for comparison.
Cons:
- Need to treat content data as the text resource of a label widget. (So it can be identified and edited by application code.)
- Need to implement a mechanism to automatically bind localizable resources to widgets. However, the amount of work can be reduced by performing the localized resources binding in widget initialization time since we need to bind the UI attributes in the DOM to the underlying widgets anyway.
- Need to ensure the uniqueness of the widgetID. However, the appCore developers need to have a way to uniquely identify a widget anyway.
- Localizable resource strings are duplicated: once in the XUL file and once in the property file.
- Need to extend the Java-like property file to support structured resources.
- Technical difficulty: once XUL has been converted to a DOM tree, the content can't be changed anymore.
- The entity solution is more XML compliant and less work to implement.
- "@.*;" + property file (proposed by Daniel Matejka).
Description: Assuming the "timely access" problem can be overcome, we could get around the "syntax constraint" problem by using an entity-like syntax of our own. That is, we invent something: say we use the "@" symbol the way entities use the "&" symbol. These references are then used throughout the content just as entities would have been used for localization. This still assumes we have some way to get at the language-specific substitution text after parsing (so it can't be a parser directive; it may have to be some sort of special element that XUL will recognize and not display). If all this worked, we'd be free to add localizable text anywhere without constraining the element and attribute structure.
For example,
<element l10nID="100" text="english version"/>
becomes
<element text="@100;"/> (or <element>@100;</element>, if that's more appropriate for the widget).
There just needs to be a single central routine that knows where to find the table of localized text strings. It finds "@.*;" sequences and substitutes them. We have to walk the content model after parsing and hand every string to this routine, and widgets have to pass all their text strings through it before they do anything with them.
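The central substitution routine can be sketched in C. This is hypothetical: it reads "@.*;" as "@" followed by a numeric ID and ";" (as in "@100;"), takes the localized-text table as a lookup callback, and leaves unknown or malformed sequences untouched. sample_lookup is a stand-in for the real table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Lookup callback: returns the localized text for an ID, or NULL if unknown. */
typedef const char *(*l10n_lookup_fn)(long id);

/* Hypothetical stand-in for the table of localized strings. */
const char *sample_lookup(long id)
{
    return id == 100 ? "english version" : NULL;
}

/* Copies `in` to `out`, replacing every "@<digits>;" sequence whose ID the
 * lookup knows. Unknown or malformed sequences are copied unchanged.
 * Returns 0 on success, -1 if `out` is too small. */
int substitute_at_refs(const char *in, l10n_lookup_fn lookup,
                       char *out, size_t out_size)
{
    size_t used = 0;

    assert(in && lookup && out && out_size > 0);

    while (*in != '\0') {
        const char *text = NULL;
        size_t consumed = 1;

        if (*in == '@') {
            size_t digits = strspn(in + 1, "0123456789");
            if (digits > 0 && in[1 + digits] == ';') {
                text = lookup(strtol(in + 1, NULL, 10));
                if (text != NULL)
                    consumed = digits + 2;    /* '@' + digits + ';' */
            }
        }

        if (text != NULL) {
            size_t len = strlen(text);
            if (used + len >= out_size)
                return -1;
            memcpy(out + used, text, len);
            used += len;
        } else {
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *in;
        }
        in += consumed;
    }
    out[used] = '\0';
    return 0;
}
```

Applied to the example, substitute_at_refs("@100;", sample_lookup, buf, sizeof buf) yields "english version". Walking the content model and routing widget strings through the routine, as the paragraph requires, is outside this sketch.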
Cons:
- Using XLinks and XPointers for XUL Localisation (proposed by Daniel McGowan).
Abstract:
Use XLink and XPointer to reference a specific text in a file that is separate from the base XUL file, so that this text can be easily localized and displayed to the end user in a manner consistent with XPFE (Cross Platform Front End) requirements.
Pros:
Since it is all written in vanilla XML, there is no need to create custom file types; this system can accept anything the parser can handle. It maintains the name-value pairing essential for localization. It allows us to add localization and developer notes to the object (e.g. button) and to the localized text separately, while maintaining a direct link between the two. The text is pulled into the UI elements when the XUL file is parsed. This also addresses the goal of separating markup, style, and content.
Cons:
This does not leave us with a flat-file solution. However, the file containing the text to be localised is of such a simple format that writing a tool to parse it is a trivial exercise. We are going to need some form of tool to convert native encodings to Unicode character references.
There are 4 files to track! Actually, the language-specific DTD is complete and valid as is, so it could easily be declared inline in the language-specific XML file. The link-attributes have been entitised and could conceivably be inherited from a higher-level DTD.
When reloading downloadable chrome, not all related files can be blown away by the client.
Here is an example of the syntax needed for a button UI element.
In the XUL file:
<button href="&locale;/uilang.xml| id(1234).child(text)">
  <content-info> Put comments on button functionality here </content-info>
  other xul markup
</button>
In the XUL DTD:
<!ENTITY % link-attributes
  "xlink:form   CDATA #FIXED 'simple'
   href         CDATA #REQUIRED
   content-info CDATA #IMPLIED
   show         CDATA #FIXED 'embed'
   actuate      CDATA #FIXED 'auto'">
<!ELEMENT button (#PCDATA)>
In UILANG.XML:
<loctext id="1234">
  <text> Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur. </text>
  <note>These are Caesar's first words on Gaul. This button should be centered on column 1 of the dialog.</note>
</loctext>
In UILANG.DTD:
<!ELEMENT loctext (text, note?)>
<!ATTLIST loctext id ID #REQUIRED>
<!ELEMENT text (#PCDATA)>
<!ELEMENT note (#PCDATA)>
So, when the <button> tag is parsed, the "simple" XLink href (which is #REQUIRED) is automatically (actuate='auto') embedded (show='embed') with the text from the <text> child element of the element with id="1234" in the file at the URI &locale;/UILANG.XML, where &locale; is some more globally set value.
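To make the link resolution concrete, the following C sketch splits an href, after the locale entity has been expanded, into the document URI and the ID selected by the leading id(...) term. It is a hypothetical helper that recognizes only the id(...) form used in this example; the remaining XPointer steps such as .child(text) and the actual XLink embedding are not modeled, and the expanded URI in the usage note is an invented illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: splits an expanded href such as
 *     "resource:/locale/fr/uilang.xml| id(1234).child(text)"
 * into the document URI and the ID inside the leading "id(...)" term.
 * The rest of the XPointer (".child(text)") is left to the caller.
 * Returns 0 on success, -1 on an unrecognized form or a too-small buffer. */
int split_xlink_href(const char *href, char *uri, size_t uri_size,
                     char *id, size_t id_size)
{
    const char *bar, *open, *close;
    size_t len;

    assert(href && uri && id && uri_size > 0 && id_size > 0);

    bar = strchr(href, '|');
    if (bar == NULL)
        return -1;

    /* URI part: everything before '|', without trailing blanks. */
    len = (size_t)(bar - href);
    while (len > 0 && isspace((unsigned char)href[len - 1]))
        len--;
    if (len == 0 || len >= uri_size)
        return -1;
    memcpy(uri, href, len);
    uri[len] = '\0';

    /* XPointer part: skip blanks, then expect "id(" ... ")". */
    open = bar + 1;
    while (isspace((unsigned char)*open))
        open++;
    if (strncmp(open, "id(", 3) != 0)
        return -1;
    open += 3;
    close = strchr(open, ')');
    if (close == NULL || close == open)
        return -1;
    len = (size_t)(close - open);
    if (len >= id_size)
        return -1;
    memcpy(id, open, len);
    id[len] = '\0';
    return 0;
}
```

With a hypothetical expansion of the locale entity, split_xlink_href("resource:/locale/fr/uilang.xml| id(1234).child(text)", ...) yields the URI "resource:/locale/fr/uilang.xml" and the ID "1234"; the loader would then fetch that file and take the <text> child of the matching <loctext>.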
Comparison (*****: excellent, *: show stopper)
Criteria to examine | XUL + Language-specific DTD | XUL + Language-specific property file | Description |
Simple | ***** | **** Need to define resource tags in widget code and widgetIDs in the XUL file. | Both core development and localization work shall be made easy and less error prone. |
Leveragible | *** Need a parser to list, identify, and compare resources. | ***** All resources are in property files, which are flat and easier to leverage. | Localization results shall be leveragible from release to release. |
Consistent | *** Two file formats, DTD and property file, to deal with. | ***** Only one file format, the property file. | The scheme shall work across modules instead of within the XUL component only. |
Standards compliant | ***** | **** (We can extend the property file format to have a syntax similar to the X/Motif application defaults file.) | Standards-compliant approaches are favored over proprietary ones. |
Portable | ***** | ***** | Achievable on all platforms including Unix, Windows, Mac, and others. |
Extensible | ***** In the same direction as XML. | **** Need to treat content data as the text resource of a label widget. | The adopted solution will be flexible for customization and future extension. |
Dynamic binding | *** Resource binding mostly happens in the XML parser. | ***** Resource binding occurs at the last minute. | Some of the items requiring translation may be dynamic, usually because they require string composition ("Installing item 5 of 10"). |
Validatable | *** (need a DTD parser) | ***** (right on the scene) | Localizers/translators will be able to validate the localization results. |
Parseable | *** (DTD file contains XML tags, keywords, and others) | ***** (localizable resources are easily identified) | It should be possible to unambiguously and automatically determine which embedded items contain localizable text, and which items need to be locked. |
Invisible (Internationalization) | *** (entities defined in external DTD) | **** (developers need to assign an ID to each resource, but the US/EN property file could be generated by the XUL parser.) | As much as possible, the standard tools that create the US UI should emit files that are already localizable, without requiring additional processing. |
Identifiable | **** (entity names are unique, but we lose them after parsing.) | **** (all resources are identified by the combination of widgetID and resTag, but we must treat content data as the text resource of a label widget) | All resources shall be uniquely identifiable. |
Dynamic language switching | *** Need to reload the XUL | **** We can design it to modify localizable attributes only. | Dynamically switch to a different language and reflect it in the UI (this does not happen often). |
Document History (old)
- 03/25/99: Add XUL Localizability dependency.
- 02/26/99: Correct entity reference syntax.
- 02/19/99: Insert two XUL L12y solutions: "@.*;" + property file, proposed by Daniel Matejka, and "Using XLinks and XPointers for XUL Localisation" by Daniel McGowan.
- 02/17/99: At last, a consensus of the solution for XUL localizability has been reached. The "XUL + language-specific DTD" is adopted.
- 02/10/99: In solution #2, relax the rule that the content data must be a text attribute value; instead, treat content data as the element content. See the samples in solution #2.
- 02/09/99: Add explanation of the "default string" to solution #2.
- 02/08/99: Add "How to locate the language-specific file" section.
- 02/08/99: Take out all outdated sections.
- 02/08/99: Re-evaluate solution #1 and #2 in the comparison table.
- 02/08/99: Revised solution #2 to reduce the number of IDs needed for resource identification. In the revised version, resources are uniquely identified by the combination of widget ID and resource tag. The Java property file format is also extended to support structured resources.
- 02/05/99: Revamp the "Candidates of the final solution" section and add a table of comparison.
- 02/05/99: Compile for final review; record more discussion.
- 01/29/99: Record more discussion.
- 01/28/99: Added two sections: "Ideas" and "Candidates of the final solution".
- 01/28/99: Compiled feedback from Daniel Matejka.
- 01/25/99: Feedback from Daniel Matejka in red.
- 01/25/99: Post this document to newsgroup, news://news.mozilla.org/netscape.public.mozilla.xpfe, for broader audience and discussion.
- 01/25/99: Per Scott Collins, http://www.meer.net/ScottCollins/, not all UI widgets can be described in XUL file. Q: does this mean we need another mechanism to solve the non-XUL UI components?
- 01/20/99: Add a new section, "How does the XUL concept work?"
- 01/20/99: In the "XUL Localizability issues" meeting held today, it was proposed that:
- Get more up-to-date documents on XUL spec architecture so that we can come up with a solution.
- Need to get a workable sample XUL application so that we can identify the problem better.
- Choose from option #1 and #2 to embed localization information in XUL. Then combine the strength of option #3, String Resources, option #4, gettext() as the underlying mechanism to retrieve text strings. We may load the string resources from a locale suffixed property file and fall back to the default strings, as described in gettext(), when needed.
- 01/20/99: Incorporate Rob Thorne's comments and suggestions
- 01/19/99: Add reference to Erik's String Resources. We might be able to consolidate the idea presented there with the gettext() scheme.
- 01/16/99: First draft.
1 and 2 are mostly headaches for localization and build people, and I can't really speak for them. But both numbers 3 and 4 demand extra implementation work.