libi18n Module Description
Last Update: March 31, 1998
Contact: Bob Jung <firstname.lastname@example.org>
Libi18n provides the underlying internationalization utility functions used in Mozilla to support international Web browsing and Internet Mail/News functionality. The emphasis is on underlying because there is a lot of other code that must be written in order to internationalize features.
Mozilla programmers should call the libi18n APIs wherever possible, but should also expect to write module- and feature-specific I18N-aware code. Check out the other Mozilla modules to see how this has been done. In addition to calling libi18n, a significant amount of programming has been required to internationalize the HTML layout engine, the front end (UI and text rendering) code, and mail/news.
This document only provides an overview of the libi18n module. For information on general I18N issues and the I18N of other Mozilla modules see I18N Guidelines.
The functions that libi18n provides to other Mozilla modules include:
- Character Code Conversion
- Finding Character Boundaries
- Handling I18N related HTTP Headers
- Line/Word Breaking (for text layout support)
- Locale Sensitive Operations (collation, date/time formatting)
- Mail/News Header Processing
- Platform Independent String Resources
- String Comparison
- Unicode String Functions
The corresponding libi18n public API specifications are documented in the International Library Reference.
Our initial work for Netscape Navigator (NN) 1.1 focused on adding Japanese Web browsing capability. We invented the notion of a document character set and a window (or font encoding) character set and provided a stream module to convert incoming text documents from the document charset to the window charset. This stream module and various Japanese charset converters were the first libi18n functions. After the first Beta, we added the ability in libi18n to auto-detect between the three common Japanese charset encodings: Shift_JIS, JIS and EUC-JP.
NN1.1 was a significant advancement for Japanese Web browsing and was well received. However, all of its UI was still in English. In order to localize NN, we created a special "i-build" (NN1.1i) because NN1.1 was full of hard-coded strings and other localization-unfriendly coding practices. We added libi18n APIs to make it easier to move user-visible strings into resources. NN1.1i was then localized into Japanese, German and French -- Netscape's first localized releases! The localizability infrastructure created for NN1.1i was then merged back into the mainstream source code for NN2.x and later releases.
NN2.x extended our charset support beyond Western and Japanese. Our NN1.1 stream module and charset converter architecture were designed to be extensible (not Japanese-centric), which made it straightforward to add support for Chinese, Korean and Central European charset encodings in the NN2.0 libi18n.
Other NN2.x libi18n additions included:
- Enhancing the charset concept to be on a per window/context basis instead of globally affecting all windows/contexts
- RFC1522 support to handle MIME headers. (Really these functions should migrate from libi18n to the libmime library.)
- XP locale support (e.g., sorting, time & date)
- HTTP Accept-Language header support
- Additional charset converters for Cyrillic, Greek and Turkish
- Enhanced line wrapping for Asian languages (kinsoku shori)
- Unicode 2.0 converters
- Korean charset auto-detection
- HTTP Accept-Charset header support
Document Charset Conversion
One of the most important functions provided by libi18n is character set conversion of the incoming text data. As each block of text data is received from the net (or cache), the libi18n stream module heuristically determines (to the best of its ability) the character set encoding of the incoming document, then it converts the data block from the "document" character encoding to the "window" character encoding (usually equivalent to the font encoding) before passing the data downstream to the HTML parser and layout engine.
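The heuristic detection step can be illustrated with a minimal sketch. This is not the libi18n detector: the function names are hypothetical, the byte-range checks are a simplified approximation of the three Japanese encodings, and inputs that are valid in both Shift_JIS and EUC-JP are reported as undecided rather than guessed.

```python
ESC = 0x1B  # ISO-2022-JP ("JIS") switches character sets with escape sequences


def _valid_euc_jp(data: bytes) -> bool:
    """Return True if data is structurally valid EUC-JP (byte ranges only)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                               # ASCII
            i += 1
        elif b == 0x8E:                            # half-width katakana prefix
            if i + 1 >= n or not 0xA1 <= data[i + 1] <= 0xDF:
                return False
            i += 2
        elif b == 0x8F:                            # three-byte JIS X 0212 prefix
            if i + 2 >= n or not all(0xA1 <= c <= 0xFE for c in data[i + 1:i + 3]):
                return False
            i += 3
        elif 0xA1 <= b <= 0xFE:                    # two-byte JIS X 0208
            if i + 1 >= n or not 0xA1 <= data[i + 1] <= 0xFE:
                return False
            i += 2
        else:
            return False
    return True


def _valid_shift_jis(data: bytes) -> bool:
    """Return True if data is structurally valid Shift_JIS (byte ranges only)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80 or 0xA1 <= b <= 0xDF:          # ASCII or half-width katakana
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:   # lead byte
            if i + 1 >= n:
                return False
            t = data[i + 1]
            if not (0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC):
                return False
            i += 2
        else:
            return False
    return True


def detect_japanese_charset(data: bytes):
    """Guess among ISO-2022-JP, EUC-JP and Shift_JIS; None means undecided."""
    if ESC in data:
        return "ISO-2022-JP"
    if all(b < 0x80 for b in data):
        return "US-ASCII"
    euc, sjis = _valid_euc_jp(data), _valid_shift_jis(data)
    if euc and not sjis:
        return "EUC-JP"
    if sjis and not euc:
        return "Shift_JIS"
    return None  # ambiguous or invalid: a real detector needs more evidence
```

A streaming detector would typically keep the undecided state across blocks and commit to an encoding only once a later block rules one candidate out.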
Currently the HTML parser and layout engine assume that HTML special characters (e.g., '<', '>') in the text data passed downstream to them are encoded as ASCII values. Therefore ISO-2022-xx and other 7-bit encodings such as UTF-7 and HZ are converted to an ASCII "superset" encoding, and UCS-2 is converted to UTF-8, by the libi18n conversion module before being sent downstream to the HTML parser.
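To illustrate why UCS-2 is converted before parsing: in UCS-2 the character '<' is the two bytes 0x00 0x3C, while in UTF-8 it stays the single ASCII byte 0x3C that the parser expects. The sketch below shows the arithmetic of such a conversion. It assumes big-endian input without a byte order mark, and the function name is hypothetical rather than a libi18n API.

```python
def ucs2_be_to_utf8(data: bytes) -> bytes:
    """Convert big-endian UCS-2 code units to UTF-8.

    Assumption: the input is plain UCS-2 (no surrogate pairs, no BOM).
    """
    if len(data) % 2:
        raise ValueError("UCS-2 input must contain an even number of bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        cp = (data[i] << 8) | data[i + 1]
        if cp < 0x80:                    # 1 byte: ASCII is passed through unchanged
            out.append(cp)
        elif cp < 0x800:                 # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        else:                            # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

Because every non-ASCII character is encoded entirely with bytes of 0x80 and above, a parser that scans for ASCII '<' and '>' cannot mistake part of a multibyte character for markup.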
The character set converters called by the libi18n stream module must maintain state because (1) the text data may be stateful or contain multibyte characters and (2) state is needed in some cases in which libi18n auto-selects from a few character encodings (e.g., between the 3 common Japanese encodings).
The actual character set conversion functions can be categorized into three types:
- Algorithmic conversions for Chinese, Japanese and Korean (e.g., Shift_JIS <-> EUC-JP)
- Table driven for 1-byte to 1-byte encodings (e.g., CP1250 <-> ISO8859-2)
- Table driven for Unicode conversions
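As an illustration of the table-driven 1-byte to 1-byte category, the sketch below builds a 256-entry lookup table and applies it with a single pass over the data. The table here is derived from Python's own codecs purely for demonstration; the helper name and the '?' fallback for unmappable bytes are assumptions, not libi18n behavior.

```python
def build_single_byte_table(src: str, dst: str, fallback: bytes = b"?") -> bytes:
    """Build a 256-entry table mapping each byte of charset src to charset dst."""
    table = bytearray(256)
    for b in range(256):
        try:
            table[b] = bytes([b]).decode(src).encode(dst)[0]
        except UnicodeError:
            # Byte undefined in src, or character missing from dst.
            table[b] = fallback[0]
    return bytes(table)


# Built once, then reused for every block of text.
CP1250_TO_ISO8859_2 = build_single_byte_table("cp1250", "iso8859_2")


def convert_single_byte(data: bytes, table: bytes) -> bytes:
    """Convert data by looking up every byte in the table."""
    return data.translate(table)
```

Because each input byte maps to exactly one output byte, this kind of converter needs no state between blocks, unlike the multibyte converters.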
See the documentation on the Mozilla network library in the mozilla.org list of technical papers for more information on the Mozilla streams architecture.
Managing Charset Encodings
In addition to doing the initial charset conversion of the text document data, Mozilla needs to track and manage the charset information, so that any text input, display or manipulation is performed correctly. The charset has a significant effect on layout and editing, including the behavior of line wrapping, selection, copy and paste. The behavior of the front ends (MacFE, WinFE and XFE) is also greatly affected by the charset information (e.g., how they measure and draw).
There are several types of Mozilla contexts (e.g., Web browsing, HTML composing, mail reading, mail composing) that need to track and use the charset information. Libi18n provides the APIs for getting and setting the information in the charset object.
XP Locale Functions
The Cross Platform (XP) locale functions provide platform independent APIs for string collation and date/time formatting. Because these are wrappers around the existing locale functions provided by the operating system, the behavior may not be totally consistent across platforms.
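The wrapper idea can be sketched as follows. These helpers are hypothetical and only illustrate the pattern of delegating to the operating system's locale facilities; they are not the libi18n XP locale APIs. Locale names other than "C" vary by platform, which is one source of the inconsistency mentioned above.

```python
import contextlib
import functools
import locale
import time


@contextlib.contextmanager
def _temporary_locale(category: int, name: str):
    """Switch one locale category for the duration of a block, then restore it."""
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        yield
    finally:
        locale.setlocale(category, saved)


def xp_sort(strings, locale_name: str = "C"):
    """Sort strings using the operating system's collation rules."""
    with _temporary_locale(locale.LC_COLLATE, locale_name):
        return sorted(strings, key=functools.cmp_to_key(locale.strcoll))


def xp_format_date(when: time.struct_time, locale_name: str = "C") -> str:
    """Format a date using the operating system's preferred date format."""
    with _temporary_locale(locale.LC_TIME, locale_name):
        return time.strftime("%x", when)
```

A caller would pass a platform locale name, for example `xp_sort(names, "ja_JP.UTF-8")` on a system where that locale is installed; the result depends on the collation tables the operating system ships.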
Other libi18n Functions
There's more functionality provided by libi18n, but this document is intended to provide a brief overview. For more info on how to write code using the libi18n functions, see the description of the libi18n public APIs, International Library Reference.
- The most important next step for the libi18n module is to modularize. Currently the interfaces are not cleanly separated from the rest of the client. This is the highest priority because when we achieve modularity it will make further development easier and faster.
- The second most important step is to make underlying support (e.g., encoding conversions) easily extensible without modifying the library itself. Adding a new simple language or character encoding should be simply a matter of dropping in a new binary module.
- Resource Handling
- The localizable resources need to be modularized. Each component/module should maintain its own set of localized resources rather than the one pot (e.g., allxpstr.h) of resources for all modules. This should go hand-in-hand with the general Mozilla push towards modularization.
- More flexible and powerful message formatting
- Enhancements to current string and character processing
- String creation/destruction
- String functions (extract, replace, concatenate...)
- Character attributes
- Platform Independent Locale management
- Platform Independent Collation
- Enhanced code set detection
- Date/Time/Number formatting
- Text boundary detection
- Word, Sentence, Line
More utility functionality
- Extending Charsets will become easier:
We have been working on making it easier to add additional charset support, but could not complete this in time for the initial Mozilla source release on 3/31/98. Now that the source clean-up is complete, we will resume working on this.
- Traditional Chinese charset encoding converters (Big5 to/from CNS 11643) are missing:
In the process of cleaning up the Mozilla sources for release to the net by 3/31/98, we had to remove this code because it was not freely distributable. This will be fixed soon by implementing an NPL version of this functionality. If anyone on the net wants to help us, please let us know!
- International Library Reference describes libi18n public APIs in more detail
- Mozilla Internationalization & Localization Guidelines provides overview and definitions of Mozilla internationalization and localization
- I18N Guidelines provides programming guidelines on how to keep Mozilla internationalized
- Mozilla Localizability Guidelines provides programming guidelines on how to ensure it stays easy to create localized versions of Mozilla
- Netscape I18N/L10N Client History outlines the I18N and L10N evolution of the Netscape clients
- International Users provides information and tips on using international features of the Netscape clients
- mozilla.org documentation: browse the technical papers for various technical documents on Mozilla source code