


libi18n Module Description

Discussion: netscape.public.mozilla.i18n or mozilla-i18n@mozilla.org
Last Update: March 31, 1998
Contact: Bob Jung <bobj@netscape.com>


Introduction

The Mozilla family (Navigator and Communicator) is globally enabled. Globally enabled software shares common source code from which we build a single binary executable (per platform) that supports a wide variety of languages. The initial Mozilla source release supports Western, Central European, Chinese, Japanese, Korean, Greek, Turkish and Cyrillic languages. (For an overview of Mozilla Internationalization (I18N) and Localization (L10N), check out the Mozilla Internationalization & Localization Guidelines.)

Libi18n provides the underlying internationalization utility functions used in Mozilla to support international Web browsing and Internet Mail/News functionality. The emphasis is on underlying because there is a lot of other code that must be written in order to internationalize features.

Mozilla programmers should call the libi18n APIs wherever possible, but should also expect to write module- and feature-specific I18N-aware code. Check out the other Mozilla modules to see how this has been done. In addition to calling libi18n, a significant amount of programming has been required to internationalize the HTML layout engine, the front end (UI and text rendering) code, and mail/news.

This document only provides an overview of the libi18n module. For information on general I18N issues and the I18N of other Mozilla modules see I18N Guidelines.

The functions that libi18n provides to other Mozilla modules include:

  • Character Code Conversion
  • Finding Character Boundaries
  • Handling I18N related HTTP Headers
  • Line/Word Breaking (for text layout support)
  • Locale Sensitive Operations (collation, date/time formatting)
  • Mail/News Header Processing
  • Platform Independent String Resources
  • String Comparison
  • Unicode String Functions

The corresponding libi18n public API specifications are documented in the International Library Reference.


    History

    With a very small I18N team and tight product release schedules, our strategy over the past 3 years has been to incrementally add features -- prioritized by Netscape's international market needs.

    Our initial work for Netscape Navigator (NN) 1.1 focused on adding Japanese Web browsing capability. We invented the notion of a document character set and a window (or font encoding) character set and provided a stream module to convert incoming text documents from the document charset to the window charset. This stream module and various Japanese charset converters were the first libi18n functions. After the first Beta, we added the ability in libi18n to auto-detect between the 3 common Japanese charset encodings: Shift_JIS, JIS and EUC-JP.

    NN1.1 was a significant advancement for Japanese Web browsing and was well received. However, all of its UI was still in English. In order to localize NN, we created a special "i-build" (NN1.1i) because NN1.1 was full of hard-coded strings and other localization-unfriendly coding practices. We added libi18n APIs to make it easier to move user-visible strings into resources. NN1.1i was then localized into Japanese, German and French -- Netscape's first localized releases! The localizability infrastructure created for NN1.1i was then merged back into the mainstream source code for NN2.x and later releases.

    NN2.x extended our charset support beyond Western and Japanese. Our NN1.1 stream module and charset converter architecture were designed to be extensible (not Japanese centric) which made it straightforward to add Chinese, Korean and Central European charset encodings support in the NN2.0 libi18n.

    Other NN2.x libi18n additions included:

    • Enhancing the charset concept to be on a per window/context basis instead of globally affecting all windows/contexts
    • RFC1522 support to handle MIME headers. (Really these functions should migrate from libi18n to the libmime library.)
    • XP locale support (e.g., sorting, time & date)
    • HTTP Accept-Language header support

    NN3.x libi18n added:

    • Additional charset converters for Cyrillic, Greek and Turkish
    • Enhanced line wrapping for Asian languages (kinsoku shori)

    NN4.x libi18n added:

    • Unicode 2.0 converters
    • Korean charset auto-detection
    • HTTP Accept-Charset header support

    The overall (not just libi18n) evolution of the Netscape client I18N and L10N support is highlighted by a table of the Netscape I18N/L10N Client History.


    How It Works

    Libi18n is a collection of fundamental internationalization functions, so it is difficult to write a single "How It Works": there really are several "it"s. In this document, we mention a few of the bigger "it"s and include links to others.

    Document Charset Conversion

    One of the most important functions provided by libi18n is character set conversion of the incoming text data. As each block of text data is received from the net (or cache), the libi18n stream module heuristically determines (to the best of its ability) the character set encoding of the incoming document, then it converts the data block from the "document" character encoding to the "window" character encoding (usually equivalent to the font encoding) before passing the data downstream to the HTML parser and layout engine.
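
The block-by-block conversion described above can be sketched roughly as follows. This is only an illustration in Python, not the libi18n stream module: the function name `convert_stream` and the sample document are hypothetical, the document charset is passed in rather than detected, and the sketch pivots through Python's Unicode strings instead of converting directly between the two encodings.

```python
import codecs
from typing import Iterable, Iterator


def convert_stream(blocks: Iterable[bytes], doc_charset: str,
                   win_charset: str) -> Iterator[bytes]:
    """Convert each incoming block from doc_charset to win_charset."""
    decoder = codecs.getincrementaldecoder(doc_charset)(errors="replace")
    encoder = codecs.getincrementalencoder(win_charset)(errors="replace")
    for block in blocks:
        # final=False lets the decoder hold back an incomplete trailing
        # character until the next block arrives.
        text = decoder.decode(block, final=False)
        out = encoder.encode(text, final=False)
        if out:
            yield out  # this is what would be passed downstream
    # Flush whatever the decoder and encoder are still holding.
    out = encoder.encode(decoder.decode(b"", final=True), final=True)
    if out:
        yield out


# Hypothetical document: Shift_JIS bytes arriving in 5-byte network blocks,
# so some two-byte characters are split across block boundaries.
page = "<p>日本語のページ</p>"
document = page.encode("shift_jis")
blocks = [document[i:i + 5] for i in range(0, len(document), 5)]
converted = b"".join(convert_stream(blocks, "shift_jis", "euc_jp"))
```

Because the decoder object lives across calls, the result is the same no matter where the network happens to cut the data.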

    Currently the HTML parser and layout engine assume that HTML special characters (e.g., '<', '>') in text data passed downstream to them are encoded as ASCII values. Therefore ISO-2022-xx and other 7-bit encodings such as UTF-7 and HZ are converted to an ASCII "superset" encoding, and UCS-2 is converted to UTF-8 by the libi18n conversion module before being sent downstream to the HTML parser.
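
To see why this matters, consider a small Python sketch. In ISO-2022-JP a two-byte character is written with bytes in the 0x21-0x7E range, so one of those bytes can have the same value as '<'. The pairing of each 7-bit encoding with a particular superset encoding below (for example ISO-2022-JP with EUC-JP) is an assumption made for illustration, as is the helper name `to_ascii_superset`.

```python
ESC_TO_JIS = b"\x1b$B"    # escape sequence: switch to JIS X 0208
ESC_TO_ASCII = b"\x1b(B"  # escape sequence: switch back to ASCII

# One two-byte character whose first byte is 0x3C, the ASCII value of '<'.
doc_bytes = b"<b>" + ESC_TO_JIS + b"\x3c\x56" + ESC_TO_ASCII + b"</b>"

# Assumed pairings of 7-bit encodings with ASCII superset encodings.
SUPERSET = {"iso2022_jp": "euc_jp", "iso2022_kr": "euc_kr", "hz": "gb2312"}


def to_ascii_superset(data: bytes, doc_charset: str) -> bytes:
    """Re-encode so that every byte below 0x80 really is an ASCII character."""
    return data.decode(doc_charset).encode(SUPERSET[doc_charset])


safe = to_ascii_superset(doc_bytes, "iso2022_jp")

# A byte-oriented parser sees one spurious '<' in the raw document,
# and none after conversion.
raw_count = doc_bytes.count(b"<")
safe_count = safe.count(b"<")
```

After conversion, both bytes of the character have the high bit set, so a parser that scans for ASCII markup bytes cannot mistake them for a tag delimiter.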

    The character set converters called by the libi18n stream module must maintain state because (1) the text data may be stateful or contain multibyte characters and (2) state is needed in some cases in which libi18n auto-selects from a few character encodings (e.g., between the 3 common Japanese encodings).
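
The second kind of state, for auto-selection, can be illustrated with a simple elimination scheme: keep one incremental decoder per candidate encoding and drop a candidate as soon as it meets bytes it cannot decode. This is a hedged Python sketch, not the heuristic libi18n actually uses; the class name `JapaneseCharsetGuesser` and the priority order are assumptions.

```python
import codecs
from typing import Optional


class JapaneseCharsetGuesser:
    """Keeps one strict decoder per candidate and drops those that fail."""

    # Priority order used when several candidates survive (an assumption).
    CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")

    def __init__(self) -> None:
        self.alive = {
            name: codecs.getincrementaldecoder(name)(errors="strict")
            for name in self.CANDIDATES
        }

    def feed(self, block: bytes) -> None:
        """Run the block through every surviving candidate decoder."""
        for name in list(self.alive):
            try:
                # Each decoder remembers partial characters between blocks.
                self.alive[name].decode(block, final=False)
            except UnicodeDecodeError:
                del self.alive[name]

    def best_guess(self) -> Optional[str]:
        """Return the highest-priority surviving candidate, if any."""
        for name in self.CANDIDATES:
            if name in self.alive:
                return name
        return None


def guess(data: bytes, block_size: int) -> Optional[str]:
    guesser = JapaneseCharsetGuesser()
    for i in range(0, len(data), block_size):
        guesser.feed(data[i:i + block_size])
    return guesser.best_guess()
```

Note that pure ASCII input leaves all three candidates alive, so this sketch reports the first one in priority order; a real detector would need further heuristics for ambiguous input.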

    The actual character set conversion functions can be categorized in three types:

    1. Algorithmic conversions for Chinese, Japanese and Korean (e.g., Shift_JIS <-> EUC-JP)
    2. Table driven for 1-byte to 1-byte encodings (e.g., CP1250 <-> ISO8859-2)
    3. Table driven for Unicode conversions
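
The first two categories can be sketched in Python as follows. The code is illustrative and is not taken from libi18n: `sjis_to_euc` uses the well-known arithmetic relationship between Shift_JIS and JIS X 0208 row/cell values and assumes well-formed input, and `build_table` derives a 256-entry byte table from Python's own codecs, with '?' as an assumed fallback for unmappable bytes.

```python
def sjis_to_euc(data: bytes) -> bytes:
    """Algorithmic Shift_JIS -> EUC-JP conversion (no lookup tables)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 < 0x80:                    # ASCII passes through unchanged
            out.append(b1)
            i += 1
        elif 0xA1 <= b1 <= 0xDF:         # half-width katakana: prefix 0x8E
            out += bytes((0x8E, b1))
            i += 1
        else:                            # two-byte JIS X 0208 character
            b2 = data[i + 1]
            if b1 >= 0xE0:
                b1 -= 0x40               # close the gap in the lead-byte range
            j1 = (b1 - 0x81) * 2 + 0x21  # each lead byte covers two JIS rows
            if b2 >= 0x9F:               # second row of the pair
                j1 += 1
                j2 = b2 - 0x7E
            elif b2 >= 0x80:             # trail bytes skip 0x7F
                j2 = b2 - 0x20
            else:
                j2 = b2 - 0x1F
            out += bytes((j1 | 0x80, j2 | 0x80))  # EUC-JP sets the high bits
            i += 2
    return bytes(out)


def build_table(src: str, dst: str, fallback: bytes = b"?") -> bytes:
    """Build a 256-entry table for a 1-byte to 1-byte conversion."""
    table = bytearray()
    for value in range(256):
        try:
            table += bytes([value]).decode(src).encode(dst)
        except UnicodeError:             # undefined or unmappable byte
            table += fallback
    return bytes(table)


CP1250_TO_LATIN2 = build_table("cp1250", "iso8859_2")


def cp1250_to_latin2(data: bytes) -> bytes:
    """Table-driven conversion: one table lookup per input byte."""
    return data.translate(CP1250_TO_LATIN2)
```

The algorithmic converter needs no data beyond a few constants, while the table-driven one trades 256 bytes of table for a conversion loop with no branching on the character value.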

    The document character set encodings currently supported by Communicator are listed in the Netscape More Tips and Technical Information for International Users.

    See the documentation on the Mozilla network library in the mozilla.org list of technical papers for more information on the Mozilla streams architecture.

    Managing Charset Encodings

    In addition to doing the initial charset conversion of the text document data, Mozilla needs to track and manage the charset information, so that any text input, display or manipulation is performed correctly. The charset has a significant effect on layout and editing, including the behavior of line wrapping, selection, copy and paste. The behavior of the front ends (MacFE, WinFE and XFE) is also greatly affected by the charset information (e.g., how they measure and draw).

    There are several types of Mozilla contexts (e.g., Web browsing, HTML composing, mail reading, mail composing) that need to track and use the charset information. Libi18n provides the APIs to get and set the information in the charset object.
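
As a rough picture of per-context charset tracking, the following Python sketch attaches a small charset record to each context and exposes get/set accessors. All names here (`CharsetInfo`, `Context`, `get_doc_charset`, `set_doc_charset`) and the default encoding are hypothetical; they are not the libi18n data structures or API, which are described in the International Library Reference.

```python
from dataclasses import dataclass, field


@dataclass
class CharsetInfo:
    """Hypothetical per-context charset record (not the libi18n struct)."""
    doc_charset: str = "iso8859_1"  # encoding of the incoming document
    win_charset: str = "iso8859_1"  # encoding used for display and fonts


@dataclass
class Context:
    """Stand-in for a Mozilla context such as a browser or composer window."""
    kind: str
    charset_info: CharsetInfo = field(default_factory=CharsetInfo)


def get_doc_charset(ctx: Context) -> str:
    return ctx.charset_info.doc_charset


def set_doc_charset(ctx: Context, charset: str) -> None:
    ctx.charset_info.doc_charset = charset


# Each context carries its own record, so changing one does not
# affect the others.
browser = Context("web browsing")
composer = Context("mail composing")
set_doc_charset(browser, "shift_jis")
```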

    XP Locale Functions

    The Cross Platform (XP) locale functions provide platform independent APIs for string collation and date/time formatting. Because these are wrappers around the existing locale functions provided by the operating system, their behavior may not be entirely consistent across platforms.
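
The wrapper idea can be illustrated in Python, whose `locale` and `time` modules are themselves thin layers over the C library's locale functions. The names `xp_sort` and `xp_format_date` are hypothetical, and the example pins the "C" locale only to make the behavior predictable; passing an empty string to `setlocale` would select the user's locale instead, with results that depend on the platform.

```python
import locale
import time
from typing import List


def xp_sort(strings: List[str]) -> List[str]:
    """Sort using the collation rules of the current OS locale."""
    return sorted(strings, key=locale.strxfrm)


def xp_format_date(when: time.struct_time) -> str:
    """Format a date in the current locale's preferred representation."""
    return time.strftime("%x", when)


# Pin a locale that every platform provides.
locale.setlocale(locale.LC_ALL, "C")

names = xp_sort(["b", "a", "C"])
epoch = xp_format_date(time.gmtime(0))
```

The callers never see which platform routine does the work, which is the benefit of the wrapper, and also the reason results can differ from one operating system to another.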

    Other libi18n Functions

    There's more functionality provided by libi18n, but this document is intended to provide a brief overview. For more info on how to write code using the libi18n functions see the description of the libi18n public APIs, International Library Reference.


    Where It's Headed

    Modularity
    The most important next step for the libi18n module is to modularize. Currently the interfaces are not cleanly separated from the rest of the client. This is the highest priority because when we achieve modularity it will make further development easier and faster.
    Extensibility
    The second most important step is to make underlying support (e.g., encoding conversions) easily extensible without modifying the library itself. Adding a new simple language or character encoding should be simply a matter of dropping in a new binary module.
    Resource Handling
    The localizable resources need to be modularized. Each component/module should maintain its own set of localized resources rather than the one pot (e.g., allxpstr.h) of resources for all modules. This should go hand-in-hand with the general Mozilla push towards modularization.
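
One way to picture the extensibility goal above is a converter registry that new modules add themselves to, so the core library never has to change. The Python sketch below is purely illustrative: `register_converter`, `convert` and the KOI8-R example module are hypothetical and do not describe a planned libi18n interface.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]

# Maps (source charset, target charset) to a conversion function.
_CONVERTERS: Dict[Tuple[str, str], Converter] = {}


def register_converter(src: str, dst: str, func: Converter) -> None:
    """Called by a drop-in module to make its converter available."""
    _CONVERTERS[(src.lower(), dst.lower())] = func


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Look up a registered converter and apply it."""
    try:
        func = _CONVERTERS[(src.lower(), dst.lower())]
    except KeyError:
        raise LookupError(f"no converter from {src} to {dst}") from None
    return func(data)


# What a hypothetical drop-in module might contain.
def koi8r_to_cp1251(data: bytes) -> bytes:
    return data.decode("koi8_r").encode("cp1251")


register_converter("KOI8-R", "windows-1251", koi8r_to_cp1251)
```

A caller would then write `convert(data, "koi8-r", "windows-1251")` without knowing which module supplied the converter.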

    More utility functionality
    • More flexible and powerful message formatting
    • Enhancements to current string and character processing
      • String creation/destruction
      • String functions (extract, replace, concatenate...)
      • Character attributes
    • Platform Independent Locale management
    • Platform Independent Collation
    • Enhanced code set detection
    • Date/Time/Number formatting
    • Text boundary detection
      • Word, Sentence, Line

    Please contact us at netscape.public.mozilla.i18n and let us know if you would like to help work on enhancements to libi18n.


    Known Issues

    • Extending charsets will become easier:
      • We have been working on making it easier to add additional charset support, but could not complete this in time for the initial Mozilla source release on 3/31/98. Now that the source clean-up is complete, we will resume working on this.
    • Traditional Chinese charset encoding converters (Big5 to/from CNS 11643) are missing:
      • In the process of cleaning up the Mozilla sources for release to the net by 3/31/98, we had to remove this code because it was not freely distributable. This will be fixed soon by implementing an NPL version of this functionality. If anyone on the net wants to help us, please let us know!


    See Also


    Copyright © 1998 Netscape Communications Corporation