I18N Guidelines
Discussion: netscape.public.mozilla.i18n or mozilla-i18n@mozilla.org
Last Update: April 22, 1998
Contents
- Introduction
- General I18N Guidelines
- Standards Compliance
- Coming Soon to a Page Near Here
- Ideas for the Future
- Resources of I18N Information
- See Also
Introduction
This document provides I18N (internationalization) guidelines for Mozilla. These guidelines should be followed by all Mozilla programmers, regardless of country of residence. There is also a related document, the Localizability Guidelines.
General I18N Guidelines
One code base for the world. The localization process is simplified if the source code does not have to be recompiled: only the (external) resource files need to be altered. This means that there cannot be any conditional compilation for specific languages; for example, #ifdef JAPANESE is not allowed. This model is different from the one used in the past in the PC world. With it, it is possible, for example, to browse Japanese Web pages even if you are using the English version of the client, and to browse Chinese pages even if your OS is not Chinese.

8-bit clean. Do not assume that the 8th bit of a byte is unused and can therefore be employed for your own purposes. Many character encodings use the 8th bit for non-ASCII characters.
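A minimal C illustration of the 8-bit clean rule, assuming ISO-8859-1 text: masking a byte with 0x7F silently corrupts non-ASCII characters, whereas an 8-bit clean routine inspects bytes through unsigned char without changing them. The function names are hypothetical and are not part of libi18n.

```c
#include <stddef.h>

/* Anti-pattern: treats the 8th bit as free for other uses.
 * In ISO-8859-1, 0xE9 is 'é'; masking it yields 0x69, which is 'i'. */
static unsigned char strip_high_bit(unsigned char c)
{
    return (unsigned char)(c & 0x7F);
}

/* 8-bit clean: classify a byte without altering it. Plain char may be
 * signed, so bytes are examined as unsigned char to keep 0x80-0xFF intact. */
static int is_ascii_byte(unsigned char c)
{
    return c < 0x80;
}

/* Counts bytes with the 8th bit set; the buffer itself is left untouched. */
static size_t count_non_ascii_bytes(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (!is_ascii_byte(*p))
            n++;
    }
    return n;
}
```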
Character != byte. A character is not necessarily one byte. In Asian "multibyte" character encodings, some characters take up 2 bytes or more, while others are one byte each. Do not jump directly into the middle of a byte array, because the byte found there may be the second half of a character. Do not increment a char * pointer by one to move to the next character. Use the libi18n functions to find character boundaries and to walk strings (see also ns/include/libi18n.h):
- INTL_NextChar
- INTL_CharLen
- INTL_NextCharIdxInText
- INTL_PrevCharIdxInText
- etc
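The libi18n functions above are the calls to use; their exact signatures are not given here, so the following self-contained C sketch uses a hypothetical sjis_char_len() in place of something like INTL_CharLen(), with the Shift_JIS lead-byte ranges hard-coded. It shows why byte-wise scanning fails: in the Shift_JIS encoding of 日本 (0x93 0xFA 0x96 0x7B), the last trail byte 0x7B has the same value as ASCII '{', so strchr() would report a '{' that is not in the text.

```c
#include <stddef.h>

/* Hypothetical stand-in for a libi18n call such as INTL_CharLen():
 * length in bytes of the Shift_JIS character that starts at p.
 * Lead bytes 0x81-0x9F and 0xE0-0xFC begin a two-byte character;
 * ASCII and half-width katakana (0xA1-0xDF) are one byte each. */
static size_t sjis_char_len(const unsigned char *p)
{
    unsigned char c = p[0];
    if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
        return p[1] != '\0' ? 2 : 1;   /* truncated pair: consume one byte */
    return 1;
}

/* Counts characters, not bytes, by stepping from boundary to boundary. */
static size_t sjis_count_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    while (*p != '\0') {
        p += sjis_char_len(p);
        count++;
    }
    return count;
}

/* Like strchr() for an ASCII character, but only tests bytes that start a
 * character, so a trail byte with the same value is never matched. */
static const char *sjis_find_ascii(const char *s, char ascii)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p != '\0') {
        if (*p == (unsigned char)ascii)
            return (const char *)p;
        p += sjis_char_len(p);
    }
    return NULL;
}
```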
Locale-sensitive operations. Converting a date/time integer into a string is a locale-sensitive operation, because date/time formatting conventions vary around the world. Use XP_StrfTime() to produce a string in the appropriate format. Similarly, textual sorting rules vary from country to country, so use the appropriate collation function, XP_StrColl(), rather than comparing bytes.
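XP_StrfTime() and XP_StrColl() are Mozilla's cross-platform calls. Assuming they behave like the standard C strftime() and strcoll(), the idea can be sketched with the C library alone; the helper names below are hypothetical. The locale is chosen at run time with setlocale(), so one binary formats dates and sorts text according to each user's conventions.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Adopts the user's locale from the environment; call once at startup. */
static void init_user_locale(void)
{
    setlocale(LC_ALL, "");
}

/* Formats a date with the current LC_TIME conventions. "%x" asks for the
 * locale's preferred date representation instead of a hard-coded order. */
static size_t format_local_date(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%x", tm);
}

/* qsort() comparator for char * elements using LC_COLLATE rules. */
static int collate_cmp(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts strings in the user's collation order rather than byte order. */
static void sort_for_display(const char **items, size_t n)
{
    qsort(items, n, sizeof items[0], collate_cmp);
}
```

In the "C" locale, "%x" produces the MM/DD/YY form and strcoll() orders strings like strcmp(); other locales give their own results.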
English protocol elements. Some protocols use strings that are in English. For example, email headers use strings like "Subject:". These should not be presented directly to the user. Instead, a localized version of the string should be retrieved from the resources. The protocol itself must still be honored, though. The string "Subject:" should still be used on-the-wire, while the translated version is presented to the user in the UI.
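A minimal C sketch of keeping protocol tokens and UI text apart. The struct, the French labels, and both function names are hypothetical; a compiled-in table is used only to keep the example self-contained, whereas in Mozilla the localized labels would come from the resource files.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical resource entry pairing a protocol token with its UI label. */
struct header_label {
    const char *wire_name;   /* protocol token, never translated */
    const char *ui_label;    /* localized text shown to the user */
};

/* Illustrative French labels; real ones belong in external resources. */
static const struct header_label fr_labels[] = {
    { "Subject", "Objet" },
    { "From",    "De" },
};

/* Returns the localized label for a header, falling back to the wire name. */
static const char *display_label(const char *wire_name)
{
    for (size_t i = 0; i < sizeof fr_labels / sizeof fr_labels[0]; i++) {
        if (strcmp(fr_labels[i].wire_name, wire_name) == 0)
            return fr_labels[i].ui_label;
    }
    return wire_name;
}

/* Builds the header line for transmission: always the English token. */
static int format_wire_header(char *buf, size_t size,
                              const char *wire_name, const char *value)
{
    return snprintf(buf, size, "%s: %s\r\n", wire_name, value);
}
```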
Special encodings of non-ASCII text. Some protocols apply a special encoding to non-ASCII text in order to protect it while it is in transit over the Net. For example, RFC 2047 specifies the standard to use for transmitting non-ASCII text in email headers. These encoded strings look like this:
=?ISO-8859-1?Q?Andr=E9?=
These strings should not be directly presented to the user. They should first be decoded. Conversely, strings must be encoded before sending them out onto the Net.
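A minimal C sketch of decoding and encoding a single RFC 2047 encoded-word that uses the "Q" encoding, in which "_" stands for a space and "=XX" for one byte in hexadecimal. It is not libi18n's implementation and the function names are hypothetical: it ignores the "B" (base64) encoding, the 75-character limit on an encoded-word, and folding of long headers, and it returns raw bytes in the named charset, which must still be converted for display.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes one "Q" encoded-word such as =?ISO-8859-1?Q?Andr=E9?=.
 * Copies the charset name into charset and the decoded bytes into out.
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
static int rfc2047_decode_q(const char *word, char *charset, size_t cs_size,
                            char *out, size_t out_size)
{
    size_t len = strlen(word);
    if (len < 8 || out_size == 0 || strncmp(word, "=?", 2) != 0 ||
        strcmp(word + len - 2, "?=") != 0)
        return -1;

    const char *cs = word + 2;
    const char *q1 = strchr(cs, '?');          /* end of the charset name */
    if (q1 == NULL || (size_t)(q1 - cs) >= cs_size)
        return -1;
    if ((q1[1] != 'Q' && q1[1] != 'q') || q1[2] != '?')
        return -1;                              /* only "Q" is handled here */
    memcpy(charset, cs, (size_t)(q1 - cs));
    charset[q1 - cs] = '\0';

    const char *p = q1 + 3;                     /* start of encoded text */
    const char *end = word + len - 2;           /* the closing "?=" */
    if (p > end)
        return -1;
    size_t n = 0;
    while (p < end) {
        if (n + 1 >= out_size)
            return -1;
        if (*p == '_') {                        /* "_" encodes a space */
            out[n++] = ' ';
            p++;
        } else if (*p == '=') {                 /* "=XX" encodes one byte */
            if (end - p < 3)
                return -1;
            int hi = hex_value((unsigned char)p[1]);
            int lo = hex_value((unsigned char)p[2]);
            if (hi < 0 || lo < 0)
                return -1;
            out[n++] = (char)(hi * 16 + lo);
            p += 3;
        } else {
            out[n++] = *p++;
        }
    }
    out[n] = '\0';
    return 0;
}

/* Encodes bytes as one "Q" encoded-word. ASCII letters and digits are
 * copied; every other byte, including space, is written as =XX, which the
 * Q rules always allow. Returns 0, or -1 if out is too small. */
static int rfc2047_encode_q(const char *charset, const char *text,
                            char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "=?%s?Q?", charset);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    size_t pos = (size_t)n;
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++) {
        if (pos + 4 > out_size)                 /* room for "=XX" and NUL */
            return -1;
        if ((*p >= '0' && *p <= '9') || (*p >= 'A' && *p <= 'Z') ||
            (*p >= 'a' && *p <= 'z'))
            out[pos++] = (char)*p;
        else
            pos += (size_t)snprintf(out + pos, out_size - pos, "=%02X",
                                    (unsigned int)*p);
    }
    if (pos + 3 > out_size)                     /* room for "?=" and NUL */
        return -1;
    memcpy(out + pos, "?=", 3);
    return 0;
}
```

Decoding the example encoded-word yields the charset name ISO-8859-1 and the bytes A, n, d, r, 0xE9 ("André" in that charset); encoding those bytes with the same charset produces the original encoded-word again.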
Use libi18n. Use libi18n wherever possible.
Standards Compliance
Mozilla should adhere to all relevant standards. There are a number of RFCs from the IETF, Recommendations from W3C, and other specifications. Here is a list of some of the relevant specifications.
Coming Soon to a Page Near Here
- Adding a New Character Set or Language
- Layout
- Front Ends (FEs)
  - Windows (winfe)
  - Macintosh (macfe)
  - Unix (xfe)
- LibNet
Ideas for the Future
- Multilingual Widgets
- Complex Language Support
  - BiDi, Thai, Indic, etc.
- Non-Latin Layout Styles
  - Vertical
  - Ruby (see Internet Draft Ruby in the Hypertext Markup Language)
- Platform Independent IME Support
- Natural Language Dictionary Lookup
- Proofing API (in addition to spell checking)
Resources of I18N Information
- Netscape Communicator Information for International Users provides information on using the international features of Communicator
- The ISO 8859 Alphabet Soup describes the ISO-8859-1 through ISO-8859-10 character set encodings and includes images of the glyphs for each encoding
- ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf: "This online document provides information on CJK (that is, Chinese, Japanese, and Korean) character set standards and encoding systems. In short, it provides detailed information on how CJK text is handled electronically." This plain text file is the online companion to "Understanding Japanese Information Processing" by Ken Lunde.
  - ENGLISH: 1993, O'Reilly & Associates, Inc., ISBN 1-56592-043-0
  - JAPANESE: 1995, SOFTBANK Corporation, ISBN 4-89052-708-7
- W3C's Non-western Character sets, Languages, and Writing Systems
- The Unicode Consortium
- Discussion of international software newsgroup
- Discussion of international standards newsgroup
- Introduction to I18N and L10N
- LibI18N Module Description
- Localizability Guidelines
- Mozilla I18N History