Discussion: netscape.public.mozilla.i18n or email@example.com
Last Update: April 22, 1998
- General I18N Guidelines
- Standards Compliance
- Coming Soon to a Page Near Here
- Ideas for the Future
- Resources of I18N Information
- See Also
There is a related document, called the Localizability Guidelines.
8-bit clean. Do not assume that the 8th bit of a byte is unused and can therefore be employed for your own purposes. Many character encodings use the 8th bit for non-ASCII characters.
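Here is a minimal C sketch of what 8-bit clean code looks like. The function names byte_histogram and has_non_ascii are hypothetical and are not part of libi18n. Every byte value from 0x00 to 0xFF is treated as data. Each byte is converted to unsigned char before it is used as an index or tested, because plain char may be signed on some platforms.

```c
#include <stddef.h>

/* Hypothetical helper: count how often each byte value occurs in a
 * NUL-terminated string. Indexing with (unsigned char) keeps bytes
 * 0x80-0xFF in the range 128..255; indexing with a plain char could
 * produce a negative index where char is signed. No bit is stripped. */
void byte_histogram(const char *s, size_t counts[256])
{
    for (size_t i = 0; i < 256; i++)
        counts[i] = 0;
    for (const char *p = s; *p != '\0'; p++)
        counts[(unsigned char)*p]++;
}

/* Hypothetical helper: returns nonzero if any byte has the 8th bit set,
 * i.e. the string contains non-ASCII data that must be preserved. */
int has_non_ascii(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if (*p & 0x80)
            return 1;
    return 0;
}
```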
Character != byte. A character is not necessarily one byte. In Asian "multibyte" character encodings, some characters take up 2 bytes or more, while others take up one byte each. Do not jump directly into the middle of a byte array and assume you have landed on a character boundary. Do not increment a char * pointer by one to move to the next character. Use the libi18n functions to find character boundaries and to walk strings (see also ns/include/libi18n.h):
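The following C sketch shows boundary-aware string walking. It uses UTF-8 only as an illustration, because UTF-8's lead-byte rules are simple. It is not the libi18n API, and the helper names (utf8_char_len, utf8_next, utf8_count_chars) are hypothetical. Encodings such as Shift_JIS or EUC-JP follow different rules, which is the reason to go through a library rather than hand-code this.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 character whose lead
 * byte is b, or 0 if b is a continuation byte (10xxxxxx) or a byte that
 * never starts a sequence (0xF8-0xFF). Overlong forms are not rejected. */
static size_t utf8_char_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Advance from one character boundary to the next. A malformed byte is
 * skipped as a one-byte unit, and a truncated sequence stops early, so
 * the caller always makes progress and never runs past the NUL. */
const char *utf8_next(const char *p)
{
    size_t len = utf8_char_len((unsigned char)*p);
    if (len == 0)
        return p + 1;
    for (size_t i = 1; i < len; i++)
        if (((unsigned char)p[i] & 0xC0) != 0x80)  /* not a continuation byte */
            return p + i;
    return p + len;
}

/* Count characters (not bytes) in a NUL-terminated UTF-8 string. */
size_t utf8_count_chars(const char *s)
{
    size_t n = 0;
    for (const char *p = s; *p != '\0'; p = utf8_next(p))
        n++;
    return n;
}
```

For example, "Andr\xC3\xA9" is 6 bytes long, but utf8_count_chars reports 5 characters.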
Locale-sensitive operations. Converting a date/time integer into a string is a locale-sensitive operation. There are various date/time formatting conventions used around the world. Use XP_StrfTime() to produce a string in the appropriate format. Similarly, textual sorting rules vary depending on the country. Use the appropriate collation function: XP_StrColl().
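XP_StrfTime() and XP_StrColl() are not reproduced here. Assuming they behave like the standard C strftime() and strcoll() (an assumption this page does not confirm), the idea can be sketched with the C library directly. The names format_local_date and sort_for_display are hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Format a date using the current LC_TIME locale's preferred date
 * representation (%x). Returns the number of bytes written, or 0 if
 * the buffer was too small. */
size_t format_local_date(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%x", tm);
}

/* qsort comparator: order strings by the current LC_COLLATE rules
 * rather than by raw byte values. */
static int collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Sort an array of strings for display to the user. */
void sort_for_display(const char **items, size_t n)
{
    qsort(items, n, sizeof items[0], collate_cmp);
}
```

A program would call setlocale(LC_ALL, "") once at startup so that LC_TIME and LC_COLLATE follow the user's environment. In the "C" locale, %x produces MM/DD/YY and strcoll() orders strings the same way strcmp() does. The output therefore becomes localized only once a real locale is active.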
English protocol elements. Some protocols use strings that are in English. For example, email headers use strings like "Subject:". These should not be presented directly to the user. Instead, a localized version of the string should be retrieved from the resources. The protocol itself must still be honored, though: the string "Subject:" is still used on the wire, while the translated version is presented to the user in the UI.
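This minimal C sketch keeps the wire format and the UI label separate. The resource table and the function names are hypothetical stand-ins for the product's localized resources. Only the literal "Subject:" comes from the protocol.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical resource table. In a real product these strings would be
 * loaded from localized resources rather than compiled in. */
struct ui_string { const char *lang; const char *subject_label; };

static const struct ui_string kSubjectLabels[] = {
    { "en", "Subject:" },
    { "de", "Betreff:" },
};

/* Label shown to the user in the UI; falls back to English. */
const char *subject_label_for_ui(const char *lang)
{
    for (size_t i = 0; i < sizeof kSubjectLabels / sizeof kSubjectLabels[0]; i++)
        if (strcmp(kSubjectLabels[i].lang, lang) == 0)
            return kSubjectLabels[i].subject_label;
    return kSubjectLabels[0].subject_label;
}

/* Header line sent on the wire: always the protocol's literal
 * "Subject:", regardless of the UI language. Returns snprintf's result. */
int format_subject_header(char *buf, size_t size, const char *value)
{
    return snprintf(buf, size, "Subject: %s\r\n", value);
}
```

A non-ASCII subject value would also need to be encoded before it goes on the wire, as described next.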
Special encodings of non-ASCII text. Some protocols apply a special encoding to non-ASCII text to protect it while it is in transit over the Net. For example, RFC 2047 specifies how to transmit non-ASCII text in email headers. These encoded strings ("encoded-words") have the form =?charset?encoding?encoded-text?= and look like this:
=?ISO-8859-1?Q?Andr=E9?=
These strings should not be presented directly to the user. They should first be decoded. Conversely, strings must be encoded before they are sent out onto the Net.
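The decoding step can be sketched in C for the simplest case: a single RFC 2047 encoded-word that uses the "Q" encoding. In that encoding, "_" stands for a space and "=XX" is a byte written in hexadecimal. decode_q_word is a hypothetical helper, not a libi18n function. It does not handle "B" (base64) words or runs of adjacent encoded-words. It also does not convert the decoded bytes out of the declared charset. A full implementation needs all of these.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = toupper(c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode one "Q"-encoded word such as
 * "=?ISO-8859-1?Q?Andr=E9?=". Copies the charset name into charset and
 * the decoded bytes (still in that charset) into out, NUL-terminated.
 * Returns 0 on success, -1 if the word is malformed or does not fit. */
int decode_q_word(const char *word, char *charset, size_t cs_size,
                  char *out, size_t out_size)
{
    size_t len = strlen(word);
    if (len < 8 || strncmp(word, "=?", 2) != 0 || strcmp(word + len - 2, "?=") != 0)
        return -1;

    const char *cs = word + 2;
    const char *q1 = strchr(cs, '?');              /* end of charset name */
    if (q1 == NULL || (size_t)(q1 - cs) + 1 > cs_size)
        return -1;
    if ((q1[1] != 'Q' && q1[1] != 'q') || q1[2] != '?')
        return -1;                                 /* only "Q" is handled here */
    memcpy(charset, cs, (size_t)(q1 - cs));
    charset[q1 - cs] = '\0';

    const char *p = q1 + 3;                        /* start of encoded-text */
    const char *end = word + len - 2;              /* position of final "?=" */
    if (p > end)
        return -1;
    size_t n = 0;
    while (p < end) {
        int c = (unsigned char)*p;
        if (c == '_') {                            /* '_' encodes a space */
            c = ' ';
            p++;
        } else if (c == '=') {                     /* "=XX" encodes one byte */
            if (end - p < 3) return -1;
            int hi = hex_val((unsigned char)p[1]), lo = hex_val((unsigned char)p[2]);
            if (hi < 0 || lo < 0) return -1;
            c = hi * 16 + lo;
            p += 3;
        } else {
            p++;
        }
        if (n + 1 >= out_size) return -1;          /* leave room for the NUL */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return 0;
}
```

For example, "=?ISO-8859-1?Q?Andr=E9?=" yields the charset ISO-8859-1 and the bytes "Andr" followed by 0xE9. Those bytes still have to be converted from ISO-8859-1 before they are displayed.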
Use libi18n. Use libi18n wherever possible.
...a list of some of the relevant specifications.
- Adding a New Character Set or Language
- Front Ends (FEs)
  - Windows (winfe)
  - Macintosh (macfe)
  - Unix (xfe)
- Multilingual Widgets
- Complex Language Support
  - BiDi, Thai, Indic, etc.
- Non-Latin Layout Styles
- Ruby (see the Internet Draft "Ruby in the Hypertext Markup Language")
- Platform Independent IME Support
- Natural Language Dictionary Lookup
- Proofing API (in addition to spell checking)
- Netscape Communicator Information for International Users provides information on using the international features of Communicator
- The ISO 8859 Alphabet Soup describes the ISO-8859-1 through ISO-8859-10 character set encodings and includes images of the glyphs for each encoding
- ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf: "This online document provides information on CJK (that is, Chinese, Japanese, and Korean) character set standards and encoding systems. In short, it provides detailed information on how CJK text is handled electronically." This plain text file is the online companion to "Understanding Japanese Information Processing" by Ken Lunde.
  - ENGLISH: 1993, O'Reilly & Associates, Inc., ISBN 1-56592-043-0
  - JAPANESE: 1995, SOFTBANK Corporation, ISBN 4-89052-708-7
- W3C's Non-western Character sets, Languages, and Writing Systems
- The Unicode Consortium
- Discussion of international software newsgroup
- Discussion of international standards newsgroup
- Introduction to I18N and L10N
- LibI18N Module Description
- Localizability Guidelines
- Mozilla I18N History