by Tague Griffith <tague@netscape.com>
Last Modified:
1.0 Summary
This document details both the locale model and the locale functions that will be implemented for mozilla.org. A locale model is a strategy and a set of usage requirements for handling language and location preferences, while locale functions are a set of APIs provided to programmers to manipulate objects representing locale preferences. The goal of the locale model is to enable developers to correctly handle the expectations of users as they view web pages in many different languages, which may even be different from the languages installed on their computer.
2.0 Potential User Scenarios
The following sections detail potential user scenarios and how differences in locale and language between the user's machine and the content on the web should interact. This list is not exhaustive and does not illustrate all of the ways a user might want a browser to behave with regard to locale preferences, but it does try to capture the most important usage scenarios. The purpose of this list is to highlight the expectations of a user. Since this list is not exhaustive, please contribute additional usage scenarios that illustrate user expectations that need to be supported by Mozilla.
2.1 Just Japanese
The simplest user scenario is a user using a single language: the user is looking at web pages in their native language, on a localized version of Communicator with a matching localized version of the operating system. Although this scenario covers the majority of users, we feel that it is important to point out that browsing in English in addition to a user's native language is also extremely common (common enough to be a high priority item). In this case, we have a Japanese user with a Japanese localized version of Mozilla running on a Japanese version of the operating system, only browsing Japanese content from Japan. This user wants all of his menus and page content in Japanese, following Japanese locale conventions.
2.2 French Railways International
An American user running an English localized version of Mozilla is viewing a French web page that runs JavaScript. This particular page is from the French Ministry of Public Transportation and provides internet sales of French rail passes. The JavaScript on this page does date and currency formatting. The user is planning a trip to France and wants to purchase a rail pass. The user can speak a bit of functional French, but is by no means fluent. Naturally, the user would prefer that all of the browser chrome remain in English, but since French Railways only provides a French version of their page, the user has no choice but to view the content in French. Since the page deals with both the French language and purchases in France, the correct locale choice for the entire page, including the JavaScript, is French.
2.3 Chinese Stock Trading Page
A Chinese-speaking user living in the Bay Area is accessing a stock brokerage homepage. The stock brokerage, recognizing the potential market for non-English-speaking customers, has provided several different translations of their trading page. This page uses JavaScript both to calculate dates and to do some currency formatting. Since the user is a native Chinese speaker, they would prefer to work in Chinese as much as possible. Their version of Mozilla is localized for Chinese.
In this scenario, the appropriate behavior for Navigator is to keep the chrome in Chinese, and all the content, including the JavaScript, should be in Chinese with one exception: the monetary formatting. Because this page is for a US company, all of the transactions are in US dollars. The JavaScript should continue to format the dollar amounts using the standard US formatting for currency.
2.4 Der Deutsche Wired
This scenario involves the web site for a German localized version of Wired. Wired, being the slick, technological magazine that it is, has already implemented XUI interfaces for its web site. For this web site, the distinction between chrome and content doesn't really exist, since the chrome is part of the presentation of the web site. The user wants the chrome to honor the web site designer's wishes: the chrome has German menus and buttons pointing to various parts of the web site.
2.5 European Internet Cafe
This scenario involves the computers at a European cafe. Because of the proximity of people speaking a variety of different languages in Europe, businesses often provide access to multiple languages.
At a public "terminal" office at the UN, the EU, or a multinational corporation, in a European cybercafe, or at some sort of public kiosk (e.g., a train station), there may be systems that are available for use in several languages (like most ATMs in this area), where the browser and web applications need to dynamically switch the UI between languages. The user should be able to return to the home page and select the language, as on the first screen of an ATM.
3.0 General Requirements
The Mozilla locale model is intended to provide programmers with a strategy for handling locale preferences in a way that supports our multilingual model of web browsing. From the user scenarios above, we can outline the following principles:
- the language of the page should usually determine the locale of the scripts and formatting applied to elements in the page
- it is necessary to allow the page author to select the locale for scripts and formatting in many instances
- when the content provider is providing XUI content, we should honor their locale specifications since this is part of the presentation of their page
- the user must be able to select a default application locale, to control the default UI selections
- Extensible Locale Support - the locale system needs to provide for "drop-in" locale support which doesn't require modifying the base binary to add additional locale support
- Extensible Locale Categories - the locale system needs to provide for extending the data stored in a locale without requiring modifications to the base binary
- Multiple Active Locales - the locale model needs to provide for multiple locales being applied to different windows in the system as well as different layers of the "application hierarchy"
- Interoperability - the locale model needs to interoperate with both Java and the ECMAScript locale proposal in addition to the underlying operating system
- Locale model needs to support dynamic language switching
- good software design principles
- lightweight
- object oriented
- easy to use for other developers
- locale system needs to be available as a shared library
4.0 Terminology
4.1 Locale
Locale has two different meanings. In the most general case, a locale is a set of user preferences that are tied to the language and location of the particular user. The preferences captured by a locale are usually those things that change across languages and regions but aren't necessarily changed by the user. Common examples are date and currency formatting. A locale can also be a specific instance of this collection of preferences. Usually a locale is named by a combination of language and location, such as en-US or en-GB, representing the American and British variants of English respectively.
4.2 Locale Category
A locale is a collection of different properties. A locale category is a particular property that can be extracted from a locale. Traditionally the properties extracted from a locale have been called categories, so we continue this usage.
4.3 Locale Model
A locale model is a usage strategy or guide for a product. It is an abstract design describing which entities in a piece of software need to store and maintain locale information. In particular, it addresses where locale is inherited and where it is explicitly stated.
4.4 Document Tree
A model of the content of a page or document. A document tree breaks down the elements of a document into a tree-like structure. This is taken from XML, where the structure that an XML document represents is required to take the form of a tree with a single root. This concept can also be applied to HTML and XUI documents.
4.5 XML
XML is an abbreviation for eXtensible Markup Language. XML is a general purpose markup language used for specifying the structure of a document. XML is a W3C standard. For our usage, when we talk about XML, we are talking about documents or data; specifically, we are not talking about the use of XML to represent cross platform UI resources (see XUI).
4.6 HTML
HyperText Markup Language. HTML is the markup language used in the world wide web for authoring documents. HTML is also a W3C standard (see [HTML4.0] in the Reference Specs). For our purposes, HTML refers specifically to user data and content, not cross platform UI resources implemented as markup.
4.7 XUI
XUI (pronounced "Zooey") is a specialization of either HTML or XML used in Mozilla to describe resources that can be rendered by the XPFE using the ngLayout layout engine.
4.8 Override
By override, we mean changing the entire locale. An example of this would be changing the locale from US English (en-US) to French (fr-FR). See also Customization.
4.9 Customization
Customization refers to changing a category of the locale. An example of customizing would be using European (day, month, year) date formats with the US English (en-US) locale. See also Override.
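The difference between the two terms can be sketched in a few lines of C++. This is only an illustration: SketchLocale, Override, Customize, and the category names and values are hypothetical and are not part of the proposed nsLocale API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a locale: a name plus a table of categories.
// The category names and values used below are illustrative only.
struct SketchLocale {
    std::string name;
    std::map<std::string, std::string> categories;
};

// Override: the current locale is discarded and replaced wholesale.
SketchLocale Override(const SketchLocale& /*current*/, const SketchLocale& replacement) {
    return replacement;
}

// Customization: the locale keeps its name; a single category changes.
SketchLocale Customize(SketchLocale current, const std::string& category,
                       const std::string& value) {
    current.categories[category] = value;
    return current;
}

// Small usage example for the two operations.
void OverrideVersusCustomizeExample() {
    SketchLocale enUS{"en-US", {{"date-order", "month, day, year"},
                                {"decimal-separator", "."}}};
    SketchLocale frFR{"fr-FR", {{"date-order", "day, month, year"},
                                {"decimal-separator", ","}}};

    // Override: everything, including the decimal separator, becomes French.
    SketchLocale overridden = Override(enUS, frFR);
    assert(overridden.name == "fr-FR");
    assert(overridden.categories["decimal-separator"] == ",");

    // Customization: still en-US, but with European date order.
    SketchLocale customized = Customize(enUS, "date-order", "day, month, year");
    assert(customized.name == "en-US");
    assert(customized.categories["date-order"] == "day, month, year");
    assert(customized.categories["decimal-separator"] == ".");
}
```

In the override case nothing of the original locale survives; in the customization case only the date order differs from plain en-US.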
5.0 Locale Model
Mozilla will be using a percolation model of locale support connected to a document tree model. Each element in a document tree will inherit its initial locale settings from its parent; however, these locale settings can be overridden by language specifiers such as xml:lang or the HTML lang attribute found in the document source. At the top level, the application will inherit its locale from the system. The application is the element inherited from at the top of a document tree, in the absence of any document source overriding that inheritance. The locale model assumes that, in the presence of XUI, the document content tree is a subset of the XUI UI tree.
5.1 Percolation Model
Percolation, as the name implies, requires that the locale percolate down from the topmost (or least specific) element in a document to the most specific element in a document. In this percolation model, each element will initially inherit the default locale settings from its parent. In the absence of an override such as an xml:lang or HTML lang attribute, the element will provide this locale to any children which it creates. HTML 4.0 provides for the lang attribute to be attached to any element in an HTML document, with a few exceptions (see the Index of Attributes in the HTML 4.0 specification). The locale library will provide services which map a particular lang attribute to a Locale object.
In this model, we treat each window, at least conceptually, as a single document. What is traditionally the chrome is at the root of the document tree, and what has traditionally been the content is treated as a frame in an "über-document." As XUI and downloadable chrome become more commonplace, there is no reason to make a distinction between the two. Since the chrome is rooted at the top of the document tree, the XUI selected is determined from application preferences (when the XUI is resident on the disk) or accept-language (when the XUI is being fetched from another source). After going through the preference/accept-language process, the locale is known from whichever preference or accept-language entry was successfully loaded. Additionally, there is a requirement/convention that XUI documents need to be labeled with both language and character set information.
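A minimal C++ sketch of the percolation rule follows. It resolves an element's locale lazily by walking toward the root, which gives the same result as pushing the locale down to children at creation time. The Element struct and ResolveLocale function are hypothetical and do not reflect the actual Mozilla content model or the locale library.

```cpp
#include <cassert>
#include <string>

// Hypothetical document-tree node; not the real Mozilla content model.
struct Element {
    const Element* parent;  // nullptr for the root (the chrome)
    std::string lang;       // value of a lang / xml:lang attribute, empty if absent
};

// Walk from the element toward the root and return the first explicit
// language found. If nothing in the chain sets one, fall back to the
// application locale supplied by the caller.
std::string ResolveLocale(const Element* element, const std::string& applicationLocale) {
    for (const Element* e = element; e != nullptr; e = e->parent) {
        if (!e->lang.empty()) {
            return e->lang;
        }
    }
    return applicationLocale;
}

// Small usage example: chrome -> page -> paragraph -> quote.
void PercolationExample() {
    Element chrome{nullptr, ""};          // root: inherits from the application
    Element page{&chrome, "fr-FR"};       // content document tagged as French
    Element paragraph{&page, ""};         // no lang: inherits from the page
    Element quote{&paragraph, "de-DE"};   // explicit override on one element

    assert(ResolveLocale(&chrome, "en-US") == "en-US");
    assert(ResolveLocale(&page, "en-US") == "fr-FR");
    assert(ResolveLocale(&paragraph, "en-US") == "fr-FR");
    assert(ResolveLocale(&quote, "en-US") == "de-DE");
}
```

Here the chrome stays in the application locale while the French page and everything under it resolve to fr-FR, except the one element that carries its own language.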
5.2 Overriding
In the locale model that we are proposing, overriding involves overriding the entire locale, not customizing it. When the HTML parser encounters a lang attribute applied to an element, the new locale for that element becomes whatever is specified in the lang attribute. For instance, if the parser encounters the HTML <Q lang="de-DE">Hallo</Q> hat er gesagt, then the content of the <Q> element picks up German (as spoken in Germany) as its locale.
5.3 Customization
For Mozilla, customization is not going to be allowed on general web documents. Customization will only be allowed on the default locale (default application UI locale). xml:lang and HTML:LANG will always map to standard locales.
5.4 Application Locale
The application locale will be determined from the default application locale as provided by the underlying operating system. The default UI language for this locale is determined from a user's accept language setting if multiple localizations are installed or from the available localization if only a single localization is installed. The application is the only special case.
5.4 Document Locale
Well-written HTML documents should explicitly tag themselves with the appropriate language. In the absence of a language specification in the page, the page will execute in the locale of the parent.
5.5 XUI Locale
XUI documents/UI specification resources are expected to contain a language specifier. The locale of the XUI document is selected by that specifier. When multiple XUI documents are available from a server, the browser should honor the accept-language preferences of the user and attempt to download the matching XUI interface in the order of the accept-language preferences. Since part of the XUI interfaces may be implemented in JavaScript, XUI JavaScript may need to be treated as a special kind of JavaScript, written to get locale information from the locale subsystems.
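The "in the order of the accept-language preferences" selection can be sketched as below. This is an illustration under stated assumptions, not the proposed implementation: SplitAcceptLanguage and PickLocalization are hypothetical names, the list is assumed to be already ordered by preference (quality parameters such as ";q=0.8" are dropped rather than ranked), and matching is exact and case-sensitive.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split an accept-language style list ("fr, en-US, en") into tags, in the
// order given. Parameters such as ";q=0.8" are dropped, not ranked: this
// sketch assumes the list is already ordered by preference.
std::vector<std::string> SplitAcceptLanguage(const std::string& acceptLanguage) {
    std::vector<std::string> tags;
    std::istringstream stream(acceptLanguage);
    std::string item;
    while (std::getline(stream, item, ',')) {
        item = item.substr(0, item.find(';'));           // drop any parameters
        const auto first = item.find_first_not_of(" \t");
        if (first == std::string::npos) {
            continue;                                     // skip empty entries
        }
        const auto last = item.find_last_not_of(" \t");
        tags.push_back(item.substr(first, last - first + 1));
    }
    return tags;
}

// Return the first preferred tag for which a localization is available,
// or the fallback if none of the preferences can be satisfied.
std::string PickLocalization(const std::string& acceptLanguage,
                             const std::set<std::string>& available,
                             const std::string& fallback) {
    for (const std::string& tag : SplitAcceptLanguage(acceptLanguage)) {
        if (available.count(tag) != 0) {
            return tag;
        }
    }
    return fallback;
}

// Small usage example: French is preferred but only en-US and de-DE exist.
void AcceptLanguageExample() {
    const std::set<std::string> available = {"de-DE", "en-US"};
    assert(PickLocalization("fr, en-US;q=0.8, en", available, "ja-JP") == "en-US");
    assert(PickLocalization("fr, it", available, "ja-JP") == "ja-JP");
}
```

A fuller version would likely also try the bare language when a language-country tag has no match; that fallback is left out here because the text does not specify it.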
6.0 Locale Interfaces
The following sections outline the details of the nsLocale object and the interfaces it provides.
6.1 Locale Object
A Locale is a collection of various preferences that are related to the customs and conventions of a particular language and region. Locale preferences are usually related to categories like date and time formatting, sort order, decimal separators, etc. In the API we are proposing for Mozilla, there is a Locale object which is the token developers use to get at this set of preferences.
6.2 Locale Object Interface
The following class is taken from nsLocale.h. We considered using XPCOM for implementing the nsLocale interface, but decided to go with a standard C++ class. The XPCOM interface felt a bit awkward for use as a means of specifying a Locale. Secondly, the Locale system is better represented as part of the base system, not a replaceable component module. Moving away from XPCOM to C++ classes will also make the interface lighter and simpler to use. Even though the interface is not XPCOM, it will be implemented as a shared object.
    class nsLocale {
    public:
        /* constructor methods */
        nsLocale(void);
        nsLocale(const nsString& localeString);
        nsLocale(const nsString& countryCode, const nsString& languageCode, const nsString& variant);
        ~nsLocale();

        /* components of the locale name */
        void GetLanguage(nsString& language);
        void GetCountry(nsString& country);
        void GetVariant(nsString& variant);

        /* generic property lookup (see Locale Property Keys) */
        void GetProperty(const nsString& key, nsString& value);

        /* platform-specific locale identifiers */
        PRUint32 GetWindowsLCID(void);
        PRUint32 GetMacintoshScriptCode(void);
        void GetPosixLocaleID(char* aPosixLocaleID);

        /* system and application defaults */
        static nsLocale* GetSystemDefault(void);
        static nsLocale* GetApplicationDefault(void);
        static nsLocale* SetApplicationDefault(void);

        /* mapping from language specifiers to locales */
        static nsLocale* GetLocaleFromAcceptLangauge(const nsString& acceptLanguage);
        static nsLocale* GetLocaleFromXMLLangTag(const nsString& langTag);
        static nsLocale* GetLocaleFromHTMLLangAttribute(const nsString& langAttribute);
    };
Locales are named using a string of the form <language>-<country>-<variant>, as described in RFC1766. <country> and <language> are derived from the two-letter ISO standard country and language codes. <variant> is a two-character (from the US-ASCII codeset) freeform field. For example, "en-US" would be US English, whereas "en-GB" would correspond to British English.
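As an illustration of the naming form, the following C++ sketch splits a locale name into its three parts. LocaleParts and SplitLocaleName are hypothetical helpers using std::string rather than nsString; the sketch performs no validation of the codes, and the "XX" variant in the example is made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the three parts of a locale name.
struct LocaleParts {
    std::string language;
    std::string country;
    std::string variant;
};

// Split a name of the form <language>-<country>-<variant> on its first two
// '-' characters. Missing trailing parts are left empty. The codes are not
// checked against the ISO lists.
LocaleParts SplitLocaleName(const std::string& name) {
    LocaleParts parts;
    const auto firstDash = name.find('-');
    parts.language = name.substr(0, firstDash);
    if (firstDash == std::string::npos) {
        return parts;                       // language only, e.g. "en"
    }
    const auto secondDash = name.find('-', firstDash + 1);
    if (secondDash == std::string::npos) {
        parts.country = name.substr(firstDash + 1);
        return parts;                       // language and country, e.g. "en-US"
    }
    parts.country = name.substr(firstDash + 1, secondDash - firstDash - 1);
    parts.variant = name.substr(secondDash + 1);
    return parts;
}

// Small usage example with the names used in the text.
void SplitLocaleNameExample() {
    const LocaleParts american = SplitLocaleName("en-US");
    assert(american.language == "en");
    assert(american.country == "US");
    assert(american.variant.empty());

    const LocaleParts withVariant = SplitLocaleName("en-GB-XX");  // "XX" is made up
    assert(withVariant.country == "GB");
    assert(withVariant.variant == "XX");
}
```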
6.3 Locale Property Keys
- ILocale_UserDisplayLanguage - returns a user-displayable string representing the language of the Locale
- ILocale_PlainOpenQuote - returns the character (as a string) for the Plain Open Quote (see <Q>)
- ILocale_PlainCloseQuote - returns the character (as a string) for the Plain Close Quote (see </Q>)
- ILocale_SmartOpenQuote - similar to ILocale_PlainOpenQuote, only with smart quotes
- ILocale_SmartCloseQuote - similar to ILocale_PlainCloseQuote, only with smart quotes
- ILocale_AcceptLanguage - provides the appropriate Accept-Language string for this particular locale
- ILocale_XMLLangValue - returns the appropriate xml:lang value for this locale
- ILocale_HTMLLangValue - returns the appropriate HTML 4.0 lang attribute for this locale
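To show how the quote properties might be used when rendering a <Q> element, here is a small C++ sketch. It is built on assumptions: the property table is a plain std::map standing in for a locale, the keys are assumed to be the strings listed above, the German values are illustrative data, and an unknown key is assumed to yield an empty string. Only the out-parameter shape of GetProperty follows the interface in 6.2.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the property data behind one locale.
using PropertyTable = std::map<std::string, std::string>;

// Illustrative quote properties for German (de-DE); not shipped data.
PropertyTable GermanProperties() {
    return {
        {"ILocale_PlainOpenQuote", "\""},
        {"ILocale_PlainCloseQuote", "\""},
        {"ILocale_SmartOpenQuote", "„"},
        {"ILocale_SmartCloseQuote", "“"},
    };
}

// Same shape as GetProperty(key, value) in the class above: the result is
// returned through an out-parameter. Assumption: an unknown key leaves the
// value empty.
void GetProperty(const PropertyTable& table, const std::string& key, std::string& value) {
    const auto it = table.find(key);
    value = (it != table.end()) ? it->second : std::string();
}

// Wrap text in the locale's smart quotes, as a renderer might for <Q>.
std::string QuoteText(const PropertyTable& table, const std::string& text) {
    std::string open;
    std::string close;
    GetProperty(table, "ILocale_SmartOpenQuote", open);
    GetProperty(table, "ILocale_SmartCloseQuote", close);
    return open + text + close;
}

// Small usage example with the quotation from the overriding section.
void QuotePropertyExample() {
    assert(QuoteText(GermanProperties(), "Hallo") == "„Hallo“");
}
```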
7.0 Reference Specs
[RFC1766] Tags for the Identification of Languages
[HTML4.0] Specification of HTML 4.0
8.0 Relevant Design Docs
[POEL1] String Resources
9.0 Historical Interest
[POEL2] Xena Locale Model