You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



string resources
by Erik van der Poel <erik@netscape.com>
Last Modified: May 6, 2000

Introduction

Many modules use human-readable strings that are normally translated into other languages as part of the localization process. However, the practice of "hardcoding" these strings inside your program code makes localization quite difficult, if not impossible. Therefore, you should store your localizable text resources in separate files. This document describes the APIs used to obtain such strings from resource files. It also describes the format of the resource files.

This document does not discuss UI resource files. (See XUI Window Language.)

API

First, you create an instance of the string bundle factory. See mozilla/intl/strres/tests for some sample code.

Next, you get a bundle of strings using the following method (please go and read the complete, commented interface nsIStringBundle):

interface nsIStringBundleService : nsISupports
{
  nsIStringBundle CreateBundle([const] in string aURLSpec, 
    in nsILocale aLocale);

  nsIStringBundle CreateExtensibleBundle([const] in string aRegistryKey, 
    in nsILocale aLocale);
};
The URL argument can be local or remote (e.g. resource: or http:).

See the locale spec for details about nsILocale. We will probably obtain the Accept-Language property from the locale object, and use that to look up string resource files. The precise fallback mechanism (to deal with the list of languages) still needs to be designed.

This is how you get individual strings from the bundle (please go and read the complete, commented interface nsIStringBundle):

interface nsIStringBundle : nsISupports
{
  wstring GetStringFromID(in long aID);
  wstring GetStringFromName([const] in wstring aName);
};
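
To make the two lookups concrete, here is a minimal Python model of a bundle. It is only a sketch of the behaviour described above, not the Mozilla implementation: the class name StringBundle, its method names, and the choice to raise KeyError for a missing string are assumptions made for illustration.

```python
class StringBundle:
    """Toy stand-in for nsIStringBundle (hypothetical, for illustration only)."""

    def __init__(self, properties_text):
        self._strings = {}
        for line in properties_text.splitlines():
            line = line.strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if sep:
                self._strings[key.strip()] = value.strip()

    def get_string_from_name(self, name):
        # Models GetStringFromName: look up by string ID.
        return self._strings[name]

    def get_string_from_id(self, string_id):
        # Models GetStringFromID: integer IDs are stored as decimal text.
        return self._strings[str(string_id)]
```

A caller would construct the bundle from the text of a resource file and then ask for strings either by name or by integer ID.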

The Extensible String Bundles

Mozilla aims to be a modular, extensible product. Code extensibility is achieved using XPCOM and Component Categories implemented through the Registry. Anyone can add a component at runtime and see it integrated and used in the product. So the next logical question is "What about data extensibility?" How can one add data to the product at runtime? The answer is "The Extensible String Bundles" (ESBs).

An ESB is essentially a set of string bundles presented as a single one. When you create the ESB using CreateExtensibleBundle(), you specify a registry key (instead of a URL). The bundle service then goes to the registry and enumerates all the keys found there. Each key should be the URL of a bundle. The service then returns an implementation of the nsIStringBundle interface that merges all the bundles for you under the hood.

This way, one can add a new bundle and a registry entry for it at runtime, and thus add new values to the ESB by extending it instead of patching it. For some sample code, please take a look at the RegisterConverterManagerData() method in nsCharsetConverterManager.
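
The merging idea can be sketched in a few lines of Python. This is an illustrative model only: the registry is represented as a plain dict, load_bundle is a hypothetical callback that turns a URL into a dict of strings, and the rule that the first registered bundle wins on a duplicate name is an assumption, since the text above does not specify precedence.

```python
class ExtensibleBundle:
    """Presents several bundles (dicts of name -> string) as one."""

    def __init__(self, bundles):
        self._bundles = list(bundles)

    def get_string_from_name(self, name):
        # Assumption: the first bundle that defines the name wins.
        for bundle in self._bundles:
            if name in bundle:
                return bundle[name]
        raise KeyError(name)


def create_extensible_bundle(registry, registry_key, load_bundle):
    # The registry key maps to a list of bundle URLs; load each one.
    urls = registry[registry_key]
    return ExtensibleBundle(load_bundle(url) for url in urls)
```

Adding data at runtime then amounts to appending one more URL under the registry key before the bundle is created.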

String Resource File Format

We will use the Java property file format. Here is an example using string IDs:
    # arbitrary comment
    ## @loc a comment for the localizer (translator)
    ## @doc a comment by the documentation group
    cannotFindFile = Netscape is unable to find the file or directory named %s.
The ## stuff is similar to Java's /** (JavaDoc). All lines beginning with # are removed by a tool to produce the final, compact deliverable.
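
The comment-stripping tool mentioned above could be as small as the following Python sketch. The function name strip_comments is hypothetical, and dropping blank lines and leading indentation as well is an assumption about what "compact" means here.

```python
def strip_comments(source_text):
    """Return the deliverable text: entries only, one per line."""
    kept = []
    for line in source_text.splitlines():
        stripped = line.strip()
        # Lines beginning with '#' (including '##' annotations) are removed.
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(stripped)
    return "".join(entry + "\n" for entry in kept)
```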

And here is the same example using integer IDs:

    # arbitrary comment
    ## @name NAV_CANNOT_FIND_FILE
    ## @loc a comment for the localizer (translator)
    ## @doc a comment by the documentation group
    1234 = Netscape is unable to find the file or directory named %s.
The @name attribute can be used to generate #define's for C/C++ programmers to use in their source code, for readability.
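
As a rough sketch of how such a generator might work, the Python function below pairs each @name annotation with the integer-ID entry that follows it. The function name generate_defines is hypothetical, and the L suffix on the value is an assumption borrowed from the header example later in this document.

```python
import re

NAME_TAG = re.compile(r"^##\s*@name\s+(\w+)\s*$")
ENTRY = re.compile(r"^(\d+)\s*=")


def generate_defines(source_text):
    """Return one '#define NAME idL' line per named integer-ID entry."""
    defines = []
    pending_name = None
    for line in source_text.splitlines():
        line = line.strip()
        tag = NAME_TAG.match(line)
        if tag:
            pending_name = tag.group(1)
            continue
        # Other comments and blank lines do not end the pending annotation.
        if not line or line.startswith("#"):
            continue
        entry = ENTRY.match(line)
        if entry and pending_name is not None:
            defines.append(f"#define {pending_name} {entry.group(1)}L")
        # Any entry line consumes the pending annotation.
        pending_name = None
    return defines
```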

Note that the order of subject, verb and object in a sentence depends on the language, and that it is better to use numbered arguments if using printf-style formatting.
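
To illustrate why numbered arguments help, here is a small Python sketch. It assumes the POSIX printf positional syntax (%1$s, %2$s); the document does not say which syntax is intended, and format_numbered is a hypothetical helper. The second template below is reordered English standing in for a translation whose word order differs.

```python
import re


def format_numbered(template, *args):
    """Replace each %N$s in the template with the N-th argument (1-based)."""
    def substitute(match):
        return str(args[int(match.group(1)) - 1])
    return re.sub(r"%(\d+)\$s", substitute, template)


original = "%1$s could not open %2$s."
reordered = "%2$s could not be opened by %1$s."
```

Both templates can be formatted with the same argument list, so the calling code does not need to know the word order of the target language.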

The resource file must be in US-ASCII (all bytes less than 128). Non-ASCII characters are represented as \uXXXX, where XXXX is a 4-digit hexadecimal number in Unicode (UTF-16). Non-ASCII characters are only permitted on the right hand side of the equals (=) sign.
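
The escaping rule can be sketched as follows. This is an illustrative Python helper (escape_value is a hypothetical name), not an existing tool. It emits UTF-16 code units, so a character outside the Basic Multilingual Plane becomes a surrogate pair; backslashes already present in the input are left untouched, which a real tool would need to consider.

```python
def escape_value(text):
    """Return text with every non-ASCII character written as \\uXXXX."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Encode to UTF-16 (big-endian) and emit one escape per 16-bit unit.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)
```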

File Naming Convention

Java uses file names like awtLocalization_zh_TW.properties. The extension (.properties) is unchanged, since some software may depend on it. The language and country (or region) are therefore inserted before the extension, after an underscore (_). Language and country are also separated by an underscore (_). Language is lower case, and country is upper case. For US English property files, please use names like foo_en_US.properties.

Let's use the same convention for HTTP. That is, don't use the Accept-Language feature; just insert the language and country in the file name, as we do with local files.
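
The naming rule is mechanical enough to sketch in Python. The helper name localized_name is hypothetical, and allowing a language without a country is an assumption; the text above only shows names that carry both.

```python
def localized_name(file_name, language, country=None):
    """Insert _language[_COUNTRY] before the extension of file_name."""
    base, dot, extension = file_name.rpartition(".")
    if not dot:
        raise ValueError("file name has no extension: " + file_name)
    suffix = language.lower()
    if country:
        suffix += "_" + country.upper()
    return f"{base}_{suffix}.{extension}"
```

For an HTTP bundle the same function would be applied to the last path segment of the URL, so the server sees an ordinary file request.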

Leveraging Old Translations

We are considering writing a tool to migrate some of the strings from the old versions of the product. This tool would also generate some info that a leveraging tool could use to reuse the translations of the old versions. For example:
    ## @oldid 1234
    cannotFindFile = Netscape is unable to find the file or directory named %s.
This would certainly work for all the strings in the old allxpstr.h. It may even be possible to do this for the WINFE-specific XP_GetString strings and the WINFE dialog strings. This needs to be investigated.

This would also allow some modules to continue to use the old XP_GetString in the short term, migrating to the new API in the long term.
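
One way a leveraging tool might read the @oldid annotations is sketched below in Python. The function name old_id_map is hypothetical, and the resulting table (old integer ID to new string name) is only one possible shape for the "info" mentioned above.

```python
import re

OLDID_TAG = re.compile(r"^##\s*@oldid\s+(\d+)\s*$")


def old_id_map(source_text):
    """Map each old integer ID to the name of the entry that follows it."""
    mapping = {}
    pending = None
    for line in source_text.splitlines():
        line = line.strip()
        tag = OLDID_TAG.match(line)
        if tag:
            pending = int(tag.group(1))
            continue
        # Skip blank lines and other comments without losing the annotation.
        if not line or line.startswith("#"):
            continue
        name = line.partition("=")[0].strip()
        if pending is not None:
            mapping[pending] = name
        pending = None
    return mapping
```

A compatibility layer for the old XP_GetString could consult such a table to find the new string name for an old ID.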

String vs Integer IDs

The benefits of integer IDs:
  • speed: array indexing is faster than hash tables
  • size: integers are smaller than readable strings
The benefits of string IDs:
  • groupability: e.g. connect.refused, connect.timeout
  • insertability: can insert new string without changing IDs

Can We Get the Best of Both Worlds?

It should be possible to write a tool that works with a file format that gives us groupability and insertability, while retaining speed and compactness in the code. For example:
    connect.refused = Connection was refused
    connect.refused.id = 1234
    connect.timeout = Connection timed out
    connect.timeout.id = 5678
The human inserts new strings in the desired location, and a tool later finds the next available integer ID, and inserts it (*.id). Then, another tool generates the integer-ID based file for product delivery:
    ...
    1234=Connection was refused
    ...
    5678=Connection timed out
The tool could also generate a header file for C/C++ programmers:
    ...
    #define NET_CONNECT_REFUSED 1234L
    ...
    #define NET_CONNECT_TIMEOUT 5678L
Tools can be written in a ubiquitous language like Perl for local execution, or maybe Web-based tools could be provided for remote execution (e.g. CGI).
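
The pipeline described above (assign missing IDs, then generate the delivery file and the header) might look like the following Python sketch. All function names are hypothetical. Taking "next available" to mean one more than the highest existing ID is an assumption, as is passing the NET prefix as a parameter, since the text does not say where the prefix comes from.

```python
def parse(text):
    """Split annotated source into {name: string} and {name: id}."""
    strings, ids = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.endswith(".id"):
            ids[key[:-3]] = int(value)
        else:
            strings[key] = value
    return strings, ids


def assign_missing_ids(strings, ids):
    """Give every string without an ID the next unused integer."""
    next_id = max(ids.values(), default=0) + 1
    for key in strings:
        if key not in ids:
            ids[key] = next_id
            next_id += 1
    return ids


def delivery_file(strings, ids):
    """Integer-ID based file for product delivery, sorted by ID."""
    return "".join(f"{ids[k]}={strings[k]}\n" for k in sorted(strings, key=ids.get))


def header_file(ids, prefix="NET"):
    """C/C++ header lines, one #define per string."""
    ordered = sorted(ids.items(), key=lambda item: item[1])
    return "".join(
        f"#define {prefix}_{name.upper().replace('.', '_')} {value}L\n"
        for name, value in ordered
    )
```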

People who don't want to use or wait for such tools can do it manually.

There are also some logistical problems with such tools. What if several programmers are working on one file at the same time? Their tools might generate the same integer ID for different strings. I am sure we could come up with some system to manage such a process and avoid collisions, but do we want to go there?
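
Detecting such a collision after the fact is at least straightforward. The Python sketch below compares the name-to-ID tables produced by two programmers and reports every integer ID that was given to two different strings. The function name find_collisions and the table shape are assumptions; this only detects collisions and does nothing to prevent them.

```python
def find_collisions(ids_a, ids_b):
    """Return sorted (id, name_in_a, name_in_b) for IDs used by different names."""
    owner = {value: name for name, value in ids_a.items()}
    return sorted(
        (value, owner[value], name)
        for name, value in ids_b.items()
        if value in owner and owner[value] != name
    )
```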

Tools also add complexity. We may want to stick to simple text editors.

Questions

Do we really want to pass a URL as an argument? How does this mesh with DCOM?

Do we really want to return the bundle as an interface? If we ever decide to use DCOM, that means that we need to go across the Net for individual strings.

What is the philosophy behind resource: URLs? Are they always equivalent to file: URLs? Or are they sometimes remote resources?

Do we need the concept of a "path"? Like Java's CLASSPATH, X Windows' file search path, etc?

What are the XUI folks planning to do? Will they use resource: URLs, etc?

Should we use XPCOM for the GetBundle API, but a non-XPCOM interface for the GetString API, since XPCOM is heavyweight?
 

Minimum set of DLLs required to use StringBundle (strres.dll)

The following data are based on the Oct-20-99 Windows (or Oct-22-99 Linux) build. DLLs owned by the I18n group are in red.

When resource url is used (by sequence)

bin/ on Windows:

-rw-r--r--   1 544      everyone   276160 Oct 20 10:54 xpcom.dll
-rw-r--r--   1 544      everyone    29776 Oct 20 10:54 plc3.dll
-rw-r--r--   1 544      everyone   146752 Oct 20 10:54 nspr3.dll
-rw-r--r--   1 544      everyone    37984 Oct 20 10:54 mozreg.dll
-rw-r--r--   1 544      everyone    29968 Oct 20 10:54 plds3.dll

bin/ on Linux:

libmozjs.so
libxpcom.so
libplds3.so
libplc3.so
libnspr3.so

bin/components/ on Windows:

-rw-r--r--   1 544      everyone    21248 Oct 20 10:54 strres.dll
-rw-r--r--   1 544      everyone    46800 Oct 20 10:54 nslocale.dll

-rw-r--r--   1 544      everyone    71504 Oct 20 10:54 necko.dll
-rw-r--r--   1 544      everyone    21184 Oct 20 10:54 nkresrc.dll
-rw-r--r--   1 544      everyone    25840 Oct 20 10:54 nkfile.dll

-rw-r--r--   1 544      everyone    35648 Oct 20 10:54 uconv.dll
-rw-r--r--   1 544      everyone    85232 Oct 20 10:54 ucvlatin.dll

bin/components/ on Linux:

libstrres.so
libnslocale.so

libnecko.so
libnecko_resource.so
libnecko_file.so

libuconv.so
libucvlatin.so

When chrome url is used (all the DLLs above plus those below; the number after each name is its position in the sequence)

bin/
gkgfxwin.dll -3
img3250.dll -4
jsdom.dll -5
js3250.dll -6
oji.dll -13
jsj3250.dll -14
components/
chrome.dll  -1
rdf.dll -2
gkhtml.dll -7
gkparser.dll -8
jsurl.dll -9
mimetype.dll -10
ucharuti.dll -11
caps.dll -12

Document History

  • 10/22/99: Tao Cheng: Added "Document History" section and "Minimum set of DLLs required to use StringBundle (strres.dll)" section.
  • 06/10/99: Erik van der Poel: Updated the file naming convention. I.e. added country example, and specified underscore delimiters, lower and upper case convention, and also HTTP rule.
  • 5/5/2000: Catalin Rotaru: Added doc on the Extensible String Bundles.