You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.

How To Add Additional Charset Support

Incomplete Draft: April 20, 1998 Contact: Frank Tang <ftang@netscape.com> Discussion: netscape.public.mozilla.i18n or mozilla-i18n@mozilla.org

Tasks Outline:

In Cross-platform code:

Add CharSetID with the property of the charset.
Associate CharSetID with MIME charset name and Java Encoding Name.
Add Unicode conversion tables.
Add additional codeset conversion support.
Add codeset conversion rules.
Add ASCII fallback for Name Entity or Numeric Character Reference Not Supported in this Charset.
Add character length and column width information for multibyte charset.
Add (pseudo) toLower table for case-insensitive folding.
Add line breaking prohibit information for multibyte charset.
Add word breaking information for multibyte charset.

Details

In Cross-Platform Code

1. Add CharSetID with the property of the charset.

A. Decide the property of the charset you need to add:

Type- one of the following four:

SINGLEBYTE- for single byte charset, such as KOI8-R, ISO-8859-x, TIS620
MULTIBYTE- for multibyte charset, such as Shift_JIS, EUC-KR
WIDECHAR- for process code, such as UCS2, UCS4
STATEFUL- for stateful encoding, like ISO-2022-JP, ISO-2022-KR

Basic line wrapping rule (only for MULTIBYTE): CS_SPACE- wrap lines at spacing characters only. Currently used only for Korean.

B. Define your CharSetID in file `include/csid.h`

2. Associate CharSetID with MIME charset name and Java Encoding Name.

You need to do this for every CharSetID you add.

Procedure:

A. Before you add any name, please look at two documents first: the IANA document and the Java document referred to below.

B. Decide the following:

MIME charset name- use the one labeled "(Preferred MIME name)" in the IANA document. If there is none in the IANA document, you should:

Consider registering one with IANA.
Put "x-" at the beginning to indicate that it is a private name.

MIME charset aliases - all the names and aliases for that CharSetID listed in the IANA document.
Java encoding name - the name listed in the Java document.

C. Open file lib/libi18n/csnametb.c , add entries to the csname2id_tbl. Only the first entry for that CharSetID will be used to map from a CharSetID to a MIME Charset or Java encoding name. The rest of the entries will be used only to map a MIME Charset alias to a CharSetID. For example, here are the entries for Japanese Shift_JIS charset:

  {"Shift_JIS", "SJIS", CS_SJIS},
  ...
  /* aliases for Shift_JIS: */
  {"x-sjis", "", CS_SJIS},
  {"ms_Kanji", "", CS_SJIS},
  {"csShiftJIS", "", CS_SJIS},
  {"Windows-31J", "", CS_SJIS},

In the above example:

CS_SJIS is the CharSetID,
"Shift_JIS" is the MIME charset name, so it is listed in the first entry,
"SJIS" is the Java encoding name,
"x-sjis", "ms_Kanji", "csShiftJIS", and "Windows-31J" are the MIME charset aliases. Although "x-sjis" is not listed in the IANA document, we list it for backward compatibility.

3. Add Unicode conversion tables.

A. Create the conversion table: Before you create a conversion table for your charset, take a look at the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl. You may find that the table for your charset is already there, even if it is not built into the mozilla binary. If you cannot find one there, follow the instructions below. (Thanks to Hovik Melikyan <hovik@undp.am> for writing the following section.)

A.1. Create an 8-bit to 16-bit conversion table for the new encoding. The file should contain two columns of hexadecimal numbers, where the left column holds the 8-bit codes and the right column holds the corresponding 16-bit codes. If a code point is undefined or has no mapping to Unicode, do not list it in the file. Example:

   0x20    0x0020
   ...
   0xa0    0x00a0
   0xa2    0x00a7
   0xa3    0x0589
   0xa4    0x0029
   ...

This file will be used for generating both "FROM" and "TO" conversion tables.

A.2. Compile utilities in lib/libi18n/unicode/tbltool. Example for Visual C++:

    cl -I../../ fromu.c utblutil.c
    cl -I../../ tou.c utblutil.c

These programs accept conversion tables as described in A.1. from standard input and generate resources to standard output in a form suitable for including in Windows resource files and also UNIX C sources. Example:

  fromu < xscii.txt > xscii.uf
  tou < xscii.txt > xscii.ut

A.3 Copy the generated *.uf and *.ut files to lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl respectively.

B. Include the Unicode conversion table in the binary by doing the following on each platform (see the corresponding platform sections below):

Windows
Macintosh
UNIX

C. You might also need to make additions to shift tables in lib/libi18n/ugendata.c

4. Add additional codeset conversion support.

[To Be Written]
How to write a new conversion routine.

5. Add codeset conversion rules.

[To Be Written, IDEA and OUTLINE only for now]
How to change lib/libi18n/fe_ccc.c

Outline:

For multibyte charset, if the from and to CharSetID are the same, apply mz_mbNullConv with a flag.
For singlebyte charset, if the from and to CharSetID are the same, cast 0.
For other singlebyte charset, apply One2OneCCC routine. Remember to link to the FE section about singlebyte table.
For algorithm-based conversion, write a new conversion routine. Refer to the previous section.
If it is desired to use Unicode as intermediate code for the conversion, use mz_AnyToAnyThroughUCS2.
Conversion rules are needed in both directions so that HTML Form posting will work.

6. Add ASCII fallback for Name Entity or Numeric Character Reference Not Supported in this Charset

If you want to add ASCII fallback for the Name Entity or Numeric Character References for the newly added charset, you should do this step:

Procedure:

Open file lib/libparse/pa_amp.c, locate the function pa_map_escape, and go to the end of this function. There should be an #ifdef PLATFORM section containing a big switch statement. Add your code there to provide a different ASCII fallback for each character you cannot display in the font CharSetID.

7. Add character length/column width information for multibyte charset.

If the font or document CharSetID you add is a multibyte charset, you should provide character length and column width information as described below.

Procedure:

A. Open lib/libi18n/csstrlen.c and locate the csinfo_tbl at the beginning of the file. Look at the current entries, and add an additional entry if no entry matches the characteristics of your charset. Here is the entry for the Japanese EUC_JP charset.

    {{{2,2,{0xa1,0xfe}}, {2,1,{0x8e,0x8e}}, {3,2,{0x8f,0x8f}}}}, /* For EUC_JP */

This entry specifies that if the first byte of a character is

in the range of 0xa1 to 0xfe, the character length is 2 bytes and the column width is 2 columns (for 2-byte JIS X 0208 characters),
0x8e, the character length is 2 bytes and the column width is 1 column (for JIS X 0201 katakana, there is an SS2 [0x8e] in front of the one-byte katakana),
0x8f, the character length is 3 bytes and the column width is 2 columns (for JIS X 0212 characters, there is an SS3 [0x8f] in front of the two-byte kanji),
anything else, the character is a one-byte character.

B. In the same file, register your entry by changing the value in the csinfoindex table. The value is the index of the entry you just added (for EUC_JP, it is 1 in csinfo_tbl), and the entry you should change is the one at the sequential number of the CharSetID (for EUC_JP, it is 5, defined in include/csid.h). The value -1 indicates a single byte charset, whose character length and column width are always 1.

To Be Improved

Remove the limitation that only 256 charsets can be defined.
Remove the assumption of CharSetID.

8. Add (pseudo) toLower table for case-insensitive folding.

You need to add tables to one of the following files to perform correct case-insensitive search. This only needs to be done for a font CharSetID. A document CharSetID which is not a font CharSetID on that platform does not need this.

Procedure:

A. If the charset you add is single byte charset:
A.1 Open file lib/libi18n/sblower.c.
A.2 Add a (pseudo) to-lower-case table for the code points 0x80-0xFF. To reduce size on platforms which do not use the charset as a font CharSetID, put #ifdef PLATFORM around the new table.
A.3 Change the function INTL_GetSingleByteToLowerMap to return the new (pseudo) to-lower-case table.

B. If the charset you add is multibyte charset:
B.1 Open lib/libi18n/dblower.c.
B.2 Add a (pseudo) double-byte to-lower mapping table. To reduce size on platforms which do not use the charset as a font CharSetID, put #ifdef PLATFORM around the new table. The table entry is defined as

    typedef struct {
         unsigned char src_b1;
         unsigned char src_b2_start;
         unsigned char src_b2_end;
         unsigned char dest_b1;
         unsigned char dest_b2_start;
    } DoubleByteToLowerMap;

B.3 Change function INTL_GetDoubleByteToLowerMap to return the new (pseudo) to-lower-case table.

9. Add line breaking prohibit information for multibyte charset.

Line wrapping behavior is affected by the following property of CharSetID:

Type of CharSetID- SINGLEBYTE or MULTIBYTE
CS_SPACE bit in CharSetID
Prohibit type of the character returned by the function INTL_KinsokuClass.

It is difficult to describe how to change it. Take a look at the layout code to change the behavior. For multibyte charsets, the INTL_KinsokuClass function plays a big role. It returns one of four prohibit types:

PROHIBIT_WORD_BREAK: Do not wrap the line between two characters which both have this prohibit type. For example, the Greek characters in UTF-8 have this type, so the line will not wrap between two Greek characters.
PROHIBIT_NOWHERE: It is OK to break before or after this character. For example, all the Kanji characters have this type.
PROHIBIT_BEGIN_OF_LINE: This character should not appear at the beginning of a line. Do not wrap the line before it. For example, ')' has this type.
PROHIBIT_END_OF_LINE: This character should not appear at the end of a line. Do not wrap the line after it. For example, '(' has this type.

Procedure:

To change INTL_KinsokuClass, change file lib/libi18n/kinsokuf.c and lib/libi18n/kinsokud.c

References:

Ken Lunde, Understanding Japanese Information Processing, page 148-151
Japanese Standards Association, JIS X 4051-1995, Line composition rules for Japanese documents.
Nadine Kano, Developing International Software For Windows 95 and Windows NT, page 238-245

To Be Improved:

This API should be changed to optimize performance; a state machine implementation should do a better job.
It would be great if we could make the prohibit list changeable by the document in an HTML tag.

10. Add word breaking information for multibyte charset.

This is for word selection (by double click) feature.

Procedure:

Open file and change the function INTL_CharClass. It is hard to explain how to change it; read the function yourself.

To be improved:

We should rewrite this function to include more classes.
An extensible mechanism is needed.
There is no way to do dictionary-based word breaking.

In Windows Code

1.Add CodePage to CharSetID mapping

You need to do the following if you add an additional font CharSetID. This has to be done so that a Windows code page number can be mapped to the new CharSetID. You don't need to do this if you only add an additional CharSetID as a document CharSetID and use an existing CharSetID as the font CharSetID.

Procedure:

Open file cmd/winfe/intlwin.cpp, locate the function CIntlWin::CodePageToCsid, and add code there if necessary. That function maps a Windows code page to a CharSetID you defined in ns/include/csid.h.

To Be Improved:

The current implementation should be improved in the near future by moving such mapping into a resource so that no C code needs to be changed.

2.Add Single Byte Conversion Table

If the CharSetID you add is a single byte charset and you want to use the One2One conversion procedure to convert between font CharSetID and document CharSetID on Windows, you need to do this step.

Procedure:

Open file cmd/winfe/res/convtbls.rc and add lines:

    MIMECharsetFrom_TO_MIMECharsetTo RCDATA
    BEGIN
    /*8x*/  0x8180, 0x8382, 0x8584, 0x8786, 0x8988, 0x8B8A, 0x8D8C, 0x8F8E,
    /*9x*/  0x9190, 0x9392, 0x9594, 0x9796, 0x9998, 0x9B9A, 0x9D9C, 0x9F9E,
    /*Ax*/  ..............................................................
    /*Bx*/  ..............................................................
    /*Cx*/  ..............................................................
    /*Dx*/  ..............................................................
    /*Ex*/  ..............................................................
    /*Fx*/  ..............................................................
    END /* End of MIMECharsetFrom_TO_MIMECharsetTo */

where

MIMECharsetFrom_TO_MIMECharsetTo is the RESOURCE ID for the table. You must code this RESOURCE ID as MIMECharsetFrom_TO_MIMECharsetTo, where

MIMECharsetFrom is the 1st entry of the CONVERT_FROM CharSetID you can find in csname2id_tbl of file /lib/libi18n/csnametb.c. Notice that you have to match the case; do not change from upper case to lower case or vice versa.
MIMECharsetTo is the 1st entry of the CONVERT_TO CharSetID you can find in csname2id_tbl of file /lib/libi18n/csnametb.c. Notice that you have to match the case; do not change from upper case to lower case or vice versa.

The rows labeled /*8x*/ through /*Fx*/ are the mapping table for code points 0x80 to 0xFF. Please notice:

You have to put two bytes together with a "0x" in front and a "," after them.
You have to put the odd byte before the even byte (0x8180 instead of 0x8081).

3.Add Unicode Conversion Tables

You should add Unicode conversion tables for the new CharSetID you add. The conversion tables for many charsets are already generated in the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl. In most cases you simply need to include them in the resource file. If the charset you add does not have a conversion table in those two directories, you need to generate one. Look at the tools in the lib/libi18n/unicode/tbltool directory. Those tools expect the input file format of the tables found on the Unicode ftp site (ftp://ftp.unicode.org/Public/MAPPINGS/).

Procedure:

A. Open file lib/libi18n/unicode/unitable.rc and add lines

    XXX.UF RCDATA
    BEGIN
    #include "ufrmtbl\\xxx.uf"
    END
    XXX.UT RCDATA
    BEGIN
    #include "utotbl\\xxx.ut"
    END

where

XXX.UF and XXX.UT are the RESOURCE IDs we define here. They should be put into the lib/libi18n/unicode/unitbl.c file in the next step.
xxx.uf and xxx.ut are the Unicode conversion table files, which can be found in the lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl directories.

B. Open file lib/libi18n/unicode/unitbl.c and put line

    {CS_XXX, {"XXX.UF", 0, NULL}, {"XXX.UT", 0, NULL}},

to the table utablenametbl before the last entry

    {CS_DEFAULT, {"", 0, NULL}, {"", 0, NULL}}

where

CS_XXX is the CharSetID for which we want to support Unicode conversion, and
XXX.UF and XXX.UT are the RESOURCE IDs we defined in file lib/libi18n/unicode/unitable.rc. Please notice that because of a limitation of 16-bit Windows, you must write the RESOURCE ID in all upper case in this C file, even if you defined it in lower case in the file lib/libi18n/unicode/unitable.rc.

4.Add menu items to "View:Encoding" menu

If you want to add additional menu items to the "View:Encoding" menu, you need to do this step.

Procedure:

A. Open file cmd/winfe/genfram2.cpp, locate the function nIDToCsid(), and add an entry to the end of the table nid_to_csid containing the CharSetID you want to appear on the "View:Encoding" menu. The position of the CharSetID in the table does not affect its position in the "View:Encoding" menu. It simply decides the nID for that CharSetID (nID = index + ID_OPTIONS_ENCODING_1). You need to use that ID in the files cmd/winfe/res/mozilla.rc2 and cmd/winfe/res/editor.rc2. We currently reserve ID_OPTIONS_ENCODING_1 to ID_OPTIONS_ENCODING_70 for this purpose.

B.1 Open file cmd/winfe/res/mozilla.rc2 and locate POPUP "&Encoding". You will find 8 of them (two of them commented out):

One in IDR_SRVR_INPLACE MENU: [No idea when this will show up. It looks the same as the one in IDM_MAINFRAMEVIEWMENU MENU. Need to double check with other engineers]
One in IDR_MAINFRAME MENU: [Commented out by C++ comment]
One in IDM_MAINFRAMEVIEWMENU MENU: This is for the "Encoding" Menu you see in the Navigator window.
One in IDR_SRVR_EMBEDDED MENU: [No idea when this will show up. Need to double check with other engineers]
One in IDR_MESSAGEFRAME MENU: This is for the "Encoding" Menu you see in the window which you bring up by double clicking a message header in the "Messenger Mailbox" window.
One in IDR_MAILTHREAD MENU: [Commented out by C comment]
One in IDM_MAILTHREADVIEWMENU MENU: This is for the "Encoding" Menu you see in the "Messenger Mailbox" window.
One in IDR_COMPOSEPLAIN MENU: This is for the "Encoding" Menu you see in the mail composition window when the Preference "Normally send HTML mail" option is off.

B.2 Open file cmd/winfe/res/editor.rc2 and locate POPUP "&Encoding". You will find 2 of them:

One in IDR_COMPOSEFRAME MENU: This is for the "Encoding" Menu you see in the mail composition window when the Preference "Normally send HTML mail" option is on.
One in IDR_EDITFRAME MENU: This is for the "Encoding" Menu you see in the "Page Composer" window.

B.3 For each of the "Encoding" Menu above, add your menu item by adding a line

    MENUITEM "LanguageGroup (Charset)", ID_OPTIONS_ENCODING_XX

where

"LanguageGroup (Charset)" is the human readable string on the "Encoding" Menu, and
ID_OPTIONS_ENCODING_XX is the nID you mapped in the file cmd/winfe/genfram2.cpp (ID_OPTIONS_ENCODING_1 to ID_OPTIONS_ENCODING_70 are already reserved for you).

To Be Improved:

Currently all the menus are defined in the front end. We may move this into cross-platform code and make the menu items easily changeable in the future.

5.Add menu item for "Font" preference "For the Encoding" menu

If you add additional font CharSetID, you need to do the following to make a new menu item appear in the font preference so people can associate fonts with that font CharSetID.

Procedure:

A. Open file cmd/winfe/mozilla.rc using Developer Studio, not a text editor, and add an additional string for the menu item in the "For the Encoding" menu. You should name the RESOURCE ID IDS_LANGUAGE_XX so we can easily find it later. Close the file; Developer Studio should change cmd/winfe/mozilla.rc as well as cmd/winfe/resource.h.

B. Open file cmd/winfe/intlwin.cpp and

B.1 Locate lang_table and add new lines

    IDS_LANGUAGE_XX, CS_XXX, CS_XXX, CS_YYY, 0,

the 1st item IDS_LANGUAGE_XX is the RESOURCE ID you added to cmd/winfe/mozilla.rc (which gets added to cmd/winfe/resource.h by Developer Studio automatically),
the 2nd and 3rd items CS_XXX are the CharSetID which will be used as the Font CharSetID and as one of the document CharSetIDs this Font CharSetID supports,
the 4th item is another document CharSetID you want converted to this Font CharSetID. If no other document CharSetID converts to this Font CharSetID, don't put one here.

B.2 Locate table fontchar_tbl and add new lines before line

    CS_XXX, "PropoFont", 12, "FixedFont", 10, CHARSET, CHARSET,

where

the 1st item CS_XXX is the Font CharSetID as listed in the previous step,
the 2nd item "PropoFont" is the default font name for the proportional font,
the 3rd item 12 is the default font size for the proportional font,
the 4th item "FixedFont" is the default font name for the fixed font,
the 5th item 10 is the default font size for the fixed font,
the 6th and 7th items CHARSET are the Windows CHARSET values for the font, where

Value	For	Comment
ANSI_CHARSET	Western European languages
SHIFTJIS_CHARSET	Japanese
CHINESEBIG5_CHARSET	Traditional Chinese
HANGEUL_CHARSET	Korean
DEFAULT_CHARSET	UCS2
134	Simplified Chinese	Same as GB2312_CHARSET. However, GB2312_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using GB2312_CHARSET instead of 134 will break 16-bits window build.
177	Hebrew	Same as HEBREW_CHARSET. However, HEBREW_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using HEBREW_CHARSET instead of 177 will break 16-bits window build.
178	Arabic	Same as ARABIC_CHARSET. However, ARABIC_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using ARABIC_CHARSET instead of 178 will break 16-bits window build.
161	Greek	Same as GREEK_CHARSET. However, GREEK_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using GREEK_CHARSET instead of 161 will break 16-bits window build.
162	Turkish	Same as TURKISH_CHARSET. However, TURKISH_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using TURKISH_CHARSET instead of 162 will break 16-bits window build.
163	Vietnamese	Same as VIETNAMESE_CHARSET. However, VIETNAMESE_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using VIETNAMESE_CHARSET instead of 163 will break 16-bits window build.
222	Thai	Same as THAI_CHARSET. However, THAI_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using THAI_CHARSET instead of 222 will break 16-bits window build.
238	East European languages	Same as EASTEUROPE_CHARSET. However, EASTEUROPE_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using EASTEUROPE_CHARSET instead of 238 will break 16-bits window build.
186	Baltic languages	Same as BALTIC_CHARSET. However, BALTIC_CHARSET is not defined in 16-bits version of header file (WINDOWS.H). Using BALTIC_CHARSET instead of 186 will break 16-bits window build.

C. Open file cmd/winfe/intlwin.h . Change the definition of MAXLANGNUM to match the number of entries you changed in lang_table. (See step B.1)

[Note: Need to talk about modules/libpref/src/win/winpref.js]

In Macintosh Code

1.Add Script to CharSetID mapping

You need to do the following if you add an additional font CharSetID. This has to be done so that a Macintosh Script code can be mapped to the new CharSetID. You don't need to do this if you only add an additional CharSetID as a document CharSetID and use an existing CharSetID as the font CharSetID.

Procedure:

Open file cmd/macfe/utility/uintl.cp, look at the function ScriptToEncoding, uncomment the Script code which you want to map, and add a return statement to return the CharSetID.

To Be Improved:

The current implementation should be improved in the near future by moving such mapping into a resource so that no C code needs to be changed.

2.Add Single Byte Conversion Table

If the CharSetID you add is a single byte charset and you want to use One2One conversion procedure to convert between font CharSetID and document CharSetID on Macintosh, you need to do this step.

Procedure:

A. [Optional] Open file cmd/macfe/include/resgui.h, go to the end, and add a macro definition for the conversion table resource id. This macro value is used in file lib/libi18n/fe_ccc.c as described in section "XXXX", and in the resource file you will create in the next step. We suggest you define the resource id as

    #define xlat_FromCharSetID_TO_ToCharSetID (((FromCharSetID & 0xff) << 8 ) | (ToCharSetID & 0xff))

where

FromCharSetID is the source CharSetID
ToCharSetID is the target CharSetID

B. Make a new resource file xxx.r and put it into the cmd/macfe/restext/ directory. The new file should look like this [see cmd/macfe/restext/cp1250.r for a real example]:

    #include "csid.h"
    #include "resgui.h"
    data 'xlat' ( xlat_FromCharSetID_TO_ToCharSetID, "Resource Name",purgeable){
    /*       x0x1 x2x3 x4x5 x6x7 x8x9 xAxB xCxD xExF */
    /*8x*/ $"C481 82C9 A5D6 DCE1 B9C8 E4E8 C6E6 E98F"
    /*9x*/ $"9FCF EDEF 9495 96F3 98F4 F69B FACC ECFC"
    /*Ax*/ $"86B0 CAA3 A795 B6DF AEA9 99EA A8AD AEAF"
    /*Bx*/ $"B0B1 B2B3 B4B5 B6B7 B3B9 BABC BEC5 E5BF"
    /*Cx*/ $"C0D1 ACC3 F1D2 C6AB BB85 A0F2 D5CD F5CF"
    /*Dx*/ $"9697 9394 9192 F7D7 D8C0 E0D8 8B9B F8DF"
    /*Ex*/ $"E08A 8284 9A8C 9CC1 8D9D CD8E 9EED D3D4"
    /*Fx*/ $"F0D9 DAF9 DBFB F6F7 DDFD FAAF A3BF FEA1"
    };

where

xlat_FromCharSetID_TO_ToCharSetID is the resource id you just defined in file cmd/macfe/include/resgui.h
The rows labeled /*8x*/ through /*Fx*/ are the mapping table for code points 0x80 to 0xFF. Please notice that the format is not the same as the one in the Windows section.

To Be Improved:

We probably should make the definition of the resource id a macro in csid.h to remove the requirement of changing resgui.h.

3.Add Unicode Conversion Table

Procedure:

Open file cmd/macfe/restext/ufrm.r , and add lines:

    resource 'UFRM' ( CharSetID, "Resource Name", purgeable) {{
    #include "xxx.uf"
    }};
    resource 'UTO ' ( CharSetID, "Resource Name", purgeable) {{
    #include "xxx.ut"
    }};

where xxx.uf and xxx.ut are the Unicode conversion table files, which can be found in the lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl directories.

4.Add menu item to "View:Encoding" menu

If you want to add additional menu items to the "View:Encoding" menu, you need to do this step.

Procedure:

A. Open file cmd/macfe/include/resgui.h, locate the word ENCODING_CEILING, and add a line before that line:

    #define cmd_XXXX 14nn

where

cmd_XXXX is a macro value which will be used in
14nn is the "Command ID" value you should put into the files cmd/macfe/rsrc/navigator/MenusRat.cnst and cmd/macfe/rsrc/communicator/Menus.cnst as described in the next step. We currently reserve IDs from 1401 to 1499 for this purpose. You should use a number which does not conflict with those currently defined in the file cmd/macfe/include/resgui.

B. Open file cmd/macfe/restext/macfe.r and locate "resource 'Csid'", add entries into this table:

    fontCharSetID, documentCharSetID, cmd_XXXX,

where the first field is the font CharSetID, the second field is the document CharSetID, and the third one is the Command ID you defined in cmd/macfe/include/resgui.h.

C. Open the files cmd/macfe/rsrc/navigator/MenusRat.cnst and cmd/macfe/rsrc/communicator/Menus.cnst using Metrowerks Constructor. Open "> Common View Encoding" (Res ID = 8) in the "Menus" section. Add additional menu items to it. The "Command ID" should be the one you just defined in file cmd/macfe/include/resgui.h.

To Be Improved:

Currently the Macintosh source uses the same encoding menu resource for different types of windows. We may change this to use different menu resources in the future.
Currently all the menus are defined in the front end. We may move this into cross-platform code and make the menu items easily changeable in the future.

5.Add menu item for "Font" Preference "For the Encoding" menu

If you add additional font CharSetID, you need to do the following to make a new menu item appear in the font preference so people can associate fonts with that font CharSetID.

Procedure:

A. Open file cmd/macfe/rsrc/communicator/TextTraits.cnst using Metrowerks Constructor. Add two new TextTraits by copying TextTraits 4001 and 4002. Name your TextTraits after the Font CharSetID so we can figure out later what the TextTraits are for. The content of these two TextTraits is not important since it will be changed dynamically at application initialization time. The TextTraits IDs should follow the TextTraits currently defined for the same purpose (from 4001 to 4024).

B. Open file cmd/macfe/restext/macfe.r , locate "resource 'Fnec'", add additional line

   "LanguageGroup", "PropFont", "FixedFont", 12, 10, CharSetID, ScriptCode, TextTraitsID1, TextTraitsID2;

where

the 1st field LanguageGroup is the string which will appear on the "For the Encoding" menu in the font preference,
the 2nd and 3rd fields PropFont and FixedFont are the default proportional and fixed font names for that encoding before the user changes them,
the 4th and 5th fields 12 and 10 are the default proportional and fixed font sizes for that encoding before the user changes them,
the 6th field CharSetID is the font CharSetID of this menu item,
the 7th field ScriptCode is the Macintosh script code for the CharSetID,
the 8th and 9th fields TextTraitsID1, TextTraitsID2 are the TextTraits IDs we defined in file cmd/macfe/rsrc/communicator/TextTraits.cnst.

The order in this resource determines the order in the "For the Encoding" menu.

In UNIX Code

1.Add Font Handling Code

Procedure:

A. Open file cmd/xfe/resources, locate the "This table maps X11 font charsets to MIME charsets" section, add or change lines

    *documentFonts.charset*XLFDCharset:    MIMECharset

where

XLFDCharset is the XLFD charset name, and
MIMECharset is the MIME charset of the font, as defined in file lib/libi18n/csnametb.c.

B. In the same file, locate the "! This table maps MIME charsets to language groups" section, add lines after it

    *documentFonts.charsetlang*MIMECharset:    LanguageGroup

where

MIMECharset is the MIME Charset of the font we defined in file lib/libi18n/csnametb.c
LanguageGroup is the menu item displayed in the font preference.

C. In the same file, locate the "! Fonts used for printing" section,
[To Be Written]

D. In the same file, locate the "! Unicode Pseudo Font" section,
[To Be Written]

E. Open file cmd/xfe/fonts.h,
[To Be Written]

2.Add Single Byte Conversion Table

If the CharSetID you add is a single byte charset and you want to use One2One conversion procedure to convert between font CharSetID and document CharSetID on Unix, you need to do this step.

Procedure:

A. Open file lib/libi18n/sbconvtb.c, locate the "#ifdef XP_UNIX" section, and add a new table as follows:

    PRIVATE unsigned char MIMECharsetFrom_to_MIMECharsetTo[] = {
    /*8x*/  '?', '?', ',', 'f', '?', '?', '?', '?', '^', '?', 'S', '<', '?', '?', '?', '?',
    /*9x*/  '?', '?', '?', '?', '?', '*', '-', '-', '~', '?', 's', '>', '?', '?', '?', 'Y',
    /*Ax*/ 0xA0,0xA1,0xA2,0xA3,0xA4,0xA5,0xA6,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC,0xAD,0xAE,0xAF,
    /*Bx*/ 0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC,0xBD,0xBE,0xBF,
    /*Cx*/ 0xC0,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7,0xC8,0xC9,0xCA,0xCB,0xCC,0xCD,0xCE,0xCF,
    /*Dx*/ 0xD0,0xD1,0xD2,0xD3,0xD4,0xD5,0xD6,0xD7,0xD8,0xD9,0xDA,0xDB,0xDC,0xDD,0xDE,0xDF,
    /*Ex*/ 0xE0,0xE1,0xE2,0xE3,0xE4,0xE5,0xE6,0xE7,0xE8,0xE9,0xEA,0xEB,0xEC,0xED,0xEE,0xEF,
    /*Fx*/ 0xF0,0xF1,0xF2,0xF3,0xF4,0xF5,0xF6,0xF7,0xF8,0xF9,0xFA,0xFB,0xFC,0xFD,0xFE,0xFF
    };
    
    PRIVATE char *MIMECharsetFrom_to_MIMECharsetTo_p = (char*)MIMECharsetFrom_to_MIMECharsetTo;

where

MIMECharsetFrom_to_MIMECharsetTo is the name of the table. You should name the table MIMECharsetFrom_to_MIMECharsetTo, where

MIMECharsetFrom is the 1st entry of the CONVERT_FROM CharSetID you can find in csname2id_tbl of file /lib/libi18n/csnametb.c. Notice that you have to match the case; do not change from upper case to lower case or vice versa.
MIMECharsetTo is the 1st entry of the CONVERT_TO CharSetID you can find in csname2id_tbl of file /lib/libi18n/csnametb.c. Notice that you have to match the case; do not change from upper case to lower case or vice versa.

The rows labeled /*8x*/ through /*Fx*/ are the mapping table for code points 0x80 to 0xFF.

B. In the same file, locate the function INTL_GetSingleByteTable in the "#ifdef XP_UNIX" section. Add code to return the table you added in the previous step:

    ...
    else if ((from_csid == CharSetIDFrom) && (to_csid == CharSetIDTo)) {
        return &MIMECharsetFrom_to_MIMECharsetTo_p;
    }

where

CharSetIDFrom is the CharSetID for MIMECharsetFrom,
CharSetIDTo is the CharSetID for MIMECharsetTo,
MIMECharsetFrom_to_MIMECharsetTo_p is the array pointer you just defined in the previous step.

3.Add Unicode Conversion Table

Procedure:

A. Open file lib/libi18n/ucs2.c and locate the "#ifdef XP_UNIX" section, add lines

    PRIVATE uint16 XXFromTbl[] = {
    #include "xx.uf"
    };
    PRIVATE uint16 XXToTbl[] = {
    #include "xx.ut"
    };

where

XXFromTbl and XXToTbl are the Unicode conversion tables.
xx.uf and xx.ut are the Unicode conversion table files, which can be found in the lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl directories.

B. In the same file, locate the "LoadToUCS2Table" function in the "#ifdef XP_UNIX" section and add code to that function to return XXFromTbl.

C. In the same file, locate the "LoadFromUCS2Table" function in the "#ifdef XP_UNIX" section and add code to that function to return XXToTbl.
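A minimal sketch of steps A through C, assuming a switch on the CharSetID inside each loader. The function names, the CS_EXAMPLE value, and the placeholder table contents are hypothetical; the real functions in ucs2.c may use different signatures and return types. The pairing of tables to functions follows steps B and C as written.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short uint16;   /* stand-in for the project's uint16 */

enum { CS_EXAMPLE = 1 };         /* hypothetical CharSetID */

/* In ucs2.c these arrays are filled by #include "xx.uf" and "xx.ut".
 * Placeholder contents are used here so the sketch compiles alone. */
static uint16 ExampleFromTbl[] = { 0 };
static uint16 ExampleToTbl[]   = { 0 };

/* Stand-in for LoadToUCS2Table; the pairing follows step B. */
static uint16 *load_to_ucs2_table(int csid)
{
    switch (csid) {
    case CS_EXAMPLE:
        return ExampleFromTbl;
    default:
        return NULL;   /* assumption: unknown charsets yield NULL */
    }
}

/* Stand-in for LoadFromUCS2Table; the pairing follows step C. */
static uint16 *load_from_ucs2_table(int csid)
{
    switch (csid) {
    case CS_EXAMPLE:
        return ExampleToTbl;
    default:
        return NULL;   /* assumption: unknown charsets yield NULL */
    }
}

/* Debug check: both tables must be registered for a charset. */
static void check_tables_registered(int csid)
{
    assert(load_to_ucs2_table(csid) != NULL);
    assert(load_from_ucs2_table(csid) != NULL);
}
```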

4. Add menu item to "View:Encoding" menu

This step is needed only if you want to add menu items to the "View:Encoding" menu.

Procedure:

A. Open file cmd/xfe/resources, locate "! View/Encoding Submenu", and add the following line to that section:

    *languageGroupEncCmdString: LanguageGroup (Charset)

where

languageGroupEncCmdString is the resource ID for the menu item, where languageGroup is the name of the language group. It is referred to later in the file cmd/xfe/src/HTMLView.cpp; see the following steps for details.
LanguageGroup (Charset) is the text of the menu item.

B. Open file cmd/xfe/src/Frame.cpp, locate the table "XFE_Frame::encoding_menu_spec", and add the line

    { xfeCmdChangeDocumentEncoding, TOGGLEBUTTON, NULL, "EncodingRadioGroup", False, (void*)CharSetID },

where CharSetID is the document CharSetID you defined in file include/csid.h.

C. Open file cmd/xfe/src/HTMLView.cpp, locate "XFE_HTMLView::commandToString", and add code like the following:

    else if (IS_CMD(xfeCmdChangeDocumentEncoding))
    {
        char *res = NULL;
        int doc_csid = (int)calldata;
        switch (doc_csid)
        {
            ...
            case CharSetID:
                res = "languageGroupEncCmdString";
               break;
            ...
        }
    }

where

CharSetID is the document CharSetID you added for the encoding menu.
languageGroupEncCmdString is the resource ID you defined in file cmd/xfe/resources as described above.
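The mapping in step C can be sketched in plain C as a function from a document CharSetID to the resource ID string. All names below (CS_EXAMPLE_DOC, exampleEncCmdString, and both functions) are hypothetical; the suffix check only reflects the languageGroupEncCmdString naming pattern described above and is not something the front end is known to enforce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CS_EXAMPLE_DOC = 1 };   /* hypothetical document CharSetID */

/* Maps a document CharSetID to the resource ID of its menu item,
 * or NULL when no menu item exists for it (an assumption). */
static const char *encoding_cmd_resource(int doc_csid)
{
    switch (doc_csid) {
    case CS_EXAMPLE_DOC:
        return "exampleEncCmdString";
    default:
        return NULL;
    }
}

/* Returns 1 when a name follows the <languageGroup>EncCmdString
 * pattern with a non-empty language group, otherwise 0. */
static int has_enc_cmd_suffix(const char *name)
{
    static const char suffix[] = "EncCmdString";
    size_t s = sizeof(suffix) - 1;
    size_t n;
    assert(name != NULL);
    n = strlen(name);
    return n > s && strcmp(name + n - s, suffix) == 0;
}
```

The string returned for a CharSetID has to match the resource ID added to cmd/xfe/resources in step A.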

Appendix A: Files You Need To Change:

Cross-Platform:

include/csid.h
lib/libi18n/csnametb.c
lib/libi18n/csstrlen.c
lib/libi18n/dblower.c
lib/libi18n/kinsokuf.c
lib/libi18n/kinsokud.c
lib/libi18n/sbconvtb.c
lib/libi18n/sblower.c
lib/libi18n/ucs2.c
lib/libparse/pa_amp.c

Windows:

cmd/winfe/res/convtbls.rc
lib/libi18n/unicode/unitbl.c
lib/libi18n/unicode/unitable.rc
cmd/winfe/genfram2.cpp
cmd/winfe/res/editor.rc2
cmd/winfe/res/mozilla.rc2
cmd/winfe/intlwin.cpp
cmd/winfe/intlwin.h
cmd/winfe/mozilla.rc
modules/libpref/src/win/winpref.js

Macintosh:

cmd/macfe/utility/uintl.cp
cmd/macfe/include/resgui.h
cmd/macfe/restext/ufrm.r
cmd/macfe/restext/macfe.r
cmd/macfe/rsrc/navigator/MenusRat.cnst
cmd/macfe/rsrc/communicator/Menus.cnst
cmd/macfe/rsrc/communicator/TextTraits.cnst

UNIX:

cmd/xfe/src/Frame.cpp
cmd/xfe/src/HTMLView.cpp
cmd/xfe/resources
cmd/xfe/fonts.c
lib/libi18n/ucs2.c
