How To Add Additional Charset Support
Tasks Outline:
In Cross-platform code:
- Add CharSetID with the property of the charset.
- Associate CharSetID with MIME charset name and Java Encoding Name.
- Add Unicode conversion tables.
- Add additional codeset conversion support.
- Add codeset conversion rules.
- Add ASCII fallback for named entities or numeric character references not supported in this charset.
- Add character length and column width information for multibyte charset.
- Add (pseudo) toLower table for case-insensitive folding.
- Add line breaking prohibit information for multibyte charset.
- Add word breaking information for multibyte charset.
In Windows Code:
- Add CodePage to CharSetID mapping
- Add Single Byte Conversion Table
- Add Unicode Conversion Tables
- Add menu items to "View:Encoding" menu
- Add menu item for "Font" preference "For the Encoding" menu
In Macintosh Code:
- Add Script to CharSetID mapping
- Add Single Byte Conversion Table
- Add Unicode Conversion Table
- Add menu item to "View:Encoding" menu
- Add menu item for "Font" Preference "For the Encoding" menu
In UNIX Code:
- Add font handling code
- Add Single Byte Conversion Table
- Add Unicode Conversion Table
- Add menu item to "View:Encoding" menu
Details
In Cross-Platform Code
1. Add CharSetID with the property of the charset.
A. Decide the property of the charset you need to add:
- Type- one of the following four:
- SINGLEBYTE- for single byte charset, such as KOI8-R, ISO-8859-x, TIS620
- MULTIBYTE- for multibyte charset, such as Shift_JIS, EUC-KR
- WIDECHAR- for process code, such as UCS2, UCS4
- STATEFUL- for stateful encoding, like ISO-2022-JP, ISO-2022-KR
- Basic line wrapping rule (only for MULTIBYTE): CS_SPACE- wrap lines only at space characters. Currently used only for Korean.
B. Define your CharSetID in file include/csid.h
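The real macros live in include/csid.h and are not reproduced in this document. The sketch below is a guess at the general shape of a CharSetID, not the actual header. Two hints in this document point that way: the Macintosh section masks CharSetIDs with 0xff, and a later note mentions that only 256 charsets can be defined. Together they suggest the low byte holds a unique charset index and the higher bits carry the type flags from step A. Every numeric value and every CS_MY_* name below is hypothetical.

```c
/* Hypothetical layout: check include/csid.h for the real macros/values. */
#define SINGLEBYTE      0x0000  /* type: single-byte charset           */
#define MULTIBYTE       0x0100  /* type: multibyte charset             */
#define STATEFUL        0x0200  /* type: stateful encoding             */
#define WIDECHAR        0x0300  /* type: process code (UCS2, UCS4)     */
#define CODESET_MASK    0x0300  /* selects the type field              */
#define CS_SPACE        0x0400  /* MULTIBYTE only: wrap at spaces only */
#define CSID_INDEX_MASK 0x00FF  /* low byte: unique charset index      */

/* Hypothetical new CharSetIDs: type bits OR'ed with an unused index. */
#define CS_MY_SINGLE (SINGLEBYTE | 200)
#define CS_MY_KOREAN (MULTIBYTE | CS_SPACE | 201)

/* Extract the type field (SINGLEBYTE, MULTIBYTE, ...). */
static unsigned int csid_type(unsigned int csid)  { return csid & CODESET_MASK; }

/* Non-zero if lines may only be wrapped at space characters. */
static int csid_wrap_at_space(unsigned int csid)  { return (csid & CS_SPACE) != 0; }

/* Extract the per-charset index stored in the low byte. */
static unsigned int csid_index(unsigned int csid) { return csid & CSID_INDEX_MASK; }
```

Under this layout, choosing a new CharSetID comes down to picking the type bits from step A and an index that no existing CharSetID uses.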
2. Associate CharSetID with MIME charset name and Java Encoding Name.
You need to do this for every CharSetID you add. Procedure:
A. Before you add any name, please look at two documents first:
- IANA (Internet Assigned Numbers Authority) CHARACTER SET document.
- "Supported Encodings" section in the Java "JDK Internationalization Overview" document.
From these documents, determine:
- MIME charset name- use the one labeled "(Preferred MIME name)" in the IANA document. If there is none in the IANA document, you should:
- Consider registering one with IANA.
- Put "x-" at the beginning to indicate that it is a private name.
- MIME charset aliases- all the names and aliases for that CharSetID listed in the IANA document.
- Java encoding name- the name listed in the Java document.
For example, the Shift_JIS entries in csname2id_tbl (lib/libi18n/csnametb.c) are:
    {"Shift_JIS",   "SJIS", CS_SJIS},
    ...
    /* aliases for Shift_JIS: */
    {"x-sjis",      "",     CS_SJIS},
    {"ms_Kanji",    "",     CS_SJIS},
    {"csShiftJIS",  "",     CS_SJIS},
    {"Windows-31J", "",     CS_SJIS},
In the above example:
- CS_SJIS is the CharSetID
- "Shift_JIS" is the MIME charset name so it is listed in the first entry,
- "SJIS" is the Java encoding name,
- "x-sjis", "ms_Kanji", "csShiftJIS", and "Windows-31J" are the MIME charset aliases. Although "x-sjis" is not listed in the IANA document, we list it for backward compatibility.
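The name table supports two lookups:

- A charset name found in a document is mapped to a CharSetID.
- A CharSetID is mapped back to its preferred MIME name.

The sketch below is a hypothetical model of those lookups, not the real csnametb.c code. The struct, the function names, and the CS_SJIS value are all assumptions. It shows why the MIME charset name must be the first entry for a CharSetID: the reverse lookup returns the first match. It also compares names case-insensitively, because RFC 2046 says charset values are not case sensitive.

```c
#include <ctype.h>
#include <stddef.h>

#define CS_UNKNOWN 0
#define CS_SJIS    0x0104  /* hypothetical value */

/* Hypothetical layout of one name-table entry. */
typedef struct {
    const char *mime_name;  /* MIME charset name or alias          */
    const char *java_name;  /* Java encoding name, "" for aliases  */
    int csid;
} CsNameEntry;

static const CsNameEntry name_tbl[] = {
    {"Shift_JIS",   "SJIS", CS_SJIS},  /* preferred MIME name first */
    {"x-sjis",      "",     CS_SJIS},  /* aliases follow            */
    {"ms_Kanji",    "",     CS_SJIS},
    {"csShiftJIS",  "",     CS_SJIS},
    {"Windows-31J", "",     CS_SJIS},
    {NULL, NULL, CS_UNKNOWN}           /* sentinel                  */
};

/* ASCII case-insensitive string equality. */
static int name_eq(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Charset name from a document -> CharSetID. */
static int name_to_csid(const char *name) {
    for (const CsNameEntry *e = name_tbl; e->mime_name; e++)
        if (name_eq(e->mime_name, name))
            return e->csid;
    return CS_UNKNOWN;
}

/* CharSetID -> preferred MIME name: the FIRST matching entry wins. */
static const char *csid_to_mime_name(int csid) {
    for (const CsNameEntry *e = name_tbl; e->mime_name; e++)
        if (e->csid == csid)
            return e->mime_name;
    return NULL;
}
```

If an alias were listed before "Shift_JIS", csid_to_mime_name would return the alias instead.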
3. Add Unicode conversion tables.
A. Create the conversion table. Before creating a conversion table for your charset, take a look at the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl. You may find that the table for your charset is already there, even if it is not built into the Mozilla binary. If you cannot find one there, follow the instructions below. (Thanks to Hovik Melikyan <hovik@undp.am> for writing the following section.)
A.1. Create an 8-bit to 16-bit conversion table for the new encoding. The file should contain two columns of hexadecimal numbers: the left column holds the 8-bit codes and the right column the corresponding 16-bit Unicode code. Do not list code points that are undefined or have no mapping to Unicode. Example:
    0x20 0x0020
    ...
    0xa0 0x00a0
    0xa2 0x00a7
    0xa3 0x0589
    0xa4 0x0029
    ...
This file will be used for generating both the "FROM" and "TO" conversion tables.
A.2. Compile utilities in lib/libi18n/unicode/tbltool. Example for Visual C++:
    cl -I../../ fromu.c utblutil.c
    cl -I../../ tou.c utblutil.c
These programs read a conversion table in the A.1 format from standard input and write resources to standard output, in a form suitable for including in Windows resource files as well as UNIX C sources. Example:
    fromu < xscii.txt > xscii.uf
    tou < xscii.txt > xscii.ut
A.3. Copy the generated *.uf and *.ut files to lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl, respectively.
B. Include the Unicode conversion table in the binary by following the "Add Unicode Conversion Table" step in the Windows, Macintosh, and UNIX sections below.
C. You might also need to make additions to shift tables in lib/libi18n/ugendata.c
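The A.1 input format is simple enough to check before you run fromu and tou. The sketch below is not the tbltool code, whose internals this document does not describe. It parses text in the A.1 format into a 256-entry byte-to-Unicode array. Two details are assumptions: the function name, and the use of 0xFFFD to mark unlisted code points.

```c
#include <stdio.h>
#include <string.h>

#define NO_MAPPING 0xFFFD  /* assumption: marker for unlisted code points */

/* Parse text in the A.1 format (one "0xa2 0x00a7" pair per line) into
   table[256]. Returns the number of mappings read, or -1 on a bad line. */
static int parse_mapping(const char *text, unsigned short table[256]) {
    int count = 0;
    for (int i = 0; i < 256; i++)
        table[i] = NO_MAPPING;

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[128];
        unsigned int from, to;

        if (len >= sizeof line)
            return -1;
        memcpy(line, text, len);
        line[len] = '\0';
        text += len + (eol ? 1 : 0);  /* advance past this line */

        if (strspn(line, " \t\r") == len)
            continue;  /* skip blank lines */
        /* %x accepts an optional "0x" prefix, like strtoul base 16. */
        if (sscanf(line, "%x %x", &from, &to) != 2 || from > 0xFF || to > 0xFFFF)
            return -1;
        table[from] = (unsigned short)to;
        count++;
    }
    return count;
}
```

A parsed table gives a quick check for common mistakes, such as a duplicated source byte or a code point you meant to list but left out.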
4. Add additional codeset conversion support.
[To Be Written] How to write a new conversion routine.
5. Add codeset conversion rules.
[To Be Written, IDEA and OUTLINE only for now] How to change lib/libi18n/fe_ccc.c
Outline:
- For a multibyte charset, if the from and to CharSetIDs are the same, apply mz_mbNullConv with a flag.
- For a single-byte charset, if the from and to CharSetIDs are the same, use a null conversion routine (cast 0).
- For conversion between two different single-byte charsets, apply the One2OneCCC routine. [Remember to link to the FE sections about single-byte tables.]
- For algorithm-based conversion, write a new conversion routine. Refer to the previous section.
- To use Unicode as the intermediate code for the conversion, use mz_AnyToAnyThroughUCS2.
- Conversion rules are needed in both directions so that HTML form posting will work.
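The one-to-one single-byte case is the simplest of these rules. The platform tables described later cover only code points 0x80 to 0xFF, which suggests that bytes below 0x80 (ASCII) pass through unchanged. The sketch below shows only that idea. It does not use the real One2OneCCC signature, and the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical one-to-one conversion, done in place.
   upper[i] is the target byte for source byte 0x80 + i.
   Bytes below 0x80 are assumed identical in both charsets. */
static void one_to_one_convert(unsigned char *buf, size_t len,
                               const unsigned char upper[128]) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 0x80)
            buf[i] = upper[buf[i] - 0x80];
    }
}
```

Because the table has exactly one target byte per source byte, the output has the same length as the input. That is why this routine can convert the buffer in place.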
6. Add ASCII fallback for named entities or numeric character references not supported in this charset
If you want to add an ASCII fallback for named entities or numeric character references for the newly added charset, do this step. Procedure:
Open file lib/libparse/pa_amp.c, locate the function pa_map_escape, and go to the end of the function. There should be an #ifdef PLATFORM section containing a big switch statement. Add your code there to provide an ASCII fallback for characters that cannot be displayed in the font CharSetID.
7. Add character length/column width information for multibyte charset.
If the font or document CharSetID you add is a multibyte charset, you should provide character length and column width information as follows. Procedure:
A. Open lib/libi18n/csstrlen.c and locate csinfo_tbl at the beginning of the file. Look at the current entries, and add an entry if none of them matches the characteristics of your charset. Here is the entry for the Japanese EUC-JP charset:
    {{{2,2,{0xa1,0xfe}}, {2,1,{0x8e,0x8e}}, {3,2,{0x8f,0x8f}}}}, /* For EUC_JP */
This entry specifies that if the first byte of a character is:
- in the range 0xa1 to 0xfe, the character is 2 bytes long and 2 columns wide (for 2-byte JIS X 0208 characters),
- 0x8e, the character is 2 bytes long and 1 column wide (for JIS X 0201 katakana, where an SS2 byte [0x8e] precedes the one-byte katakana),
- 0x8f, the character is 3 bytes long and 2 columns wide (for JIS X 0212 characters, where an SS3 byte [0x8f] precedes the two-byte kanji),
- any other value, the character is one byte long.
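The following sketch shows how such an entry can be used to step through a string and compute its display width. The struct layout is simplified from the EUC_JP initializer above. Each range is read as {bytes, columns, {first-byte low, first-byte high}}, and the extra outer braces of the real table are ignored. All names are hypothetical. The width of one-byte characters is assumed to be 1 column, so check csstrlen.c for the real declarations.

```c
#include <stddef.h>

/* Hypothetical mirror of one csinfo_tbl range. */
typedef struct {
    int bytes;               /* character length in bytes     */
    int columns;             /* display width in columns      */
    unsigned char range[2];  /* first-byte range, inclusive   */
} CharLenInfo;

/* The EUC_JP entry from csstrlen.c, one range per element. */
static const CharLenInfo eucjp_info[3] = {
    {2, 2, {0xa1, 0xfe}},  /* JIS X 0208                 */
    {2, 1, {0x8e, 0x8e}},  /* SS2 + JIS X 0201 katakana  */
    {3, 2, {0x8f, 0x8f}},  /* SS3 + JIS X 0212           */
};

/* Byte length and column width of the character starting with byte b. */
static void char_info(unsigned char b, int *bytes, int *columns) {
    for (int i = 0; i < 3; i++) {
        if (b >= eucjp_info[i].range[0] && b <= eucjp_info[i].range[1]) {
            *bytes = eucjp_info[i].bytes;
            *columns = eucjp_info[i].columns;
            return;
        }
    }
    *bytes = 1;    /* any other first byte: one-byte character */
    *columns = 1;  /* assumed to be one column wide            */
}

/* Total display width of an EUC-JP string of len bytes. */
static int column_width(const unsigned char *s, size_t len) {
    int width = 0;
    size_t i = 0;
    while (i < len) {
        int bytes, columns;
        char_info(s[i], &bytes, &columns);
        width += columns;
        i += (size_t)bytes;  /* jump to the next character's first byte */
    }
    return width;
}
```

Because only the first byte is examined, each character's length and width can be found without decoding it.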
To Be Improved
- Remove the limitation that only 256 charsets can be defined.
- Remove the assumption of CharSetID.
8. Add (pseudo) toLower table for case-insensitive folding.
You need to add a table to one of the following files so that case-insensitive search works correctly. This only needs to be done for a font CharSetID. A document CharSetID that is not a font CharSetID on that platform does not need it. Procedure:
A. If the charset you add is a single-byte charset:
A.1 Open file lib/libi18n/sblower.c.
A.2 Add a (pseudo) to-lower-case table for the code points 0x80-0xFF. To reduce size on platforms that do not use the charset as a font CharSetID, put #ifdef PLATFORM around the new table.
A.3 Change the function INTL_GetSingleByteToLowerMap to return the new (pseudo) to-lower-case table.
B. If the charset you add is a multibyte charset:
B.1 Open lib/libi18n/dblower.c.
B.2 Add a (pseudo) double-byte to-lower mapping table. To reduce size on platforms that do not use the charset as a font CharSetID, put #ifdef PLATFORM around the new table. The table entry is defined as:
    typedef struct {
        unsigned char src_b1;
        unsigned char src_b2_start;
        unsigned char src_b2_end;
        unsigned char dest_b1;
        unsigned char dest_b2_start;
    } DoubleByteToLowerMap;
B.3 Change the function INTL_GetDoubleByteToLowerMap to return the new (pseudo) to-lower-case table.
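This document does not spell out how an entry is applied. Going by the field names, a natural reading is the following, though it is an assumption and not taken from dblower.c. A character whose first byte equals src_b1, and whose second byte lies in src_b2_start..src_b2_end, maps to dest_b1 followed by dest_b2_start plus the same offset. The sketch applies a table under that reading. The all-zero terminating entry is also an assumption. The sample entry maps Shift_JIS full-width Latin capitals (0x8260-0x8279) to the full-width small letters (0x8281-0x829A).

```c
/* Same layout as the DoubleByteToLowerMap entry in step B.2. */
typedef struct {
    unsigned char src_b1;
    unsigned char src_b2_start;
    unsigned char src_b2_end;
    unsigned char dest_b1;
    unsigned char dest_b2_start;
} DoubleByteToLowerMap;

/* Sample table: Shift_JIS full-width A-Z -> a-z, zero-terminated. */
static const DoubleByteToLowerMap sjis_lower[] = {
    {0x82, 0x60, 0x79, 0x82, 0x81},
    {0x00, 0x00, 0x00, 0x00, 0x00}
};

/* Lower-case the two-byte character (*b1, *b2) in place if a range matches;
   otherwise leave it unchanged. */
static void db_to_lower(const DoubleByteToLowerMap *map,
                        unsigned char *b1, unsigned char *b2) {
    for (; map->src_b1 != 0; map++) {
        if (*b1 == map->src_b1 &&
            *b2 >= map->src_b2_start && *b2 <= map->src_b2_end) {
            unsigned char offset = (unsigned char)(*b2 - map->src_b2_start);
            *b1 = map->dest_b1;
            *b2 = (unsigned char)(map->dest_b2_start + offset);
            return;
        }
    }
}
```

Storing ranges rather than single characters keeps the table short: the 26 full-width capitals need only one entry.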
9. Add line breaking prohibit information for multibyte charset.
Line wrapping behavior is affected by the following properties:
- Type of CharSetID: SINGLEBYTE or MULTIBYTE
- CS_SPACE bit in CharSetID
- Prohibit type of the character returned by the function INTL_KinsokuClass:
- PROHIBIT_WORD_BREAK: Do not wrap the line between two characters that both have this prohibit type. For example, the Greek characters in UTF-8 have this type, so a line is never wrapped between two Greek characters.
- PROHIBIT_NOWHERE: It is OK to break before or after this character. For example, all Kanji characters have this type.
- PROHIBIT_BEGIN_OF_LINE: This character should not appear at the beginning of a line. Do not wrap the line before it. For example, ')' has this type.
- PROHIBIT_END_OF_LINE: This character should not appear at the end of a line. Do not wrap the line after it. For example, '(' has this type.
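The sketch below shows how the four prohibit types combine when deciding whether a line may be wrapped between two adjacent characters. The enum values and the function are hypothetical. They encode only the rules listed above, not the actual logic in kinsokuf.c or kinsokud.c.

```c
#include <stdbool.h>

/* Hypothetical values; the real constants are defined elsewhere. */
typedef enum {
    PROHIBIT_NOWHERE,        /* may break before or after            */
    PROHIBIT_BEGIN_OF_LINE,  /* e.g. ')': no break before it         */
    PROHIBIT_END_OF_LINE,    /* e.g. '(': no break after it          */
    PROHIBIT_WORD_BREAK      /* no break between two such characters */
} ProhibitClass;

/* May the line be wrapped between a character of class prev and the
   following character of class next? */
static bool can_break_between(ProhibitClass prev, ProhibitClass next) {
    if (prev == PROHIBIT_END_OF_LINE)
        return false;  /* prev must not end a line */
    if (next == PROHIBIT_BEGIN_OF_LINE)
        return false;  /* next must not start a line */
    if (prev == PROHIBIT_WORD_BREAK && next == PROHIBIT_WORD_BREAK)
        return false;  /* inside a run such as a Greek word */
    return true;
}
```

In this model a break point depends only on the two characters beside it. Supporting document-specific prohibit lists would then mean changing only the class lookup, not the wrapping rule itself.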
Procedure:
To change INTL_KinsokuClass, change the files lib/libi18n/kinsokuf.c and lib/libi18n/kinsokud.c.
References:
- Ken Lunde, Understanding Japanese Information Processing, pages 148-151
- Japanese Standards Association, JIS X 4051-1995, Line composition rules for Japanese documents.
- Nadine Kano, Developing International Software For Windows 95 and Windows NT, pages 238-245
To Be Improved:
- This API should be changed to improve performance; a state machine implementation should do a better job.
- It would be great if the prohibit list could be changed by the document through an HTML tag.
10. Add word breaking information for multibyte charset.
This is for the word selection (double-click) feature. Procedure:
Open the file that defines INTL_CharClass and change that function. It is hard to explain how to change it; look at the code yourself.
To Be Improved:
- We should rewrite this function to include more classes.
- An extensible mechanism is needed.
- There is no way to do dictionary-based word breaking.
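A common way for a character-class function to support double-click selection is to extend the selection left and right from the clicked character while the neighboring characters have the same class. The sketch below assumes that approach and uses a hypothetical three-class, ASCII-only classifier. The real INTL_CharClass has its own classes and signature, and for a multibyte charset the class would be computed per character rather than per byte.

```c
#include <stddef.h>

/* Hypothetical classes; INTL_CharClass defines its own. */
enum { CLASS_SPACE, CLASS_PUNCT, CLASS_WORD };

static int char_class(unsigned char c) {
    if (c == ' ' || c == '\t' || c == '\n')
        return CLASS_SPACE;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') || c == '_')
        return CLASS_WORD;
    return CLASS_PUNCT;
}

/* Select the run of same-class characters around pos in s[0..len).
   Requires pos < len. Stores the half-open range [*start, *end). */
static void select_word(const unsigned char *s, size_t len, size_t pos,
                        size_t *start, size_t *end) {
    int cls = char_class(s[pos]);
    size_t b = pos, e = pos + 1;
    while (b > 0 && char_class(s[b - 1]) == cls)
        b--;  /* extend to the left */
    while (e < len && char_class(s[e]) == cls)
        e++;  /* extend to the right */
    *start = b;
    *end = e;
}
```

With this design, adding a class only changes the classifier, which is one path toward the more extensible mechanism mentioned above.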
In Windows Code
1. Add CodePage to CharSetID mapping
You need to do the following if you add an additional font CharSetID, so that a Windows code page number can be mapped to the new CharSetID. You don't need to do this if you only add a document CharSetID and use an existing CharSetID as the font CharSetID. Procedure:
Open file cmd/winfe/intlwin.cpp, locate the function CIntlWin::CodePageToCsid, and add code there if necessary. That function maps a Windows code page to a CharSetID defined in ns/include/csid.h.
To Be Improved:
The current implementation should be improved in the near future by moving such mappings into resources, so that no C code needs to be changed.
2. Add Single Byte Conversion Table
If the CharSetID you add is a single-byte charset and you want to use the One2One conversion procedure to convert between the font CharSetID and the document CharSetID on Windows, you need to do this step. Procedure:
Open file cmd/winfe/res/convtbls.rc and add these lines:
    MIMECharsetFrom_TO_MIMECharsetTo RCDATA
    BEGIN
    /*8x*/ 0x8180, 0x8382, 0x8584, 0x8786, 0x8988, 0x8B8A, 0x8D8C, 0x8F8E,
    /*9x*/ 0x9190, 0x9392, 0x9594, 0x9796, 0x9998, 0x9B9A, 0x9D9C, 0x9F9E,
    /*Ax*/ ..............................................................
    /*Bx*/ ..............................................................
    /*Cx*/ ..............................................................
    /*Dx*/ ..............................................................
    /*Ex*/ ..............................................................
    /*Fx*/ ..............................................................
    END /* End of MIMECharsetFrom_TO_MIMECharsetTo */
where
- MIMECharsetFrom_TO_MIMECharsetTo is the RESOURCE ID for the table. You must name this RESOURCE ID MIMECharsetFrom_TO_MIMECharsetTo, where
- MIMECharsetFrom is the 1st entry of the CONVERT_FROM CharSetID in csname2id_tbl in file /lib/libi18n/csnametb.c. Note that you have to match the case exactly; do not change upper case to lower case or vice versa.
- MIMECharsetTo is the 1st entry of the CONVERT_TO CharSetID in csname2id_tbl in file /lib/libi18n/csnametb.c. Note that you have to match the case exactly; do not change upper case to lower case or vice versa.
- The /*8x*/ to /*Fx*/ rows are the mapping table for code points 0x80 to 0xFF. Please note:
- You have to put two bytes together, with "0x" in front and "," after them.
- You have to put the odd byte before the even byte (0x8180 instead of 0x8081)
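The Windows resource compiler writes each RCDATA number as a 16-bit little-endian word. That is why the odd (higher-address) byte comes first: 0x8180 is stored as the bytes 0x80, 0x81. The helper below is a hypothetical convenience and not part of the source tree. It prints one RCDATA row from a 128-byte table covering 0x80-0xFF, so you do not have to swap bytes by hand.

```c
#include <stdio.h>

/* Format RCDATA row `row` (0..7, covering 0x80 + 16*row .. +15) from
   tbl[128], where tbl[i] is the target byte for source byte 0x80 + i.
   Each word pairs byte 2k (low half) with byte 2k+1 (high half), so
   the odd code point's byte is printed first. Returns chars written. */
static int format_rcdata_row(const unsigned char tbl[128], int row,
                             char *out, size_t outsz) {
    int n = snprintf(out, outsz, "/*%Xx*/", 8 + row);
    for (int k = 0; k < 8 && n > 0 && (size_t)n < outsz; k++) {
        const unsigned char *p = tbl + row * 16 + 2 * k;
        unsigned int word = ((unsigned int)p[1] << 8) | p[0];
        n += snprintf(out + n, outsz - (size_t)n, " 0x%04X,", word);
    }
    return n;
}
```

For an identity table, row 0 comes out as the first row of the RCDATA example above: the 0x8x row label followed by 0x8180, 0x8382, ... 0x8F8E. The Macintosh 'xlat' resource, by contrast, lists the bytes in order.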
3. Add Unicode Conversion Tables
You should add Unicode conversion tables for the new CharSetID you add. Conversion tables for many charsets have already been generated into the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl, so in most cases you simply need to include them in the resource file. If the charset you add does not have a conversion table in those two directories and you need to generate one, look at the tools in the lib/libi18n/unicode/tbltool directory. Those tools expect the input format used by the tables on the Unicode FTP site (ftp://ftp.unicode.org/Public/MAPPINGS/). Procedure:
A. Open file lib/libi18n/unicode/unitable.rc and add these lines:
    XXX.UF RCDATA
    BEGIN
    #include "ufrmtbl\\xxx.uf"
    END
    XXX.UT RCDATA
    BEGIN
    #include "utotbl\\xxx.ut"
    END
where
- XXX.UF and XXX.UT are the RESOURCE IDs we define here. They are also used in the file lib/libi18n/unicode/unitbl.c in the next step.
- xxx.uf and xxx.ut are the Unicode conversion table files, which can be found in the lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl directories.
B. Open file lib/libi18n/unicode/unitbl.c and add the line
    {CS_XXX, {"XXX.UF", 0, NULL}, {"XXX.UT", 0, NULL}},
to the table utablenametbl, before the last entry
    {CS_DEFAULT, {"", 0, NULL}, {"", 0, NULL}}
where
- CS_XXX is the CharSetID for which we add Unicode conversion support, and
- XXX.UF and XXX.UT are the RESOURCE IDs we defined in file lib/libi18n/unicode/unitable.rc. Because of a limitation of 16-bit Windows, you must use all upper case for the RESOURCE IDs in this C file, even if you define them in lower case in lib/libi18n/unicode/unitable.rc.
4. Add menu items to "View:Encoding" menu
If you want to add additional menu items to the "View:Encoding" menu, you need to do this step. Procedure:
A. Open file cmd/winfe/genfram2.cpp, locate the function nIDToCsid(), and add an entry to the end of the table nid_to_csid with the CharSetID you want to appear on the "View:Encoding" menu. The position of the CharSetID in this table does not affect its position on the "View:Encoding" menu; it simply decides the nID for that CharSetID.
B.1 Open file cmd/winfe/res/mozilla.rc2 and locate POPUP "&Encoding". You will find the following ones (two of them are commented out):
- One in IDR_SRVR_INPLACE MENU: [Not sure when this one shows up. It looks the same as the one in IDM_MAINFRAMEVIEWMENU MENU. Need to double-check with other engineers.]
- One in IDR_MAINFRAME MENU: [Commented out by a C++ comment]
- One in IDM_MAINFRAMEVIEWMENU MENU: This is for the "Encoding" menu you see in the Navigator window.
- One in IDR_SRVR_EMBEDDED MENU: [Not sure when this one shows up. Need to double-check with other engineers.]
- One in IDR_MESSAGEFRAME MENU: This is for the "Encoding" menu you see in the window that opens when you double-click a message header in the "Messenger Mailbox" window.
- One in IDR_MAILTHREAD MENU: [Commented out by a C comment]
- One in IDM_MAILTHREADVIEWMENU MENU: This is for the "Encoding" menu you see in the "Messenger Mailbox" window.
- One in IDR_COMPOSEPLAIN MENU: This is for the "Encoding" Menu you see in the mail composition window when the Preference "Normally send HTML mail" option is off.
- One in IDR_COMPOSEFRAME MENU: This is for the "Encoding" Menu you see in the mail composition window when the Preference "Normally send HTML mail" option is on.
- One in IDR_EDITFRAME MENU: This is for the "Encoding" Menu you see in the "Page Composer" window.
B.2 Add a line like the following to each POPUP "&Encoding" menu where you want the item to appear:
    MENUITEM "LanguageGroup (Charset)", ID_OPTIONS_ENCODING_XX
where
- "LanguageGroup (Charset)" is the human-readable string on the "Encoding" menu, and
- ID_OPTIONS_ENCODING_XX is the nID you mapped in the file cmd/winfe/genfram2.cpp (ID_OPTIONS_ENCODING_1 through ID_OPTIONS_ENCODING_70 are already reserved for you).
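Because the nID comes from an entry's position in nid_to_csid, appending to the end of the table leaves existing nIDs unchanged. The sketch below rests on two assumptions that this document does not state: the ID_OPTIONS_ENCODING_* values are consecutive, and the Nth table entry corresponds to ID_OPTIONS_ENCODING_N. Under those assumptions, the mapping amounts to the following. All numeric values, including the CharSetIDs in the table, are hypothetical.

```c
#include <stddef.h>

#define ID_OPTIONS_ENCODING_1  40001  /* hypothetical resource value */
#define ID_OPTIONS_ENCODING_70 (ID_OPTIONS_ENCODING_1 + 69)
#define CS_UNKNOWN 0

/* Hypothetical CharSetIDs, in menu-ID order; new ones go at the end. */
static const int nid_to_csid[] = { 0x0002, 0x0104, 0x0105, 0x02c9 };

/* Menu nID -> CharSetID, by table position. */
static int nid_to_csid_lookup(int nid) {
    size_t n = sizeof nid_to_csid / sizeof nid_to_csid[0];
    if (nid < ID_OPTIONS_ENCODING_1 || nid > ID_OPTIONS_ENCODING_70)
        return CS_UNKNOWN;
    size_t idx = (size_t)(nid - ID_OPTIONS_ENCODING_1);
    return idx < n ? nid_to_csid[idx] : CS_UNKNOWN;
}

/* CharSetID -> menu nID, e.g. to find the item for a document's charset.
   Returns 0 if the CharSetID is not on the menu. */
static int csid_to_nid(int csid) {
    size_t n = sizeof nid_to_csid / sizeof nid_to_csid[0];
    for (size_t i = 0; i < n; i++)
        if (nid_to_csid[i] == csid)
            return ID_OPTIONS_ENCODING_1 + (int)i;
    return 0;
}
```

Inserting an entry in the middle of the table would shift the nIDs of every later CharSetID, so every MENUITEM line after it would need renumbering.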
To Be Improved:
Currently all the menus are defined in the front end. In the future we may move them into cross-platform code and make the menu items easy to change.
5. Add menu item for "Font" preference "For the Encoding" menu
If you add an additional font CharSetID, you need to do the following to make a new menu item appear in the font preference, so that people can associate fonts with that font CharSetID. Procedure:
A. Open file cmd/winfe/mozilla.rc using Developer Studio, not a text editor, and add strings for the new menu items in the "For the Encoding" menu. Name each RESOURCE ID IDS_LANGUAGE_XX so we can easily find them later. When you close the file, Developer Studio should update cmd/winfe/mozilla.rc as well as cmd/winfe/resource.h.
B. Open file cmd/winfe/intlwin.cpp and
B.1 Locate lang_table and add new lines:
    IDS_LANGUAGE_XX, CS_XXX, CS_XXX, CS_YYY, 0,
where
- the 1st item IDS_LANGUAGE_XX is the RESOURCE ID you added to cmd/winfe/mozilla.rc (which Developer Studio automatically adds to cmd/winfe/resource.h),
- the 2nd and 3rd items CS_XXX are the CharSetID used as the font CharSetID and as one of the document CharSetIDs this font CharSetID supports,
- the 4th item CS_YYY is another document CharSetID you want to convert to this font CharSetID. If no other document CharSetID converts to this font CharSetID, leave it out.
    CS_XXX, "PropoFont", 12, "FixedFont", 10, CHARSET, CHARSET,
where
- the 1st item CS_XXX is the font CharSetID as listed in the previous step,
- the 2nd item "PropoFont" is the default font name for the proportional font,
- the 3rd item 12 is the default font size for the proportional font,
- the 4th item "FixedFont" is the default font name for the fixed font,
- the 5th item 10 is the default font size for the fixed font,
- the 6th and 7th items CHARSET are the Windows CHARSET values for the font, where:
Value | For | Comment |
ANSI_CHARSET | Western European languages | |
SHIFTJIS_CHARSET | Japanese | |
CHINESEBIG5_CHARSET | Traditional Chinese | |
HANGEUL_CHARSET | Korean | |
DEFAULT_CHARSET | UCS2 | |
134 | Simplified Chinese | Same as GB2312_CHARSET. However, GB2312_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using GB2312_CHARSET instead of 134 will break the 16-bit Windows build. |
177 | Hebrew | Same as HEBREW_CHARSET. However, HEBREW_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using HEBREW_CHARSET instead of 177 will break the 16-bit Windows build. |
178 | Arabic | Same as ARABIC_CHARSET. However, ARABIC_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using ARABIC_CHARSET instead of 178 will break the 16-bit Windows build. |
161 | Greek | Same as GREEK_CHARSET. However, GREEK_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using GREEK_CHARSET instead of 161 will break the 16-bit Windows build. |
162 | Turkish | Same as TURKISH_CHARSET. However, TURKISH_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using TURKISH_CHARSET instead of 162 will break the 16-bit Windows build. |
163 | Vietnamese | Same as VIETNAMESE_CHARSET. However, VIETNAMESE_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using VIETNAMESE_CHARSET instead of 163 will break the 16-bit Windows build. |
222 | Thai | Same as THAI_CHARSET. However, THAI_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using THAI_CHARSET instead of 222 will break the 16-bit Windows build. |
238 | East European languages | Same as EASTEUROPE_CHARSET. However, EASTEUROPE_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using EASTEUROPE_CHARSET instead of 238 will break the 16-bit Windows build. |
186 | Baltic languages | Same as BALTIC_CHARSET. However, BALTIC_CHARSET is not defined in the 16-bit version of the header file (WINDOWS.H), so using BALTIC_CHARSET instead of 186 will break the 16-bit Windows build. |
[Note: Need to talk about modules/libpref/src/win/winpref.js]
In Macintosh Code
1. Add Script to CharSetID mapping
You need to do the following if you add an additional font CharSetID, so that a Macintosh Script code can be mapped to the new CharSetID. You don't need to do this if you only add a document CharSetID and use an existing CharSetID as the font CharSetID. Procedure:
Open file cmd/macfe/utility/uintl.cp, look at the function ScriptToEncoding, uncomment the Script code that you should map to, and add a return statement that returns the CharSetID.
To Be Improved:
The current implementation should be improved in the near future by moving such mappings into resources, so that no C code needs to be changed.
2. Add Single Byte Conversion Table
If the CharSetID you add is a single-byte charset and you want to use the One2One conversion procedure to convert between the font CharSetID and the document CharSetID on Macintosh, you need to do this step. Procedure:
A. [Optional] Open file cmd/macfe/include/resgui.h, go to the end, and add a macro that defines the resource ID of the conversion table. This macro value is used in file lib/libi18n/fe_ccc.c as described in section "XXXX", and in the resource file you will create in the next step. We suggest defining the resource ID as
    #define xlat_FromCharSetID_TO_ToCharSetID (((FromCharSetID & 0xff) << 8 ) | (ToCharSetID & 0xff))
where
- FromCharSetID is the source CharSetID
- ToCharSetID is the target CharSetID
    #include "csid.h"
    #include "resgui.h"
    data 'xlat' (xlat_FromCharSetID_TO_ToCharSetID, "Resource Name", purgeable) {
    /*      x0x1 x2x3 x4x5 x6x7 x8x9 xAxB xCxD xExF */
    /*8x*/ $"C481 82C9 A5D6 DCE1 B9C8 E4E8 C6E6 E98F"
    /*9x*/ $"9FCF EDEF 9495 96F3 98F4 F69B FACC ECFC"
    /*Ax*/ $"86B0 CAA3 A795 B6DF AEA9 99EA A8AD AEAF"
    /*Bx*/ $"B0B1 B2B3 B4B5 B6B7 B3B9 BABC BEC5 E5BF"
    /*Cx*/ $"C0D1 ACC3 F1D2 C6AB BB85 A0F2 D5CD F5CF"
    /*Dx*/ $"9697 9394 9192 F7D7 D8C0 E0D8 8B9B F8DF"
    /*Ex*/ $"E08A 8284 9A8C 9CC1 8D9D CD8E 9EED D3D4"
    /*Fx*/ $"F0D9 DAF9 DBFB F6F7 DDFD FAAF A3BF FEA1"
    };
where
- xlat_FromCharSetID_TO_ToCharSetID is the resource id you just defined in file cmd/macfe/include/resgui.h
- The /*8x*/ to /*Fx*/ rows are the mapping table for code points 0x80 to 0xFF. Please note that the format is not the same as the one in the Windows section.
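The resource ID formula suggested in step A puts the low byte of the source CharSetID into the high byte of a 16-bit ID, and the low byte of the target CharSetID into its low byte. The macros below restate that formula and add its inverse. The macro names and the CharSetID values are hypothetical, for illustration only.

```c
/* Suggested 'xlat' resource ID for a From -> To table (step A formula). */
#define XLAT_ID(from, to) ((((from) & 0xff) << 8) | ((to) & 0xff))

/* Recover the two CharSetID low bytes from a resource ID. */
#define XLAT_FROM_INDEX(id) (((id) >> 8) & 0xff)
#define XLAT_TO_INDEX(id)   ((id) & 0xff)

/* Hypothetical CharSetID values. */
#define CS_MAC_ROMAN 0x0005
#define CS_LATIN1    0x0002
#define CS_MY_SB     0x00C8
```

Each direction gets its own ID: Mac Roman to Latin-1 is 0x0502, and Latin-1 to Mac Roman is 0x0205. Macintosh resource IDs are signed 16-bit values, so if the source CharSetID's low byte is 0x80 or higher, the resulting ID is negative. It may be worth checking such IDs against the ranges the system reserves.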
To Be Improved:
We should probably make the definition of the resource ID a macro in csid.h, to remove the need to change resgui.h.
3. Add Unicode Conversion Table
You should add Unicode conversion tables for the new CharSetID you add. Conversion tables for many charsets have already been generated into the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl, so in most cases you simply need to include them in the resource file. If the charset you add does not have a conversion table in those two directories and you need to generate one, look at the tools in the lib/libi18n/unicode/tbltool directory. Those tools expect the input format used by the tables on the Unicode FTP site (ftp://ftp.unicode.org/Public/MAPPINGS/). Procedure:
Open file cmd/macfe/restext/ufrm.r and add these lines:
    resource 'UFRM' (CharSetID, "Resource Name", purgeable) {{
    #include "xxx.uf"
    }};
    resource 'UTO ' (CharSetID, "Resource Name", purgeable) {{
    #include "xxx.ut"
    }};
where xxx.uf and xxx.ut are the Unicode conversion table files, which can be found in the lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl directories.
4. Add menu item to "View:Encoding" menu
If you want to add additional menu items to the "View:Encoding" menu, you need to do this step. Procedure:
A. Open file cmd/macfe/include/resgui.h, locate the word ENCODING_CEILING, and add a line before that line:
    #define cmd_XXXX 14nn
where
- cmd_XXXX is a macro value which will be used in
- 14nn is the "Command ID" value you should put into the files cmd/macfe/rsrc/navigator/MenusRat.cnst and cmd/macfe/rsrc/communicator/Menus.cnst, as described in the next step. We currently reserve IDs 1401 to 1499 for this purpose. You should use a number that does not conflict with those currently defined in the file cmd/macfe/include/resgui.
    fontCharSetID, documentCharSetID, cmd_XXXX,
where the first field is the font CharSetID, the second field is the document CharSetID, and the third is the Command ID you defined in cmd/macfe/include/resgui.h.
C. Open the files cmd/macfe/rsrc/navigator/MenusRat.cnst and cmd/macfe/rsrc/communicator/Menus.cnst using Metrowerks Constructor. Open "> Common View Encoding" (Res ID = 8) in the "Menus" section and add menu items to it. The "Command ID" should be the one you just defined in file cmd/macfe/include/resgui.h.
To Be Improved:
- Currently the Macintosh source uses the same encoding menu resource for different types of windows. We may change to use different menu resources in the future.
- Currently all the menus are defined in the front end. In the future we may move them into cross-platform code and make the menu items easy to change.
5.Add menu item for "Font" Preference "For the Encoding" menu
If you add an additional font CharSetID, you need to do the following to make a new menu item appear in the font preference, so that people can associate fonts with that font CharSetID. Procedure:
A. Open file cmd/macfe/rsrc/communicator/TextTraits.cnst using Metrowerks Constructor. Add two new TextTraits by copying TextTraits 4001 and 4002. Name your TextTraits after the font CharSetID, so we can tell later what they are for. The content of these two TextTraits is not important, since it is changed dynamically at application initialization time. Make the new TextTraits IDs follow the TextTraits currently defined for the same purpose (4001 to 4024).
B. Open file cmd/macfe/restext/macfe.r, locate "resource 'Fnec'", and add a line:
    "LanguageGroup", "PropFont", "FixedFont", 12, 10, CharSetID, ScriptCode, TextTraitsID1, TextTraitsID2;
where
- the 1st field LanguageGroup is the string that will appear on the "For the Encoding" menu in the font preference,
- the 2nd and 3rd fields PropFont and FixedFont are the default proportional and fixed font names for that encoding before the user changes them,
- the 4th and 5th fields 12 and 10 are the default proportional and fixed font sizes for that encoding before the user changes them,
- the 6th field CharSetID is the font CharSetID of this menu item,
- the 7th field ScriptCode is the Macintosh script code for the CharSetID,
- the 8th and 9th fields TextTraitsID1 and TextTraitsID2 are the TextTraits IDs we defined in file cmd/macfe/rsrc/communicator/TextTraits.cnst.
In UNIX Code
1. Add Font Handling Code
Procedure:
A. Open file cmd/xfe/resources, locate the "This table maps X11 font charsets to MIME charsets" section, and add or change lines of the form
    *documentFonts.charset*XLFDCharset: MIMECharset
where
- XLFDCharset is the XLFD (X Logical Font Description) charset name, and
- MIMECharset is the MIME charset of the font, as defined in file lib/libi18n/csnametb.c.
    *documentFonts.charsetlang*MIMECharset: LanguageGroup
where
- MIMECharset is the MIME charset of the font, as defined in file lib/libi18n/csnametb.c, and
- LanguageGroup is the menu item displayed in the font preference.
[To Be Written]
D. In the same file, locate the "! Unicode Pseudo Font" section,
[To Be Written]
E. Open file cmd/xfe/fonts.h,
[To Be Written]
2. Add Single Byte Conversion Table
If the CharSetID you add is a single-byte charset and you want to use the One2One conversion procedure to convert between the font CharSetID and the document CharSetID on UNIX, you need to do this step. Procedure:
A. Open file lib/libi18n/sbconvtb.c, locate the "#ifdef XP_UNIX" section, and add a new table as follows:
    PRIVATE unsigned char MIMECharsetFrom_to_MIMECharsetTo[] = {
    /*8x*/ '?', '?', ',', 'f', '?', '?', '?', '?', '^', '?', 'S', '<', '?', '?', '?', '?',
    /*9x*/ '?', '?', '?', '?', '?', '*', '-', '-', '~', '?', 's', '>', '?', '?', '?', 'Y',
    /*Ax*/ 0xA0,0xA1,0xA2,0xA3,0xA4,0xA5,0xA6,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC,0xAD,0xAE,0xAF,
    /*Bx*/ 0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC,0xBD,0xBE,0xBF,
    /*Cx*/ 0xC0,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7,0xC8,0xC9,0xCA,0xCB,0xCC,0xCD,0xCE,0xCF,
    /*Dx*/ 0xD0,0xD1,0xD2,0xD3,0xD4,0xD5,0xD6,0xD7,0xD8,0xD9,0xDA,0xDB,0xDC,0xDD,0xDE,0xDF,
    /*Ex*/ 0xE0,0xE1,0xE2,0xE3,0xE4,0xE5,0xE6,0xE7,0xE8,0xE9,0xEA,0xEB,0xEC,0xED,0xEE,0xEF,
    /*Fx*/ 0xF0,0xF1,0xF2,0xF3,0xF4,0xF5,0xF6,0xF7,0xF8,0xF9,0xFA,0xFB,0xFC,0xFD,0xFE,0xFF
    };
    PRIVATE char *MIMECharsetFrom_to_MIMECharsetTo_p = (char*)MIMECharsetFrom_to_MIMECharsetTo;
where
- MIMECharsetFrom_to_MIMECharsetTo is the name of the table. You should name the table MIMECharsetFrom_to_MIMECharsetTo, where
- MIMECharsetFrom is the 1st entry of the CONVERT_FROM CharSetID in csname2id_tbl in file /lib/libi18n/csnametb.c. Note that you have to match the case exactly; do not change upper case to lower case or vice versa.
- MIMECharsetTo is the 1st entry of the CONVERT_TO CharSetID in csname2id_tbl in file /lib/libi18n/csnametb.c. Note that you have to match the case exactly; do not change upper case to lower case or vice versa.
- The /*8x*/ to /*Fx*/ rows are the mapping table for code points 0x80 to 0xFF.
    ...
    else if ((from_csid == CharSetIDFrom) && (to_csid == CharSetIDTo)) {
        return &MIMECharsetFrom_to_MIMECharsetTo_p;
    }
where
- CharSetIDFrom is the CharSetID for MIMECharsetFrom,
- CharSetIDTo is the CharSetID for MIMECharsetTo, and
- MIMECharsetFrom_to_MIMECharsetTo_p is the array pointer you just defined in the previous step.
3. Add Unicode Conversion Table
You should add Unicode conversion tables for the new CharSetID you add. Conversion tables for many charsets have already been generated into the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl, so in most cases you simply need to include them. If the charset you add does not have a conversion table in those two directories and you need to generate one, look at the tools in the lib/libi18n/unicode/tbltool directory. Those tools expect the input format used by the tables on the Unicode FTP site (ftp://ftp.unicode.org/Public/MAPPINGS/). Procedure:
A. Open file lib/libi18n/ucs2.c, locate the "#ifdef XP_UNIX" section, and add these lines:
    PRIVATE uint16 XXFromTbl[] = {
    #include "xx.uf"
    };
    PRIVATE uint16 XXToTbl[] = {
    #include "xx.ut"
    };
where
- XXFromTbl and XXToTbl are the Unicode conversion tables.
- xx.uf and xx.ut are the Unicode conversion table files, which can be found in the lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl directories.
C. In the same file, locate the "LoadFromUCS2Table" function in the "#ifdef XP_UNIX" section, and add code to that function to return XXToTbl.
4. Add menu item to "View:Encoding" menu
If you want to add additional menu items to the "View:Encoding" menu, you need to do this step. Procedure:
A. Open file cmd/xfe/resources, locate "! View/Encoding Submenu", and add a line to that section:
    *languageGroupEncCmdString: LanguageGroup (Charset)
where
- languageGroupEncCmdString is the resource ID for the menu item. It should be named languageGroupEncCmdString, where languageGroup is the name of the language group. It will be referenced later in the file cmd/xfe/src/HTMLView.cpp; see the following steps for details.
- LanguageGroup (Charset) is the text of the menu item.
    { xfeCmdChangeDocumentEncoding, TOGGLEBUTTON, NULL, "EncodingRadioGroup", False, (void*)CharSetID },
where CharSetID is the document CharSetID you defined in file include/csid.h.
C. Open file cmd/xfe/src/HTMLView.cpp, locate "XFE_HTMLView::commandToString", and add code like the following:
    else if (IS_CMD(xfeCmdChangeDocumentEncoding)) {
        char *res = NULL;
        int doc_csid = (int)calldata;
        switch (doc_csid) {
        ...
        case CharSetID:
            res = "languageGroupEncCmdString";
            break;
        ...
        }
    }
where
- CharSetID is the document CharSetID you add for the encoding menu
- languageGroupEncCmdString is the resource ID you defined in file cmd/xfe/resources, as described above.
Appendix A: Files You Need To Change:
Platform | Files |
Cross-Platform | include/csid.h, lib/libi18n/csnametb.c, lib/libi18n/csstrlen.c, lib/libi18n/dblower.c, lib/libi18n/kinsokuf.c, lib/libi18n/kinsokud.c, lib/libi18n/sbconvtb.c, lib/libi18n/sblower.c, lib/libi18n/ucs2.c, lib/libparse/pa_amp.c |
Windows | cmd/winfe/res/convtbls.rc, lib/libi18n/unicode/unitbl.c, lib/libi18n/unicode/unitable.rc, cmd/winfe/genfram2.cpp, cmd/winfe/res/editor.rc2, cmd/winfe/res/mozilla.rc2, cmd/winfe/intlwin.cpp, cmd/winfe/intlwin.h, cmd/winfe/mozilla.rc, modules/libpref/src/win/winpref.js |
Macintosh | cmd/macfe/utility/uintl.cp, cmd/macfe/include/resgui.h, cmd/macfe/restext/ufrm.r, cmd/macfe/restext/macfe.r, cmd/macfe/rsrc/navigator/MenusRat.cnst, cmd/macfe/rsrc/communicator/Menus.cnst, cmd/macfe/rsrc/communicator/TextTraits.cnst |
UNIX | cmd/xfe/src/Frame.cpp, cmd/xfe/src/HTMLView.cpp, cmd/xfe/resources, cmd/xfe/fonts.c, lib/libi18n/ucs2.c |