You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



Mozilla Charset Detectors

Frank Yung-Fong Tang <ftang@netscaep.com>

What is a Charset Detector in Mozilla?

A charset detector in Mozilla is an XPCOM component that receives bytes as incoming data, "guesses" the charset of the data based on those bytes, and reports the result to the caller.
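The idea of guessing a charset from raw bytes can be illustrated with a toy sketch. This is not Mozilla's detector code: the function name guess_charset, the candidate list, and the "first candidate that decodes cleanly" strategy are all assumptions made for illustration only.

```python
import codecs

# Hypothetical candidate list for this sketch; not taken from Mozilla.
CANDIDATES = ["utf-8", "shift_jis", "big5"]


def guess_charset(data: bytes, candidates=CANDIDATES):
    """Return the first candidate that decodes data without error, or None."""
    # A byte-order mark is strong evidence, so check for one first.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # Otherwise try each candidate in order; this is only a guess, because
    # the same bytes can be valid in more than one charset.
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

A caller would pass in whatever bytes it has received and treat the answer as a hint, for example guess_charset(page_bytes). Because several charsets can accept the same byte sequence, the order of the candidate list strongly influences the answer, which is one reason a guess of this kind is unreliable.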

What is Not a Charset Detector?

A charset detector in Mozilla is not the code module that looks at document label information such as "<META HTTP-EQUIV='content-type' CONTENT='text/html; charset=big5'>". The code responsible for reading the charset label is not called a charset detector in Mozilla; it is called the Meta charset observer. A charset detector is the module that "guesses" the charset from unreliable data.
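To make the distinction concrete, the sketch below reads a declared label out of a document, which is the kind of work the text assigns to the Meta charset observer rather than to a charset detector. The function name declared_charset and the simplified regular expression are assumptions for illustration; they are not Mozilla code and do not handle every way a label can be written.

```python
import re

# Simplified pattern: "charset=" followed by an optionally quoted name.
_CHARSET_LABEL = re.compile(
    rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def declared_charset(html: bytes):
    """Return the charset named by the document's own label, or None."""
    match = _CHARSET_LABEL.search(html)
    if match is None:
        # No label present: only now would guessing from the bytes apply.
        return None
    return match.group(1).decode("ascii").lower()
```

When this returns a value, no guessing is involved: the document has stated its charset. A detector only becomes relevant for input where such a lookup returns nothing.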

How to set up the default charset detector for your embedding users?

You can choose which charset detector is used by default by specifying the "intl.charset.detector" value in navigator.properties. The source of navigator.properties is located at mozilla/xpfe/browser/resources/locale/en-US/navigator.properties. In the binary, the file is packaged into en-US.jar (or another language package, $LOCALE$.jar) as resources/locale/en-US/navigator.properties (or resources/locale/$LOCALE$/navigator.properties).
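As a rough illustration of that edit, the sketch below sets the intl.charset.detector entry in the text of a properties file. The helper name set_detector is hypothetical, and the sketch assumes plain key=value lines with "#" comments; it is not a full properties-file parser and does not touch the .jar packaging.

```python
def set_detector(properties_text: str, detector: str) -> str:
    """Return properties_text with intl.charset.detector set to detector."""
    key = "intl.charset.detector"
    new_line = f"{key}={detector}"
    lines = properties_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comment lines alone.
        if stripped.startswith("#"):
            continue
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            # Replace the existing entry in place.
            lines[i] = new_line
            break
    else:
        # No existing entry was found, so append one.
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

For example, calling set_detector(text, "ja_parallel_state_machine") on the contents of navigator.properties would yield text whose detector entry reads intl.charset.detector=ja_parallel_state_machine.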

Which charset detectors are available?

| Value for "intl.charset.detector" | English Name in charsetTitles.properties | Charsets it can detect | Implemented by | Windows Binary File | Comment |
|---|---|---|---|---|---|
| | Off | None | mozilla/intl/chardet/ | chardet.dll | |
| universal_charset_detector | Universal | ISO-8859-2, ISO-8859-5, ISO-8859-7, windows-1250, windows-1251, windows-1253, Big5, EUC-JP, EUC-KR, x-euc-tw, HZ-GB-2312, ISO-2022-CN, ISO-2022-KR, ISO-2022-JP, UTF-8, Shift_JIS, UTF-16BE, UTF-16LE, KOI8-R, x-mac-cyrillic, IBM866, IBM855, TIS-620 | mozilla/extensions/universalchardet/ | universalchardet.dll | Not built by default yet. Waiting for super review. |
| ja_parallel_state_machine | Japanese | UTF-8, Shift_JIS, EUC-JP, ISO-2022-JP, windows-1252, UTF-16BE, UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| ko_parallel_state_machine | Korean | UTF-8, EUC-KR, ISO-2022-KR, windows-1252, UTF-16BE, UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| zhtw_parallel_state_machine | Traditional Chinese | UTF-8, Big5, ISO-2022-CN, x-euc-tw, windows-1252, UTF-16BE, UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| zhcn_parallel_state_machine | Simplified Chinese | UTF-8, GB2312, ISO-2022-CN, HZ-GB-2312, windows-1252, UTF-16BE, UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| zh_parallel_state_machine | Chinese | UTF-8, GB2312, Big5, ISO-2022-CN, HZ-GB-2312, x-euc-tw, windows-1252, UTF-16BE, UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| cjk_parallel_state_machine | East Asian | UTF-8, Shift_JIS, EUC-JP, ISO-2022-JP, EUC-KR, Big5, x-euc-tw, GB2312, ISO-2022-CN, HZ-GB-2312, windows-1252, UTF-16BE, UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| ruprob | Russian | windows-1251, KOI8-R, ISO-8859-5, x-mac-cyrillic, IBM866 | mozilla/intl/chardet/ | chardet.dll | |
| ukprob | Ukrainian | windows-1251, KOI8-U, ISO-8859-5, x-mac-ukrainian, IBM866 | mozilla/intl/chardet/ | chardet.dll | |
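
When choosing a value for "intl.charset.detector", it can help to ask which detectors list a given charset at all. The sketch below transcribes the charset lists of the detectors implemented in mozilla/intl/chardet/ from the table above (the universal detector is left out for brevity) and returns the matching values, narrowest list first. The names DETECTORS and detectors_for are hypothetical, and the "narrowest first" ordering is an assumption of this sketch, not a documented Mozilla recommendation.

```python
# Charsets per detector value, transcribed from the table above.
_COMMON = {"UTF-8", "windows-1252", "UTF-16BE", "UTF-16LE"}

DETECTORS = {
    "ja_parallel_state_machine":
        _COMMON | {"Shift_JIS", "EUC-JP", "ISO-2022-JP"},
    "ko_parallel_state_machine":
        _COMMON | {"EUC-KR", "ISO-2022-KR"},
    "zhtw_parallel_state_machine":
        _COMMON | {"Big5", "ISO-2022-CN", "x-euc-tw"},
    "zhcn_parallel_state_machine":
        _COMMON | {"GB2312", "ISO-2022-CN", "HZ-GB-2312"},
    "zh_parallel_state_machine":
        _COMMON | {"GB2312", "Big5", "ISO-2022-CN", "HZ-GB-2312", "x-euc-tw"},
    "cjk_parallel_state_machine":
        _COMMON | {"Shift_JIS", "EUC-JP", "ISO-2022-JP", "EUC-KR", "Big5",
                   "x-euc-tw", "GB2312", "ISO-2022-CN", "HZ-GB-2312"},
    "ruprob":
        {"windows-1251", "KOI8-R", "ISO-8859-5", "x-mac-cyrillic", "IBM866"},
    "ukprob":
        {"windows-1251", "KOI8-U", "ISO-8859-5", "x-mac-ukrainian", "IBM866"},
}


def detectors_for(charset: str):
    """Return detector values whose row lists charset, narrowest list first."""
    wanted = charset.lower()
    hits = [
        name for name, charsets in DETECTORS.items()
        if wanted in {c.lower() for c in charsets}
    ]
    # Sort by how many charsets each detector covers, then by name for stability.
    return sorted(hits, key=lambda name: (len(DETECTORS[name]), name))
```

For instance, detectors_for("EUC-KR") lists the Korean detector ahead of the East Asian one, since the Korean row covers fewer charsets.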