Mozilla Charset Detectors
Frank Yung-Fong Tang <ftang@netscape.com>
What is a Charset Detector in Mozilla?
A charset detector in Mozilla is an XPCOM component that receives bytes as incoming data, "guesses" the charset of the data based on those bytes, and reports the result to the caller.
What is Not a Charset Detector?
A charset detector in Mozilla is not the code module that looks at labels inside the document, such as "<META HTTP-EQUIV='content-type' CONTENT='text/html; charset=big5'>". The code responsible for reading the charset label is not called a charset detector in Mozilla; it is called the Meta charset observer. A charset detector is the module that "guesses" the charset from unreliable data.
How to set up the default charset detector for your embedding users?
You can choose which charset detector is used by default by specifying the "intl.charset.detector" value in navigator.properties. The source of navigator.properties is located at mozilla/xpfe/browser/resources/locale/en-US/navigator.properties. In the binary, the file is packaged into en-US.jar (or another language package, $LOCALE$.jar) as resources/locale/en-US/navigator.properties (or resources/locale/$LOCALE$/navigator.properties).
Which charset detectors are available?
| Value for "intl.charset.detector" | English Name in charsetTitles.properties | Charsets it can detect | Implemented by | Windows Binary Files | Comment |
|---|---|---|---|---|---|
| | Off | None | mozilla/intl/chardet/ | chardet.dll | |
| universal_charset_detector | Universal | ISO-8859-2 ISO-8859-5 ISO-8859-7 windows-1250 windows-1251 windows-1253 Big5 EUC-JP EUC-KR x-euc-tw HZ-GB-2312 ISO-2022-CN ISO-2022-KR ISO-2022-JP UTF-8 Shift_JIS UTF-16BE UTF-16LE KOI8-R x-mac-cyrillic IBM866 IBM855 TIS-620 | mozilla/extensions/universalchardet/ | universalchardet.dll | Not built by default yet. Waiting for super review. |
| ja_parallel_state_machine | Japanese | UTF-8 Shift_JIS EUC-JP ISO-2022-JP windows-1252 UTF-16BE UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| ko_parallel_state_machine | Korean | UTF-8 EUC-KR ISO-2022-KR windows-1252 UTF-16BE UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| zhtw_parallel_state_machine | Traditional Chinese | UTF-8 Big5 ISO-2022-CN x-euc-tw windows-1252 UTF-16BE UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| zhcn_parallel_state_machine | Simplified Chinese | UTF-8 GB2312 ISO-2022-CN HZ-GB-2312 windows-1252 UTF-16BE UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| zh_parallel_state_machine | Chinese | UTF-8 GB2312 Big5 ISO-2022-CN HZ-GB-2312 x-euc-tw windows-1252 UTF-16BE UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| cjk_parallel_state_machine | East Asian | UTF-8 Shift_JIS EUC-JP ISO-2022-JP EUC-KR Big5 x-euc-tw GB2312 ISO-2022-CN HZ-GB-2312 windows-1252 UTF-16BE UTF-16LE | mozilla/intl/chardet/ | chardet.dll | |
| ruprob | Russian | windows-1251 KOI8-R ISO-8859-5 x-mac-cyrillic IBM866 | mozilla/intl/chardet/ | chardet.dll | |
| ukprob | Ukrainian | windows-1251 KOI8-U ISO-8859-5 x-mac-ukrainian IBM866 | mozilla/intl/chardet/ | chardet.dll | |
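
To make the idea of "guessing" a charset from bytes more concrete, here is a minimal Python sketch. It is not Mozilla's implementation: it assumes, going only by the name "parallel state machine", that a detector can feed the incoming bytes to one decoder per candidate charset in parallel and drop every candidate that hits an illegal byte sequence. Python's incremental decoders stand in for the real state machines. The names CANDIDATES and guess_charsets are hypothetical, and windows-1252, UTF-16BE and UTF-16LE are left out because almost any byte string decodes under them.

```python
import codecs

# Hypothetical table: "intl.charset.detector" value -> candidate charsets,
# written as {charset name from the table above: Python codec name}.
# Assumption: only a subset of each detector's charsets is modelled here.
CANDIDATES = {
    "ja_parallel_state_machine": {
        "UTF-8": "utf_8",
        "Shift_JIS": "shift_jis",
        "EUC-JP": "euc_jp",
        "ISO-2022-JP": "iso2022_jp",
    },
    "ko_parallel_state_machine": {
        "UTF-8": "utf_8",
        "EUC-KR": "euc_kr",
        "ISO-2022-KR": "iso2022_kr",
    },
}


def guess_charsets(detector, chunks):
    """Return the candidate charsets that accept every chunk of bytes."""
    # One strict incremental decoder per candidate, all fed in parallel.
    decoders = {
        name: codecs.getincrementaldecoder(codec)(errors="strict")
        for name, codec in CANDIDATES[detector].items()
    }

    def feed(data, final):
        # Drop every candidate whose decoder rejects the data.
        for name in list(decoders):
            try:
                decoders[name].decode(data, final=final)
            except UnicodeDecodeError:
                del decoders[name]

    for chunk in chunks:
        feed(chunk, final=False)
    feed(b"", final=True)  # also rejects a character truncated at the end
    return list(decoders)


if __name__ == "__main__":
    data = "日本語".encode("euc_jp")
    # The data arrives in two chunks, split in the middle of a character.
    print(guess_charsets("ja_parallel_state_machine", [data[:3], data[3:]]))
    # prints ['EUC-JP']
```

Plain elimination is only a guess: input made of 7-bit bytes alone (for example ISO-2022-JP text) is accepted by every candidate above, so a real detector needs further evidence to pick one.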