You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



Specify charset in internet search data set file

8/16/2000 nhotta@netscape.com

Original Apple spec:
Technote 1141, Extending and Controlling Sherlock

Charset-related tag attributes
queryCharset - charset name used for server queries; specify in the "SEARCH" section
charset - charset name of the HTML results returned by the server; specify in the "INTERPRET" section
queryEncoding - numeric encoding ID used for server queries; specify in the "SEARCH" section
resultEncoding - numeric encoding ID of the HTML results returned by the server; specify in the "INTERPRET" section

Default
If no charset is specified, "ISO-8859-1" is used as the default charset.

Precedence
A charset name takes precedence over an encoding ID.
For example, if both "queryCharset" and "queryEncoding" are specified in a data set file, "queryCharset" is used and "queryEncoding" is ignored.
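
The default and precedence rules above can be sketched as a small lookup. This is only an illustration, not the actual Mozilla code: the function name resolve_charset, the dictionary-based attribute input, and the fallback to the default for an unrecognized encoding ID are all assumptions, and the ID table is a two-entry excerpt.

```python
DEFAULT_CHARSET = "ISO-8859-1"

# Excerpt of the encoding ID table (only the IDs used in this document's examples).
ENCODING_IDS = {"2561": "Shift_JIS", "2336": "EUC-JP"}


def resolve_charset(attrs, charset_attr, encoding_attr):
    """Pick the charset for one section of a data set file.

    attrs is a hypothetical dict of attribute name -> value for that section.
    """
    name = attrs.get(charset_attr)
    if name:
        # A charset name always wins over an encoding ID.
        return name
    encoding_id = attrs.get(encoding_attr)
    if encoding_id in ENCODING_IDS:
        return ENCODING_IDS[encoding_id]
    # Assumption: a missing or unknown ID falls back to the default.
    return DEFAULT_CHARSET


both = {"queryCharset": "UTF-8", "queryEncoding": "2561"}
print(resolve_charset(both, "queryCharset", "queryEncoding"))  # UTF-8
print(resolve_charset({"queryEncoding": "2561"}, "queryCharset", "queryEncoding"))  # Shift_JIS
print(resolve_charset({}, "queryCharset", "queryEncoding"))  # ISO-8859-1
```

The same function covers the "INTERPRET" section by passing "charset" and "resultEncoding" as the attribute names.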

Example 1:
Specifying "UTF-8" as the query charset in the SEARCH section.

<SEARCH
 queryCharset="UTF-8"
 name="Netscape Search"
 description = "Netscape Search"
...............................................
>
...............................................
</SEARCH>
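
To show why the query charset matters, here is a rough sketch of how the same query text yields different bytes on the wire depending on the charset. It uses Python's standard urllib.parse.quote purely for illustration. The helper name encode_query is hypothetical, and this document does not describe how the browser actually escapes queries.

```python
from urllib.parse import quote


def encode_query(text, query_charset="ISO-8859-1"):
    """Percent-encode query text using the given charset.

    The default mirrors the documented default charset, ISO-8859-1.
    """
    return quote(text, safe="", encoding=query_charset)


print(encode_query("café", "UTF-8"))  # caf%C3%A9
print(encode_query("café"))           # caf%E9
```

A server that expects UTF-8 would misread the second form, which is the kind of mismatch that setting queryCharset is meant to avoid.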
 

Example 2:
Specifying queryEncoding as "2561" (Shift_JIS) in the SEARCH section and resultEncoding as "2336" (EUC-JP) in the INTERPRET section.

<SEARCH
...............................................
 queryEncoding="2561"
 ...............................................
>
<INTERPRET
 charset = "EUC-JP"
 resultEncoding = "2336"
 ...............................................
>
</SEARCH>

Mapping table between encoding IDs and charset names.
(defined in nsInternetSearchService.cpp MapEncoding())

  { "0", "x-mac-roman" },
  { "6", "x-mac-greek" },
  { "35", "x-mac-turkish" },
  { "513", "ISO-8859-1" },
  { "514", "ISO-8859-2" },
  { "517", "ISO-8859-5" },
  { "518", "ISO-8859-6" },
  { "519", "ISO-8859-7" },
  { "520", "ISO-8859-8" },
  { "521", "ISO-8859-9" },
  { "1049", "IBM864" },
  { "1280", "windows-1252" },
  { "1281", "windows-1250" },
  { "1282", "windows-1251" },
  { "1283", "windows-1253" },
  { "1284", "windows-1254" },
  { "1285", "windows-1255" },
  { "1286", "windows-1256" },
  { "1536", "us-ascii" },
  { "1584", "GB2312" },
  { "1585", "x-gbk" },
  { "1600", "EUC-KR" },
  { "2080", "ISO-2022-JP" },
  { "2096", "ISO-2022-CN" },
  { "2112", "ISO-2022-KR" },
  { "2336", "EUC-JP" },
  { "2352", "GB2312" },
  { "2353", "x-euc-tw" },
  { "2368", "EUC-KR" },
  { "2561", "Shift_JIS" },
  { "2562", "KOI8-R" },
  { "2563", "Big5" },
  { "2565", "HZ-GB-2312" },
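
As a rough sketch of how the table above can be consulted, the following Python snippet copies the same ID and name pairs into a dictionary. The function name map_encoding and the choice to return None for an unlisted ID are assumptions made for illustration. The real MapEncoding() in nsInternetSearchService.cpp may behave differently.

```python
# Encoding ID -> charset name, copied from the table above.
ENCODING_TABLE = {
    "0": "x-mac-roman", "6": "x-mac-greek", "35": "x-mac-turkish",
    "513": "ISO-8859-1", "514": "ISO-8859-2", "517": "ISO-8859-5",
    "518": "ISO-8859-6", "519": "ISO-8859-7", "520": "ISO-8859-8",
    "521": "ISO-8859-9", "1049": "IBM864",
    "1280": "windows-1252", "1281": "windows-1250", "1282": "windows-1251",
    "1283": "windows-1253", "1284": "windows-1254", "1285": "windows-1255",
    "1286": "windows-1256", "1536": "us-ascii",
    "1584": "GB2312", "1585": "x-gbk", "1600": "EUC-KR",
    "2080": "ISO-2022-JP", "2096": "ISO-2022-CN", "2112": "ISO-2022-KR",
    "2336": "EUC-JP", "2352": "GB2312", "2353": "x-euc-tw",
    "2368": "EUC-KR", "2561": "Shift_JIS", "2562": "KOI8-R",
    "2563": "Big5", "2565": "HZ-GB-2312",
}


def map_encoding(encoding_id):
    """Return the charset name for an encoding ID string, or None if unlisted."""
    return ENCODING_TABLE.get(encoding_id)


print(map_encoding("2561"))  # Shift_JIS
print(map_encoding("2336"))  # EUC-JP
# Two different IDs can name the same charset.
print(map_encoding("1584") == map_encoding("2352"))  # True
```

Note that the IDs are compared as strings, matching the quoted form used in the table.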