The Search for Mozilla
Technical Information
Document Creation: April 6, 2000
Introduction:
This document describes the core search functionality in Mozilla from a technical implementation level. For a high-level, non-technical overview, please see the introduction.
Overview:
The core search functionality in Mozilla is an XPCOM component that uses RDF as its data store, Necko for networking support, and XUL/CSS and JavaScript for its user interface, with a bit of XPConnect for "glue" support. You can view
the source code for Mozilla's core search component via LXR.
Using JavaScript to add new Search Engines
Beginning with the M15 release of Mozilla, any web page can use the following JavaScript to add a new search engine (with the user's consent, via a confirmation dialog). For example: Click here to add a search engine for "mozilla.org" into your sidebar's "Search" panel.
Add this JavaScript function to your web pages to allow people to use your
search engine with Mozilla
(make sure to use the appropriate values for the engine URL,
icon URL, engine name, and category name):
function addEngine()
{
    if ((typeof window.sidebar == "object") &&
        (typeof window.sidebar.addSearchEngine == "function"))
    {
        window.sidebar.addSearchEngine(
            "http://www.mozilla.org/projects/search/mozilla.src",   /* engine URL */
            "http://www.mozilla.org/projects/search/mozilla.gif",   /* icon URL */
            "mozilla.org",                                          /* engine name */
            "Web");                                                 /* category name */
    }
    else
    {
        alert("Mozilla M15 or later is required to add a search engine.");
    }
}
Notes:
Implementation - How it Works
You should become familiar with Apple's
technote on Sherlock files. Mozilla currently supports version
1.3 of the "Sherlock 2" specification with the exception of "LDAP" support.
Mozilla Extensions - additions to the Sherlock specification
Below is an example of a search file for Netscape's Netcenter search engine at "http://search.netscape.com/".
# Netscape Search - Sherlock Plug-in
<SEARCH
    name="Netscape"
    description = "Netscape Search"
    method="GET"
    action="http://search.netscape.com/cgi-bin/search"
    update="http://www.someurl.com/test.src.hqx"
    updateCheckDays="3"
>
<INPUT NAME="search" user>
<INTERPRET
    browserResultType = "category"
    bannerStart="<!--AD_TAG VENDOR=AOL WIDTH=468 HEIGHT=60-->"
    bannerEnd="<!--/AD_TAG-->"
    resultListStart = "Groups of reviewed web sites related to your search term."
    resultListEnd = "Web sites reviewed and categorized by a team of editors."
    resultItemStart = "<LI>"
>
<INTERPRET
    browserResultType = "result"
    bannerStart="<!--AD_TAG VENDOR=AOL WIDTH=468 HEIGHT=60-->"
    bannerEnd="<!--/AD_TAG-->"
    resultListStart = "Web sites reviewed and categorized by a team of editors."
    resultItemStart = "<LI>"
>
</SEARCH>
<BROWSER
    alsomatch="http://search.netscape.com/search.tmpl"
    update="http://www.someurl.com/test.src"
    updateIcon="http://www.someurl.com/test.gif"
    updateCheckDays="3"
>
This is an excellent example, as it demonstrates the use of multiple <INTERPRET> tags (newly allowed by the "Sherlock 2" specification) as well as Mozilla's extensions to the Sherlock specification: the "browserResultType" attribute and the <BROWSER> section.
In every <INTERPRET> section, a "browserResultType" attribute has been added which can be used to specify the "type" of search result that is being used. There are currently two defined types: "category" and "result" (the default type, if unspecified).
"category" types will be displayed in the "Search" sidebar panel with a folder icon, while "result" types will be displayed in the "Search" sidebar panel with the branding icon/image for the search engine.
The <BROWSER> section allows an "alsomatch" attribute to be specified. Its URLs, along with the "action" attribute in the <SEARCH> section, are matched against the URL of the HTML page currently displayed in the content area of the browser window to determine whether that page is a search results page. If it is, the page is used to display abbreviated search results in the sidebar "Search" panel. The "alsomatch" attribute comes in handy if, for example, the "action" attribute points to a URL redirect.
New for Mozilla M17:
Also, the <BROWSER> section adds "update", "updateIcon" and "updateCheckDays" attributes. While the specification defines an "update" attribute in the <SEARCH> section, it unfortunately requires a Macintosh-specific, binhex'ed file (and the filename must end with ".src.hqx"). To be platform-agnostic, Mozilla instead allows the "update" attribute in the <BROWSER> section to specify a URL to the text-only, non-binhexed ".src" search file (the URL MUST end with ".src"), along with a reference to an image via the "updateIcon" attribute in the <BROWSER> section (the image must be in one of the supported image formats). For consistency, Mozilla also checks for an "updateCheckDays" attribute in the <BROWSER> section, which specifies the number of days between checks for changes to the search file.
If none of these attributes are specified in the <BROWSER> section,
Mozilla WILL look at the "update" attribute in the <SEARCH> section
and, if it ends with ".src.hqx", Mozilla will strip off the ".hqx" and
use the result as the URL to check. So, for example, if <SEARCH update="http://www.someurl.com/test.src.hqx">
is specified, Mozilla will check the URL "http://www.someurl.com/test.src"
(notice that the ".hqx" is removed).
This lets a web site provide BOTH a binhex'ed file and a text-only file,
so that both the Macintosh and Mozilla can use the search file(s). [Note:
It is then the responsibility of the web site to keep both files in sync.]
Tracking
Search engines often want the ability to "track" searches through
their service. One way to accomplish this is to track the number
of hits to the URL specified in the "action" attribute in the <SEARCH>
section. If necessary, that URL can even point to an HTTP redirect
page, as long as the "alsomatch" attribute in the <BROWSER> section
contains the "final" destination URL. That way, if a user goes directly to
the "final" URL (for example, by typing in the URL directly or by clicking
on a link that takes them there), the Sherlock file in question will still
"match", and the search sidebar panel can display an abbreviated
list of results for the user's convenience.
Given the above example of the search file for Netscape's Netcenter search engine at "http://search.netscape.com/", you'll notice that the "action" attribute is action="http://search.netscape.com/cgi-bin/search". If one wanted to use the HTTP redirect mechanism for page counting (while still having a distinct count of the number of hits made directly to that URL), the "action" attribute could be changed to an HTTP redirect page such as action="http://info.netscape.com/fwd/sidb_ns/http://search.netscape.com/cgi-bin/search", with the "alsomatch" attribute modified to contain the final destination URL, i.e. alsomatch="http://search.netscape.com/cgi-bin/search".
Note: The "alsomatch" attribute in the <BROWSER> section may contain several URLs; separate each with a single space. This comes in handy for complex search engines which have multiple URLs resolve to a final distinct URL.
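As a rough illustration of the matching described above, the following JavaScript sketch checks a page URL against an engine's "action" URL and its space-separated "alsomatch" URLs. The function names are hypothetical, and comparing URLs with the query string and fragment stripped is an assumption of this example; the text above does not spell out Mozilla's exact comparison rule.

```javascript
// Drop the query string and fragment so that ".../search?search=foo"
// compares equal to ".../search". This comparison rule is an assumption.
function stripQuery(url) {
  return url.split(/[?#]/)[0];
}

// Hypothetical helper: does the page at currentUrl belong to the engine whose
// <SEARCH> "action" and <BROWSER> "alsomatch" values are given?
function matchesEngine(currentUrl, searchAction, alsomatch) {
  const candidates = [searchAction];
  if (alsomatch) {
    // "alsomatch" may list several URLs separated by spaces.
    candidates.push(...alsomatch.trim().split(/\s+/).filter(Boolean));
  }
  const page = stripQuery(currentUrl);
  return candidates.some((candidate) => stripQuery(candidate) === page);
}
```

For the redirect setup above, a page at "http://search.netscape.com/cgi-bin/search?search=mozilla" does not match the redirecting "action" URL, but it does match the "alsomatch" entry, so the sketch reports a match.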
In the multi-engine search scenario, due to privacy issues it is not
possible for a search engine to discover information sent to any other
search engine. For example, if a user chooses to do a multi-engine search
using search engines "A" and "B", engine "B" cannot discover that the
user also searched engine "A" at the same time. However, it is possible
to differentiate a single-engine search from a multi-engine search in that
the HTTP request sent to the server(s) in question will have an additional
HTTP header of "MultiSearch: true".
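On the server side, a search engine could check for this header along the following lines. This is a minimal sketch: the function name isMultiEngineSearch is hypothetical, and the headers are assumed to arrive as a plain object mapping header names to string values (header names are compared case-insensitively, as HTTP requires).

```javascript
// Hypothetical server-side helper: returns true if the request headers carry
// "MultiSearch: true". The `headers` object shape is an assumption.
function isMultiEngineSearch(headers) {
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === "multisearch") {
      return String(value).trim().toLowerCase() === "true";
    }
  }
  return false;
}
```

A request without the header is treated here as a single-engine search.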
Common Problems - what can go wrong and how to fix it
Q: I use the search engine at "http://somerandomwebsite.com/" but its search results never show up in Mozilla's sidebar "Search" panel.
A: Mozilla determines when to show the
abbreviated results in the "Search" sidebar panel by matching the current
URL being displayed in the browser window against the "action" attribute
in the <SEARCH> section of all installed Sherlock files. (For
the example above, it is "http://search.netscape.com/cgi-bin/search".)
Basically, the installed Sherlock files are being used as a list of well-known
search engines. You'll need to find or create a Sherlock file for
the search engine at "http://somerandomwebsite.com/" to get it to work.
Better yet, send e-mail to the people who run "http://somerandomwebsite.com/" and ask them to create their own Sherlock file and use the window.sidebar.addSearchEngine() JavaScript function (see above) so that when users visit their site, they can easily use that search engine with Mozilla's sidebar "Search" panel.
Q: Where should I install new Sherlock files?
A: (For Mozilla M15 builds) Inside
the directory where Mozilla is installed, look for "res/rdf/datasets/"
and place the Sherlock files (in text format) there. You can add
branding images as well; just ensure that the image filename exactly
matches the engine filename, except for the extension, which must be one
of the following: ".gif", ".jpg", ".jpeg", or ".png".
Note: The location of this directory may change.
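The naming rule for branding images can be illustrated with a small JavaScript sketch. The function name brandingIconNames is hypothetical; it simply lists the image filenames that would pair with a given engine file under the rule described in this answer.

```javascript
// Image extensions listed in the answer above.
const ICON_EXTENSIONS = [".gif", ".jpg", ".jpeg", ".png"];

// Hypothetical helper: list the image filenames that would be accepted as the
// branding icon for a given Sherlock engine file (which must end in ".src").
function brandingIconNames(engineFilename) {
  if (!engineFilename.endsWith(".src")) {
    return [];
  }
  const base = engineFilename.slice(0, -".src".length);
  return ICON_EXTENSIONS.map((ext) => base + ext);
}
```

For an engine file named "mozilla.src", the sketch yields "mozilla.gif", "mozilla.jpg", "mozilla.jpeg" and "mozilla.png".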
Q: I found some Sherlock files on the web, but they don't seem to work with Mozilla.
A: Sherlock files are often "binhexed"
before being made available for download via the web. You'll need
to "unbinhex" these files. The Sherlock files you install into Mozilla
must end in ".src" and must be text-only files.
For Macintosh users: this is easy, as various compression utilities handle
"BinHex" files.
Q: I added a Sherlock file for "http://somerandomwebsite.com/" but Mozilla still doesn't show search results in the sidebar when I search the site.
A: Examine the Sherlock file and make
sure that the "action" attribute in the <SEARCH> section exactly matches
the public URL that is being used when searching.
Some Sherlock files actually use private URLs which return search results in a different format than what you get when searching the site via the browser. These types of Sherlock files won't work with Mozilla's sidebar "Search" panel.
Q: I'm using the window.sidebar.addSearchEngine() JavaScript function, but it seems to fail to install the Sherlock file.
A: Make sure that the engine & icon
URLs are correct.
Note: Mozilla currently only supports downloading of the engine and icon files via the "http:" protocol.
Q: What happens if a search engine changes the HTML they send? Do Sherlock files become out of date?
A: If a search engine changes the layout/format
of the HTML they send, the Sherlock file needs to be updated. Sherlock
files can specify an "update" attribute in the <SEARCH> section (along
with an "updateCheckDays" attribute), which Mozilla checks for new versions of
the file.
Note: Mozilla M15 builds don't have support for these "update" attributes. Support was added for the Mozilla M17 release.
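One plausible way to interpret "updateCheckDays" is as the minimum number of days between update checks. The sketch below illustrates that reading only; the function name isUpdateCheckDue, the millisecond timestamps, and the decision to never check when the attribute is missing or invalid are all assumptions, not a description of how Mozilla schedules its checks.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: decide whether a search file is due for an update
// check. Timestamps are milliseconds since the epoch; updateCheckDays is the
// attribute value as read from the file (a string such as "3").
function isUpdateCheckDue(lastCheckedMs, updateCheckDays, nowMs) {
  const days = Number(updateCheckDays);
  if (!Number.isFinite(days) || days <= 0) {
    // Treating a missing or invalid value as "never check" is an assumption.
    return false;
  }
  return nowMs - lastCheckedMs >= days * MS_PER_DAY;
}
```

With updateCheckDays="3", a file last checked two days ago is not yet due, while one last checked three or more days ago is.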
Q: Why the <BROWSER> "update", "updateIcon" and "updateCheckDays" extensions?
A: Mozilla does not have intrinsic
support for the Macintosh BinHex format, because it also supports many other
platforms. While it would be technically possible to automatically unbinhex
the file and only use the text part, doing so would still not provide support
for branding icons.
As mentioned above, if none of these <BROWSER> attributes are specified, Mozilla will take the "update" attribute in the <SEARCH> section, remove the trailing ".hqx" from the URL and use that. This lets a web site provide BOTH a binhex'ed file and a text-only file, so that both the Macintosh and Mozilla can use the search file(s). [Note: It is then the responsibility of the web site to keep both files in sync.] The recommendation is to specify both a <SEARCH> "update" attribute (which must refer to a Macintosh BinHex'ed file) and the <BROWSER> "update" (referring to a basic text file), "updateIcon" and "updateCheckDays" attributes, so that both the Macintosh and other platforms will be happy.