You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



The Search for Mozilla


Technical Information

Robert John Churchill (rjc@netscape.com)

Document Creation: April 6, 2000

Introduction:

This document describes the core search functionality in Mozilla from a technical implementation level. For a high-level, non-technical overview, please see the introduction.

Overview:

The core search functionality in Mozilla is an XPCOM component which uses RDF as its data store, Necko for networking support, and XUL/CSS & JavaScript for its user interface, with a bit of XPConnect for "glue" support.

You can view the source-code for Mozilla's core search component via LXR.
 

Using JavaScript to add new Search Engines

Beginning with the M15 release of Mozilla, the following JavaScript can be used by any web page to add a new search engine (with the user's consent via a confirmation dialog).
For example: Click here to add a search engine for "mozilla.org" into your sidebar's "Search" panel.
 
 

Add this JavaScript function to your web pages to allow people to use your search engine with Mozilla
(including abbreviated search results via Mozilla's "Search" sidebar panel)

(make sure to use the appropriate values for the engine URL, icon URL, engine name, and category name)
function addEngine()
{
    if ((typeof window.sidebar == "object") && (typeof window.sidebar.addSearchEngine == "function"))
    {
        window.sidebar.addSearchEngine(
            "http://www.mozilla.org/projects/search/mozilla.src",  /* engine URL */
            "http://www.mozilla.org/projects/search/mozilla.gif",  /* icon URL */
            "mozilla.org",                                         /* engine name */
            "Web" );                                               /* category name */
    }
    else
    {
        alert("Mozilla M15 or later is required to add a search engine.");
    }
}
Notes:
  • search engine filename extensions must be ".src"
  • search icon filename extensions must be one of the following: ".gif", ".jpg", ".jpeg", or ".png"
  • when using the window.sidebar.addSearchEngine() function, the basename of the engine URL and the icon URL (in this example, "mozilla.src" and "mozilla.gif") must exactly match, except for the filename extension
  • the "engine name" parameter is only used for display in the confirmation dialog (which the user must agree to)
  • if the "category name" doesn't exist, it will be ignored (Note: this behavior may change so that the category is instead created)
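The naming rules in the notes above can be checked before calling window.sidebar.addSearchEngine(). The following is a minimal sketch, not part of Mozilla: the helper names splitFilename and checkEngineUrls are hypothetical, and comparing extensions case-sensitively is a conservative assumption.

```javascript
// Hypothetical helpers (not part of Mozilla) that check the naming rules
// listed in the notes above.

// Split the last path segment of a URL into { base, ext }, e.g.
// ".../mozilla.src" -> { base: "mozilla", ext: ".src" }.
function splitFilename(url) {
    var path = url.split(/[?#]/)[0];    // drop any query string or fragment
    var name = path.substring(path.lastIndexOf("/") + 1);
    var dot = name.lastIndexOf(".");
    if (dot <= 0) {
        return { base: name, ext: "" };
    }
    return { base: name.substring(0, dot), ext: name.substring(dot) };
}

// Returns a list of problems; an empty list means the URLs follow the rules.
// ASSUMPTION: extensions are compared case-sensitively.
function checkEngineUrls(engineUrl, iconUrl) {
    var problems = [];
    var engine = splitFilename(engineUrl);
    var icon = splitFilename(iconUrl);
    var iconExts = [".gif", ".jpg", ".jpeg", ".png"];

    if (engine.ext !== ".src") {
        problems.push("engine URL must end in .src");
    }
    if (iconExts.indexOf(icon.ext) === -1) {
        problems.push("icon URL must end in .gif, .jpg, .jpeg, or .png");
    }
    if (engine.base !== icon.base) {
        problems.push("engine and icon basenames must match exactly");
    }
    return problems;
}
```

A page could call checkEngineUrls() with its engine and icon URLs and only proceed to window.sidebar.addSearchEngine() when the returned list is empty.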

Implementation - How it Works


You should become familiar with Apple's technote on Sherlock files.  Mozilla currently supports version 1.3 of the "Sherlock 2" specification with the exception of "LDAP" support.
 
 

Mozilla Extensions - additions to the Sherlock specification

Below is an example of a search file for Netscape's Netcenter search engine at "http://search.netscape.com/".
 
 
# Netscape Search - Sherlock Plug-in

<SEARCH
    name="Netscape"
    description = "Netscape Search"
    method="GET"
    action="http://search.netscape.com/cgi-bin/search"
    update="http://www.someurl.com/test.src.hqx"
    updateCheckDays="3"
>

<INPUT NAME="search" user>

<INTERPRET
    browserResultType = "category"
    bannerStart="<!--AD_TAG VENDOR=AOL WIDTH=468 HEIGHT=60-->"
    bannerEnd="<!--/AD_TAG-->"
    resultListStart = "Groups of reviewed web sites related to your search term."
    resultListEnd = "Web sites reviewed and categorized by a team of editors."
    resultItemStart = "<LI>"
    >

<INTERPRET
    browserResultType = "result"
    bannerStart="<!--AD_TAG VENDOR=AOL WIDTH=468 HEIGHT=60-->"
    bannerEnd="<!--/AD_TAG-->"
    resultListStart = "Web sites reviewed and categorized by a team of editors."
    resultItemStart = "<LI>"
    >

</SEARCH>
<BROWSER
        alsomatch="http://search.netscape.com/search.tmpl"
        update="http://www.someurl.com/test.src"
        updateIcon="http://www.someurl.com/test.gif"
        updateCheckDays="3"
>

This is an excellent example as it demonstrates usage of multiple <INTERPRET> tags (newly allowed by the "Sherlock 2" specification) as well as Mozilla extensions to the Sherlock specification: the "browserResultType" attributes and the <BROWSER> section (shown in bold red on the original page).

In every <INTERPRET> section, a "browserResultType" attribute has been added which can be used to specify the "type" of search result that is being used. There are currently two defined types: "category" and "result" (the default type, if unspecified).

"category" types will be displayed in the "Search" sidebar panel with a folder icon, while "result" types will be displayed in the "Search" sidebar panel with the branding icon/image for the search engine.

The <BROWSER> section allows an "alsomatch" attribute to be specified.  Along with the "action" attribute in the <SEARCH> section, its URLs are matched against the HTML page currently displayed in the content area of the browser window to determine whether that page is a search result; if so, the Sherlock file is used to display abbreviated search results in the sidebar "Search" panel.  The "alsomatch" attribute comes in handy if, for example, the URL in the "action" attribute is a re-direct.
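The matching step can be sketched as follows. This is an illustration only, not Mozilla's implementation: the function name matchesEngine and the shape of the "engine" object are hypothetical, and the use of a prefix comparison (so that query strings appended to the "action" URL still match) is an assumption.

```javascript
// Hypothetical sketch: decide whether a page URL belongs to a search engine.
// "engine" is an illustrative object holding the "action" attribute from
// <SEARCH> and the optional "alsomatch" attribute from <BROWSER>.
function matchesEngine(pageUrl, engine) {
    var candidates = [engine.action];
    if (engine.alsomatch) {
        // "alsomatch" may list several URLs, separated by single spaces.
        candidates = candidates.concat(engine.alsomatch.split(" "));
    }
    for (var i = 0; i < candidates.length; i++) {
        // ASSUMPTION: prefix comparison; the exact rule is not documented here.
        if (candidates[i] && pageUrl.indexOf(candidates[i]) === 0) {
            return true;
        }
    }
    return false;
}
```

With the Netscape example above, a results page at "http://search.netscape.com/search.tmpl" would match through "alsomatch" even though it differs from the "action" URL.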

New for Mozilla M17:

Also, the <BROWSER> section adds "update", "updateIcon" and "updateCheckDays" attributes. While the specification defines an "update" attribute in the <SEARCH> section, it unfortunately must refer to a Macintosh-specific, binhex'ed file (and the filename must end with ".src.hqx").  To be platform-agnostic, Mozilla instead allows the "update" attribute in the <BROWSER> section to specify a URL to the text-only, non-binhexed ".src" search file (the URL MUST end with ".src"), along with a reference to an image via the "updateIcon" attribute in the <BROWSER> section (the image must be in one of the supported formats). For consistency, Mozilla also checks for an "updateCheckDays" attribute in the <BROWSER> section, which specifies the interval, in days, between checks for changes to the search file.

If none of these attributes are specified in the <BROWSER> section, Mozilla WILL look at the "update" attribute in the <SEARCH> section and, if it ends with ".src.hqx", Mozilla will strip off the ".hqx" and use the result as the URL to check. So, for example, if <SEARCH update="http://www.someurl.com/test.src.hqx"> is specified, Mozilla will check the URL "http://www.someurl.com/test.src" (notice that the ".hqx" is removed).  This allows a web site to provide BOTH a binhex'ed file and a text-only file, so that both the Macintosh and Mozilla can use the search file(s).  [Note: In that case, it is the responsibility of the web site to keep both files in sync.]
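The fallback described above can be sketched as a small function. This is a hedged illustration rather than Mozilla's actual code: resolveUpdateUrl and its two argument objects are hypothetical, and returning null when some <BROWSER> update attributes are present but no valid "update" URL is given is an assumption, since the text does not cover that case.

```javascript
// Hypothetical sketch of the update-URL fallback.
// "search" and "browser" are illustrative objects holding the attributes of
// the <SEARCH> and <BROWSER> sections; a missing attribute is undefined.
function resolveUpdateUrl(search, browser) {
    var hasBrowserAttrs = browser.update !== undefined ||
        browser.updateIcon !== undefined ||
        browser.updateCheckDays !== undefined;

    if (hasBrowserAttrs) {
        // The <BROWSER> "update" URL must end with ".src".
        if (browser.update && /\.src$/.test(browser.update)) {
            return browser.update;
        }
        return null;    // ASSUMPTION: no fallback once <BROWSER> attributes are used
    }

    // Fall back to <SEARCH update="...src.hqx"> with the ".hqx" stripped.
    var suffix = ".src.hqx";
    var url = search.update;
    if (url && url.length > suffix.length &&
            url.lastIndexOf(suffix) === url.length - suffix.length) {
        return url.substring(0, url.length - ".hqx".length);
    }
    return null;
}
```

For the example in the text, a <SEARCH> "update" value of "http://www.someurl.com/test.src.hqx" with an empty <BROWSER> section resolves to the ".src" URL.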
 
 

Tracking


Search engines often want the ability to "track" searches through their service.  One way to accomplish this is to track the number of hits to the URL specified in the "action" attribute in the <SEARCH> section.  If necessary, that URL can even point to an HTTP re-direct page, as long as the "alsomatch" attribute in the <BROWSER> section contains the "final" destination URL [so that if a user goes directly to the "final" URL (for example, by typing in the URL directly or by clicking on a link that takes them there), the Sherlock file in question will still "match" and the search sidebar panel can display an abbreviated list of results for the user's convenience].

In the above example of the search file for Netscape's Netcenter search engine at "http://search.netscape.com/", you'll notice that the "action" attribute is action="http://search.netscape.com/cgi-bin/search".  To use the HTTP re-direct mechanism for page counting (while still having a distinct count of the number of hits made directly to that URL), change the "action" attribute to an HTTP redirect page such as action="http://info.netscape.com/fwd/sidb_ns/http://search.netscape.com/cgi-bin/search" and modify the "alsomatch" attribute to contain the final destination URL, i.e. alsomatch="http://search.netscape.com/cgi-bin/search"

Note: The "alsomatch" attribute in the <BROWSER> section may contain several URLs; separate each with a single space. This comes in handy for complex search engines in which multiple URLs resolve to a final distinct URL.

In the multi-engine search scenario, due to privacy issues it is not possible for a search engine to discover information sent to any other search engine. For example, if a user chooses to do a multi-engine search using search engines "A" and "B", engine "B" cannot discover that the user also searched engine "A" at the same time. However, it is possible to differentiate a single-engine search from a multi-engine search, because the HTTP request sent to each server in a multi-engine search carries an additional HTTP header of "MultiSearch: true".
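On the server side, detecting that header might look like the sketch below. The function name isMultiEngineSearch and the plain-object representation of request headers are hypothetical; treating the header value case-insensitively is also an assumption.

```javascript
// Hypothetical server-side helper. "headers" is a plain object mapping
// request header names to values, however the server exposes them.
function isMultiEngineSearch(headers) {
    for (var name in headers) {
        // HTTP header names are case-insensitive, so compare in lower case.
        if (Object.prototype.hasOwnProperty.call(headers, name) &&
                name.toLowerCase() === "multisearch") {
            // ASSUMPTION: the value is compared case-insensitively as well.
            return String(headers[name]).trim().toLowerCase() === "true";
        }
    }
    return false;
}
```

A search service could use the result to count single-engine and multi-engine searches separately.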
 

Common Problems - what can go wrong and how to fix it

 
Q:  I use the search engine at "http://somerandomwebsite.com/" but its search results never show up in Mozilla's sidebar "Search" panel.
A:  Mozilla determines when to show the abbreviated results in the "Search" sidebar panel by matching the current URL being displayed in the browser window against the "action" attribute in the <SEARCH> section of all installed Sherlock files.  (For the example above, it is "http://search.netscape.com/cgi-bin/search".)  Basically, the installed Sherlock files are being used as a list of well-known search engines.  You'll need to find or create a Sherlock file for the search engine at "http://somerandomwebsite.com/" to get it to work.

Better yet, send e-mail to the maintainers of "http://somerandomwebsite.com/" and ask them to create their own Sherlock file and use the window.sidebar.addSearchEngine() JavaScript function (see above) so that when users visit the site, they can easily use that search engine with Mozilla's sidebar "Search" panel.

Q:  Where should I install new Sherlock files?
A:  (For Mozilla M15 builds) Inside of the directory where Mozilla is installed, look for "res/rdf/datasets/" and place the Sherlock files (in text format) there.  You can add branding images as well... just ensure that the image filename exactly matches the engine filename, except for the extension which must be one of the following: ".gif", ".jpg", ".jpeg", or ".png"

Note: The location of this directory may change.

Q:  I found some Sherlock files on the web, but they don't seem to work with Mozilla.
A:  Sherlock files are often "binhexed" before being made available for download via the web.  You'll need to "unbinhex" these files. The Sherlock files you install into Mozilla must end in ".src" and must be text-only files before installing them in Mozilla.

For Macintosh users: it's easy, as various compression utilities handle "Binhex" files.
For Windows/Unix platforms: you'll have to either find a utility which understands "Binhex"ed files or locate a text-only version of the file.
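A quick way to tell whether a downloaded file is still binhexed is to look at its contents. The sketch below is illustrative only: classifySearchFile is a hypothetical name, the check relies on the banner line that BinHex 4.0 files conventionally start with, and matching the <SEARCH> tag case-insensitively is an assumption.

```javascript
// Hypothetical helper: classify the contents of a downloaded search file.
// Returns "binhex", "text", or "unknown".
function classifySearchFile(text) {
    // BinHex 4.0 files conventionally begin with this banner line.
    if (text.indexOf("(This file must be converted with BinHex") !== -1) {
        return "binhex";
    }
    // A plain-text Sherlock file contains a <SEARCH ...> section.
    // ASSUMPTION: the tag name is matched case-insensitively.
    if (/<SEARCH[\s>]/i.test(text)) {
        return "text";
    }
    return "unknown";
}
```

Only files classified as "text" are candidates for installing into Mozilla as ".src" files.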

Q:  I added a Sherlock file for "http://somerandomwebsite.com/" but Mozilla still doesn't show search results in the sidebar when I search the site.
A:  Examine the Sherlock file and make sure that the "action" attribute in the <SEARCH> section exactly matches the public URL that is being used when searching.

Some Sherlock files actually use private URLs which return search results in a different format than what you get when searching the site via the browser. These types of Sherlock files won't work with Mozilla's sidebar "Search" panel.

Q:  I'm using the window.sidebar.addSearchEngine() JavaScript function, but it seems to fail to install the Sherlock file.
A:  Make sure that the engine & icon URLs are correct.

Note: Mozilla currently only supports downloading of the engine and icon files via the "http:" protocol.

Q:  What happens if a search engine changes the HTML they send?  Do Sherlock files become out of date?
A:  If a search engine changes the layout/format of the HTML they send, the Sherlock file needs to be updated.  Sherlock files can specify an "update" attribute in the <SEARCH> section (along with an "updateCheckDays" attribute); the "update" URL is checked for a newer version of the file.

Note: Mozilla M15 builds don't have support for these "update" attributes. Support was added for the Mozilla M17 release.
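The role of "updateCheckDays" can be illustrated with a small scheduling check. This is a sketch under stated assumptions, not Mozilla's implementation: isUpdateCheckDue and its parameters are hypothetical, and skipping the check when the attribute is missing or invalid is an assumption.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical sketch: has enough time passed to re-check the search file?
// lastCheckedMs and nowMs are timestamps in milliseconds (e.g. Date.now()).
function isUpdateCheckDue(lastCheckedMs, updateCheckDays, nowMs) {
    // Attribute values arrive as strings, e.g. "3".
    var days = parseInt(updateCheckDays, 10);
    if (isNaN(days) || days <= 0) {
        return false;   // ASSUMPTION: no valid interval means no automatic check
    }
    return nowMs - lastCheckedMs >= days * MS_PER_DAY;
}
```

With updateCheckDays="3" as in the example file, a check would be due once three full days have passed since the last one.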

Q: Why the <BROWSER> "update", "updateIcon" and "updateCheckDays" extensions?
A: Mozilla does not have intrinsic support for Macintosh's BinHex format due to also supporting many other platforms. While it would be technically possible to automatically unbinhex the file and only use the text part, doing so would still not provide support for branding icons.

As mentioned above, if none of these <BROWSER> attributes are specified, Mozilla will take the "update" attribute in the <SEARCH> section, remove the trailing ".hqx" from the URL and use that. This allows a web site to provide BOTH a binhex'ed file and a text-only file, so that both the Macintosh and Mozilla can use the search file(s).

[Note: By doing this, it is the responsibility of the web site to keep both files in sync.]

The recommendation is that both a <SEARCH> "update" attribute (which must refer to a Macintosh BinHex'ed file) as well as <BROWSER> "update" (to a basic text file) / "updateIcon" / "updateCheckDays" attributes are specified, so that both the Macintosh as well as other platforms will be happy.


 

Additional Reference(s):

Apple Developer Technote 1141: Extending and Controlling Sherlock - http://developer.apple.com/technotes/tn/tn1141.html