You are currently viewing a snapshot of taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to, please file a bug.

blue sky: miscellaneous

collaborative bookmark indexing
May 16th
Submitted by Michael Bayne to Miscellaneous.

Recently, a new web indexing service by the name of Google was developed at Stanford University. It is based on the debatably innovative idea of determining a document's significance from the significance of the documents that link to it.

If you have the time, feel free to jump down and read a motivational episode about how much of a pain in the neck it was to locate the Google site when I didn't have my bookmark to it handy. If you don't have the time, you can just keep reading with no loss of continuity.

The idea is that Google indexes web pages that are referenced by other "important" pages, so that when you search, you are searching through what ends up being the useful subset of all the crap available on the Internet. However, the people who dictate this importance are either those who run major web sites (they hold clout because a bunch of people link to them, and whatever they link to becomes important by association) or the slightly less than teeming masses of people who put up personal web pages that link to all the stuff they think is cool.
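To make the link-based idea concrete, here is a minimal sketch of scoring pages by the scores of the pages that link to them. It is not Google's actual algorithm, just a simplified iteration in the same spirit; the function name `link_significance`, the damping value of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def link_significance(links, damping=0.85, iterations=50):
    """Score pages by the scores of the pages linking to them.

    links: dict mapping a page name to the list of pages it links to.
    damping and iterations are illustrative assumptions, not tuned values.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally significant.
    score = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score
    return score


if __name__ == "__main__":
    example = {
        "portal": ["a", "b"],   # a major site linking to two pages
        "fan_page": ["a"],      # a personal page linking to one of them
    }
    for page, value in sorted(link_significance(example).items()):
        print(page, round(value, 3))
```

In this toy example page "a" comes out ahead of page "b" because two pages link to it, which is exactly the behavior the paragraph above describes: importance flows from whoever happens to publish links.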

Frankly, I find that neither of these groups suits my needs. What I would really find useful is to collect those significance metrics from the millions of reasonable individuals out there who don't happen to have a home page full of links, but who have located and evaluated a few of the myriad resources and bookmarked what they found to be useful.

I envision a system whereby bookmarks are periodically submitted to a central server (anonymously, of course). Those bookmarks are then used to seed a web index that full-text indexes the contents of the bookmarked pages and perhaps "nearby" pages. The index could be searched via the normal mechanisms of word count and word proximity, and the results would be ranked, in addition to those factors, by their bookmark popularity count.
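A rough sketch of the server side might look like the following. Everything here is an assumption made for illustration: the function names `build_index` and `search`, the in-memory `page_text` dictionary standing in for fetched page contents, and the scoring formula (word count multiplied by bookmark count). Word proximity and "nearby" pages are left out to keep the sketch short.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(submissions, page_text):
    """Build a word-count index seeded by submitted bookmarks.

    submissions: list of URL lists, one list per anonymous contributor.
    page_text:   dict of URL -> page text (hypothetical stand-in for a crawler).
    """
    popularity = Counter()
    for bookmarks in submissions:
        # Count each contributor at most once per URL.
        popularity.update(set(bookmarks))

    # Only bookmarked pages are indexed.
    index = {url: Counter(tokenize(page_text.get(url, ""))) for url in popularity}
    return index, popularity


def search(query, index, popularity):
    """Return matching URLs, best first.

    Score = (occurrences of the query words) * (bookmark count).
    This formula is an arbitrary illustrative choice.
    """
    terms = tokenize(query)
    scored = []
    for url, counts in index.items():
        word_score = sum(counts[term] for term in terms)
        if word_score > 0:
            scored.append((word_score * popularity[url], url))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]


if __name__ == "__main__":
    submissions = [["u1", "u2"], ["u1"], ["u1", "u3"]]
    page_text = {
        "u1": "amiga news",
        "u2": "amiga amiga hardware",
        "u3": "stanford search",
    }
    index, popularity = build_index(submissions, page_text)
    print(search("amiga", index, popularity))
```

Here "u2" mentions the query word more often, but "u1" ranks first because three contributors bookmarked it. How heavily popularity should weigh against word statistics is an open design question that this sketch does not try to settle.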

How does this relate to Mozilla, you might be asking yourself? Well, if contributing to this bookmark index is not completely effort-free, people won't do it. So I'm calling for support built into the browser. Clearly it would take only a very small amount of code to ship the bookmarks off to a server (perhaps a little more if a network of anonymizing proxies were desired to ultimately prevent the tracing of bookmarks back to their owners). A simple installation-time dialog could then ask the user whether they'd like to participate in the index (with facilities in the preferences panel to allow them to change their mind).
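As a sketch of how small the browser-side piece could be, the function below prepares an anonymous submission from a list of bookmarks. It is written in Python purely for illustration and is not Mozilla code. The function name `build_submission`, the bookmark dictionary layout, and the JSON payload format are all hypothetical. Dropping titles, local `file:` bookmarks, embedded credentials, and fragments is my own assumption about what "anonymously" would require. Actually sending the payload (possibly through anonymizing proxies) is left out.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def build_submission(bookmarks, opted_in):
    """Prepare an anonymous bookmark payload, or None if the user opted out.

    bookmarks: list of dicts with at least a "url" key (hypothetical layout).
    opted_in:  the user's choice from the install-time dialog or preferences.
    """
    if not opted_in:
        return None

    urls = set()
    for bookmark in bookmarks:
        parts = urlsplit(bookmark["url"])
        # Skip local files and anything else that is not a public web page.
        if parts.scheme not in ("http", "https"):
            continue
        if not parts.hostname:
            continue
        # Rebuild the host without any "user:password@" prefix.
        host = parts.hostname
        if parts.port is not None:
            host = f"{host}:{parts.port}"
        # Drop the fragment; keep scheme, host, path, and query.
        urls.add(urlunsplit((parts.scheme, host, parts.path, parts.query, "")))

    # Only the URLs are sent: no titles, folders, or user identifiers.
    return json.dumps({"urls": sorted(urls)})


if __name__ == "__main__":
    bookmarks = [
        {"title": "Search", "url": "http://user:secret@Example.com/find?q=1#top"},
        {"title": "Notes", "url": "file:///home/me/notes.html"},
        {"title": "Search again", "url": "http://example.com/find?q=1"},
    ]
    print(build_submission(bookmarks, opted_in=True))
    print(build_submission(bookmarks, opted_in=False))
```

The opt-in check comes first so that nothing is assembled, let alone sent, for a user who declined to participate.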

Someone would certainly have to operate the web indexing service, and that would likely be a commercial operation. This would probably mean web advertising, since it would be audacious to turn around and charge those who contribute their bookmarks for the use of the service.

It's possible that something like this could operate on a smaller scale. If the server software were freely available, people with a vested interest in particular topics could set up smaller indexing services to which people contributed the subsets of their bookmarks relevant to the topic. My suspicion is that this would be more problem-prone than the all-or-nothing approach, but it remains a possibility.

In search of the Google
A tale of web searching

The name of the Google search engine had slipped my mind, and since I was not at work, where I had it bookmarked, all I had to go on was that it came from Stanford. So I had to go looking for it.

First I tried the MetaCrawler, which searches all the major search engines, and that miserably failed to turn up any trace (which is somewhat embarrassing considering that I wrote the commercial implementation of the MetaCrawler, but don't blame me, blame the underlying engines). Later I realized that I had been trying searches that were too specific, but that comes as no surprise because I'm not much better than the next guy when it comes to finding stuff on the web.

That having failed, I figured that it must be the recency of the thing that had prevented it from being indexed by the major search engines, so I decided to search on to see if maybe they did an article on it. No dice. In fact, I got sidetracked at this point because I noticed an article about plans for a new Amiga, and being a die-hard Amiga fan (as all Amiga fans are), I just had to read it.

Getting back to my search, I decided to head over to Stanford's web site. Bad idea; miserable failure. I ended up wading through mailing list archives where people were talking about proposed additions to robots.txt, hoping for a serendipitous reference to a Stanford-native web indexing project.

Then, by some stroke of inspiration, I remembered that it was called "Google", headed back to the MetaCrawler, and searched on that term alone. Lo and behold, it was the first result.

Hopefully the relation of this little searching experience will strike a chord in those of you who find yourselves having similar experiences on as regular a basis as I do. Hopefully we'll all realize that something is sorely wrong with both our collective abilities to manage links to resources on the Internet and, by extension, the facilities available for finding those resources.

Back up to the good stuff.