



blue sky: miscellaneous

cache: not the root of all evil
April 28th
Submitted by Paul Phillips <paulp@go2net.com> to Miscellaneous.

The browser cache isn't just a network optimization; it's a living history of the user's recent browsing. Create a full-text index of the HTML and plain-text contents of the browser cache and update it incrementally as the user browses. The index can reach back much further in time than the actual contents of the cache: when a search turns up a document that has since been evicted, the browser can simply retrieve it from the web once again.
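To make the indexing idea concrete, here is a minimal sketch of an incrementally updated inverted index. It is only an illustration, not a proposal for how Mozilla should implement it: the names `CacheIndex`, `extract_words`, `add`, and `search` are made up for this example, the tokenizer is deliberately crude, and everything lives in memory rather than on disk.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def extract_words(body, is_html=True):
    """Return the set of lowercase words in an HTML or plain-text body."""
    if is_html:
        parser = _TextExtractor()
        parser.feed(body)
        parser.close()
        body = " ".join(parser.chunks)
    return set(re.findall(r"[a-z0-9]+", body.lower()))


class CacheIndex:
    """Hypothetical in-memory index: word -> set of URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.words_by_url = {}

    def add(self, url, body, is_html=True):
        """Index a page as it enters the cache; re-adding a URL updates it."""
        new_words = extract_words(body, is_html)
        old_words = self.words_by_url.get(url, set())
        for word in old_words - new_words:
            self.postings[word].discard(url)
        for word in new_words - old_words:
            self.postings[word].add(url)
        self.words_by_url[url] = new_words

    def search(self, query):
        """Return the URLs whose text contains every word of the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return set()
        results = None
        for word in words:
            urls = self.postings.get(word, set())
            results = set(urls) if results is None else results & urls
        return results
```

Note that nothing here removes an entry when the cache evicts the page. That is the point: the index keeps the URL and its words, so a search can still find the document and the browser can fetch it again.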

It looks like glimpse's license is too restrictive for use in Mozilla development, though its authors may be willing to relax it for such a purpose. ht://Dig is another possibility despite being GPLed, as the author has indicated to me that he is open to new licenses.

Sometimes it's interesting to track a document's changes over time. Allow a user to mark a document for tracking, and then have the cache keep every revision instead of only the most recent version, using RCS or something similar. Periodically check whether these documents have changed (whether or not the user visits them) and record each new revision. Disk is cheap and getting cheaper, so don't be afraid of using storage, so long as you keep the user aware of how much you're using and make it easy to reclaim. It was also pointed out to me that the cache could easily be compressed with zlib, which would make this even less of an issue.
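A rough sketch of the revision tracking might look like the following. To be clear about what it is not: it does not use RCS or store deltas. Instead it keeps a full zlib-compressed snapshot of each distinct version and uses a hash to notice when a periodic check finds nothing new. The class `RevisionStore` and its methods are hypothetical names for this example, and the caller is assumed to supply the fetched body and a timestamp.

```python
import hashlib
import zlib


class RevisionStore:
    """Hypothetical store: url -> list of (timestamp, digest, compressed body)."""

    def __init__(self):
        self.revisions = {}

    def record(self, url, body, timestamp):
        """Store body (bytes) as a new revision if it differs from the latest.

        Returns True if a revision was added, False if nothing changed.
        """
        digest = hashlib.sha256(body).hexdigest()
        history = self.revisions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False
        history.append((timestamp, digest, zlib.compress(body)))
        return True

    def get(self, url, index=-1):
        """Return the decompressed body of one revision (latest by default)."""
        return zlib.decompress(self.revisions[url][index][2])

    def bytes_used(self):
        """Total compressed size, so the user can be told what tracking costs."""
        return sum(
            len(blob)
            for history in self.revisions.values()
            for _, _, blob in history
        )
```

The `bytes_used` figure is what you would show the user, next to a button that throws away old revisions.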

Used in combination, these features could allow you to search your cache for a document you barely remember and bring it up even though it's been gone from the web for months! And even this only scratches the surface. Consider the Internet Archive, which acts as a cache for the entire web (to the extent it can). When a request returns a 404 Not Found, or a cache refresh fails, ask Alexa to call up the most recently archived version. Alexa is free but ad-supported, so I'd wager that something could be worked out with Alexa Internet (since archiving the entire web obviously takes substantial resources, there's going to have to be some give and take here).
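The fallback itself is simple to sketch: try the live web, then the local cache, then an archive, and show the first copy that turns up. The code below assumes nothing about how any of those sources is actually reached. `NotFound`, `fetch_with_fallback`, and `dict_source` are invented for this example, and the dictionary-backed sources merely stand in for a real network fetch, the browser cache, and whatever interface an archive might offer.

```python
class NotFound(Exception):
    """Raised by a source that has no copy of the requested URL."""


def fetch_with_fallback(url, sources):
    """Try each (name, fetch) pair in order.

    Returns (name, body) from the first source that has the document,
    and raises NotFound only if every source comes up empty.
    """
    for name, fetch in sources:
        try:
            return name, fetch(url)
        except NotFound:
            continue
    raise NotFound(url)


def dict_source(pages):
    """Stand-in source backed by a dict, for illustration only."""

    def fetch(url):
        try:
            return pages[url]
        except KeyError:
            raise NotFound(url) from None

    return fetch


# The live site has lost the page, the local cache has evicted it,
# but the (pretend) archive still holds an old copy.
sources = [
    ("live web", dict_source({})),
    ("local cache", dict_source({})),
    ("archive", dict_source({"http://example.org/gone": b"archived copy"})),
]
origin, body = fetch_with_fallback("http://example.org/gone", sources)
```

Returning the name of the source along with the body matters: the user should be told that the page on screen is an archived copy rather than the live one.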

This could change the way people think about the web. Joe Bloggs doesn't want to hear about a 404, but maybe Joe can be accommodated more nicely than he has been thus far.