clean up after yourself
Submitted by Paul Phillips <email@example.com> to Miscellaneous.
Back in 1996 we had a column at go2net called "404 Not Found" (because, the slogan went, nobody wants the 404 when looking for the 411). The only way in was to hit a 404 at go2net. We wrote a few funny screeds about brokenness on the web, and then the amusement value of having a column on the 404 page wore off. The slogan never became funny.
The non-obvious moral of the story is that 404s get tiresome, and there isn't any compelling reason why lots of people should hit the same 404s. When I follow a broken link, the knowledge that the link is broken is suddenly in the hands of two pieces of software, the web server and my browser. For the two of us to both continue with our lives as if nothing had happened is so gauche.
There is an HTML tag for specifying the owner/creator of a page: it looks like <LINK REV="made" HREF="mailto:foo@bar">. Given this, the browser could offer to email the page owner in response to a 404. This isn't much of a solution because too many humans are still in the loop, but it's a start.
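As a rough sketch of what the browser side might do, here is one way to pull the owner's address out of a page and draft the report. It assumes the page being inspected is the one that carried the broken link. The names `find_page_owner` and `draft_report`, and the wording of the message, are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import quote, urlencode


class OwnerLinkParser(HTMLParser):
    """Remembers the href of the first <link rev="made"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.owner_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link" or self.owner_href is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("rev", "").lower() == "made" and attr.get("href"):
            self.owner_href = attr["href"]


def find_page_owner(html: str) -> Optional[str]:
    """Return the mail address from <link rev="made">, or None if absent."""
    parser = OwnerLinkParser()
    parser.feed(html)
    href = parser.owner_href
    if href and href.lower().startswith("mailto:"):
        return href[len("mailto:"):]
    return None


def draft_report(owner: str, broken_url: str, referrer: str) -> str:
    """Build a mailto: URL the browser could offer to open (hypothetical wording)."""
    query = urlencode(
        {
            "subject": "Broken link on " + referrer,
            "body": "The link to " + broken_url + " returned a 404.",
        },
        quote_via=quote,
    )
    return "mailto:" + owner + "?" + query
```

The browser would still have to ask before sending anything, so the human stays in the loop, as noted above.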
To illustrate the dangers of a naive approach to this problem, I've seen some web servers that will automatically email the webmaster at the referring site when they are hit by a 404. While this may sound like a great idea on the surface, consider the plight we are in as operators of the MetaCrawler, where we're dynamically generating links based on external information. We can't tell if they're 404s and we can't fix them if they are. Whiny email that our links are broken is unwelcome.
The problem would be less severe if we had better tools for cleaning up web sites. Take a look at services like the Web Site Garage and consider which of these features would make sense embedded in the browser, and which would make sense as external tools. I know a ton of you have home grown tools for performing web site
I want a tool that I can point at my home page tree, and have it tell me which links are dead, and how likely they are to be permanently dead (it would know this by noting that they were "likely dead" last week, too.)
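The "likely dead last week, too" part is mostly bookkeeping. Here is a minimal sketch, assuming the list of links has already been extracted from the page tree and that the tool runs about once a week. The state file format, the function names and the threshold of two failed runs are all my own assumptions.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path


def probe(url, timeout=10.0):
    """Return True if the URL answers without an error status.

    Uses a HEAD request; some servers reject HEAD, so a real tool
    would probably fall back to GET before calling a link dead.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False


def load_streaks(path):
    """Read {url: consecutive failed checks} from a JSON file, if present."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}


def save_streaks(path, streaks):
    Path(path).write_text(json.dumps(streaks, indent=2))


def update_streaks(urls, streaks, is_alive=probe):
    """Return the new failure streaks; links that answer drop out."""
    updated = {}
    for url in urls:
        if not is_alive(url):
            updated[url] = streaks.get(url, 0) + 1
    return updated


def report(streaks, threshold=2):
    """One line per dead link, flagging those dead on repeated runs."""
    lines = []
    for url, count in sorted(streaks.items()):
        if count >= threshold:
            verdict = "probably permanently dead"
        else:
            verdict = "dead on this run"
        lines.append(f"{url}: {verdict} ({count} consecutive failed checks)")
    return lines
```

A weekly run would call `load_streaks`, then `update_streaks` on the current link list, print `report`, and finish with `save_streaks` so next week's run can compare.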
I want part of the generated report to tell me which ones have changed since I last looked at them (by cross-referencing against my history.db file), and how often they change (which it would know by noting whether they had changed last week, too).
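Change tracking could work along these lines. The sketch fingerprints each page body, remembers when the fingerprint last differed, and compares that with the time of my last visit. It does not read history.db itself: the `last_visit` argument is a stand-in for whatever timestamp that lookup would yield, and the state layout and function names are hypothetical.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Content hash used to decide whether a page changed."""
    return hashlib.sha256(body).hexdigest()


def record_check(state, url, body, checked_at):
    """Store this check in state; return True if the page changed.

    state maps url -> {"hash", "checks", "changes", "last_changed"}.
    The first sighting of a URL is a baseline, not a change.
    """
    entry = state.setdefault(
        url, {"hash": None, "checks": 0, "changes": 0, "last_changed": None}
    )
    digest = fingerprint(body)
    had_baseline = entry["hash"] is not None
    changed = had_baseline and digest != entry["hash"]
    if had_baseline:
        entry["checks"] += 1
    if changed:
        entry["changes"] += 1
        entry["last_changed"] = checked_at
    entry["hash"] = digest
    return changed


def changed_since_visit(state, url, last_visit):
    """True if a change was recorded after the given visit time."""
    last_changed = state.get(url, {}).get("last_changed")
    return last_changed is not None and last_changed > last_visit


def change_rate(state, url):
    """Fraction of comparisons in which the page differed, or None."""
    entry = state.get(url)
    if not entry or entry["checks"] == 0:
        return None
    return entry["changes"] / entry["checks"]
```

Hashing the whole body will flag pages that only rotate a banner or a timestamp, so a real tool would likely want to strip some of that noise first.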
I also want this tool to notice HTTP Location headers, and redirects, and attempt to parse "404" pages in an effort to tell me where the page has moved to. (It doesn't have to be 100% right -- I'd just like the output log to make a guess when it can, to save me from having to figure it out by hand.)
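The guessing could start as simply as the sketch below. It takes a response that has already been fetched (status, headers, body) and returns its best guess at the new address. The heuristics for 404 pages are assumptions of mine and will be wrong some of the time: a meta refresh target, or failing that, the first link that follows the word "moved".

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


class MovedHintParser(HTMLParser):
    """Looks for a meta refresh target or a link after the word 'moved'."""

    def __init__(self):
        super().__init__()
        self.refresh_target = None
        self.anchor_after_moved = None
        self._seen_moved = False

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("http-equiv", "").lower() == "refresh":
            match = re.search(
                r"url\s*=\s*['\"]?([^'\";]+)", attr.get("content", ""), re.I
            )
            if match and self.refresh_target is None:
                self.refresh_target = match.group(1).strip()
        elif tag == "a" and self._seen_moved and self.anchor_after_moved is None:
            if attr.get("href"):
                self.anchor_after_moved = attr["href"]

    def handle_data(self, data):
        if "moved" in data.lower():
            self._seen_moved = True


def guess_new_location(status, headers, body, base_url):
    """Return a guessed new URL for the page, or None if there is no hint."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if status in REDIRECT_CODES and lowered.get("location"):
        # Location may be relative, so resolve it against the old URL.
        return urljoin(base_url, lowered["location"])
    if status == 404:
        parser = MovedHintParser()
        parser.feed(body)
        target = parser.refresh_target or parser.anchor_after_moved
        if target:
            return urljoin(base_url, target)
    return None
```

Note that `urllib.request.urlopen` follows redirects on its own, so a tool built on it would need to turn that off (or use a lower-level client) to see the redirect status and the Location header at all.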
If we had anything like true collaborative browsing we could share 404 information as well as all kinds of other tidbits regarding our web lives, but that's another sky.