Well, since I beat Slashdot to the punch with two of my last three posts, I feel no pangs of redundancy in quoting and linking a Slashdot article now.
"Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."
Slashdot: White House Website Limits Iraq-related crawling.
The geekword-filter: a robots.txt file is a text file placed on a website by its administrator. It is not something a visitor to the site normally sees. But when a search engine "visits" the site to index its content, it first checks the robots.txt file to determine what it should and shouldn't look at.
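For the curious, here is a rough sketch of that check, using the robots.txt parser that ships with Python's standard library. The rules, the example.com address and the "ExampleBot" crawler name are all made up for illustration; they are not the White House's actual file. Keep in mind that robots.txt is only a request, and a crawler honours it voluntarily.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /errors/
Disallow: /private/
"""


def allowed(path, user_agent="ExampleBot"):
    """Return True if a well-behaved crawler may fetch the given path."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would download
    # http://example.com/robots.txt first and feed it in the same way.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "http://example.com" + path)


print(allowed("/index.html"))       # not covered by any Disallow rule
print(allowed("/errors/404.html"))  # falls under Disallow: /errors/
```

Anything under a Disallow line is skipped by a crawler that follows the rules; everything else gets indexed as usual.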
I use an equivalent of the robots.txt file to keep Google from indexing this site's error pages, because nobody in their right mind would be searching for my 404 page, I figure. But using robots.txt as a method of restricting public access to public information is more than a little sinister.
Joseph | 29 Oct 2003