Web Servers on Unix and Linux - all HTTP_USER_AGENTs with URLs are robots?


jidanni@jidanni.org

2008-01-03, 1:32 am

Gentlemen, to keep out robots that disobey robots.txt, I shall put this into .htaccess:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{QUERY_STRING} &
RewriteCond %{HTTP_USER_AGENT} http://.*\.com
RewriteRule . - [F]
</IfModule>
because all HTTP_USER_AGENTs that have a URL in them are robots, no?
Yes, not all robots have a URL in HTTP_USER_AGENT, but at least I need
not maintain goofy lists or scripts while still stopping the big ones.

The only problem: might there be a browser or plug-in out there
that puts a URL in HTTP_USER_AGENT?
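
For reference, here is a commented sketch of how the rule above is evaluated. The first condition and the rule are unchanged. The second condition is a hypothetical variant, not the original pattern: since http://.*\.com only matches agent strings whose URL contains ".com", a robot advertising a .net or .org address would slip through, so this version assumes any http:// or https:// URL should count.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# Consecutive RewriteCond lines are ANDed: both must match for the rule to fire.
# Condition 1 (unchanged): the query string contains an ampersand,
# which usually means the request carries two or more parameters.
RewriteCond %{QUERY_STRING} &
# Condition 2 (assumed variant, not the original pattern): any http:// or
# https:// URL in the agent string, matched case-insensitively via [NC].
RewriteCond %{HTTP_USER_AGENT} https?:// [NC]
# "-" leaves the URL untouched; [F] answers with 403 Forbidden.
RewriteRule . - [F]
</IfModule>
```

Widening the pattern also widens the risk raised in the question: any legitimate browser or plug-in that puts a URL in its agent string would be refused too, so this is a sketch to test against real logs rather than a drop-in fix.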

Klaus Johannes Rusch

2008-01-03, 7:18 pm

jidanni@jidanni.org wrote:

> The only problem is might there be a browser or its plug-in out there
> that has a URL in HTTP_USER_AGENT?


http://www.zytrax.com/tech/web/browser_ids.htm lists some browsers that
also have http:// in the agent string, though they are not exactly the
most popular browsers.

Plugins and firewalls which rewrite header fields are two other possible
sources of problems with this approach.

--
Klaus Johannes Rusch
KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/

Copyright 2003 - 2008 webservertalk.com