Apache Server configuration support - robots.txt


Author robots.txt
John Taylor-Johnston

2005-01-23, 2:52 am

Both URLs use /var/www/html. I want /var/www/html/robots.txt to apply to ccl.flsh but not to compcanlit. Is this doable? Should I put it elsewhere and create an alias or a redirect?

<VirtualHost *>
ServerName ccl.flsh.usherbrooke.ca
DocumentRoot /var/www/html
</VirtualHost>
<VirtualHost *>
ServerName compcanlit.usherbrooke.ca
DocumentRoot /var/www/html
</VirtualHost>

--
John Taylor-Johnston
-----------------------------------------------------------------------------
°v° Bibliography of Comparative Studies in Canadian, Québec and Foreign Literatures
/(_)\ Université de Sherbrooke
^ ^ http://compcanlit.ca/


Fred Atkinson

2005-01-23, 2:52 am

If I understand you correctly, this is what I think you want
to do. You want to have two domains going to the same subdirectory on
your Webserver. You want to have two separate robots.txt files, one
for each domain (http://www.domain1.com and http://www.domain2.com).

Well, you can't do it. The robots.txt file for each domain
must be in the main directory for the site. There can't be two
robots.txt files in the same directory.

What you can do is have one file that blocks each subdirectory
separately. Example: http://www.domain1.com/myfirstpage and
http://www.domain2.com/mysecondpage.

Make these entries in your robots.txt file and you will
prevent both pages from being scanned by the spiders:

User-agent: *
Disallow: /myfirstpage/
Disallow: /mysecondpage/

If I've misunderstood you, please understand it's late and I'm
a little tired.

If this helps you, then I'm glad.

There is a lot of information about robots.txt at
http://www.robotstxt.org.

Regards,


Fred

John Taylor-Johnston

2005-01-23, 8:48 pm

Hi,

> If I understand you correctly, ... , you can't do it. The robots.txt file for each domain
> must be in the main directory for the site.


Maybe I was not clear enough?

Would an Alias not work if I placed robots.txt in /var/www/elsewhere? Please see the example below.
My thinking was that if robots.txt does not reside in /var/www/html, compcanlit crawlers will not see it or be affected by it, but ccl.flsh crawlers will be forced to obey it.

<VirtualHost *>
ServerName ccl.flsh.usherbrooke.ca
DocumentRoot /var/www/html
Alias /robots.txt /var/www/elsewhere/robots.txt
</VirtualHost>
<VirtualHost *>
ServerName compcanlit.usherbrooke.ca
DocumentRoot /var/www/html
</VirtualHost>

(I know how to program robots.txt pretty much.)

John

Fred Atkinson

2005-01-23, 8:48 pm

I must confess that I don't understand why you are doing this
or what you intend to accomplish.

Sorry if I misunderstood you before.


Fred

D. Stussy

2005-01-24, 2:52 am

On Sun, 23 Jan 2005, John Taylor-Johnston wrote:
> Hi,
>
> Maybe I was not clear enough?
>
> Would an Alias not work if I placed robots.txt in /var/www/elsewhere? Please see the example below.
> My thinking was that if robots.txt does not reside in /var/www/html, compcanlit crawlers will not see it or be affected by it, but ccl.flsh crawlers will be forced to obey it.
>
> <VirtualHost *>
> ServerName ccl.flsh.usherbrooke.ca
> DocumentRoot /var/www/html
> Alias /robots.txt /var/www/elsewhere/robots.txt
> </VirtualHost>
> <VirtualHost *>
> ServerName compcanlit.usherbrooke.ca
> DocumentRoot /var/www/html
> </VirtualHost>
>
> (I know how to program robots.txt pretty much.)


That could work.
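
As a rough sketch of that Alias setup, keeping /var/www/elsewhere as the placeholder location from the quoted example: because the file now lives outside the DocumentRoot, the first virtual host may also need a <Directory> block if the server denies access there by default. The access directives below are the Apache 2.0/2.2 style and are an assumption about the setup; whether they are needed at all depends on the rest of the configuration.

```apache
<VirtualHost *>
ServerName ccl.flsh.usherbrooke.ca
DocumentRoot /var/www/html
# Serve robots.txt for this host only, from outside the shared DocumentRoot
Alias /robots.txt /var/www/elsewhere/robots.txt
# Assumption: access outside DocumentRoot is denied by default, so allow it here
<Directory /var/www/elsewhere>
Order allow,deny
Allow from all
# On Apache 2.4 and later, "Require all granted" replaces the two lines above
</Directory>
</VirtualHost>
<VirtualHost *>
ServerName compcanlit.usherbrooke.ca
DocumentRoot /var/www/html
# No Alias here: /robots.txt maps to /var/www/html/robots.txt, which must not exist
</VirtualHost>
```

With no robots.txt in /var/www/html, a request for /robots.txt on compcanlit should return a 404, which crawlers generally treat as "no restrictions".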

You could also make your robots.txt a dynamic file (i.e. the output of a CGI
script) if you know how to program it directly, and redirect requests for it to
a CGI-type file such as a PHP script.
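
One possible way to wire up the dynamic variant, as a sketch only, is to map /robots.txt to a script with ScriptAlias instead of redirecting. The script path /var/www/cgi-bin/robots.cgi is a made-up name for illustration, and the example assumes mod_alias and mod_cgi are loaded.

```apache
<VirtualHost *>
ServerName ccl.flsh.usherbrooke.ca
DocumentRoot /var/www/html
# Hypothetical script: its output is returned whenever /robots.txt is requested
ScriptAlias /robots.txt /var/www/cgi-bin/robots.cgi
</VirtualHost>
```

The script would need to send a "Content-Type: text/plain" header before the rules. If the same ScriptAlias line were placed in both virtual hosts, the script could check the HTTP_HOST environment variable and print different rules for each hostname.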

However, I don't really see the point. Many if not most systems will only
disallow the cgi-bin directory(-ies) and perhaps some internals (e.g. the error
pages).

Copyright 2003 - 2008 webservertalk.com