Missing robots.txt file


Joe Blow

I am experiencing difficulties with a supposedly missing "robots.txt"
file. I receive on average 5-6 notifications per day telling me
that the file was requested but was missing, when in fact it is there.

Our server logs indicate that the file is also being accessed
successfully, so I am wondering why I am still receiving notifications.

This morning so far:
* NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html;
(e-mail address removed))
* msnbot/0.11 (+http://search.msn.com/msnbot.htm)

Can someone shed some light on what is happening?

The web site is www.wiavic.org.au

Thanks,
 

Nik Coughin

Joe said:
I am experiencing difficulties with a supposedly missing "robots.txt"
file. I receive on average 5-6 notifications per day telling me
that the file was requested but was missing, when in fact it is there.

Our server logs indicate that the file is also being accessed
successfully, so I am wondering why I am still receiving
notifications.

Hate to answer a question with a question (or not answer it as the case may
be) but are spiders only supposed to look for robots.txt in the base
directory, or do they look for it at the entry point from which they start
crawling? Or do spiders check for a copy of robots.txt in every directory
that they crawl?
 
Joe Blow

The robots.txt file resides in the root directory of your server. The
file instructs the crawler which directories are accessible and which
are not.
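
To make the "root directory" point concrete, here is a minimal sketch of how a well-behaved crawler would work out where to ask for robots.txt, whatever page it starts from. The helper name `robots_url` and the sample page path are made up for illustration; only the host name comes from this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path
    # with /robots.txt, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical deep link on the site mentioned in the thread.
start_page = "http://www.wiavic.org.au/some/dir/page.html?x=1"
print(robots_url(start_page))  # http://www.wiavic.org.au/robots.txt
```

A crawler that instead asks for a copy inside a subdirectory would get a "not found" response even though the root copy exists, so the exact path in each notification is worth a look.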
 
Big Bill

Nik Coughin said:
Hate to answer a question with a question (or not answer it as the case may
be) but are spiders only supposed to look for robots.txt in the base
directory, or do they look for it at the entry point from which they start
crawling? Or do spiders check for a copy of robots.txt in every directory
that they crawl?

It should be in the root dir.

BB
 
Big Bill

Joe Blow said:
The robots.txt file resides in the root directory of your server. The
file instructs the crawler which directories are accessible and which
are not.

Being picky, it just says which are not. All files are deemed
accessible by default unless there's a statement against them in
robots.txt.
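
A quick way to see the "allowed unless disallowed" default is Python's standard `urllib.robotparser`. This is only a sketch: the rules and the `/private/` path below are invented for the example and are not the contents of anyone's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one directory is blocked for every crawler.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Nothing mentions this page, so it is allowed by default.
allowed = parser.can_fetch("msnbot", "http://www.example.org/index.html")
# This page sits under the disallowed directory.
blocked = parser.can_fetch("msnbot", "http://www.example.org/private/notes.html")

print(allowed, blocked)  # True False
```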

BB
 

data64

Joe Blow said:
I am experiencing difficulties with a supposedly missing "robots.txt"
file. I receive on average 5-6 notifications per day telling me
that the file was requested but was missing, when in fact it is there.

Our server logs indicate that the file is also being accessed
successfully, so I am wondering why I am still receiving notifications.

This morning so far:
* NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html;
(e-mail address removed))
* msnbot/0.11 (+http://search.msn.com/msnbot.htm)

Have you looked up the corresponding requests in the access_log?
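
If it helps, here is a rough sketch of how you could pull the robots.txt requests out of the log and see which ones failed. It assumes the log is in Apache's combined format, which may not match your server. The two sample lines, their IP addresses, dates, and sizes are made up for illustration, as is the function name `robots_requests`.

```python
import re

# Request line, status code, size, then optional referer and user agent.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def robots_requests(lines):
    """Return (path, status, user_agent) for every robots.txt request."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        # Ignore any query string and compare case-insensitively, so that
        # oddly cased or nested requests show up as well.
        path = match.group("path")
        if path.split("?")[0].lower().endswith("/robots.txt"):
            hits.append((path, int(match.group("status")), match.group("agent") or ""))
    return hits


# Made-up sample lines in combined log format.
SAMPLE = [
    '192.0.2.1 - - [01/Jan/2005:08:00:00 +1000] "GET /robots.txt HTTP/1.0" '
    '200 120 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [01/Jan/2005:08:05:00 +1000] "GET /forum/robots.txt HTTP/1.0" '
    '404 300 "-" "NutchCVS/0.06-dev"',
]

hits = robots_requests(SAMPLE)
missing = [hit for hit in hits if hit[1] == 404]
print(missing)  # [('/forum/robots.txt', 404, 'NutchCVS/0.06-dev')]
```

To run it on a real log, pass an open file instead of `SAMPLE`, for example `robots_requests(open("access_log"))`. Comparing the path and host of the failing requests with the successful ones should show whether the notifications refer to the same file.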

data64
 