Missing robots.txt file

Discussion in 'HTML' started by Joe Blow, Aug 29, 2004.

  1. Joe Blow

    Joe Blow Guest

    I am experiencing difficulties with a supposedly missing "robots.txt"
    file. I receive on average 5-6 notifications per day letting me know
    that the requested file was missing, when in fact it is there.

    Our server logs indicate that the file is also being accessed
    successfully, so I am wondering why I am still receiving notifications.

    This morning so far:
    * NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html;
    )
    * msnbot/0.11 (+http://search.msn.com/msnbot.htm)

    Can someone shed some light on what is happening?

    The web site is www.wiavic.org.au

    Thanks,
    Joe Blow, Aug 29, 2004
    #1

  2. Nik Coughin

    Nik Coughin Guest

    Joe Blow wrote:
    > I am experiencing difficulties with a supposedly missing "robots.txt"
    > file. I receive on average 5-6 notifications per day letting me know
    > that the requested file was missing, when in fact it is there.
    >
    > Our server logs indicate that the file is also being accessed
    > successfully, so I am wondering why I am still receiving
    > notifications.


    Hate to answer a question with a question (or not answer it as the case may
    be) but are spiders only supposed to look for robots.txt in the base
    directory, or do they look for it at the entry point from which they start
    crawling? Or do spiders check for a copy of robots.txt in every directory
    that they crawl?
    Nik Coughin, Aug 29, 2004
    #2

  3. Joe Blow

    Joe Blow Guest

    The robots.txt file resides in the root directory of your server. The
    file instructs the crawler which directories are accessible and which
    are not.
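
    To make the "root directory" point concrete, here is a minimal Python
    sketch of how a crawler would typically work out which robots.txt to
    request for any page it visits. The function name robots_url and the
    page path /some/dir/page.html are made up for illustration; only the
    host name comes from this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path
    # with /robots.txt, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page path; whatever the directory, the result is the same file.
print(robots_url("http://www.wiavic.org.au/some/dir/page.html"))
# prints http://www.wiavic.org.au/robots.txt
```

    So a well-behaved spider should ask for one file per host, at the top
    level, no matter where it entered the site.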


    Nik Coughin wrote:
    >
    > Joe Blow wrote:
    > > I am experiencing difficulties with a supposedly missing "robots.txt"
    > > file. I receive on average 5-6 notifications per day letting me know
    > > that the requested file was missing, when in fact it is there.
    > >
    > > Our server logs indicate that the file is also being accessed
    > > successfully, so I am wondering why I am still receiving
    > > notifications.

    >
    > Hate to answer a question with a question (or not answer it as the case may
    > be) but are spiders only supposed to look for robots.txt in the base
    > directory, or do they look for it at the entry point from which they start
    > crawling? Or do spiders check for a copy of robots.txt in every directory
    > that they crawl?
    Joe Blow, Aug 29, 2004
    #3
  4. Big Bill

    Big Bill Guest

    On Mon, 30 Aug 2004 09:47:35 +1200, "Nik Coughin"
    <nrkn!no-spam!@woosh.co.nz> wrote:

    >Joe Blow wrote:
    >> I am experiencing difficulties with a supposedly missing "robots.txt"
    >> file. I receive on average 5-6 notifications per day letting me know
    >> that the requested file was missing, when in fact it is there.
    >>
    >> Our server logs indicate that the file is also being accessed
    >> successfully, so I am wondering why I am still receiving
    >> notifications.

    >
    >Hate to answer a question with a question (or not answer it as the case may
    >be) but are spiders only supposed to look for robots.txt in the base
    >directory, or do they look for it at the entry point from which they start
    >crawling? Or do spiders check for a copy of robots.txt in every directory
    >that they crawl?


    It should be in the root dir.

    BB
    Big Bill, Aug 30, 2004
    #4
  5. Big Bill

    Big Bill Guest

    On Sun, 29 Aug 2004 22:53:19 GMT, Joe Blow <> wrote:

    >The robots.txt file resides in the root directory of your server. The
    >file instructs the crawler which directories are accessible and which
    >are not.


    Being picky, it just says which are not. All files are deemed
    accessible by default unless there's a statement against them in
    robots.txt.
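
    A quick way to see the default-allow behaviour is Python's standard
    urllib.robotparser module. This is only a sketch: the rules below and
    the www.example.com URLs are invented for illustration and are not the
    real file from the site in this thread. (Many crawlers also accept an
    Allow line, which is a later extension to the original convention.)

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: a single Disallow line for every crawler.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Nothing mentions /index.html, so it is allowed by default.
print(parser.can_fetch("msnbot", "http://www.example.com/index.html"))
# Only the path covered by the Disallow line is refused.
print(parser.can_fetch("msnbot", "http://www.example.com/private/a.html"))
```

    The first call prints True and the second prints False.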

    BB


    >Nik Coughin wrote:
    >>
    >> Joe Blow wrote:
    >> > I am experiencing difficulties with a supposedly missing "robots.txt"
    >> > file. I receive on average 5-6 notifications per day letting me know
    >> > that the requested file was missing, when in fact it is there.
    >> >
    >> > Our server logs indicate that the file is also being accessed
    >> > successfully, so I am wondering why I am still receiving
    >> > notifications.

    >>
    >> Hate to answer a question with a question (or not answer it as the case may
    >> be) but are spiders only supposed to look for robots.txt in the base
    >> directory, or do they look for it at the entry point from which they start
    >> crawling? Or do spiders check for a copy of robots.txt in every directory
    >> that they crawl?
    Big Bill, Aug 30, 2004
    #5
  6. data64

    data64 Guest

    Joe Blow wrote:

    > I am experiencing difficulties with a supposedly missing "robots.txt"
    > file. I receive on average 5-6 notifications per day letting me know
    > that the requested file was missing, when in fact it is there.
    >
    > Our server logs indicate that the file is also being accessed
    > successfully, so I am wondering why I am still receiving notifications.
    >
    > This morning so far:
    > * NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html;
    > )
    > * msnbot/0.11 (+http://search.msn.com/msnbot.htm)
    >


    Have you looked up the corresponding requests in the access_log?
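
    If it helps, here is a rough Python sketch for pulling the robots.txt
    requests out of a log. It assumes the Apache "combined" log format, and
    the sample lines (IP addresses, the ExampleBot agent, the /docs/ path)
    are invented for illustration, not taken from your server. One thing
    worth checking is whether the failing requests ask for a different path
    or a different spelling than the file you actually have.

```python
import re

# Assumes Apache "combined" format; adjust if the server logs differently.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def robots_requests(lines):
    """Yield (status, path, user_agent) for requests ending in robots.txt."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        path = m.group("path").split("?", 1)[0]
        # Compare case-insensitively so variants such as Robots.txt show up.
        if path.lower().endswith("robots.txt"):
            yield int(m.group("status")), path, m.group("agent") or ""


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [30/Aug/2004:09:00:00 +1000] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [30/Aug/2004:09:05:00 +1000] "GET /docs/robots.txt HTTP/1.0" '
    '404 209 "-" "ExampleBot/1.0"',
    '192.0.2.3 - - [30/Aug/2004:09:06:00 +1000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "ExampleBot/1.0"',
]

for status, path, agent in robots_requests(sample):
    print(status, path, agent)
```

    To run it against a real log, pass an open file instead of the sample
    list, e.g. robots_requests(open("access_log")), and look at the lines
    whose status is not 200.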

    data64
    data64, Aug 30, 2004
    #6