simple robots.txt question

Discussion in 'HTML' started by CRON, Jul 24, 2006.

  1. CRON

    CRON Guest

    CRON, Jul 24, 2006
    #1

  2. CRON

    jojo Guest

    CRON wrote:

    > How do I disallow all search engines access to:
    >
    > http://www.scouttalk.ie/user.php?userID=1
    >
    >
    > where 1 in the above line can be any number?
    >


    Put all the pages you want to keep out of the search engines in one
    directory and disallow that whole directory.
     
    jojo, Jul 24, 2006
    #2

  3. CRON wrote:

    > Hi,
    > How do I disallow all search engines access to:
    >
    > http://www.scouttalk.ie/user.php?userID=1
    >
    >
    > where 1 in the above line can be any number?


    Closest thing would be:

    User-agent: *
    Disallow: /user.php

    You can disallow single files or entire directories, but not specific
    query strings.

    http://www.robotstxt.org/wc/exclusion-admin.html
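
One way to sanity-check the rule above is with Python's standard `urllib.robotparser` module. This is only a sketch: the `is_allowed` helper and the `/index.php` URL are made up for illustration, and a real crawler may match rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above, as it would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user.php
"""


def is_allowed(url, agent="*"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.scouttalk.ie/user.php?userID=1",
        "http://www.scouttalk.ie/user.php?userID=42",
        "http://www.scouttalk.ie/index.php",  # hypothetical page, for contrast
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Because this parser matches the Disallow value as a prefix of the path and query, the single `/user.php` line covers every `userID` value.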

    Keep in mind that the robots.txt file is usually followed only by
    "good" spiders, such as those from MSN, Google and Overture. It doesn't
    actually block access for search engines; it only serves as a
    suggestion to the spiders about what they should ignore on their
    journey, more of a "please don't include these files/directories in
    your index".

    Rogue spiders/bots might ignore your robots.txt file altogether or even
    specifically go to the "disallowed" locations, just to grab exploitable
    content.

    --
    Kim André Akerø
     
    Kim André Akerø, Jul 24, 2006
    #3
  4. CRON

    CRON Guest

    OK thanks,
    I guess I'll leave it out then. It's strange that it can't be done. Is
    it possible to tell the spiders to ignore the page from within the
    page's own header code? Is there a meta tag, maybe?

    Cheers,
    Ciaran
     
    CRON, Jul 24, 2006
    #4
  5. CRON

    CRON Guest

    Found this:

    <meta name="robots" content="noindex, nofollow">

    Apparently only a few robots support it. Anyone know which ones?
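
For reference, this is roughly how a crawler could read that tag. It is a minimal sketch using Python's standard `html.parser`; the class and function names are invented here, and it says nothing about how any particular search engine actually implements the check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for item in attr.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html):
    """Return the set of robots meta directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```

A crawler that honours the tag would skip indexing when `noindex` is in the returned set and skip the page's links when `nofollow` is.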
     
    CRON, Jul 24, 2006
    #5
  6. CRON wrote:

    > Found this:
    >
    > <meta name="robots" content="noindex, nofollow">
    >
    > Apparently only a few robots support it. Anyone know which ones?


    Hi Cron,
    What makes you say that only a few robots support it? I had always
    assumed the opposite; that most robots support it. (Most decent ones,
    anyway -- the same that would respect robots.txt.)

    Just for the record, Nikita the Spider supports it. =)

    --
    Philip
    http://NikitaTheSpider.com/
    Whole-site HTML validation, link checking and more
     
    Nikita the Spider, Jul 25, 2006
    #6
  7. Nikita the Spider wrote:

    > What makes you say that only a few robots support it? I had always
    > assumed the opposite; that most robots support it. (Most decent ones,
    > anyway -- the same that would respect robots.txt.)


    I don't think robots are that difficult to create. I seem to remember
    that I saw how to create a rudimentary one in a Perl book. If I wanted
    to mine information from the net and was unscrupulous, I certainly
    wouldn't worry about robots.txt; I would just configure the robot to
    look for what I wanted.
    I think there are a pile of robots you don't see looking at your site if
    it's available through httpd.conf or .htaccess holes. But then again,
    I'm often wrong.

    leo

    --
    <http://web0.greatbasin.net/~leo/>
     
    Leonard Blaisdell, Jul 25, 2006
    #7
  8. CRON

    CRON Guest


    > What makes you say that only a few robots support it? I had always
    > assumed the opposite; that most robots support it. (Most decent ones,
    > anyway -- the same that would respect robots.txt.)


    I saw it on http://www.robotstxt.org/wc/exclusion.html but I think it's
    mentioned in a few places. Try a search for robots meta tag nofollow.

    Ciaran
     
    CRON, Jul 25, 2006
    #8
  9. CRON wrote:

    > > What makes you say that only a few robots support it? I had always
    > > assumed the opposite; that most robots support it. (Most decent ones,
    > > anyway -- the same that would respect robots.txt.)

    >
    > I saw it on http://www.robotstxt.org/wc/exclusion.html but I think it's
    > mentioned in a few places. Try a search for robots meta tag nofollow.


    That page and all of the pages on robotstxt.org are very old. It is
    still the closest thing there is to an authoritative standard, but only
    because the standard hasn't changed much, not because the site's been
    kept up to date.

    Given that the majors (Google, Yahoo & friends) even support the
    non-standard nofollow on individual links
    (http://blog.searchenginewatch.com/blog/050118-204728), I think it is
    safe to assume that they respect it when applied to the whole page.

    Cheers

    --
    Philip
    http://NikitaTheSpider.com/
    Whole-site HTML validation, link checking and more
     
    Nikita the Spider, Jul 25, 2006
    #9
  10. Leonard Blaisdell wrote:

    > Nikita the Spider wrote:
    >
    > > What makes you say that only a few robots support it? I had always
    > > assumed the opposite; that most robots support it. (Most decent ones,
    > > anyway -- the same that would respect robots.txt.)

    >
    > I don't think robots are that difficult to create. I seem to remember
    > that I saw how to create a rudimentary one in a Perl book. If I wanted
    > to mine information from the net and was unscrupulous, I certainly
    > wouldn't worry about robots.txt; I would just configure the robot to
    > look for what I wanted.
    >
    > I think there are a pile of robots you don't see looking at your site if
    > it's available through httpd.conf or .htaccess holes. But then again,
    > I'm often wrong.


    True, a quick and sloppy bot is not hard to create. But anyone looking
    to use robots.txt or a META noindex/nofollow as security against
    unscrupulous or sloppy bots is misguided, regardless of whether such
    bots are numerous or few. I think (hope!) the OP understands that.
    That's just not what robots.txt and noindex/nofollow were intended for:
    dealing with evil or sloppy bots (or nosy human surfers for that matter)
    is a job for other technology (like httpd.conf, as you suggest).

    So, setting aside the issue that robots.txt doesn't do something it was
    not intended to accomplish, it remains an effective way of controlling
    well-behaved bots like Googlebot. (And Nikita!)

    Cheers

    --
    Philip
    http://NikitaTheSpider.com/
    Whole-site HTML validation, link checking and more
     
    Nikita the Spider, Jul 25, 2006
    #10
  11. CRON

    Andy Dingley Guest

    CRON wrote:

    > Found this:
    > <meta name="robots" content="noindex, nofollow">
    > Apparently only a few robots support it. Anyone know which ones?


    It's not a question of support, it's a question of meaning. The <meta>
    tag obviously only applies to that page, and links from that page. It's
    also only available once the page itself has been retrieved. robots.txt
    applies to whole directories and is processed before retrieving pages.

    They're complementary, not substitutes. Use both if you wish. Certainly
    use robots.txt.
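
A rough sketch of how a well-behaved crawler might combine the two mechanisms: robots.txt is consulted before the page is requested, and the meta tag can only be read after it has been fetched. The `crawl` function, its return strings and the `fetch` callable are all hypothetical, and the regular expression is deliberately crude (it assumes `name` comes directly before `content`).

```python
import re
from urllib.robotparser import RobotFileParser

# Crude pattern for <meta name="robots" content="...">; illustration only.
META_ROBOTS = re.compile(
    r'<meta\s+name=["\']robots["\']\s+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def crawl(url, robots_lines, fetch):
    """Return 'skipped', 'fetched-not-indexed' or 'indexed' for `url`.

    `robots_lines` is the site's robots.txt as a list of lines, and
    `fetch` is a hypothetical callable that returns the page's HTML.
    """
    rules = RobotFileParser()
    rules.parse(robots_lines)

    # Stage 1: robots.txt is checked before any request for the page.
    if not rules.can_fetch("*", url):
        return "skipped"

    # Stage 2: the meta tag is only visible once the page has been fetched.
    html = fetch(url)
    match = META_ROBOTS.search(html)
    directives = set()
    if match:
        directives = {d.strip().lower() for d in match.group(1).split(",")}
    return "fetched-not-indexed" if "noindex" in directives else "indexed"
```

With `Disallow: /user.php` in place, `fetch` is never called for the user pages; without it, each page is still downloaded and only then excluded by its `noindex` tag.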
     
    Andy Dingley, Jul 25, 2006
    #11