simple robots.txt question

CRON · Jul 24, 2006

Hi,
How do I disallow all search engines access to:

http://www.scouttalk.ie/user.php?userID=1

where 1 in the above line can be any number?

Thanks a lot
Ciaran

jojo · Jul 24, 2006

CRON said:
How do I disallow all search engines access to:

http://www.scouttalk.ie/user.php?userID=1

where 1 in the above line can be any number?

Put all the pages you want to keep out of the search engines in one
directory and disallow that whole directory.

Kim André Akerø · Jul 24, 2006

CRON said:
Hi,
How do I disallow all search engines access to:

http://www.scouttalk.ie/user.php?userID=1

where 1 in the above line can be any number?

Closest thing would be:

User-agent: *
Disallow: /user.php

You can disallow single files or entire directories, but not specific
query strings.

http://www.robotstxt.org/wc/exclusion-admin.html
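
A quick way to see what those two lines do for a compliant robot is to run them through Python's standard urllib.robotparser. This is only a sketch: it shows how that one parser reads the rules (it compares each Disallow value against the start of the URL), not how any particular search engine behaves, and index.php is a hypothetical page used purely for contrast.

```python
from urllib.robotparser import RobotFileParser

# The two rules suggested above, as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Profile pages are refused whatever the userID is, because the rule
# is compared against the start of the requested URL.
for user_id in (1, 42, 9999):
    url = f"http://www.scouttalk.ie/user.php?userID={user_id}"
    print(url, parser.can_fetch("*", url))

# A page that does not start with /user.php stays crawlable.
# (index.php is a hypothetical page, not taken from the real site.)
print(parser.can_fetch("*", "http://www.scouttalk.ie/index.php"))
```

With these rules the parser should report every user.php URL as not fetchable and the other page as fetchable, so a single Disallow line covers all userID values.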

Keep in mind that the robots.txt file is usually only followed by "good"
spiders, such as MSN, Google and Overture. It doesn't actually block
access for search engines; it only serves as a suggestion to the spiders
about what they should ignore on their journey, more of a "please don't
include these files/directories in your index".

Rogue spiders/bots might ignore your robots.txt file altogether or even
specifically go to the "disallowed" locations, just to grab exploitable
content.

CRON · Jul 24, 2006

OK thanks,
I guess I'll leave it out then. It's strange that it can't be done. Is
it possible in the page header code to tell the spiders to ignore it?
Is there a meta tag maybe?

Cheers,
Ciaran

CRON · Jul 24, 2006

Found this:

<meta name="robots" content="noindex, nofollow">

Apparently only a few robots support it. Anyone know which ones?

Nikita the Spider · Jul 25, 2006

CRON said:
Found this:

<meta name="robots" content="noindex, nofollow">

Apparently only a few robots support it. Anyone know which ones?

Hi Cron,
What makes you say that only a few robots support it? I had always
assumed the opposite; that most robots support it. (Most decent ones,
anyway -- the same that would respect robots.txt.)

Just for the record, Nikita the Spider supports it. =)

Leonard Blaisdell · Jul 25, 2006

Nikita the Spider said:
What makes you say that only a few robots support it? I had always
assumed the opposite; that most robots support it. (Most decent ones,
anyway -- the same that would respect robots.txt.)

I don't think robots are that difficult to create. I seem to remember
that I saw how to create a rudimentary one in a Perl book. If I wanted
to mine information from the net and was unscrupulous, I certainly
wouldn't worry about robots.txt and configure the robot to look for what
I wanted.
I think there are a pile of robots you don't see looking at your site if
it's available through httpd.conf or .htaccess holes. But then again,
I'm often wrong.

leo

CRON · Jul 25, 2006

Nikita the Spider said:
What makes you say that only a few robots support it? I had always
assumed the opposite; that most robots support it. (Most decent ones,
anyway -- the same that would respect robots.txt.)

I saw it on http://www.robotstxt.org/wc/exclusion.html but I think it's
mentioned in a few places. Try a search for robots meta tag nofollow.

Ciaran

Nikita the Spider · Jul 25, 2006

CRON said:
Nikita the Spider said:
What makes you say that only a few robots support it? I had always
assumed the opposite; that most robots support it. (Most decent ones,
anyway -- the same that would respect robots.txt.)

I saw it on http://www.robotstxt.org/wc/exclusion.html but I think it's
mentioned in a few places. Try a search for robots meta tag nofollow.

That page and all of the pages on robotstxt.org are very old. It is
still the closest thing there is to an authoritative standard, but only
because the standard hasn't changed much, not because the site's been
kept up to date.

Given that the majors (Google, Yahoo & friends) even support the
non-standard nofollow on individual links
(http://blog.searchenginewatch.com/blog/050118-204728), I think it is
safe to assume that they respect it when applied to the whole page.

Cheers

Nikita the Spider · Jul 25, 2006

Leonard Blaisdell said:
I don't think robots are that difficult to create. I seem to remember
that I saw how to create a rudimentary one in a Perl book. If I wanted
to mine information from the net and was unscrupulous, I certainly
wouldn't worry about robots.txt and configure the robot to look for what
I wanted.

I think there are a pile of robots you don't see looking at your site if
it's available through httpd.conf or .htaccess holes. But then again,
I'm often wrong.

True, a quick and sloppy bot is not hard to create. But anyone looking
to use robots.txt or a META noindex/nofollow as security against
unscrupulous or sloppy bots is misguided, regardless of whether such
bots are numerous or few. I think (hope!) the OP understands that.
That's just not what robots.txt and noindex/nofollow were intended for:
dealing with evil or sloppy bots (or nosy human surfers for that matter)
is a job for other technology (like httpd.conf, as you suggest).

So, setting aside the issue that robots.txt doesn't do something it was
not intended to accomplish, it remains an effective way of controlling
well-behaved bots like Googlebot. (And Nikita!)

Cheers

Andy Dingley · Jul 25, 2006

CRON said:
Found this:
<meta name="robots" content="noindex, nofollow">
Apparently only a few robots support it. Anyone know which ones?

It's not a question of support, it's a question of meaning. The <meta>
tag obviously only applies to that page, and links from that page. It's
also only available once the page itself has been retrieved. robots.txt
applies to whole directories and is processed before retrieving pages.

They're complementary, not substitutes. Use both if you wish. Certainly
use robots.txt.
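
For the page-level half, a robot has to fetch the page and read the tag before it can act on it. Here is a minimal sketch of that check using Python's standard html.parser. The RobotsMetaParser class and the robots_directives helper are made-up names for illustration, not code from any real crawler, and the sample page is invented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name attribute is compared case-insensitively.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Invented sample page carrying the tag discussed in this thread.
page = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow">'
    "</head><body>profile page</body></html>"
)

directives = robots_directives(page)
may_index = "noindex" not in directives
may_follow_links = "nofollow" not in directives
print(sorted(directives), may_index, may_follow_links)
```

A polite robot would check robots.txt first, and only run something like this on pages it was allowed to fetch.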
