simple robots.txt question


Kim André Akerø

CRON said:
Hi,
How do I disallow all search engines access to:

http://www.scouttalk.ie/user.php?userID=1


where 1 in the above line can be any number?

Closest thing would be:

User-agent: *
Disallow: /user.php

You can disallow single files or entire directories, but not specific
query strings.

http://www.robotstxt.org/wc/exclusion-admin.html
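
For what it's worth, the rule above can be checked with Python's standard-library urllib.robotparser. This is only a sketch of how one parser reads the rule, not a statement about any particular search engine. Disallow values are matched as URL prefixes here, so /user.php with any userID should come back as disallowed. The ExampleBot agent name and the /index.php path are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule suggested above, held in a string so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user.php
"""


def is_allowed(url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT.

    "ExampleBot" is a hypothetical crawler name used for illustration.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Any userID value shares the /user.php prefix.
    for user_id in (1, 42, 9999):
        url = f"http://www.scouttalk.ie/user.php?userID={user_id}"
        print(url, is_allowed(url))
    # /index.php is a hypothetical page that the rule does not cover.
    print(is_allowed("http://www.scouttalk.ie/index.php"))
```

Other crawlers may parse robots.txt differently, so treat this as a sanity check rather than a guarantee.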

Keep in mind that the robots.txt file is usually followed by "good"
spiders, such as MSN, Google and Overture. It doesn't actually block
search engines from accessing anything; it only serves as a suggestion to
the spiders about what they should ignore on their journey, more of a
"please don't include these files/directories in your index".

Rogue spiders/bots might ignore your robots.txt file altogether or even
specifically go to the "disallowed" locations, just to grab exploitable
content.
 

CRON

OK thanks,
I guess I'll leave it out then. It's strange that it can't be done. Is
it possible in the page header code to tell the spiders to ignore it?
Is there a meta tag, maybe?

Cheers,
Ciaran
 

CRON

Found this:

<meta name="robots" content="noindex, nofollow">

Apparently only a few robots support it. Anyone know which ones?
 

Nikita the Spider

CRON said:
Found this:

<meta name="robots" content="noindex, nofollow">

Apparently only a few robots support it. Anyone know which ones?

Hi Cron,
What makes you say that only a few robots support it? I had always
assumed the opposite; that most robots support it. (Most decent ones,
anyway -- the same that would respect robots.txt.)

Just for the record, Nikita the Spider supports it. =)
 

Leonard Blaisdell

Nikita the Spider said:
What makes you say that only a few robots support it? I had always
assumed the opposite; that most robots support it. (Most decent ones,
anyway -- the same that would respect robots.txt.)

I don't think robots are that difficult to create. I seem to remember
that I saw how to create a rudimentary one in a Perl book. If I wanted
to mine information from the net and was unscrupulous, I certainly
wouldn't worry about robots.txt and configure the robot to look for what
I wanted.
I think there are a pile of robots you don't see looking at your site if
it's available through httpd.conf or .htaccess holes. But then again,
I'm often wrong.

leo
 

Nikita the Spider

What makes you say that only a few robots support it? I had always
assumed the opposite; that most robots support it. (Most decent ones,
anyway -- the same that would respect robots.txt.)

I saw it on http://www.robotstxt.org/wc/exclusion.html but I think it's
mentioned in a few places. Try a search for robots meta tag nofollow.

That page and all of the pages on robotstxt.org are very old. It is
still the closest thing there is to an authoritative standard, but only
because the standard hasn't changed much, not because the site's been
kept up to date.

Given that the majors (Google, Yahoo & friends) even support the
non-standard nofollow on individual links
(http://blog.searchenginewatch.com/blog/050118-204728), I think it is
safe to assume that they respect it when applied to the whole page.
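
To make the difference between the page-level tag and the per-link attribute concrete, here is a minimal sketch of how a well-behaved bot might read both, using Python's standard html.parser. It is an assumption about reasonable behaviour, not how any of the majors actually implement it, and the names LinkPolicyParser and links_to_follow are made up for the example.

```python
from html.parser import HTMLParser


class LinkPolicyParser(HTMLParser):
    """Collects page-level robots directives and links not marked nofollow."""

    def __init__(self):
        super().__init__()
        self.page_directives = set()  # e.g. {"noindex", "nofollow"}
        self.followable = []          # hrefs without rel="nofollow"

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.page_directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )
        elif tag == "a" and attr.get("href"):
            rel = (attr.get("rel") or "").lower().split()
            if "nofollow" not in rel:
                self.followable.append(attr["href"])


def links_to_follow(html):
    """Return the links a polite bot would follow from this page."""
    parser = LinkPolicyParser()
    parser.feed(html)
    # A page-level nofollow overrides every link on the page.
    if "nofollow" in parser.page_directives:
        return []
    return parser.followable
```

Fed a page with one plain link and one rel="nofollow" link, links_to_follow should return only the plain one; add the meta tag quoted earlier and it should return nothing.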

Cheers
 

Nikita the Spider

Leonard Blaisdell said:
I don't think robots are that difficult to create. I seem to remember
that I saw how to create a rudimentary one in a Perl book. If I wanted
to mine information from the net and was unscrupulous, I certainly
wouldn't worry about robots.txt and configure the robot to look for what
I wanted.

I think there are a pile of robots you don't see looking at your site if
it's available through httpd.conf or .htaccess holes. But then again,
I'm often wrong.

True, a quick and sloppy bot is not hard to create. But anyone looking
to use robots.txt or a META noindex/nofollow as security against
unscrupulous or sloppy bots is misguided, regardless of whether such
bots are numerous or few. I think (hope!) the OP understands that.
That's just not what robots.txt and noindex/nofollow were intended for:
dealing with evil or sloppy bots (or nosy human surfers for that matter)
is a job for other technology (like httpd.conf, as you suggest).

So, setting aside the issue that robots.txt doesn't do something it was
not intended to accomplish, it remains an effective way of controlling
well-behaved bots like Googlebot. (And Nikita!)

Cheers
 

Andy Dingley

CRON said:
Found this:
<meta name="robots" content="noindex, nofollow">
Apparently only a few robots support it. Anyone know which ones?

It's not a question of support, it's a question of meaning. The <meta>
tag obviously only applies to that page, and links from that page. It's
also only available once that page has been retrieved. robots.txt applies
to whole directories and is processed before retrieving pages.

They're complementary, not substitutes. Use both if you wish. Certainly
use robots.txt.
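
The order of the two checks can be sketched like this, assuming a cooperative bot: robots.txt is consulted before anything is requested, and the meta tag can only be read after the page has come back. The function crawl_decision, the ExampleBot name and the fetch callable are all hypothetical; fetch stands in for whatever HTTP client the bot uses, so nothing here touches the network.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobots(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            for part in (attr.get("content") or "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def crawl_decision(robots_txt, url, fetch, agent="ExampleBot"):
    """Decide what a polite bot does with `url`.

    `fetch` is a caller-supplied function taking a URL and returning HTML.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # Step 1: robots.txt is checked before the page is requested.
    if not rules.can_fetch(agent, url):
        return "skipped: robots.txt"
    # Step 2: the meta tag is only visible once the page is retrieved.
    meta = MetaRobots()
    meta.feed(fetch(url))
    if "noindex" in meta.directives:
        return "fetched, not indexed"
    return "fetched and indexed"
```

One consequence worth noting: a page that robots.txt disallows is never fetched, so a bot that follows this order never sees that page's meta tag at all.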
 
