Search Engine Tag?

Scott

Is there a tag that I can put on a page that will prevent search
engines from indexing the page? The page is not behind a password.

Thanks!
Scott
 
Luigi Donatello Asero

Scott said:
Is there a tag that I can put on a page that will prevent search
engines from indexing the page? The page is not behind a password.

Thanks!
Scott


As far as I know, you can list the address of the page in a file
called robots.txt and indicate which search engines you do not want
to index it.
However, I tried this method to stop Google from indexing some images and
they are still there, so I do not know whether it actually works.
 
Beauregard T. Shagnasty

Scott said:
Is there a tag that I can put on a page that will prevent search
engines from indexing the page? The page is not behind a password.

<head>
<meta name="robots" content="noindex,nofollow">
....
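
A tag like the one above can be checked mechanically. What follows is only a minimal Python sketch using the standard library; the names `RobotsMetaParser` and `robots_directives` are made up for illustration, and real crawlers may interpret the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

The comparison is case-insensitive, so the upper-case form of the tag is treated the same way.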
 
Steve Pugh

Luigi said:
As far as I know you could insert the address of the page into a file
called robots.txt and
indicate which search engine you do not want to index it.
But I tried to use this method to stop Google indexing some images

You should have used robots.txt before Google indexed the page, not
afterwards.

"Prevention is better than cure" and all that.
and they are still there, so I do not know whether it works, actually.

Wait a few months for Google to get around to reindexing your pages and
if you've used robots.txt properly the images should be removed. Or use
the URL removal form on Google to remove them now.

Steve
 
Scott

Steve said:
You should have used robots.txt before Google indexed the page, not
afterwards.

"Prevention is better than cure" and all that.


Wait a few months for Google to get around to reindexing your pages and
if you've used robots.txt properly the images should be removed. Or use
the URL removal form on Google to remove them now.

Steve

Steve,

OK, I figured out what to write in robots.txt. What I'm wondering is exactly
where to place that file on the host server.

Scott
 
Steve Pugh

Scott said:
OK, I figured out what to write in robots.txt. What I'm wondering is exactly
where to place that file on the host server.

At the root of your site.

If a spider wants to visit http://www.example.com/foo/bar/page.html
then it will look for http://www.example.com/foo/bar/robots.txt,
http://www.example.com/foo/robots.txt and
http://www.example.com/robots.txt and apply all the rules it finds.
From your point of view having a single robots.txt in your root folder
makes for easy maintenance.

Steve
 
Kim André Akerø

Steve said:
At the root of your site.

If a spider wants to visit http://www.example.com/foo/bar/page.html
then it will look for http://www.example.com/foo/bar/robots.txt,
http://www.example.com/foo/robots.txt and
http://www.example.com/robots.txt and apply all the rules it finds.
makes for easy maintenance.

Where did you get that idea?
http://www.robotstxt.org/wc/exclusion-admin.html

<quote>
Note that there can only be a single "/robots.txt" on a site.
Specifically, you should not put "robots.txt" files in user
directories, because a robot will never look at them. If you want your
users to be able to create their own "robots.txt", you will need to
merge them all into a single "/robots.txt".
</quote>
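
To make the quoted rule concrete: whatever page a robot is about to fetch, the only robots.txt it consults is the one at the root of that host. A small Python sketch of that lookup follows; the helper name `robots_txt_url` is hypothetical, not part of any crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the single robots.txt URL that applies to page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, discard the page's own path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/foo/bar/page.html"))
# http://www.example.com/robots.txt
```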
 
Steve Pugh

Kim said:
Where did you get that idea?

Empirical evidence, and maybe out of date. Maybe robots now follow the
standard; they certainly didn't always. It's been a long time since I
maintained a site that didn't have access to the server root, so I
haven't had any direct experience of this part of robot behaviour for
several years.

http://www.robotstxt.org/wc/exclusion-admin.html

<quote>
Note that there can only be a single "/robots.txt" on a site.
Specifically, you should not put "robots.txt" files in user
directories, because a robot will never look at them. If you want your
users to be able to create their own "robots.txt", you will need to
merge them all into a single "/robots.txt".
</quote>

Learn something new every day.

Steve
 
Scott

Steve said:
At the root of your site.

If a spider wants to visit http://www.example.com/foo/bar/page.html
then it will look for http://www.example.com/foo/bar/robots.txt,
http://www.example.com/foo/robots.txt and
http://www.example.com/robots.txt and apply all the rules it finds.
makes for easy maintenance.

Steve

Steve,

So, you're saying I can just upload the robots.txt file to the same place I
upload all my website files? In my case, my web account on the server is
"public_html". And I should configure robots.txt to exclude the one
particular url that I wish not to be indexed?

Thanks!
Scott
 
Ken

Hi Scott -

Steve,

So, you're saying I can just upload the robots.txt file to the same place I
upload all my website files? In my case, my web account on the server is
"public_html". And I should configure robots.txt to exclude the one
particular url that I wish not to be indexed?

In the example that Steve gave, according to the standards the robot
would look ONLY for:
http://www.example.com/robots.txt

I don't recall that I have ever seen a robot look for robots.txt other
than in the host root; certainly not in the last several years.

See http://www.robotstxt.org/wc/exclusion.html

If you don't have access to the host root, you can try using the
"ROBOTS" META tag within the individual page(s).
 
Scott

In the example that Steve gave, according to the standards the robot
would look ONLY for:
http://www.example.com/robots.txt

I don't recall that I have ever seen a robot look for robots.txt other
than in the host root; certainly not in the last several years.

See http://www.robotstxt.org/wc/exclusion.html If you don't have
access to the host root, you can try using the "ROBOTS" META tag
within the individual page(s).

Ken,

Please pardon my density, but where exactly is the "host root"? Is this the
same place where I upload all my website files to my account on the host's
server?

Thanks!
Scott
 
Beauregard T. Shagnasty

Scott said:
Please pardon my density, but where exactly is the "host root"? Is
this the same place where I upload all my website files to my account
on the host's server?

The "root" is your main directory, the place you (usually) have your
main index.html file.
 
Ken

Hi Scott -

Please pardon my density, but where exactly is the "host root"? Is this the
same place where I upload all my website files to my account on the host's
server?

The host root is wherever the files reside that are served for
http://www.example.com/[file]

The actual location on the hard drive depends on the server software
and configuration.

For example, the host root for my main website
http://www.ke9nr.net/
is
/save/internet/www/sites/www.ke9nr.net

That's not at all standard. The directory layout is the way that it
is because of the way I have the partitions set up and how I want to
do things. I configured Apache to match my directory structure, not
the other way around. (I have my own domains and my own server so I
can do as I please.)

If you don't have your own domain it is unlikely that you will have
access to the host root. E.g. if your ISP were example.net and your
files are accessed at http://www.example.net/~user/, it's unlikely
that you are going to be able to upload a robots.txt file so that it
is accessible at http://www.example.net/robots.txt. Uploading a
robots.txt so that it is accessible at
http://www.example.net/~user/robots.txt isn't going to work.
 
Scott

Ken said:
Hi Scott -

Please pardon my density, but where exactly is the "host root"? Is this the
same place where I upload all my website files to my account on the host's
server?

The host root is wherever the files reside that are served for
http://www.example.com/[file]

The actual location on the hard drive depends on the server software
and configuration.

For example, the host root for my main website
http://www.ke9nr.net/
is
/save/internet/www/sites/www.ke9nr.net

That's not at all standard. The directory layout is the way that it
is because of the way I have the partitions set up and how I want to
do things. I configured Apache to match my directory structure, not
the other way around. (I have my own domains and my own server so I
can do as I please.)

If you don't have your own domain it is unlikely that you will have
access to the host root. E.g. if your ISP were example.net and your
files are accessed at http://www.example.net/~user/, it's unlikely
that you are going to be able to upload a robots.txt file so that it
is accessible at http://www.example.net/robots.txt. Uploading a
robots.txt so that it is accessible at
http://www.example.net/~user/robots.txt isn't going to work.

Ken,

Darn. My website is: www.uslink.net/~golden. It's not my own domain,
so it looks like the host root is out of my reach. The page that I don't
want to be indexed is www.uslink.net/~golden/order1.html.

I'm trying not to use a password. It's only this one page that I want to
prevent from being indexed. Everything else on the site is fair game.

What are the chances that <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
will do the job?

Thanks!
Scott
 
Ken Sims

Hi Scott -

Darn. My website is: www.uslink.net/~golden. It's not my own domain,
so it looks like the host root is out of my reach. The page that I don't
want to be indexed is www.uslink.net/~golden/order1.html.

Yes, robots.txt has to be at http://www.uslink.net/robots.txt

Your only option for robots.txt is to see if you can convince USLink
to add a robots.txt with your Disallow. If you click the above link,
you will see that they don't have a robots.txt.
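
For illustration only, here is roughly what such a Disallow would amount to, checked with Python's standard `urllib.robotparser`. The two-line rule set is an assumption about what the ISP might add, not an existing file.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of a root-level robots.txt (hypothetical; no such file exists yet).
rules = [
    "User-agent: *",
    "Disallow: /~golden/order1.html",
]

parser = RobotFileParser()
parser.parse(rules)

# The one page to keep out of the index is blocked for every robot...
print(parser.can_fetch("*", "http://www.uslink.net/~golden/order1.html"))  # False
# ...while the rest of the site stays crawlable.
print(parser.can_fetch("*", "http://www.uslink.net/~golden/"))  # True
```

This only shows how a well-behaved robot would read the rule; it does nothing about pages that are already indexed.
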
I'm trying not to use a password. It's only this one page that I want to
prevent from being indexed. Everything else on the site is fair game.

What are the chances that <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
will do the job?

It's better than nothing, but that's all I can say.

I think you are at the point where you need your domain. Not just so
that you can have a robots.txt but also to make what you are doing
look more professional.
 
Scott

Ken said:
Hi Scott -



Yes, robots.txt has to be at http://www.uslink.net/robots.txt

Your only option for robots.txt is to see if you can convince USLink
to add a robots.txt with your Disallow. If you click the above link,
you will see that they don't have a robots.txt.


It's better than nothing, but that's all I can say.

I think you are at the point where you need your domain. Not just so
that you can have a robots.txt but also to make what you are doing
look more professional.


Ken,

I agree. In fact, the only reason I'm staying with my ISP-provided webspace
is that I don't want to have to start over being found by the search engines
again (although my Google ranking...under "GNLD" has slipped out of the top 20
this past year, but it's still pretty high with Yahoo). Also, my email address
has been around for nine years. I do have my own domain (www.teamone.net) for
a business site I'm starting to build. Then I'll have more control over things.

Scott
 
Ken Sims

Hi Scott -

I agree. In fact, the only reason I'm staying with my ISP-provided webspace
is that I don't want to have to start over being found by the search engines
again (although my Google ranking...under "GNLD" has slipped out of the top 20
this past year, but it's still pretty high with Yahoo).

If you can set up 301 redirects, it ought to be pretty smooth, both for
the search engines switching over as they attempt to re-spider the old
site, and for users clicking links that lead to the old site.
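
As a rough sketch of what a 301 redirect does, here is a tiny Python handler that answers every request for an old path with a permanent redirect to the matching page on a new site. `NEW_BASE`, `redirect_target`, `RedirectHandler` and `serve` are hypothetical names, and `www.example.org` stands in for whatever the new domain would be. On shared ISP webspace you would normally rely on the host's own redirect configuration instead, if it offers one.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

OLD_PREFIX = "/~golden"              # path of the old ISP-hosted site
NEW_BASE = "http://www.example.org"  # hypothetical new domain


def redirect_target(old_path):
    """Map a path on the old site to the matching URL on the new site, or None."""
    if old_path == OLD_PREFIX or old_path.startswith(OLD_PREFIX + "/"):
        return NEW_BASE + (old_path[len(OLD_PREFIX):] or "/")
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 tells browsers and spiders that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

    do_HEAD = do_GET


def serve(port=8000):
    """Run the redirect server until interrupted."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling `serve()` starts it locally; `redirect_target("/~golden/order1.html")` gives `http://www.example.org/order1.html`.
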
Also, my email address has been around for nine years.

I'm not suggesting that you get rid of your USLink account.

I do have my own domain (www.teamone.net) for a business site I'm starting to build. Then I'll have more control over things.

Control is good. I went from a user website on the ISP's domain (like
what you have with USLink), to my own domain with virtual hosting, to
my own domains on a virtual server, to my own domains on my own
physical server that is about six feet away from me. And this is for
non-income-producing domains.
 
