Search Engine Tag?

Discussion in 'HTML' started by Scott, Nov 20, 2005.

  1. Scott

    Scott Guest

    Is there a tag that I can put on a page that will prevent search
    engines from indexing the page? The page is not behind a password.

    Thanks!
    Scott
    Scott, Nov 20, 2005
    #1

  2. "Scott" <> wrote in message
    news:...
    > Is there a tag that I can put on a page that will prevent search
    > engines from indexing the page? The page is not behind a password.
    >
    > Thanks!
    > Scott



    As far as I know, you can list the address of the page in a file
    called robots.txt and
    indicate which search engines should not index it.
    But I tried this method to stop Google from indexing some images and they
    are still there, so I do not know whether it actually works.

    --
    Luigi Donatello Asero
    https://www.scaiecat-spa-gigi.com/de/italien/ligurien/ferienwohnung-in-le-cinque-terre-kueche.php
    Luigi Donatello Asero, Nov 20, 2005
    #2

  3. Scott wrote:

    > Is there a tag that I can put on a page that will prevent search
    > engines from indexing the page? The page is not behind a password.


    <head>
    <meta name="robots" content="noindex,nofollow">
    ....
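
A minimal sketch, in Python, of how one might confirm that a page actually carries the tag above. It only uses the standard library's `html.parser`; the class name `RobotsMetaParser`, the helper `robots_directives`, and the sample `page` string are made up for illustration. It says nothing about whether a given search engine honours the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html_text):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Sample page, assumed for illustration only.
page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(sorted(robots_directives(page)))  # prints ['nofollow', 'noindex']
```

The comparison is case-insensitive, so the upper-case form of the tag is treated the same way.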

    --
    -bts
    -Warning: I brake for lawn deer
    Beauregard T. Shagnasty, Nov 20, 2005
    #3
  4. Scott

    bernhard Guest

    Scott wrote:

    > Is there a tag that I can put on a page that will prevent search
    > engines from indexing the page? The page is not behind a password.


    You'll find the answer here: http://www.robotstxt.org/wc/exclusion.html

    --

    bernhard
    bernhard, Nov 20, 2005
    #4
  5. Scott

    Steve Pugh Guest

    Luigi Donatello Asero wrote:
    > "Scott" <> wrote in message
    > news:...
    > > Is there a tag that I can put on a page that will prevent search
    > > engines from indexing the page? The page is not behind a password.

    >
    > As far as I know you could insert the address of the page into a file
    > called robots.txt and
    > indicate which search engine you do not want to index it.
    > But I tried to use this method to stop Google indexing some images


    You should have used robots.txt before Google indexed the page, not
    afterwards.

    "Prevention is better than cure" and all that.

    > and they are still there, so I do not know whether it works, actually.


    Wait a few months for Google to get around to reindexing your pages and
    if you've used robots.txt properly the images should be removed. Or use
    the URL removal form on Google to remove them now.

    Steve
    Steve Pugh, Nov 21, 2005
    #5
  6. Scott

    Scott Guest

    Steve Pugh wrote:
    >
    > Luigi Donatello Asero wrote:
    > > "Scott" <> wrote in message
    > > news:...
    > > > Is there a tag that I can put on a page that will prevent search
    > > > engines from indexing the page? The page is not behind a password.

    > >
    > > As far as I know you could insert the address of the page into a file
    > > called robots.txt and
    > > indicate which search engine you do not want to index it.
    > > But I tried to use this method to stop Google indexing some images

    >
    > You should have used robots.txt before Google indexed the page, not
    > afterwards.
    >
    > "Prevention is better than cure" and all that.
    >
    > > and they are still there, so I do not know whether it works, actually.

    >
    > Wait a few months for Google to get around to reindexing your pages and
    > if you've used robots.txt properly the images should be removed. Or use
    > the URL removal form on Google to remove them now.
    >
    > Steve


    Steve,

    OK, I figured out what to write in robots.txt. What I'm wondering is exactly
    where to place that file on the host server.

    Scott
    Scott, Nov 21, 2005
    #6
  7. Scott

    Steve Pugh Guest

    Scott wrote:
    > > Luigi Donatello Asero wrote:
    > > > "Scott" <> wrote in message
    > > > news:...
    > > > >
    > > > > Is there a tag that I can put on a page that will prevent search
    > > > > engines from indexing the page?
    > > >
    > > > As far as I know you could insert the address of the page into a file
    > > > called robots.txt and indicate which search engine you do not want to
    > > > index it.

    >
    > OK, I figured out what to write in robots.txt. What I'm wondering is exactly
    > where to place that file on the host server.


    At the root of your site.

    If a spider wants to visit http://www.example.com/foo/bar/page.html
    then it will look for http://www.example.com/foo/bar/robots.txt,
    http://www.example.com/foo/robots.txt and
    http://www.example.com/robots.txt and apply all the rules it finds.
    From your point of view, having a single robots.txt in your root folder
    makes for easy maintenance.

    Steve
    Steve Pugh, Nov 22, 2005
    #7
  8. Steve Pugh wrote:

    > Scott wrote:
    > > > Luigi Donatello Asero wrote:
    > > > > "Scott" <> wrote in message
    > > > > news:...
    > > > > >
    > > > > > Is there a tag that I can put on a page that will prevent
    > > > > > search engines from indexing the page?
    > > > >
    > > > > As far as I know you could insert the address of the page into
    > > > > a file called robots.txt and indicate which search engine you
    > > > > do not want to index it.

    > >
    > > OK, I figured out what to write in robots.txt. What I'm wondering
    > > is exactly where to place that file on the host server.

    >
    > At the root of your site.
    >
    > If a spider wants to visit http://www.example.com/foo/bar/page.html
    > then it will look for http://www.example.com/foo/bar/robots.txt,
    > http://www.example.com/foo/robots.txt and
    > http://www.example.com/robots.txt and apply all the rules it finds.
    > From your point of view having a single robots.txt in your root
    > folder makes for easy maintenance.


    Where did you get that idea?
    http://www.robotstxt.org/wc/exclusion-admin.html

    <quote>
    Note that there can only be a single "/robots.txt" on a site.
    Specifically, you should not put "robots.txt" files in user
    directories, because a robot will never look at them. If you want your
    users to be able to create their own "robots.txt", you will need to
    merge them all into a single "/robots.txt".
    </quote>

    --
    Kim André Akerø
    -
    (remove NOSPAM to contact me directly)
    Kim André Akerø, Nov 22, 2005
    #8
  9. Scott

    Steve Pugh Guest

    Kim André Akerø wrote:
    > Steve Pugh wrote:
    >
    > > If a spider wants to visit http://www.example.com/foo/bar/page.html
    > > then it will look for http://www.example.com/foo/bar/robots.txt,
    > > http://www.example.com/foo/robots.txt and
    > > http://www.example.com/robots.txt and apply all the rules it finds.
    > > From your point of view having a single robots.txt in your root
    > > folder makes for easy maintenance.

    >
    > Where did you get that idea?


    Empirical evidence. Maybe out of date. Maybe robots now follow the
    standard, they certainly didn't always. It's been a long time since I
    maintained a site that didn't have access to the server root so I
    haven't had any direct experience of this part of robot behaviour for
    several years.

    > http://www.robotstxt.org/wc/exclusion-admin.html
    >
    > <quote>
    > Note that there can only be a single "/robots.txt" on a site.
    > Specifically, you should not put "robots.txt" files in user
    > directories, because a robot will never look at them. If you want your
    > users to be able to create their own "robots.txt", you will need to
    > merge them all into a single "/robots.txt".
    > </quote>


    Learn something new every day.

    Steve
    Steve Pugh, Nov 22, 2005
    #9
  10. Scott

    Scott Guest

    Steve Pugh wrote:
    >
    > Scott wrote:
    > > > Luigi Donatello Asero wrote:
    > > > > "Scott" <> wrote in message
    > > > > news:...
    > > > > >
    > > > > > Is there a tag that I can put on a page that will prevent search
    > > > > > engines from indexing the page?
    > > > >
    > > > > As far as I know you could insert the address of the page into a file
    > > > > called robots.txt and indicate which search engine you do not want to
    > > > > index it.

    > >
    > > OK, I figured out what to write in robots.txt. What I'm wondering is exactly
    > > where to place that file on the host server.

    >
    > At the root of your site.
    >
    > If a spider wants to visit http://www.example.com/foo/bar/page.html
    > then it will look for http://www.example.com/foo/bar/robots.txt,
    > http://www.example.com/foo/robots.txt and
    > http://www.example.com/robots.txt and apply all the rules it finds.
    > From your point of view having a single robots.txt in your root folder
    > makes for easy maintenance.
    >
    > Steve


    Steve,

    So, you're saying I can just upload the robots.txt file to the same place I
    upload all my website files? In my case, my web account on the server is
    "public_html". And I should configure robots.txt to exclude the one
    particular url that I wish not to be indexed?

    Thanks!
    Scott
    Scott, Nov 22, 2005
    #10
  11. Scott

    Ken Guest

    Hi Scott -

    On Tue, 22 Nov 2005 13:24:25 -0600, Scott <> wrote:

    >Steve Pugh wrote:
    >>
    >> Scott wrote:
    >> > > Luigi Donatello Asero wrote:
    >> > > > "Scott" <> wrote in message
    >> > > > news:...
    >> > > > >
    >> > > > > Is there a tag that I can put on a page that will prevent search
    >> > > > > engines from indexing the page?
    >> > > >
    >> > > > As far as I know you could insert the address of the page into a file
    >> > > > called robots.txt and indicate which search engine you do not want to
    >> > > > index it.
    >> >
    >> > OK, I figured out what to write in robots.txt. What I'm wondering is exactly
    >> > where to place that file on the host server.

    >>
    >> At the root of your site.
    >>
    >> If a spider wants to visit http://www.example.com/foo/bar/page.html
    >> then it will look for http://www.example.com/foo/bar/robots.txt,
    >> http://www.example.com/foo/robots.txt and
    >> http://www.example.com/robots.txt and apply all the rules it finds.
    >> From your point of view having a single robots.txt in your root folder
    >> makes for easy maintenance.
    >>
    >> Steve

    >
    >Steve,
    >
    >So, you're saying I can just upload the robots.txt file to the same place I
    >upload all my website files? In my case, my web account on the server is
    >"public_html". And I should configure robots.txt to exclude the one
    >particular url that I wish not to be indexed?


    In the example that Steve gave, according to the standards the robot
    would look ONLY for:
    http://www.example.com/robots.txt

    I don't recall that I have ever seen a robot look for robots.txt other
    than in the host root; certainly not in the last several years.

    See http://www.robotstxt.org/wc/exclusion.html. If you don't have
    access to the host root, you can try using the "ROBOTS" META tag
    within the individual page(s).
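
To make the "host root only" rule concrete, here is a small Python sketch that derives the single robots.txt URL a standards-following robot would request for a given page. The function name `robots_txt_url` is an assumption for illustration; only the standard library's `urllib.parse` is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the one robots.txt URL that applies to page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the whole path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/foo/bar/page.html"))
# prints http://www.example.com/robots.txt
```

However deep the page sits, the result is always the file at the root of the host, which is why a robots.txt inside a subdirectory is never consulted.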

    --
    Ken
    http://www.ke9nr.net/
    Ken, Nov 22, 2005
    #11
  12. Scott

    Scott Guest


    > >
    > >Steve,
    > >
    > >So, you're saying I can just upload the robots.txt file to the same place I
    > >upload all my website files? In my case, my web account on the server is
    > >"public_html". And I should configure robots.txt to exclude the one
    > >particular url that I wish not to be indexed?

    >
    > In the example that Steve gave, according to the standards the robot
    > would look ONLY for:
    > http://www.example.com/robots.txt
    >
    > I don't recall that I have ever seen a robot look for robots.txt other
    > than in the host root; certainly not in the last several years.
    >
    > See http://www.robotstxt.org/wc/exclusion.html If you don't have
    > access to the host root, you can try using the "ROBOTS" META tag
    > within the individual page(s).
    >
    > --
    > Ken
    > http://www.ke9nr.net/


    Ken,

    Please pardon my density, but where exactly is the "host root"? Is this the
    same place where I upload all my website files to my account on the host's
    server?

    Thanks!
    Scott
    Scott, Nov 23, 2005
    #12
  13. Scott wrote:

    > Please pardon my density, but where exactly is the "host root"? Is
    > this the same place where I upload all my website files to my account
    > on the host's server?


    The "root" is your main directory, the place you (usually) have your
    main index.html file.

    --
    -bts
    -Warning: I brake for lawn deer
    Beauregard T. Shagnasty, Nov 23, 2005
    #13
  14. Scott

    Scott Guest

    "Beauregard T. Shagnasty" wrote:
    >
    > Scott wrote:
    >
    > > Please pardon my density, but where exactly is the "host root"? Is
    > > this the same place where I upload all my website files to my account
    > > on the host's server?

    >
    > The "root" is your main directory, the place you (usually) have your
    > main index.html file.
    >
    > --
    > -bts
    > -Warning: I brake for lawn deer


    Bts:

    Thanks!!!!

    Scott
    Scott, Nov 23, 2005
    #14
  15. Scott

    Ken Guest

    Hi Scott -

    On Tue, 22 Nov 2005 18:02:26 -0600, Scott <> wrote:

    >Please pardon my density, but where exactly is the "host root"? Is this the
    >same place where I upload all my website files to my account on the host's
    >server?


    The host root is wherever the files reside that are served for
    http://www.example.com/[file]

    The actual location on the hard drive depends on the server software
    and configuration.

    For example, the host root for my main website
    http://www.ke9nr.net/
    is
    /save/internet/www/sites/www.ke9nr.net

    That's not at all standard. The directory layout is the way that it
    is because of the way I have the partitions set up and how I want to
    do things. I configured Apache to match my directory structure, not
    the other way around. (I have my own domains and my own server so I
    can do as I please.)

    If you don't have your own domain it is unlikely that you will have
    access to the host root. E.g. if your ISP were example.net and your
    files are accessed at http://www.example.net/~user/, it's unlikely
    that you are going to be able to upload a robots.txt file so that it
    is accessible at http://www.example.net/robots.txt. Uploading a
    robots.txt so that it is accessible at
    http://www.example.net/~user/robots.txt isn't going to work.

    --
    Ken
    http://www.ke9nr.net/
    Ken, Nov 23, 2005
    #15
  16. Scott

    Scott Guest

    Ken wrote:
    >
    > Hi Scott -
    >
    > On Tue, 22 Nov 2005 18:02:26 -0600, Scott <> wrote:
    >
    > >Please pardon my density, but where exactly is the "host root"? Is this the
    > >same place where I upload all my website files to my account on the host's
    > >server?

    >
    > The host root is wherever the files reside that are served for
    > http://www.example.com/[file]
    >
    > The actual location on the hard drive depends on the server software
    > and configuration.
    >
    > For example, the host root for my main website
    > http://www.ke9nr.net/
    > is
    > /save/internet/www/sites/www.ke9nr.net
    >
    > That's not at all standard. The directory layout is the way that it
    > is because of the way I have the partitions set up and how I want to
    > do things. I configured Apache to match my directory structure, not
    > the other way around. (I have my own domains and my own server so I
    > can do as I please.)
    >
    > If you don't have your own domain it is unlikely that you will have
    > access to the host root. E.g. if your ISP were example.net and your
    > files are accessed at http://www.example.net/~user/, it's unlikely
    > that you are going to be able to upload a robots.txt file so that it
    > is accessible at http://www.example.net/robots.txt. Uploading a
    > robots.txt so that it is accessible at
    > http://www.example.net/~user/robots.txt isn't going to work.
    >
    > --
    > Ken
    > http://www.ke9nr.net/


    Ken,

    Darn. My website is: www.uslink.net/~golden. It's not my own domain,
    so it looks like the host root is out of my reach. The page that I don't
    want to be indexed is www.uslink.net/~golden/order1.html.

    I'm trying not to use a password. It's only this one page that I want to
    prevent from being indexed. Everything else on the site is fair game.

    What are the chances that <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    will do the job?

    Thanks!
    Scott
    Scott, Nov 29, 2005
    #16
  17. Scott

    Ken Sims Guest

    Hi Scott -

    On Tue, 29 Nov 2005 14:19:32 -0600, Scott <> wrote:

    >Darn. My website is: www.uslink.net/~golden. It's not my own domain,
    >so it looks like the host root is out of my reach. The page that I don't
    >want to be indexed is www.uslink.net/~golden/order1.html.


    Yes, robots.txt has to be at http://www.uslink.net/robots.txt

    Your only option for robots.txt is to see if you can convince USLink
    to add a robots.txt with your Disallow. If you click the above link,
    you will see that they don't have a robots.txt.
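
As a rough illustration of what such a Disallow might look like and how a well-behaved robot would interpret it, the Python sketch below feeds a hypothetical robots.txt to the standard library's `urllib.robotparser`. The file contents, the helper name `build_parser`, and the `index.html` URL are assumptions for the example, not anything USLink actually serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents for a robots.txt at the host root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /~golden/order1.html
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
# The order page is excluded; another page in the same directory is not.
print(rules.can_fetch("*", "http://www.uslink.net/~golden/order1.html"))  # prints False
print(rules.can_fetch("*", "http://www.uslink.net/~golden/index.html"))   # prints True
```

This only models robots that choose to obey the file; it does not enforce anything on the server.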

    >I'm trying not to use a password. It's only this one page that I want to
    >prevent from being indexed. Everything else on the site is fair game.
    >
    >What are the chances that <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    >will do the job?


    It's better than nothing, but that's all I can say.

    I think you are at the point where you need your own domain. Not just so
    that you can have a robots.txt but also to make what you are doing
    look more professional.

    --
    Ken
    http://www.ke9nr.net/
    Ken Sims, Nov 29, 2005
    #17
  18. Scott

    Scott Guest

    Ken Sims wrote:
    >
    > Hi Scott -
    >
    > On Tue, 29 Nov 2005 14:19:32 -0600, Scott <> wrote:
    >
    > >Darn. My website is: www.uslink.net/~golden. It's not my own domain,
    > >so it looks like the host root is out of my reach. The page that I don't
    > >want to be indexed is www.uslink.net/~golden/order1.html.

    >
    > Yes, robots.txt has to be at http://www.uslink.net/robots.txt
    >
    > Your only option for robots.txt is to see if you can convince USLink
    > to add a robots.txt with your Disallow. If you click the above link,
    > you will see that they don't have a robots.txt.
    >
    > >I'm trying not to use a password. It's only this one page that I want to
    > >prevent from being indexed. Everything else on the site is fair game.
    > >
    > >What are the chances that <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    > >will do the job?

    >
    > It's better than nothing, but that's all I can say.
    >
    > I think you are at the point where you need your domain. Not just so
    > that you can have a robots.txt but also to make what you are doing
    > look more professional.
    >
    > --
    > Ken
    > http://www.ke9nr.net/



    Ken,

    I agree. In fact, the only reason I'm staying with my ISP-provided webspace
    is that I don't want to have to start over being found by the search engines
    again (although my Google ranking...under "GNLD" has slipped out of the top 20
    this past year, but it's still pretty high with Yahoo). Also, my email address
    has been around for nine years. I do have my own domain (www.teamone.net) for
    a business site I'm starting to build. Then I'll have more control over things.

    Scott
    Scott, Nov 30, 2005
    #18
  19. Scott

    Ken Sims Guest

    Hi Scott -

    On Tue, 29 Nov 2005 18:18:09 -0600, Scott <> wrote:

    >I agree. In fact, the only reason I'm staying with my ISP-provided webspace
    >is that I don't want to have to start over being found by the search engines
    >again (although my Google ranking...under "GNLD" has slipped out of the top 20
    >this past year, but it's still pretty high with Yahoo).


    If you can set up 301 redirects, it ought to be pretty smooth, both for
    the search engines switching over as they attempt to re-spider the old
    site, and for users clicking links that lead to the old site.

    >Also, my email address has been around for nine years.


    I'm not suggesting that you get rid of your USLink account.

    >I do have my own domain (www.teamone.net) for a business site I'm starting to build.
    >Then I'll have more control over things.


    Control is good. I went from a user website on the ISP's domain (like
    what you have with USLink), to my own domain with virtual hosting, to
    my own domains on a virtual server, to my own domains on my own
    physical server that is about six feet away from me. And this is for
    non-income-producing domains.

    --
    Ken
    http://www.ke9nr.net/
    Ken Sims, Nov 30, 2005
    #19