robots.txt

Discussion in 'HTML' started by Paul Furman, Jan 4, 2007.

  1. Paul Furman

    Paul Furman Guest

    I've had this web site up for years and, I don't know, maybe a couple
    of years ago I added a more advanced PHP-driven system under
    edgehill.net/1. Many of the images are annotated, or at least the title
    gives an indication of the location and date. I just did a test and was
    able to find a keyword under the new /1 section, but Google Images
    doesn't see it at all. The problem, I think, is that the galleries are
    set up as subfolders and the PHP files format the gallery, so I've got
    all these indexless folders which are basically garbage for a web
    browser:

    What should look like this:
    <http://www.edgehill.net/1/?SC=go.php&DIR=California/Bay-Area/Oakland/2005-11-05-pinehurst>
    Is indexed like this:
    <http://www.edgehill.net/1/California/Bay-Area/Oakland/2005-11-05-pinehurst/>

    On my other baynatives site I added a line to the longest irrelevant
    page to keep it from being indexed, like so:
    <meta name='googlebot' content='noarchive, noindex'>
    and that works like a charm, but the edgehill.net site, as far as I
    know, does not directly point to these nested content folders except in
    the PHP address [?SC=go.php&DIR=]

    I'm not opposed to people viewing raw directories; sometimes that's
    useful to point to an image or file without all the formatting, but I
    don't want them indexed in search engines.

    Any advice?

    --
    Paul Furman
    Bay Natives Nursery
    http://www.baynatives.com
    Photography
    http://www.edgehill.net/1
    (415) 722-6037
     
    Paul Furman, Jan 4, 2007
    #1

  2. Paul Furman

    Paul Furman Guest

    Seems to be a .htaccess issue. I added this line:

    Options -Indexes

    Although I'd prefer to simply prevent search engines from indexing. I
    probably have some folders in there which are now inaccessible.

     
    Paul Furman, Jan 4, 2007
    #2

  3. Bergamot

    Bergamot Guest

    Paul Furman wrote:
    >
    > On my other baynatives site I added a line in the longest irrelevant
    > page to prevent indexing that like such:
    > <meta name='googlebot' content='noarchive, noindex'>


    Your subject line indicates you are asking about robots.txt, but you
    haven't mentioned it in either of your messages.
    http://www.robotstxt.org/wc/robots.html

    --
    Berg
     
    Bergamot, Jan 4, 2007
    #3
  4. Paul Furman

    Paul Furman Guest

    Bergamot wrote:

    > Paul Furman wrote:
    >
    >>On my other baynatives site I added a line in the longest irrelevant
    >>page to prevent indexing that like such:
    >><meta name='googlebot' content='noarchive, noindex'>

    >
    >
    > Your subject line indicates you are asking about robots.txt, but you
    > haven't mentioned it in either of your messages.
    > http://www.robotstxt.org/wc/robots.html
    >


    I used a googlebot meta tag in the one instance.

    As best I can figure, robots.txt requires you to list each directory
    that's forbidden, and I've got a huge list of directories that grows
    weekly. I was hoping for a robot command that forbids indexing indexless
    directories.

    Do you know if robots.txt is affected by PHP calls? All the pages appear
    to be located in edgehill.net/1/ but they are really in many deeply
    nested subdirectories.
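
On the robots.txt questions above: Disallow rules are matched as prefixes of the requested URL path, so one rule covers a folder and everything nested under it, and new subfolders need no new entries. Here is a minimal sketch using Python's standard urllib.robotparser. It assumes the raw gallery folders sit under a small number of top-level folders such as /1/California/ (the only one visible in this thread). The rule itself is hypothetical and has not been tried against the live site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for edgehill.net. "California" is the only
# top-level gallery folder visible in the thread; each additional top-level
# folder would need its own line, but nested subfolders would not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /1/California/
"""


def is_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules in ROBOTS_TXT permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# The two URL forms quoted in the first post.
RAW_DIR = "http://www.edgehill.net/1/California/Bay-Area/Oakland/2005-11-05-pinehurst/"
PHP_PAGE = "http://www.edgehill.net/1/?SC=go.php&DIR=California/Bay-Area/Oakland/2005-11-05-pinehurst"

if __name__ == "__main__":
    print("raw directory allowed:", is_allowed(RAW_DIR))
    print("PHP gallery page allowed:", is_allowed(PHP_PAGE))
```

Because the match is made on the URL as requested, the PHP address, whose path is just /1/, is not caught by the rule even though its DIR parameter names the same folder. Keep in mind that robots.txt asks crawlers not to fetch a URL, which is a different mechanism from the noindex meta tag.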
     
    Paul Furman, Jan 5, 2007
    #4
