robots.txt and regular expressions?

Discussion in 'HTML' started by Tomasz Chmielewski, May 3, 2008.

  1. I'm not sure which group is the right one for questions about the
    robots.txt file, so I'm asking here.

    I would like to exclude robots from accessing links like these:

    /index.php?title=One_page&action=edit
    /index.php?title=Other_page&action=edit

    What robots.txt line would exclude such pages (for bots that
    understand regexps, like Googlebot, Yahoo Slurp, etc.)?

    1) Disallow: /index.php*action=edit

    2) Disallow: /index\.php.*action=edit


    According to http://www.google.com/help/faq_codesearch.html#regexp (and
    http://en.wikipedia.org/wiki/Regular_expression#Syntax), it should be
    the 2) one.

    However, almost every "robots.txt regexp" search result seems to point to
    the 1) one.

    What is the correct answer?
     
    Tomasz Chmielewski, May 3, 2008
    #1

  2. faerber.jan Guest

    faerber.jan, May 3, 2008
    #2

  3. Tomasz Chmielewski

    faerber.jan Guest

  4. faerber.jan Guest


    But some regex-like matching is allowed, such as

    Disallow: /*.php$

    (isn't it?)

    which blocks access to all your PHP files.

    Jan
     
    faerber.jan, May 4, 2008
    #4
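
To compare the patterns quoted in this thread, here is a minimal Python sketch. It assumes the wildcard semantics usually described for Googlebot-style robots.txt rules: `*` matches any run of characters, a trailing `$` anchors the rule to the end of the URL, every other character is literal, and rules match from the start of the path. The helper names `robots_pattern_to_regex` and `is_blocked` are made up for illustration and are not part of any crawler or library.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (wildcard extension, NOT full regular expressions):
      *  matches any sequence of characters
      $  as the last character anchors the rule to the end of the URL
    All other characters, including '.' and '\\', are taken literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with ".*" for each "*".
    body = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, url_path: str) -> bool:
    """Return True if the Disallow pattern matches from the start of url_path."""
    return robots_pattern_to_regex(pattern).match(url_path) is not None


if __name__ == "__main__":
    edit_url = "/index.php?title=One_page&action=edit"
    candidates = [
        "/index.php*action=edit",     # candidate 1 from the question
        r"/index\.php.*action=edit",  # candidate 2 from the question
        "/*.php$",                    # pattern from the later reply
    ]
    for candidate in candidates:
        print(candidate, "->", is_blocked(candidate, edit_url))
```

Under these assumptions, candidate 1) matches both edit URLs. Candidate 2) does not, because the backslash and the dot are treated as literal characters rather than regex syntax. `/*.php$` matches only paths that end in `.php`, so it would cover `/index.php` but not the same script called with a query string.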
