How does the Google spider access my web site?

Discussion in 'ASP .Net' started by baroque Chou, Jan 26, 2006.

  1. baroque Chou

    baroque Chou Guest

    Does anyone know how Google's spiders access a web site? How do they
    manage to get the href information? Do they have special access rights
    or something? Any help is appreciated.
    baroque Chou, Jan 26, 2006
    #1

  2. baroque Chou

    bb Guest

    bb, Jan 26, 2006
    #2

  3. baroque Chou

    Brian Cryer Guest

    "baroque Chou" <> wrote in message
    news:...
    > anyone know how google spiders access web site, how dose they manage to
    > get the href information? do they have special access right or
    > something? any help is appreciated


    No, Google doesn't have any special access rights; it accesses your website
    the same way as anyone else. This means that if you have a login screen
    which you need to get past to view your site, then the Google spider won't
    get past it. Some sites explicitly grant the Googlebot (or other bots)
    access, but that's the exception, not the rule.

    In summary, what you can see in your browser (or better still, what I could
    see in my browser if you gave me the URL) is what the Google spider can see.
    The only exception to this is that the Google spider is a little more fussy
    about correct HTML than most browsers, so it's worth checking that your code
    validates and that your links are correct.
    --
    Brian Cryer
    www.cryer.co.uk/brian
    Brian Cryer, Jan 26, 2006
    #3
  4. baroque Chou

    baroque Chou Guest

    Thanks, it seems the Google spider behaves much like a browser. But what
    if I am using a dynamic page, say an aspx page, which doesn't produce any
    output until the web server executes it? How does Google know the hrefs
    in that page? And most of the time, even in the executed page, the href
    looks more like
    <a href='Middlelayer_Top10.aspx?id=105'>
    How will the spider crawl deeper if it neither has access to my source
    code nor makes any request?
    baroque Chou, Jan 26, 2006
    #4
  5. baroque Chou

    KMA Guest

    Generally it goes like this:

    You send Google a reference to your homepage. Obviously this page shouldn't
    require logging in or a password.
    The Googlebot downloads this page and extracts all the links. It computes a
    "score" for the page for the Google index, then downloads every page in the
    link list and repeats the same procedure until all links are processed.

    The exact details of the scoring mechanism are not published, to prevent
    people artificially pushing their pages up the rankings.

    Some say that parameterised links (such as gfdg.aspx?productID=1234) are not
    followed.

    To get more of an idea, create an aspx page with links, run your program,
    then in the browser right-click and choose View Source. This is exactly what
    the Googlebot gets.
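
The download, extract, repeat loop described above can be sketched roughly as follows. This is only an illustration in Python, not Googlebot's actual code: the `SITE` dictionary stands in for real HTTP responses, and `LinkExtractor`, `crawl`, and the example.com URLs are hypothetical names chosen for the sketch. Scoring is left out, since its details are not published.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical site: URL -> HTML the server would send (what "View Source" shows).
SITE = {
    "http://example.com/": "<a href='Middlelayer_Top10.aspx?id=105'>Top 10</a>",
    "http://example.com/Middlelayer_Top10.aspx?id=105":
        "<a href='/'>Home</a> <a href='about.aspx'>About</a>",
    "http://example.com/about.aspx": "<p>No links here.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl; returns URLs in the order they were downloaded."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched; skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


pages = crawl("http://example.com/", SITE.get)
```

Note that the crawler only ever sees the HTML the server returns, so a link with a query string is just another URL to request; whether a real search engine chooses to follow it is a separate policy decision.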
    KMA, Jan 26, 2006
    #5
  6. baroque Chou

    baroque Chou Guest

    Thank you very much. Someone suggested using a rewrite rule to make the
    URLs more search-engine friendly, e.g. rewriting
    gfdg.aspx?productID=1234 to gfdg.aspx/productID/1234.
    But this page doesn't actually exist on my web server; what exists is just
    the source page, and an "instance" of that page is created for each
    individual request. So do I need to archive instances of the page
    somewhere (perhaps in a directory hierarchy that follows the URL pattern,
    so that the spider can crawl it better)?
    baroque Chou, Jan 27, 2006
    #6
  7. baroque Chou

    KMA Guest

    OK, basically it goes like this.

    On your web pages you write bot-friendly URLs, like
    gfds/product/toasters/toastomatic5000.aspx.

    But like you say, this page doesn't really exist. When the bot requests the
    page, IIS will not be able to find it, but if you implement your own
    404 handler then IIS will call that instead. A normal 404 handler just gives
    back a page saying "Sorry, page not found", but your special 404 handler will
    be passed the URL of the requested page. You can then extract the product
    identifier from the URL and build the page for that product. This page is
    then sent back to the bot.

    In a way you are fooling the bot into thinking that you have lots of web
    pages, when really you just have one page handler plus a database of product
    data. Bot writers expect this, because they know that it's very difficult to
    maintain a large site in any other way.
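
As a rough illustration of the custom 404 handler idea, the sketch below parses the product name out of a bot-friendly URL and builds the page from stored data. It is written in Python rather than ASP.NET purely to keep it short; `PRODUCTS`, `handle_missing_page`, and the exact URL layout are assumptions made for the example. One detail worth checking in a real handler: it probably needs to return status 200 for the pages it builds, since a crawler is unlikely to index a response that still carries a 404 status.

```python
import re

# Hypothetical product data; a real site would query its database.
PRODUCTS = {
    "toastomatic5000": {"category": "toasters", "title": "Toastomatic 5000"},
}

# Matches paths such as /gfds/product/toasters/toastomatic5000.aspx
FRIENDLY_URL = re.compile(
    r"^/gfds/product/(?P<category>[\w-]+)/(?P<slug>[\w-]+)\.aspx$"
)


def handle_missing_page(path):
    """Called when no physical file matches `path`; returns (status, body)."""
    match = FRIENDLY_URL.match(path)
    if match:
        product = PRODUCTS.get(match.group("slug"))
        if product and product["category"] == match.group("category"):
            # Build the page on the fly and report success.
            return 200, f"<h1>{product['title']}</h1>"
    # Anything else really is missing.
    return 404, "Sorry, page not found"
```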
    KMA, Jan 27, 2006
    #7
  8. baroque Chou

    Alan Silver Guest

    Why not just use URL rewriting? Much cleaner.
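
For comparison, URL rewriting maps the friendly URL back to the real one before the page lookup happens, so the request never has to go through the 404 path. The sketch below shows only the mapping step, in Python; in ASP.NET this would more likely live in an HTTP module or a rewrite rule. The `rewrite` function and its pattern are illustrative assumptions based on the gfdg.aspx example from earlier in the thread.

```python
import re

# Friendly form: /gfdg.aspx/productID/1234  ->  real form: /gfdg.aspx?productID=1234
REWRITE_RULE = re.compile(r"^/(?P<page>[\w-]+\.aspx)/(?P<key>\w+)/(?P<value>\w+)$")


def rewrite(path):
    """Return the internal URL for a friendly path, or the path unchanged."""
    match = REWRITE_RULE.match(path)
    if match is None:
        return path
    return "/{page}?{key}={value}".format(**match.groupdict())


print(rewrite("/gfdg.aspx/productID/1234"))
```

Paths that do not match the rule are passed through untouched, so ordinary pages keep working.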

    --
    Alan Silver
    (anything added below this line is nothing to do with me)
    Alan Silver, Feb 2, 2006
    #8