Web-crawling

Discussion in 'Python' started by John Bradbury, Oct 4, 2003.

  1. I am trying to develop a special-purpose crawler using htmllib & urllib.
    How do you tell the server application that you are a modern browser and can
    handle frames?

    Thanks,

    John Bradbury
    John Bradbury, Oct 4, 2003
    #1

  2. John Bradbury

    Rene Pijlman Guest

    John Bradbury:
    >I am trying to develop a special-purpose crawler using htmllib & urllib.
    >How do you tell the server application that you are a modern browser and can
    >handle frames?


    I don't know of any "I can handle frames" header and I don't see why the
    server would care, but you could mimic the User-agent header sent by a
    modern browser.
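
A minimal sketch of the suggestion above, written against `urllib.request` (the Python 3 successor to the `urllib2` module of this thread's era). The User-Agent string and the URL are illustrative placeholders, not values taken from the thread, and nothing here guarantees that a particular site will respond with its framed pages.

```python
import urllib.request

# Illustrative browser-like User-Agent string (an assumption, not from the thread).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def make_browser_request(url, user_agent=BROWSER_UA):
    """Build a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL; no network access happens until the request is opened.
req = make_browser_request("http://www.example.com/")

# urllib normalizes header names with str.capitalize(), so the key is "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header in place of the default `Python-urllib/x.y` value.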

    --
    René Pijlman
    Rene Pijlman, Oct 4, 2003
    #2

  3. I don't know what is causing the problem, but the site I am accessing is
    sending out forms for a browser that has a low resolution and does not
    support frames. Excuse my ignorance, but where do you set up the User-agent
    header you suggested?

    Many thanks for your prompt reply.

    John Bradbury

    "Rene Pijlman" <> wrote in
    message news:...
    > John Bradbury:
    > >I am trying to develop a special-purpose crawler using htmllib & urllib.
    > >How do you tell the server application that you are a modern browser and
    > >can handle frames?
    >
    > I don't know of any "I can handle frames" header and I don't see why the
    > server would care, but you could mimic the User-agent header sent by a
    > modern browser.
    >
    > --
    > René Pijlman
    John Bradbury, Oct 4, 2003
    #3
  4. John Bradbury

    Rene Pijlman Guest

    Rene Pijlman, Oct 4, 2003
    #4
  5. John Bradbury

    John J. Lee Guest

    "John Bradbury" <john_bradbury@___cableinet.co.uk> writes:

    > "Rene Pijlman" <> wrote in
    > message news:...
    > > John Bradbury:
    > > >I am trying to develop a special-purpose crawler using htmllib & urllib.
    > > >How do you tell the server application that you are a modern browser
    > > >and can handle frames?

    [...]
    > > server would care, but you could mimic the User-agent header sent by a

    [...]
    > I don't know what is causing the problem, but the site I am accessing is
    > sending out forms for a browser that has a low resolution and does not
    > support frames. Excuse my ignorance, but where do you set up the
    > User-agent header you suggested?


    For urllib2 (well, almost):

    http://wwwsearch.sourceforge.net/ClientCookie/doc.html#headers
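
For a crawler that fetches many pages, one way to avoid repeating the header on every request is to set it once on an opener. This is a hedged sketch using the standard library's `urllib.request` in Python 3, not the ClientCookie package linked above; the User-Agent string and the helper name `make_browser_opener` are assumptions for illustration.

```python
import urllib.request

# Illustrative browser-like User-Agent string (an assumption, not from the thread).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def make_browser_opener(user_agent=BROWSER_UA):
    """Return an opener that sends the given User-Agent on every request."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples; replacing it drops the
    # default ("User-agent", "Python-urllib/x.y") entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_browser_opener()
print(opener.addheaders)

# To fetch a page (requires network access; URL is a placeholder):
# html = opener.open("http://www.example.com/").read()
```

Calling `urllib.request.install_opener(opener)` would make plain `urllib.request.urlopen` calls use the same header as well.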


    John
    John J. Lee, Oct 4, 2003
    #5