web crawling.

Discussion in 'Python' started by S Borg, Jan 19, 2006.

  1. S Borg

    S Borg Guest

    Hello,

    I have been writing very simple Python programs that parse HTML and
    such, mainly just to get a better feel for the language. Here is my
    question: If I parsed an HTML page into all of the image files listed
    on that page, how could I request all of those images and download
    them into some specified folder? I am sure this is quite easy, but I
    am stuck.

    Thank you very much.
    Burgeoning Pythonista
    S Borg, Jan 19, 2006
    #1

  2. S Borg wrote:

    > Hello,
    >
    > I have been writing very simple Python programs that parse HTML and
    > such, mainly just to get
    > a better feel for the language. Here is my question: If I parsed an
    > HTML page into all of the image
    > files listed on that page, how could I request all of those images and
    > download them into some specified folder? I am sure this is quite easy,
    > but I am stuck.


    There's a good crawler in the Demo directory of the Python source
    distribution, so download and unpack said sources and look there.


    Alex
    Alex Martelli, Jan 19, 2006
    #2

  3. gene tani

    gene tani Guest

    S Borg wrote:
    > Hello,
    >
    > I have been writing very simple Python programs that parse HTML and
    > such, mainly just to get
    > a better feel for the language. Here is my question: If I parsed an
    > HTML page into all of the image
    > files listed on that page, how could I request all of those images and
    > download them into some specified folder? I am sure this is quite easy,
    > but I am stuck.
    >
    > Thank you very much.
    > Burgeoning Pythonista


    http://sig.levillage.org/?p=588
    gene tani, Jan 19, 2006
    #3
  4. Fuzzyman

    Fuzzyman Guest

    Use BeautifulSoup to get all the image tags out of the HTML.

    You'll need to join the URLs of the images to the URL of the page
    (urlparse.urljoin, off the top of my head). If you look at
    BeautifulSoup you will see how to get the 'src' reference of each
    image tag.
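
A minimal sketch of the approach described above, assuming Python 3, where urljoin lives in urllib.parse rather than in the urlparse module. To stay runnable without third-party packages it swaps in the standard library's html.parser for BeautifulSoup. The names ImgSrcCollector, image_urls, and download_images are hypothetical and not from the thread.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(page_url, html):
    """Return absolute image URLs found in html, resolved against page_url."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.sources]


def download_images(page_url, folder):
    """Fetch page_url and save each referenced image into folder."""
    os.makedirs(folder, exist_ok=True)
    with urlopen(page_url) as response:
        # Assumption: the page decodes as UTF-8; bad bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    for index, url in enumerate(image_urls(page_url, html)):
        # Use the last path component as the file name, with a fallback.
        name = os.path.basename(urlparse(url).path) or "image_%d" % index
        urlretrieve(url, os.path.join(folder, name))
```

Calling download_images with a page URL and a folder name would fetch the page and save each image into that folder. Two images that share a file name would overwrite one another in this sketch, and it does no error handling, so treat it as a starting point only.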

    All the best,

    Fuzzyman
    http://www.voidspace.org.uk/python/index.shtml
    Fuzzyman, Jan 19, 2006
    #4
  5. Alex Martelli wrote:
    > S Borg wrote:
    >
    >> Hello,
    >>
    >> I have been writing very simple Python programs that parse HTML and
    >> such, mainly just to get
    >> a better feel for the language. Here is my question: If I parsed an
    >> HTML page into all of the image
    >> files listed on that page, how could I request all of those images and
    >> download them into some specified folder? I am sure this is quite easy,
    >> but I am stuck.
    >
    > There's a good crawler in the Demo directory of the Python source
    > distribution, so download and unpack said sources and look there.
    >
    > Alex


    Hm. Looks like that's:

    Python-2.4.2/Tools/webchecker

    See 'pydoc ./webchecker.py' for more info.

    ---J


    John M. Gabriele, Jan 20, 2006
    #5