how to count and extract images

Discussion in 'Python' started by Joe, Oct 24, 2005.

  1. Joe

    Joe Guest

    I'm trying to get the location of the image using

    start = s.find('<a href="somefile') + len('<a href="somefile')
    stop = s.find('">Save File</a></B>', start)
    fileName = s[start:stop]

    and then construct the URL from the filename to download the image.
    This works fine because every image has the Save File link, and I can
    count the number of images easily. The problem is when there is more
    than one image: I tried using a while loop to download the files, and it
    works fine for the first one but always matches the same link. How can I
    count the images and tell the loop to skip the first one once it has been
    downloaded, go on to the next one, and so on?
     
    Joe, Oct 24, 2005
    #1

  2. Alex Martelli

    Joe wrote:

    > I'm trying to get the location of the image using
    >
    > start = s.find('<a href="somefile') + len('<a href="somefile')
    > stop = s.find('">Save File</a></B>', start)
    > fileName = s[start:stop]
    >
    > and then construct the URL from the filename to download the image.
    > This works fine because every image has the Save File link, and I can
    > count the number of images easily. The problem is when there is more
    > than one image: I tried using a while loop to download the files, and it
    > works fine for the first one but always matches the same link. How can I
    > count the images and tell the loop to skip the first one once it has been
    > downloaded, go on to the next one, and so on?


    Pass the index from where the search must start as the second argument
    to the s.find method -- you're already doing that for the second call,
    so it should be pretty obvious it will also work for the first one, no?
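
A minimal sketch of that idea, using a made-up string and marker rather than the real page: each call to find is told to start just past the previous hit, so it cannot return the same match twice.

```python
# Made-up sample text; only the use of find's second argument matters here.
s = '<a href="a.jpg">x</a> <a href="b.jpg">x</a>'
marker = '<a href="'

first = s.find(marker)               # search from the beginning
second = s.find(marker, first + 1)   # search again, starting past the first hit
third = s.find(marker, second + 1)   # nothing left, so find returns -1

print(first, second, third)
```

The -1 result is what a loop can test to know there are no more links.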


    Alex
     
    Alex Martelli, Oct 24, 2005
    #2

  3. Mike Meyer

    Mike Meyer Guest

    Joe writes:
    > start = s.find('<a href="somefile') + len('<a href="somefile')
    > stop = s.find('">Save File</a></B>', start)
    > fileName = s[start:stop]
    >
    > and then construct the URL from the filename to download the image.
    > This works fine because every image has the Save File link, and I can
    > count the number of images easily. The problem is when there is more
    > than one image: I tried using a while loop to download the files, and it
    > works fine for the first one but always matches the same link. How can I
    > count the images and tell the loop to skip the first one once it has been
    > downloaded, go on to the next one, and so on?


    To answer your question, use the first optional argument to find in both
    invocations of find:

    stop = 0
    while True:
        # Resume the search where the previous match ended.
        start = s.find('<a href="somefile', stop)
        if start < 0:
            break  # no more links
        start += len('<a href="somefile')
        stop = s.find('">Save File</a></B>', start)
        fileName = s[start:stop]
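
A fuller sketch of the same find-based loop, wrapped in a function so the matches can be counted and turned into URLs. The sample page, the base URL, and the way the URL is rebuilt from "somefile" plus the extracted text are all assumptions for illustration; the original post does not show how the URL is constructed.

```python
from urllib.parse import urljoin

PREFIX = '<a href="somefile'
SUFFIX = '">Save File</a></B>'


def save_file_names(s):
    """Return the text between each PREFIX/SUFFIX pair, in page order."""
    names = []
    stop = 0
    while True:
        start = s.find(PREFIX, stop)      # resume after the previous match
        if start < 0:
            break                         # no further link
        start += len(PREFIX)
        stop = s.find(SUFFIX, start)
        if stop < 0:
            break                         # opening text without its closing text
        names.append(s[start:stop])
    return names


# Hypothetical page and base URL, not taken from the thread.
page = (
    '<B><a href="somefile1.jpg">Save File</a></B>'
    '<B><a href="somefile2.jpg">Save File</a></B>'
)
base = "http://example.com/images/"

names = save_file_names(page)
urls = [urljoin(base, "somefile" + name) for name in names]
print(len(urls))
for url in urls:
    print(url)
```

Collecting the names in a list first means the count is just len(names), and the download step can walk the list without touching the search position.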

    Now, to give you some advice: don't do this by hand, use an HTML
    parsing library. The code above is incredibly fragile, and will break
    on any number of minor variations in the input text. Using a real
    parser not only avoids all those problems, it makes your code shorter.
    I like BeautifulSoup:

    soup = BeautifulSoup(s)
    for anchor in soup.fetch('a'):
        fileName = anchor['href']

    to get all the hrefs. If you only want the ones that have "Save File"
    in the link text, you'd do:

    soup = BeautifulSoup(s)
    for link in soup.fetchText('Save File'):
        fileName = link.findParent('a')['href']
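
If installing a third-party library is not an option, the same parser-based approach can be sketched with the standard library's html.parser module. This is a swapped-in technique, not the BeautifulSoup code above; the class name SaveFileLinks and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class SaveFileLinks(HTMLParser):
    """Collect the href of every <a> whose link text is 'Save File'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == "Save File":
                self.hrefs.append(self._href)
            self._href = None


# Hypothetical page: the last link uses upper-case tags and an extra attribute.
page = (
    '<B><a href="somefile1.jpg">Save File</a></B>'
    '<a href="other.html">Home</a>'
    '<B><A HREF="somefile2.jpg" class="x">Save File</A></B>'
)
parser = SaveFileLinks()
parser.feed(page)
parser.close()
print(len(parser.hrefs), parser.hrefs)
```

The second "Save File" link would be missed by a literal search for '<a href="somefile' because of its upper-case tag, which is the kind of minor variation a parser absorbs.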

    <mike
    --
    Mike Meyer http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
     
    Mike Meyer, Oct 24, 2005
    #3
