how to count and extract images

Joe

I'm trying to get the location of the image using

start = s.find('<a href="somefile') + len('<a href="somefile')
stop = s.find('">Save File</a></B>', start)
fileName = s[start:stop]

and then construct the URL from the filename to download the image.
This works fine because every image has the Save File link, and I can
count the number of images easily. The problem is when there is more
than one image. I tried using a while loop to download the files; it
works fine for the first one but then always matches the same one. How
can I count the images and tell the loop to skip the first one once it
has been downloaded and go on to the next one, and so on?
 
Alex Martelli

Joe said:
I'm trying to get the location of the image using

start = s.find('<a href="somefile') + len('<a href="somefile')
stop = s.find('">Save File</a></B>', start)
fileName = s[start:stop]

and then construct the URL from the filename to download the image.
This works fine because every image has the Save File link, and I can
count the number of images easily. The problem is when there is more
than one image. I tried using a while loop to download the files; it
works fine for the first one but then always matches the same one. How
can I count the images and tell the loop to skip the first one once it
has been downloaded and go on to the next one, and so on?

Pass the index from where the search must start as the second argument
to the s.find method -- you're already doing that for the second call,
so it should be pretty obvious it will also work for the first one, no?


Alex
 
Mike Meyer

Joe said:
start = s.find('<a href="somefile') + len('<a href="somefile')
stop = s.find('">Save File</a></B>', start)
fileName = s[start:stop]

and then construct the URL from the filename to download the image.
This works fine because every image has the Save File link, and I can
count the number of images easily. The problem is when there is more
than one image. I tried using a while loop to download the files; it
works fine for the first one but then always matches the same one. How
can I count the images and tell the loop to skip the first one once it
has been downloaded and go on to the next one, and so on?

To answer your question, use the first optional argument to find in both
invocations of find:

stop = 0
while True:
    # Resume the search where the previous match ended.
    start = s.find('<a href="somefile', stop)
    if start < 0:
        break  # no more links
    start += len('<a href="somefile')
    stop = s.find('">Save File</a></B>', start)
    fileName = s[start:stop]
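
As a fuller sketch of the same idea, the loop can be wrapped in a function that collects every match, which also gives you the count. The function name `extract_filenames` and the sample string are made up for illustration; only the two marker strings come from the question.

```python
def extract_filenames(s, prefix='<a href="somefile', suffix='">Save File</a></B>'):
    """Return the text between each prefix/suffix pair found in s."""
    names = []
    pos = 0  # where the next search starts
    while True:
        start = s.find(prefix, pos)
        if start < 0:
            break  # no further link
        start += len(prefix)
        stop = s.find(suffix, start)
        if stop < 0:
            break  # opening marker without a closing one
        names.append(s[start:stop])
        pos = stop + len(suffix)  # skip past the match just handled
    return names


# Hypothetical page fragment with two "Save File" links.
sample = (
    '<B><a href="somefile1.jpg">Save File</a></B>'
    '<B><a href="somefile2.jpg">Save File</a></B>'
)
filenames = extract_filenames(sample)
print(len(filenames), filenames)
```

Because `pos` always moves past the last match, each link is seen exactly once, and `len(filenames)` is the number of images.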

Now, to give you some advice: don't do this by hand, use an HTML
parsing library. The code above is incredibly fragile, and will break
on any number of minor variations in the input text. Using a real
parser not only avoids all those problems, it makes your code shorter.
I like BeautifulSoup:

soup = BeautifulSoup(s)
for anchor in soup.fetch('a'):
    fileName = anchor['href']

to get all the hrefs. If you only want the ones that have "Save File"
in the link text, you'd do:

soup = BeautifulSoup(s)
for link in soup.fetchText('Save File'):
    fileName = link.findParent('a')['href']
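
If installing a third-party library is not an option, roughly the same filtering can be sketched with the standard library's `html.parser` instead of BeautifulSoup. This is a swapped-in technique, not the code above; the class name `SaveFileLinks`, the helper `save_file_hrefs`, and the sample markup are all assumptions for illustration.

```python
from html.parser import HTMLParser


class SaveFileLinks(HTMLParser):
    """Collect the href of every <a> whose link text is 'Save File'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            if ''.join(self._text).strip() == 'Save File':
                self.hrefs.append(self._href)
            self._href = None


def save_file_hrefs(html):
    parser = SaveFileLinks()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical page fragment: two download links and one unrelated link.
sample = (
    '<B><a href="somefile1.jpg">Save File</a></B>'
    '<b><A HREF="other.html">Home</A></b>'
    '<B><a href="somefile2.jpg">Save File</a></B>'
)
print(save_file_hrefs(sample))
```

Unlike the string-search loop, this does not depend on the exact spelling or case of the surrounding tags, only on the link text.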

<mike
 
