Error while downloading webpages

TimB

Hi everyone, new to Python. I'm attempting to download a large number
of webpages (about 600) to disk, and for some reason a few of them
fail.

I'm using this in a loop where pagename and urlStr change each time:
import urllib
try:
    urllib.urlretrieve(urlStr, 'webpages/'+pagename+'.htm')
except IOError:
    print 'Cannot open URL %s for reading' % urlStr
    str1 = 'error!'

Out of all the webpages, it does not work for these three:
http://exoplanet.eu/planet.php?p1=WASP-11/HAT-P-10&p2=b
http://exoplanet.eu/planet.php?p1=HAT-P-27/WASP-40&p2=b
http://exoplanet.eu/planet.php?p1=HAT-P-30/WASP-51&p2=b
giving "Cannot open URL http://exoplanet.eu/planet.php?p1=WASP-11/HAT-P-10&p2=b
for reading" etc.

However, copying and pasting the URL from the error message into
Firefox opens the page successfully.

It successfully downloads the 500 or so other pages, such as:
http://exoplanet.eu/planet.php?p1=HD+88133&p2=b

I guess it has something to do with the forward slash in the names
(e.g. HAT-P-30/WASP-51 compared to HD+88133 in the examples above)

Is there a way I can fix this? Thanks.
 
TimB

Sorry, I was attempting to save the page to disk with the forward
slash in the file name, so the local path pointed into a directory
that does not exist. Disregard.
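
For anyone who hits the same thing, here is a minimal sketch of one way around it: strip the path separators out of the page name before building the local file name. It assumes Python 3, where urlretrieve lives in urllib.request rather than in urllib as in the post. The helper names safe_filename and download_page are made up for illustration, and replacing the slash with an underscore is just one possible choice.

```python
import os
import urllib.request  # Python 3 location of urlretrieve; the post used Python 2's urllib


def safe_filename(pagename):
    """Replace path separators so the name cannot point into a subdirectory.

    Hypothetical helper: the underscore replacement is an arbitrary choice.
    """
    for sep in ('/', '\\'):
        pagename = pagename.replace(sep, '_')
    return pagename


def download_page(url_str, pagename, out_dir='webpages'):
    """Save url_str to out_dir/<sanitised pagename>.htm and return the local path."""
    os.makedirs(out_dir, exist_ok=True)  # create the output directory if it is missing
    target = os.path.join(out_dir, safe_filename(pagename) + '.htm')
    urllib.request.urlretrieve(url_str, target)
    return target
```

With this, a page name such as 'WASP-11/HAT-P-10' is saved as webpages/WASP-11_HAT-P-10.htm, while names without a slash, such as 'HD+88133', are left unchanged. The URL itself is passed through untouched, since the slash was only a problem on the local file side.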
 
