Unable to retrieve a compressed HTML URL

R

rushik

Hi,
I am trying to build a Python script that retrieves and analyzes
various URLs and generates reports.

Some of the URLs look like "http://xyz.com/test.html.gz". I am trying
to retrieve them with the urllib2 library and then decompress them
with the gzip library.

For example, say server_url is http://xyz.com/test.html.gz:

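import gzip
import urllib2

def get_html(server_url):  # enclosing function name is illustrative
    # download the .gz file
    logpage = urllib2.urlopen(server_url)
    html_content = logpage.read()
    logpage.close()

    # save the downloaded data to a temporary file
    gz_tmp = open("gzip.txt.gz", "w")
    gz_tmp.write(html_content)
    gz_tmp.close()

    # decompress the temporary file
    f = gzip.open("gzip.txt.gz", "rb")
    file_content = f.read()
    f.close()

    # return the resulting (decompressed) html content.
    return file_content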

On executing the code, I get:

zlib.error: Error -3 while decompressing: invalid distance too far back

The same URL displays as a proper HTML page when I open it in a
browser.

Please let me know if I am doing something wrong here, or if there is
a better way to do this.

Thanks,
R
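A likely cause of this error, assuming the script runs on Windows: open("gzip.txt.gz", "w") opens the file in text mode, so every b"\n" byte in the compressed data is written out as b"\r\n". That shifts the compressed stream, and zlib then typically rejects it with an error like the one above. The sketch below (Python 3.8+, where gzip.compress accepts mtime; the sample page is made up) simulates that translation in memory: the untouched bytes decompress fine, the translated ones do not.

```python
import gzip
import zlib

# Made-up sample page; any bytes would do.
html = b"<html><body>" + b"report line\n" * 200 + b"</body></html>"

# mtime=10 is arbitrary: mtime is stored little-endian in the gzip header,
# so its first byte is 0x0A (b"\n") and the data is sure to contain a newline.
compressed = gzip.compress(html, mtime=10)

# Simulate a text-mode write on Windows: each b"\n" becomes b"\r\n".
corrupted = compressed.replace(b"\n", b"\r\n")


def try_decompress(data):
    """Return the decompressed bytes, or the exception gzip/zlib raised."""
    try:
        return gzip.decompress(data)
    except (zlib.error, OSError, EOFError) as exc:
        return exc


print(try_decompress(compressed) == html)  # binary data survives intact
print(repr(try_decompress(corrupted)))     # translated data fails to decompress
```

Opening the temporary file with "wb" instead of "w" avoids the translation. On Linux/Unix, Python 2 treats "w" and "wb" the same, so the original script may behave differently there.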
 
R

rushik

I got the solution: I am now using urllib.urlretrieve.

thx,
R
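urllib.urlretrieve works here because it saves the response to disk in binary mode. For reference, a minimal Python 3 sketch of two equivalent approaches (urllib2 became urllib.request in Python 3; the helper names fetch_gz_html and fetch_gz_html_via_file are hypothetical, and the page is assumed to be UTF-8): decompress the downloaded bytes in memory with gzip.decompress, or save a binary copy with urllib.request.urlretrieve and read it back with gzip.open.

```python
import gzip
import urllib.request


def fetch_gz_html(url, encoding="utf-8"):
    """Hypothetical helper: download a .gz URL and decompress it in memory."""
    with urllib.request.urlopen(url) as response:
        compressed = response.read()  # raw bytes; no text-mode file involved
    return gzip.decompress(compressed).decode(encoding)


def fetch_gz_html_via_file(url, path, encoding="utf-8"):
    """Hypothetical helper: save the .gz file to disk first, then decompress it."""
    urllib.request.urlretrieve(url, path)  # writes the download in binary mode
    with gzip.open(path, "rt", encoding=encoding) as f:
        return f.read()
```

For example, fetch_gz_html("http://xyz.com/test.html.gz") would return the decompressed page as a string. The in-memory version skips the temporary file entirely; the Python 3 documentation lists urlretrieve under its legacy interface, so the first helper is probably the safer choice for new code.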