extra gibberish interspersed into urllib2 output


Dan Stromberg

I'm attempting to retrieve some data from an http server using basic auth
via python 2.3 with the urllib2 and cookielib modules.
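For readers following along, the setup described here might look roughly like the sketch below. It is a minimal, hypothetical reconstruction rather than the poster's actual code, and it assumes Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar. The helper name make_opener and any URL or credentials passed to it are placeholders.

```python
import http.cookiejar
import urllib.request


def make_opener(url, user, password):
    """Return an opener that sends HTTP basic auth credentials and keeps cookies."""
    # Register the (hypothetical) credentials for this URL; a realm of None
    # means they are offered for whatever realm the server names.
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(passwords)

    # Keep cookies between requests (Python 3 counterpart of cookielib.CookieJar).
    cookie_handler = urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar())

    return urllib.request.build_opener(auth_handler, cookie_handler)
```

Hypothetical usage: opener = make_opener("http://example.com/data", "user", "secret"), then opener.open("http://example.com/data").read() returns the response body as bytes.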

I'm finding that I'm getting the data I need, but unfortunately, there are
small bits of gibberish interspersed in it, rendering the data difficult
to use at best. For example:

p06,128.200.73.146,foobar,,,,,ES Servers,,li,,,,,
p07,128.200.73.147,foobar,,,,,
ffb
ES Servers,,li,,,,,
webmail2,128.200.224.22,foobar ,,,,,ES Servers,,li,blackhole:1,,,,

IOW, that "ffb" does not belong in the middle of the 2nd line of what
should be a 3 line snippet. There are also some spurious carriage returns
in there I believe, which may not show up in this message.

Has anyone seen this before? Is it premature to start using urllib2 from
python 2.4? Is it a bad idea to use this 2.4 module on python 2.3?

BTW, when I cut out the cookielib stuff, I still get the same strange
results.

On a somewhat bizarre note, Mozilla is also unable to display this page,
although it simply shows no content instead of adding in nonsense. The
text-mode web browser links, however, displays the content of the page
just as it should.

TIA for any suggestions you can offer.
 

Andrew Dalke

Dan said:
I'm attempting to retrieve some data from an http server using basic auth
via python 2.3 with the urllib2 and cookielib modules.

I'm finding that I'm getting the data I need, but unfortunately, there are
small bits of gibberish interspersed in it, rendering the data difficult
to use at best. ..
Has anyone seen this before? Is it premature to start using urllib2 from
python 2.4? Is it a bad idea to use this 2.4 module on python 2.3?

I've been using urllib2 under Python 2.4 and haven't seen
problems. As far as I know, the code hasn't changed much
in years.

Have you tried doing the request manually? That is,

%telnet machine 80
GET /asdf HTTP/1.0
..put cookie and auth information here..


Doing that would help show whether the problem is coming
from upstream of Python or from Python itself.
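The same manual request can also be made from Python with a raw socket, bypassing urllib2 entirely. This is a minimal sketch assuming Python 3; build_request and fetch_raw are hypothetical helper names, and the host, path, credentials, and cookie you pass in are placeholders. fetch_raw returns the response exactly as it came off the wire, headers and any chunk framing included.

```python
import base64
import socket


def build_request(host, path, user=None, password=None, cookie=None):
    """Build the raw bytes of an HTTP/1.0 GET request, like the telnet session."""
    lines = ["GET %s HTTP/1.0" % path, "Host: %s" % host]
    if user is not None:
        # Basic auth sends base64("user:password") in the Authorization header.
        creds = ("%s:%s" % (user, password)).encode("latin-1")
        lines.append("Authorization: Basic %s" % base64.b64encode(creds).decode("ascii"))
    if cookie is not None:
        lines.append("Cookie: %s" % cookie)
    # HTTP uses CRLF line endings; an empty line ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")


def fetch_raw(host, path, port=80, **kwargs):
    """Send the request and return the undecoded response, headers included."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(host, path, **kwargs))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)
```

Because this request declares HTTP/1.0, a compliant server should not use chunked transfer coding in its reply (RFC 2616 forbids sending transfer-codings to HTTP/1.0 clients). If hex lines such as ffb still show up in the raw output, that would point at the server rather than at Python.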

To go really hard-core, you could get Ethereal or some
other network sniffer and watch exactly what Python
does. That's easier in some sense, because you don't
need to figure out what to send for the request headers.
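A lighter-weight alternative to a sniffer, swapped in here rather than suggested in the thread, is to let the HTTP library print its own traffic. This sketch assumes Python 3, where urllib.request plays urllib2's role; passing debuglevel=1 to HTTPHandler makes http.client echo the request it sends and the response headers it receives for plain http:// URLs. Unlike a sniffer it does not show the raw body framing, but a "Transfer-Encoding: chunked" response header would be a useful hint. make_debug_opener is a hypothetical name.

```python
import urllib.request


def make_debug_opener():
    """Return an opener whose HTTP handler prints request and response headers."""
    # An explicit HTTPHandler instance replaces the default one that
    # build_opener would otherwise add; debuglevel=1 turns on http.client's
    # "send:", "reply:" and "header:" output on standard output.
    return urllib.request.build_opener(urllib.request.HTTPHandler(debuglevel=1))
```

Hypothetical usage: make_debug_opener().open("http://example.com/").read() prints the exchange while returning the body as usual.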


Andrew
 

Fredrik Lundh

Dan said:
I'm finding that I'm getting the data I need, but unfortunately, there are
small bits of gibberish interspersed in it, rendering the data difficult
to use at best. For example:

p06,128.200.73.146,foobar,,,,,ES Servers,,li,,,,,
p07,128.200.73.147,foobar,,,,,
ffb
ES Servers,,li,,,,,
webmail2,128.200.224.22,foobar ,,,,,ES Servers,,li,blackhole:1,,,,

IOW, that "ffb" does not belong in the middle of the 2nd line of what
should be a 3 line snippet. There are also some spurious carriage returns
in there I believe, which may not show up in this message.

Someone reported a similar problem on the XML-SIG a while ago.

In that case, as in this one, the extra characters are hexadecimal
numbers, which could mean that urllib, or some server out there,
isn't handling HTTP chunking properly:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1
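To make the chunking explanation concrete: in a chunked body, each chunk is preceded by its size in hexadecimal on a line of its own and followed by CRLF, and a zero-size chunk ends the body. A client that skips this decoding step would leave sequences like "\r\nffb\r\n" embedded in the data, which would also account for the spurious carriage returns. Hex ffb is 4091 bytes, and since chunks are sized in bytes rather than records, a boundary can land in the middle of a CSV line. Below is a minimal decoder sketch, assuming Python 3 and a body already separated from the response headers; decode_chunked is a hypothetical name, and trailer headers are ignored.

```python
def decode_chunked(raw):
    """Decode an HTTP/1.1 chunked body (RFC 2616, section 3.6.1).

    `raw` is the message body as bytes, after the response headers.
    Trailer headers following the last chunk are ignored.
    """
    body = bytearray()
    pos = 0
    while True:
        # Chunk-size line: hex digits, optionally followed by ";extensions".
        line_end = raw.index(b"\r\n", pos)
        size = int(raw[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            # Last chunk reached; stop before any trailer headers.
            break
        body += raw[pos:pos + size]
        # Skip the chunk data and the CRLF that follows it.
        pos += size + 2
    return bytes(body)
```

If the data you receive still contains bare hex lines such as ffb, it has not been through this step; running it through a decoder like this one is a quick way to test the chunking hypothesis.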

</F>
 
