why not in python 2.4.3

Discussion in 'Python' started by Rocco, May 28, 2006.

  1. Rocco

    Rocco Guest

    Hi,
    I made the upgrade to Python 2.4.3 from 2.4.2.
    I want to take from Google News some Atom feeds with a function like
    this:
    import urllib2
    def takefeed(url):
        request = urllib2.Request(url)
        request.add_header('User-Agent',
                           'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
        opener = urllib2.build_opener()
        data = opener.open(request).read()
        return data
    url = 'http://news.google.it/?output=rss'
    d = takefeed(url)
    This woks well with python 2.3.5 but does not work with 2.4.3.
    Why?
    Thanks
    Rocco, May 28, 2006
    #1

  2. Rocco

    Carl Banks Guest

    Rocco wrote:
    > Hi,
    > I made the upgrade to Python 2.4.3 from 2.4.2.
    > I want to take from Google News some Atom feeds with a function like
    > this:
    > import urllib2
    > def takefeed(url):
    >     request = urllib2.Request(url)
    >     request.add_header('User-Agent',
    >                        'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
    >     opener = urllib2.build_opener()
    >     data = opener.open(request).read()
    >     return data
    > url = 'http://news.google.it/?output=rss'
    > d = takefeed(url)
    > This woks well with python 2.3.5 but does not work with 2.4.3.
    > Why?


    Define "woks [sic] well". It works fine for me on 2.4.3 (and by "works
    fine" I mean it ran without an exception and it returned what appeared
    to be RSS data). If you would give us an exception trace it would help
    a lot.

    Maybe Google's server (or your ISP's) was down. That happens
    sometimes.

    Carl
    Carl Banks, May 28, 2006
    #2

  3. Rocco

    Rene Pijlman Guest

    Rocco:
    >but does not work with 2.4.3.


    Define "does not work".

    --
    René Pijlman
    Rene Pijlman, May 28, 2006
    #3
  4. Rocco

    Rocco Guest

    This is the problem when I run the function.
    This is the result from 2.3.5:
    >>> print rss

    <?xml version="1.0" encoding="UTF-8"?><feed version="0.3" xml:lang="it"
    xmlns="http://purl.org/atom/ns#"><generator>NFE/1.0</generator><title>Google
    News Italia</title><link rel="alternate" type="text/html"
    href="http://news.google.it/"/><tagline>Google News
    Italia</tagline><author><name>Google
    Inc.</name><email></email></author><copyright>&amp;copy;2006
    Google</copyright><modified>2006-05-28T19:09:13+00:00</modified>
    <!-- A couple notes:
    * add an "output=atom" param to get Atom
    * section pages have a "topic=?" param;
    use "topic=h" for a Top Stories section.
    --><entry><title>Benedetto XVI: Wojtyla santo subito - Libertà
    </title><link rel="alternate" type="text/html"
    href="http://www.liberta.it/default.asp?IDG=605282024"/><id>tag:news.google.com,2005:cluster=41b535fb</id><summary>Prima
    pagina</summary><issued>2006-05-28T11:05:00+00:00</issued><modified>2006-05-28T11:05:00+00:00</modified><content
    type="text/html" mode="escaped">&lt;br&gt;&lt;table border=0 align=
    cellpadding=5 cellspacing=0&gt;&lt;tr&gt;&lt;td width=80 align=center
    valign=top&gt;&lt;a .....

    >>> import sys
    >>> sys.getdefaultencoding()

    'ascii'
    >>>

    This is the result with 2.4.3:
    >>> print rss

    ヒ
    >>> rss

    '\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff\xe5}Ks\xe3F\xb6\xe6\xfeF\xdc\xff\x90\xd77\xba\xc3\x9e\x10D\xbc\x01\xcaU\xee\xa1\x9eM[\xa2\xd4$\xabl\xf7\x86\x93\x04\x93Tv\x81H\x1a\x0fV\xa9V\xfe\x0f3\x9b\x8e\x98\x89\xb8\xcb\x1b\xd1\xb3\x9a\xddD\xef\xec\x7f\xe2_2\xe7$\x00\x8a/\x11|\x93\xd6\xb4\xa3U"\x04\x02\x99\xe7d\x9e<\xdfy\xbe\xf9\xd3\xa7\xbeO\x86,\x8c\xb8\x08\xde~\xa1\x9d\xaa_\x10\x16x\xa2\xc3\x83\xde\xdb/\xde5\xaf\x15\xf7\x8b?}\xf3&\x8c\xa2\xe7\x9bt\xb8\xe9\x9b7\xde#\r\x02\xe6\x7f\xf3\xa6\xc7\x02\x16\xd2X\x84\xdf\xd4\xae\xafJ\xf0\x847\xa5\xe7Kob\x1e\xfb\xec\x9b\x1b!z>#5\xf61"\xd5\x98\xfa\x9c\xbe)\xa5\x7fy\xe3\xf3\xe0\xc37\x8fq<8+\x95\x02\xf8\xfbiO\xde{\xca\xe3\xd2\x9b\x92\xfc\xe3\x9b\x0e\x8b\xbc\x90\x0fbx\xfb\xdc\'\x8d\xff\xfd\x8dO\x83^B{\xec\x1b\x1e\xc3\xf7\xf3\x0fo>\xb2\xf6\x1d\x8db\x16~\x83/Q\xba\x8cu\xda\xd4\xfb\xf0_\xb3\xb7y\xa2\xff\xa6\xf4|\xcf\x1bO\x0c\x9eB\xde{\x8c\xbf\xf9#\xed\x0f\xbe\xc6\x8f_\xeb\xaaj\x93\xf4\xfdoJ\xcf7\xbc\x19$\xedK\x1a\xb3o\x1aIpBt\x97\xdc\xd1\'"\xef\xd5\xb53\xcd<3\x1crs\xd7|S\xcao\x83\x11F\xf1y\xc2\xfd\xce2\xdf\x9a\xbc\xf9_\xff\xe5\xcd\xbf)\n\xa9\x10O$\x03
    C b\x16\x9d\xfd\xeb\xbf\x10\xfc\xdf\x7f!\xb4\xd3!4
    _\x88$\x1e$\xf1[\xe0\xda\x17d@C\xda\'\xb1
    =\x16\x93z\xa31\xba9b\x1eR\x0cn\xe8\xb1\x88<\xd2!#\x94|\x11\x8b\x01\xf7\xde\xfe)\xfb\xde\xd7\xd9\xdd\x84$\x11\xcb\xff\xf8\xf8\x05\xe9\x8a\x10nn\x8a\x01i\x00\x979|?{\xda\xe9\xbf\xfe\x8b\xa2|\xf3\x86\xf7%\xd1\x0b\x99\x9f\x84\xfe<\xde\x037J<\x88\xfd\x12\x8f[\xb0\x0e\xe4\xd3"yG+dp\x17\xef\xbe)\xe1W\x97Y<\xa5l,<f\xfd|D\x15\xf2=\xed\x88\x8f\xdcc\'\xc4\xe3q\xfc\xeb\x7f\x90\x80\xc2\xc8\x18\xe9p\xf2\x1d\r\x85\x7fB\xb8O\x1eD\x10\xb3.\xdc\x05\xd4!\xb4\xdbea\x1f\x1659==%\n\xa9\xfa\xa4\xc9\xfa\x03\xb1\xccB\xc6\x8f8\xe0?E\xf4m3]P\xf1[\xb8\xae*\xf0\x9f\xfc\xdc\xed\xbc\xad\xcb_\xe0\xae\xb7\xd9C>~\xfcx\xca\xfd\x18_\x82\x0f\xa1\x83A(\xba"\xe8\xf0>\x0bb\x0e\x04\xea\xb0O\xa74\x

    >>> import sys
    >>> sys.getdefaultencoding()

    'latin_1'
    >>>

    No exception trace
    Thanks again
    Rocco, May 28, 2006
    #4
  5. On 28 May 2006 14:22:55 -0700, "Rocco" <>
    declaimed the following in comp.lang.python:

    > >>> import sys
    > >>> sys.getdefaultencoding()

    > 'latin_1'
    > >>>

    > No exception trace
    > Thanks again


    PythonWin 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)]
    on win32.
    Portions Copyright 1994-2004 Mark Hammond () -
    see 'Help/About PythonWin' for further copyright information.
    >>> import sys
    >>> sys.getdefaultencoding()

    'ascii'
    >>>


    Off-hand -- I'd say it is a problem with your installation... I
    don't know -- some site default package changing encoding, perhaps?
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
    Dennis Lee Bieber, May 28, 2006
    #5
  6. Rocco

    Serge Orlov Guest

    Rocco wrote:

    > >>> import sys
    > >>> sys.getdefaultencoding()

    > 'latin_1'


    Don't change the default encoding. It should always be ascii.
    Serge Orlov, May 29, 2006
    #6
  7. Rocco

    Rocco Guest

    Also with ascii the function does not work.
    Rocco, May 29, 2006
    #7
  8. Rocco

    Serge Orlov Guest

    Rocco wrote:
    > Also with ascii the function does not work.


    Well, at least you fixed the misconfiguration ;)

    Googling for 1F8B (the first two bytes of your strange Python 2.4
    result) gives a hint: it's the beginning of a gzip stream. Maybe urllib2 in
    Python 2.4 tells the server that it supports compressed data but
    doesn't decompress it when it receives the reply?
    Serge Orlov, May 29, 2006
    #8
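
The magic-number check described in the post above can be sketched in a few lines. This is only an illustration, written for Python 3 rather than the Python 2.4 used in the thread; the helper name `looks_like_gzip` and both sample payloads are made up for the example.

```python
import gzip

# Every gzip stream starts with the two bytes 0x1f 0x8b (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Hypothetical payloads, standing in for what a server might return.
compressed = gzip.compress(b"<rss version='2.0'></rss>")
plain = b"<?xml version='1.0' encoding='UTF-8'?><rss/>"

print(looks_like_gzip(compressed))
print(looks_like_gzip(plain))
```

A check like this only says that the data looks compressed; it does not explain why the server chose to compress it.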
  9. Rocco

    Rocco Guest

    Thanks Serge.
    It's a gzip string.
    So the code is
    >>> import urllib2
    >>> def takefeed(url):
            request = urllib2.Request(url)
            request.add_header('User-Agent',
                               'Mozilla/4.0 (compatible; MSIE 5.5;Windows NT')
            opener = urllib2.build_opener()
            data = opener.open(request).read()
            return data

    >>> url='http://news.google.it/?output=rss'
    >>> d=takefeed(url)
    >>> from StringIO import StringIO
    >>> zipdata=StringIO(d)
    >>> import gzip
    >>> gz=gzip.GzipFile(fileobj=zipdata)
    >>> rss=gz.read()
    >>> len(rss)

    102529
    >>> print rss[0:100]

    <?xml version="1.0" encoding="UTF-8"?><rss
    version="2.0"><channel><generator>NFE/1.0</generator><tit
    >>>
    Rocco, May 29, 2006
    #9
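
The session above decompresses unconditionally, which would fail for the posters in this thread who received plain, uncompressed data from the same URL. A more defensive variant might look like the sketch below. It is written for Python 3, makes no network call, and `decode_body` is a hypothetical helper name, not part of any library.

```python
import gzip
import io


def decode_body(raw, content_encoding=None):
    """Return the decompressed body if it is gzip data, else raw unchanged.

    content_encoding is the value of the Content-Encoding response header,
    if the caller has it; the magic-number check covers the case where the
    header is missing or wrong.
    """
    declared = (content_encoding or "").strip().lower() in ("gzip", "x-gzip")
    if declared or raw[:2] == b"\x1f\x8b":
        # Same approach as in the post above: wrap the bytes in a file-like
        # object and let GzipFile do the decompression.
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gz:
            return gz.read()
    return raw


# Made-up feed content used only to exercise both code paths.
feed = b"<?xml version='1.0' encoding='UTF-8'?><rss version='2.0'></rss>"
print(decode_body(gzip.compress(feed)) == feed)
print(decode_body(feed) == feed)
```

With a real Python 3 response object, `raw` would come from `response.read()` and the header value from `response.headers.get("Content-Encoding")`.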
  10. Rocco

    John Machin Guest

    On 29/05/2006 10:47 PM, Serge Orlov wrote:
    > Rocco wrote:
    >> Also with ascii the function does not work.

    >
    > Well, at least you fixed the misconfiguration ;)
    >
    > Googling for 1F8B (the first two bytes of your strange Python 2.4
    > result) gives a hint: it's the beginning of a gzip stream.


    Well done!

    > Maybe urllib2 in
    > Python 2.4 tells the server that it supports compressed data but
    > doesn't decompress it when it receives the reply?
    >


    Something funny is happening here. Others reported it working with 2.4.3
    and Rocco's original code as posted in this thread -- which works for me
    on 2.4.2, Windows XP.

    There was one suss thing about Rocco's problem description:
    First message ended with d=takefeed(url)
    But next message said print rss
    Is rss == d?

    Cheers,
    John
    John Machin, May 30, 2006
    #10
  11. Rocco

    John Machin Guest

    On 30/05/2006 12:44 AM, Rocco wrote:
    > Thanks Serge.
    > It's a gzip string.


    Look, Ma, no gzip!!!

    C:\junk>rocco_rss.py
    '<?xml version="1.0" encoding="UTF-8"?><rss
    version="2.0"><channel><generator>NFE/1.0</generator><tit'

    C:\junk>type rocco_rss.py
    import urllib2
    def takefeed(url):
        request = urllib2.Request(url)
        request.add_header('User-Agent',
                           'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
        opener = urllib2.build_opener()
        data = opener.open(request).read()
        return data
    url = 'http://news.google.it/?output=rss'
    d = takefeed(url)
    print repr(d[:100])
    John Machin, May 30, 2006
    #11
  12. Rocco

    Serge Orlov Guest

    John Machin wrote:
    > On 29/05/2006 10:47 PM, Serge Orlov wrote:
    > > Maybe urllib2 in
    > > Python 2.4 tells the server that it supports compressed data but
    > > doesn't decompress it when it receives the reply?
    > >

    >
    > Something funny is happening here. Others reported it working with 2.4.3
    > and Rocco's original code as posted in this thread -- which works for me
    > on 2.4.2, Windows XP.


    It "works" for me too, returning raw uncompressed data.

    > There was one suss thing about Rocco's problem description:
    > First message ended with d=takefeed(url)
    > But next message said print rss
    > Is rss == d?


    Nope. If you look at the tags, the 2.3 code returns <feed> <generator> ...
    whereas the 2.4 code returns <rss> <channel> <generator> ... That may
    explain why the 2.3 result is not compressed and the 2.4 result is,
    but it doesn't explain why the 2.4 result *is* compressed. I looked at
    Python 2.4's httplib and I'm sure it's not the problem. Quote from httplib:

        # we only want a Content-Encoding of "identity" since we don't
        # support encodings such as x-gzip or x-deflate.

    I think there is a web accelerator sitting somewhere between Rocco and
    the Google server that is confused because Rocco is "misinforming" the web
    server, saying he's using Internet Explorer while at the same time claiming
    that he cannot handle compressed data. That's why they teach little kids:
    don't lie :)
    Serge Orlov, May 30, 2006
    #12
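
The httplib comment quoted above refers to the `Accept-Encoding: identity` request header. As a rough sketch, assuming Python 3's `urllib.request` rather than the thread's `urllib2`, a request can state that header explicitly and be inspected without being sent. The function name `build_feed_request` and the User-Agent value are invented for the example.

```python
import urllib.request


def build_feed_request(url):
    """Build (but do not send) a request that asks for an uncompressed body."""
    request = urllib.request.Request(url)
    # Illustrative User-Agent; the thread's original string imitates MSIE 5.5.
    request.add_header("User-Agent", "feed-fetcher-example/0.1")
    # State explicitly that only the identity (uncompressed) encoding is wanted.
    request.add_header("Accept-Encoding", "identity")
    return request


req = build_feed_request("http://news.google.it/?output=rss")
# Request stores header names in capitalized form, e.g. "Accept-encoding".
print(req.get_header("Accept-encoding"))
print(req.get_header("User-agent"))
```

If an intermediary ignores the header, as suspected in the post above, the body may still arrive compressed, so checking the first two bytes for 1F8B remains a reasonable safeguard.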