urllib2 (py2.6) vs urllib.request (py3)

Discussion in 'Python' started by R. David Murray, Mar 17, 2009.

  1. mattia <> wrote:
    > Hi all, can you tell me why the module urllib.request (py3) adds extra
    > characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example like the
    > following, while urllib2 (py2.6) correctly does not?
    >
    > py2.6
    > >>> import urllib2
    > >>> f = urllib2.urlopen("http://www.google.com").read()
    > >>> fd = open("google26.html", "w")
    > >>> fd.write(f)
    > >>> fd.close()
    >
    > py3
    > >>> import urllib.request
    > >>> f = urllib.request.urlopen("http://www.google.com").read()
    > >>> with open("google30.html", "w") as fd:
    > ...     print(f, file=fd)
    > ...
    > >>>
    >
    > Opening the two html pages with ff I've got different results (the extra
    > characters mentioned earlier), why?


    The problem isn't a difference between urllib2 and urllib.request, it
    is between fd.write and print. This produces the same result as
    your first example:


    >>> import urllib.request
    >>> f = urllib.request.urlopen("http://www.google.com").read()
    >>> with open("temp3.html", "wb") as fd:
    ...     fd.write(f)
    ...


    The "b'....'" is the string representation (the repr) of a bytes
    object, which is what read() on a urllib.request response returns in
    python3. Note the 'wb', which is a critical difference from the
    python2.6 case. If you omit the 'b' in python3, fd.write will raise a
    TypeError, because you can't write bytes to a text-mode file object.

    The thing to keep in mind is that print converts its argument to string
    before writing it anywhere (that's the point of using it), and that
    bytes (or buffer) and string are very different types in python3.
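
The difference between print and fd.write can be reproduced without touching the network. The following is a minimal sketch, assuming a small hand-written bytes value as a stand-in for what urlopen(...).read() returns; the file names and the UTF-8 encoding are illustrative assumptions, not something taken from the page in question.

```python
import os
import tempfile

# Stand-in for the bytes returned by urlopen(...).read(); no network needed.
data = b"<html>\r\n<body>hi</body>\r\n</html>"

# print() calls str() on its argument; for bytes that yields the repr,
# including the b'' wrapper and the escaped \r\n sequences.
as_text = str(data)

tmpdir = tempfile.mkdtemp()
bin_path = os.path.join(tmpdir, "page_binary.html")  # hypothetical file name
txt_path = os.path.join(tmpdir, "page_text.html")    # hypothetical file name

# Binary mode: the bytes are written unchanged.
with open(bin_path, "wb") as fd:
    fd.write(data)

# Text mode rejects bytes with a TypeError.
text_mode_error = None
try:
    with open(txt_path, "w") as fd:
        fd.write(data)
except TypeError as exc:
    text_mode_error = exc

# Text mode works once the bytes are decoded explicitly.
# UTF-8 is an assumption here; a real page may declare another encoding.
with open(txt_path, "w", encoding="utf-8", newline="") as fd:
    fd.write(data.decode("utf-8"))

with open(bin_path, "rb") as fd:
    round_trip = fd.read()
with open(txt_path, "rb") as fd:
    decoded_round_trip = fd.read()
```

Here as_text starts with b' because it is the repr of the bytes object, which is what print(f, file=fd) puts into the file, while round_trip is byte-for-byte identical to data.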

    --
    R. David Murray http://www.bitdance.com
    R. David Murray, Mar 17, 2009
    #1

  2. mattia <> wrote:
    > On Tue, 17 Mar 2009 10:55:21 +0000, R. David Murray wrote:
    >
    > > mattia <> wrote:
    > >> Hi all, can you tell me why the module urllib.request (py3) adds extra
    > >> characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example like the
    > >> following, while urllib2 (py2.6) correctly does not?
    > >>
    > >> py2.6
    > >> >>> import urllib2
    > >> >>> f = urllib2.urlopen("http://www.google.com").read()
    > >> >>> fd = open("google26.html", "w")
    > >> >>> fd.write(f)
    > >> >>> fd.close()
    > >>
    > >> py3
    > >> >>> import urllib.request
    > >> >>> f = urllib.request.urlopen("http://www.google.com").read()
    > >> >>> with open("google30.html", "w") as fd:
    > >> ...     print(f, file=fd)
    > >> ...
    > >> >>>
    > >>
    > >> Opening the two html pages with ff I've got different results (the
    > >> extra characters mentioned earlier), why?
    > >
    > > The problem isn't a difference between urllib2 and urllib.request, it is
    > > between fd.write and print. This produces the same result as your first
    > > example:
    > >
    > >
    > > >>> import urllib.request
    > > >>> f = urllib.request.urlopen("http://www.google.com").read()
    > > >>> with open("temp3.html", "wb") as fd:
    > > ...     fd.write(f)
    > >
    > >
    > > The "b'....'" is the stringified representation of a bytes object, which
    > > is what urllib.request returns in python3. Note the 'wb', which is a
    > > critical difference from the python2.6 case. If you omit the 'b' in
    > > python3, it will complain that you can't write bytes to the file object.
    > >
    > > The thing to keep in mind is that print converts its argument to string
    > > before writing it anywhere (that's the point of using it), and that
    > > bytes (or buffer) and string are very different types in python3.

    >
    > Well... now in the saved file I've got the extra characters "fef" at
    > the beginning and "0" at the end...


    The 'fef' is reminiscent of a BOM. I don't see any such thing in the
    data file produced by my code snippet above. Did you run that exactly
    as posted, or did you modify your code? If the latter, post your exact
    code and I can try to run it and see if I can figure out what is going on.

    I'm far from an expert in unicode issues, by the way :) Also, I'm running
    3.1a1+ from svn, so it is possible there's been a bug fix of some sort.
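
One way to test the BOM guess is to look at the raw leading bytes of the saved file. The ASCII text 'fef' is the bytes 0x66 0x65 0x66, while the usual BOMs are EF BB BF (UTF-8), FF FE and FE FF (UTF-16), so a plain startswith check separates the two cases. The sketch below also checks a second possibility that is purely an assumption and not confirmed anywhere in this thread: a hex number followed by \r\n at the start, plus \r\n0\r\n\r\n at the end, has the same shape as HTTP chunked transfer-encoding framing that was not stripped. The sample bytes and the helper names leading_bom and looks_chunked are hypothetical.

```python
import codecs

# Hypothetical stand-in for the contents of the saved file.
saved = b"fef\r\n<!doctype html>...\r\n0\r\n\r\n"

BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}


def leading_bom(data):
    """Return the name of the BOM that data starts with, or None."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None


def looks_chunked(data):
    """Heuristic only: hex size line at the start, zero-size chunk at the end."""
    head, sep, _ = data.partition(b"\r\n")
    if not sep:
        return False
    try:
        # The first line must parse as a hexadecimal chunk size.
        int(head.decode("ascii"), 16)
    except ValueError:
        return False
    return data.endswith(b"\r\n0\r\n\r\n")


bom_found = leading_bom(saved)
chunked_like = looks_chunked(saved)
```

If leading_bom returns None and looks_chunked returns True for the real file, the extra characters are more likely leftover transfer framing than a BOM, which would also fit a library bug that a later version fixed.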

    --
    R. David Murray http://www.bitdance.com
    R. David Murray, Mar 17, 2009
    #2
