A question about unicode() function

Discussion in 'Python' started by JTree, Dec 31, 2006.

  1. JTree

    JTree Guest

    Hi all,
    I ran into a problem when using the unicode() function on a fetched
    webpage, and I don't know why it happened.
    My code and the error message are:


    Code:
    #!/usr/bin/python
    #Filename: test.py
    #Modified: 2006-12-31

    import cPickle as p
    import urllib
    import htmllib
    import re
    import sys

    def funUrlFetch(url):
        lambda url:urllib.urlopen(url).read()

    objUrl = raw_input('Enter the Url:')
    content = funUrlFetch(objUrl)
    content = unicode(content,"gbk")
    print content
    content.close()


    error message:

    C:\WINDOWS\system32\cmd.exe /c python test.py
    Enter the Url:http://www.msn.com
    Traceback (most recent call last):
    File "test.py", line 16, in ?
    content = unicode(content,"gbk")
    TypeError: coercing to Unicode: need string or buffer, NoneType found
    shell returned 1
    Hit any key to close this window...

    Any suggestions would be appreciated!

    Thanks!
     
    JTree, Dec 31, 2006
    #1

  2. Re: A question about unicode() function

    On 31 Dec 2006 05:20:10 -0800, JTree <> wrote:
    > def funUrlFetch(url):
    >     lambda url:urllib.urlopen(url).read()


    This function only creates a lambda function (that is not used or
    assigned anywhere), nothing more, nothing less. Thus, it returns None
    (sort of "void") no matter what its argument is. Probably you meant
    something like

    def funUrlFetch(url):
        return urllib.urlopen(url).read()

    or

    funUrlFetch = lambda url:urllib.urlopen(url).read()
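
    The difference between the three spellings can be checked without any network access. The following is only a minimal sketch: fetch() is a hypothetical stand-in for urllib.urlopen(url).read(), and the other function names are made up for illustration.

```python
def fetch(url):
    # Hypothetical stand-in for urllib.urlopen(url).read(); no network needed.
    return "<html>%s</html>" % url


def broken_fetch(url):
    # The lambda object is created and thrown away. There is no return
    # statement, so calling this function yields None.
    lambda url: fetch(url)


def fixed_fetch(url):
    # Explicit return: the caller receives the page content.
    return fetch(url)


# Equivalent one-liner: bind the lambda to a name instead of discarding it.
lambda_fetch = lambda url: fetch(url)

print(broken_fetch("http://example.com"))
print(fixed_fetch("http://example.com"))
print(lambda_fetch("http://example.com"))
```

    Only the first call prints None; the other two print the same string.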


    > objUrl = raw_input('Enter the Url:')
    > content = funUrlFetch(objUrl)


    content gets assigned None. Try putting "print content" before the unicode line.

    > content = unicode(content,"gbk")


    This, equivalent to unicode(None, "gbk"), leads to

    > TypeError: coercing to Unicode: need string or buffer, NoneType found


    None is neither a string nor a buffer, so unicode() complains.

    See ya,

    --
    Felipe.
     
    Felipe Almeida Lessa, Dec 31, 2006
    #2

  3. JTree

    JTree Guest

    Re: A question about unicode() function

    Hi,

    I changed my code to:

    #!/usr/bin/python
    #Filename: test.py
    #Modified: 2007-01-01

    import cPickle as p
    import urllib
    import htmllib
    import re
    import sys

    funUrlFetch = lambda url:urllib.urlopen(url).read()

    objUrl = raw_input('Enter the Url:')
    content = funUrlFetch(objUrl)
    content = content.encode('gb2312','ignore')
    print content
    content.close()

    I used "ignore" to deal with the data loss, but it still caused an
    error:

    C:\WINDOWS\system32\cmd.exe /c python tianya.py
    Enter the Url:http://www.tianya.cn
    Traceback (most recent call last):
    File "tianya.py", line 17, in ?
    content = content.encode('gb2312','ignore')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position
    88: ordinal not in range(128)
    shell returned 1
    Hit any key to close this window...

    My Python version is 2.4. Does it have some problems with asian
    encoding support?

    Thanks!


     
    JTree, Jan 1, 2007
    #3
  4. Tim Roberts

    Tim Roberts Guest

    "JTree" <> wrote:
    >def funUrlFetch(url):
    > lambda url:urllib.urlopen(url).read()
    >
    >objUrl = raw_input('Enter the Url:')
    >content = funUrlFetch(objUrl)
    >content = unicode(content,"gbk")
    >print content
    >content.close()


    Once you fix the lambda, as Felipe described, there's another issue here.
    You are telling the unicode function that the string you're passing it is
    an 8-bit string encoded as gbk. How do you know that? In your specific
    example, www.msn.com, I can guarantee it will produce the wrong results:
    www.msn.com is encoded in UTF-8.
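
    One way to avoid guessing is to read the charset the server declares in its Content-Type header and decode with that. The sketch below is written for Python 3, where bytes.decode() plays the role of unicode(); the helper name charset_from_content_type and the sample header and text are assumptions made for illustration, not part of the original script.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


# Sample text and header, invented for the demonstration.
text = u"caf\u00e9 \u4e2d\u6587"
utf8_bytes = text.encode("utf-8")

declared = charset_from_content_type("text/html; charset=UTF-8")

# Decoding with the declared charset recovers the text.
right = utf8_bytes.decode(declared)

# Decoding the same bytes as GBK does not: the result is different text.
wrong = utf8_bytes.decode("gbk", "replace")

print(declared)
print(right == text)
print(wrong == text)
```

    A page may also declare its encoding in a meta tag, or not at all, so the header is a starting point rather than a guarantee.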
    --
    Tim Roberts,
    Providenza & Boekelheide, Inc.
     
    Tim Roberts, Jan 1, 2007
    #4
  5. John Machin

    John Machin Guest

    Re: A question about unicode() function

    JTree wrote:
    > Hi,
    >
    > I changed my codes to:
    >
    > #!/usr/bin/python
    > #Filename: test.py
    > #Modified: 2007-01-01
    >
    > import cPickle as p
    > import urllib
    > import htmllib
    > import re
    > import sys
    >
    > funUrlFetch = lambda url:urllib.urlopen(url).read()
    >
    > objUrl = raw_input('Enter the Url:')
    > content = funUrlFetch(objUrl)
    > content = content.encode('gb2312','ignore')


    Why did you change what you had before? "content" is a str, encoded in
    gb2312 (according to the internal evidence). You are now pretending
    that it is unicode, and trying to encode it as gb2312. However because
    it is *not* unicode, Python tries to convert it to unicode first. What
    you have coded above is equivalent to:
    content = content.decode('ascii').encode('gb2312', 'ignore')

    and of course the *decode* fails, as the error message says:
    Unicode*Decode*Error: 'ascii' codec can't decode byte 0xbb in position
    88: ordinal not in range(128)

    It never got anywhere near the encode().

    So:
    If you want a str encoded in gb2312, leave it alone.
    If you want it in unicode, do this:
    ucontent = unicode(content, 'gb2312')
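
    The two steps can be made explicit. This is a minimal sketch in Python 3 syntax, where bytes has no encode() method at all, so the hidden ascii decode of Python 2 has to be written out by hand; try_decode and the sample bytes are made up for illustration.

```python
# Two Chinese characters encoded as gb2312, standing in for the page content.
original = u"\u4e2d\u6587"
gb_bytes = original.encode("gb2312")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes do not fit the codec."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# What the hidden first step amounts to: an ascii decode, which fails.
as_ascii = try_decode(gb_bytes, "ascii")

# Naming the real encoding works, and re-encoding round-trips.
as_gb2312 = try_decode(gb_bytes, "gb2312")

print(as_ascii)
print(as_gb2312 == original)
print(as_gb2312.encode("gb2312") == gb_bytes)
```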

    > print content


    Try print repr(content)
    It's much better for diagnostic purposes.


    > content.close()


    This will be your next problem; "content" refers to a str object or a
    unicode object -- they don't have a close() method !!
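
    The object that does have a close() method is the file-like response returned by urlopen(), not the string that read() returns. A minimal sketch of that distinction, using io.BytesIO as a stand-in for the response so no network is needed (read_and_close is a hypothetical helper name):

```python
import io


def read_and_close(stream):
    """Read everything from a file-like object, then close it."""
    try:
        return stream.read()
    finally:
        stream.close()


# io.BytesIO stands in for the object urlopen() would return.
fake_response = io.BytesIO(b"<html>hello</html>")
content = read_and_close(fake_response)

print(fake_response.closed)       # the stream has been closed
print(hasattr(content, "close"))  # the data read from it has no close()
```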

    >
    > I used "ignore" to deal with the data loss, but it still caused an
    > error:


    What data loss???

    >
    > C:\WINDOWS\system32\cmd.exe /c python tianya.py
    > Enter the Url:http://www.tianya.cn
    > Traceback (most recent call last):
    > File "tianya.py", line 17, in ?
    > content = content.encode('gb2312','ignore')
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position
    > 88: ordinal not in range(128)
    > shell returned 1
    > Hit any key to close this window...
    >
    > My python version is 2.4, Does it have some problems with asian
    > encoding support?


    "asian" is irrelevant. You would have got the same problem with just
    about any non-ascii encoding, including cp1252 and similar encodings
    commonly used in English-speaking countries and in western Europe. The
    only encoding support problem with 2.4 is that it can't read your mind.
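
    A quick way to see that the script is irrelevant: any byte above 0x7f breaks an ascii decode, whichever encoding produced it. The sample strings below are invented for the demonstration, and the sketch uses Python 3 syntax.

```python
def ascii_decodable(data):
    """Return True if every byte in data is valid ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# One short non-ASCII sample per encoding: Western European, Chinese, Russian.
samples = {
    "cp1252": u"caf\u00e9",
    "gb2312": u"\u4e2d\u6587",
    "koi8-r": u"\u043f\u0440\u0438\u0432\u0435\u0442",
}

results = dict(
    (name, ascii_decodable(text.encode(name))) for name, text in samples.items()
)
plain = ascii_decodable(u"hello".encode("cp1252"))

print(results)
print(plain)
```

    Every entry in results is False, while the pure-ASCII sample decodes fine.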


    By the way, you should upgrade to 2.5. It can't read your mind either,
    but it has more functionality :)

    HTH,
    John
     
    John Machin, Jan 1, 2007
    #5
  6. JTree

    JTree Guest

    Re: A question about unicode() function

    Thanks everyone!

    Sorry for my ambiguous question.
    I changed the code and now it works fine.



     
    JTree, Jan 1, 2007
    #6
  7. Paul Watson

    Paul Watson Guest

    Re: A question about unicode() function

    JTree wrote:
    > Thanks everyone!
    >
    > Sorry for my ambiguous question.
    > I changed the codes and now it works fine.


    So... How about posting the brief working code?
     
    Paul Watson, Jan 2, 2007
    #7
  8. JTree

    JTree Guest

    Re: A question about unicode() function

    Hi,
    I just removed the unicode() call from my code.
    As John Machin said, I had a wrong understanding of unicode and ascii.

    Paul Watson wrote:
    > So... How about posting the brief working code?
     
    JTree, Jan 3, 2007
    #8