Re: UnicodeEncodeError: 'ascii' codec can't encode character

Discussion in 'Python' started by akhil1988, Jul 16, 2009.

  1. akhil1988

    akhil1988 Guest

    Chris,

    Using

    print (u'line: %s' % line).encode('utf-8')

    the 'line' gets printed. However, I was using this print statement only
    for testing; my code actually operates on 'line', on which I call line =
    line.decode('utf-8') because 'line' is read as bytes from a stream.

    And if I use line = line.encode('utf-8'),

    I start getting another error:
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4561:
    ordinal not in range(128)
    at line = line.replace('<<', u'«').replace('>>', u'»')


    --Akhil

    Chris Rebert-6 wrote:
    >
    >> Chris Rebert-6 wrote:
    >>>
    >>> On Wed, Jul 15, 2009 at 9:34 PM, akhil1988 wrote:
    >>>>
    >>>> Hi!
    >>>>
    >>>> Can anyone please help me getting rid of this error:
    >>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in
    >>>> position 13: ordinal not in range(128)
    >>>>
    >>>> I am not a python programmer (though intend to start learning this
    >>>> wonderful language), I am just using a python script.
    >>>>
    >>>> After doing some search, I found that 0xb7 is a 'middle dot character'
    >>>> that is not interpreted by the python.
    >>>> Even after inserting text = text.replace('\u00b7', '') in the script,
    >>>> the problem still persists.
    >>>>
    >>>> Can anyone please tell me the easiest way to get rid of this?
    >>>
    >>> We'll need the full error traceback. The error message at the end is
    >>> just not enough information.
    >>> As to fixing it, google for "UnicodeEncodeError". You should find
    >>> about a million mailinglist threads on it.

    > On Wed, Jul 15, 2009 at 10:05 PM, akhil1988 wrote:
    >>
    >> Well,
    >> All I get is this traceback:
    >>
    >> File "./customWikiExtractor.py", line 492, in ?
    >> main()
    >> File "./customWikiExtractor.py", line 480, in main
    >> print >> sys.stdout, 'line: %s' % line
    >> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in
    >> position 13: ordinal not in range(128)
    >>
    >> I am giving a string to the python code as input, and python processes it
    >> like this:
    >>
    >> line = line.decode('utf-8').strip()
    >>
    >> After this when I do,
    >> print >> sys.stdout, 'line: %s' % line
    >> I get this Unicode error.

    >
    > Try this instead (the ">> sys.stdout" part is redundant):
    > print (u'line: %s' % line).encode('utf8')
    > #if your system doesn't use UTF-8, change as necessary
    >
    > Cheers,
    > Chris
    > --
    > http://blog.rebertia.com


    akhil1988, Jul 16, 2009
    #1

  2. Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

    >>>>> akhil1988 (a) wrote:

    >a> Chris,


    >a> Using


    >a> print (u'line: %s' % line).encode('utf-8')


    >a> the 'line' gets printed, but actually this print statement I was using just
    >a> for testing, actually my code operates on 'line', on which I use line =
    >a> line.decode('utf-8') as 'line' is read as bytes from a stream.


    >a> And if I use line = line.encode('utf-8'),


    >a> I start getting other error like
    >a> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4561:
    >a> ordinal not in range(128)
    >a> at line = line.replace('<<', u'«').replace('>>', u'»')


    You do a Unicode replace here, so line should be a unicode string.
    Therefore you have to do this before the line.encode('utf-8'), but after
    the decode('utf-8').
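To see where the second error comes from: in Python 2, calling .replace() with a unicode argument on a byte string makes Python first decode the byte string with the 'ascii' codec, and that fails on the first non-ASCII byte. The sketch below reproduces the same failure with an explicit decode so that it also runs on Python 3; the sample bytes and the helper name try_ascii_decode are made up for illustration.

```python
# UTF-8 bytes for u'caf\xe9'; 0xc3 is the first byte of the two-byte sequence.
raw = b'caf\xc3\xa9'


def try_ascii_decode(data):
    """Hypothetical helper: return the decode error message, or None on success."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Decoding as ASCII fails on byte 0xc3, just like the implicit decode does.
message = try_ascii_decode(raw)
print(message)

# Decoding with the encoding the data is actually in works.
text = raw.decode('utf-8')
```

So the bytes themselves are fine; the error only means that something tried to treat them as ASCII.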

    It might be better to use different variables for Unicode strings and
    byte strings to prevent confusion, like:

    # 'line' is read as bytes from a stream
    uline = line.decode('utf-8')
    uline = uline.replace('<<', u'«').replace('>>', u'»')
    line = uline.encode('utf-8')
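The same three steps can be wrapped in one function so the bytes/unicode boundary stays in a single place. This is only a sketch: the function name replace_guillemets and the sample input are assumptions, and the guillemets are written as escapes (u'\xab', u'\xbb') so the source file needs no encoding declaration.

```python
def replace_guillemets(raw_line):
    """Hypothetical helper: takes UTF-8 bytes, returns UTF-8 bytes."""
    # 1. bytes -> unicode text
    uline = raw_line.decode('utf-8')
    # 2. work on unicode only: '<<' becomes U+00AB, '>>' becomes U+00BB
    uline = uline.replace(u'<<', u'\xab').replace(u'>>', u'\xbb')
    # 3. unicode text -> bytes, ready to be written to a stream
    return uline.encode('utf-8')


# Sample input containing non-ASCII bytes (an accented e and a middle dot).
raw = b'<<caf\xc3\xa9>> \xc2\xb7 ok'
result = replace_guillemets(raw)
```

Everything inside the function after step 1 is unicode, so no implicit ASCII conversion can sneak in.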
    --
    Piet van Oostrum
    URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
    Piet van Oostrum, Jul 16, 2009
    #2
