Re: 'ascii' codec can't encode character u'\xf3'

Discussion in 'Python' started by oziko, Aug 17, 2004.

  1. oziko

    oziko Guest

    I solved the problem using

    print str.encode('iso-8859-1')

    Now I can print the tags with no apparent problem. But now when I tried to
    insert that value into a PostgreSQL database I get the same error. I
    created the PostgreSQL database with default Unicode encoding with

    createdb -E UNICODE oggtest

    The data I am putting into the database is in the u'Perfeccion' format so
    I understand it is UNICODE, but I get the same error:

    Traceback (most recent call last):
    File "./ogg2sql.py", line 82, in ?
    db_cursor.execute(do)
    File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 3035,
    in execute
    _qstr = self.__unicodeConvert(_qstr)
    File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 2740,
    in __unicodeConvert
    return obj.encode(*self.conn.client_encoding)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in
    position 102: ordinal not in range(128)


    my insert query is:

    tracks_insert_values =(unicode(coments['TITLE']),coments['TRACKNUMBER'])

    I also tried with:

    tracks_insert_values=(coments['TITLE'].encode('utf-8'),coments['TRACKNUMBER'])

    insert_query = '''insert into tracks(titulo,no_pista)values(%s %i)''' % tracks_insert_values




    Martin Slouf wrote:
    > i had similar errors:
    >
    > Traceback (most recent call last):
    > File "/home/martin/skripty/accounts.py", line 125, in ?
    > main(sys.argv)
    > File "/home/martin/skripty/accounts.py", line 119, in main
    > print_accounts(accounts, url_part)
    > File "/home/martin/skripty/accounts.py", line 94, in print_accounts
    > print str(i).encode("utf-8", "replace")
    > UnicodeEncodeError: 'ascii' codec can't encode characters in position
    > 151-152: ordinal not in range(128)
    >
    > - - - -
    >
    > the solution seems to be:
    >
    > 0. string is not in unicode encoding (assumption)
    > 1. before printing out, convert the string to unicode
    > 2. when printing, convert to whatever charset you like
    >
    > though i dont understand much why (ive solved it a minute ago :) the
    > code should be:
    >
    > str = "any nonunicode string"
    > print unicode(str).encode("iso-8859-2", "replace")
    >
    > comments:
    >
    > 1. why the string is not in unicode can have several reasons -- i guess:
    > - does ogg stores tags in unicode?
    > - you have parsed an xml file with encoding attribute set (that
    > is what i do)
    > - etc
    >
    > 2. "replace" parameter in encode causes characters that cannot be
    > encoded in the target charset to be replaced with '?' (you can use
    > "ignore" or "strict", see your python doc)
    >
    > 3. the above will work _only_ _if_ the 'str' encoding is "iso-8859-2" --
    > a funny thing -- first line of code converts from unknown (but the
    > programmer must know it) to unicode and the second one converts it back
    > from unicode to unknown (now the programmer tells that secret to python
    > :)
    >
    > 4. i would like to know from any python expert whether/why/why not:
    >
    > * my assumptions are right
    >
    > * why is that behaviour? -- if you search google you get
    > thousands of errors like this -- with no proper solutions i must add
    >
    > * is there an easier portable way (no sitecustomize.py changes)
    > to do it
    >
    > * i was looking in site.py and there is deleted the
    > sys.setdefaultencoding() function, but from the comments i do
    > not know why -- you know it? why is user not allowed to change the
    > default encoding? it seems reasonable to me if he/she could do that.
    >
    > thx.
    >
    > m.
    >
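    On Martin's point 2, a small sketch of what the error handlers do, written
    in Python 3 terms (where str is the unicode type and bytes is the old
    byte string). The sample title is made up; "strict" is the default and is
    the one that raises the error seen in this thread:

```python
# Python 3 sketch: error handlers decide what happens to characters
# the target codec cannot represent.
text = "Perfecci\xf3n"  # sample value containing the character u'\xf3'

# "replace" substitutes '?' for each character ascii cannot encode.
replaced = text.encode("ascii", "replace")

# "ignore" silently drops each such character.
ignored = text.encode("ascii", "ignore")

print(replaced)
print(ignored)
```

    Both handlers lose information, so they hide the problem rather than
    solve it; picking a codec that can represent the character is the fix.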
    oziko, Aug 17, 2004
    #1

  2. oziko wrote:

    > Now I can print the tags with no apparent problem. But now when I tried to
    > insert that value into a PostgreSQL database I get the same error. I
    > created the PostgreSQL database with default Unicode encoding with


    There seems to be a general misunderstanding about what unicode, an
    encoding, and the two together mean in python.

    Unicode is only an abstract definition of character-sets - the usual
    suspects like what is in ascii, but also nearly everything somebody on this
    planet of ours cares to write down once in a while.

    Now an actual encoding is how these totally abstract character sets are
    mapped to actual values. So for the capital letter "A", the ascii encoding
    maps it to the well known value 65.

    BUT: You can define another encoding, call it oziko or whatever, and map "A"
    to 1 - if you like it.
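    A quick sketch of that idea, in Python 3 terms (where str plays the role
    of the unicode object and bytes the role of the old string). cp037, an
    EBCDIC codec, is only a convenient real stand-in for the made-up "oziko"
    encoding:

```python
# Python 3 sketch: one abstract character, different byte values per encoding.
letter = "A"
code_point = ord(letter)                 # the abstract Unicode number of "A"
ascii_bytes = letter.encode("ascii")     # ascii maps "A" to byte value 65
ebcdic_bytes = letter.encode("cp037")    # cp037 maps the same "A" to 0xC1

accented = "\xf3"                        # the character from the traceback
latin1_bytes = accented.encode("iso-8859-1")  # one byte: 0xF3
utf8_bytes = accented.encode("utf-8")         # two bytes: 0xC3 0xB3

print(code_point, ascii_bytes, ebcdic_bytes, latin1_bytes, utf8_bytes)
```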

    Now UTF-8 is also only an encoding - one that maps all of ascii to the
    usual numbers where you expect them, and uses byte values above 127 to
    build multi-byte sequences that each encode one character. So it can
    encode the whole unicode set, at the price of not being able to determine
    the length of a string by dividing the number of bytes it contains by the
    number of bytes a character uses - usually one.

    So this is an extremely important lesson: unicode is _not_ - I repeat, _not_
    - UTF-8.
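    To make the length point concrete, a minimal Python 3 sketch with a
    made-up title: the character count stays the same, but the byte count
    depends on the encoding chosen.

```python
# Python 3 sketch: characters versus bytes.
title = "Perfecci\xf3n"               # 10 characters, one of them non-ascii

utf8 = title.encode("utf-8")          # the accented letter takes two bytes
latin1 = title.encode("iso-8859-1")   # every character takes one byte

print(len(title), len(utf8), len(latin1))
```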

    Now python has unicode objects. They are sequences of characters - what
    shape these internally have is opaque to you and not of your concern. They
    are _not_ strings!!!! strings in python are sequences of bytes - as we are
    used to from C.

    Now whenever you want to use a string that is encoded in a special encoding,
    you can get it from a unicode-object by invoking encode on it. That's what

    u.encode('iso-8859-1')

    does, if u is a unicode object.

    The other way round, if you have a byte-sequence - conveniently stored in a
    string - and want to get a unicode object from it, use decode

    s.decode('iso-8859-1')
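    A small round-trip sketch of encode and decode, again in Python 3 terms
    (str for the unicode object, bytes for the string). The last part shows
    why you have to know the encoding of the bytes: decoding with the wrong
    codec fails here.

```python
# Python 3 sketch: encode turns characters into bytes, decode reverses it.
text = "Perfecci\xf3n"
raw = text.encode("iso-8859-1")       # characters -> bytes
restored = raw.decode("iso-8859-1")   # bytes -> characters, same codec

# Decoding the latin-1 bytes as utf-8 fails: 0xF3 followed by "n"
# is not a valid utf-8 sequence.
try:
    raw.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(raw, restored, wrong_codec_failed)
```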

    Now if you pass a unicode object to a function that wants a _string_, python
    automatically applies an encode for you - with the default encoding!!!! As
    this is usually ascii, you get the problems you had.
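    Python 3 no longer does that implicit step, so this sketch spells it out
    by calling encode("ascii") by hand, which is roughly what Python 2 does
    behind your back. The sample text is made up; the resulting error is the
    same kind as the one in the traceback.

```python
# Python 3 sketch: the hidden ascii encode, made explicit.
text = "Perfecci\xf3n"

try:
    text.encode("ascii")              # what the implicit conversion amounts to
    error = None
except UnicodeEncodeError as exc:
    error = exc                       # ascii has no value for u'\xf3'

print(error)
```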

    So what do you need to solve your problem at hand? You need to know which
    encoding the sql driver wants for transmitting strings - most probably
    utf-8, so they can encode all possible characters. And thus you have to
    encode the strings you pass beforehand, or set the default encoding
    properly.
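    A rough sketch of the "encode beforehand" option in Python 3 terms. The
    helper name encode_for_driver and the sample tags are made up for
    illustration, and utf-8 is only the guess from above - whether the driver
    really wants utf-8 bytes has to be checked against its documentation.

```python
# Python 3 sketch: encode text values explicitly before handing them over.
def encode_for_driver(value, encoding="utf-8"):
    """Hypothetical helper: encode text, pass other values through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value

# Made-up sample tags standing in for the ogg comments.
comments = {"TITLE": "Perfecci\xf3n", "TRACKNUMBER": 3}

row = tuple(encode_for_driver(comments[key]) for key in ("TITLE", "TRACKNUMBER"))
print(row)
```

    The point is only that the encoding is chosen by you, in one visible
    place, instead of being left to the ascii default.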

    The last thing is to explain where the u''-thingies fit in. They are a
    shortcut for getting a unicode object - whatever characters are encountered
    inside the u'' are interpreted with the encoding the python interpreter
    uses to parse the file at hand. Which one that is can be specified either
    implicitly (system settings) or explicitly using the


    # -*- coding: <codec> -*-

    line on top of the source file.
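    For illustration, a tiny source file with such a line, assuming the file
    is actually saved as utf-8. It is written as Python 3, which already
    assumes utf-8 when the line is absent, so here the declaration just
    states the encoding explicitly:

```python
# -*- coding: utf-8 -*-
# The declaration above tells the interpreter how to decode the bytes of
# this file, including the accented letter inside the literal below.
title = "Perfección"

# The literal holds the same character the traceback calls u'\xf3'.
print(title == "Perfecci\xf3n")
```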

    You might want to start reading about unicode and python on the net, google
    is as always your friend.

    --
    Regards,

    Diez B. Roggisch
    Diez B. Roggisch, Aug 17, 2004
    #2

  3. > So what do you need to solve your problem at hand? You need to know which
    > encoding the sql driver wants for transmitting strings - most probably
    > utf-8, so they can encode all possible characters. And thus you have to
    > encode the strings you pass beforehand, or set the default encoding
    > properly.


    Just saw that setting the encoding doesn't work - sorry for suggesting it.
    --
    Regards,

    Diez B. Roggisch
    Diez B. Roggisch, Aug 17, 2004
    #3