'ascii' codec can't encode character u'\u2013'

Discussion in 'Python' started by thomas Armstrong, Sep 30, 2005.

  1. Hi

    Using Python 2.3.4 + Feedparser 3.3 (a library to parse XML documents)

    I'm trying to parse a UTF-8 document with special characters like
    acute-accent vowels:
    --------
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    ....
    -------

    But I get this error message:
    -------
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
    position 122: ordinal not in range(128)
    -------

    when trying to execute a MySQL query:
    ----
    query = "UPDATE blogs_news SET text = '" + text_extrated + "'WHERE id='" + id + "'"
    cursor.execute (query) #<--- error line
    ----

    I tried with:
    -------
    text_extrated = text_extrated.encode('iso-8859-1') #<--- error line
    query = "UPDATE blogs_news SET text = '" + text_extrated + "'WHERE id='" + id + "'"
    cursor.execute (query)
    -------

    But I get this error:
    ------
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013'
    in position 92: ordinal not in range(256)
    -----

    I also tried with:
    ----
    text_extrated = re.sub(u'\u2013', '-' , text_extrated)
    query = "UPDATE blogs_news SET text = '" + text_extrated + "'WHERE id='" + id + "'"
    cursor.execute (query)
    -----

    It works, but I don't want to substitute each special character one by
    one, because there will always be one I forgot that breaks the program.

    Any suggestion to fix it? Thank you very much.
    thomas Armstrong, Sep 30, 2005
    #1

  2. thomas Armstrong

    deelan Guest

    thomas Armstrong wrote:
    (...)
    > when trying to execute a MySQL query:
    > ----
    > query = "UPDATE blogs_news SET text = '" + text_extrated + "'WHERE id='" + id + "'"
    > cursor.execute (query) #<--- error line
    > ----


    well, to start it's not the best way to do an update,
    try this instead:

    query = "UPDATE blogs_news SET text = %s WHERE id=%s"
    cursor.execute(query, (text_extrated, id))

    so mysqldb will take care of quoting text_extrated automatically. this
    may not be your problem, but it's considered "good style" when dealing
    with dbs.
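
    A minimal sketch of the parameterized form, runnable without a MySQL
    server. It assumes the stdlib sqlite3 module as a stand-in for mysqldb
    (sqlite3 uses "?" placeholders where mysqldb uses "%s"), is written for
    Python 3, and the names text_extracted and news_id are made up for the
    example:

```python
import sqlite3

# Stand-in for the MySQL table from the thread (assumption: sqlite3, in memory).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE blogs_news (id TEXT PRIMARY KEY, text TEXT)")
cursor.execute("INSERT INTO blogs_news (id, text) VALUES (?, ?)", ("42", "old"))

# Text containing an apostrophe and an en dash (U+2013), like the feed content.
text_extracted = u"it's 2004\u20132005"
news_id = "42"

# The driver quotes both values; no manual string concatenation is needed.
query = "UPDATE blogs_news SET text = ? WHERE id = ?"
cursor.execute(query, (text_extracted, news_id))
conn.commit()

# Read the row back to check that the text was stored unchanged.
cursor.execute("SELECT text FROM blogs_news WHERE id = ?", (news_id,))
stored = cursor.fetchone()[0]
print(repr(stored))
```

    The apostrophe in the text would have ended the string literal early in
    the concatenated version; here it is passed through as data.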

    apart from this, IIRC feedparser returns text as unicode strings, and
    you correctly tried to encode those as latin-1 str objects before
    passing them to mysql, but not all characters in the original utf-8 feed
    can be represented in latin-1. latin-1 covers only 256 code points, while
    utf-8 can encode every unicode character; u'\u2013' (an en dash) is
    outside the latin-1 range.

    you have to decide:

    * switch your mysql db to utf-8 and encode the text as UTF-8
    before insertion

    * lose those characters that cannot be mapped into latin-1,
    using:

    text_extrated.encode('latin-1', 'replace')

    so characters that latin-1 cannot represent will be replaced by ?
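
    The two options can be compared side by side. This is only a sketch,
    written for Python 3 (where encode() returns bytes), and the sample
    string is made up for illustration:

```python
# e-acute (U+00E9) fits in latin-1; the en dash (U+2013) does not.
text = u"caf\xe9 \u2013 bar"

# Strict encoding to latin-1 fails, as in the first post.
try:
    text.encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Lossy option: the en dash becomes "?", the e-acute survives.
lossy = text.encode("latin-1", "replace")

# UTF-8 option: every character is representable, so nothing is lost.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(strict_failed, lossy, utf8_bytes, round_trip == text)
```

    The lossy result cannot be converted back to the original text, which is
    the trade-off to weigh against changing the database character set.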

    also, mysqldb has some support for handling unicode objects directly, but
    things changed a bit during recent releases so i cannot be precise in
    this regard.

    HTH.

    --
    deelan, #1 fan of adriana lima!
    <http://www.deelan.com/>
    deelan, Sep 30, 2005
    #2

  3. Hi.

    Thank you both for your answers.

    Finally I changed my MySQL table to UTF-8 and changed the structure
    of the query (with '%s').

    It works. Thank you very much.

    thomas Armstrong, Sep 30, 2005
    #3
  4. thomas Armstrong

    John J. Lee Guest

    deelan <> writes:
    [...]
    > query = "UPDATE blogs_news SET text = %s WHERE id=%s"
    > cursor.execute(query, (text_extrated, id))
    >
    > so mysqldb will take care of quoting text_extrated automatically. this
    > may not be your problem, but it's considered "good style" when dealing
    > with dbs.

    [...]

    More than just good style: it prevents SQL injection attacks that
    could otherwise allow people to do bad things to your databases.
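
    To make the risk concrete, here is a small sketch. It assumes the stdlib
    sqlite3 module in place of mysqldb (so "?" placeholders instead of "%s"),
    targets Python 3, and hostile_id is a hypothetical value invented for the
    example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE blogs_news (id TEXT PRIMARY KEY, text TEXT)")
cursor.executemany(
    "INSERT INTO blogs_news (id, text) VALUES (?, ?)",
    [("1", "first"), ("2", "second")],
)

# Hypothetical hostile input arriving where an id was expected.
hostile_id = "1' OR '1'='1"

# Concatenation: the quote inside hostile_id closes the literal early,
# so the WHERE clause becomes  id='1' OR '1'='1'  and matches every row.
unsafe = "UPDATE blogs_news SET text = 'hacked' WHERE id='" + hostile_id + "'"
cursor.execute(unsafe)
rows_hit_unsafe = cursor.rowcount

# Restore the table contents before the second attempt.
cursor.execute("UPDATE blogs_news SET text = 'clean'")

# Parameterized: the whole value is compared as one string, so no row matches.
cursor.execute(
    "UPDATE blogs_news SET text = ? WHERE id = ?", ("hacked", hostile_id)
)
rows_hit_safe = cursor.rowcount

print(rows_hit_unsafe, rows_hit_safe)
```

    With concatenation the single intended update touches the whole table;
    with placeholders the same input is harmless.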


    John
    John J. Lee, Sep 30, 2005
    #4
