Text Encoding - Like Wrestling Oiled Pigs

Discussion in 'Python' started by apotheos@gmail.com, Dec 8, 2006.

  1. Guest

    So I've got a problem.

    I've got a database of information that is encoded in Windows/CP1252.
    What I want to do is dump this to a UTF-8 encoded text file (an RSS
    feed).

    While the overall problem seems to be related to the conversion, the
    only error I'm getting is a

    "UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position
    163: ordinal not in range(128)"

    So somewhere I'm missing an implicit conversion to ASCII which is
    completely aggravating my brain.

    So, what fundamental issue am I completely overlooking?

    Code follows.

    import codecs
    import psycopg

    def GenerateNoticeRSS():
        output = codecs.open(FILEBASE + 'noticeboard.xml', 'w', 'utf-8')
        conn = psycopg.connect(DSN)
        curs = conn.cursor()

        # The end of this query was cut off in the original post.
        sql_query = ("select story.subject as subject, story.content as "
                     "content, story.summary as summary, story.sid as sid, "
                     "posts.bid as board, posts.date_to_publish as date from story$")
        curs.execute(sql_query)
        rows = curs.fetchall()

        output.write('<?xml version="1.0" encoding="utf-8"?>\n')
        output.write('<rss version="2.0">\n')
        output.write('<channel>\n')
        output.write('<title>U of L Notice Board</title>\n')
        output.write('<link>http://www.uleth.ca/notice</link>\n')
        output.write('<description>University of Lethbridge News and '
                     'Events</description>\n')

        for each in rows:
            output.write('<item>\n')
            output.write('<title>' + rssTitlePrefix(each[4]) +
                         unicode(each[0]) + '</title>\n')
            output.write('<link>http://www.uleth.ca/notice/display.html?b=' +
                         str(each[4]) + '&amp;s=' + str(each[3]) + '</link>\n')
            output.write('<guid>http://www.uleth.ca/notice/display.html?b=' +
                         str(each[4]) + '&amp;s=' + str(each[3]) + '</guid>\n')
            descript = each[2] + '<BR><BR>' + each[1]
            output.write(u'<description>' + unicode(descript) +
                         u'</description>\n')  # this is the line that causes the error.
            output.write('</item>\n')

        output.write('</channel>\n')
        output.write('</rss>\n')
        output.close()
        return 0
     
    , Dec 8, 2006

  2. John Machin Guest

    apotheos@gmail.com wrote:
    > So I've got a problem.
    >
    > I've got a database of information that is encoded in Windows/CP1252.
    > What I want to do is dump this to a UTF-8 encoded text file (an RSS
    > feed).
    >
    > While the overall problem seems to be related to the conversion, the
    > only error I'm getting is a
    >
    > "UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position
    > 163: ordinal not in range(128)"
    >
    > So somewhere I'm missing an implicit conversion to ASCII which is
    > completely aggravating my brain.
    >
    > So, what fundamental issue am I completely overlooking?


    That nowhere in your *code* do you mention "I've got a database of
    information that is encoded in Windows/CP1252". This is not recorded
    anywhere in your database. Python is fantastic, but we don't expect a
    readauthorsmind() function until Python 4000 :)
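A minimal sketch of that point, written in Python 3 terms (an assumption: the thread is Python 2, where `unicode(s)` with no encoding argument decodes with the default 'ascii' codec, which is what the explicit `raw.decode("ascii")` below does). The sample bytes are hypothetical. Byte 0x92 is not valid ASCII, but in cp1252 it is the right single quotation mark (U+2019), the usual "smart apostrophe", so the data decodes cleanly once the program states the encoding.

```python
# Python 3 sketch; in Python 2, unicode(raw) behaved like raw.decode("ascii").
raw = b"Don\x92t panic"  # hypothetical sample: cp1252 bytes containing a smart apostrophe

try:
    raw.decode("ascii")  # the implicit default-encoding conversion
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0x92 in position 3: ordinal not in range(128)

text = raw.decode("cp1252")  # tell Python which encoding the data is actually in
print(text == "Don\u2019t panic")  # True
```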

    >
    > Code follows.
    >

    [snip]
    >
    > sql_query = "select story.subject as subject, story.content as
    > content, story.summary as summary, story.sid as sid, posts.bid as
    > board, posts.date_to_publish as date from story$


    The above line has been mangled ... fortunately it doesn't affect the
    diagnostic outcome.

    [snip]
    >
    >
    > output.write(u'<description>' + unicode(descript) +
    > u'</description>\n') # this is the line that causes the error.


    What is happening is that unicode(descript) has not been told what
    encoding to use to decode your "Windows/CP1252" text, and it uses the
    default encoding, "ascii". You need to put unicode(descript, 'cp1252').
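A Python 3 sketch of the same fix applied to the whole item loop; it is an illustration, not the poster's code. It assumes rows arrive as tuples in the SELECT's column order with text columns as cp1252 bytes, and uses a hypothetical `to_text()` helper in place of `unicode(value, 'cp1252')`. The subject used in `<title>` goes through the same argument-less `unicode()` call in the question, so it is decoded the same way here. The `rssTitlePrefix()` call is left out, and each value is additionally XML-escaped with `xml.sax.saxutils.escape`, which the original does not do.

```python
import io
from xml.sax.saxutils import escape

DB_ENCODING = "cp1252"  # assumption: the encoding the database text is stored in


def to_text(value, encoding=DB_ENCODING):
    """Hypothetical helper: decode bytes with an explicit codec, str() anything else."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def write_items(rows, out):
    """Write one RSS <item> per row; rows are (subject, content, summary, sid, board, date)."""
    for subject, content, summary, sid, board, _date in rows:
        url = ("http://www.uleth.ca/notice/display.html?b=%s&s=%s"
               % (to_text(board), to_text(sid)))
        descript = to_text(summary) + "<BR><BR>" + to_text(content)
        out.write("<item>\n")
        out.write("<title>%s</title>\n" % escape(to_text(subject)))
        out.write("<link>%s</link>\n" % escape(url))  # escape() turns & into &amp;
        out.write("<guid>%s</guid>\n" % escape(url))
        out.write("<description>%s</description>\n" % escape(descript))
        out.write("</item>\n")


# Hypothetical sample row, shaped like the SELECT in the question.
rows = [(b"Exam schedule", b"Don\x92t miss it.", b"Summary", 42, 7, None)]
buf = io.StringIO()  # stand-in for a file opened with encoding="utf-8"
write_items(rows, buf)
xml = buf.getvalue()
```

With a real file, the `StringIO` would be replaced by something like `with open(FILEBASE + 'noticeboard.xml', 'w', encoding='utf-8') as output:`, which encodes the decoded text to UTF-8 on write, the job `codecs.open(..., 'utf-8')` does in the original.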

    Cheers,
    John
     
    John Machin, Dec 8, 2006
