Python newbie needs Unicode help

gheissenberger · Jan 11, 2007

HELP!
The guy who was here before me wrote a Python script to parse files.

It includes the line:
print u
where u is a line from a file we are parsing.
However, we have started receiving data from Brazil. If I open the file to
parse in vi, it looks like:

<Utt id="3" transcribe="yes" audioRoot="A1"
audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
transcribedText="não" parsableText="não"/>

Clearly those "n&#227" sequences are non-ASCII characters, but how do I get
print to handle them?

I keep getting:
"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 40: ordinal not in range(128)"

Diez B. Roggisch · Jan 11, 2007

Does the error happen at the

print u

line? If so, what happens is that you are trying to print a unicode object,
which means it has to be converted (the correct term is encoded) to a
byte string. If you don't do that explicitly, it is done implicitly using
the default encoding, which is ASCII.

If the text contains non-ASCII characters, you end up with the error you see.

What to do? Use something like this:

print u.encode('utf-8')

instead.
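A minimal sketch of the difference, written in Python 3 syntax (where `str` is always Unicode text and `.encode()` returns `bytes`) rather than the Python 2 of this thread; the helper name `to_utf8` is hypothetical. Encoding with ASCII fails on the first character above U+007F, while an explicit UTF-8 encode succeeds:

```python
def to_utf8(text):
    """Explicitly encode a text (Unicode) string to UTF-8 bytes."""
    return text.encode("utf-8")


text = "n\u00e3o"  # "não" from the question; U+00E3 is the 'ã'

# Encoding with ASCII fails for any character above U+007F,
# which is what the implicit conversion in the question ran into.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii:", exc.reason)  # ascii: ordinal not in range(128)

data = to_utf8(text)
print(data)  # b'n\xc3\xa3o'
```

In Python 2 the equivalent is the `print u.encode('utf-8')` above; the bytes only display correctly if the terminal also expects UTF-8.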

Diez

gheissenberger · Jan 11, 2007

Progress! You managed to change the error message.

File "./acc_test_script_generator.py", line 106, in loadData
print u.encode('utf-8')
AttributeError: Utterance instance has no attribute 'encode'

I'm missing something really obvious here, but I don't know what it
is...

Gabriel Genellina · Jan 12, 2007

gheissenberger wrote:
<Utt id="3" transcribe="yes" audioRoot="A1"
audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
transcribedText="não" parsableText="não"/>

Is this part of an XML document? You should use a
true XML parser instead of doing that by hand.
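A sketch of the parser approach using the standard library's `xml.etree.ElementTree` (available since Python 2.5), again in Python 3 syntax. It assumes the files are UTF-8 encoded XML and uses a trimmed copy of the `<Utt>` element above. The parser decodes the bytes and resolves character references such as `&#227;`, so attribute values come back as Unicode text:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <Utt> element from the question, as UTF-8 bytes.
# Assumption: the real files are UTF-8 encoded.
raw = (
    '<Utt id="3" transcribe="yes" rawText="n\u00e3o" '
    'transcribedText="n\u00e3o" parsableText="n\u00e3o"/>'
).encode("utf-8")

utt = ET.fromstring(raw)            # bytes in, decoded text out
text = utt.get("transcribedText")   # a str (Unicode), not bytes
print(utt.tag, utt.get("id"))       # Utt 3

# Numeric character references are resolved to the real character.
ref = ET.fromstring('<Utt rawText="n&#227;o"/>')
same = ref.get("rawText") == text
print(same)                         # True

# Encode explicitly only when writing the value out.
print(text.encode("utf-8"))         # b'n\xc3\xa3o'
```

For whole files, `ET.parse(path)` or `ET.iterparse(path)` work the same way and honour the encoding named in the XML declaration.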

Clearly those "n&#227" sequences are non-ASCII characters, but how do I get
print to handle them?

Understanding how Unicode works may be very
useful: http://www.amk.ca/python/howto/unicode
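The HOWTO's central idea is to decode bytes into Unicode as soon as they are read and to encode only when writing out. As a sketch in Python 3 syntax, the hypothetical `decode_line` below tries UTF-8 first and falls back to Latin-1. That order is an assumption about this data, not something the thread establishes; Latin-1 can decode any byte sequence, so it has to come last:

```python
def decode_line(raw, encodings=("utf-8", "latin-1")):
    """Decode one line of bytes, trying each candidate encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of %r could decode %r" % (encodings, raw))


utf8_line = decode_line(b"n\xc3\xa3o")   # UTF-8 encoded "não"
latin1_line = decode_line(b"n\xe3o")     # Latin-1 encoded "não"
print(utf8_line == latin1_line)          # True
```

If the encoding is known, it is simpler to let Python decode while reading, e.g. `io.open(path, encoding='utf-8')` in Python 2.6+ or `open(path, encoding='utf-8')` in Python 3.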

I keep getting:
"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 40: ordinal not in range(128)"

py> u = u"áéíóú"
py> print u, repr(u)
áéíóú u'\xe1\xe9\xed\xf3\xfa'
py> print str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
py> print u.encode('cp850')
áéíóú

(cp850 is my console encoding)
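Hard-coding `cp850` only works on a console that uses it. A sketch, in Python 3 syntax, of taking the encoding from the output stream instead; `encode_for_console` is a hypothetical helper, and falling back to ASCII with `errors="replace"` (unmappable characters become `?`) is a design choice, not a requirement:

```python
import sys


def encode_for_console(text, encoding=None):
    """Encode text for the console, replacing characters it cannot show."""
    if encoding is None:
        # sys.stdout.encoding can be None (e.g. redirected output in Python 2)
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace")


print(encode_for_console("n\u00e3o", "ascii"))  # b'n?o'
```

In Python 3 the resulting bytes can be written with `sys.stdout.buffer.write(...)`; printing them directly shows their `b'...'` repr instead.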

--
Gabriel Genellina
Softlab SRL


Gabriel Genellina · Jan 12, 2007

Then you're not "printing a line from a file we are parsing", which
would be a string or unicode object. You're printing an "Utterance"
instance; its class probably has a __str__ method, and that is where
you're mixing unicode and byte strings.
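A sketch of the situation using a hypothetical stand-in for the script's `Utterance` class (its attribute names are invented for illustration). `.encode()` exists only on strings, so the fix is to encode the text the object produces, not the object itself. In Python 2, `__str__` must return a byte string, so a `__str__` that builds Unicode text is implicitly encoded with ASCII and fails the same way; in the Python 3 syntax used below, `__str__` returns text and encoding is a separate, explicit step:

```python
class Utterance:
    """Hypothetical stand-in for the script's Utterance class."""

    def __init__(self, utt_id, transcribed_text):
        self.utt_id = utt_id
        self.transcribed_text = transcribed_text  # Unicode text

    def __str__(self):
        # Build the representation from text values only.
        return "Utt %s: %s" % (self.utt_id, self.transcribed_text)


u = Utterance("3", "n\u00e3o")
print(hasattr(u, "encode"))      # False: hence the AttributeError

line = str(u).encode("utf-8")    # encode the text, not the object
print(line)                      # b'Utt 3: n\xc3\xa3o'
```

The Python 2 counterpart would be to give the class a `__unicode__` method and use `print unicode(u).encode('utf-8')`.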


Similar Threads

Right solution to unicode error?	21	Nov 7, 2012
print() and unicode strings (python 3.1)	12	Aug 24, 2009
Ascii to Unicode.	4	Jul 28, 2010
SMTPHandler and Unicode	13	Jul 5, 2010
Convert unicode escape sequences to unicode in a file	1	Jan 11, 2011
helping with unicode	4	Jul 3, 2012
Yet another unicode WTF	9	Jun 5, 2009
Python 3.3, gettext and Unicode problems	0	Dec 31, 2012
