mysterious unicode

Gerry · Mar 20, 2007

I'm using pyExcelerator and xlrd to read and write data from and to
two spreadsheets.

I created the "read" spreadsheet by importing a text file - and I had
no unicode aspirations.

When I read a cell, it appears to be unicode: u'Q1', say.

I can try cleaning it, like this:

try:
    s.encode("ascii", "replace")
except AttributeError:
    pass

which seems to work. Here's the mysterious part (aside from why
anything was unicode in the first place):

print >> debug, "c=", col, "r=", row, "v=", value, "qno=", qno
tuple = (qno, family)
try:
    data[tuple].append(value)
except:
    data[tuple] = [value]
print >> debug, "!!!", col, row, qno, family, tuple, value, data[tuple]

which produces:

c= 1 r= 3 v= 4 qno= Q1
!!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]

where qno seems to be a vanilla Q1, but a tuple using qno is
(u'Q1', ...).

Can somebody help me out?

Gabriel Genellina · Mar 20, 2007

I bet qno was unicode from the start. When you print a unicode object, you
get the "unadorned" contents. When you print a tuple, it uses repr() on
each item.

py> qno = u"Q1"
py> qno
u'Q1'
py> print qno
Q1
py> print (qno,2)
(u'Q1', 2)
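
The same distinction can be checked without print, as a minimal sketch that assumes nothing beyond the built-in str() and repr(). It is written to run on both Python 2 and Python 3; on Python 3 the u prefix no longer appears in the repr, but the quotes still do. The variable names are only illustrative.

```python
qno = u"Q1"
pair = (qno, 2)

# str() of the string itself gives the bare contents (what print shows).
bare = str(qno)

# str() of a tuple is built from repr() of each item, so the quotes
# (and, on Python 2, the u prefix) show up.
shown = str(pair)

# Logging repr() and the type name explicitly removes the ambiguity
# from debug output.
debug_line = "qno=%r type=%s" % (qno, type(qno).__name__)
```

Using %r (or repr()) in debug lines makes it visible right away whether a value is a unicode object or a plain string.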

Gerry · Mar 20, 2007

Thanks! - that helps a lot.

I'm still mystified why:
qno was ever unicode, and why
qno.encode("ascii", "replace") is still unicode.

Gerry

Gabriel Genellina · Mar 20, 2007

> I'm still mystified why:
> qno was ever unicode, and why

I can't tell...

> qno.encode("ascii", "replace") is still unicode.

That *returns* a new string, but you are discarding the return value. It
should be qno = qno.encode(...)
It's similar to lower(), for example.
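
A minimal sketch of that point, assuming only built-in string methods: encode() and lower() both return a new object and never modify the original, so the result has to be assigned to a name. The snippet runs on Python 2 and Python 3 (on Python 3 the encoded result is a bytes object rather than a str).

```python
qno = u"Q1"

# Strings are immutable: encode() builds a new object and leaves qno alone.
qno.encode("ascii", "replace")      # result thrown away, qno unchanged
still_text = qno

# Rebinding a name to the return value is what actually keeps the result.
cleaned = qno.encode("ascii", "replace")

# lower() behaves the same way.
name = "ABC"
name.lower()                        # no effect on name
lowered = name.lower()
```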

jim-on-linux · Mar 21, 2007

I have been getting the same thing using SQLite3
when extracting data from an SQLite3 database. I
take the database info which is in a list and do

name = str.record[0]
rather than
name = record[0]

So far, I haven't had any problems.
For some reason the unicode u is removed.
I haven't wanted to spend the time to figure out
why.

jim-on-linux
http://www.inqvista.com

Carsten Haese · Mar 21, 2007

> I'm still mystified why:
> qno was ever unicode,

Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
presents all text strings as Python unicode objects."

-Carsten

Carsten Haese · Mar 21, 2007

> I have been getting the same thing using SQLite3
> when extracting data from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because
that eliminates encoding uncertainty. Whether an API uses Unicode would
probably be noted somewhere in its documentation.

> I take the database info which is in a list and do
>
> name = str.record[0]

You probably mean str(record[0]).

> rather than
> name = record[0]
>
> So far, I haven't had any problems.
> For some reason the unicode u is removed.
> I haven't wanted to spend the time to figure out
> why.

As a software engineer, I'd get worried if I didn't know why the code I
wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read
http://www.amk.ca/python/howto/unicode to demystify what Unicode objects
are and do.

With str(), you're asking the Unicode object for its byte string
representation, which causes the Unicode object to encode itself using
the system default encoding. The default encoding is normally ascii.
That can be tweaked for your particular Python installation, but if you
need an encoding other than ascii it's recommended that you explicitly
encode and decode from and to Unicode, lest you risk writing
non-portable code.

Using str() coercion of Unicode objects will work well enough until you
run into a string that contains characters that can't be represented in
the default encoding. Once that happens, you're better off explicitly
encoding the Unicode object into a well-defined encoding on input, or,
even better, just work with Unicode objects internally and only encode
to byte strings when absolutely necessary, such as when outputting to a
file or to the console.
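
As a rough sketch of that advice (the sample text and variable names are made up for illustration): the explicit encode("ascii") call below fails the same way str() does on Python 2 when the default encoding is ascii, while an explicit, well-defined encoding at the output boundary works. The snippet runs on both Python 2 and Python 3.

```python
text = u"caf\xe9"   # "cafe" with an accented last letter, one non-ASCII character

# Encoding to ASCII without an error handler fails on the accented letter.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit, well-defined encoding at the output boundary works.
utf8_bytes = text.encode("utf-8")

# Decoding at the input boundary restores the Unicode text.
round_trip = utf8_bytes.decode("utf-8")

# The "replace" handler never fails, but it loses information.
lossy = text.encode("ascii", "replace")
```

Keeping text as Unicode internally and calling encode() only where bytes are really required keeps the choice of encoding visible in the code.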

Hope this helps,

Carsten.

jim-on-linux · Mar 21, 2007

> > I take the database info which is in a list
> > and do
> >
> > name = str.record[0]
>
> You probably mean str(record[0]).

Yes,

> > rather than
> > name = record[0]
> >
> > So far, I haven't had any problems.
> > For some reason the unicode u is removed.
> > I haven't wanted to spend the time to figure
> > out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.

I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However, if you don't have the time to do it right
the first time, when will you have the time to fix
it?

> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding.

Right.
Even though None or null are not strings, they are
common enough to cause a problem.
Try running a loop over a list with None or
null in it.
Example:
x = str(list[2])
When list[2] is null or None, there are problems.
Easy to fix, but more work.
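
A small sketch of one way to handle this, assuming the problem is that str(None) does not raise but quietly produces the text "None". The helper to_text() and the sample record are hypothetical, not part of any library mentioned in this thread.

```python
def to_text(value, default=""):
    """Hypothetical helper: convert a field to text, mapping None to a default."""
    if value is None:
        return default
    return str(value)

record = [u"Q1", 4, None]

# Naive conversion turns the missing value into the literal text "None".
naive = [str(item) for item in record]

# Checking for None first keeps missing values distinguishable.
cleaned = [to_text(item) for item in record]
```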

I'll check the web site out.

Thanks for the update,
Jim-on-linux

John Machin · Mar 21, 2007

> Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> presents all text strings as Python unicode objects."

And why would that be? As the next sentence in the referenced docs
says, "From Excel 97 onwards, text in Excel spreadsheets has been
stored as Unicode."

Gerry, your "Q1" string was converted to Unicode when you wrote it
using pyExcelerator's Worksheet.write() method.

HTH,
John

Gerry · Mar 21, 2007

John,

That helps a lot. Thanks again!

Gerry
