mysterious unicode

Gerry

I'm using pyExcelerator and xlrd to read and write data from and to
two spreadsheets.

I created the "read" spreadsheet by importing a text file - and I had
no unicode aspirations.

When I read a cell, it appears to be unicode: u'Q1', say.

I can try cleaning it, like this:


    try:
        s.encode("ascii", "replace")
    except AttributeError:
        pass


which seems to work. Here's the mysterious part (aside from why
anything was unicode in the first place):

    print >> debug, "c=", col, "r=", row, "v=", value, "qno=", qno
    tuple = (qno, family)
    try:
        data[tuple].append(value)
    except:
        data[tuple] = [value]
    print >> debug, "!!!", col, row, qno, family, tuple, value, data[tuple]

which produces:

c= 1 r= 3 v= 4 qno= Q1
!!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]

where qno seems to be a vanilla Q1, but a tuple using qno is
(u'Q1', ...).

Can somebody help me out?
 
Gabriel Genellina

I bet qno was unicode from the start. When you print a unicode object, you
get the "unadorned" contents. When you print a tuple, it uses repr() on
each item.

py> qno = u"Q1"
py> qno
u'Q1'
py> print qno
Q1
py> print (qno,2)
(u'Q1', 2)
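
The same str()-versus-repr() distinction can be sketched in Python 3 syntax. This is an illustration only, not the Python 2 session above: Python 3 shows no u prefix, so the quotes around the item stand in for it, and the variable names bare and in_tuple are made up for the example.

```python
# Python 3 sketch: printing a string shows its bare contents,
# while printing a tuple shows the repr() of each item.
qno = "Q1"

bare = str(qno)            # the text that print(qno) displays
in_tuple = str((qno, 2))   # the text that print((qno, 2)) displays

print(bare)
print(in_tuple)
```

The tuple's text contains the quoted form of qno even though qno itself prints without quotes, which is the effect seen in the debug output.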
 
Gabriel Genellina

> Thanks! - that helps a lot.
>
> I'm still mystified why:
> qno was ever unicode, and why

I can't tell...

> qno.encode("ascii", "replace") is still unicode.

That *returns* a string, but you are discarding the return value. It should
be qno = qno.encode(...)
It's similar to lower(), for example.
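
A minimal sketch of that point, written in Python 3 syntax as an assumption (the thread itself uses Python 2, where encode() turns a unicode object into a str; in Python 3 it turns a str into bytes). The helper name to_ascii is hypothetical.

```python
# Python 3 sketch: encode() returns a new object and leaves the original alone.
def to_ascii(value):
    """Return value encoded as ASCII bytes, or value itself if it has no encode()."""
    try:
        return value.encode("ascii", "replace")
    except AttributeError:
        return value  # numbers and other non-text cells pass through

qno = "Q1"
qno.encode("ascii", "replace")    # result discarded, so qno is unchanged
still_text = isinstance(qno, str)

qno = to_ascii(qno)               # rebinding the name keeps the result
```

Only the last line changes what qno refers to; the bare call in the middle has no lasting effect.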
 
jim-on-linux

I have been getting the same thing using SQLite3
when extracting data from an SQLite3 database. I
take the database info, which is in a list, and do

    name = str.record[0]

rather than

    name = record[0]

So far, I haven't had any problems.
For some reason the unicode u is removed.
I haven't wanted to spend the time to figure out
why.

jim-on-linux
http://www.inqvista.com
 
Carsten Haese

> I have been getting the same thing using SQLite3
> when extracting data from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because
that eliminates encoding uncertainty. Whether an API uses Unicode would
probably be noted somewhere in its documentation.

> I take the database info which is in a list and do
>
> name = str.record[0]

You probably mean str(record[0]).

> rather than
> name = record[0]
>
> So far, I haven't had any problems.
> For some reason the unicode u is removed.
> I haven't wanted to spend the time to figure out
> why.

As a software engineer, I'd get worried if I didn't know why the code I
wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read
http://www.amk.ca/python/howto/unicode to demystify what Unicode objects
are and do.

With str(), you're asking the Unicode object for its byte string
interpretation, which causes the Unicode object to give you its encoding
in the system default encoding. The default encoding is normally ascii.
That can be tweaked for your particular Python installation, but if you
need an encoding other than ascii it's recommended that you explicitly
encode and decode from and to Unicode, lest you risk writing
non-portable code.

Using str() coercion of Unicode objects will work well enough until you
run into a string that contains characters that can't be represented in
the default encoding. Once that happens, you're better off explicitly
encoding the Unicode object into a well-defined encoding on input, or,
even better, just work with Unicode objects internally and only encode
to byte strings when absolutely necessary, such as when outputting to a
file or to the console.
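
As a rough sketch of the "encode only at the boundary" advice, here is a Python 3 version (an assumption; the thread predates Python 3). The helper write_line, the sample text, and the choice of UTF-8 are illustrative and not taken from the posts above.

```python
import io

def write_line(stream, text, encoding="utf-8"):
    """Encode text only at the output boundary and write the bytes."""
    stream.write(text.encode(encoding) + b"\n")

name = "caf\u00e9"          # kept as text inside the program

try:
    name.encode("ascii")    # strict ASCII cannot represent the accented e
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

out = io.BytesIO()          # stands in for a file opened in binary mode
write_line(out, name)
```

Naming the encoding at the point of output means the result does not depend on the default encoding of whichever Python installation runs the code.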

Hope this helps,

Carsten.
 
jim-on-linux

>> I have been getting the same thing using
>> SQLite3 when extracting data from an SQLite3
>> database.
>
> Many APIs that exchange data choose to exchange
> text in Unicode because that eliminates
> encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in
> its documentation.
>
>> I take the database info which is in a list
>> and do
>>
>> name = str.record[0]
>
> You probably mean str(record[0]).

Yes,

>> rather than
>> name = record[0]
>>
>> So far, I haven't had any problems.
>> For some reason the unicode u is removed.
>> I haven't wanted to spend the time to figure
>> out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.

I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However,
(If you don't have the time to do it right the
first time, when will you have the time to fix
it?)

> Unicode is not rocket science. I suggest you
> read http://www.amk.ca/python/howto/unicode to
> demystify what Unicode objects are and do.
>
> With str(), you're asking the Unicode object
> for its byte string interpretation, which
> causes the Unicode object to give you its
> encoding in the system default encoding. The
> default encoding is normally ascii. That can be
> tweaked for your particular Python
> installation, but if you need an encoding other
> than ascii it's recommended that you explicitly
> encode and decode from and to Unicode, lest you
> risk writing non-portable code.
>
> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding.

Right,
even though None or null are not strings, they are
common enough to cause a problem.
Try to run a loop through a list with None or
null in it.
Example,
    x = str(list[2])
when list[2] is null or None, problems.
Easy to fix but more work.
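
One plausible reading of the None problem: str(None) does not fail, it quietly returns the text "None", which then ends up in the data. A minimal sketch of a guard follows; the helper cell_text and the sample record are hypothetical.

```python
def cell_text(value, default=""):
    """Return text for a record field, mapping None to a default."""
    if value is None:
        return default
    return str(value)

record = ["Q1", 4, None]
names = [cell_text(item) for item in record]
```

Whether an empty string is the right stand-in for a missing field depends on what the rest of the program expects.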

I'll check the web site out.

Thanks for the update,
Jim-on-linux
 
John Machin

Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
presents all text strings as Python unicode objects."

And why would that be? As the next sentence in the referenced docs
says, "From Excel 97 onwards, text in Excel spreadsheets has been
stored as Unicode."

Gerry, your "Q1" string was converted to Unicode when you wrote it
using pyExcelerator's Worksheet.write() method.

HTH,
John
 
Gerry

John,

That helps a lot. Thanks again!

Gerry
 
