non standard path characters

Robin Becker · May 31, 2007

A kind user reports having problems running the reportlab tests because his path
has non-ascii characters in it eg

......\Mes documents\Mes Téléchargements\Firefox\...

somewhere in the tests we look at the path and then try and convert to utf8 for
display in pdf.

Is there a standard way to do these path string conversions?

Paths appear to come from all sorts of places and given the increasing use of
zip file packaging it doesn't seem appropriate to rely on the current platform
as a single choice for the default encoding.

aspineux · May 31, 2007

I think you should change the code page before running the test, doing
something like:

c:\> chcp 850
c:\> ....\python.exe ......\test.py

Look for the right code page for your system; 850, 437, 1252 or 1250
may work.

Regards

Tijs · May 31, 2007

Robin said:
A kind user reports having problems running the reportlab tests because
his path has non-ascii characters in it eg

.....\Mes documents\Mes Téléchargements\Firefox\...

somewhere in the tests we look at the path and then try and convert to
utf8 for display in pdf.

Is there a standard way to do these path string conversions?

Paths appear to come from all sorts of places and given the increasing use
of zip file packaging it doesn't seem appropriate to rely on the current
platform as a single choice for the default encoding.

Zip files contain a bit flag for the character encoding (cp437 or utf-8),
see the ZipInfo object in module zipfile and the link (on that page) to the
file format description.
But I think some zip programs just put the path in the zipfile, encoded in
the local code page, in which case you have no way of knowing.
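
As a rough illustration of the flag described above, here is a minimal sketch assuming a modern Python 3 `zipfile` (which post-dates this thread). Bit 11 (`0x800`) of `ZipInfo.flag_bits` marks a UTF-8 name; the helper name `describe_names` and the sample archive are made up for the example. It only reports what the archive declares, so a name stored in some local code page without the flag would still be reported as cp437.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag


def describe_names(zip_bytes):
    """Return (filename, declared_encoding) pairs for each archive member."""
    result = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            # The flag says what the archive claims, not what the writer did.
            declared = "utf-8" if info.flag_bits & UTF8_FLAG else "cp437"
            result.append((info.filename, declared))
    return result


# Build a small in-memory archive (hypothetical member names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Mes Téléchargements/readme.txt", "hello")
    zf.writestr("plain/readme.txt", "hello")

names = describe_names(buf.getvalue())
```

Python's own writer sets the flag only when a name cannot be stored as plain ASCII, which is why the second member is reported as cp437.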

Robin Becker · May 31, 2007

Tijs said:
Robin Becker wrote: ........
Zip files contain a bit flag for the character encoding (cp437 or utf-8),
see the ZipInfo object in module zipfile and the link (on that page) to the
file format description.
But I think some zip programs just put the path in the zipfile, encoded in
the local code page, in which case you have no way of knowing.

Thanks for that. I guess the problem is that when a path is obtained from such
an object the code that gets the path usually has no way of knowing what the
intended use is. That makes storage as simple bytes hard. I guess the correct
way is to always convert to a standard (say utf8) and then always know the
required encoding when the thing is to be used.

Guest · May 31, 2007

Robin Becker said:
thanks for that. I guess the problem is that when a path is obtained
from such an object the code that gets the path usually has no way of
knowing what the intended use is. That makes storage as simple bytes
hard. I guess the correct way is to always convert to a standard (say
utf8) and then always know the required encoding when the thing is to be
used.

Inside the program itself, the best thing is to represent path names
as Unicode strings as early as possible; later, information about the
original encoding may be lost.

If you obtain path names from the os module, pass Unicode strings
to listdir in order to get back Unicode strings. If they come from
environment variables or command line arguments, use
locale.getpreferredencoding() to find out what the encoding should
be.
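
A minimal sketch of that "decode as early as possible" advice, written for Python 3 rather than the Python 2 of this thread. The helper name `decode_path` is hypothetical; `locale.getpreferredencoding` is the real call mentioned above, and the cp1252 sample bytes are just an assumed example of a Windows path.

```python
import locale


def decode_path(raw, encoding=None):
    """Return a text path; byte strings are decoded, text is passed through."""
    if isinstance(raw, str):
        return raw
    if encoding is None:
        # Fall back to the platform's preferred encoding, as suggested above.
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)  # raises UnicodeDecodeError on bad input


# Assumed example: a path fragment as bytes in Windows code page 1252.
sample = "Mes Téléchargements".encode("cp1252")
decoded = decode_path(sample, "cp1252")

# Once the path is text, producing UTF-8 for display is unambiguous.
utf8_bytes = decoded.encode("utf-8")
```

Passing the encoding explicitly, when it is known, avoids depending on whichever machine happens to run the code.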

If they come from a zip file, Tijs already explained what the encoding
is.

Always expect encoding errors; if they occur, choose to either skip
the file name, or report an error to the user. Notice that listdir
may return a byte string if decoding fails (this may only happen
on Unix).
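
One way to act on the "skip or report" choice, sketched for Python 3 under the assumption that the names arrive as byte strings (for instance from `os.listdir(os.fsencode(directory))`). The function name `decode_names` and the sample names are made up for illustration.

```python
def decode_names(raw_names, encoding="utf-8"):
    """Split byte-string names into decoded text and undecodable leftovers."""
    good, bad = [], []
    for raw in raw_names:
        try:
            good.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the raw bytes so the caller can skip or report them.
            bad.append(raw)
    return good, bad


# Assumed input: one ASCII name and one Latin-1 encoded name.
raw_names = [b"ok.txt", "Téléchargements".encode("latin-1")]
good, bad = decode_names(raw_names, "utf-8")
```

Here the Latin-1 bytes are not valid UTF-8, so that name ends up in `bad` instead of being silently mangled.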

Regards,
Martin
