Is setdefaultencoding bad?

moerchendiser2k3

Hi, I embedded Python 2.6.1 in my app and I use UTF-8 encoded strings
everywhere in the interface between my app and Python, so I can
simply write:

print u"\uC042"
print u"\uC042".encode("utf_8")

and get the corresponding character in the console. But currently
sys.getdefaultencoding() still returns ascii. Should I change it in
site.py to utf-8, or is this not recommended somehow? I often read
that it's strongly discouraged, but I can't find an explanation why.

Thanks for any hints!!
Bye, moerchendiser2k3
 
Nobody

> Hi, I embedded Python 2.6.1 in my app and I use UTF-8 encoded strings
> everywhere in the interface between my app and Python, so I can
> simply write:
>
> print u"\uC042"
> print u"\uC042".encode("utf_8")
>
> and get the corresponding character in the console. But currently
> sys.getdefaultencoding() still returns ascii. Should I change it in
> site.py to utf-8, or is this not recommended somehow? I often read
> that it's strongly discouraged, but I can't find an explanation why.

You shouldn't use it.

If your code needs to run on any system other than your own, it can't rely
upon the default encoding being set to anything in particular. So
changing the default encoding is an easy way to end up writing code which
doesn't work on any system except your own.

And you can't change the default encoding outside of site.py: site.py
deletes sys.setdefaultencoding() once startup has finished, because the
value has to be constant throughout the lifetime of the process.

IIRC, if you use a unicode string as a dictionary key, and the key can be
converted using the default encoding, the hash is calculated on the
encoded byte string (so that if you have equivalent unicode and byte
strings, both hash to the same value). If you were to change the default
encoding after any dictionaries have been created (internally, Python uses
dictionaries quite extensively), subsequent lookups would use the wrong
hash values.
 
moerchendiser2k3

OK, but is it still fine that the interface handles UTF-8 strings?
The default encoding is still ascii.
 
Chris Rebert

> OK, but is it still fine that the interface handles UTF-8 strings?
> The default encoding is still ascii.

Yes, that's fine. UTF-8 is an excellent encoding choice, and
encoding/decoding should always be done explicitly in Python, so the
"default encoding" ideally ought to never come into play (and indeed,
Python 3 does away with bug-prone implicit encoding/decoding entirely
FWICT). Having ASCII as the "default encoding" ensures that implicit
encoding/decoding bugs are relatively apparent.
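
To make "explicit" concrete, here is a minimal sketch of decoding at the boundary and encoding on the way back out. The helper names (from_host, to_host) and the HOST_ENCODING constant are made up for illustration, and it is written for Python 3, although the same pattern applies to 2.x:

```python
# Hypothetical boundary helpers; the names are illustrative only.
HOST_ENCODING = "utf-8"  # assumed encoding of the embedding application


def from_host(raw):
    """Decode bytes received from the host application into text."""
    return raw.decode(HOST_ENCODING)


def to_host(text):
    """Encode text into bytes before handing it back to the host."""
    return text.encode(HOST_ENCODING)


raw = b"\xec\x81\x82"        # the UTF-8 bytes for U+C042
text = from_host(raw)        # one character of text
back = to_host(text)         # the original three bytes again

# Decoding the same bytes as ASCII fails loudly instead of guessing.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

With every conversion spelled out like this, the interpreter's default encoding never gets a say in the result.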

Cheers,
Chris
 
Nobody

> Yes, that's fine. UTF-8 is an excellent encoding choice, and
> encoding/decoding should always be done explicitly in Python, so the
> "default encoding" ideally ought to never come into play (and indeed,
> Python 3 does away with bug-prone implicit encoding/decoding entirely
> FWICT).

On Unix, you have to go out of your way to avoid the use of implicit
encoding/decoding with the "filesystem" encoding. This is because Unix
extensively uses byte strings with no associated encoding, but Python 3
tries to use Unicode for everything.

3.0 was essentially unusable as a Unix scripting language for this reason,
as argv and environ were converted to Unicode, with no possibility of
recovering from unconvertible sequences.

3.1 added the surrogate-escape mechanism which allows recovery of the
original byte sequences, albeit with some effort (i.e. you had to
explicitly re-encode os.environ and sys.argv).
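
As a rough illustration of what the surrogate-escape mechanism does (Python 3.1 or later; the sample bytes are just an arbitrary Latin-1 string that is not valid UTF-8):

```python
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 on its own is not valid UTF-8

# The undecodable byte is smuggled through as a lone surrogate (U+DCE9).
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")

# A strict encode refuses the lone surrogate, so the escape stays visible.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```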

3.2 adds os.environb (bytes version of os.environ), but it appears that
sys.argv still has to be encoded manually. It also provides os.fsencode()
and os.fsdecode() to simplify the conversion.
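
A short sketch of how that looks in practice, assuming Python 3.2 or later on a POSIX system (the file name is an arbitrary example and the variable names are made up):

```python
import os
import sys

# sys.argv holds str objects; re-encode them to get the bytes the OS passed.
argv_bytes = [os.fsencode(arg) for arg in sys.argv]

# os.environb only exists where the native environment is bytes.
if os.supports_bytes_environ:
    home = os.environb.get(b"HOME")  # bytes, or None if HOME is unset
else:
    home = None

# On POSIX, fsdecode()/fsencode() round-trip bytes that are not valid
# in the filesystem encoding.
name = b"caf\xe9.txt"
round_tripped = os.fsencode(os.fsdecode(name))
```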

Most functions accept bytes arguments, and most either return bytes when
passed bytes or (if the function takes no arguments) have a bytes
equivalent. But variables tend to be Unicode strings with no bytes version
(os.environb is the exception rather than the rule), and some functions
have no bytes equivalent (e.g. os.ctermid(), os.uname(), os.ttyname();
fortunately it's rather unlikely that the result from any of these
functions will contain non-ASCII characters).
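
For example, os.listdir() follows the "bytes in, bytes out" rule, and os.getcwdb() is the bytes equivalent of the argument-less os.getcwd(). A minimal sketch (Python 3, POSIX assumed):

```python
import os

str_names = os.listdir(".")     # str argument -> list of str
bytes_names = os.listdir(b".")  # bytes argument -> list of bytes

cwd_text = os.getcwd()          # str
cwd_bytes = os.getcwdb()        # bytes equivalent of os.getcwd()
```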
 
