Is setdefaultencoding bad?

Discussion in 'Python' started by moerchendiser2k3, Feb 23, 2011.

  1. Hi, I embedded Py2.6.1 in my app and I use UTF-8 encoded strings
    everywhere in the interface. Since the interface between my app and
    Python is UTF-8, I can simply write:

    print u"\uC042"
    print u"\uC042".encode("utf_8")

    and get the corresponding character (U+C042, a Hangul syllable) in the
    console. But sys.getdefaultencoding() still returns 'ascii'. Should I
    change it in site.py and set it to utf-8, or is that not recommended for
    some reason? I often read that it's strongly discouraged, but I can't
    find an explanation why.

    Thanks for any hints!!
    Bye, moerchendiser2k3
    moerchendiser2k3, Feb 23, 2011

  2. Nobody (Guest)

    On Tue, 22 Feb 2011 19:34:21 -0800, moerchendiser2k3 wrote:

    > Hi, I embedded Py2.6.1 in my app and I use UTF-8 encoded strings
    > everywhere in the interface. Since the interface between my app and
    > Python is UTF-8, I can simply write:
    >
    > print u"\uC042"
    > print u"\uC042".encode("utf_8")
    >
    > and get the corresponding character (U+C042, a Hangul syllable) in the
    > console. But sys.getdefaultencoding() still returns 'ascii'. Should I
    > change it in site.py and set it to utf-8, or is that not recommended for
    > some reason? I often read that it's strongly discouraged, but I can't
    > find an explanation why.


    You shouldn't use it.

    If your code needs to run on any system other than your own, it can't rely
    upon the default encoding being set to anything in particular. So
    changing the default encoding is an easy way to end up writing code which
    doesn't work on any system except your own.
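
    A minimal Python 3 sketch of the portable alternative, assuming UTF-8 is the encoding you want on disk: name the codec at every I/O boundary instead of relying on a process-wide default. (Python 3 has an analogous trap: open() without encoding= falls back to the locale's preferred encoding, which differs between systems.) The file name sample.txt is hypothetical.

```python
import os
import tempfile


def write_text(path, text):
    # Name the codec explicitly instead of relying on any process-wide default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    # Decode with the same, explicitly named codec.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Hypothetical file in a throwaway directory, used only for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    write_text(path, "\uc042")
    roundtrip = read_text(path)
    with open(path, "rb") as f:
        raw = f.read()  # the bytes actually written to disk

print(roundtrip == "\uc042", raw)
```

    The result is the same on every machine because nothing depends on the locale or the interpreter's default.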

    And you can't change the default encoding outside of site.py: site.py
    deletes sys.setdefaultencoding() once startup is finished, because the
    value has to stay constant throughout the lifetime of the process.
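
    A read-only check of that state, sketched for Python 3 (on Python 2 these print calls would show tuples). A stock Python 2 should report 'ascii' and no reachable sys.setdefaultencoding once site.py has run (unless the interpreter was started with -S); Python 3 reports 'utf-8' and has no such function at all.

```python
import sys

# The process-wide default codec: 'ascii' on a stock Python 2, 'utf-8' on Python 3.
default_encoding = sys.getdefaultencoding()

# Python 2's site.py deletes sys.setdefaultencoding after startup;
# Python 3 never defines it. Either way, ordinary code cannot see it.
can_change = hasattr(sys, "setdefaultencoding")

print("default encoding:", default_encoding)
print("setdefaultencoding reachable:", can_change)
```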

    IIRC, if you use a unicode string as a dictionary key, and the key can be
    converted using the default encoding, the hash is calculated on the
    encoded byte string (so that if you have equivalent unicode and byte
    strings, both hash to the same value). If you were to change the default
    encoding after any dictionaries have been created (internally, Python uses
    dictionaries quite extensively), subsequent lookups would use the wrong
    hash values.
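
    A toy model of the failure described above, written in Python 3. It follows Nobody's own (explicitly "IIRC") description rather than CPython's actual unicode hashing code: equality is fixed, but the hash depends on a process-wide setting, so changing that setting strands entries already stored in a dict. DEFAULT_ENCODING and Key are hypothetical names.

```python
# Toy model only (hypothetical; not how CPython hashes unicode objects).
DEFAULT_ENCODING = "ascii"


class Key:
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Equality never depends on the encoding setting.
        return isinstance(other, Key) and self.text == other.text

    def __hash__(self):
        # The hash is computed over the text encoded with the *current*
        # default; with ascii, unencodable characters become b"?".
        return hash(self.text.encode(DEFAULT_ENCODING, errors="replace"))


table = {Key("\uc042"): "value"}  # stored while the default is ascii

DEFAULT_ENCODING = "utf-8"  # "change the default" mid-process

# A lookup recomputes the hash with the new setting and misses the entry...
found_by_hash = Key("\uc042") in table
# ...even though an equal key is still sitting in the dict.
found_by_scan = any(k == Key("\uc042") for k in table)

print(found_by_hash, found_by_scan)
```

    The hash-based lookup misses while a linear scan still finds an equal key, which is the "wrong hash values" symptom in miniature.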
    Nobody, Feb 23, 2011

  3. OK, but is it still fine for the interface to handle UTF-8 strings?
    The default encoding is still ascii.
    moerchendiser2k3, Feb 23, 2011
  4. Chris Rebert (Guest)

    On Wed, Feb 23, 2011 at 3:07 AM, moerchendiser2k3 wrote:
    > OK, but is it still fine for the interface to handle UTF-8 strings?
    > The default encoding is still ascii.


    Yes, that's fine. UTF-8 is an excellent encoding choice, and
    encoding/decoding should always be done explicitly in Python, so the
    "default encoding" ideally ought to never come into play (and indeed,
    Python 3 does away with bug-prone implicit encoding/decoding entirely
    FWICT). Having ASCII as the "default encoding" ensures that implicit
    encoding/decoding bugs are relatively apparent.
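
    A minimal Python 3 sketch of that advice, assuming UTF-8 as the interface encoding: encode text to bytes on the way out and decode bytes to text on the way in, always naming the codec. It also shows the two failure modes that keep implicit conversions visible: mixing str and bytes raises TypeError, and the strict ascii codec raises UnicodeEncodeError for non-ASCII text.

```python
text = "\uc042"  # one Unicode code point (a Hangul syllable)

# Explicit encode/decode at the boundary, naming the codec every time.
data = text.encode("utf-8")          # str -> bytes
assert data.decode("utf-8") == text  # bytes -> str round-trips

# Python 3 refuses to mix str and bytes implicitly...
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# ...and a strict ASCII codec fails loudly instead of guessing.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(data, mixed_ok, ascii_ok)
```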

    Cheers,
    Chris
    --
    http://blog.rebertia.com
    Chris Rebert, Feb 23, 2011
  5. Nobody (Guest)

    On Wed, 23 Feb 2011 04:14:29 -0800, Chris Rebert wrote:

    >> OK, but is it still fine for the interface to handle UTF-8 strings?
    >> The default encoding is still ascii.
    >
    > Yes, that's fine. UTF-8 is an excellent encoding choice, and
    > encoding/decoding should always be done explicitly in Python, so the
    > "default encoding" ideally ought to never come into play (and indeed,
    > Python 3 does away with bug-prone implicit encoding/decoding entirely
    > FWICT).


    On Unix, you have to go out of your way to avoid the use of implicit
    encoding/decoding with the "filesystem" encoding. This is because Unix
    extensively uses byte strings with no associated encoding, but Python 3
    tries to use Unicode for everything.

    3.0 was essentially unusable as a Unix scripting language for this reason,
    as argv and environ were converted to Unicode, with no possibility of
    recovering from unconvertible sequences.

    3.1 added the surrogate-escape mechanism, which allows recovery of the
    original byte sequences, albeit with some effort (i.e. you had to
    explicitly encode the os.environ and sys.argv values back to bytes).
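
    A minimal Python 3 sketch of the surrogate-escape round trip (the "surrogateescape" error handler from PEP 383), assuming UTF-8 as the filesystem encoding; the byte string is made up for the example. Passing the same handler to encode() is how the original bytes are recovered from a decoded value.

```python
# Bytes that are not valid UTF-8: a lone 0xe9 and a 0xff byte.
raw = b"caf\xe9-\xff"

# Decoding with surrogateescape maps each undecodable byte 0xNN to the
# lone surrogate U+DCNN instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns those surrogates back into the
# original bytes, so nothing is lost.
recovered = name.encode("utf-8", "surrogateescape")

# ascii() avoids printing lone surrogates directly to the console.
print(ascii(name), recovered == raw)
```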

    3.2 adds os.environb (bytes version of os.environ), but it appears that
    sys.argv still has to be encoded manually. It also provides os.fsencode()
    and os.fsdecode() to simplify the conversion.
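
    A minimal Python 3 sketch of getting bytes back for argv and the environment, assuming a POSIX system (where argv and environ are decoded with the filesystem encoding plus surrogateescape). The file name data-\xff.bin is hypothetical.

```python
import os
import sys

# sys.argv has no bytes twin: re-encode each argument manually.
argv_bytes = [os.fsencode(arg) for arg in sys.argv]

# os.environb only exists where the OS environment is bytes-based
# (os.supports_bytes_environ is True on Unix, False on Windows).
if os.supports_bytes_environ:
    env_bytes = dict(os.environb)
else:
    env_bytes = {os.fsencode(k): os.fsencode(v) for k, v in os.environ.items()}

# os.fsdecode()/os.fsencode() round-trip even undecodable names (POSIX).
raw_name = b"data-\xff.bin"
round_trip = os.fsencode(os.fsdecode(raw_name))

print(len(argv_bytes), len(env_bytes), round_trip == raw_name)
```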

    Most functions accept bytes arguments, and most of them either return
    bytes when passed bytes or (if the function accepts no arguments) have a
    bytes equivalent (e.g. os.getcwdb() for os.getcwd()). But module-level
    variables tend to be Unicode strings with no bytes version (os.environb is
    the exception rather than the rule), and some functions have no bytes
    equivalent (e.g. os.ctermid(), os.uname(), os.ttyname(); fortunately it's
    rather unlikely that the result of any of these functions will contain
    non-ASCII characters).
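
    A minimal Python 3 sketch of the "bytes in, bytes out" convention, assuming a POSIX system; the file example.txt is hypothetical and created in a throwaway directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file created only for this demonstration.
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb"):
        pass

    names_str = os.listdir(tmp)                  # str path in -> str names out
    names_bytes = os.listdir(os.fsencode(tmp))   # bytes path in -> bytes names out

# Functions that take no arguments come in pairs instead.
cwd_str = os.getcwd()     # str
cwd_bytes = os.getcwdb()  # bytes equivalent

print(names_str, names_bytes, type(cwd_str).__name__, type(cwd_bytes).__name__)
```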
    Nobody, Feb 24, 2011

Similar Threads
  1. xml processing and sys.setdefaultencoding
    christof hoeke, Jul 20, 2003, in forum: Python
    Replies: 4, Views: 828
    Martin v. Löwis, Jul 21, 2003
  2. sys.setdefaultencoding(name)
    Askari, Sep 18, 2004, in forum: Python
    Replies: 5, Views: 5,330
    Askari, Sep 20, 2004
  3. sys.setdefaultencoding
    Robin Becker, Aug 28, 2007, in forum: Python
    Replies: 1, Views: 378
    Martin v. Löwis, Aug 28, 2007
  4. setdefaultencoding error
    smalltalk, Dec 8, 2007, in forum: Python
    Replies: 2, Views: 1,096
    smalltalk, Dec 10, 2007
  5. rantingrick
    Replies: 44, Views: 1,208
    Peter Pearson, Jul 13, 2010