unicode keys in dicts

Discussion in 'Python' started by Jiba, Jan 8, 2004.

  1. Jiba

    Jiba Guest

    Hi all,

    Is the following behaviour normal?

    >>> d = {"é" : 1}
    >>> d["é"]
    1
    >>> d[u"é"]
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    KeyError: u'\xe9'


    It seems that "é" and u"é" are not considered the same key (in Python
    2.3.3), even though they have the same hash code (as returned by hash()).

    And "e" and u"e" (unaccented characters) are considered the same!

    Jiba
    Jiba, Jan 8, 2004
    #1

  2. Jeff Epler

    Jeff Epler Guest

    >>> chr(0xe9) == unichr(0xe9)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII decoding error: ordinal not in range(128)

    Unequal objects can hash to the same value. Your two keys are not
    equal (in fact, you can't even compare them on my system). They would
    be comparable but not equal on many systems, for instance one where the
    system's encoding is Microsoft's CP850.
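
    The point that an equal hash does not imply an equal key can be sketched as
    follows. This is an illustration only, written for Python 3 rather than the
    2.3.3 interpreter used in the thread; the SameHash class and its name field
    are hypothetical and exist just to force a hash collision.

```python
class SameHash:
    """Hypothetical key type: every instance has the same hash,
    but two instances are equal only if their names match."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Deliberately constant, so all instances collide.
        return 42

    def __eq__(self, other):
        return isinstance(other, SameHash) and self.name == other.name


a = SameHash("a")
b = SameHash("b")
d = {a: 1}

same_hash = hash(a) == hash(b)  # the hashes collide
found_a = a in d                # same hash and equal: found
found_b = b in d                # same hash but not equal: not found
```

    A dict lookup uses the hash only to find candidate slots and then checks
    equality, so a key that collides with a stored key but does not compare
    equal to it is still reported as missing.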

    You can misconfigure your system to assume that byte strings are in (e.g.)
    iso-8859-1 encoding by changing site.py.

    Jeff
    Jeff Epler, Jan 8, 2004
    #2

  3. Peter Hansen

    Peter Hansen Guest

    Jiba wrote:
    >
    > is the following behaviour normal :
    >
    > >>> d = {"é" : 1}
    > >>> d["é"]
    > 1
    > >>> d[u"é"]
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in ?
    > KeyError: u'\xe9'
    >
    > it seems that "é" and u"é" are not considered as the same key (in Python
    > 2.3.3). Though they have the same hash code (returned by hash()).
    >
    > And "e" and u"e" (non accentuated characters) are considered as the same
    > !


    Well, "e" and u"e" _are_ the same character, while the unicode value that
    comes from decoding the byte string "é" depends entirely on which codec
    you use for the decoding. It is most likely the same as u"é" only when
    decoded using certain codecs. ASCII is 7-bit only, so the byte value of
    "é" is not legal in ASCII, which is likely your default encoding.

    For example, try "é".decode('iso-8859-1') and you will probably get the
    unicode value you were expecting.
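
    A rough sketch of the same idea, assuming Python 3, where bytes stands in
    for the Python 2 byte string and str for the unicode string. The byte value
    0xE9 is taken from the KeyError message above; the variable names are
    illustrative only.

```python
# The byte string "é" as it appears in the thread (a single byte, 0xE9).
raw = b"\xe9"
expected = "\u00e9"  # é as a Unicode string

# The result of decoding depends on the codec.
as_latin1 = raw.decode("iso-8859-1")
as_cp850 = raw.decode("cp850")

# ASCII is 7-bit only, so 0xE9 cannot be decoded at all.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decode once with the codec the data is really in, then use the
# Unicode string as the dict key.
d = {as_latin1: 1}
value = d[expected]
```

    Decoding with iso-8859-1 gives the expected character, cp850 gives a
    different one, and ASCII refuses the byte, which is why the lookup only
    works once both keys are Unicode strings produced with the right codec.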

    I'm not the best to answer this, but I would at least say that the above
    behaviour is considered "normal", though it can be surprising to those
    of us not expert in Unicode issues...

    -Peter
    Peter Hansen, Jan 8, 2004
    #3
