Dictionary Keys question


F

FireNWater

I'm curious why the different outputs of this code. If I make the
dictionary with letters as the keys, they are not listed in the
dictionary in alphabetical order, but if I use the integers then the
keys are in numerical order.

I know that the order of the keys is not important in a dictionary,
but I was just curious about what causes the differences. Thanks!!

list1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
list2 = [1,2,3,4,5,6,7,8]

Dictionary = dict(zip(list1, list2))
print Dictionary

Dictionary1 = dict(zip(list2, list1))
print Dictionary1
 
Ad

Advertisements

C

Christian Heimes

FireNWater said:
I'm curious why the different outputs of this code. If I make the
dictionary with letters as the keys, they are not listed in the
dictionary in alphabetical order, but if I use the integers then the
keys are in numerical order.

I know that the order of the keys is not important in a dictionary,
but I was just curious about what causes the differences. Thanks!!

Currently the order of dict keys depend on the hash of the object. Try
hash(1) and hash('a').
 
D

Dustan

I'm curious why the different outputs of this code. If I make the
dictionary with letters as the keys, they are not listed in the
dictionary in alphabetical order, but if I use the integers then the
keys are in numerical order.

I know that the order of the keys is not important in a dictionary,
but I was just curious about what causes the differences. Thanks!!

list1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
list2 = [1,2,3,4,5,6,7,8]

Dictionary = dict(zip(list1, list2))
print Dictionary

Dictionary1 = dict(zip(list2, list1))
print Dictionary1

The underlying order is a result, in part, of the key's hash codes*.
Integers are hash coded by their integer values, therefore, they
appear in numeric order. Strings, however, use an algorithm that
ensures as unique hash codes as possible. Notice the difference:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
1, 2, 3, 4, 5, 6, 7, 8])
[-468864544, -340864157, -212863774, -84863387, 43136996, 171137383,
299137766, 427138153, 1, 2, 3, 4, 5, 6, 7, 8]

* emphasis on the "in part". Other factors include the amount of
memory space available, order added, the current size of the
underlying hash table, etc.
 
B

Berteun Damman

I'm curious why the different outputs of this code. If I make the
dictionary with letters as the keys, they are not listed in the
dictionary in alphabetical order, but if I use the integers then the
keys are in numerical order.

I know that the order of the keys is not important in a dictionary,
but I was just curious about what causes the differences. Thanks!!

I don't know the exact way Python's hash function works, but I can take
a guess. I'm sorry if I explain something you already know.

A hash is for quickly looking up data. Yet, you don't want to waste too
much memory. So there is a limit number of spaces allocated, in which to
store objects. This number of spaces can be thought of as a list. Then,
if you put something into the dict, Python computes the 'hash' of this
object, which basically forms the index in the list where to store it.
So every object should be mapped onto some index within the list. (If
you retrieve it, the hash is computed again, and the value on that index
is looked up, like list indexing, these are fast operations.)

Say, if you have 100 spaces, and someone puts in integer in the list,
the hashfunction used might be % 100. So the first 100 integers would
always be placed at consecutive places. For strings however, a more
complicated hash-function would be used, which takes into account more
characters, so strings don't end up in order.

For integers, if you put in integers that are spread very widely apart,
they won't end up in order either (see the mod 100 example, 104 will
come before 10).

If you replace the list2 in your example by:
list2 = [10000 * x for x in range(1,9)]

You will see that this one doesn't end up in order either. So, there's
no exception for integers when it comes to the order, yet the particular
properties of the hash function will cause sequential integers to end up
in order under some circumstances.

Berteun

PS:
What happens if two values map onto the same space is of course an
obvious question, and the trick is choosing your hashfunction so this
occurs not very often on average. If it happens there are several
strategies. Wikipedia probably has an explanation of how hash-functions
can work in such a case.
 
G

Gabriel Genellina

I'm curious why the different outputs of this code. If I make the
dictionary with letters as the keys, they are not listed in the
dictionary in alphabetical order, but if I use the integers then the
keys are in numerical order.

I know that the order of the keys is not important in a dictionary,
but I was just curious about what causes the differences. Thanks!!

list1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
list2 = [1,2,3,4,5,6,7,8]

Dictionary = dict(zip(list1, list2))
print Dictionary

Dictionary1 = dict(zip(list2, list1))
print Dictionary1

Dictionaries use the hash value of the keys to distribute them in
"buckets". Compare these:

for key in list1: print key, hash(key)
for key in list2: print key, hash(key)
 
F

FireNWater

I'm curious why the different outputs of this code. If I make the
dictionary with letters as the keys, they are not listed in the
dictionary in alphabetical order, but if I use the integers then the
keys are in numerical order.
I know that the order of the keys is not important in a dictionary,
but I was just curious about what causes the differences. Thanks!!

I don't know the exact way Python's hash function works, but I can take
a guess. I'm sorry if I explain something you already know.

A hash is for quickly looking up data. Yet, you don't want to waste too
much memory. So there is a limit number of spaces allocated, in which to
store objects. This number of spaces can be thought of as a list. Then,
if you put something into the dict, Python computes the 'hash' of this
object, which basically forms the index in the list where to store it.
So every object should be mapped onto some index within the list. (If
you retrieve it, the hash is computed again, and the value on that index
is looked up, like list indexing, these are fast operations.)

Say, if you have 100 spaces, and someone puts in integer in the list,
the hashfunction used might be % 100. So the first 100 integers would
always be placed at consecutive places. For strings however, a more
complicated hash-function would be used, which takes into account more
characters, so strings don't end up in order.

For integers, if you put in integers that are spread very widely apart,
they won't end up in order either (see the mod 100 example, 104 will
come before 10).

If you replace the list2 in your example by:
list2 = [10000 * x for x in range(1,9)]

You will see that this one doesn't end up in order either. So, there's
no exception for integers when it comes to the order, yet the particular
properties of the hash function will cause sequential integers to end up
in order under some circumstances.

Berteun

PS:
What happens if two values map onto the same space is of course an
obvious question, and the trick is choosing your hashfunction so this
occurs not very often on average. If it happens there are several
strategies. Wikipedia probably has an explanation of how hash-functions
can work in such a case.

Thank you for the explanation. . . I think I now have a (foggy)
understanding of hash tables. It seems to be a way to create order
(an index) out of disorder (random numbers or characters) behind the
scenes. .
 
Ad

Advertisements

R

Ryszard Szopa

The underlying order is a result, in part, of the key's hash codes*.
Integers are hash coded by their integer values, therefore, they
appear in numeric order. Strings, however, use an algorithm that
ensures as unique hash codes as possible. Notice the difference:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
1, 2, 3, 4, 5, 6, 7, 8])
[-468864544, -340864157, -212863774, -84863387, 43136996, 171137383,
299137766, 427138153, 1, 2, 3, 4, 5, 6, 7, 8]

* emphasis on the "in part". Other factors include the amount of
memory space available, order added, the current size of the
underlying hash table, etc.

BTW, can anybody explain me how is the hash function implemented in
Python?

Cheers,

-- Richard
 
M

Marc 'BlackJack' Rintsch

BTW, can anybody explain me how is the hash function implemented in
Python?

It calls the `__hash__()` method on the object that was given as argument.
So there's not *the* hash function, but every type implements its own.

Fallback is the hash of the identity of an object:

In [415]: class A(object): pass
.....:

In [416]: a = A()

In [417]: hash(a)
Out[417]: 161029068

In [418]: hash(id(a))
Out[418]: 161029068

Ciao,
Marc 'BlackJack' Rintsch
 
D

Dustan

Thank you for the explanation. . . I think I now have a (foggy)
understanding of hash tables. It seems to be a way to create order
(an index) out of disorder (random numbers or characters) behind the
scenes. .

The key thing to realize is, quite simply, don't rely on order in a
dictionary. If you do, bad things can happen. The underlying
implementation is not important to know.

But if you really do want to know, let me correct you here, and give a
perhaps clearer explanation (if not, there's no need to read any
further):

The 'order' that your speaking of is not implemented by the hash
*table*, per se, but rather by the hash function, which returns an
integer (the hash code).

The hash table takes the hash code and calculates where in its list to
place the object (as explained before, using modulo to shrink the
integer into the range of the list). If multiple items end up in the
same list, they are placed into a kind of linked list, with each node
containing an object and pointing to the next. Of course, if too many
objects end up in the same bucket, the efficiency of finding an object
in the hash table reduces to that of a linked list, so hash functions
are generally implemented to ensure a unique number (or as unique as
possible) to every object.

Python dictionaries are hash tables that automatically grow as space
is needed. While integers in the range of the list will never change
location unless the list shrinks, larger hash codes can move around
quite apparently randomly. Space available is also a factor, as others
have found out on this list. The order of a dictionary *is*
determined, but factors involved in deciding that order may appear
surprisingly mundane, and certainly variable across runs of your
program.
 
B

Ben Finney

Dustan said:
The key thing to realize is, quite simply, don't rely on order in a
dictionary.

The poster to which you replied is using "order" as contrasted with
"disorder". Clearly dictionaries *do* have order that can be relied
upon.

I think you're using "order" in the colloquial manner; more accurate
would be to say "don't rely on *sequence* in a dictionary".
 
F

FireNWater

The 'order' that your speaking of is not implemented by the hash
*table*, per se, but rather by the hash function, which returns an
integer (the hash code).
Dustan,

Yes, I think I understand it now. . . the order I was referring to is
provided "behind the scenes" by the return values of the hash
function.

Thanks to everyone with their explanations!!
 
Ad

Advertisements

D

Dustan

The poster to which you replied is using "order" as contrasted with
"disorder". Clearly dictionaries *do* have order that can be relied
upon.

He was referring to the index. So was I, as in: Don't rely on the
index, because the size of the dictionary can vary, and therefore, the
index can vary, and therefore, the programmer must recognize that the
order of looping can vary. If you're referring to the actual order of
looping, then I and every good Python Guru (of which I am not one)
disagrees with you. If not, then you're confusing the different
meanings of "order" in this context.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top