G
Guest
Hi Pythonistas!
I've got a question about storing tuples in a dictionary. First, a
small test case which creates a list of dictionaries:
import time
list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = key
list_of_dicts.append(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk
It creates dictionaries and stores them in a list, printing out
execution times. The size of each dictionary is constant, so is the
execution time for each iteration.
0 0.1
1 0.1
2 0.1
3 0.08
4 0.09
....and so on.
Then, just one line is changed:
my_dict[key] = key
into:
my_dict[key] = (key, key)
Full code:
list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
list_of_dicts.append(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk
The difference is that instead of single values, tuples are added to
the dictionary instead. When the program is run again:
0 0.27
1 0.37
2 0.49
3 0.6
....
16 2.32
17 2.45
18 2.54
19 2.68
The execution time is rising linearly with every new dictionary
created.
Next experiment: dictionaries are not stored in a list, they are just
left out when an iteration has finished. It's done by removing two
lines:
list_of_dicts = []
and
list_of_dicts.append(my_dict)
Full code:
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk
The time is constant again:
0 0.28
1 0.28
2 0.28
3 0.26
4 0.26
I see no reason for this kind of performance problem, really. It
happens when both things are true: dictionaries are kept in a list (or
more generally, in memory) and they store tuples.
As this goes beyond my understanding of Python internals, I would like
to kindly ask, if anyone has an idea about how to create this data
structure (list of dictionaries of tuples, assuming that size of all
dictionaries is the same), in constant time?
Regards,
Maciej
I've got a question about storing tuples in a dictionary. First, a
small test case which creates a list of dictionaries:
import time
list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = key
list_of_dicts.append(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk
It creates dictionaries and stores them in a list, printing out
execution times. The size of each dictionary is constant, so is the
execution time for each iteration.
0 0.1
1 0.1
2 0.1
3 0.08
4 0.09
....and so on.
Then, just one line is changed:
my_dict[key] = key
into:
my_dict[key] = (key, key)
Full code:
list_of_dicts = []
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
list_of_dicts.append(my_dict)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk
The difference is that instead of single values, tuples are added to
the dictionary instead. When the program is run again:
0 0.27
1 0.37
2 0.49
3 0.6
....
16 2.32
17 2.45
18 2.54
19 2.68
The execution time is rising linearly with every new dictionary
created.
Next experiment: dictionaries are not stored in a list, they are just
left out when an iteration has finished. It's done by removing two
lines:
list_of_dicts = []
and
list_of_dicts.append(my_dict)
Full code:
keys = [str(x) for x in range(200000)]
prev_clk = time.clock()
for i in range(20):
my_dict = {}
for key in keys:
my_dict[key] = (key, key)
new_clk = time.clock()
print i, new_clk - prev_clk
prev_clk = new_clk
The time is constant again:
0 0.28
1 0.28
2 0.28
3 0.26
4 0.26
I see no reason for this kind of performance problem, really. It
happens when both things are true: dictionaries are kept in a list (or
more generally, in memory) and they store tuples.
As this goes beyond my understanding of Python internals, I would like
to kindly ask, if anyone has an idea about how to create this data
structure (list of dictionaries of tuples, assuming that size of all
dictionaries is the same), in constant time?
Regards,
Maciej