dict.reserve and other tricks

bearophileHUGS · Nov 17, 2006

I have started doing practice creating C extensions for CPython, so
here are two ideas I have had, possibly useless.

If you keep adding elements to a CPython dict/set, it periodically
rebuilds itself. So maybe dict.reserve(n) and a set.reserve(n) methods
may help, reserving enough (empty) memory for about n *distinct* keys
the programmer wants to add to the dict/set in a short future. I have
seen that the the C API of the dicts doesn't allow this, and I don't
know if this can be implemented modifying the dicts a bit. Do you think
this may be useful?

-------------------------------

Sometime I need to remove some elements from dicts/sets. If I know a
rule that tells what elements have to be removed I can use code like
this:

import itertools

def filterset(predicate, inset):
return set(itertools.ifilter(predicate, inset))

print filterset(lambda x: x & 1, set(range(10)))

Output:
set([1, 3, 9, 5, 7])

And a similar one to filter dicts that works with a predicate(key,
val).
Most of the times such functions are good enough, but sometimes the
dicts are big, so to reduce memory used I remove keys in place:

def filterdict(pred, indict):
todel = [k for k,v in indict.iteritems() if not pred(k,v)]
for key in todel:
del indict[key]

d = dict.fromkeys(xrange(8), 0)
print d
filterdict(lambda k,v: k & 1, d)
print d

# Output:
# {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}
# {1: 0, 3: 0, 5: 0, 7: 0}

But doing the same thing while iterating on the dict may be faster and
use even less memory.
This iteration&deletion capability is probably not safe enough to be
used inside Python programs, but a compiled C module with a function
that works like that filterdict (and deletes while iterating) may be
created, and its use is safe from Python programs.

The C++ STL API of maps allows to delete items while iterating on the
map (but probably it has to be done only when necessary, because it may
be a source for bugs):

typedef map<int, string> isMap;
typedef isMap::value_type isValType;
typedef isMap::iterator isMapItor;

void main() {
isMap m;

... // items added to m

for(isMapItor itr = m.begin(); itr != m.end()

{
if(itr->second == "somestring")
m.erase(itr++);
else
++itr;
}

The Python/C API, 7.4.1 Dictionary Objects, says that's not possible
with CPython dicts:

The dictionary p should not be mutated during iteration. It is safe (since Python 2.1) to modify the values of the keys as you iterate over the dictionary, but only so long as the set of keys does not change.<

Do you think it may be useful to create to create such C filterdict
function that deletes while iterating? (To create such function it
probably has to bypass the CPython dict API).

Bye,
bearophile

Terry Reedy · Nov 17, 2006

If you keep adding elements to a CPython dict/set, it periodically
rebuilds itself. So maybe dict.reserve(n) and a set.reserve(n) methods
may help, reserving enough (empty) memory for about n *distinct* keys
the programmer wants to add to the dict/set in a short future. I have
seen that the the C API of the dicts doesn't allow this, and I don't
know if this can be implemented modifying the dicts a bit. Do you think
this may be useful?

Ideas for user-controlled 'performance tuning' have been proposed on the
development list. GVR has rejected such as more trouble than they are
worth. Instead, effort has gone into making the implementation as good as
he and others know how.

Since you are writing extensions, you can create a built-in subclass of
dict to experiment with. I presume the 2.5 default dict should be a model.

Terry Jan Reedy

Klaas · Nov 17, 2006

I have started doing practice creating C extensions for CPython, so
here are two ideas I have had, possibly useless.

If you keep adding elements to a CPython dict/set, it periodically
rebuilds itself. So maybe dict.reserve(n) and a set.reserve(n) methods
may help, reserving enough (empty) memory for about n *distinct* keys
the programmer wants to add to the dict/set in a short future. I have
seen that the the C API of the dicts doesn't allow this, and I don't
know if this can be implemented modifying the dicts a bit. Do you think
this may be useful?

It has been proposed before and rejected. How often is dict creation a
bottleneck in python apps? I'd guess not often. Do you know of any
examples

Optimize space use also isn't terribly compelling, as a "tight" dict
can be created from a "loose" dict d using dict(d).

Most of the times such functions are good enough, but sometimes the
dicts are big, so to reduce memory used I remove keys in place:

def filterdict(pred, indict):
todel = [k for k,v in indict.iteritems() if not pred(k,v)]
for key in todel:
del indict[key]

But doing the same thing while iterating on the dict may be faster and
use even less memory.

Well, you can reduce the memory usage to virtually nothing by using a
generator expression rather than list comprehension.

This iteration&deletion capability is probably not safe enough to be
used inside Python programs, but a compiled C module with a function
that works like that filterdict (and deletes while iterating) may be
created, and its use is safe from Python programs.

Do you think it may be useful to create to create such C filterdict
function that deletes while iterating? (To create such function it
probably has to bypass the CPython dict API).

Such a beast would be fiendish to write, I think. Remember, arbitrary
python code can be executed by __hash__ and deleting (DECREF) python
objects.

-Mike

bearophileHUGS · Nov 17, 2006

Thank you for the answers Terry Reedy and Klaas.

Since you are writing extensions, you can create a built-in subclass of
dict to experiment with. I presume the 2.5 default dict should be a model.

That way it's doable, but I think it's of limited use too; I'd like to
remove elements from arbitrary (normal) dicts.

Klaas:

Well, you can reduce the memory usage to virtually nothing by using a
generator expression rather than list comprehension.

Are you sure? I don't think so. Can you show a little example?
As you know this can't work:

def filterdict(pred, indict):
todel = (k for k,v in indict.iteritems() if not pred(k,v))
for key in todel:
del indict[key]

d = dict.fromkeys(xrange(8), 0)
print d
filterdict(lambda k,v: k & 1, d)
print d

(You can remove elements from the dict and put them into a list, and
then put back the elements into the dict, this is probably the only way
I know of *theoretically* keeping the memory used about the same with
Python. In practice the less memory consuming version may be the
filterdict I have shown).

arbitrary python code can be executed by __hash__
and deleting (DECREF) python objects.

I am starting to understand. I'll think more about this.

Thank you,
bye,
bearophile

Klaas · Nov 18, 2006

Klaas:

Are you sure? I don't think so. Can you show a little example?

Sorry, that was boneheaded and wrong.

-Mike

Problem with a login script, SESSION user rights and put this together so it works with the other pages and MySQL. Code examples.	2	May 5, 2023
Help for my project in the last minute	0	Apr 23, 2022
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
set and dict iteration	42	Aug 16, 2012
stupid namespace tricks	1	Sep 13, 2004
Eclipse memory setting and other tricks	0	Mar 7, 2004
Help with my responsive home page	2	Dec 14, 2022
Linux: using "clone3" and "waitid"	0	Oct 17, 2023

dict.reserve and other tricks

bearophileHUGS

Terry Reedy

Klaas

bearophileHUGS

Klaas

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads