collections.Counter surprisingly slow


Roy Smith

I've been doing an informal "intro to Python" lunchtime series for some
co-workers (who are all experienced programmers, in other languages).
This week I was going to cover list comprehensions, exceptions, and
profiling. So, I did a little demo showing different ways to build a
dictionary counting how many times a string appears in some input:

test() implements a "look before you leap" python loop

exception() uses a try/except construct in a similar python loop

default() uses a defaultdict

count() uses a Counter

I profiled it, to show how the profiler works. The full code is below.
The input is an 8.8 Mbyte file containing about 570,000 lines (11,000
unique strings). Python 2.7.3 on Ubuntu Precise.

As I expected, test() is slower than exception(), which is slower than
default(). I'm rather shocked to discover that count() is the slowest
of all! I expected it to be the fastest. Or, certainly, no slower
than default().

The full profiler dump is at the end of this message, but the gist of
it is:

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.322 0.322 ./stations.py:42(count)
1 0.159 0.159 0.159 0.159 ./stations.py:17(test)
1 0.114 0.114 0.114 0.114 ./stations.py:27(exception)
1 0.097 0.097 0.097 0.097 ./stations.py:36(default)

Why is count() [i.e. collections.Counter] so slow?

-------------------------------------------------------------
from collections import defaultdict, Counter

def main():
    lines = open('stations.txt').readlines()

    d1 = test(lines)
    d2 = exception(lines)
    d3 = default(lines)
    d4 = count(lines)

    print d1 == d2
    print d1 == d3
    print d1 == d4

def test(lines):
    d = {}
    for station in lines:
        if station in d:
            d[station] += 1
        else:
            d[station] = 1
    return d


def exception(lines):
    d = {}
    for station in lines:
        try:
            d[station] += 1
        except KeyError:
            d[station] = 1
    return d

def default(lines):
    d = defaultdict(int)
    for station in lines:
        d[station] += 1
    return d

def count(lines):
    d = Counter(lines)
    return d


if __name__ == '__main__':
    import cProfile
    import pstats
    cProfile.run('main()', 'stations.stats')
    p = pstats.Stats('stations.stats')
    p.sort_stats('cumulative').print_stats()
-------------------------------------------------------------

570335 function calls (570332 primitive calls) in 0.776 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.023 0.023 0.776 0.776 <string>:1(<module>)
1 0.006 0.006 0.753 0.753 ./stations.py:5(main)
1 0.000 0.000 0.322 0.322 ./stations.py:42(count)
1 0.000 0.000 0.322 0.322 /usr/lib/python2.7/collections.py:407(__init__)
1 0.242 0.242 0.322 0.322 /usr/lib/python2.7/collections.py:470(update)
1 0.159 0.159 0.159 0.159 ./stations.py:17(test)
1 0.114 0.114 0.114 0.114 ./stations.py:27(exception)
1 0.097 0.097 0.097 0.097 ./stations.py:36(default)
570285 0.080 0.000 0.080 0.000 {method 'get' of 'dict' objects}
1 0.055 0.055 0.055 0.055 {method 'readlines' of 'file' objects}
1 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:128(__instancecheck__)
2/1 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:148(__subclasscheck__)
3/1 0.000 0.000 0.000 0.000 {issubclass}
4 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:58(__iter__)
1 0.000 0.000 0.000 0.000 {open}
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:26(__exit__)
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:20(__enter__)
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:36(__init__)
3 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:68(__contains__)
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:52(_commit_removals)
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:81(add)
3 0.000 0.000 0.000 0.000 {getattr}
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:16(__init__)
2 0.000 0.000 0.000 0.000 {method 'remove' of 'set' objects}
2 0.000 0.000 0.000 0.000 {method '__subclasses__' of 'type' objects}
2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_abcoll.py:97(__subclasshook__)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
4 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
 

Steven D'Aprano

On Sun, 28 Jul 2013 15:59:04 -0400, Roy Smith wrote:

[...]
I'm rather shocked to discover that count() is the slowest
of all! I expected it to be the fastest. Or, certainly, no slower than
default().

The full profiler dump is at the end of this message, but the gist of it
is:

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.322 0.322 ./stations.py:42(count)
1 0.159 0.159 0.159 0.159 ./stations.py:17(test)
1 0.114 0.114 0.114 0.114 ./stations.py:27(exception)
1 0.097 0.097 0.097 0.097 ./stations.py:36(default)

Why is count() [i.e. collections.Counter] so slow?

It's within a factor of 2 of test, and 3 of exception or default (give or
take). I don't think that's surprisingly slow. In 2.7, Counter is written
in Python, while defaultdict has an accelerated C version. I expect that
has something to do with it.

Calling Counter ends up calling essentially this code:

for elem in iterable:
    self[elem] = self.get(elem, 0) + 1

(although micro-optimized), where "iterable" is your data (lines).
Calling the get method has higher overhead than dict[key]; that will also
contribute.
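Steven's description is easy to verify; here is a minimal sketch (the helper name is mine) of that pure-Python update loop, checked against Counter itself:

```python
from collections import Counter

def count_like_counter(iterable):
    # Essentially what the pure-Python Counter.update does in 2.7:
    # one bound-method call to dict.get per element plus one item
    # assignment, rather than a plain d[key] lookup.
    d = {}
    for elem in iterable:
        d[elem] = d.get(elem, 0) + 1
    return d

data = ['a', 'b', 'a', 'c', 'a', 'b']
print(count_like_counter(data) == dict(Counter(data)))  # True
```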
 

Roy Smith

Why is count() [i.e. collections.Counter] so slow?

It's within a factor of 2 of test, and 3 of exception or default (give or
take). I don't think that's surprisingly slow.

It is for a module which describes itself as "High-performance container
datatypes" :)
 

Serhiy Storchaka

28.07.13 22:59, Roy Smith wrote:
The input is an 8.8 Mbyte file containing about 570,000 lines (11,000
unique strings).

Repeat your tests with totally unique lines.
The full profiler dump is at the end of this message, but the gist of
it is:

The profiler affects execution time. In particular, it slows down the
Counter implementation, which uses more function calls. For real-world
measurements, use a different approach.
Why is count() [i.e. collections.Counter] so slow?

Feel free to contribute a patch which fixes this "wart". Note that
Counter shouldn't be slowed down on mostly unique data.
 

Stefan Behnel

Steven D'Aprano, 28.07.2013 22:51:
Calling Counter ends up calling essentially this code:

for elem in iterable:
    self[elem] = self.get(elem, 0) + 1

(although micro-optimized), where "iterable" is your data (lines).
Calling the get method has higher overhead than dict[key]; that will also
contribute.

It comes with a C accelerator (at least in Py3.4dev), but it seems like
that stumbles a bit over its own feet. The accelerator function special
cases the (exact) dict type, but the Counter class is a subtype of dict and
thus takes the generic path, which makes it benefit a bit less than possible.

Look for _count_elements() in

http://hg.python.org/cpython/file/tip/Modules/_collectionsmodule.c

Nevertheless, even the generic C code path looks fast enough in general. I
think the problem is just that the OP used Python 2.7, which doesn't have
this accelerator function.
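The exact-type check Stefan describes can be seen from the Python side; a small illustration (the C fast path amounts to an exact-dict type check, which a dict subclass like Counter fails):

```python
from collections import Counter

c = Counter()
# The accelerator's fast path special-cases the exact dict type;
# Counter is only a subclass, so it falls through to the generic path.
print(type(c) is dict)      # False -- fails the exact-type check
print(isinstance(c, dict))  # True  -- but it is still a dict subtype
```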

Stefan
 

Joshua Landau

28.07.13 22:59, Roy Smith wrote:

The input is an 8.8 Mbyte file containing about 570,000 lines (11,000 unique strings).

Repeat your tests with totally unique lines.


Counter is about ½ the speed of defaultdict in that case (as opposed to ⅓).

The full profiler dump is at the end of this message, but the gist of it is:

The profiler affects execution time. In particular, it slows down the
Counter implementation, which uses more function calls. For real-world
measurements, use a different approach.


Doing some re-times, it seems that his originals for defaultdict, exception
and Counter were about right. I haven't timed the other.

Why is count() [i.e. collections.Counter] so slow?
Feel free to contribute a patch which fixes this "wart". Note that Counter
shouldn't be slowed down on mostly unique data.


I find it hard to agree that Counter should be optimised for the
unique-data case; surely it's used far more often when there's actually a
point to counting?

Also, couldn't Counter just extend from defaultdict?
 

Joshua Landau

Steven D'Aprano, 28.07.2013 22:51:
Calling Counter ends up calling essentially this code:

for elem in iterable:
    self[elem] = self.get(elem, 0) + 1

(although micro-optimized), where "iterable" is your data (lines).
Calling the get method has higher overhead than dict[key]; that will also
contribute.

It comes with a C accelerator (at least in Py3.4dev), but it seems like
that stumbles a bit over its own feet. The accelerator function special
cases the (exact) dict type, but the Counter class is a subtype of dict and
thus takes the generic path, which makes it benefit a bit less than
possible.

Look for _count_elements() in

http://hg.python.org/cpython/file/tip/Modules/_collectionsmodule.c

Nevertheless, even the generic C code path looks fast enough in general. I
think the problem is just that the OP used Python 2.7, which doesn't have
this accelerator function.

# _count_elements({}, items), _count_elements(dict_subclass(), items),
# Counter(items), defaultdict(int) loop with exception handling
# "items" is always 1m long with varying levels of repetition
... helper.timeit(1), helper_subclass.timeit(1), counter.timeit(1), default.timeit(1)
...
(0.18816172199876746, 0.4679023139997298, 0.9684444869999425, 0.33518486200046027)
(0.2936601179990248, 0.6056111739999324, 1.1316078849995392, 0.46283868699902087)
(0.35396358400066674, 0.685048443998312, 1.2120939880005608, 0.5497965239992482)
(0.5337620789996436, 0.8658702100001392, 1.4507492869997805, 0.7772859329998028)
(0.745282343999861, 1.1455801379997865, 2.116569702000561, 1.3293145009993168)

:(

I have the helper but Counter is still slow. Is it not getting used for
some reason? It's not even as fast as the helper on a direct subclass of
dict (no overridden methods).
 

Ian Kelly

Also, couldn't Counter just extend from defaultdict?

It could, but I expect the C helper function in 3.4 will be faster
since it doesn't even need to call __missing__ in the first place.
And the cost (both in terms of maintenance and run-time) of adding
defaultdict to the Counter MRO just to reuse one tiny __missing__
method seems questionable, especially after taking into account that
the constructor signature is incompatible.
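The __missing__ hooks Ian mentions also differ observably: Counter's returns 0 without touching the dict, while defaultdict's inserts the default. A quick check (the behaviour is the same on 2.7 and 3.x):

```python
from collections import Counter, defaultdict

c = Counter()
print(c['absent'])    # 0 -- Counter.__missing__ returns zero...
print('absent' in c)  # False -- ...without storing the key

d = defaultdict(int)
print(d['absent'])    # 0 -- defaultdict.__missing__ calls int()...
print('absent' in d)  # True -- ...and stores the result
```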
 

Serhiy Storchaka

29.07.13 20:19, Ian Kelly wrote:
It could, but I expect the C helper function in 3.4 will be faster
since it doesn't even need to call __missing__ in the first place.

I'm surprised, but the Counter constructor is faster with the import of
this accelerator commented out (at least for some data).
 

Stefan Behnel

Serhiy Storchaka, 29.07.2013 21:37:
29.07.13 20:19, Ian Kelly wrote:

I'm surprised, but the Counter constructor is faster with the import of
this accelerator commented out (at least for some data).

Read my post. The accelerator doesn't take the fast path for dicts as
Counter is only a subtype of dict, not exactly a dict. That means that it
raises and catches a KeyError exception for each new value that it finds,
and that is apparently more costly than the overhead of calling get().

So, my expectation is that it's faster for highly repetitive data and
slower for mostly unique data.

Maybe a "fast_dict_lookup" option for the accelerator that forces the fast
path would fix this. The Counter class, just like many (most?) other
subtypes of dict, definitely doesn't need the fallback behaviour.

Stefan
 

Stefan Behnel

Stefan Behnel, 30.07.2013 08:39:
Serhiy Storchaka, 29.07.2013 21:37:

Read my post. The accelerator doesn't take the fast path for dicts as
Counter is only a subtype of dict, not exactly a dict. That means that it
raises and catches a KeyError exception for each new value that it finds,
and that is apparently more costly than the overhead of calling get().

So, my expectation is that it's faster for highly repetitive data and
slower for mostly unique data.

Maybe a "fast_dict_lookup" option for the accelerator that forces the fast
path would fix this. The Counter class, just like many (most?) other
subtypes of dict, definitely doesn't need the fallback behaviour.

Or rather, drop the fallback path completely. It's not worth having code
duplication if it's not predictable up-front (before looking at the data)
whether it will help or not.

http://bugs.python.org/issue18594

Stefan
 

Serhiy Storchaka

29.07.13 14:49, Joshua Landau wrote:
I find it hard to agree that counter should be optimised for the
unique-data case, as surely it's much more oft used when there's a point
to counting?

Different methods are faster for different data. The LBYL approach is best
for the mostly unique data case, while the EAFP approach is best for the
mostly repeated data case. In general, the performance of a particular
method is a function of its performance in these two extreme cases. When
it is slow in one extreme case, it can also be slow in a number of
intermediate cases.
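Serhiy's two extremes map directly onto the test() and exception() functions from the original post; in miniature:

```python
def count_lbyl(items):
    # Look Before You Leap: cheap when most keys are new, since no
    # exceptions are ever raised or caught.
    d = {}
    for item in items:
        if item in d:
            d[item] += 1
        else:
            d[item] = 1
    return d

def count_eafp(items):
    # Easier to Ask Forgiveness than Permission: cheap when most keys
    # repeat, since a KeyError is raised only on first sight of a key.
    d = {}
    for item in items:
        try:
            d[item] += 1
        except KeyError:
            d[item] = 1
    return d

data = ['a', 'a', 'b', 'a']
print(count_lbyl(data) == count_eafp(data) == {'a': 3, 'b': 1})  # True
```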
Also, couldn't Counter just extend from defaultdict?

Unfortunately, that would only slow it down.
 
