Optimizing multiple dispatch

  • Thread starter Jacek Generowicz
  • Start date
J

Jacek Generowicz

I have a multiple disptacher which, conceptually, looks something like
this:



class Multimethod:

def __init__(self):
self.methods = {}

def __call__(self, *args):
return self.methods[tuple(map(type, args))](*args)

def add_method(self, method, *types):
self.methods[types] = method

foo = Multimethod()

def fooii(a, b):
print "We got two ints"

def foosb(a, b):
print "We got a string and a bar"

class bar(object): pass

foo.add_method(fooii, int, int)
foo.add_method(foosb, str, bar)

foo(1,2)
foo("a string", bar())



My actual code looks nothing like this[*], because it contains a
number of optimizations, and addresses some domain-specific
considerations which are not relevant in this context; but the code
shown should highlight the salient points.

Profiling suggests that "tuple(map(type, args))" is taking a
significant proportion of time in certain critical loops.

Do you have any suggestions about how to make in run faster? (Either
by optimizing "tuple(map(type, args)", or by using a completely
different organization for the whole thing. There is almost certainly
no point in addressing any other implementation detail[*] of what is
shown here, as it is likely to be absent from my real code.)


[*] For example, there is no __call__ in my implementation; it's
implementeted as a closure.
 
B

Brian Gough

Jacek Generowicz said:
Profiling suggests that "tuple(map(type, args))" is taking a
significant proportion of time in certain critical loops.
Do you have any suggestions about how to make in run faster? (Either
by optimizing "tuple(map(type, args)", or by using a completely
different organization for the whole thing.

If the arguments inside the loop have fixed types, then maybe you
could have a method to get the reference to the function (once)
outside the loop.

This would be like methodForSelector in Objective-C.
 
J

Jeff Epler

Avoding map() may help:
tuple([type(a) for a in args])
Using generator expressions (aren't they in 2.4?) might or might not
help:
tuple(type(a) for a in args)

You could write a small extension to perform this operation without all
the Python function calls.

cdef extern from "Python.h":
extern int PyTuple_Check(object)
extern object PyTuple_New(int)
extern int PyTuple_GET_SIZE(object)
extern void *PyObject_Type(void*)
extern void PyTuple_SET_ITEM(object, int, void*)
extern void *PyTuple_GET_ITEM(object, int)

def maptype(i):
if not PyTuple_Check(i): raise TypeError
cdef int l
cdef int j
l = PyTuple_GET_SIZE(i)
o = PyTuple_New(l)
for j from 0 <= j < l:
PyTuple_SET_ITEM(o, j, PyObject_Type(PyTuple_GET_ITEM(i, j)))
return o
(<type 'str'>, <type 'int'>, <type 'float'>, <type 'long'>, <type 'type'>, <type 'type'>, <type 'type'>)

$ timeit -s 's = ("", 0, 0.0, 0L, str, int, type); from maptype import maptype' 'maptype(s)'
100000 loops, best of 3: 2.41 usec per loop
$ timeit -s 's = ("", 0, 0.0, 0L, str, int, type)' -s 'def maptype(s): return tuple([type(i) for i in s])' 'maptype(s)'
10000 loops, best of 3: 25.3 usec per loop
$ timeit -s 's = ("", 0, 0.0, 0L, str, int, type)' -s 'def maptype(s): return tuple(map(type, s))' 'maptype(s)'
100000 loops, best of 3: 17 usec per loop

... hmm, map is faster than listcomp. my mistake!

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFAvy2pJd01MZaTXX0RAvgPAJ9+smiDH0o+bo5rwwsOri1hl7jtIQCeKCXR
BE8bms1T8IJduTWe7+ivEwQ=
=Xtuc
-----END PGP SIGNATURE-----
 
J

Jacek Generowicz

[Hi, Brian, fancy meeting you here !]

Brian Gough said:
If the arguments inside the loop have fixed types, then maybe you
could have a method to get the reference to the function (once)
outside the loop.

Yup, we do that already ... it gives us more than a factor of 2
improvement in a particular politically important benchmark.

But, this must remain an option open to the user, and there is still a
need to make the thing go faster without pre-selection.
 
J

Jacek Generowicz

Jeff Epler said:
Avoding map() may help:
tuple([type(a) for a in args])

Nonononononooooo! :)
You could write a small extension to perform this operation without all
the Python function calls.


[... Pyrex ...]
100000 loops, best of 3: 2.41 usec per loop [... List comprehension ...]
10000 loops, best of 3: 25.3 usec per loop [... map ...]
100000 loops, best of 3: 17 usec per loop

(I'm pretty sure that should be 10**5 loops in the list comp case)

Hmm, ok, the extension seems to be significantly faster. This
surprises me, because everything that executes in
"tuple(map(type,args))" is coded in C already, so I don't see where
the gain is coming from.

I've never got around to trying Pyrex ... and I won't be allowed to
depend on it formally. Would it be feasible to use Pyrex to generate
the extension source code and paste[*] that into my extension module?
IOW, is there a run-time dependence on Pyrex?


... hmm, map is faster than listcomp. my mistake!

:)

I've asked for optimization advice a few times here, and usually the
responses include "replace map with a list comprehension" ... and yet
the map is always faster. I wonder where people get this idea that map
is slow ... until this started happening I had always assumed that
"everyone" knows that the map/reduce/filter family are the fastest
Python looping constructs.

</ASIDE>

[*] Ugh, I shudder at the thought, putting the products of code
generators into CVS really goes against my fundamental principles.
 
J

Jacek Generowicz

Jacek Generowicz said:
Hmm, ok, the extension seems to be significantly faster. This
surprises me, because everything that executes in
"tuple(map(type,args))" is coded in C already, so I don't see where
the gain is coming from.

Forgot to mention: your map and list comp versions included an
unnecessary call to a pure Python function (maptype). The extension
you proposed would replace an inlined "tuple(map(type, args))", so we
don't get hit by pure Python function call overhead at that point[*].

Still, eliminating that overhead knocks off only one third of the
time, while your extension was better by a factor of 7 ... so there's
still a factor of 4.5ish to be had by coding it as an extension
(sorry, I don't have Pyrex, so I can't show the time):

-s 's = ("", 0, 0.0, 0L, str, int, type)' -s 'def maptype(s): return tuple(map(type, s))' 'maptype(s)'
100000 loops, best of 3: 4.73 usec per loop
-s 's = ("", 0, 0.0, 0L, str, int, type)' 'tuple(map(type, s))'
100000 loops, best of 3: 4.46 usec per loop



[*] There is a pure Python function call overhead I would like to
eliminate by recoding in C, but it concerns a closure (which Pyrex
doesn't allow, IIRC), and that seems a bit tricky to do.
 
H

Heiko Wundram

Am Donnerstag, 3. Juni 2004 16:55 schrieb Jacek Generowicz:
[*] Ugh, I shudder at the thought, putting the products of code
generators into CVS really goes against my fundamental principles.

Writing this module by hand shouldn't be much harder than writing it using
Pyrex. The Python/C-API is very clean, and there's good documentation in the
Python Documentation section on Python.org...

I guess you could squeeze out another 0.1 usecs by writing it by hand, because
Pyrex sometimes generates suboptimal C code, on another note ;)

Heiko.
 
J

Jacek Generowicz

Jacek Generowicz said:
(I'm pretty sure that should be 10**5 loops in the list comp case)

I'm sorry, ignore that, I was talking out of my ****
 
J

Jeff Epler

Pyrex is just a translator, there's no dependency on a Pyrex library or
include file when you actually want to compile the generated .c

Jeff Epler said:
Avoding map() may help:
tuple([type(a) for a in args])

Nonononononooooo! :) [...]
... hmm, map is faster than listcomp. my mistake!

:)

I've asked for optimization advice a few times here, and usually the
responses include "replace map with a list comprehension" ... and yet
the map is always faster. I wonder where people get this idea that map
is slow ... until this started happening I had always assumed that
"everyone" knows that the map/reduce/filter family are the fastest
Python looping constructs.

No message about optimization should be without one downright wrong
suggestion.

The common wisdom about listcomp speed may apply when the body doesn't
include a function call, but the map version would include a lambda:
$ timeit 'map(lambda x:x+1, range(100))'
1000 loops, best of 3: 230 usec per loop
$ timeit '[x+1 for x in range(100)]'
10000 loops, best of 3: 156 usec per loop


Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFAv0QgJd01MZaTXX0RAmwoAJ920fACmXQR0UWd03BMQybUuKWEOwCfWDPB
Yc059n1wXFyym54rUNFnctg=
=2YRY
-----END PGP SIGNATURE-----
 
D

David Bolen

Jacek Generowicz said:
I've never got around to trying Pyrex ... and I won't be allowed to
depend on it formally. Would it be feasible to use Pyrex to generate
the extension source code and paste[*] that into my extension module?
IOW, is there a run-time dependence on Pyrex?

Probably not even necessary - doing the same operation directly in
your extension module with straight C code shouldn't be much work (the
example Pyrex code posted is largely just a set of calls tov the
Python API anyway already). About the only thing that can't be
written directly in C is probably the "raise TypeError" part, but for
that you can just set the error and return NULL from the function, I
believe.

-- David
 
C

Carl Banks

Jeff Epler said:
Avoding map() may help:
tuple([type(a) for a in args]) [snip]

... hmm, map is faster than listcomp. my mistake!

Rule of thumb: listcomp is only faster than map if it avoids an extra
function call. So, for example, "[ x+1 for x in xs ]" is probably
faster than "map(lambda x:x+1,xs)".

But, since this particular use doesn't avoid a function call, the map
is faster.
 
C

Carl Banks

Jacek Generowicz said:
Jeff Epler said:
100000 loops, best of 3: 2.41 usec per loop [... List comprehension ...]
10000 loops, best of 3: 25.3 usec per loop [... map ...]
100000 loops, best of 3: 17 usec per loop

(I'm pretty sure that should be 10**5 loops in the list comp case)

Hmm, ok, the extension seems to be significantly faster. This
surprises me, because everything that executes in
"tuple(map(type,args))" is coded in C already, so I don't see where
the gain is coming from.

Because not all C is the same. For example, to execute something like
"type(m)", Python has to construct a tuple for the arguments, and call
the type object with it. The type object new function has some
special casing (to get the behavior of the old type function) and
argument checks and stuff in it.

OTOH, PyObject_Type is probably a macro defined like this:
#define PyObject_Type(_obj) (_obj->ob_type)

So, even though type(m) is "coded in C already", it's not really fast
C.
 
J

Jacek Generowicz

Heiko Wundram said:
Writing this module by hand shouldn't be much harder than writing it
using Pyrex. The Python/C-API is very clean, and there's good
documentation in the Python Documentation section on Python.org...

Well, I still find writing pure Python/C-API to be a pain.
I guess you could squeeze out another 0.1 usecs by writing it by
hand, because Pyrex sometimes generates suboptimal C code, on
another note ;)

I probably write suboptimal code too ... probably more suboptimal than
Pyrex :)
 
H

Howard Stearns

In your Multimethods, is the number of args fixed and known at the time
you first call Multimethod(), or do you need to be able to support dispatching
based on the number as well as the type of the arguments?

If the former, your __init__ method can take an n_args_to_dispatch argument,
and use this to create a function that collects the types. For the two arg
case shown here,
lambda args: (type(args[0]), type(args[1]))

I suppose you would save more if type_collector1(), type_collector2(), etc.
were written in C.

You could save even more if __call__ took an explict signature that you
didn't have to access from a tuple. This isn't an option for a general purpose
Multmethod, but might be ok in your domain-specific situation. Or you could have
classes Multimethod1, Multimethod2, etc.
>>> Timer('tuple(map(type, (1, 2)))').timeit() 5.4792276672034177
>>> Timer('(lambda args: (type(args[0]), type(args[1])))((1, 2))').timeit() 3.246841148805288
>>> Timer('(lambda a, b: (type(a), type(b)))(1, 2)').timeit()
2.6581895185003077


Jacek said:
I have a multiple disptacher which, conceptually, looks something like
this:



class Multimethod:

def __init__(self):
self.methods = {}

def __call__(self, *args):
return self.methods[tuple(map(type, args))](*args)

def add_method(self, method, *types):
self.methods[types] = method

foo = Multimethod()

def fooii(a, b):
print "We got two ints"

def foosb(a, b):
print "We got a string and a bar"

class bar(object): pass

foo.add_method(fooii, int, int)
foo.add_method(foosb, str, bar)

foo(1,2)
foo("a string", bar())



My actual code looks nothing like this[*], because it contains a
number of optimizations, and addresses some domain-specific
considerations which are not relevant in this context; but the code
shown should highlight the salient points.

Profiling suggests that "tuple(map(type, args))" is taking a
significant proportion of time in certain critical loops.

Do you have any suggestions about how to make in run faster? (Either
by optimizing "tuple(map(type, args)", or by using a completely
different organization for the whole thing. There is almost certainly
no point in addressing any other implementation detail[*] of what is
shown here, as it is likely to be absent from my real code.)


[*] For example, there is no __call__ in my implementation; it's
implementeted as a closure.
 
J

Jacek Generowicz

Howard Stearns said:
In your Multimethods, is the number of args fixed and known at the time
you first call Multimethod(), or do you need to be able to support dispatching
based on the number as well as the type of the arguments?

These multimethods are proxies for C++ functions, so the number of
arguments is variable.

Thanks for your ideas though: it surprised me to see just how much
those timings differ.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,431
Messages
2,571,677
Members
48,796
Latest member
Greg L.

Latest Threads

Top