Optimizing Inner Loop Copy

Mark E. Fenner · Aug 17, 2006

Hello all,

I have a code where my inner loop looks like:

allNew = []
for params in cases:
newObj = copy(initialObject)
newObj.modify(params)
allNew.append(newObj)
return allNew

and the copy is taking the majority (42%) of my execution time.
So, I'd like to speed up my copy. I had an explicit copy method that did
what was needed and returned a new object, but this was quite a bit slower
than using the standard lib copy.copy().

Here's my class of the objects being copied:

class Rule(list):
def __init__(self, lhs=None, rhs=None, nClasses=0, nCases=0):
self.nClasses = nClasses
self.nCases = nCases

if lhs is not None:
self.extend(lhs)

if rhs is None:
self.rhs=tuple()
else:
self.rhs=rhs

# as I mentioned, Rule.copy is slower than copy.copy
def copy(self):
r = Rule(self,
self.rhs,
self.nClasses,
self.nCases)
return r

Basically, the left hand side of a rule (e.g., a rule is a & b & c -> d) is
self and three other pieces of information are kept around, two ints and a
right hand side (e.g., (d, 5) meaning that d takes the value five ... all
the LHS elements are tuples as well).

As far as optimization goes, it seems I could write a custom __copy__
method, but a c.l.python search (sorry, forgot to bookmark the reference)
seemed to indicate that a special purpose __copy__ that copies all the
objects's attributes would lose out to the generic copy.copy(). This
doesn't seem quite right ... ideas?

Next option would be to rewrite the whole class in C. I haven't done C
extensions before and since most of the class is already a builtin list,
that seems like a lot of work for little gain. Would Pyrex be a help here?
I don't think the relevant typing information (on the ints alone, I guess)
would get me very far ... and again, recoding the list stuff (by hand or by
Pyrex) in C is not going to get any gains (it might slow things down?).

It seems to me that shallow copies (of objects built from immutable types)
should be able to be speed up by memory mapping (somehow!). The top-level
list/rule should be the only new reference that needs to be created.

Any quick answers or most likely directions to explore, would be greatly
appreciated.

Regards,
Mark

Michael Spencer · Aug 18, 2006

Mark said:
and the copy is taking the majority (42%) of my execution time.
So, I'd like to speed up my copy. I had an explicit copy method that did
what was needed and returned a new object, but this was quite a bit slower
than using the standard lib copy.copy().

How are you measuring? It seems to me that your Rule.copy method is a lot faster
than copy.copy:

where shell.timefunc is:
def _get_timer():
if sys.platform == "win32":
return time.clock
else:
return time.time
return

def timefunc(func, *args, **kwds):
timer = _get_timer()
count, totaltime = 0, 0
while totaltime < 0.5:
t1 = timer()
res = func(*args, **kwds)
t2 = timer()
totaltime += (t2-t1)
count += 1
if count > 1000:
unit = "usec"
timeper = totaltime * 1000000 / count
else:
unit = "msec"
timeper = totaltime * 1000 / count
return "%s(...) %s iterations, %.2f%s per call" % \
(func.__name__, count, timeper, unit)

Michael

Mark E. Fenner · Aug 18, 2006

Michael said:
How are you measuring? It seems to me that your Rule.copy method is a lot
faster than copy.copy:

'copy(...) 4498 iterations, 111.17usec per call'

Michael

Michael,

Thank you. I misinterpreted something somewhere ... the program is indeed
faster using Rule.copy. I need to look over the rest of my profiling data,
to see if I screwed up elsewhere as well.

So, how to optimize Rule.copy()?

Regards,
Mark

John Machin · Aug 18, 2006

Mark said:
Here's my class of the objects being copied:

Here's a couple of things that might help speed up your __init__
method, and hence your copy method:

class Rule(list):
def __init__(self, lhs=None, rhs=None, nClasses=0, nCases=0):

def __init__(self, lhs=None, rhs=(), nClasses=0, nCases=0):

self.nClasses = nClasses
self.nCases = nCases

if lhs is not None:
self.extend(lhs)

what does the extend method do? If it is small, perhaps inline a copy
of its code here.

if rhs is None:
self.rhs=tuple()
else:
self.rhs=rhs

Replace the above 4 lines by:
self.rhs = rhs

# as I mentioned, Rule.copy is slower than copy.copy
def copy(self):
r = Rule(self,
self.rhs,
self.nClasses,
self.nCases)
return r

HTH,
John

Mark E. Fenner · Aug 18, 2006

John said:
Here's a couple of things that might help speed up your __init__
method, and hence your copy method:

def __init__(self, lhs=None, rhs=(), nClasses=0, nCases=0):

what does the extend method do? If it is small, perhaps inline a copy
of its code here.

Replace the above 4 lines by:
self.rhs = rhs

HTH,
John

John,

Thanks. I thought of those at the same you did! I also incorporated one
other use of the default=() + no conditional:

class Rule(list):
def __init__(self, lhs=(), rhs=(), nClasses=0, nCases=0):
self.nClasses = nClasses
self.nCases = nCases
self.extend(lhs) # note, self is a list so this is list.extend
self.rhs=rhs

def copy(self):
return Rule(self,
self.rhs,
self.nClasses,
self.nCases)

Mark E. Fenner · Aug 18, 2006

Mark said:
John,

Thanks. I thought of those at the same you did! I also incorporated one
other use of the default=() + no conditional:

class Rule(list):
def __init__(self, lhs=(), rhs=(), nClasses=0, nCases=0):
self.nClasses = nClasses
self.nCases = nCases
self.extend(lhs) # note, self is a list so this is list.extend
self.rhs=rhs

def copy(self):
return Rule(self,
self.rhs,
self.nClasses,
self.nCases)

Actually, I also removed the "passthrough" that copy was doing and just
called the constructor directly. So, at the top level code, we have:

allNew = []
for params in cases:
# newobj = initialObject.copy()
newObj = Rule(initialObject, initialObject.rhs,
initialObject.nClasses,
initialObject.nCases)
newObj.modify(params)
allNew.append(newObj)
return allNew

Regards,
Mark

Michael Spencer · Aug 18, 2006

Mark said:
Michael,

Thank you. I misinterpreted something somewhere ... the program is indeed
faster using Rule.copy. I need to look over the rest of my profiling data,
to see if I screwed up elsewhere as well.

So, how to optimize Rule.copy()?

Regards,
Mark

You're not doing much in the loop, so there isn't much to change:

def copy2(self):
new_rule = list.__new__(Rule)
new_rule[:] = self
new_rule.__dict__.update(self.__dict__)
return new_rule

measures slightly (5-10%) faster for me - but hardly worth the obscurity

You could bind allNew.append outside the loop, and perhaps merge the Rule.modify
method into Rule.copy to save a call.

Given that a no-op method call, takes 2.7 usec on my machine, and you have 3 or
4 method calls per copy, I suspect you're near the limit of a pure-Python
solution with this object structure.
'nop(...) 185488 iterations, 2.70usec per call'

Assuming you're going to be doing some material work with these rules, I wonder
if it's actually worth your the effort to make the copying go faster.

(Sorry no better ideas)

Michael

danielx · Aug 18, 2006

Mark said:
Mark said:

John,

Thanks. I thought of those at the same you did! I also incorporated one
other use of the default=() + no conditional:

class Rule(list):
def __init__(self, lhs=(), rhs=(), nClasses=0, nCases=0):
self.nClasses = nClasses
self.nCases = nCases
self.extend(lhs) # note, self is a list so this is list.extend
self.rhs=rhs

def copy(self):
return Rule(self,
self.rhs,
self.nClasses,
self.nCases)

Click to expand...

Actually, I also removed the "passthrough" that copy was doing and just
called the constructor directly. So, at the top level code, we have:

allNew = []
for params in cases:
# newobj = initialObject.copy()
newObj = Rule(initialObject, initialObject.rhs,
initialObject.nClasses,
initialObject.nCases)
newObj.modify(params)
allNew.append(newObj)
return allNew

Regards,
Mark

I'm not sure how much this will help, but another thing you can do is
put this line before the "for":

append = allNew.append

Then, replace the last line in the loop with

append(newObj)

Check out this doc for more info on optimizing Python, and the section
which talks about eliminating dots:

http://wiki.python.org/moin/PythonSpeed/PerformanceTips
http://wiki.python.org/moin/PythonS...head-aa6c07c46a630a2fa10bd6502510e532806f1f62

Mark E. Fenner · Aug 18, 2006

danielx said:
Mark said:

John Machin wrote:

Mark E. Fenner wrote:

Here's my class of the objects being copied:

Here's a couple of things that might help speed up your __init__
method, and hence your copy method:

class Rule(list):
def __init__(self, lhs=None, rhs=None, nClasses=0, nCases=0):

def __init__(self, lhs=None, rhs=(), nClasses=0, nCases=0):

self.nClasses = nClasses
self.nCases = nCases

if lhs is not None:
self.extend(lhs)
what does the extend method do? If it is small, perhaps inline a copy
of its code here.

if rhs is None:
self.rhs=tuple()
else:
self.rhs=rhs

Replace the above 4 lines by:
self.rhs = rhs

HTH,
John

John,

Thanks. I thought of those at the same you did! I also incorporated
one other use of the default=() + no conditional:

class Rule(list):
def __init__(self, lhs=(), rhs=(), nClasses=0, nCases=0):
self.nClasses = nClasses
self.nCases = nCases
self.extend(lhs) # note, self is a list so this is list.extend
self.rhs=rhs

def copy(self):
return Rule(self,
self.rhs,
self.nClasses,
self.nCases)

Click to expand...

Actually, I also removed the "passthrough" that copy was doing and just
called the constructor directly. So, at the top level code, we have:

allNew = []
for params in cases:
# newobj = initialObject.copy()
newObj = Rule(initialObject, initialObject.rhs,
initialObject.nClasses,
initialObject.nCases)
newObj.modify(params)
allNew.append(newObj)
return allNew

Regards,
Mark

Click to expand...

I'm not sure how much this will help, but another thing you can do is
put this line before the "for":

append = allNew.append

Then, replace the last line in the loop with

append(newObj)

Check out this doc for more info on optimizing Python, and the section
which talks about eliminating dots:

http://wiki.python.org/moin/PythonSpeed/PerformanceTips

http://wiki.python.org/moin/PythonS...head-aa6c07c46a630a2fa10bd6502510e532806f1f62

I've read these, but they do yield additional nuggets upon rereading (does
that make them generators?). In reality, my loop has several lines of code
instead of a newObj.modify(args) method. I localised the obvious functions
and attributes that I was using. In reality, the Rule.copy/Rule.__init__
is swamping everything else ... humm, unless ... I'll get back to you.

Regards,
Mark

Paul McGuire · Aug 18, 2006

Mark E. Fenner said:
Hello all,

Here's my class of the objects being copied:

class Rule(list):
def __init__(self, lhs=None, rhs=None, nClasses=0, nCases=0):
self.nClasses = nClasses
self.nCases = nCases

Ok, so here are some "bigger picture" kind of questions.

1. Do you need to construct all these Rule objects in the first place? One
of the optimizations I did in pyparsing was to pre-construct exception
objects, and to reuse them over and over instead of creating them once,
raising them, and then discarding them. (There is a trade-off with
thread-safety when you do this, but we deal with that separately.) This
gave me about a 30% reduction in processing time, since pyparsing does a lot
of exception raising/catching internally. So you might look at your larger
app and see if you could memoize your Rule objects or something similar, and
avoid the whole object create/init/delete overhead in the first place.

2. More of an OO question than a performance question, but why does Rule
inherit from list in the first place? Is Rule *really* a list, or is it
just implemented using a list? If the latter, then you might look at moving
Rule's list contents into an instance variable, maybe something called
self.contents. Then you can be more explicit about appending to
self.contents when you want to add lhs to the contents of the Rule. For
example, why are you calling extend, instead of just using slice notation to
copy lhs? Ah, because then you would have to write something like "self =
lhs[:]", which doesn't look like it will work very well. On the other hand,
if you use containment/delegation instead of inheritance, you can use the
more explicit "self.contents = lhs[:]". In fact now you have much more
control over the assemblage of rules from other rules.

In the original post, you state: "the left hand side of a rule (e.g., a rule
is a & b & c -> d) is self and three other pieces of information are kept
around, two ints and a right hand side"

What other aspects of list are you using in Rule? Are you iterating over
its contents somewhere? Then implement __iter__ and return
iter(self.contents). Are you using "if rule1:" and implicitly testing if
its length is nonzero? Then implement __nonzero__ and return
operator.truth(self.contents). Do you want to build up rules incrementally
using += operator? Then implement __iadd__ and do
self.contents.extend(other.contents), or self.contents += other.contents[:]
(no need to test for None-ness of other.contents, we ensure in our
constructor that self.contents is always a list, even if its an empty one).

Save inheritance for the true "is-a" relationships among your problem domain
classes. For instance, define a base Rule class, and then you can extend it
with things like DeterministicRule, ProbabilisticRule, ArbitraryRule, etc.
But don't confuse "is-implemented-using-a" with "is-a".

-- Paul

Mark E. Fenner · Aug 18, 2006

Paul said:
Ok, so here are some "bigger picture" kind of questions.

1. Do you need to construct all these Rule objects in the first place?

Sorry, I had a nice response written out ... and I lost it. Here's the
summary. Unfortunately, yes, the algorithm this code implements goes to
great lengths to ensure that each possible Rule is generated only once.

2. More of an OO question than a performance question, but why does Rule
inherit from list in the first place? Is Rule *really* a list, or is it
just implemented using a list?

It is the later. And it is a moderate abuse to say that the Rule _isa_
list. Aside from the two integer statistics (nClasses, nCounts), the RHS
and LHS can be merged and we can say that the rule is [(tuple1), ...,
(tupleN), (tupleRHS)] with the RHS being Rule[-1]. However, in that case,
I went for OO principles and kept the two logical parts separate.

B/c of the frequency of list operations, it's better to be able to use
thisRule.append() or thisRule then thisRule.lhs.append() or
thisRule.lhs. Where speed doesn't matter, I have the more appropriate
thisRule.addLHS().

Yes, I could do "dot removal" (i.e., localize the attribute lookups), but
when we iterate over a whole set of Rules, this still leave the
localization assignment inside an inner loop.

If the latter, then you might look at
moving Rule's list contents into an instance variable, maybe something
called
self.contents. Then you can be more explicit about appending to
self.contents when you want to add lhs to the contents of the Rule. For
example, why are you calling extend, instead of just using slice notation
to
copy lhs? Ah, because then you would have to write something like "self =
lhs[:]", which doesn't look like it will work very well. On the other
hand, if you use containment/delegation instead of inheritance, you can
use the
more explicit "self.contents = lhs[:]". In fact now you have much more
control over the assemblage of rules from other rules.

Click to expand...

Fortunately, the Rules are only assembled in one way ... appending to that
copy that started the thread. Incidentally, by speeding up Rule.__init__
and inlining it (instead of calling Rule.copy which called Rule.__init__),
that is no longer my bottleneck *phew*.

In the original post, you state: "the left hand side of a rule (e.g., a
rule is a & b & c -> d) is self and three other pieces of information are
kept around, two ints and a right hand side"

What other aspects of list are you using in Rule? Are you iterating over
its contents somewhere? Then implement __iter__ and return
iter(self.contents). Are you using "if rule1:" and implicitly testing if
its length is nonzero? Then implement __nonzero__ and return
operator.truth(self.contents). Do you want to build up rules
incrementally
using += operator? Then implement __iadd__ and do
self.contents.extend(other.contents), or self.contents +=
other.contents[:] (no need to test for None-ness of other.contents, we
ensure in our constructor that self.contents is always a list, even if its
an empty one).

Click to expand...

Well, I do several of these things ... many of them inside inner loops
(there's a sequence of inner loops ... the algorithms choice, not mine).
Each of these that are implemented (and not left to Python's builtin list
methods), is an extra level of function call overhead. This has been a
secondary project for me over a number of years ... so I have to look at my
early revisions, but I think my original architecture was an explicit
self.lhs = [] attribute. And it was REALLY slow. When Python allowed
inheritence from builtin objects, I tried it and got a nice win (I think!).

Save inheritance for the true "is-a" relationships among your problem
domain
classes. For instance, define a base Rule class, and then you can extend
it with things like DeterministicRule, ProbabilisticRule, ArbitraryRule,
etc. But don't confuse "is-implemented-using-a" with "is-a".

Click to expand...

Well, I know the difference and choose to break the rule intentionally
*chuckle*. After the initial prototype, speed trumped OO.

-- Paul

Click to expand...

Thanks for your comments, Paul.

Regards,
Mark

Paul McGuire · Aug 18, 2006

Mark E. Fenner said:
Well, I know the difference and choose to break the rule intentionally
*chuckle*. After the initial prototype, speed trumped OO.

Thanks for your comments, Paul.

Regards,
Mark

hmm... maybe I should consider making ParseResults inherit from list...

-- Paul

danielx · Aug 19, 2006

Mark said:
Paul said:

Ok, so here are some "bigger picture" kind of questions.

1. Do you need to construct all these Rule objects in the first place?

Click to expand...

Sorry, I had a nice response written out ... and I lost it. Here's the
summary. Unfortunately, yes, the algorithm this code implements goes to
great lengths to ensure that each possible Rule is generated only once.

2. More of an OO question than a performance question, but why does Rule
inherit from list in the first place? Is Rule *really* a list, or is it
just implemented using a list?

Click to expand...

It is the later. And it is a moderate abuse to say that the Rule _isa_
list. Aside from the two integer statistics (nClasses, nCounts), the RHS
and LHS can be merged and we can say that the rule is [(tuple1), ...,
(tupleN), (tupleRHS)] with the RHS being Rule[-1]. However, in that case,
I went for OO principles and kept the two logical parts separate.

B/c of the frequency of list operations, it's better to be able to use
thisRule.append() or thisRule then thisRule.lhs.append() or
thisRule.lhs. Where speed doesn't matter, I have the more appropriate
thisRule.addLHS().

Yes, I could do "dot removal" (i.e., localize the attribute lookups), but
when we iterate over a whole set of Rules, this still leave the
localization assignment inside an inner loop.

Then maybe you can assign an unbound method (eg, "append =
list.append"), then pass in your instance as the first argument (eg
"append( myList, newElement )"). Even if you localize a bound method,
it should save some time, no?

If the latter, then you might look at
moving Rule's list contents into an instance variable, maybe something
called
self.contents. Then you can be more explicit about appending to
self.contents when you want to add lhs to the contents of the Rule. For
example, why are you calling extend, instead of just using slice notation
to
copy lhs? Ah, because then you would have to write something like "self =
lhs[:]", which doesn't look like it will work very well. On the other
hand, if you use containment/delegation instead of inheritance, you can
use the
more explicit "self.contents = lhs[:]". In fact now you have much more
control over the assemblage of rules from other rules.

Click to expand...

Fortunately, the Rules are only assembled in one way ... appending to that
copy that started the thread. Incidentally, by speeding up Rule.__init__
and inlining it (instead of calling Rule.copy which called Rule.__init__),
that is no longer my bottleneck *phew*.

In the original post, you state: "the left hand side of a rule (e.g., a
rule is a & b & c -> d) is self and three other pieces of information are
kept around, two ints and a right hand side"

What other aspects of list are you using in Rule? Are you iterating over
its contents somewhere? Then implement __iter__ and return
iter(self.contents). Are you using "if rule1:" and implicitly testing if
its length is nonzero? Then implement __nonzero__ and return
operator.truth(self.contents). Do you want to build up rules
incrementally
using += operator? Then implement __iadd__ and do
self.contents.extend(other.contents), or self.contents +=
other.contents[:] (no need to test for None-ness of other.contents, we
ensure in our constructor that self.contents is always a list, even if its
an empty one).

Click to expand...

Well, I do several of these things ... many of them inside inner loops
(there's a sequence of inner loops ... the algorithms choice, not mine).
Each of these that are implemented (and not left to Python's builtin list
methods), is an extra level of function call overhead. This has been a
secondary project for me over a number of years ... so I have to look at my
early revisions, but I think my original architecture was an explicit
self.lhs = [] attribute. And it was REALLY slow. When Python allowed
inheritence from builtin objects, I tried it and got a nice win (I think!).

Save inheritance for the true "is-a" relationships among your problem
domain
classes. For instance, define a base Rule class, and then you can extend
it with things like DeterministicRule, ProbabilisticRule, ArbitraryRule,
etc. But don't confuse "is-implemented-using-a" with "is-a".

Click to expand...

Well, I know the difference and choose to break the rule intentionally
*chuckle*. After the initial prototype, speed trumped OO.

-- Paul

Click to expand...

Thanks for your comments, Paul.

Regards,
Mark

Click to expand...

optimizing large dictionaries	10	Jan 15, 2009
copy object?	1	Sep 1, 2009
Why the destructor executes after its friend operator (in which thereis a copy constructor) runs	5	Jan 5, 2013
Pretty Scheme, ??? Python	10	Jul 2, 2007
Just for fun: Countdown numbers game solver	45	Jan 20, 2008
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Tkinter polling example: file copy with progress bar	7	Dec 12, 2010
Q about assignment and references	3	Apr 7, 2010

Optimizing Inner Loop Copy

Mark E. Fenner

Michael Spencer

Mark E. Fenner

John Machin

Mark E. Fenner

Mark E. Fenner

Michael Spencer

danielx

Mark E. Fenner

Paul McGuire

Mark E. Fenner

Paul McGuire

danielx

Ask a Question

Similar Threads

Staff online

Members online

Forum statistics

Latest Threads