Just wondering

G

Gediminas Kregzde

Hello,

I'm Vilnius college II degree student and last semester our teacher
introduced us to python
I've used to program with Delphi, so I very fast adopted to python

Now I'm developing cross platform program and use huge amounts of
data. Program is needed to run as fast as it coud. I've read all tips
about geting performance, but got 1 bug: map function is slower than
for loop for about 5 times, when using huge amounts of data.
It is needed to perform some operations, not to return data.

I'm adding sample code:
from time import time

def doit(i):
pass

def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

def main2():
t = time()
a = [0] * 10000000
for i in a:
pass
print "loop time: " + str(time() - t)

main() # takes approximately 5x times longer than main2()
main2()

I'm wondering were is catch?

I'm using python 2.6 on windows xp sp2 machine

P.S. Sorry for my broken English
 
B

bearophileHUGS

Gediminas Kregzde:
map function is slower than
for loop for about 5 times, when using huge amounts of data.
It is needed to perform some operations, not to return data.

Then you are using map() for the wrong purpose. map() purpose is to
build a list of things. Use a for loop.

Bye,
bearophile
 
M

Marco Mariani

Gediminas Kregzde wrote:

def doit(i):
pass

def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

Here you are calling a function ten million times, build a list with of
ten million None results, then throw it away.

def main2():
t = time()
a = [0] * 10000000
for i in a:
pass
print "loop time: " + str(time() - t)

Here you do nothing but iterating 'i' over the 'a' list.

main() # takes approximately 5x times longer than main2()
main2()

I'm wondering were is catch?

Function calls are not free in python. They cost a lot more than they do
in C, Delphi or other languages.
 
P

Peter Otten

Gediminas said:
Now I'm developing cross platform program and use huge amounts of
data. Program is needed to run as fast as it coud. I've read all tips
about geting performance, but got 1 bug: map function is slower than
for loop for about 5 times, when using huge amounts of data.
It is needed to perform some operations, not to return data.

I'm adding sample code:
from time import time

def doit(i):
pass

def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

def main2():
t = time()
a = [0] * 10000000
for i in a:
pass
print "loop time: " + str(time() - t)

main() # takes approximately 5x times longer than main2()
main2()

I'm wondering were is catch?

I'm using python 2.6 on windows xp sp2 machine

Two factors:

(1) building another throwaway list and
(2) function call overhead for calling doit()

You can avoid (1) by using filter() instead of map() and verify (2) by
changing the loop to

for i in a:
doit(i)

Peter
 
J

Jaime Fernandez del Rio

(1) building another throwaway list and
(2) function call overhead for calling doit()

You can avoid (1) by using filter() instead of map()

Are you sure of this? My python returns, when asked for help(filter) :

Help on built-in function filter in module __builtin__:

filter(...)
filter(function or None, sequence) -> list, tuple, or string

Return those items of sequence for which function(item) is true. If
function is None, return the items that are true. If sequence is a tuple
or string, return the same type, else return a list.

I'm afraid there is no builtin function to do an inplace modification
of a sequence...
 
B

Bruno Desthuilliers

Gediminas Kregzde a écrit :
Hello,

I'm Vilnius college II degree student and last semester our teacher
introduced us to python
I've used to program with Delphi, so I very fast adopted to python

Now I'm developing cross platform program and use huge amounts of
data. Program is needed to run as fast as it coud. I've read all tips
about geting performance, but got 1 bug: map function is slower than
for loop

Why should it be a bug ?
for about 5 times,

read below for comments on your benchmark.
when using huge amounts of data.
It is needed to perform some operations, not to return data.

Then don't use map. You're uselessly building a new list (which takes
time *and* eats RAM - so on 'huge' dataset, this might even make your
system start swapping).


I'm adding sample code:
from time import time

The correct way to time code is to use the timeit module.
def doit(i):
pass

def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

def main2():
t = time()
a = [0] * 10000000
for i in a:
pass

Your benchmark is screwed. In the first case, you call doit() X times
(with X = len(a)), while in the second case you don't call it at all.
 
P

Peter Otten

Jaime said:
Are you sure of this?
I'm afraid there is no builtin function to do an inplace modification
of a sequence...

You are right.

But the OP invokes map() for the side effect of calling doit(item) for every
item in the (huge) list and then gets another list of the same length
containing only None values. by using filter() instead of map() with doit
always returning None he gets an empty result list which means less wasted
effort:
items = range(10)
map(doit, items) [None, None, None, None, None, None, None, None, None, None]
filter(doit, items)
[]

Peter
 
N

norseman

Marco said:
Gediminas Kregzde wrote:

def doit(i):
pass

def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

Here you are calling a function ten million times, build a list with of
ten million None results, then throw it away.

def main2():
t = time()
a = [0] * 10000000
for i in a:
pass
print "loop time: " + str(time() - t)

Here you do nothing but iterating 'i' over the 'a' list.

main() # takes approximately 5x times longer than main2()
main2()

I'm wondering were is catch?

Function calls are not free in python. They cost a lot more than they do
in C, Delphi or other languages.

Gediminas - You did a good job of testing the overhead. All good
programmers do that!

Steve
 
D

Dave Angel

norseman said:
Gediminas Kregzde wrote:

def doit(i):
pass

def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

Here you are calling a function ten million times, build a list with
of ten million None results, then throw it away.

def main2():
t = time()
a = [0] * 10000000
for i in a:
pass
print "loop time: " + str(time() - t)

Here you do nothing but iterating 'i' over the 'a' list.

main() # takes approximately 5x times longer than main2()
main2()

I'm wondering were is catch?

Function calls are not free in python. They cost a lot more than they
do in C, Delphi or other languages.

Gediminas - You did a good job of testing the overhead. All good
programmers do that!

Steve
If the task is to test the overhead, then the two loops need to be
equivalent. The OP's conclusion, confirmed by many of you, is that
map() is lots slower than for. But if you add in the call overhead in
the for loop, they come out within 2 percent. If you just want to
compare map() time to for time, use functions:

from time import time
def doit(i):
pass
def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

def main2():
a = [0] * 10000000
t = time()
for i in a:
doit(i)
print "loop time: " + str(time() - t)


main()
main2()

gives:
map time: 4.875
loop time: 4.79600000381
 
N

norseman

Dave said:
norseman said:
Gediminas Kregzde wrote:


def doit(i):
pass

def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

Here you are calling a function ten million times, build a list with
of ten million None results, then throw it away.


def main2():
t = time()
a = [0] * 10000000
for i in a:
pass
print "loop time: " + str(time() - t)

Here you do nothing but iterating 'i' over the 'a' list.


main() # takes approximately 5x times longer than main2()
main2()

I'm wondering were is catch?

Function calls are not free in python. They cost a lot more than they
do in C, Delphi or other languages.

Gediminas - You did a good job of testing the overhead. All good
programmers do that!

Steve
If the task is to test the overhead, then the two loops need to be
equivalent.

no they don't. see below

The OP's conclusion, confirmed by many of you, is that
map() is lots slower than for.

is it? see below

But if you add in the call overhead in
the for loop, they come out within 2 percent. If you just want to
compare map() time to for time, use functions:

from time import time
def doit(i):
pass
def main():
a = [0] * 10000000
t = time()
map(doit, a)
print "map time: " + str(time() - t)

def main2():
a = [0] * 10000000
t = time()
for i in a:
doit(i)
print "loop time: " + str(time() - t)


main()
main2()

gives:
map time: 4.875
loop time: 4.79600000381
================================================
Why do tests have to be equal? If what you are after is to test several
things just to see what they carry for overhead just do so. Whether or
not they are "equal" is a matter of choice. Doing the map() implies it
will store something. Doing the loop tells what it takes to loop. THEN,
by adding your code to each to balance things out and running for time
will tell you things. Maybe map() will win and maybe it will not for
certain objectives. Maybe loop will win under certain conditions.
Without a baseline and testing - any statement as to which is better is
simply unrealistic nonsense. The 'internals', algorithms, can often be
optimized after inspection to obtain better results. Whether or not
these optimizations will interfere with a another function (in this case
map() ) remains to be tested. Just because the minimal overhead on A is
less than B does not mean that A is the better choice for a given situation.

By the way, since any loop action actually needs some sort of counter,
be it the here to end of file or a numeric countdown, I put my timer
before def main? and check the whole test of minimal parts. People who
write compilers and/or interpreters often do strange things with
optimization concepts.

Dave's' times point out that loop has a good chance to win if the rest
of that sections code were present. By that I mean the actual function
the loop or map() were to preform in a given setting. It shows the
function call is expensive. Thus in-line code can be a better choice
when speed is needed. At least in Python and given the results of both
tests of loop above.

Also depicted is that map() requires the expensive function call and
loop does not in order to process something.



the def doit(i):
pass
should result in something like:

..
..
push data_address in R2
call address_of_doit
..
..


address_of_doit
pop data_address to R1 #see note 1
return #see note 2



note 1: not all OS/architectures implement push/pop
whatever the store/call retrieve/(store)/return may be,
it will be the same throughout for that OS/machine.

note 2: a call to a dummy will show the efficiency of calling.
normally - its overhead is quite minimal. As another stated
above - such things are indeed expensive in Python.




All this generated by someone posting a simple overhead test. WOW.
I guess I had a better teacher than I thought. :)


IMHO - Dave's loop test should have been included in the original
posting. Making three tests all together at that time. But then we all
had different teachers. Which is why the question came up.


Steve
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,535
Members
45,007
Latest member
obedient dusk

Latest Threads

Top