Speed-up for loops

M

Michael Kreim

Sep 2, 2010

#1

Hi,

I was comparing the speed of a simple loop program between Matlab and
Python.

My Codes:
$ cat addition.py
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a

$ cat addition.m
imax = 1e9;
a = 0;
for i=0:imax-1
a = a + 10;
end
disp(a);
exit;

The results look like this:
$ /usr/bin/time --verbose python addition.py
10000000000
Command being timed: "python addition.py"
User time (seconds): 107.30
System time (seconds): 0.08
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:50.09
[...]

$ /usr/bin/time --verbose matlab -nodesktop -nosplash -r "addition"
[...]
1.0000e+10
Command being timed: "matlab -nodesktop -nosplash -r addition"
User time (seconds): 7.65
System time (seconds): 0.18
Percent of CPU this job got: 94%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.25
[...]

Unfortunately my Python Code was much slower and I do not understand why.

Are there any ways to speed up the for/xrange loop?
Or do I have to live with the fact that Matlab beats Python in this example?

Thanks a lot for your answers.

With best regards,

Michael

P

Peter Otten

Sep 2, 2010

#2

Michael said:
I was comparing the speed of a simple loop program between Matlab and
Python.

My Codes:
$ cat addition.py
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a

Are there any ways to speed up the for/xrange loop?

Move it into a function; this turns a and i into local variables.

def f():
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a
f()

Or do I have to live with the fact that Matlab beats Python in this
example?

I think so.

M

Michael Kreim

Sep 2, 2010

#3

Peter said:
Move it into a function; this turns a and i into local variables.

def f():
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a
f()

Wow. It is still slower than Matlab, but your suggestion speeds up the
code by ca 50%.
But I do not understand why the change of a global to a local variable
gives such a big difference.

$ cat addition.py
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a

$ cat additionOtten.py
def f():
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a
f()

$ /usr/bin/time --verbose python addition.py
10000000000
Command being timed: "python addition.py"
User time (seconds): 110.52
System time (seconds): 0.01
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:52.76
[...]

$ /usr/bin/time --verbose python additionOtten.py
10000000000
Command being timed: "python additionOtten.py"
User time (seconds): 56.38
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:56.64
[...]

P

Peter Otten

Sep 2, 2010

#4

Michael said:
Wow. It is still slower than Matlab, but your suggestion speeds up the
code by ca 50%.
But I do not understand why the change of a global to a local variable
gives such a big difference.

Basically the local namespace is a C array where accessing an item is just
pointer arithmetic while the global namespace is a Python dictionary.

There may be optimisations for the latter. If you can read C have a look at
the LOAD/STORE_FAST and LOAD/STORE_GLOBAL implementations for the gory
details:

http://svn.python.org/view/python/trunk/Python/ceval.c?view=markup

Peter

H

Hrvoje Niksic

Sep 2, 2010

#5

Michael Kreim said:
Are there any ways to speed up the for/xrange loop?
Or do I have to live with the fact that Matlab beats Python in this
example?

To a point, yes. However, there are things you can do to make your
Python code go faster. One has been pointed out by Peter.

Another is that Python treats numbers as regular heap objects, so
creating a bunch of unneeded integers by xrange slows things down
(despite allocation of integer objects being heavily optimized). For
this reason, you may want to change xrange(1000000000) to
itertools.repeat(None, 1000000000).

$ python -m timeit -s 'from itertools import repeat' 'for _ in repeat(None, 100000): pass'
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s 'from itertools import repeat' 'for _ in xrange(100000): pass'
100 loops, best of 3: 2.43 msec per loop

N

Nobody

Sep 2, 2010

#6

I was comparing the speed of a simple loop program between Matlab and
Python.

imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a

Are there any ways to speed up the for/xrange loop?

Sure; the above can be reduced to just:

print imax * 10

More seriously, if you're comparing against Matlab, you should look at
NumPy. If there's a reasonably direct approach using NumPy, it will be
much quicker than a Python "for" loop (in a sense, NumPy is a library of
useful "for" loops implemented in C).

Even a fairly indirect NumPy approach is often quicker than pure Python.

P

Philip Bloom

Sep 2, 2010

#7

Uh.

Try:
Imax=1000000000
a=0
i=0
While(i<imax):
a= a+10
i=i+1
print a

I suspect you will find it is way faster than using range or xrange for
large numbers and map far more closely in the final result to what you
are doing on matlab's side. At least last I checked, xrange and range
both involve iterating through an array, which is much slower in all
cases than just doing an int vs int compare (which is what your matlab
is doing).

-----Original Message-----
From: [email protected]
[mailto

[email protected]] On Behalf Of
Nobody
Sent: Thursday, September 02, 2010 9:05 AM
To: (e-mail address removed)
Subject: Re: Speed-up for loops

I was comparing the speed of a simple loop program between Matlab and
Python.

imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a

Are there any ways to speed up the for/xrange loop?

Sure; the above can be reduced to just:

print imax * 10

More seriously, if you're comparing against Matlab, you should look at
NumPy. If there's a reasonably direct approach using NumPy, it will be
much quicker than a Python "for" loop (in a sense, NumPy is a library of
useful "for" loops implemented in C).

Even a fairly indirect NumPy approach is often quicker than pure Python.

--
http://mail.python.org/mailman/listinfo/python-list

______________________________________________________________________
This email has been scanned by the MessageLabs
______________________________________________________________________

______________________________________________________________________
This email has been scanned by the MessageLabs
______________________________________________________________________

P

Peter Otten

Sep 2, 2010

#8

Philip said:
Uh.

Try:
Imax=1000000000
a=0
i=0
While(i<imax):
a= a+10
i=i+1
print a

I suspect you will find it is way faster than using range or xrange for
large numbers and map far more closely in the final result to what you
are doing on matlab's side. At least last I checked, xrange and range
both involve iterating through an array, which is much slower in all
cases than just doing an int vs int compare (which is what your matlab
is doing).

How did you check?

$ python -m timeit "for i in xrange(1000000): pass"
10 loops, best of 3: 47.5 msec per loop
$ python -m timeit "i = 0" "while i < 1000000: i += 1"
10 loops, best of 3: 152 msec per loop
$

So an empty while loop takes about three times as long as the equivalent for
loop. Also:

"""
class xrange(object)
| xrange([start,] stop[, step]) -> xrange object
|
| Like range(), but instead of returning a list, returns an object that
| generates the numbers in the range on demand. For looping, this is
| slightly faster than range() and more memory efficient.
"""

Peter

B

BartC

Sep 3, 2010

#9

I was comparing the speed of a simple loop program between Matlab and
Python.

My Codes:
$ cat addition.py
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a

Unfortunately my Python Code was much slower and I do not understand why.

Are there any ways to speed up the for/xrange loop?
Or do I have to live with the fact that Matlab beats Python in this
example?

I'm not sure the Python developers were interested in getting fast loops.

For-loops which iterate between two numbers are amongst the easiest things
to make fast in a language. Yet originally you had to use:

for i in range(N):

which (if I understood correctly) actually created a list of N objects,
populated it with the values 0, 1, 2...N-1 (presumably using a more sensible
loop), then iterated between the values of the list!

So Python had the distinction of being one of the slowest languages in which
to do nothing (ie. running an empty loop).

S

Stefan Behnel

Sep 3, 2010

#10

BartC, 03.09.2010 22:17:

for i in range(N):

which (if I understood correctly) actually created a list of N objects,
populated it with the values 0, 1, 2...N-1 (presumably using a more
sensible loop), then iterated between the values of the list!

I guess what applies here is "special cases aren't special enough to break
the rules". The performance is good enough in most cases, it only hurts
when the range is large and the loop body is small in comparison, such as
in the most obvious stupid benchmarks.

Also, xrange() is a pretty old addition the the language and now replaces
range() in Python 3.

Stefan

D

Dennis Lee Bieber

Sep 4, 2010

#11

I'm not sure the Python developers were interested in getting fast loops.

For-loops which iterate between two numbers are amongst the easiest things
to make fast in a language. Yet originally you had to use:

for i in range(N):

which (if I understood correctly) actually created a list of N objects,
populated it with the values 0, 1, 2...N-1 (presumably using a more sensible
loop), then iterated between the values of the list!

In those languages in which a for loop cycles over a range of
integral values (or even floating point values) one often ends up with
code that then performs subscripting to access the real goal of the
loop...

The Python for loop is innately focused on giving one an object from
some iterable (list, tuple, dictionary, user-defined...) with which one
can directly process... Compare {pseudo-code}:

for item in world_objects:
item.update(simulation_time)

to

for i in range(len(world_objects)):
world_objects.update(simulation_time)

Those languages that have optimized numeric-only (integer-only in
some cases) for loops do not support the clean interface of the first
sample and force all access to look like the latter.

S

Steven D'Aprano

Sep 5, 2010

#12

I'm not sure the Python developers were interested in getting fast
loops.

For-loops which iterate between two numbers are amongst the easiest
things to make fast in a language. Yet originally you had to use:

for i in range(N):

I don't have any versions of Python prior to version 1.5, but as far back
as that there was always a choice between creating a list with range()
and a lazy iterator with xrange().

which (if I understood correctly) actually created a list of N objects,
populated it with the values 0, 1, 2...N-1 (presumably using a more
sensible loop), then iterated between the values of the list!

By "more sensible", do you mean "in C code"? If so, then you are correct.

So Python had the distinction of being one of the slowest languages in
which to do nothing (ie. running an empty loop).

Nonsense.

[steve@sylar ~]$ time python test.py

real 0m3.441s
user 0m2.969s
sys 0m0.024s
[steve@sylar ~]$ time perl test.pl

real 0m3.490s
user 0m2.722s
sys 0m0.011s
[steve@sylar ~]$ time ruby test.rb

real 0m11.875s
user 0m6.740s
sys 0m3.995s

The difference between an empty loop in Python and Perl is insignificant,
and much faster than Ruby (at least the specific version of Ruby
installed on my machine, 1.8.6).

And if you want to see the code I ran:

[steve@sylar ~]$ cat test.*
# perl
for ($i = 0; $i < 10_000_000; ++$i) {
1;
}
# python
for i in xrange(10000000):
1
# ruby
for i in 0...10000000
1
end

Just for comparisons' sake:

[steve@sylar ~]$ gpc empty_test.p
[steve@sylar ~]$ time ./a.out

real 0m0.106s
user 0m0.070s
sys 0m0.004s
[steve@sylar ~]$ cat empty_test.p
program main(input, output);
var
i: integer;
begin
for i := 0 to 10000000 do
begin
end;
end.

Of course, a real optimizing compiler would realise that the Pascal code
did nothing at all, and compile it all away to an empty a.out file...

S

Stefan Behnel

Sep 5, 2010

#13

Steven D'Aprano, 05.09.2010 17:00:

Of course, a real optimizing compiler would realise that the Pascal code
did nothing at all, and compile it all away to an empty a.out file...

Which is just one of the reasons why this kind if "benchmark" provides no
insight into anything that should have an impact on a programmer's daily job.

Stefan

Speed of Python	8	Sep 7, 2007
Speed Test for Comparison	14	Dec 8, 2011
Speed?	4	Jan 3, 2004
Use of Python with GDAL. How to speed up ?	8	Mar 17, 2006
Python subprocesses experience mysterious delay in receiving stdin EOF	0	Feb 9, 2011
looping in array vs looping in a dic	5	Sep 20, 2012
Speed ain't bad	14	Dec 31, 2004
Why Ruby 1.9.2 double the speed on 2.53GHz vs 2.2GHz Core 2 Duo?	2	Nov 1, 2010

Michael Kreim

Peter Otten

Michael Kreim

Peter Otten

Hrvoje Niksic

Nobody

Philip Bloom

Peter Otten

BartC

Stefan Behnel

Dennis Lee Bieber

Steven D'Aprano

Stefan Behnel

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads