# Speed-up for loops

Discussion in 'Python' started by Michael Kreim, Sep 2, 2010.

1. ### Michael KreimGuest

Hi,

I was comparing the speed of a simple loop program between Matlab and
Python.

My Codes:
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a

imax = 1e9;
a = 0;
for i=0:imax-1
a = a + 10;
end
disp(a);
exit;

The results look like this:
\$ /usr/bin/time --verbose python addition.py
10000000000
Command being timed: "python addition.py"
User time (seconds): 107.30
System time (seconds): 0.08
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:50.09
[...]

\$ /usr/bin/time --verbose matlab -nodesktop -nosplash -r "addition"
[...]
1.0000e+10
Command being timed: "matlab -nodesktop -nosplash -r addition"
User time (seconds): 7.65
System time (seconds): 0.18
Percent of CPU this job got: 94%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.25
[...]

Unfortunately my Python Code was much slower and I do not understand why.

Are there any ways to speed up the for/xrange loop?
Or do I have to live with the fact that Matlab beats Python in this example?

With best regards,

Michael

Michael Kreim, Sep 2, 2010

2. ### Peter OttenGuest

Michael Kreim wrote:

> I was comparing the speed of a simple loop program between Matlab and
> Python.
>
> My Codes:
> \$ cat addition.py
> imax = 1000000000
> a = 0
> for i in xrange(imax):
> a = a + 10
> print a

> Are there any ways to speed up the for/xrange loop?

Move it into a function; this turns a and i into local variables.

def f():
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a
f()

> Or do I have to live with the fact that Matlab beats Python in this
> example?

I think so.

Peter Otten, Sep 2, 2010

3. ### Michael KreimGuest

Peter Otten wrote:
> Move it into a function; this turns a and i into local variables.
>
> def f():
> imax = 1000000000
> a = 0
> for i in xrange(imax):
> a = a + 10
> print a
> f()

Wow. It is still slower than Matlab, but your suggestion speeds up the
code by ca 50%.
But I do not understand why the change of a global to a local variable
gives such a big difference.

imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a

def f():
imax = 1000000000
a = 0
for i in xrange(imax):
a = a + 10
print a
f()

\$ /usr/bin/time --verbose python addition.py
10000000000
Command being timed: "python addition.py"
User time (seconds): 110.52
System time (seconds): 0.01
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:52.76
[...]

\$ /usr/bin/time --verbose python additionOtten.py
10000000000
Command being timed: "python additionOtten.py"
User time (seconds): 56.38
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:56.64
[...]

Michael Kreim, Sep 2, 2010
4. ### Peter OttenGuest

Michael Kreim wrote:

> Peter Otten wrote:
>> Move it into a function; this turns a and i into local variables.
>>
>> def f():
>> imax = 1000000000
>> a = 0
>> for i in xrange(imax):
>> a = a + 10
>> print a
>> f()

>
> Wow. It is still slower than Matlab, but your suggestion speeds up the
> code by ca 50%.
> But I do not understand why the change of a global to a local variable
> gives such a big difference.

Basically the local namespace is a C array where accessing an item is just
pointer arithmetic while the global namespace is a Python dictionary.

There may be optimisations for the latter. If you can read C have a look at
the LOAD/STORE_FAST and LOAD/STORE_GLOBAL implementations for the gory
details:

http://svn.python.org/view/python/trunk/Python/ceval.c?view=markup

Peter

Peter Otten, Sep 2, 2010
5. ### Hrvoje NiksicGuest

Michael Kreim <> writes:

> Are there any ways to speed up the for/xrange loop?
> Or do I have to live with the fact that Matlab beats Python in this
> example?

To a point, yes. However, there are things you can do to make your
Python code go faster. One has been pointed out by Peter.

Another is that Python treats numbers as regular heap objects, so
creating a bunch of unneeded integers by xrange slows things down
(despite allocation of integer objects being heavily optimized). For
this reason, you may want to change xrange(1000000000) to
itertools.repeat(None, 1000000000).

\$ python -m timeit -s 'from itertools import repeat' 'for _ in repeat(None, 100000): pass'
1000 loops, best of 3: 1.71 msec per loop
\$ python -m timeit -s 'from itertools import repeat' 'for _ in xrange(100000): pass'
100 loops, best of 3: 2.43 msec per loop

Hrvoje Niksic, Sep 2, 2010
6. ### NobodyGuest

On Thu, 02 Sep 2010 12:02:40 +0200, Michael Kreim wrote:

> I was comparing the speed of a simple loop program between Matlab and
> Python.

> imax = 1000000000
> a = 0
> for i in xrange(imax):
> a = a + 10
> print a

> Are there any ways to speed up the for/xrange loop?

Sure; the above can be reduced to just:

print imax * 10

More seriously, if you're comparing against Matlab, you should look at
NumPy. If there's a reasonably direct approach using NumPy, it will be
much quicker than a Python "for" loop (in a sense, NumPy is a library of
useful "for" loops implemented in C).

Even a fairly indirect NumPy approach is often quicker than pure Python.

Nobody, Sep 2, 2010
7. ### Philip BloomGuest

Uh.

Try:
Imax=1000000000
a=0
i=0
While(i<imax):
a= a+10
i=i+1
print a

I suspect you will find it is way faster than using range or xrange for
large numbers and map far more closely in the final result to what you
are doing on matlab's side. At least last I checked, xrange and range
both involve iterating through an array, which is much slower in all
cases than just doing an int vs int compare (which is what your matlab
is doing).

-----Original Message-----
From: python-list-bounces+pbloom=
[mailtoython-list-bounces+pbloom=] On Behalf Of
Nobody
Sent: Thursday, September 02, 2010 9:05 AM
To:
Subject: Re: Speed-up for loops

On Thu, 02 Sep 2010 12:02:40 +0200, Michael Kreim wrote:

> I was comparing the speed of a simple loop program between Matlab and
> Python.

> imax = 1000000000
> a = 0
> for i in xrange(imax):
> a = a + 10
> print a

> Are there any ways to speed up the for/xrange loop?

Sure; the above can be reduced to just:

print imax * 10

More seriously, if you're comparing against Matlab, you should look at
NumPy. If there's a reasonably direct approach using NumPy, it will be
much quicker than a Python "for" loop (in a sense, NumPy is a library of
useful "for" loops implemented in C).

Even a fairly indirect NumPy approach is often quicker than pure Python.

--
http://mail.python.org/mailman/listinfo/python-list

______________________________________________________________________
This email has been scanned by the MessageLabs
______________________________________________________________________

______________________________________________________________________
This email has been scanned by the MessageLabs
______________________________________________________________________

Philip Bloom, Sep 2, 2010
8. ### Peter OttenGuest

Philip Bloom wrote:

> Uh.

> Try:
> Imax=1000000000
> a=0
> i=0
> While(i<imax):
> a= a+10
> i=i+1
> print a

> I suspect you will find it is way faster than using range or xrange for
> large numbers and map far more closely in the final result to what you
> are doing on matlab's side. At least last I checked, xrange and range
> both involve iterating through an array, which is much slower in all
> cases than just doing an int vs int compare (which is what your matlab
> is doing).

How did you check?

\$ python -m timeit "for i in xrange(1000000): pass"
10 loops, best of 3: 47.5 msec per loop
\$ python -m timeit "i = 0" "while i < 1000000: i += 1"
10 loops, best of 3: 152 msec per loop
\$

So an empty while loop takes about three times as long as the equivalent for
loop. Also:

"""
class xrange(object)
| xrange([start,] stop[, step]) -> xrange object
|
| Like range(), but instead of returning a list, returns an object that
| generates the numbers in the range on demand. For looping, this is
| slightly faster than range() and more memory efficient.
"""

Peter

Peter Otten, Sep 2, 2010
9. ### BartCGuest

"Michael Kreim" <> wrote in message
news:...

> I was comparing the speed of a simple loop program between Matlab and
> Python.
>
> My Codes:
> \$ cat addition.py
> imax = 1000000000
> a = 0
> for i in xrange(imax):
> a = a + 10
> print a

> Unfortunately my Python Code was much slower and I do not understand why.
>
> Are there any ways to speed up the for/xrange loop?
> Or do I have to live with the fact that Matlab beats Python in this
> example?

I'm not sure the Python developers were interested in getting fast loops.

For-loops which iterate between two numbers are amongst the easiest things
to make fast in a language. Yet originally you had to use:

for i in range(N):

which (if I understood correctly) actually created a list of N objects,
populated it with the values 0, 1, 2...N-1 (presumably using a more sensible
loop), then iterated between the values of the list!

So Python had the distinction of being one of the slowest languages in which
to do nothing (ie. running an empty loop).

--
Bartc

BartC, Sep 3, 2010
10. ### Stefan BehnelGuest

BartC, 03.09.2010 22:17:
> for i in range(N):
>
> which (if I understood correctly) actually created a list of N objects,
> populated it with the values 0, 1, 2...N-1 (presumably using a more
> sensible loop), then iterated between the values of the list!

I guess what applies here is "special cases aren't special enough to break
the rules". The performance is good enough in most cases, it only hurts
when the range is large and the loop body is small in comparison, such as
in the most obvious stupid benchmarks.

Also, xrange() is a pretty old addition the the language and now replaces
range() in Python 3.

Stefan

Stefan Behnel, Sep 3, 2010
11. ### Dennis Lee BieberGuest

On Fri, 3 Sep 2010 21:17:44 +0100, "BartC" <> declaimed
the following in gmane.comp.python.general:

> I'm not sure the Python developers were interested in getting fast loops.
>
> For-loops which iterate between two numbers are amongst the easiest things
> to make fast in a language. Yet originally you had to use:
>
> for i in range(N):
>
> which (if I understood correctly) actually created a list of N objects,
> populated it with the values 0, 1, 2...N-1 (presumably using a more sensible
> loop), then iterated between the values of the list!
>

In those languages in which a for loop cycles over a range of
integral values (or even floating point values) one often ends up with
code that then performs subscripting to access the real goal of the
loop...

The Python for loop is innately focused on giving one an object from
some iterable (list, tuple, dictionary, user-defined...) with which one
can directly process... Compare {pseudo-code}:

for item in world_objects:
item.update(simulation_time)

to

for i in range(len(world_objects)):
world_objects.update(simulation_time)

Those languages that have optimized numeric-only (integer-only in
some cases) for loops do not support the clean interface of the first
sample and force all access to look like the latter.

--
Wulfraed Dennis Lee Bieber AF6VN
HTTP://wlfraed.home.netcom.com/

Dennis Lee Bieber, Sep 4, 2010
12. ### Steven D'ApranoGuest

On Fri, 03 Sep 2010 21:17:44 +0100, BartC wrote:

> I'm not sure the Python developers were interested in getting fast
> loops.
>
> For-loops which iterate between two numbers are amongst the easiest
> things to make fast in a language. Yet originally you had to use:
>
> for i in range(N):

I don't have any versions of Python prior to version 1.5, but as far back
as that there was always a choice between creating a list with range()
and a lazy iterator with xrange().

> which (if I understood correctly) actually created a list of N objects,
> populated it with the values 0, 1, 2...N-1 (presumably using a more
> sensible loop), then iterated between the values of the list!

By "more sensible", do you mean "in C code"? If so, then you are correct.

> So Python had the distinction of being one of the slowest languages in
> which to do nothing (ie. running an empty loop).

Nonsense.

[steve@sylar ~]\$ time python test.py

real 0m3.441s
user 0m2.969s
sys 0m0.024s
[steve@sylar ~]\$ time perl test.pl

real 0m3.490s
user 0m2.722s
sys 0m0.011s
[steve@sylar ~]\$ time ruby test.rb

real 0m11.875s
user 0m6.740s
sys 0m3.995s

The difference between an empty loop in Python and Perl is insignificant,
and much faster than Ruby (at least the specific version of Ruby
installed on my machine, 1.8.6).

And if you want to see the code I ran:

[steve@sylar ~]\$ cat test.*
# perl
for (\$i = 0; \$i < 10_000_000; ++\$i) {
1;
}
# python
for i in xrange(10000000):
1
# ruby
for i in 0...10000000
1
end

Just for comparisons' sake:

[steve@sylar ~]\$ gpc empty_test.p
[steve@sylar ~]\$ time ./a.out

real 0m0.106s
user 0m0.070s
sys 0m0.004s
[steve@sylar ~]\$ cat empty_test.p
program main(input, output);
var
i: integer;
begin
for i := 0 to 10000000 do
begin
end;
end.

Of course, a real optimizing compiler would realise that the Pascal code
did nothing at all, and compile it all away to an empty a.out file...

--
Steven

Steven D'Aprano, Sep 5, 2010
13. ### Stefan BehnelGuest

Steven D'Aprano, 05.09.2010 17:00:
> Of course, a real optimizing compiler would realise that the Pascal code
> did nothing at all, and compile it all away to an empty a.out file...

Which is just one of the reasons why this kind if "benchmark" provides no
insight into anything that should have an impact on a programmer's daily job.

Stefan

Stefan Behnel, Sep 5, 2010