range() vs xrange() Python2|3 issues for performance


harrismh777

The following is intended as a helpful small extension to the xrange()
range() discussion brought up this past weekend by Billy Mays...

With Python2 you basically have two ways to get a range of numbers:
range(), which returns a list, and
xrange(), which returns an iterator.

With Python3 you must use range(), which produces an iterator; while
xrange() does not exist at all (at least not on 3.2).
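The memory difference behind this is easy to see on Python 3, where range() is a lazy object. A small sketch (the exact byte counts are CPython implementation details and will vary):

```python
import sys

# On Python 3, range() is a small constant-size lazy object,
# while materializing it as a list allocates storage for every element.
lazy = range(10**6)
eager = list(range(10**6))

print(sys.getsizeof(lazy))   # a few dozen bytes
print(sys.getsizeof(eager))  # several megabytes
```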

I have been doing some research in number theory related to Mersenne
Primes and perfect numbers (perfects, those integers whose primary
divisors when summed result in the number, not including the number
itself)... the first few of those being 6, 28, 496, 8128, 33550336, etc

Never mind, but you know... are there an infinite number of them?
.... and of course, are there any "odd" perfect numbers... well not under
10^1500.... I digress, sorry ...

This brought up the whole range() xrange() thing for me again
because Python in any case is just not fast enough (no brag, just fact).
So my perfect number stuff is written in C, for the moment. But, what
about the differences in performance (supposing we were to stay in
Python for small numbers) between xrange() vs range() [on Python2]
versus range() [on Python3]? I have put my code snips below, with some
explanation below that... these will run on either Python2 or
Python3... except that if you substitute xrange() for range() for
Python2 they will throw an exception on Python3... doh.

So, here is PyPerfectNumbers.py ----------------------------

def PNums(q):
    for i in range(2, q):
        m = 1
        s = 0
        while m <= i/2:
            if not i%m:
                s += m
            m += 1
        if i == s:
            print(i)
    return 0

def perf(n):
    sum = 0
    for i in range(1, n):
        if n % i == 0:
            sum += i
    return sum == n

fperf = lambda n: n == sum(i for i in range(1, n) if n % i == 0)

-----------------/end---------------------------------------

PNums(8200) will crunch out the perfect numbers below 8200.

perf(33550336) will test to see if 33550336 is a perfect number

fperf(33550336) is the lambda equivalent of perf()


These are coded with range(). The interesting thing to note is that
xrange() on Python2 runs "considerably" faster than the same code using
range() on Python3. For large perfect numbers (above 8128) the
performance difference for perf() is orders of magnitude. Actually,
range() on Python2 runs somewhat slower than xrange() on Python2, but
things are much worse on Python3.
This is something I never thought to test before Billy's question,
because I had already decided to work in C for most of my integer
stuff... like perfects. But now that it sparked my interest, I'm
wondering if there might be some focus placed on range() performance in
Python3 for the future, PEP?

kind regards,
 

garabik-news-2005-05

harrismh777 said:
Python3... except that if you substitute xrange() for range() for
Python2 they will throw an exception on Python3... doh.

if 'xrange' not in dir(__builtins__):
    xrange = range

at the beginning of your program will fix that.
 

Peter Otten

harrismh777 said:
The following is intended as a helpful small extension to the xrange()
range() discussion brought up this past weekend by Billy Mays...

With Python2 you basically have two ways to get a range of numbers:
range(), which returns a list, and
xrange(), which returns an iterator.

With Python3 you must use range(), which produces an iterator; while
xrange() does not exist at all (at least not on 3.2).

I have been doing some research in number theory related to Mersenne
Primes and perfect numbers (perfects, those integers whose primary
divisors when summed result in the number, not including the number
itself)... the first few of those being 6, 28, 496, 8128, 33550336, etc

Never mind, but you know... are there an infinite number of them?
... and of course, are there any "odd" perfect numbers... well not under
10^1500.... I digress, sorry ...

This brought up the whole range() xrange() thing for me again
because Python in any case is just not fast enough (no brag, just fact).
So my perfect number stuff is written in C, for the moment. But, what
about the differences in performance (supposing we were to stay in
Python for small numbers) between xrange() vs range() [on Python2]
versus range() [on Python3]? I have put my code snips below, with some
explanation below that... these will run on either Python2 or
Python3... except that if you substitute xrange() for range() for
Python2 they will throw an exception on Python3... doh.

try:
    range = xrange
except NameError:
    pass

So, here is PyPerfectNumbers.py ----------------------------

def PNums(q):
    for i in range(2, q):
        m = 1
        s = 0
        while m <= i/2:

i/2 returns a float in Python 3; you should use i//2 for consistency.

            if not i%m:
                s += m
            m += 1
        if i == s:
            print(i)
    return 0

def perf(n):
    sum = 0
    for i in range(1, n):
        if n % i == 0:
            sum += i
    return sum == n

fperf = lambda n: n == sum(i for i in range(1, n) if n % i == 0)

-----------------/end---------------------------------------

PNums(8200) will crunch out the perfect numbers below 8200.

perf(33550336) will test to see if 33550336 is a perfect number

fperf(33550336) is the lambda equivalent of perf()


These are coded with range(). The interesting thing to note is that
xrange() on Python2 runs "considerably" faster than the same code using
range() on Python3. For large perfect numbers (above 8128) the
performance difference for perf() is orders of magnitude.

Python 3's range() is indeed slower, but not orders of magnitude:

$ python3.2 -m timeit -s"r = range(33550336)" "for i in r: pass"
10 loops, best of 3: 1.88 sec per loop
$ python2.7 -m timeit -s"r = xrange(33550336)" "for i in r: pass"
10 loops, best of 3: 1.62 sec per loop

$ cat tmp.py
try:
    range = xrange
except NameError:
    pass

def fperf(n):
    return n == sum(i for i in range(1, n) if not n % i)

if __name__ == "__main__":
    print(fperf(33550336))
$ time python2.7 tmp.py
True

real 0m6.481s
user 0m6.100s
sys 0m0.000s
$ time python3.2 tmp.py
True

real 0m7.925s
user 0m7.520s
sys 0m0.040s

I don't know what's causing the slowdown, maybe the int/long unification is
to blame.
 

Stefan Behnel

harrismh777, 02.08.2011 09:12:
With Python2 you basically have two ways to get a range of numbers:
range(), which returns a list, and
xrange(), which returns an iterator.

With Python3 you must use range(), which produces an iterator; while
xrange() does not exist at all (at least not on 3.2).

That's a good thing. There should be one - and preferably only one -
obvious way to do it.

iterable: range(N)
list: list(range(N))
tuple: tuple(range(N))
set: set(range(N))
....

Less special cases in the language.

This brought up the whole range() xrange() thing for me again because
Python in any case is just not fast enough (no brag, just fact). So my
perfect number stuff is written in C, for the moment.

Or use Cython or PyPy, both of which are simpler ways to get your code up
to speed.

The interesting thing to note is that
xrange() on Python2 runs "considerably" faster than the same code using
range() on Python3.

Are you sure that's due to Py3 range() vs. Py2 xrange()? Py3 has a
different implementation for integers (which is still being optimised,
BTW). That's much more likely to make a difference here.

What version of Py3 were you using? If you used the latest, maybe even the
latest hg version, you will notice that that's substantially faster for
integers than, e.g. 3.1.x.

Stefan
 

Thomas Rachel

On 02.08.2011 09:12, harrismh777 wrote:
The following is intended as a helpful small extension to the xrange()
range() discussion brought up this past weekend by Billy Mays...

With Python2 you basically have two ways to get a range of numbers:
range(), which returns a list, and
xrange(), which returns an iterator.

No. An iterable. As already said, iterators are the ones that stay stopped forever once the end is reached.

Generally, iterables are allowed to be iterated over multiple times.

xrange() and range(), respectively, yield iterables.
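The distinction can be seen directly (Python 3 shown; a minimal sketch):

```python
r = range(3)

# An iterable like a range object can be iterated multiple times...
assert list(r) == [0, 1, 2]
assert list(r) == [0, 1, 2]

# ...while an iterator produced from it is exhausted after one pass.
it = iter(r)
assert list(it) == [0, 1, 2]
assert list(it) == []
```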


The interesting thing to note is that
xrange() on Python2 runs "considerably" faster than the same code using
range() on Python3. For large perfect numbers (above 8128) the
performance difference for perf() is orders of magnitude. Actually,
range() on Python2 runs somewhat slower than xrange() on Python2, but
things are much worse on Python3.

That sounds strange at the first glance.

This is something I never thought to test before Billy's question,
because I had already decided to work in C for most of my integer
stuff... like perfects. But now that it sparked my interest, I'm
wondering if there might be some focus placed on range() performance in
Python3 for the future, PEP?

I'm sure it is a matter of implementation. You cannot define by PEP what
performance the implementations should have. Maybe range() in 3 is
defined differently from xrange() in 2; I'm not familiar enough with 3 to
definitely confirm or deny that. The behaviour, though, seems to be
the same, except that range3 (as I call it now) has some more methods than
xrange, like the rich comparison ones. Indexing works with all of them.


Thomas
 

Chris Angelico

What version of Py3 were you using? If you used the latest, maybe even the
latest hg version, you will notice that that's substantially faster for
integers than, e.g. 3.1.x.

I just tried this out, using a slightly modified script but the same guts:

-----
import sys
print(sys.version)

import time
def PNums(q):
    start=time.clock()
    for i in range(2, q):
        m = 1
        s = 0
        while m <= i/2:
            if not i%m:
                s += m
            m += 1
        if i == s:
            print(i)
    print("Time: %f"%(time.clock()-start))
    return

# PNums(33550337)
PNums(10000)

-----

On my dual-core Windows laptop (it always saturates one core with
this, leaving the other for the rest of the system), the results show
no statistically significant difference between xrange and range in
Python 2, but notably slower overall performance in Python 3:

2.4.5 (#1, Dec 15 2009, 16:41:19)
[GCC 4.1.1]
Time: 14.474343
Using xrange: 14.415412

2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)]
Time: 8.990142
Using xrange: 9.015566

3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)]
Time: 24.401461

Since I don't have a build environment in which to test the latest
from hg, I switched to a Linux box. The following timings therefore
cannot be compared with the above ones.

3.3a0 (default:b95096303ed2, Jun 2 2011, 20:43:01)
[GCC 4.4.5]
Time: 34.390000

2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5]
Time: 13.730000
Using xrange: 13.670000

My 3.3a0 is freshly pulled from hg, although I'm not sure if
sys.version has been correctly built. Once again, 2.6.6 shows no
significant difference between range and xrange, but 3 is noticeably
slower than 2. (I did several runs, but the variance between the runs
wasn't significant.)

Of course, this is all fairly moot; if you're doing really heavy
number crunching, CPython isn't the platform to use.

ChrisA
 

Chris Angelico

i/2 returns a float in Python 3; you should use i//2 for consistency.

And I forgot to make this change before doing my tests. Redoing the
Python 3 ones with // quite drastically changes things!

3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)]
Time: 17.917331

That's a lot closer to the 10-14 seconds that Python 2 was doing, but
still somewhat worse. Of course, no surprises that 2.6 is faster than
2.4.

But now, here's a fairly small change that makes a LOT of difference,
and reveals the value of range/xrange. Notice that m is just being a
classic iteration counter (it starts at 1, increments to a top
limit)...

-----
for m in xrange(1,i//2+1):
    if not i%m:
        s += m
-----

This brings 2.6.5 down to 5.359383 seconds, or 4.703364 with xrange.
(I'm now changing two references from range to xrange.) Meanwhile, 3.2
has come down to 7.096237 seconds, and 2.4.5 to 8.222261.

Comparing the latest 3.3a0 and 2.6.6 on the other box shows that
there's still a difference. Both of them improve with range instead of
manual incrementing, but 2.6.6 takes 6.950000 seconds and 3.3a0 takes
13.830000 (down from 13.730000 and 34.390000 in the previous test).

Conclusion: Python 3 is notably slower comparing floating point and
integer than Python 2 is comparing int and int. No surprises there!
But get everything working with integers, and use range() instead of
manually incrementing a variable, and things come much more even.
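A minimal way to check the float-versus-integer comparison cost yourself is a timeit micro-benchmark (a sketch only; the absolute numbers depend entirely on your machine and interpreter):

```python
import timeit

# The same loop condition with true division (float result)
# versus floor division (int result), as discussed above.
float_cmp = timeit.timeit("m <= n/2", setup="m, n = 3, 100", number=10**6)
int_cmp = timeit.timeit("m <= n//2", setup="m, n = 3, 100", number=10**6)

print(float_cmp, int_cmp)
```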

But as I said, CPython isn't the ideal language for heavy number
crunching. On the same Windows box, a Pike program using the same
algorithm took only 2.055 seconds. And a C program took 0.328 seconds.
But if you have other reasons for keeping it in Python, do keep it to
integers!

ChrisA
 

Steven D'Aprano

harrismh777 said:
The following is intended as a helpful small extension to the xrange()
range() discussion brought up this past weekend by Billy Mays...

With Python2 you basically have two ways to get a range of numbers:
range(), which returns a list, and
xrange(), which returns an iterator.


xrange does not return an iterator. It returns a lazy iterable object, which
is not the same thing.

In Python, "iterator" is not merely a generic term for something which can
be iterated over. An iterator is an object which obeys the iterator
protocol, which depends on at least two properties:

* the object must have a next() method, which returns values until the
iterator is exhausted, and then raises StopIteration;

(In Python 3, the method is __next__.)

* and the object must have an __iter__() method which returns itself.

(Also, "non-broken" iterators will continue to raise StopIteration once they
do so once. That is, they can't be reset or repeated.)

xrange objects fail on both accounts. (Likewise for range objects in Python
3.)
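In Python 3 terms this is easy to verify (a small sketch):

```python
r = range(5)

# The range object is iterable but is not its own iterator:
assert hasattr(r, '__iter__')
assert not hasattr(r, '__next__')   # no __next__ method: not an iterator

# iter() returns a distinct object that does satisfy the protocol:
it = iter(r)
assert hasattr(it, '__next__')
assert iter(it) is it               # an iterator's __iter__ returns itself
```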


[...]
These are coded with range(). The interesting thing to note is that
xrange() on Python2 runs "considerably" faster than the same code using
range() on Python3. For large perfect numbers (above 8128) the
performance difference for perf() is orders of magnitude. Actually,
range() on Python2 runs somewhat slower than xrange() on Python2, but
things are much worse on Python3.

I find these results surprising, at least for numbers as small as 8128, and
suspect your timing code is inaccurate.

(But don't make the mistake of doing what I did, which was to attempt to
produce range(290000000) in Python 2. After multiple *hours* of swapping, I
was finally able to kill the Python process and get control of my PC again.
Sigh.)

I would expect that, in general, Python 3.1 or 3.2 is slightly slower than
Python 2.6 or 2.7. (If you're using Python 3.0, stop now!) But in Python 2,
I wouldn't expect orders of magnitude difference in range and xrange. Using
the function perf(N) you gave, and a modified copy perfx(N) which simply
replaces xrange for range, I get these timings in Python2.6:
2.8787298202514648

A small difference, but not an order of magnitude. In Python 3.1, I get
this:
3.4577009677886963


This is something I never thought to test before Billy's question,
because I had already decided to work in C for most of my integer
stuff... like perfects. But now that it sparked my interest, I'm
wondering if there might be some focus placed on range() performance in
Python3 for the future, PEP?

Oh indubitably. I doubt it will need a PEP. Python 3.x is still quite young,
and the focus is on improving unicode support, but performance improvements
will usually be welcome.

However, at some point I would expect adding hand-crafted optimizations to
CPython will cease to be worthwhile. Guido is already talking about CPython
becoming the reference implementation, and PyPy the production
implementation because it's faster. PyPy's optimizing compiler is already
about twice as fast as CPython, and for at least one specially crafted
example, faster than C:

http://morepypy.blogspot.com/2011/02/pypy-faster-than-c-on-carefully-crafted.html
 

Chris Angelico

(But don't make the mistake of doing what I did, which was to attempt to
produce range(290000000) in Python 2. After multiple *hours* of swapping, I
was finally able to kill the Python process and get control of my PC again.
Sigh.)

That is sad. (And, I am only slightly ashamed to admit it, quite
funny.) But on the other hand, quite impressive that it didn't bomb
anywhere!

Whenever I'm about to do something really dangerous, I like to first
open an SSH session from another computer - bash running inside there
can usually kill any process fairly efficiently, without need for UI
interaction. Failing that, good ol' Ctrl-Alt-F1 to get to a console is
usually the next best, but sometimes logging in requires a lot of
resources.

This is actually one place where I'm really glad Python doesn't do
much with multiple threads (for instance, it won't gc on another
thread). A dual core CPU keeps everything happy!

ChrisA
 

Steven D'Aprano

harrismh777 said:
def perf(n):
    sum = 0
    for i in range(1, n):
        if n % i == 0:
            sum += i
    return sum == n


A more effective way to speed that up is with a better algorithm:

def isperfect(n):
    if n < 2: return False
    total = 1
    for i in range(2, int(n**0.5)+1):
        a, b = divmod(n, i)
        if b == 0:
            total += a+i
    return total == n

For n=33550336 it loops about 5791 times instead of 33550335, reducing the
amount of work done by about 99.98%.
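The loop-count claim is easy to check directly (a sketch):

```python
n = 33550336

# isperfect's trial-division loop runs i from 2 up to int(sqrt(n)),
# i.e. int(n**0.5) - 1 iterations, versus n - 1 for the naive loop.
iterations = int(n**0.5) - 1
naive = n - 1

print(iterations, naive)  # 5791 versus 33550335
```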

isperfect is fast enough to exhaustively test every number up to a low
limit: testing every number between 0 and 8128 inclusive took about 0.38
second in total on my PC. That's an average of 0.05 millisecond per number
tested. Unfortunately, it doesn't scale. isperfect(33550336) takes about
4.4 ms, and isperfect(8589869056) about 84 ms.

However, it is still (barely) feasible to exhaustively test every number up
to a much higher range. I extrapolate that the time needed to check every
value between 0 and 33550336 would take about 30 hours.

If you're wondering why exhaustive testing is so expensive, it's because the
test code simply calls isperfect(n) for each n. The number of iterations in
each call to isperfect is approximately sqrt(n), so the total in any
exhaustive test is approximately sum(x**0.5 for x in range(upperlimit)),
which gets rather large.
 

Chris Angelico

       a, b = divmod(n, i)
       if b == 0:
           total += a+i

Wouldn't this fail on squares? It happens to give correct results as
far as I've checked; no square up to 10,000 is called perfect, and
there are no perfect squares in that range, but I think it's
technically using an incorrect intermediate result.

ChrisA
 

Steven D'Aprano

Chris said:
Wouldn't this fail on squares? It happens to give correct results as
far as I've checked; no square up to 10,000 is called perfect, and
there are no perfect squares in that range, but I think it's
technically using an incorrect intermediate result.

Yes, you're correct -- it counts sqrt(n) twice as a factor if n is a perfect
square. The obvious fix is to change the increment to:

total += a if a==i else a+i

I don't believe it actually makes a difference for perfect numbers, but it's
worth getting right.
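Folding that fix into the function from earlier in the thread gives (a sketch with the perfect-square correction applied):

```python
def isperfect(n):
    # Sum of proper divisors, pairing each divisor i with its cofactor n // i.
    if n < 2:
        return False
    total = 1
    for i in range(2, int(n**0.5) + 1):
        a, b = divmod(n, i)
        if b == 0:
            # Count sqrt(n) only once when n is a perfect square.
            total += a if a == i else a + i
    return total == n
```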
 
