efficiency of range() and xrange() in for loops

  • Thread starter Steve R. Hastings
  • Start date
S

Steve R. Hastings

When you compile the expression

for i in range(1000):
pass

does Python make an iterator for range(), and then generate the values
on the fly? Or does Python actually allocate the list [0, 1, 2, ..., 999]
and then step through it?

I was under the impression that recent releases of Python optimize this
case, but upon reflection, I have no idea where I got that impression.

If Python actually allocates the list, then clearly we should all use
"for i in xrange". But "for i in range" looks cleaner, and is potentially
more lightweight than xrange. Of course, if you want to write code that
runs well on older versions of Python, you need to use xrange().


If Python doesn't currently optimize this case, is there any chance this
optimization could be added?

P.S. It looks like all the cool people look at the generated bytecodes to
answer questions like this one. I want to be cool too. Where can I find
information about how to get a bytecodes listing for my compiled Python?
 
J

John Salerno

Steve said:
If Python doesn't currently optimize this case, is there any chance this
optimization could be added?

I was just asking about this, and apparently it will be changed in 3.0:

Built-in Namespace
* Make built-ins return an iterator where appropriate (e.g.
range(),
zip(), etc.)

To be removed:
* xrange(): use range() instead [1]
 
G

Georg Brandl

Steve said:
When you compile the expression

for i in range(1000):
pass

does Python make an iterator for range(), and then generate the values
on the fly? Or does Python actually allocate the list [0, 1, 2, ..., 999]
and then step through it?

It does.
I was under the impression that recent releases of Python optimize this
case, but upon reflection, I have no idea where I got that impression.

If Python actually allocates the list, then clearly we should all use
"for i in xrange". But "for i in range" looks cleaner, and is potentially
more lightweight than xrange. Of course, if you want to write code that
runs well on older versions of Python, you need to use xrange().


If Python doesn't currently optimize this case, is there any chance this
optimization could be added?

As the alternative is so easy (adding a single character), I think this will
not make it into the 2.x line. A patch for the old compiler existed on SF,
but it will not work with the new AST compiler.
P.S. It looks like all the cool people look at the generated bytecodes to
answer questions like this one. I want to be cool too. Where can I find
information about how to get a bytecodes listing for my compiled Python?

The "dis" module.

Georg
 
F

Fredrik Lundh

Steve said:
If Python actually allocates the list, then clearly we should all use
"for i in xrange".

clearly we should all? speak for yourself. the difference isn't very
large, xrange is actually slower in some python versions, and you'll
need the integer objects sooner or later anyway...
If Python doesn't currently optimize this case, is there any chance this
optimization could be added?

in Python 2.X, range is defined to return a list. if you start returning
something else, you'll break stuff.

</F>
 
T

Todd

Steve said:
When you compile the expression

for i in range(1000):
pass

does Python make an iterator for range(), and then generate the values
on the fly? Or does Python actually allocate the list [0, 1, 2, ..., 999]
and then step through it?

I ran an experiment on this a while back. I thought it generated a
list. But the results turned out to be surprising. Take this with a
grain of salt. I'm not completely sure I did what I meant to do. But
it looks like however you do it, it works out about the same.

http://www.signalsguru.net/articles/pyloops/pyloops.html
 
M

mensanator

Todd said:
Steve said:
When you compile the expression

for i in range(1000):
pass

does Python make an iterator for range(), and then generate the values
on the fly? Or does Python actually allocate the list [0, 1, 2, ..., 999]
and then step through it?

I ran an experiment on this a while back. I thought it generated a
list. But the results turned out to be surprising. Take this with a
grain of salt. I'm not completely sure I did what I meant to do. But
it looks like however you do it, it works out about the same.

http://www.signalsguru.net/articles/pyloops/pyloops.html

Didn't it occur to you to also check memory usage?

I learned about xrange the hard way, when my range ate up all
available memory.
 
S

Steve R. Hastings

clearly we should all? speak for yourself.

I apologize for my pithy turn of phrase. Clearly what I should have
written was:

"If Python actually allocates the list, then clearly 'for i in xrange'
would be desirable in cases where memory allocation might possibly be a
problem; of course if you have lots of memory, or you are only looping a
small number of times, go ahead and use range() because why not."

I deeply apologize for inadvertently trying to tell you how to write a for
loop. I hope the offense isn't unforgivable.


I apologise for the fault in the newsgroup posting. Those responsible
have been sacked.

the difference isn't very
large, xrange is actually slower in some python versions, and you'll
need the integer objects sooner or later anyway...

Actually, for many uses of "for i in (range|xrange)", you only need the
value of i, and you aren't doing anything with the integer object. You
might not even be looking at the value of i:


start_time = time.time()
for i in xrange(10**6):
run_some_function()
stop_time = time.time()
secs = (stop_time - start_time) / 10**6
print "run_some_function() took on average, %f seconds." % secs


In the above, clearly we should all use xrange()... oops, I meant, if you
want to, you could use xrange() instead of allocating a list of a million
integers and then doing nothing with the list itself and then deallocating
the list of a million integers.

I apologise again for the fault in the newsgroup posting. Those
responsible for sacking the people who have just been sacked
have been sacked.



in Python 2.X, range is defined to return a list. if you start
returning something else, you'll break stuff.

Perhaps I'm mistaken here, but I don't see how this optimization could
possibly break anything. range() makes a list, and for consumes it, and
the list isn't seen anywhere else. If the Python compiler took this:

for i in range(10**6):
pass


and produced code equivalent to this:

for i in iter(range(10**6))
pass


How would this break stuff?
 
G

Giovanni Bajo

Steve said:
Perhaps I'm mistaken here, but I don't see how this optimization could
possibly break anything.

Because you assume that the only use-case of range() is within a for-loop.
range() is a builtin function that can be used in any Python expression. For
instance:

RED, GREEN, BLUE, WHITE, BLACK = range(5)
 
A

Alan Morgan

Because you assume that the only use-case of range() is within a for-loop.
range() is a builtin function that can be used in any Python expression. For
instance:

RED, GREEN, BLUE, WHITE, BLACK = range(5)

Hmmm, this worked fine when I used xrange as well. Am I missing something?
Obviously there *are* differences, viz:

a = range(5)
b = range(5)
a==b # True!
c = xrange(5)
d = xrange(5)
c==d # False!

and various other more arcane things, but do these actually happen
in real life?

Alan
 
S

Steve R. Hastings

[...] you assume that the only use-case of range() is within a for-loop.

No, I do not assume any such thing. I have not suggested that range()
should be changed to always return an iterator. I wondered if the Python
compiler could, as a special case, turn:

for i in range(n)

into compiled code equivalent to

for i in itr

where "itr" is a lightweight iterator that returns the same values as
iter(range(n)).

iter(range(n)) will still allocate a list of n integers, and then produce
an iterator over that list, so I did not mean to suggest that for would
simply call iter() when you use "for i in range".

I apologize if my writing was unclear.
 
E

Erik Max Francis

Alan said:
Hmmm, this worked fine when I used xrange as well. Am I missing something?

Not in your use case. Tuple unpacking will iterate, and so it doesn't
matter whether it's an actual list or an iterator:
2

There are certainly contexts where a sequence and its iterator are not
interchangeable. You missed an obvious one:
False
 
S

Steven D'Aprano

I apologize for my pithy turn of phrase. Clearly what I should have
written was:

"If Python actually allocates the list, then clearly 'for i in xrange'
would be desirable in cases where memory allocation might possibly be a
problem; of course if you have lots of memory, or you are only looping a
small number of times, go ahead and use range() because why not."

How about this?

....because range() is simpler and actually an English word, it reads
better and won't confuse non-native English speakers who try looking
xrange up in a dictionary, it is aesthetically nicer, and at least some of
the time range() will actually be faster and maybe even consume less
memory (although if your list is that small, we're only talking about
trivial amounts of memory anyway).

But yes, for many purposes, the difference between xrange and range is
quite minimal. But please try to recall that many != all, and any
optimization put to Python 2.x must work in all cases, not just many.

I deeply apologize for inadvertently trying to tell you how to write a
for loop. I hope the offense isn't unforgivable.

Sarcasm aside, you will find many many people are deeply allergic to the
merest hint of premature optimization, especially when that premature
optimization is posed in such a way to hint that the person proposing it
hasn't really considered all the factors.

[snip]
Actually, for many uses of "for i in (range|xrange)", you only need the
value of i, and you aren't doing anything with the integer object. You
might not even be looking at the value of i:


start_time = time.time()
for i in xrange(10**6):
run_some_function()
stop_time = time.time()
secs = (stop_time - start_time) / 10**6 print "run_some_function() took
on average, %f seconds." % secs


In the above, clearly we should all use xrange()... oops, I meant, if
you want to, you could use xrange() instead of allocating a list of a
million integers and then doing nothing with the list itself and then
deallocating the list of a million integers.

Yes, the above example is a good use case for xrange. Did you think that
anyone denied that there were good cases for it?

Here is a bad usage case for xrange:

L = []
for i in xrange(10**6):
L.append(2*i+1)

This should likely be written as:

L = range(1, 2*10**6+1, 2)


I'll tell you what: I'll agree that the existence of bad usages for xrange
does NOT mean xrange should never be used, if you agree that the existence
of bad usages for range does NOT mean range should never be used.

In any case, if you look at the source code for the timeit module, you
will see that for cases where you genuinely don't need the values of i,
the optimal strategy is to not bother creating one million integers at all:

L = [None]*10**6 # don't time the creation of the list
start_time = time.time()
for _ in L:
run_some_function()
stop_time = time.time()

This strategy (I think) trades off a little extra memory for a little
extra speed; in other applications, speed may be less important and memory
more so, so xrange or some other iterator would be preferred.

Perhaps I'm mistaken here, but I don't see how this optimization could
possibly break anything. range() makes a list, and for consumes it, and
the list isn't seen anywhere else.

But of course you are forgetting that range is just a function and can be
used anywhere, not just in for loops. Here is a toy example:

L = range(1000)
random.shuffle(L)
while L:
L.pop()

If range was optimized to return a xrange or iterator object, this code
would stop working.
 
A

Alan Morgan

Alan said:
Hmmm, this worked fine when I used xrange as well. Am I missing something?
Obviously there *are* differences, viz:

Just _look_ at the objects:
range(5) [0, 1, 2, 3, 4]
xrange(5) xrange(5)

Yes, I was well aware of this. I noted a difference in my original
post. I was just pointing out that the example given didn't, in
fact, behave differently for range() and xrange() even though range()
and xrange() *are* different.
range is giving you a real list, while xrange is giving you an xrange object.
Have you tried to slice an xrange object? Or using .append on it?

No, I hadn't. I presume these could all be defined.

Alan
 
S

Steve R. Hastings

Yes, the above example is a good use case for xrange. Did you think that
anyone denied that there were good cases for it?

I am now officially sorry for starting this thread.



Please let me just summarize what I wanted to say:

* Question: Does Python do anything special with "for i in range"?
(And I got an answer: the answer is "no".)

* If Python does not do anything special with "for i in range", then you
are building a list and tearing it down again without using the list, and
for memory efficiency you might want to use xrange. (I used a more pithy
phrase, which I now regret.)

* It would be nice if the Python compiler checked for the special case of
"for i in range" and compiled it to "for i in itr" where itr is a
lightweight iterator that generates the same values as the range statement.
Potentially, I speculated, this special iterator might be more lightweight
than an xrange() object.


That is all. I did not mean to tell anyone how to write a for loop. I
did not mean to suggest that range() should be changed to always return an
iterator. I did not mean to insult anyone. I don't even know what I ever
wrote that sounded like "range should never be used".

And I hope everyone recognized that the bit about "those responsible...
have been sacked" was a reference to the opening credits of the movie
_Monty Python and the Holy Grail_. I was trying to be funny, not
sarcastic, bitter, etc.

Thank you for your patience and I am done with this thread, unless I have
written something unclear in *this* post and I have to post another post
to clarify it as well. :-(
 
A

Alan Morgan

Not in your use case. Tuple unpacking will iterate, and so it doesn't
matter whether it's an actual list or an iterator:

2

There are certainly contexts where a sequence and its iterator are not
interchangeable. You missed an obvious one:

False

I thought that one was sufficiently obvious as not to need mentioning.
There was never any argument that range() and xrange() returned different
things; the question (as I understood it) was if you could generally
use the things they return in the same way and not *care* about the
difference (the answer being "Yes, except when you can't").

Alan
 
S

Steve R. Hastings

Don't. It's quite funny, thanks.

I guess I should laugh. :-/


When you read my original articles, did *you* think I was proposing that
range() be changed to always return an iterator? I thought what I wrote
was pretty clear...
 
F

Felipe Almeida Lessa

Em Qua, 2006-04-05 às 22:24 +0200, Fredrik Lundh escreveu:
the difference isn't very
large, xrange is actually slower in some python versions, and you'll
need the integer objects sooner or later anyway...

Some code is worth a thousand words:


$ python2.3
Python 2.3.5 (#2, Mar 6 2006, 10:12:24)
[GCC 4.0.3 20060304 (prerelease) (Debian 4.0.2-10)] on linux2
Type "help", "copyright", "credits" or "license" for more information.70.822579860687256


$ python2.4
Python 2.4.3 (#2, Mar 30 2006, 21:52:26)
[GCC 4.0.3 (Debian 4.0.3-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.68.15216064453125


In sum, difference is as big as the function argument, but xrange was
*never* slower on Python 2.3 or Python 2.4 on x86 with Linux. I'd say
three things:

1) Probably xrange is faster than range on other architectures and
operational systems as well.
2) I can't say anything for Python < 2.3.
3) Run your own tests. If you see the same I saw, use xrange everywhere
you can.

The 3rd point is specially true if your code can be break in the middle,
for example (a really dumb example, I know):

for i in xrange(0, 10000000, 42):
if i**2 >= x:
break

This way you won't create all numbers, in this case just those less than
sqrt(x) plus one.

Cheers,
 
A

Alex Martelli

Steve R. Hastings said:
Actually, for many uses of "for i in (range|xrange)", you only need the
value of i, and you aren't doing anything with the integer object. You

Then, the speediest approach may be completely different:

import itertools

for i in itertools.repeat(None, N):
...


Remember, when you're thinking "blazing speed", think itertools.


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top