question about xrange performance

_wolf

lately i realized a slow running portion of my application, and a
quick profiling nourished the suspicion that, of all things, calls to
`xrange().__contains__` (`x in b` where `b = xrange(L,H)`) are the
culprit. to avoid any other influences, i wrote this test script with
class `xxrange` being a poor man’s `xrange` replacement:


########################################################
class xxrange( object ):
    def __init__( self, start, stop ):
        self.start = start
        self.stop = stop
    def __contains__( self, x ):
        return ( x == int( x ) ) and self.start <= x < self.stop

import cProfile
from random import randint
test_integers = [ randint( 0, 5000 ) for i in xrange( 8000 ) ]
test_range_a = xxrange( 10000, 20000 )
test_range_b = xrange( 10000, 20000 )

def a():
    print test_range_a.__class__.__name__
    for x in test_integers:
        x in test_range_a

def b():
    print test_range_b.__class__.__name__
    for x in test_integers:
        x in test_range_b

cProfile.run('a()')
cProfile.run('b()')
########################################################

now this is the output, surprise:

########################################################
xxrange
         8003 function calls in 0.026 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.026    0.026 <string>:1(<module>)
        1    0.012    0.012    0.026    0.026 xrange-profiler.py:18(a)
     8000    0.014    0.000    0.014    0.000 xrange-profiler.py:9(__contains__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


xrange
         3 function calls in 4.675 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.675    4.675 <string>:1(<module>)
        1    4.675    4.675    4.675    4.675 xrange-profiler.py:23(b)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
########################################################

can it be that a simple diy-class outperforms a python built-in by a
factor of 180? is there something i have done the wrong way?
omissions, oversights? do other people get similar figures?

cheers
 
MRAB

_wolf said:
lately i realized a slow running portion of my application, and a
quick profiling nourished the suspicion that, of all things, calls to
`xrange().__contains__` (`x in b` where `b = xrange(L,H)`) is the
culprit. [...]

[test script and profiler output snipped]

can it be that a simple diy-class outperforms a python built-in by a
factor of 180? is there something i have done the wrong way?
omissions, oversights? do other people get similar figures?
xrange() returns an xrange object, which generates its values on demand.
It doesn't have a __contains__ method, so 'in' uses its iterator, making
the xrange object yield each value until either the desired value is
produced or there are no more values.
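The fallback is easy to see with a minimal sketch (the `IterOnly` class below is made up for illustration): a class that defines only `__iter__` still supports `in`, but every test scans the values one by one.

```python
class IterOnly(object):
    """Defines __iter__ but no __contains__, so 'in' falls back to iteration."""
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
    def __iter__(self):
        # each call creates a fresh generator, so the object is reusable
        i = self.start
        while i < self.stop:
            yield i
            i += 1

r = IterOnly(0, 1000)
print(500 in r)    # True  -- but only after yielding 0..500
print(5000 in r)   # False -- after exhausting all 1000 values
```

That linear scan is exactly what makes `x in xrange(...)` so slow for large ranges.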
 
Peter Otten

_wolf said:
lately i realized a slow running portion of my application, and a
quick profiling nourished the suspicion that, of all things, calls to
`xrange().__contains__` (`x in b` where `b = xrange(L,H)`) is the
culprit. to avoid any other influences, i wrote this test script with
class `xxrange` being a poor man’s `xrange` replacement:


########################################################
class xxrange( object ):
    def __init__( self, start, stop ):
        self.start = start
        self.stop = stop
    def __contains__( self, x ):
        return ( x == int( x ) ) and self.start <= x < self.stop

I haven't read

http://mail.python.org/pipermail/python-3000/2007-July/008971.html

completely, but from what I've read a patch might be accepted. Personally I
will continue to write

start <= x < stop

I don't remember an occasion where I needed the int(x) == x incantation or a
step (that you didn't implement).

Peter
 
Paul McGuire

can it be that a simple diy-class outperforms a python built-in by a
factor of 180? is there something i have done the wrong way?
omissions, oversights? do other people get similar figures?

cheers

I wouldn't say you are outperforming xrange until your class also
supports:

for i in xxrange( 10000, 20000 ):
    # do something with i

Wouldn't be difficult, but you're not there yet.

And along the lines of MRAB's comments, xrange is not really
intended for "in" testing; it is there for iteration over a range
without constructing the list of range elements first, which one
notices right away when looping over xrange(1e8) vs. range(1e8).

Your observation is especially useful to keep in mind as Python 3 now
imbues "range" with "xrange" behavior, so if you have code that tests
"blah in range(blee,bloo):", you will get similarly poor results.

And of course, you are cheating a bit with your xxrange "in" test,
since you aren't really verifying that the number is actually in the
given list; you are just testing against the extrema, and relying on
your in-built knowledge that xrange (as you are using it) contains all
the intermediate values. Compare to testing with xrange(1,100,2) and
you'll find that 10 is *not* in this range, even though 1 <= 10 <
100. (Extending xxrange to handle this as well wouldn't be difficult.)
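A step-aware containment test along those lines could be sketched as follows (the `in_range` helper is hypothetical, and a positive step is assumed):

```python
def in_range(x, start, stop, step=1):
    """Step-aware membership test (hypothetical helper; assumes step > 0)."""
    if x != int(x):               # reject non-integral floats, as iteration would
        return False
    if not (start <= x < stop):   # quick bounds check
        return False
    return (int(x) - start) % step == 0   # must land exactly on a step

print(in_range(10, 1, 100, 2))   # False: 1 <= 10 < 100, but 10 isn't odd
print(in_range(11, 1, 100, 2))   # True
```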

One might wonder why you are even writing code to test for existence
"in" a range list, when "blee <= blah < bloo" is obviously going to
outperform this kind of code.

-- Paul
 
~flow

One might wonder why you are even writing code to test for existence
"in" a range list, when "blee <= blah < bloo" is obviously going to
outperform this kind of code.
-- Paul

the reason is simply the idiomacy and convenience. i use (x)xranges to
implement unicode blocks and similar things. it is natural to write
`if cid in latin_1` and so on. i always assumed it would be about the
fastest and memory-efficient to use xrange for this. i stumbled
against a first wall, tho, when i realized that `xrange` does not
accept long integers. so i wrote the class above to get an xrange-like
object that would accept them. stepping was not of concern---i recall
few times that i was in need of a third argument to xrange at all. i
then added iteration when i needed it; profiling shows that iteration
over xxranges is outperformed by xrange by a factor of over twenty:

class xxrange( object ):
    def __iter__( self ):
        i = self.start
        while i < self.stop:
            yield i
            i += 1

containment
    xxrange     0.027 CPU seconds
    xrange     90.365 CPU seconds
    set         0.002 CPU seconds

iteration
    xxrange     0.262 CPU seconds
    xrange      0.019 CPU seconds
    set         0.031 CPU seconds  # iterated sorted list; tested as `for x in sorted(my_set)`

however, these numbers still show an incredible and unexpected ratio
in favor of xxrange for containment; they also show that a set (with
200'000 integer numbers) is 10 times as fast as xxrange for
containment and only 1.6 times slower for iteration.

this means that an optimized implementation of x/range would have to
choose between different strategies depending on the size of the
collection, the memory available, and the purpose, if it wants to
achieve better efficiency. the least of changes could be

class xrange( object ):
    def __old_contains__( self, x ):
        ...
    def __contains__( self, x ):
        if self.step != 1:
            return self.__old_contains__( x )
        return ( x == int( x ) ) and self.start <= x < self.stop

the `( x == int( x ) )` is not easily being done away with if
emulation of present x/range behavior is desired (x.0 floats are in,
all others are out).

frankly speaking, finding out whether xrange is a good solution for
containment tests takes a fair amount of reading or, of course,
dedicated profiling; having Python make that choice itself might be a
better idea.

my guess would be that most xranges are in fact used with step 1, so
even sorting out that single bottleneck would have a noticeable
effect. now that xrange has taken over range in py3k, people will hit
some bumps in the transition anyhow. i for one will stick with my
fancy xxranges for the time being.

cheers and thanks for the explanations!
 
MRAB

~flow said:
the reason is simply the idiomacy and convenience. i use (x)xranges to
implement unicode blocks and similar things. it is natural to write
`if cid in latin_1` and so on. [...]

[xxrange iteration code and timings snipped]

cheers and thanks for the explanations!
In that case, I suppose that you need some sort of 'rangeset', a
class which is optimised for sets which include ranges of values.
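A minimal sketch of such a class (the `RangeSet` name and interface are made up here) might be:

```python
class RangeSet(object):
    """Membership across several half-open [start, stop) ranges."""
    def __init__(self, *ranges):
        self.ranges = ranges          # sequence of (start, stop) pairs
    def __contains__(self, x):
        return any(start <= x < stop
                   for start, stop in self.ranges)

latin = RangeSet((0x0000, 0x0080), (0x00A0, 0x0100))
print(0x00E9 in latin)   # True:  e-acute lies in the second range
print(0x0085 in latin)   # False: falls in the gap between the two ranges
```

Membership stays proportional to the number of ranges rather than the number of values, which suits things like Unicode blocks.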
 
Steven D'Aprano

the reason is simply the idiomacy and convenience. i use (x)xranges to
implement unicode blocks and similar things. it is natural to write `if
cid in latin_1` and so on.

[soapbox]
Speaking about idiomacy, it is grammatically incorrect to start sentences
in English with lower-case letters, and it is rude because it harms the
reading ability of people reading your posts. If it saves you 0.01ms of
typing time to avoid pressing the shift key, and 100 people reading your
post take 0.01ms more mental processing time to comprehend your writing
because of the lack of a clear sentence break, then the harm you do to
others is 100 times greater than the saving you make for yourself. You're
not e.e. cummings, who was a dick anyway, and as a programmer you're
supposed to understand about precision in language, syntax and grammar.
[end soapbox]

I think testing y in xrange() is a natural thing to do, but as I recall,
it was actually removed from xrange a few years ago to simplify the code.
I thought that was a silly decision, because the code was written and
working and it's not like the xrange object was likely to change, but
what do I know?

i always assumed it would be about the
fastest and memory-efficient to use xrange for this.

If you don't need to iterate over the actual codepoints, the most memory-
efficient would be to just store the start and end positions, as a tuple
or possibly even a slice object, and then test t[0] <= codepoint < t[1].

If you do need to iterate over them, perhaps some variant of this would
suit your needs:

# Untested
class UnicodeBlock(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self._current = start
    def __contains__(self, value):
        if isinstance(value, (int, long)):
            return self.start <= value < self.end
        return False
    def __iter__(self):
        return self
    def next(self):
        if self._current < self.end:
            value = self._current
            self._current += 1
            return value
        raise StopIteration
    def reset(self):
        self._current = self.start


[...]
the `( x == int( x ) )` is not easily being done away with if emulation
of present x/range behavior is desired (x.0 floats are in, all others
are out).

x.0 floats working with xrange is an accident, not a deliberate design
decision, and has been deprecated in Python 2.6, which means it will
probably be gone in a few years:
__main__:1: DeprecationWarning: integer argument expected, got float
 
bearophileHUGS

Paul McGuire:
xrange is not really intended for "in" testing,<

Let's add the semantic of a good and fast "in" to xrange (and to the
range of Python3). It hurts no one, allows for a natural idiom
(especially when you have a stride you don't want to re-invent the
logic of skipping absent numbers), and it's readable.

My xrange() I have written in D language has a fast OpIn_r() method. I
haven't found any bad side effect to such capability yet.

Bye,
bearophile
 
Guest

xrange is not really intended for "in" testing,<

Let's add the semantic of a good and fast "in" to xrange (and to the
range of Python3). It hurts no one, allows for a natural idiom
(especially when you have a stride you don't want to re-invent the
logic of skipping absent numbers), and it's readable.

A fast "in" to xrange would be great.
Why was it taken away?
 
~flow

[soapbox]
Speaking about idiomacy, ...
[end soapbox]

[soapbox]
I ALREADY STEPPED DOWN FROM SOAPBOX (on this topic)
[end soapbox]

thanks for the comment anyhow.

that an efficient `x in y` implementation used to be there and is gone
now is gross. guess i'll just have to live with my own makeshift
implementation. gross.

cheers
 
Steven D'Aprano

A fast "in" to xrange would be great. Why was it taken away?


http://docs.python.org/whatsnew/2.2.html#other-changes-and-fixes

"Some features of the object returned by the xrange() function are now
deprecated, and trigger warnings when they’re accessed; they’ll disappear
in Python 2.3. xrange objects tried to pretend they were full sequence
types by supporting slicing, sequence multiplication, and the in
operator, but these features were rarely used and therefore buggy. The
tolist() method and the start, stop, and step attributes are also being
deprecated. At the C level, the fourth argument to the PyRange_New
function, repeat, has also been deprecated."



You may also want to read these:

http://mail.python.org/pipermail/python-3000/2007-August/009061.html
http://mail.python.org/pipermail/patches/2004-August/015501.html
 
Arnaud Delobelle

Steven D'Aprano said:
Speaking about idiomacy, it is grammatically incorrect to start sentences
in English with lower-case letters [...]

x.0 floats working with xrange is an accident, not a deliberate design
decision, and has been deprecated in Python 2.6, which means it will
probably be gone in a few years:
 
