Rationale behind the deprecation of __getslice__?

Fernando Perez

Hi all,

I was wondering if someone can help me understand why __getslice__ has been
deprecated, yet it remains necessary to implement it for simple slices (i:j),
while __getitem__ gets called for extended slices (i:j:k).

The problem with this approach, besides a bit of code duplication, is that
classes which implement slicing must now do runtime type-checking inside
__getitem__. Here's a trivial example:


################################################################
import types

class Vector(list):
    def __init__(self,arg,wrap=0):
        '''
        If arg is an integer, make a new zero vector of length n.
        Otherwise, arg must be a python sequence or another vector,
        and a new (deep) copy is generated.
        '''
        if isinstance(arg,int):
            list.__init__(self,[0.0]*arg)
        else:
            list.__init__(self,arg)

    def __getitem__(self,key):
        """called for single-element OR slice access"""
        if type(key) is types.SliceType:
            return Vector(list.__getitem__(self,key))
        else:
            return list.__getitem__(self,key)

    def __getslice__(self,i,j):
        """Deprecated since 2.0, but still called for non-extended slices.

        Since extended slices are handled by __getitem__, I'm just deferring
        to that one so all actual implementation is done there. Why this is
        not the default (since they deprecated __getslice__) is beyond me."""

        return self.__getitem__(slice(i,j))

print 'a is a vector'
a = Vector(5)
print a
print type(a)
print

print 'b is a slice of a'
b = a[1:3]
print b
print type(b)
print

print 'c is an element of a'
c = a[1]
print c
print type(c)

################################################################

What bugs me is that typecheck for slicing which seems to be necessary inside
of the __getitem__ method. I have the feeling that the code would be much
cleaner if I could simply use __getslice__ for slices and __getitem__ for
items, and that bundling the two in this hybrid mode is really ugly and
unpythonic. Am I just missing something?

Thanks for any help,

f
 
Steven Bethard

Fernando said:
I was wondering if someone can help me understand why __getslice__ has been
deprecated, yet it remains necessary to implement it for simple slices (i:j),
while __getitem__ gets called for extended slices (i:j:k).

I don't think this is true -- everything goes to __getitem__:
>>> class C(object):
...     def __getitem__(self, *args, **kwds):
...         print args, kwds
...
>>> c = C()
>>> c[1]
(1,) {}
>>> c[1:2]
(slice(1, 2, None),) {}
>>> c[1:2:-1]
(slice(1, 2, -1),) {}

All you have to do is check the type of the single argument to __getitem__:
>>> class C(object):
...     def __getitem__(self, x):
...         if isinstance(x, slice):
...             return x.indices(10)
...         else:
...             return x
...
>>> c = C()
>>> c[1]
1
>>> c[1:2]
(1, 2, 1)
>>> c[1:2:-1]
(1, 2, -1)
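
For anyone reading this on Python 3, where __getslice__ no longer exists, the same isinstance check is all that is needed even for list subclasses. A minimal sketch, assuming a stripped-down Vector class modelled on Fernando's example (an illustration only, not his actual code):

```python
# Python 3 sketch: __getslice__ is gone, so every subscript,
# including a plain i:j slice, reaches __getitem__.
class Vector(list):
    def __getitem__(self, key):
        result = super().__getitem__(key)
        # A slice of a Vector stays a Vector; a single index
        # returns the stored element unchanged.
        if isinstance(key, slice):
            return Vector(result)
        return result


v = Vector([0.0, 1.0, 2.0, 3.0, 4.0])
print(type(v[1:3]).__name__, v[1:3])   # simple slice
print(type(v[::2]).__name__, v[::2])   # extended slice
print(type(v[1]).__name__, v[1])       # single element
```

Both slice forms take the same branch here, so no second method is needed.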

Steve
 
Steven Bethard

Fernando said:
classes which implement slicing must now do runtime type-checking inside
__getitem__.

Just in case you thought that they wouldn't have to do runtime
type-checking otherwise:
>>> class C(object):
...     def __getitem__(self, x):
...         print type(x), x
...
>>> c = C()
>>> c[[]]
<type 'list'> []

You can put just about anything in a __getitem__ call. Do you really
want a method for each of the variants above?

Steve
 
Fernando Perez

Steven said:
Fernando said:
I was wondering if someone can help me understand why __getslice__ has been
deprecated, yet it remains necessary to implement it for simple slices
(i:j), while __getitem__ gets called for extended slices (i:j:k).

I don't think this is true -- everything goes to __getitem__:
>>> class C(object):
...     def __getitem__(self, *args, **kwds):
...         print args, kwds
...
>>> c = C()
>>> c[1]
(1,) {}
>>> c[1:2]
(slice(1, 2, None),) {}
>>> c[1:2:-1]
(slice(1, 2, -1),) {}

Not if you subclass builtin types like list:

In [6]: class mylist(list):
   ...:     def __getitem__(self,*args,**kwds):
   ...:         print 'mylist getitem'
   ...:         print args,kwds
   ...:

In [7]: a=mylist()

In [8]: a[1]
mylist getitem
(1,) {}

In [9]: a[1:2]
Out[9]: []

In [10]: a[1:2:3]
mylist getitem
(slice(1, 2, 3),) {}

I did this testing, which is what forced me to implement __getslice__
separately in my little example, to satisfy calls with simple i:j slices.

Best,

f
 
Fernando Perez

Steven said:
Fernando said:
classes which implement slicing must now do runtime type-checking inside
__getitem__.

Just in case you thought that they wouldn't have to do runtime
type-checking otherwise:
>>> class C(object):
...     def __getitem__(self, x):
...         print type(x), x
...
[...]

You can put just about anything in a __getitem__ call. Do you really
want a method for each of the variants above?

I guess that conceptually it just felt natural to me to keep separate methods
for dealing with a slice (get many elements out) and other types of indexing,
which I tend to think of as 'scalar' indexing.

Regards,

f
 
Carl Banks

Fernando said:
I was wondering if someone can help me understand why __getslice__ has been
deprecated, yet it remains necessary to implement it for simple slices (i:j),
while __getitem__ gets called for extended slices (i:j:k).

The problem with this approach, besides a bit of code duplication, is that
classes which implement slicing must now do runtime type-checking inside
__getitem__.

I'm pretty sure it's to support multidimensional array slicing.
Consider an array reference such as a[1,2:5,4,6:9,10:]. Now what do
you do? You have mixed slices and indices. The
__getslice__/__getitem__ paradigm isn't versatile enough to handle this
situation. In that light, I'd say checking for slices is the lesser
evil.
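
To make that concrete, here is a small sketch of what __getitem__ actually receives for such a reference. The Probe class is a hypothetical helper invented for illustration; it just hands back whatever key it was indexed with (Python 3 syntax):

```python
class Probe:
    """Hypothetical helper: returns the key it was indexed with."""
    def __getitem__(self, key):
        return key


key = Probe()[1, 2:5, 4, 6:9, 10:]

# The whole subscript arrives as ONE tuple mixing ints and slice objects,
# so a (start, stop) pair of arguments could never describe it.
for part in key:
    kind = "slice" if isinstance(part, slice) else "index"
    print(kind, part)
```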

As for why list objects still use getslice--they probably shouldn't.
I'd file a bug report.
 
Steven Bethard

Fernando said:
Steven said:
I don't think this is true -- everything goes to __getitem__:
[snip]

Not if you subclass builtin types like list:

Ahh, I didn't catch that your problem was with list. Yeah, so if a
__getslice__ exists, this is used first. Unfortunately, by inheriting
from list, you inherit __getslice__ from list. Another example without
builtin types:
>>> class C(object):
...     def __getitem__(self, x):
...         print "C:getitem"
...     def __getslice__(self, *args):
...         print "C:getslice"
...
>>> class D(C):
...     def __getitem__(self, x):
...         print "D:getitem"
...
>>> d = D()
>>> d[1]
D:getitem
>>> d[1:2]
C:getslice
>>> d[1:2:-1]
D:getitem

While the D class doesn't define __getslice__, its parent class does,
so it has the same behavior you're running into. I don't see how to fix
this other than overriding __getslice__ to call __getitem__ like you have.

Unfortunately, I don't think __getslice__ can be removed from list (and
str and tuple) because of backwards compatibility constraints...

Steve
 
Carl Banks

Steven said:
Unfortunately, I don't think __getslice__ can be removed from list (and
str and tuple) because of backwards compatibility constraints...

Wouldn't it work to have __getslice__ call __getitem__? And, since
that would be too much of a performance hit, have it check whether its
type is list (or str or tuple), and only call __getitem__ if it is not
(i.e., only for subclasses). I don't think that would be too bad.

Subclasses would still be free to override __getslice__, but wouldn't
have to.
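
A rough pure-Python model of the idea, just to show the control flow. Everything here is hypothetical: Base stands in for a builtin like list, and the ordinary method getslice stands in for __getslice__ (the real change would live in the C implementation, and this sketch is called explicitly rather than through the slicing syntax):

```python
class Base:
    """Hypothetical stand-in for a builtin sequence such as list."""
    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, key):
        return self.items[key]

    def getslice(self, i, j):
        # Stand-in for __getslice__: fast path for the exact base type,
        # delegation to __getitem__ for any subclass.
        if type(self) is Base:
            return self.items[i:j]
        return self[slice(i, j)]


class Loud(Base):
    """Subclass that overrides only __getitem__."""
    def __getitem__(self, key):
        return ("Loud.__getitem__", key)


plain = Base(range(5)).getslice(1, 3)
routed = Loud(range(5)).getslice(1, 3)
print(plain)
print(routed)
```

The exact-type check keeps the base type on its fast path, while the subclass sees a slice object in its own __getitem__ without overriding anything else.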
 
Steven Bethard

Fernando said:
I guess that conceptually it just felt natural to me to keep separate methods
for dealing with a slice (get many elements out) and other types of indexing,
which I tend to think of as 'scalar' indexing.

Yeah, I can see that a bit.

Ignoring dicts for the moment (and concerning ourselves only with
"sequences"), you're probably right in thinking that slice objects
are the second most common thing to get in __getitem__ (second to ints
of course). But there is heavy use of other objects in various other
modules, most notably tuples in numarray:
>>> a
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> a[0,1,2]
6
>>> a[0,1]
array([4, 5, 6, 7])
>>> a[...,3]
array([[ 3,  7, 11],
       [15, 19, 23]])
>>> a[1,:,0]
array([12, 16, 20])

Presumably the numarray code has to do quite a bit of type checking to
perform all these slicings right (and I didn't even show you what
happens when you use another array as an "index"). I'm not necessarily
saying that all this type checking is a good thing, but because people
will always find new things that they want to index by, adding
__getxxx__ methods for each of the index types is probably not the right
road to go down...
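
As a toy illustration of that kind of checking, here is a sketch of a single __getitem__ that sorts out several key forms. Grid is a hypothetical two-dimensional container made up for this example and has nothing to do with how numarray is actually implemented (Python 3 syntax):

```python
class Grid:
    """Hypothetical 2-D container; not numarray's implementation."""
    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __getitem__(self, key):
        # Normalize every accepted key form to a (row_key, col_key) pair.
        if not isinstance(key, tuple):
            key = (key, slice(None))          # g[i] means a whole row
        elif len(key) == 2 and key[0] is Ellipsis:
            key = (slice(None), key[1])       # g[..., j] means all rows
        if len(key) != 2:
            raise IndexError("expected a single key or a pair of keys")
        row_key, col_key = key
        if isinstance(row_key, slice):
            return [row[col_key] for row in self.rows[row_key]]
        return self.rows[row_key][col_key]


g = Grid([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
print(g[1, 2])       # int, int
print(g[1])          # single int: a whole row
print(g[..., 3])     # Ellipsis, int: one column
print(g[0:2, 1:3])   # slice, slice
```

Even this tiny version needs three type tests, and each new key form adds a branch to the one method instead of a new __getxxx__ hook.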

Steve
 
Steven Bethard

Carl said:
As for why list objects still use getslice--they probably shouldn't.
I'd file a bug report.

I'm not convinced this is actually a bug; it works just like the docs
promise:

------------------------------------------------------------
http://docs.python.org/ref/sequence-methods.html
__getslice__( self, i, j)
....
Called to implement evaluation of self[i:j].
....
If no __getslice__() is found, a slice object is created instead, and
passed to __getitem__() instead.
....
For slice operations involving extended slice notation, or in absence of
the slice methods, __getitem__(), __setitem__() or __delitem__() is
called with a slice object as argument.
------------------------------------------------------------

So the docs imply that if __getslice__ exists, it will be used before
trying __getitem__. Since list defines __getslice__, list.__getslice__
will be used before __getitem__ in any class that inherits from list.

This is certainly a wart though. I'd love to see list.__getslice__
removed, leaving only list.__getitem__, but I suspect that this kind of
backwards-incompatible change would be frowned on...

Guess we could file a feature request?

Steve
 
Steven Bethard

Carl said:
Wouldn't it work to have __getslice__ call __getitem__? And, since
that would be too much of a performance hit, have it check whether its
type is list (or str or tuple), and only call __getitem__ if it is not
(i.e., only for subclasses). I don't think that would be too bad.

Subclasses would still be free to override __getslice__, but wouldn't
have to.

Yeah, that doesn't seem like it would be too bad. Probably someone
would have to actually run some benchmarks to see what kind of
performance hit you get... But it would definitely solve the OP's
problem...

Steve
 
Terry Reedy

http://docs.python.org/ref/sequence-methods.html
__getslice__( self, i, j)
...
Called to implement evaluation of self[i:j].
...
If no __getslice__() is found, a slice object is created instead, and
passed to __getitem__() instead.

The overwhelmingly most common case of a simple slice is more efficiently
done by having a separate function since no slice object is created.
>>> a=[1,2,3]
>>> def f(): return a[0:1]
...
>>> import dis
>>> dis.dis(f)
          0 SET_LINENO               1

          3 SET_LINENO               1
          6 LOAD_GLOBAL              0 (a)
          9 LOAD_CONST               1 (0)
         12 LOAD_CONST               2 (1)
         15 SLICE+3
         16 RETURN_VALUE

Terry J. Reedy
 
Carl Banks

Terry said:
The overwhelmingly most common case of a simple slice is more efficiently
done by having a separate function since no slice object is created.

Why is __getslice__ deprecated, then? I'm ok with keeping __getslice__
around as an optimization, but if you do that, I don't think it's a
good idea to deprecate it.
 
Fernando Perez

Terry said:
If no __getslice__() is found, a slice object is created instead, and
passed to __getitem__() instead.

The overwhelmingly most common case of a simple slice is more efficiently
done by having a separate function since no slice object is created.
>>> a=[1,2,3]
>>> def f(): return a[0:1]
...
>>> import dis
>>> dis.dis(f)

[...]

Very good point. I always forget how useful dis is, thanks.

f
 
Fernando Perez

Steven said:
Presumably the numarray code has to do quite a bit of type checking to
perform all these slicings right (and I didn't even show you what
happens when you use another array as an "index"). I'm not necessarily

Yes, I know. I haven't switched to numarray because of the small-array penalty
(which I can't pay in my code), but recently T. Oliphant added array-indexing
to Numeric in Scipy's port. Very nifty.

Steven said:
saying that all this type checking is a good thing, but because people
will always find new things that they want to index by, adding
__getxxx__ methods for each of the index types is probably not the right
road to go down...

Ultimately it's true that since indexing objects for multidimensional arrays
will always come in packaged as a tuple, the __getitem__ method will simply
have to deal with a fair bit of analysis before getting to the meat of
actually returning a result.

Anyway, thanks for the discussion, it clarified a few points.

Regards,

f
 
Steven Bethard

Fernando said:
Anyway, thanks for the discussion, it clarified a few points.

Likewise. I hadn't really delved much into the __getslice__ details
until you found that quirk. Always good to have a motivator!

Steve
 
Nick Coghlan

Steven said:
Yeah, that doesn't seem like it would be too bad. Probably someone
would have to actually run some benchmarks to see what kind of
performance hit you get... But it would definitely solve the OP's
problem...

It might be better handled at construction time - if the class supplied to
__new__ is a subclass of the builtin type, swap the __getslice__ implementation
for one which delegates to __getitem__.

Cheers,
Nick.
 
Nick Coghlan

Steven said:
Presumably the numarray code has to do quite a bit of type checking to
perform all these slicings right (and I didn't even show you what
happens when you use another array as an "index"). I'm not necessarily
saying that all this type checking is a good thing, but because people
will always find new things that they want to index by, adding
__getxxx__ methods for each of the index types is probably not the right
road to go down...

I suspect Steve's hit the nail on the head here - the requirements of the numpy
folks certainly drove the development of extended slicing in the first place (it
wasn't until 2.3 that Python's builtin sequences supported extended slicing at all).

Cheers,
Nick.
 
Steven Bethard

Nick said:
It might be better handled at construction time - if the class supplied
to __new__ is a subclass of the builtin type, swap the __getslice__
implementation for one which delegates to __getitem__.

Yeah, that seems like the minimally invasive solution... I looked a bit
at the listobject.c code, but I think the patch for this one is a bit
over my head...

Steve
 
