Slice inconsistency?

  • Thread starter Roberto A. F. De Almeida

Roberto A. F. De Almeida

I found that when using negative indices, the slice object passed to
__getitem__ depends on the number of slices. An example to clarify:

class a:
    def __getitem__(self, index):
        return index

>>> b = a()
>>> print b[:-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: a instance has no attribute '__len__'

But if I pass a "multiple" slice:

>>> print b[:-1,:-1]
(slice(None, -1, None), slice(None, -1, None))

If we add the following __len__ to class a:

    def __len__(self):
        return 42

Then the "single" slice works based on __len__:

>>> print b[:-1]
slice(0, 41, None)

But not for the "multiple" slice:

>>> print b[:-1,:-1]
(slice(None, -1, None), slice(None, -1, None))

I am having these problems because I want to slice a multi-dimensional
object. When slicing the object, __getitem__ returns a Numeric array
of the desired size:
>>> print data
>>> print data.variables['u'].shape
(16, 17, 21)
>>> u = data.variables['u'][:,:,:]
>>> print type(u)
>>> print u.shape
(16, 17, 21)
>>> u[0,0,0]
-1728

Is there something wrong in using slices like this in objects? A
better way?
 

Terry Reedy

Roberto A. F. De Almeida said:
I found that when using negative indices, the slice object passed to
__getitem__ depends on the number of slices. An example to clarify:

class a:
    def __getitem__(self, index):
        return index

>>> b = a()
>>> print b[:-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: a instance has no attribute '__len__'

But if I pass a "multiple" slice:

>>> print b[:-1,:-1]
(slice(None, -1, None), slice(None, -1, None))

A square-bracketed subscript (index/key) is a *single* object, in this
case a tuple. The contents of the tuple are irrelevant (for this
code). Any tuple will be echoed:

>>> print b[1,2,3]
(1, 2, 3)

If we add the following __len__ to class a:

    def __len__(self):
        return 42

Completely irrelevant for the processing of tuple keys.

Is there something wrong in using slices like this in objects? A
better way?

What you probably want is b[:-1][:-1], etc. Each index must be
separately bracketed to access items in list of lists (etc).
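
A minimal sketch of the "single object" point, written for Python 3 (where print is a function and every class is new-style). The class name Echo is made up for illustration:

```python
class Echo:
    """Return whatever object the interpreter passes as the subscript."""

    def __getitem__(self, index):
        return index


b = Echo()

# Commas inside the brackets build one tuple; an explicit tuple is identical.
assert b[1, 2, 3] == b[(1, 2, 3)] == (1, 2, 3)

# Slices separated by commas arrive as one tuple of slice objects.
assert b[:-1, :-1] == (slice(None, -1, None), slice(None, -1, None))

# Chained brackets are separate calls: the second pair of brackets
# subscripts the *result* of the first, so Echo only sees the first one.
first = b[0:2]
assert first == slice(0, 2, None)
```

Whether chained brackets or a single tuple subscript is the right choice depends on whether the container wants to see all dimensions at once.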

Terry J. Reedy
 

Michael Hudson

I found that when using negative indices, the slice object passed to
__getitem__ depends on the number of slices.

It's a bit of a pest for me to check things right now, but: are you
using Python 2.3? Things *may* have changed here.

Cheers,
mwh
 

Stephen Horne

Roberto A. F. De Almeida said:
I found that when using negative indices, the slice object passed to
__getitem__ depends on the number of slices. An example to clarify:

class a:
    def __getitem__(self, index):
        return index

>>> b = a()
>>> print b[:-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: a instance has no attribute '__len__'

But if I pass a "multiple" slice:

>>> print b[:-1,:-1]
(slice(None, -1, None), slice(None, -1, None))

A square-bracketed subscript (index/key) is a *single* object, in this
case a tuple. The contents of the tuple are irrelevant (for this
code). Any tuple will be echoed:

I didn't see any sign that he wasn't aware of that.

I would have expected, given that the tuple contains slice objects
constructed from the multiple-slice notation, that the same
translations would be performed on the slices that are inserted into
the tuple that are applied when the single slice is created.

That is, whether the single or multiple notation is used, and whether
the slice objects are placed in a tuple or not, they are constructed
from the tuple notation - the translation from notation to slice
object should be done consistently.

Actually, the translation done in the single case worries me - an
extension module I wrote a while ago should not work if this is the
case (as the slice is based on associative keys, not subscripts, so
the negative subscript trick doesn't apply).

I'd better do some testing, I suppose :-(
What you probably want is b[:-1][:-1], etc. Each index must be
separately bracketed to access items in list of lists (etc).

Maybe. Maybe not. I assumed he was doing something similar to what
happens in numeric - they do some rather interesting
slicing-of-multidimensional-container things over there. Of course I'm
not a numeric user, so I may be misunderstanding things.
 

Roberto A. F. De Almeida

Stephen Horne said:
I would have expected, given that the tuple contains slice objects
constructed from the multiple-slice notation, that the same
translations would be performed on the slices that are inserted into
the tuple that are applied when the single slice is created.

That is, whether the single or multiple notation is used, and whether
the slice objects are placed in a tuple or not, they are constructed
from the tuple notation - the translation from notation to slice
object should be done consistently.

Yes, I believe I was not very clear. This is the inconsistency I was
talking about. (And this is with Python 2.3.1)
What you probably want is b[:-1][:-1], etc. Each index must be
separately bracketed to access items in list of lists (etc).
Maybe. Maybe not. I assumed he was doing something similar to what
happens in numeric - they do some rather interesting
slicing-of-multidimensional-container things over there. Of course I'm
not a numeric user, so I may be misunderstanding things.

Yes, you're right.

The problem with b[:-1][:-1] is that it will return b[:-1], and then
slice the result again. It's ok if you're working with lists, but not
for what I'm doing.

I'm working in a client to access OPeNDAP servers. It's basically data
served through http, with some metadata and the ability to specify the
variables and ranges you want.

For example, if you have a dataset served at:

http://server/cgi-bin/opendap/dataset.nc

You can download the variable "A" as ASCII with the following URL:

http://server/cgi-bin/opendap/dataset.nc.ascii?A

If you want only part of "A", in binary (xdr) format:

http://server/cgi-bin/opendap/dataset.nc.dods?A[1:10][1:5][6]

What I'm doing is this:
>>> from opendap import client
>>> data = client.dataset("http://server/cgi-bin/opendap/dataset.nc")
>>> print data.variables["A"].shape
(20,10,12)
>>> A = data.variables["A"][1:10,1:5,6]

Data is only retrieved when you slice a variable like above, and only
the data sliced is requested from the OPeNDAP server, instead of the
whole range. If I use multiple slices, eg,
A = data.variables["A"][1:10][1:5][6]

then first the client will have to download A[1:10][:][:], and slice
it later -- more data than needed will be retrieved. If instead I pass
a tuple of slices, I can construct the URL

http://server/cgi-bin/opendap/dataset.nc.dods?A[1:10][1:5][6]

and request from the server only the data that will be used, and
return it as a Numeric array.
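
The URL construction described above could be sketched roughly as follows. This is Python 3, not the actual opendap client code: the names constraint_url and Variable are hypothetical, nothing is fetched, and the bounds are copied into the URL verbatim, exactly as in the example URLs above (how a server interprets the upper bound is not addressed here).

```python
def constraint_url(base_url, name, index):
    """Build a .dods request URL for variable `name` from a subscript.

    `index` is whatever __getitem__ received: an int, a slice, or a
    tuple of them. Hypothetical helper, for illustration only.
    """
    if not isinstance(index, tuple):
        index = (index,)  # a single subscript becomes a 1-tuple
    parts = []
    for item in index:
        if isinstance(item, slice):
            if item.start is None or item.stop is None or item.step is not None:
                raise ValueError("this sketch only handles explicit start:stop")
            parts.append("[%d:%d]" % (item.start, item.stop))
        else:
            parts.append("[%d]" % item)
    return "%s.dods?%s%s" % (base_url, name, "".join(parts))


class Variable:
    """Hypothetical stand-in for data.variables["A"]; returns the URL only."""

    def __init__(self, base_url, name):
        self.base_url = base_url
        self.name = name

    def __getitem__(self, index):
        return constraint_url(self.base_url, self.name, index)


A = Variable("http://server/cgi-bin/opendap/dataset.nc", "A")
url = A[1:10, 1:5, 6]
print(url)
```

Because the whole tuple reaches __getitem__ in one call, the request can be narrowed before any data is transferred, which chained brackets cannot do without some form of deferral.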

Roberto
 

Stephen Horne

Yes, I believe I was not very clear. This is the inconsistency I was
talking about. (And this is with Python 2.3.1)

Hmmm.

I basically used your earlier print-the-index-to-getitem code (on
Python 2.3) and got...

"""
slice(None, -1, None)
"""

but then I realised I did one thing different compared with your
example - I inherited from 'object', creating a new-style class (which
is getting to be a habit). Doing it without the inheritance from
object, I got...

"""
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: y instance has no attribute '__len__'
"""

which is exactly what you got. Which suggests that the attempt to
translate that '-1' to a positive subscript is an old convenience,
kept for backward compatibility in old-style classes, but dropped for
greater generality in new-style classes.
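
The new-style behaviour can be checked with a short sketch. It is written for Python 3, where every class is new-style, so it cannot reproduce the old-style traceback; the class name y simply follows the traceback above.

```python
class y(object):
    def __getitem__(self, index):
        return index


obj = y()

# No __len__ is defined, yet negative bounds are passed through untouched,
# both for a single slice and for a tuple of slices.
single = obj[:-1]
multiple = obj[:-1, :-1]

assert not hasattr(obj, "__len__")
assert single == slice(None, -1, None)
assert multiple == (slice(None, -1, None), slice(None, -1, None))
```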

I can understand the assumed-to-be old approach - slicing isn't
implemented at present on any builtin container except those that use
subscripts (it works on lists but not dictionaries, for example) - but
in terms of generality I'm rather glad we can opt out of it by using
new-style classes. While hashing doesn't support it, some
mapping-style data structures can reasonably support slicing by key -
and, as in your example, even when using subscripts the length may be
unknown in advance, and the container may not even have a simple
scalar length.

Data is only retrieved when you slice a variable like above, and only
the data sliced is requested from the OPeNDAP server, instead of the
whole range. If I use multiple slices, eg,

Makes good sense.
 

Peter Otten

Roberto said:
The problem with b[:-1][:-1] is that it will return b[:-1], and then
slice the result again. It's ok if you're working with lists, but not
for what I'm doing.

I'm working in a client to access OPeNDAP servers. It's basically data
served through http, with some metadata and the ability to specify the
variables and ranges you want.
[...]

Data is only retrieved when you slice a variable like above, and only
the data sliced is requested from the OPeNDAP server, instead of the
whole range. If I use multiple slices, eg,
A = data.variables["A"][1:10][1:5][6]

then first the client will have to download A[1:10][:][:], and slice
it later -- more data than needed will be retrieved. If instead I pass
a tuple of slices, I can construct the URL

http://server/cgi-bin/opendap/dataset.nc.dods?A[1:10][1:5][6]

and request from the server only the data that will be used, and
return it as a Numeric array.

Roberto

Have you thought about a proxy class on the client side?
It can defer the actual server access until it has the slice/index
information for all dimensions:

<code>
class Proxy:
    def __init__(self, dims, slices=None):
        # Copy the list so that proxies never share state.
        if slices is None:
            self.slices = []
        else:
            self.slices = slices[:]
        self.dims = dims

    def fetchData(self, index):
        # Placeholder for the real server request.
        return "fetching data: " + str(self.slices + [index])

    def __len__(self):
        return self.dims[0]

    def __getitem__(self, index):
        # Only the subscript for the last dimension triggers the fetch.
        if len(self.dims) == 1:
            return self.fetchData(index)
        return Proxy(self.dims[1:], self.slices + [index])

    def __str__(self):
        return "data access deferred"

p = Proxy([10, 42, 9])
print p[:4]
print p[:-1][:-2][:-3]
print p[1][-2][:-3]

q = Proxy([0]*3)
print q[:4]
print q[:-1][:-2][:-3]
print q[1][-2][:-3]
</code>

If I have understood you right, the above should be a workable solution.
If you do not know the nested arrays' lengths beforehand, you could
implement __len__() to always return 0, which saves you from an
AttributeError for negative indices.

Peter
 

Terry Reedy

Stephen Horne said:
I didn't see any sign that he wasn't aware of that.

Since you both missed the key points I tried to make, I will try to be a
little clearer by being much wordier.

When a sequence is indexed with a slice expression, the interpreter
tries to immediately slice it. From 2.2.1:
>>> def f1(): return c[:-1]
...
>>> dis.dis(f1)
          0 SET_LINENO               1

          3 SET_LINENO               1
          6 LOAD_GLOBAL              0 (c)
          9 LOAD_CONST               1 (-1)
         12 SLICE+2
         13 RETURN_VALUE

No intermediate slice object is even constructed. Since stop is
negative, 'SLICE+2' must look up the length to do the subtraction.

When the index is a tuple, the interpreter does not try to slice:
>>> def f2(): return c[:1,:1]
...
>>> dis.dis(f2)
          0 SET_LINENO               1
          3 SET_LINENO               1
          6 LOAD_GLOBAL              0 (c)
          9 LOAD_CONST               0 (None)
         12 LOAD_CONST               1 (1)
         15 BUILD_SLICE              2
         18 LOAD_CONST               0 (None)
         21 LOAD_CONST               1 (1)
         24 BUILD_SLICE              2
         27 BUILD_TUPLE              2
         30 BINARY_SUBSCR
         31 RETURN_VALUE

For BINARY_SUBSCR, the contents of the tuple only matter when the
tuple is hashed. Roberto prevented this with his custom __getitem__()
method. Since slices do not hash, a tuple of slices is normally
useless and would give a TypeError if hashing were attempted.
I would have expected, given that the tuple contains slice objects
constructed from the multiple-slice notation, that the same
translations would be performed on the slices that are inserted into
the tuple that are applied when the single slice is created.

Slice objects and tuples thereof were added for use by Numerical
Python so it could slice multidimensional arrays stored as a single
linear array. The length used for start or stop subtraction is the
length of each dimension, which is typically different for each
dimension. Subtracting the total length is the *wrong* thing to do
for multiple slices. Only custom code can know the correct factors of
that total length to use for each component slice.
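
As an illustration of that per-dimension arithmetic, here is a minimal Python 3 sketch. The helper name normalize is hypothetical, and it relies on the standard slice.indices() method, which resolves negative and omitted bounds against a given length. The shape is the one from Roberto's first post.

```python
def normalize(index, shape):
    """Resolve each subscript against the length of its own dimension.

    Returns one (start, stop, step) triple per dimension.
    Hypothetical helper, for illustration only.
    """
    if not isinstance(index, tuple):
        index = (index,)
    if len(index) != len(shape):
        raise IndexError("expected %d subscripts, got %d"
                         % (len(shape), len(index)))
    result = []
    for item, length in zip(index, shape):
        if isinstance(item, slice):
            # slice.indices() does the negative-bound subtraction for us.
            result.append(item.indices(length))
        else:
            if item < 0:
                item += length
            if not 0 <= item < length:
                raise IndexError("index out of range")
            result.append((item, item + 1, 1))
    return result


shape = (16, 17, 21)
index = (slice(None, -1), slice(None, -1), slice(None, -1))
print(normalize(index, shape))  # [(0, 15, 1), (0, 16, 1), (0, 20, 1)]
```

The same -1 resolves to 15, 16 and 20 depending on the dimension, which is why a single __len__ cannot supply the right value.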

In Roberto's example, there is no single slice created. See
disassembly above and comment below.
That is, whether the single or multiple notation is used, and whether
the slice objects are placed in a tuple or not, they are constructed
from the tuple notation

Sorry, wrong. Tuple notation for single slices gives a different
result.

>>> c[:-1]  # this is *NOT* tuple notation
Traceback (most recent call last):
...
>>> c[:-1,]
<type 'tuple'> (slice(None, -1, None),)

DING! Here is how to get consistency. Add the tuplizing comma.
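
A small Python 3 sketch of the tuplizing comma, together with one way a class could spare its callers from needing it. Both class names are hypothetical, and in Python 3 the plain c[:-1] form no longer raises, so only the tuple-versus-slice difference is shown.

```python
class Reporter:
    """Hypothetical class that reports the type and value of its subscript."""

    def __getitem__(self, index):
        return type(index), index


class Normalizing:
    """Hypothetical class that always works with a tuple internally."""

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            index = (index,)  # wrap a lone subscript in a 1-tuple
        return index


c = Reporter()
# With the trailing comma the subscript is a 1-tuple holding a slice.
assert c[:-1,] == (tuple, (slice(None, -1, None),))
# Without it, Python 3 passes the bare slice object.
assert c[:-1] == (slice, slice(None, -1, None))

n = Normalizing()
# After wrapping, both spellings look the same to the rest of the class.
assert n[:-1] == n[:-1,] == (slice(None, -1, None),)
```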

Terry J. Reedy
 

Terry Reedy

Roberto A. F. De Almeida said:
Stephen Horne wrote:

Yes, I believe I was not very clear. This is the inconsistency I was
talking about. (And this is with Python 2.3.1)

As I pointed out in response to the same message, there is no
inconsistency because the premise behind that conclusion is wrong. I
also pointed out that you can make the premise true and get the
consistency you expect by adding a comma after single slice
expressions so that you do always use tuple notation.

[me] What you probably want is b[:-1][:-1], etc. Each index must be

The problem with b[:-1][:-1] is that it will return b[:-1], and then
slice the result again. It's ok if you're working with lists, but not
for what I'm doing. ....
and request from the server only the data that will be used, and
return it as a Numeric array.

Then you should, of course, use N.P.'s extended slicing syntax and I
would expect all to work well. If you are doing something just
similar, then you will have to do similar custom interpretation of
slice tuples.

Terry J. Reedy
 

Stephen Horne

As I pointed out in response to the same message, there is no
inconsistency because the premise behind that conclusion is wrong. I
also pointed out that you can make the premise true and get the
consistency you expect by adding a comma after single slice
expressions so that you do always use tuple notation.

"object [slice(...),slice(...)]" is a notation for subscripting with a
tuple of slices. The user has explicitly created slice objects without
using slice notation, and has wrapped them in a tuple, so has no right
to expect any additional behaviour that slice notation might provide.

But Roberto didn't do that.

To me, "object [a:b:c,d:e:f]" doesn't look like a tuple notation. It
looks like a slice notation. 'a:b:c' is not a notation for an object
in itself, and therefore 'a:b:c,d:e:f' is not a notation for a tuple.

The fact that this slicing is implemented by passing a tuple
containing slice objects to __getitem__ is beside the point - each
individual 'a:b:c' should be handled in the same way as if there were
only one 'a:b:c' there.

However, a little experiment surprised me...
...     def __getitem__ (self, idx) :
...         print idx
...
(slice(1, 2, 3),)

The apparent purpose of the notation is to allow multi-dimensional
slicing. I can see how having a single-dimensional slice call that is
implemented consistently with the multi-dimensional slices is
convenient when writing the __getitem__, but the whole point of
libraries is to handle complexity so that the caller doesn't have to.
Why does the caller have to deal with two incompatible
single-dimensional slice notations?

Could the two single-dimensional slice notations have usefully
distinct meanings? Do those distinct meanings justify applying the
negative subscript fix in one case but not the other?

Something is going on that seems strange to me. I suspect that these
are signs of repeated extension over time combined with
backward-compatibility constraints. Maybe Numeric was the first place
where it was recognised that the negative subscript trick is not always
appropriate - but until new-style classes the basic slice notation
could not be made consistent without big back-compatibility issues.

I'm rather glad that with new-style classes things have been made
consistent.
 

Stephen Horne

Sorry, wrong. Tuple notation for single slices gives a different
result.

A different implementation, but this is a high level language. The
caller is supposed to worry about the purpose of the syntax, not (as
far as possible) the implementation.

The idea that single slices *should* behave in a different way to
extended slices seems bizarre to me - the kind of thing that arises out
of historic issues rather than out of principle. And if they really
*should* have different meanings, why does this inconsistency
disappear at the first opportunity to remove it without back
compatibility issues (ie in new style classes), for which __getitem__
is called with a slice object (and no -ve index translation) for both
single and extended slices?

This isn't a criticism of Python - it is simply an example of how new
style classes have allowed vestiges from a less flexible past to be
tidied up. Python has grown without becoming a house of cards - that
is a good thing.
 

John Roth

Stephen Horne said:
As I pointed out in response to the same message, there is no
inconsistency because the premise behind that conclusion is wrong. I
also pointed out that you can make the premise true and get the
consistency you expect by adding a comma after single slice
expressions so that you do always use tuple notation.

"object [slice(...),slice(...)]" is a notation for subscripting with a
tuple of slices. The user has explicitly created slice objects without
using slice notation, and has wrapped them in a tuple, so has no right
to expect any additional behaviour that slice notation might provide.

But Roberto didn't do that.

To me, "object [a:b:c,d:e:f]" doesn't look like a tuple notation. It
looks like a slice notation. 'a:b:c' is not a notation for an object
in itself, and therefore 'a:b:c,d:e:f' is not a notation for a tuple.

The fact that this slicing is implemented by passing a tuple
containing slice objects to __getitem__ is beside the point - each
individual 'a:b:c' should be handled in the same way as if there were
only one 'a:b:c' there.

However, a little experiment surprised me...
...     def __getitem__ (self, idx) :
...         print idx
...
(slice(1, 2, 3),)

The apparent purpose of the notation is to allow multi-dimensional
slicing. I can see how having a single-dimensional slice call that is
implemented consistently with the multi-dimensional slices is
convenient when writing the __getitem__, but the whole point of
libraries is to handle complexity so that the caller doesn't have to.
Why does the caller have to deal with two incompatible
single-dimensional slice notations?

Could the two single-dimensional slice notations have usefully
distinct meanings? Do those distinct meanings justify applying the
negative subscript fix in one case but not the other?

Something is going on that seems strange to me. I suspect that these
are signs of repeated extension over time combined with
backward-compatibility constraints. Maybe Numeric was the first place
where it was recognised that the negative subscript trick is not always
appropriate - but until new-style classes the basic slice notation
could not be made consistent without big back-compatibility issues.

I'm rather glad that with new-style classes things have been made
consistent.

I suppose the whole thing was totally accidental, in the sense that
it fell out of the way things work without any explicit design intent
that it work that way.

Python has a number of surprising things that cause trouble in
reasoning about how things work. One of them is that it is
the comma that creates the tuple, not the parenthesis, and the
colon creates the slice object, not the brackets. Likewise,
the interpreter does not distinguish between subscripts
and slices, that's up to the __getitem__() magic
method.
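
A quick Python 3 sketch of those rules about the comma, the colon and the brackets. The class name Show is made up for illustration:

```python
class Show:
    """Hypothetical class that hands back its subscript unchanged."""

    def __getitem__(self, index):
        return index


s = Show()

# The comma builds the tuple; parentheses alone do not.
t = 1, 2
assert t == (1, 2)
assert (1) == 1

# Inside brackets, the colon builds the slice object.
assert s[1:2] == slice(1, 2, None)
assert s[1:2] == s[slice(1, 2)]

# Brackets by themselves just pass the object along, so telling an
# index from a slice (or a tuple of either) is up to __getitem__.
assert s[1] == 1
assert isinstance(s[1:2, 3], tuple)
```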

So you're basically passing something to __getitem__()
that nobody thought about, and expecting it to work.
It sounds like something I might like to use some day
myself, but it's not going to suddenly appear by magic.

Submit a PEP.

John Roth
 

Stephen Horne

So you're basically passing something to __getitem__()
that nobody thought about, and expecting it to work.
It sounds like something I might like to use some day
myself, but it's not going to suddenly appear by magic.

Submit a PEP.

For what? - new style classes already do what I (and apparently
Roberto) would expect. I don't care if old style classes act in a
backward-compatible rather than idealistic way - that is what old
style classes are for.

All Roberto needs to do is make his class a subclass of object and it
will do exactly what he needs - somebody did think about it and that
PEP has apparently already been submitted and implemented a long time
ago.
 

Roberto A. F. De Almeida

Stephen Horne said:
All Roberto needs to do is make his class a subclass of object and it
will do exactly what he needs - somebody did think about it and that
PEP has apparently already been submitted and implemented a long time
ago.

Yes, I think this is the best thing to do. It removes the (apparent or
not) inconsistency, and does what I would expect from the principle of
least surprise. The other solutions, like the Proxy class and Terry's
suggestion of adding a comma to uni-dimensional slices (c[:-1,]) are
interesting, but they are not consistent with Numeric slices. I'll
keep them in mind, though.

Thanks,

Roberto
 

Greg Ewing (using news.cis.dfn.de)

Stephen said:
The idea that single slices *should* behave in a different way to
extended slices seems bizarre to me - the kind of thing that arises out
of historic issues rather than out of principle.

And it is indeed a historic issue.

If you want the gory details:

Once upon a time, there was no such thing as a slice object.
Indexing and slicing were treated as completely separate
things, and handled by separate methods at the C level:

seq[x] --> the C-level equivalent of __getitem__(self, x)
seq[x:y] --> the C-level equivalent of __getslice__(self, x, y)

(There were no 3-element slices, either.)

Moreover, the arguments to the C-level __getslice__ method
*had* to be integers, and the interpreter performed the convenience
of interpreting negative indices for you before calling it.

Then Numeric came along, and people wanted to be able to
slice multi-dimensional arrays. So the slice object was
invented, and the parser and interpreter taught to create
them when encountering slice notation, and feed tuples of
them to __getitem__.

But, for backwards compatibility, the special case of a
single 2-element slice had to be handled specially. If the object
being sliced had (at the C level) a __getslice__ method,
it would be used. If it didn't, a slice object would be
created and passed to __getitem__.

This was fine for C objects, such as Numeric arrays,
which can choose to not provide a __getslice__ method.
But due to the way old-style classes are implemented,
they *always* have a __getslice__ method at the C level,
regardless of whether they have one at the Python level.
So it's impossible to write an old-style Python class that
doesn't get its single two-element slices mangled.

Fortunately, new-style classes are much more like C
objects, and they don't suffer from this problem.
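
For comparison, Python 3 dropped the __getslice__ path altogether, so a method of that name is never consulted and every slice goes to __getitem__ as a slice object. A minimal sketch, with a made-up class name:

```python
class Seq:
    """Hypothetical class defining both the old and the new hook."""

    def __getslice__(self, i, j):
        # Python 3 never calls this method for seq[i:j].
        return ("getslice", i, j)

    def __getitem__(self, index):
        return ("getitem", index)


seq = Seq()

# Two-element slices reach __getitem__ unmangled, negative bounds included.
assert seq[1:3] == ("getitem", slice(1, 3, None))
assert seq[:-1] == ("getitem", slice(None, -1, None))
```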
 

Terry Reedy

Greg Ewing (using news.cis.dfn.de) said:
If you want the gory details:

Thank you for the nice explanation of why slices currently behave as
they do.

Terry J. Reedy
 
