Bug in slice type

B

Bryan Olson

The Python slice type has one method 'indices', and reportedly:

This method takes a single integer argument /length/ and
computes information about the extended slice that the slice
object would describe if applied to a sequence of length
items. It returns a tuple of three integers; respectively
these are the /start/ and /stop/ indices and the /step/ or
stride length of the slice. Missing or out-of-bounds indices
are handled in a manner consistent with regular slices.

http://docs.python.org/ref/types.html


It behaves incorrectly when step is negative and the slice
includes the 0 index.


class BuggerAll:

def __init__(self, somelist):
self.sequence = somelist[:]

def __getitem__(self, key):
if isinstance(key, slice):
start, stop, step = key.indices(len(self.sequence))
# print 'Slice says start, stop, step are:', start,
stop, step
return self.sequence[start : stop : step]


print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]


The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The later two have the advantage of being
consistend with the documented behavior of returning three
integers.
 
S

Steven Bethard

Bryan said:
class BuggerAll:

def __init__(self, somelist):
self.sequence = somelist[:]

def __getitem__(self, key):
if isinstance(key, slice):
start, stop, step = key.indices(len(self.sequence))
# print 'Slice says start, stop, step are:', start,
stop, step
return self.sequence[start : stop : step]


print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The later two have the advantage of being
consistend with the documented behavior of returning three
integers.

I suspect there's a reason that it's done this way, but I agree with you
that this seems strange. Have you filed a bug report on Sourceforge?

BTW, a simpler example of the same phenomenon is:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]

STeVe
 
B

Bryan Olson

Steven said:
> I suspect there's a reason that it's done this way, but I agree with you
> that this seems strange. Have you filed a bug report on Sourceforge?

I gather that the slice class is young, so my guess is bug. I
filed the report -- my first Sourceforge bug report.
> BTW, a simpler example of the same phenomenon is:
>
> py> range(10)[slice(None, None, -2)]
> [9, 7, 5, 3, 1]
> py> slice(None, None, -2).indices(10)
> (9, -1, -2)
> py> range(10)[9:-1:-2]
> []

Ah, thanks.
 
J

John Machin

Steven said:
Bryan said:
class BuggerAll:

def __init__(self, somelist):
self.sequence = somelist[:]

def __getitem__(self, key):
if isinstance(key, slice):
start, stop, step = key.indices(len(self.sequence))
# print 'Slice says start, stop, step are:', start,
stop, step
return self.sequence[start : stop : step]


print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python interprets
negative numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The later two have the advantage of being
consistend with the documented behavior of returning three
integers.


I suspect there's a reason that it's done this way, but I agree with you
that this seems strange. Have you filed a bug report on Sourceforge?

BTW, a simpler example of the same phenomenon is:

py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]
>>> rt = range(10)
>>> rt[slice(None, None, -2)] [9, 7, 5, 3, 1]
>>> rt[::-2] [9, 7, 5, 3, 1]
>>> slice(None, None, -2).indices(10) (9, -1, -2)
>>> [rt[x] for x in range(9, -1, -2)] [9, 7, 5, 3, 1]
>>>

Looks good to me. indices has returned a usable (start, stop, step).
Maybe the docs need expanding.
 
B

Bryan Olson

John said:
> Steven Bethard wrote: [...]
>> BTW, a simpler example of the same phenomenon is:
>>
>> py> range(10)[slice(None, None, -2)]
>> [9, 7, 5, 3, 1]
>> py> slice(None, None, -2).indices(10)
>> (9, -1, -2)
>> py> range(10)[9:-1:-2]
>> []
>> >
> >>> rt = range(10)
> >>> rt[slice(None, None, -2)] > [9, 7, 5, 3, 1]
> >>> rt[::-2] > [9, 7, 5, 3, 1]
> >>> slice(None, None, -2).indices(10) > (9, -1, -2)
> >>> [rt[x] for x in range(9, -1, -2)] > [9, 7, 5, 3, 1]
> >>>
>
> Looks good to me. indices has returned a usable (start, stop, step).
> Maybe the docs need expanding.

But not a usable [start: stop: step], which is what 'slice' is
all about.
 
M

Michael Hudson

Bryan Olson said:
The Python slice type has one method 'indices', and reportedly:

This method takes a single integer argument /length/ and
computes information about the extended slice that the slice
object would describe if applied to a sequence of length
items. It returns a tuple of three integers; respectively
these are the /start/ and /stop/ indices and the /step/ or
stride length of the slice. Missing or out-of-bounds indices
are handled in a manner consistent with regular slices.

http://docs.python.org/ref/types.html


It behaves incorrectly

In some sense; it certainly does what I intended it to do.
when step is negative and the slice includes the 0 index.


class BuggerAll:

def __init__(self, somelist):
self.sequence = somelist[:]

def __getitem__(self, key):
if isinstance(key, slice):
start, stop, step = key.indices(len(self.sequence))
# print 'Slice says start, stop, step are:', start,
stop, step
return self.sequence[start : stop : step]

But if that's what you want to do with the slice object, just write

start, stop, step = key.start, key.stop, key.step
return self.sequence[start : stop : step]

or even

return self.sequence[key]

What the values returned from indices are for is to pass to the
range() function, more or less. They're not intended to be
interpreted in the way things passed to __getitem__ are.

(Well, _actually_ the main motivation for writing .indices() was to
use it in unittests...)
print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]


The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound,

It is, when you're doing arithmetic, which is what the client code to
PySlice_GetIndicesEx() which in turn is what indices() is a thin
wrapper of, does
but when using it to actually slice, Python interprets negative
numbers as an offset from the high end of the sequence.

Good start-stop-step values are (9, None, -2), or (9, -11, -2),
or (-1, -11, -2). The later two have the advantage of being
consistend with the documented behavior of returning three
integers.

I'm not going to change the behaviour. The docs probably aren't
especially clear, though.

Cheers,
mwh
 
B

bryanjugglercryptographer

Michael said:
Bryan Olson writes:
In some sense; it certainly does what I intended it to do.
[...]
I'm not going to change the behaviour. The docs probably aren't
especially clear, though.

The docs and the behavior contradict:

[...] these are the /start/ and /stop/ indices and the
/step/ or stride length of the slice [emphasis added].


I'm fine with your favored behavior. What do we do next to get
the doc fixed?
 
M

Michael Hudson

Michael said:
Bryan Olson writes:
In some sense; it certainly does what I intended it to do.
[...]
I'm not going to change the behaviour. The docs probably aren't
especially clear, though.

The docs and the behavior contradict:

[...] these are the /start/ and /stop/ indices and the
/step/ or stride length of the slice [emphasis added].


I'm fine with your favored behavior. What do we do next to get
the doc fixed?

I guess one of us comes up with some less misleading words. It's not
totally obvious to me what to do, seeing as the returned values *are*
indices is a sense, just not the sense in which they are used in
Python. Any ideas?

Cheers,
mwh
 
S

Steven Bethard

Michael said:
I guess one of us comes up with some less misleading words. It's not
totally obvious to me what to do, seeing as the returned values *are*
indices is a sense, just not the sense in which they are used in
Python. Any ideas?

Maybe you could replace:

"these are the start and stop indices and the step or stride length of
the slice"

with

"these are start, stop and step values suitable for passing to range or
xrange"


I wanted to say something about what happens with a negative stride, to
indicate that it produces (9, -1, -2) instead of (-1, -11, -2), but I
wasn't able to navigate the Python documentation well enough.

Looking at the Language Reference section on the slice type[1] (section
3.2), I find that "Missing or out-of-bounds indices are handled in a
manner consistent with regular slices." So I looked for the
documentation of "regular slices". My best guess was that this meant
looking at the Language Reference on slicings[2]. But all I could find
in this documentation about the "stride" argument was:

"The conversion of a proper slice is a slice object (see section 3.2)
whose start, stop and step attributes are the values of the expressions
given as lower bound, upper bound and stride, respectively, substituting
None for missing expressions."

This feels circular to me. Can someone help me find where the semantics
of a negative stride index is defined?


Steve

[1] http://docs.python.org/ref/types.html
[2] http://docs.python.org/ref/slicings.html
 
S

Steven Bethard

I said:
I wanted to say something about what happens with a negative stride, to
indicate that it produces (9, -1, -2) instead of (-1, -11, -2), but I
wasn't able to navigate the Python documentation well enough.

Looking at the Language Reference section on the slice type[1] (section
3.2), I find that "Missing or out-of-bounds indices are handled in a
manner consistent with regular slices." So I looked for the
documentation of "regular slices". My best guess was that this meant
looking at the Language Reference on slicings[2]. But all I could find
in this documentation about the "stride" argument was:

"The conversion of a proper slice is a slice object (see section 3.2)
whose start, stop and step attributes are the values of the expressions
given as lower bound, upper bound and stride, respectively, substituting
None for missing expressions."

This feels circular to me. Can someone help me find where the semantics
of a negative stride index is defined?

Well, I couldn't find where the general semantics of a negative stride
index are defined, but for sequences at least[1]:

"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though.
range(10)[9:-1:-2] == []
But the values of n that satisfy
0 <= n < (-1 - 9)/-2 = -10/-2 = 5
are 0, 1, 2, 3, 4, corresponding to the x values of 9, 7, 5, 3, 1. But
[range(10)[x] for x in [9, 7, 5, 3, 1]] == [9, 7, 5, 3, 1]
Does this mean that there's a bug in the list object?

STeVe

[1] http://docs.python.org/lib/typesseq.html
 
B

Bryan Olson

Steven said:
> Well, I couldn't find where the general semantics of a negative stride
> index are defined, but for sequences at least[1]:
>
> "The slice of s from i to j with step k is defined as the sequence of
> items with index x = i + n*k such that 0 <= n < (j-i)/k."
>
> This seems to contradict list behavior though. [...]

The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.


PPEP (Proposed Python Enhancement Proposal): New-Style Indexing

Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

It works like current slicing, except that negative start or
stop values do not trigger from-the-high-end interpretation.
Omissions and None work the same as in old-style slicing.

Within the square-brackets, the '$' symbol stands for the length
of the sequence. One can index from the high end by subtracting
the index from '$'. Instead of:

seq[3 : -4]

we write:

seq[3 ; $ - 4]

When square-brackets appear within other square-brackets, the
inner-most bracket-pair determines which sequence '$' describes.
(Perhaps '$$' should be the length of the next containing
bracket pair, and '$$$' the next-out and...?)

So far, I don't think the proposal breaks anything; let's keep
it that way. The next bit is tricky...

Obviously '$' should also work in simple (non-slice) indexing.
Instead of:

seq[-2]

we write:

seq[$ - 2]

So really seq[-2] should be out-of-bounds. Alas, that would
break way too much code. For now, simple indexing with a
negative subscript (and no '$') should continue to index from
the high end, as a deprecated feature. The presence of '$'
always indicates new-style slicing, so a programmer who needs a
negative index to trigger a range error can write:

seq[($ - $) + index]



An Alternative Variant:

Suppose instead of using semicolons as the PPEP proposes, we use
commas, as in:

sequence[start, stop, step]

Commas are already in use to form tuples, and we let them do
just that. A slice is a subscript that is a tuple (or perhaps we
should allow any sequence). We could just as well write:

index_tuple = (start, stop, step)
sequence[index_tuple]

This variant *reduces* the number and complexity of rules that
define Python semantics. There is no special interpretation of
the comma, and no need for a distinct slice type.

The '$' character works as in the PPEP above. It is undefined
outside square brackets, but that makes no real difference; the
programmer can use len(sequence).

This variant might break some tricky code.
 
S

Steven Bethard

Bryan said:
Steven said:
Well, I couldn't find where the general semantics of a negative stride
index are defined, but for sequences at least[1]:

"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though. [...]

The conclusion is inescapable: Python's handling of negative
subscripts is a wart.

I'm not sure I'd go that far. Note that my confusion above was the
order of combination of points (3) and (5) on the page quoted above[1].
I think the problem is not the subscript handling so much as the
documentation thereof. I posted a message about this [2], and a
documentation patch based on that message [3].


[1] http://docs.python.org/lib/typesseq.html
[2] http://mail.python.org/pipermail/python-list/2005-August/295260.html
[3] http://www.python.org/sf/1265100

Suppose instead of using semicolons as the PPEP proposes, we use
commas, as in:

sequence[start, stop, step]

This definitely won't work. This is already valid syntax, and is used
heavily by the numarray/numeric folks.

STeVe
 
K

Kay Schluehr

Steven said:
"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though.
range(10)[9:-1:-2] == []

No, both is correct. But we don't have to interpret the second slice
argument m as the limit j of the above definition. For positive values
of m the identity
m==j holds. For negative values of m we have j = max(0,i+m). This is
consistent with the convenient negative indexing:

If we remember how -1 is interpreted as an index not as some limit the
behaviour makes perfect sense.

Kay
 
K

Kay Schluehr

Bryan said:
Steven said:
Well, I couldn't find where the general semantics of a negative stride
index are defined, but for sequences at least[1]:

"The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k."

This seems to contradict list behavior though. [...]

The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.

It is a Python gotcha, but the identity X[-1] == X[len(X)-1] holds and
is very usefull IMO. If you want to slice to the bottom, take 0 as
bottom value. The docs have to be extended in this respect.

Kay
 
P

Paul Rubin

Bryan Olson said:
seq[3 : -4]

we write:

seq[3 ; $ - 4]
+1

When square-brackets appear within other square-brackets, the
inner-most bracket-pair determines which sequence '$' describes.
(Perhaps '$$' should be the length of the next containing
bracket pair, and '$$$' the next-out and...?)

Not sure. $1 said:
So really seq[-2] should be out-of-bounds. Alas, that would
break way too much code. For now, simple indexing with a
negative subscript (and no '$') should continue to index from
the high end, as a deprecated feature. The presence of '$'
always indicates new-style slicing, so a programmer who needs a
negative index to trigger a range error can write:

seq[($ - $) + index]
+1

Commas are already in use to form tuples, and we let them do
just that. A slice is a subscript that is a tuple (or perhaps we
should allow any sequence). We could just as well write:

index_tuple = (start, stop, step)
sequence[index_tuple]

Hmm, tuples are hashable and are already valid indices to mapping
objects like dictionaries. Having slices means an object can
implement both the mapping and sequence interfaces. Whether that's
worth caring about, I don't know.
 
B

Bryan Olson

Paul said:
> Bryan Olson writes:
>
>> seq[3 : -4]
>>
>>we write:
>>
>> seq[3 ; $ - 4]
>
>
> +1

I think you're wrong about the "+1". I defined '$' to stand for
the length of the sequence (not the address of the last
element).

>
> Not sure. $1, $2, etc. might be better, or $<tag> like in regexps, etc.

Sounds reasonable.


[...]
> Hmm, tuples are hashable and are already valid indices to mapping
> objects like dictionaries. Having slices means an object can
> implement both the mapping and sequence interfaces. Whether that's
> worth caring about, I don't know.

Yeah, I thought that alternative might break peoples code, and
it turns out it does.
 
B

Bryan Olson

Kay said:
> Bryan Olson wrote:
>
>> > Well, I couldn't find where the general semantics of a negative stride
>> > index are defined, but for sequences at least[1]:
>> >
>> > "The slice of s from i to j with step k is defined as the sequence of
>> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
>> >
>> > This seems to contradict list behavior though. [...]
>>
>>The conclusion is inescapable: Python's handling of negative
>>subscripts is a wart. Indexing from the high end is too useful
>>to give up, but it should be specified by the slicing/indexing
>>operation, not by the value of the index expression.
>
>
> It is a Python gotcha, but the identity X[-1] == X[len(X)-1] holds and
> is very usefull IMO.

No question index-from-the-far-end is useful, but I think
special-casing some otherwise-out-of-bounds indexes is a
mistake.

Are there any cases in popular Python code where my proposal
would not allow as elegant a solution?
> If you want to slice to the bottom, take 0 as
> bottom value. The docs have to be extended in this respect.

I'm not sure what you mean. Slicing with a negative step and a
stop value of zero will not reach the bottom (unless the
sequence is empty). In general, Python uses inclusive beginning
bounds and exclusive ending bounds. (The rule is frequently
stated incorrectly as "inclusive lower bounds and exclusive
upper bounds," which fails to consider negative increments.)
 
B

Bryan Olson

Kay said:
>>"The slice of s from i to j with step k is defined as the sequence of
>>items with index x = i + n*k such that 0 <= n < (j-i)/k."
>>
>>This seems to contradict list behavior though.
>> range(10)[9:-1:-2] == []
>
>
> No, both is correct. But we don't have to interpret the second slice
> argument m as the limit j of the above definition.

Even if "we don't have to," it sure reads like we should.

> For positive values
> of m the identity
> m==j holds. For negative values of m we have j = max(0,i+m).

First, the definition from the doc is still ambiguous: Is the
division in

0 <= n < (j-i)/k

real division, or is it Python integer (truncating) division? It
matters.

Second, the rule Kay Schluehr states is wrong for either type
of division. Look at:

range(5)[4 : -6 : -2]

Since Python is so programmer-friendly, I wrote some code to
make the "look at" task easy:



slice_definition = """"
The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k.
"""

Kay_Schluehr_rule = """
For positive values of m the identity m==j holds. For negative values
of m we have j = max(0,i+m).
"""

def m_to_j(i, m):
""" Compute slice_definition's 'j' according to Kay_Schluehr_rule
when the slice of sequence is specified as,
sequence[i : m : k].
"""
if m > 0:
j = m
else:
j = max(0, i + m)
return j

def extract_slice(sequence, i, m, k, div_type='i'):
""" Apply the slice definition with Kay Schluehr's rule to find
what the slice should be. Pass div_type of 'i' to use integer
division, or 'f' for float (~real) division, in the
slice_definition expression,
(j-i)/k.
"""
j = m_to_j(i, m)
result = []
n = 0
if div_type == 'i':
end_bound = (j - i) / k
else:
assert div_type == 'f', "div_type must be 'i' or 'f'."
end_bound = float(j - i) / k
while n < end_bound:
result.append(sequence[i + n * k])
n += 1
return result

def show(sequence, i, m, k):
""" Print what happens, both actually and according to stated rules.
"""
print "Checking: %s[%d : %d : %d]" % (sequence, i, m, k)
print "actual :", sequence[i : m : k]
print "Kay's rule, int division :", extract_slice(sequence, i, m, k)
print "Kay's rule, real division:", extract_slice(sequence, i, m,
k, 'f')
print



show(range(5), 4, -6, -2)
 
B

Bryan Olson

Steven said:
> Bryan Olson wrote:
>
>> > Well, I couldn't find where the general semantics of a negative stride
>> > index are defined, but for sequences at least[1]:
>> >
>> > "The slice of s from i to j with step k is defined as the sequence of
>> > items with index x = i + n*k such that 0 <= n < (j-i)/k."
>> >
>> > This seems to contradict list behavior though. [...]
>>
>> The conclusion is inescapable: Python's handling of negative
>> subscripts is a wart.
>
>
> I'm not sure I'd go that far. Note that my confusion above was the
> order of combination of points (3) and (5) on the page quoted above[1].
> I think the problem is not the subscript handling so much as the
> documentation thereof.

Any bug can be pseudo-fixed by changing the documentation to
conform to the behavior. Here, the doc clearly went wrong by
expecting Python's behavior to follow from a few consistent
rules. The special-case handling of negative indexes looks
handy, but raises more difficulties than people realized.

I believe my PPEP avoids the proliferation of special cases. The
one additional issue I've discovered is that user-defined types
that are to support __getitem__ and/or __setitem__ *must* also
implement __len__. Sensible sequence types already do, so I
don't think it's much of an issue.

> This is already valid syntax, and is used
> heavily by the numarray/numeric folks.

Yeah, I thought that variant might break some code. I didn't
know it would be that much. Forget that variant.
 
R

Robert Kern

Bryan said:
Paul said:
Bryan said:
seq[3 : -4]

we write:

seq[3 ; $ - 4]

+1

I think you're wrong about the "+1". I defined '$' to stand for
the length of the sequence (not the address of the last
element).

By "+1" he means, "I like it." He's not correcting you.

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,039
Latest member
CasimiraVa

Latest Threads

Top