Bug in slice type

B

Bengt Richter

Robert said:
Bryan Olson wrote:

Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.


Incorrect. Some sane programmers have multiple dimensions they need to
index.

from Numeric import *
A = array([[0, 1], [2, 3], [4, 5]])
A[$-1, $-1]

The result of len(A) has nothing to do with the second $.

I think you have a good observation there, but I'll stand by my
correctness.

My initial post considered re-interpreting tuple arguments, but
I abandoned that alternative after Steven Bethard pointed out
how much code it would break. Modules/classes would remain free
to interpret tuple arguments in any way they wish. I don't think
my proposal breaks any sane existing code.

Going forward, I would advocate that user classes which
implement their own kind of subscripting adopt the '$' syntax,
and interpret it as consistently as possible. For example, they
could respond to __len__() by returning a type that supports the
"Emulating numeric types" methods from the Python Language
Reference 3.3.7, and also allows the class's methods to tell
that it stands for the length of the dimension in question.
(OTTOMH ;-)
Perhaps the slice triple could be extended with a flag indicating
which of the other elements should have $ added to it, and $ would
take meaning from the subarray being indexed, not the whole. E.g.,

arr.[1:$-1, $-5:$-2]

would call arr.__getitem__((slice(1,-1,None,STOP), slice(-5,-2,None,START|STOP))

(Hypothesizing bitmask constants START and STOP)

Regards,
Bengt Richter
 
B

Bengt Richter

Op 2005-08-30 said:
Really it's x[-1]'s behavior that should go, not find/rfind.

I complete disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it usefull to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?
Give it a handy property? E.g.,

table.as_python_list[-1]


Regards,
Bengt Richter
 
A

Antoon Pardon

Op 2005-08-30 said:
Since you are clearly feeling pedantic enough to beat this one to death
with a 2 x 4 please let me substitute "usages" for "types".

But it's not my usage but python's usage.
In the case of a find() result -1 *isn't* a string index, it's a failure
flag. Which is precisely why it should be filtered out of any set of
indexes. once it's been inserted it can no longer be distinguished as a
failure indication.

Which is precisely why it was such a bad choice in the first place.

If I need to write code like this:

var = str.find('.')
if var == -1:
var = None

each time I want to store an index for later use, then surely '-1'
shouldn't have been used here.

I've already admitted that the choice of -1 as a return value wasn't
smart. However you appear to be saying that it's sensible to mix return
values from find() with general-case index values.

I'm saying it should be possible without a problem. It is poor design
to return a legal value as an indication for an error flag.
I'm saying that you
should do so only with caution. The fact that the naiive user will often
not have the wisdom to apply such caution is what makes a change desirable.

I don't think it is naive, if you expect that no legal value will be
returned as an error flag.
What I am trying to say is that this doesn't make sense: if you want to
combine find() results with general-case indexes (i.e. both positive and
negative index values) it behooves you to strip out the -1's before you
do so. Any other behaviour is asking for trouble.

I would say that choosing this particular return value as an error flag
was asking for trouble. My impression is that you are putting more
blame on the programmer which fails to take corrective action, instead
of on the design of find, which makes that corrective action needed
in the first place.
 
A

Antoon Pardon

Op 2005-08-30 said:
Op 2005-08-30 said:
Really it's x[-1]'s behavior that should go, not find/rfind.

I complete disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it usefull to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?
Give it a handy property? E.g.,

table.as_python_list[-1]

Your missing the point, I probably didn't make it clear.

It is not about the possibilty of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.
 
B

Bengt Richter

Op 2005-08-30 said:
Op 2005-08-30, Terry Reedy schreef <[email protected]>:


Really it's x[-1]'s behavior that should go, not find/rfind.

I complete disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it usefull to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?
Give it a handy property? E.g.,

table.as_python_list[-1]

Your missing the point, I probably didn't make it clear.

It is not about the possibilty of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.
How about interpreting seq as an abbreviation of seq[i%len(seq)] ?
That would give a consitent interpretation of seq[-1] and no errors
for any value ;-)

Regards,
Bengt Richter
 
A

Antoon Pardon

Op 2005-08-31 said:
Op 2005-08-30 said:
Op 2005-08-30, Terry Reedy schreef <[email protected]>:


Really it's x[-1]'s behavior that should go, not find/rfind.

I complete disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it usefull to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?
Give it a handy property? E.g.,

table.as_python_list[-1]

Your missing the point, I probably didn't make it clear.

It is not about the possibilty of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.
How about interpreting seq as an abbreviation of seq[i%len(seq)] ?
That would give a consitent interpretation of seq[-1] and no errors
for any value ;-)


But the question was not about having a consistent interpretation for
-1, but about an easy way to get the last value.

But I like your idea. I just think there should be two differnt ways
to index. maybe use braces in one case.

seq{i} would be pure indexing, that throws exceptions if you
are out of bound

seq would then be seq{i%len(seq)}
 
B

Bryan Olson

Paul said:
> Not every sequence needs __len__; for example, infinite sequences, or
> sequences that implement slicing and subscripts by doing lazy
> evaluation of iterators:
>
> digits_of_pi = memoize(generate_pi_digits()) # 3,1,4,1,5,9,2,...
> print digits_of_pi[5] # computes 6 digits and prints '9'
> print digits_of_pi($-5) # raises exception

Good point. I like the memoize thing, so here is one:


class memoize (object):
""" Build a sequence from an iterable, evaluating as needed.
"""

def __init__(self, iterable):
self.it = iterable
self.known = []

def extend_(self, stop):
while len(self.known) < stop:
self.known.append(self.it.next())

def __getitem__(self, key):
if isinstance(key, (int, long)):
self.extend_(key + 1)
return self.known[key]
elif isinstance(key, slice):
start, stop, step = key.start, key.stop, key.step
stop = start + 1 + (stop - start - 1) // step * step
self.extend_(stop)
return self.known[start : stop : step]
else:
raise TypeError(_type_err_note), "Bad subscript type"
 
K

Kay Schluehr

Bengt said:
How about interpreting seq as an abbreviation of seq[i%len(seq)] ?
That would give a consitent interpretation of seq[-1] and no errors
for any value ;-)


Cool, indexing becomes cyclic by default ;)

But maybe it's better to define it explicitely:

seq[!i] = seq[i%len(seq)]

Well, I don't like the latter definition very much because it
introduces special syntax for __getitem__. A better solution may be the
introduction of new syntax and arithmetics for positive and negative
infinite values. Sequencing has to be adapted to handle them.

The semantics follows that creating of limits of divergent sequences:

!0 = lim n
n->infinity

That enables consistent arithmetics:

!0+k = lim n+k -> !0
n->infinity

!0/k = lim n/k -> !0 for k>0,
n->infinity -!0 for k<0
ZeroDevisionError for k==0


etc.

In Python notation:
Traceback (...)
....
UndefinedValueTraceback (...)
....
UndefinedValue
-!0 -!0
range(9)[4:!0] == range(9)[4:] True
range(9)[4:-!0:-1] == range(5)
True

Life can be simpler with unbound limits.

Kay
 
K

Kay Schluehr

Bengt said:
How about interpreting seq as an abbreviation of seq[i%len(seq)] ?
That would give a consitent interpretation of seq[-1] and no errors
for any value ;-)


Cool, indexing becomes cyclic by default ;)

But maybe it's better to define it explicitely:

seq[!i] = seq[i%len(seq)]

Well, I don't like the latter definition very much because it
introduces special syntax for __getitem__. A better solution may be the
introduction of new syntax and arithmetics for positive and negative
infinite values. Sequencing has to be adapted to handle them.

The semantics follows that creating of limits of divergent sequences:

!0 = lim n
n->infinity

That enables consistent arithmetics:

!0+k = lim n+k -> !0
n->infinity

!0/k = lim n/k -> !0 for k>0,
n->infinity -!0 for k<0
ZeroDevisionError for k==0


etc.

In Python notation:
Traceback (...)
....
UndefinedValueTraceback (...)
....
UndefinedValue
-!0 -!0
range(9)[4:!0] == range(9)[4:] True
range(9)[4:-!0:-1] == range(5)
True

Life can be simpler with unbound limits.

Kay
 
R

Ron Adam

Antoon said:
Op 2005-08-31 said:
Op 2005-08-30, Bengt Richter schreef <[email protected]>:



Op 2005-08-30, Terry Reedy schreef <[email protected]>:



Really it's x[-1]'s behavior that should go, not find/rfind.

I complete disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name.

I don't think the ability to easily index sequences from the right is
in dispute. Just the fact that negative numbers on their own provide
this functionality.

Because I sometimes find it usefull to have a sequence start and
end at arbitrary indexes, I have written a table class. So I
can have a table that is indexed from e.g. -4 to +6. So how am
I supposed to easily get at that last value?

Give it a handy property? E.g.,

table.as_python_list[-1]

Your missing the point, I probably didn't make it clear.

It is not about the possibilty of doing such a thing. It is
about python providing a frame for such things that work
in general without the need of extra properties in 'special'
cases.

How about interpreting seq as an abbreviation of seq[i%len(seq)] ?
That would give a consitent interpretation of seq[-1] and no errors
for any value ;-)



But the question was not about having a consistent interpretation for
-1, but about an easy way to get the last value.

But I like your idea. I just think there should be two differnt ways
to index. maybe use braces in one case.

seq{i} would be pure indexing, that throws exceptions if you
are out of bound

seq would then be seq{i%len(seq)}


The problem with negative index's are that positive index's are zero
based, but negative index's are 1 based. Which leads to a non
symmetrical situations.

Note that you can insert an item before the first item using slices. But
not after the last item without using len(list) or some value larger
than len(list).
>>> a = list('abcde')
>>> a[len(a):len(a)] = ['end']
>>> a
['a', 'b', 'c', 'd', 'e', 'end']
['a', 'b', 'c', 'd', 'e', 'last', 'end'] # Second to last.
>>> a[100:100] = ['final']
>>> a
['a', 'b', 'c', 'd', 'e', 'last', 'end', 'final']


Cheers,
Ron
 
B

Bengt Richter

Bengt said:
How about interpreting seq as an abbreviation of seq[i%len(seq)] ?
That would give a consitent interpretation of seq[-1] and no errors
for any value ;-)


Cool, indexing becomes cyclic by default ;)

But maybe it's better to define it explicitely:

seq[!i] = seq[i%len(seq)]

Well, I don't like the latter definition very much because it
introduces special syntax for __getitem__. A better solution may be the
introduction of new syntax and arithmetics for positive and negative
infinite values. Sequencing has to be adapted to handle them.

The semantics follows that creating of limits of divergent sequences:

!0 = lim n
n->infinity

That enables consistent arithmetics:

!0+k = lim n+k -> !0
n->infinity

!0/k = lim n/k -> !0 for k>0,
n->infinity -!0 for k<0
ZeroDevisionError for k==0


etc.

In Python notation:
Traceback (...)
...
UndefinedValueTraceback (...)
...
UndefinedValue
-!0 -!0
range(9)[4:!0] == range(9)[4:] True
range(9)[4:-!0:-1] == range(5)
True

Interesting, but wouldn't that last line be
>>> range(9)[4:-!0:-1] == range(5)[::-1]
Life can be simpler with unbound limits.
Hm, is "!0" a di-graph symbol for infinity?
What if we get full unicode on our screens? Should
it be rendered with unichr(0x221e) ? And how should
symbols be keyed in? Is there a standard mnemonic
way of using an ascii keyboard, something like typing
Japanese hiragana in some word processing programs?

I'm not sure about '!' since it already has some semantic
ties to negation and factorial and execution (not to mention
exclamation ;-) If !0 means infinity, what does !2 mean?

Just rambling ... ;-)

Regards,
Bengt Richter
 
K

Kay Schluehr

Bengt said:
range(9)[4:-!0:-1] == range(5)
True
Interesting, but wouldn't that last line be
range(9)[4:-!0:-1] == range(5)[::-1]

Ups. Yes of course.
Hm, is "!0" a di-graph symbol for infinity?
What if we get full unicode on our screens? Should
it be rendered with unichr(0x221e) ? And how should
symbols be keyed in? Is there a standard mnemonic
way of using an ascii keyboard, something like typing
Japanese hiragana in some word processing programs?

You can ask questions ;-)
I'm not sure about '!' since it already has some semantic
ties to negation and factorial and execution (not to mention
exclamation ;-) If !0 means infinity, what does !2 mean?

Just rambling ... ;-)

I'm not shure too. Probably Inf as a keyword is a much better choice.
The only std-library module I found that used Inf was Decimal where Inf
has the same meaning. Inf is quick to write ( just one more character
than !0 ) and easy to parse for human readers. Rewriting the above
statements/expressions leads to:
Traceback (...)
....
UndefinedValueTraceback (...)
....
UndefinedValue
-Inf -Inf
range(9)[4:Inf] == range(9)[4:] True
range(9)[4:-Inf:-1] == range(5)[::-1]
True

IMO it's still consice.

Kay
 
S

Stefan Rank

[snipped alot from others about indexing, slicing problems,
and the inadequacy of -1 as Not Found indicator]

The problem with negative index's are that positive index's are zero
based, but negative index's are 1 based. Which leads to a non
symmetrical situations.

Hear, hear.

This is, for me, the root of the problem.

But changing the whole of Python to the (more natural and consistent)
one-based indexing style, for indexing from left and right, is...
difficult.
 
F

Fredrik Lundh

Ron said:
The problem with negative index's are that positive index's are zero
based, but negative index's are 1 based. Which leads to a non
symmetrical situations.

indices point to the "gap" between items, not to the items themselves.

positive indices start from the left end, negative indices from the righept end.

straight indexing returns the item just to the right of the given gap (this is
what gives you the perceived assymmetry), slices return all items between
the given gaps.

</F>
 
T

Terry Reedy

Fredrik Lundh said:
[slice] indices point to the "gap" between items, not to the items
themselves.

positive indices start from the left end, negative indices from the
righept end.

straight indexing returns the item just to the right of the given gap
(this is
what gives you the perceived assymmetry), slices return all items between
the given gaps.

Well said. In some languages, straight indexing returns the item to the
left instead. The different between items and gaps in seen in old
terminals and older screens versus modern gui screens. Then, a cursur sat
on top of or under a character space. Now, a cursur sits between chars.

Terry J. Reedy
 
T

Terry Reedy

Stefan Rank said:
Hear, hear.

This is, for me, the root of the problem.

The root of the problem is the misunderstanding of slice indexes and the
symmetry-breaking desire to denote an interval of length 1 by 1 number
instead of 2. Someday, I may look at the tutorial to see if I can suggest
improvements. In the meanwhile, see Fredrik's reply and my supplement
thereto and the additional explanation below.
But changing the whole of Python to the (more natural and consistent)
one-based indexing style, for indexing from left and right, is...
difficult.

Consider a mathematical axis

|_|_|_|_|...
0 1 2 3 4

The numbers represent points separating unit intervals and representing the
total count of intervals from the left. Count 'up' to the right is
standard practice. Intervals of length n are denoted by 2 numbers, a:b,
where b-a = n.

Now consider the 'tick marks' to be gui cursor positions. Characters go in
the spaces *between* the cursor. (Fixed versus variable space
representations are irrelevant here.) More generally, one can put 'items'
or 'item indicators' in the spaces to form general sequences rather than
just char strings.

It seems convenient to indicate a single char or item with a single number
instead of two. We could use the average coordinate, n.5. But that is a
nuisance, and the one number representation is about convenience, so we
round down or up, depending on the language. Each choice has pluses and
minuses; Python rounds down.

The axis above and Python iterables are potentially unbounded. But actual
strings and sequences are finite and have a right end also. Python then
gives the option of counting 'down' from that right end and makes the count
negative, as is standard. (But it does not make the string/sequence
circular).

One can devise slight different sequence models, but the above is the one
used by Python. It is consistent and not buggy once understood. I hope
this clears some of the confusion seen in this thread.

Terry J. Reedy
 
R

Ron Adam

Fredrik said:
Ron Adam wrote:




indices point to the "gap" between items, not to the items themselves.

So how do I express a -0? Which should point to the gap after the last
item.

straight indexing returns the item just to the right of the given gap (this is
what gives you the perceived assymmetry), slices return all items between
the given gaps.


If this were symmetrical, then positive index's would return the value
to the right and negative index's would return the value to the left.

Have you looked at negative steps? They also are not symmetrical.

All of the following get the center 'd' from the string.

a = 'abcdefg'
print a[3] # d 4 gaps from beginning
print a[-4] # d 5 gaps from end
print a[3:4] # d
print a[-4:-3] # d
print a[-4:4] # d
print a[3:-3] # d
print a[3:2:-1] # d These are symetric?!
print a[-4:-5:-1] # d
print a[3:-5:-1] # d
print a[-4:2:-1] # d

This is why it confuses so many people. It's a shame too, because slice
objects could be so much more useful for indirectly accessing list
ranges. But I think this needs to be fixed first.

Cheers,
Ron
 
T

Terry Reedy

Ron Adam said:
So how do I express a -0?

You just did ;-) but I probably do not know what you mean.
Which should point to the gap after the last item.

The slice index of the gap after the last item is len(seq).
If this were symmetrical, then positive index's would return the value
to the right and negative index's would return the value to the left.

As I posted before (but perhaps it arrived after you sent this), one number
indexing rounds down, introducing a slight asymmetry.
Have you looked at negative steps? They also are not symmetrical.
???

All of the following get the center 'd' from the string.

a = 'abcdefg'
print a[3] # d 4 gaps from beginning
print a[-4] # d 5 gaps from end

It is 3 and 4 gaps *from* the left and right end to the left side of the
'd'. You can also see the asymmetry as coming from rounding 3.5 and -3.5
down to 3 and down to -4.
print a[3:4] # d
print a[-4:-3] # d

These are is symmetric, as we claimed.
print a[-4:4] # d

Here you count down past and up past the d.
print a[3:-3] # d

Here you count up to and down to the d. The count is one more when you
cross the d than when you do not. You do different actions, you get
different counts. I would not recommend mixing up and down counting to a
beginner, and not down and up counting to anyone who did not absolutely
have to.
print a[3:2:-1] # d These are symetric?!
print a[-4:-5:-1] # d
print a[3:-5:-1] # d
print a[-4:2:-1] # d

The pattern seems to be: left-gap-index : farther-to-left-index : -1 is
somehow equivalent to left:right, but I never paid much attention to
strides and don't know the full rule.

Stride slices are really a different subject from two-gap slicing. They
were introduced in the early years of Python specificly and only for
Numerical Python. The rules were those needed specificly for Numerical
Python arrays. They was made valid for general sequence use only a few
years ago. I would say that they are only for careful mid-level to expert
use by those who actually need them for their code.

Terry J. Reedy
 
P

Paul Rubin

Ron Adam said:
All of the following get the center 'd' from the string.

a = 'abcdefg'
print a[3] # d 4 gaps from beginning
print a[-4] # d 5 gaps from end
print a[3:4] # d
print a[-4:-3] # d
print a[-4:4] # d
print a[3:-3] # d
print a[3:2:-1] # d These are symetric?!
print a[-4:-5:-1] # d
print a[3:-5:-1] # d
print a[-4:2:-1] # d

+1 QOTW
 
F

Fredrik Lundh

Ron said:
So how do I express a -0? Which should point to the gap after the last
item.

that item doesn't exist when you're doing plain indexing, so being able
to express -0 would be pointless.

when you're doing slicing, you express it by leaving the value out, or by
using len(seq) or (in recent versions) None.
If this were symmetrical, then positive index's would return the value
to the right and negative index's would return the value to the left.

the gap addressing is symmetrical, but indexing always picks the item to
the right.
Have you looked at negative steps? They also are not symmetrical.
print a[3:2:-1] # d These are symetric?!

the gap addressing works as before, but to understand exactly what characters
you'll get, you have to realize that the slice is really a gap index generator. when
you use step=1, you can view slice as a "cut here and cut there, and return what's
in between". for other step sizes, you have to think in gap indexes (for which the
plain indexing rules apply).

and if you know range(), you already know how the indexes are generated for
various step sizes.

from the range documentation:

... returns a list of plain integers [start, start + step, start + 2 * step, ...].
If step is positive, the last element is the largest start + i * step less than
stop; if step is negative, the last element is the largest start + i * step
greater than stop.

or, in sequence terms (see http://docs.python.org/lib/typesseq.html )

(3) If i or j is negative, the index is relative to the end of the string: len(s) + i
or len(s) + j is substituted.

...

(5) The slice of s from i to j with step k is defined as the sequence of items
with index x = i + n*k for n in the range(0,(j-i)/k). In other words, the
indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached
(but never including j).

so in this case, you get
2 # which is your stop condition

so a[3:2:-1] is the same as a[3].
print a[-4:-5:-1] # d

same as a[-4]
print a[3:-5:-1] # d

now you're mixing addressing modes, which is a great way to confuse
yourself. if you normalize the gap indexes (rule 3 above), you'll get
a[3:2:-1] which is the same as your earlier example. you can use the
"indices" method to let Python do this for you:
[3]

print a[-4:2:-1] # d

same problem here; indices will tell you what that really means:
[3]

same example again, in other words. and same result.
This is why it confuses so many people. It's a shame too, because slice
objects could be so much more useful for indirectly accessing list
ranges. But I think this needs to be fixed first.

as everything else in Python, if you use the wrong mental model, things
may look "assymmetrical" or "confusing" or "inconsistent". if you look at
how things really work, it's usually extremely simple and more often than
not internally consistent (since the designers have the "big picture", and
knows what they're tried to be consistent with; when slice steps were
added, existing slicing rules and range() were the obvious references).

it's of course pretty common that people who didn't read the documentation
very carefully and therefore adopted the wrong model will insist that Python
uses a buggy implementation of their model, rather than a perfectly consistent
implementation of the actual model. slices with non-standard step sizes are
obviously one such thing, immutable/mutable objects and the exact behaviour
of for-else, while-else, and try-else are others. as usual, being able to reset
your brain is the only thing that helps.

</F>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,266
Latest member
DavidaAlla

Latest Threads

Top