Bug in slice type

B

Bryan Olson

The doc for the find() method of string objects, which is
essentially the same as the string.find() function, states:

find(sub[, start[, end]])
Return the lowest index in the string where substring sub
is found, such that sub is contained in the range [start,
end). Optional arguments start and end are interpreted as
in slice notation. Return -1 if sub is not found.

Consider:

print 'Hello'.find('o')

or:

import string
print string.find('Hello', 'o')

The substring 'o' is found in 'Hello' at the index -1, and at
the index 4, and it is not found at any other index. Both the
locations found are in the range [start, end), and obviously -1
is less than 4, so according to the documentation, find() should
return -1.

What the either of the above actually prints is:

4

which shows yet another bug resulting from Python's handling of
negative indexes. This one is clearly a documentation error, but
the real fix is to cure the wart so that Python's behavior is
consistent enough that we'll be able to describe it correctly.
 
S

Steve Holden

Bryan said:
The doc for the find() method of string objects, which is
essentially the same as the string.find() function, states:

find(sub[, start[, end]])
Return the lowest index in the string where substring sub
is found, such that sub is contained in the range [start,
end). Optional arguments start and end are interpreted as
in slice notation. Return -1 if sub is not found.

Consider:

print 'Hello'.find('o')

or:

import string
print string.find('Hello', 'o')

The substring 'o' is found in 'Hello' at the index -1, and at
the index 4, and it is not found at any other index. Both the
locations found are in the range [start, end), and obviously -1
is less than 4, so according to the documentation, find() should
return -1.

What the either of the above actually prints is:

4

which shows yet another bug resulting from Python's handling of
negative indexes. This one is clearly a documentation error, but
the real fix is to cure the wart so that Python's behavior is
consistent enough that we'll be able to describe it correctly.
Do you just go round looking for trouble?

As far as position reporting goes, it seems pretty clear that find()
will always report positive index values. In a five-character string
then -1 and 4 are effectively equivalent.

What on earth makes you call this a bug? And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

regards
Steve
 
C

Casey Hawthorne

contained in the range [start, end)

Does range(start, end) generate negative integers in Python if start
 
G

Guest

What on earth makes you call this a bug? And what are you proposing that
find() should return if the substring isn't found at all? please don't
suggest it should raise an exception, as index() exists to provide that
functionality.

Returning -1 looks like C-ism for me. It could better return None when none
is found.

index = "Hello".find("z")
if index is not None:
# ...

Now it's too late for it, I know.
 
P

Paul Rubin

Steve Holden said:
As far as position reporting goes, it seems pretty clear that find()
will always report positive index values. In a five-character string
then -1 and 4 are effectively equivalent.

What on earth makes you call this a bug? And what are you proposing
that find() should return if the substring isn't found at all? please
don't suggest it should raise an exception, as index() exists to
provide that functionality.

Bryan is making the case that Python's use of negative subscripts to
measure from the end of sequences is bogus, and that it should be done
some other way instead. I've certainly had bugs in my own programs
related to that "feature".
 
B

Bryan Olson

Steve Holden asked:
> Do you just go round looking for trouble?

In the course of programming, yes, absolutly.
> As far as position reporting goes, it seems pretty clear that find()
> will always report positive index values. In a five-character string
> then -1 and 4 are effectively equivalent.
>
> What on earth makes you call this a bug?

What you just said, versus what the doc says.
> And what are you proposing that
> find() should return if the substring isn't found at all? please don't
> suggest it should raise an exception, as index() exists to provide that
> functionality.

There are a number of good options. A legal index is not one of
them.
 
A

Antoon Pardon

Op 2005-08-25 said:
Steve Holden asked:

In the course of programming, yes, absolutly.


What you just said, versus what the doc says.


There are a number of good options. A legal index is not one of
them.

IMO, with find a number of "features" of python come together.
that create an awkward situation.

1) 0 is a false value, but indexes start at 0 so you can't
return 0 to indicate nothing was found.

2) -1 is returned, which is both a true value and a legal
index.


It probably is too late now, but I always felt, find should
have returned None when the substring isn't found.
 
B

Bryan Olson

Antoon said:
> Bryan Olson schreef:
>
>
> IMO, with find a number of "features" of python come together.
> that create an awkward situation.
>
> 1) 0 is a false value, but indexes start at 0 so you can't
> return 0 to indicate nothing was found.
>
> 2) -1 is returned, which is both a true value and a legal
> index.
>
> It probably is too late now, but I always felt, find should
> have returned None when the substring isn't found.

None is certainly a reasonable candidate. The one-past-the-end
value, len(sequence), would be fine, and follows the preferred
idiom of C/C++. I don't see any elegant way to arrange for
successful finds always to return a true value and unsuccessful
calls to return a false value.

The really broken part is that unsuccessful searches return a
legal index.

My suggestion doesn't change what find() returns, and doesn't
break code. Negative one is a reasonable choice to represent an
unsuccessful search -- provided it is not a legal index. Instead
of changing what find() returns, we should heal the
special-case-when-index-is-negative-in-a-certain-range wart.
 
R

Rick Wotnaz

Steve Holden asked:

In the course of programming, yes, absolutly.


What you just said, versus what the doc says.


There are a number of good options. A legal index is not one of
them.

Practically speaking, what difference would it make? Supposing find
returned None for not-found. How would you use it in your code that
would make it superior to what happens now? In either case you
would have to test for the not-found state before relying on the
index returned, wouldn't you? Or do you have a use that would
eliminate that step?
 
S

Steve Holden

Bryan said:
None is certainly a reasonable candidate. The one-past-the-end
value, len(sequence), would be fine, and follows the preferred
idiom of C/C++. I don't see any elegant way to arrange for
successful finds always to return a true value and unsuccessful
calls to return a false value.

The really broken part is that unsuccessful searches return a
legal index.
We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.
My suggestion doesn't change what find() returns, and doesn't
break code. Negative one is a reasonable choice to represent an
unsuccessful search -- provided it is not a legal index. Instead
of changing what find() returns, we should heal the
special-case-when-index-is-negative-in-a-certain-range wart.
What I don't understand is why you want it to return something that
isn't a legal index. Before using the result you always have to perform
a test to discriminate between the found and not found cases. So I don't
really see why this wart has put such a bug up your ass.

regards
Steve
 
B

Bryan Olson

Steve said:
>> Antoon Pardon wrote:
>> > It probably is too late now, but I always felt, find should
>> > have returned None when the substring isn't found.
>>
>> None is certainly a reasonable candidate. [...]
>> The really broken part is that unsuccessful searches return a
>> legal index.
>>
> We might agree, before further discussion, that this isn't the most
> elegant part of Python's design, and it's down to history that this tiny
> little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.

[*] http://www.google.com/search?as_q=warts+"duct+tape"

> What I don't understand is why you want it to return something that
> isn't a legal index.

In this case, so that errors are caught as close to their
occurrence as possible. I see no good reason for the following
to happily print 'y'.

s = 'buggy'
print s[s.find('w')]
> Before using the result you always have to perform
> a test to discriminate between the found and not found cases. So I don't
> really see why this wart has put such a bug up your ass.

The bug that got me was what a slice object reports as the
'stop' bound when the step is negative and the slice includes
index 0. Took me hours to figure out why my code was failing.

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.
Unfortunately, as negative indexes are currently handled, there
is no it-just-works value that slice could return.
 
R

Reinhold Birkenfeld

Bryan said:
Steve said:
Bryan said:
Antoon Pardon wrote:
It probably is too late now, but I always felt, find should
have returned None when the substring isn't found.

None is certainly a reasonable candidate. [...]
The really broken part is that unsuccessful searches return a
legal index.
We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.

[*] http://www.google.com/search?as_q=warts+"duct+tape"

Well, nobody stops you from posting this on python-dev and be screamed
at by Guido...

just-kidding-ly
Reinhold
 
T

Terry Reedy

Bryan Olson said:
The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.

I agree in this sense: the use of any int as an error return is an
unPythonic *nix-Cism, which I believe was copied therefrom. Str.find is
redundant with the Pythonic exception-raising str.index and I think it
should be removed in Py3.

Therefore, I think changing it now is untimely and changing the language
because of it backwards.

Terry J. Reedy
 
P

Paul Rubin

Terry Reedy said:
I agree in this sense: the use of any int as an error return is an
unPythonic *nix-Cism, which I believe was copied therefrom. Str.find is
redundant with the Pythonic exception-raising str.index and I think it
should be removed in Py3.

I like having it available so you don't have to clutter your code with
try/except if the substring isn't there. But it should not return a
valid integer index.
 
T

Terry Reedy

Paul Rubin said:
I like having it available so you don't have to clutter your code with
try/except if the substring isn't there. But it should not return a
valid integer index.

The try/except pattern is a pretty basic part of Python's design. One
could say the same about clutter for *every* function or method that raises
an exception on invalid input. Should more or even all be duplicated? Why
just this one?




Terry J. Reedy
 
T

Torsten Bronger

Hallöchen!

Terry Reedy said:
The try/except pattern is a pretty basic part of Python's design.
One could say the same about clutter for *every* function or
method that raises an exception on invalid input. Should more or
even all be duplicated? Why just this one?

Granted, try/except can be used for deliberate case discrimination
(which may even happen in the standard library in many places),
however, it is only the second most elegant method -- the most
elegant being "if". Where "if" does the job, it should be prefered
in my opinion.

Tschö,
Torsten.
 
P

Paul Rubin

Terry Reedy said:
The try/except pattern is a pretty basic part of Python's design. One
could say the same about clutter for *every* function or method that raises
an exception on invalid input. Should more or even all be duplicated? Why
just this one?

Someone must have thought str.find was worth having, or else it
wouldn't be in the library.
 
R

Raymond Hettinger

Bryan said:
The conclusion is inescapable: Python's handling of negative
subscripts is a wart. Indexing from the high end is too useful
to give up, but it should be specified by the slicing/indexing
operation, not by the value of the index expression.


PPEP (Proposed Python Enhancement Proposal): New-Style Indexing

Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

<klingon>
Bah!
</klingon>

The pythonic way to handle negative slicing is to use reversed(). The
principle is that the mind more easily handles this in two steps,
specifying the range a forward direction, and then reversing it.

IOW, it is easier to identify the included elements and see the
direction of:

reversed(xrange(1, 20, 2))

than it is for:

xrange(19, -1, -2)

See PEP 322 for discussion and examples:
http://www.python.org/peps/pep-0322.html



Raymond
 
T

Terry Reedy

Paul Rubin said:
Someone must have thought str.find was worth having, or else it
wouldn't be in the library.

Well, Guido no longer thinks it worth having and emphatically agreed that
it should be added to one of the 'To be removed' sections of PEP 3000.

Terry J. Reedy
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,120
Latest member
ShelaWalli
Top