Bug in slice type

Steve Holden · Aug 27, 2005

Bryan said:
Steve said:

Bryan said:

Antoon Pardon wrote:
It probably is too late now, but I always felt, find should
have returned None when the substring isn't found.

None is certainly a reasonable candidate. [...]
The really broken part is that unsuccessful searches return a
legal index.

We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.

[*] http://www.google.com/search?as_q=warts+"duct+tape"

What I don't understand is why you want it to return something that
isn't a legal index.

In this case, so that errors are caught as close to their
occurrence as possible. I see no good reason for the following
to happily print 'y'.

s = 'buggy'
print s[s.find('w')]

Before using the result you always have to perform
a test to discriminate between the found and not found cases. So I don't
really see why this wart has put such a bug up your ass.

The bug that got me was what a slice object reports as the
'stop' bound when the step is negative and the slice includes
index 0. Took me hours to figure out why my code was failing.

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.
Unfortunately, as negative indexes are currently handled, there
is no it-just-works value that slice could return.

If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

regards
Steve

Steve Holden · Aug 27, 2005

Torsten said:
Hallöchen!

Granted, try/except can be used for deliberate case discrimination
(which may even happen in the standard library in many places),
however, it is only the second most elegant method -- the most
elegant being "if". Where "if" does the job, it should be preferred
in my opinion.

Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Which is what I've been trying to say all along.

regards
Steve

Robert Kern · Aug 27, 2005

Steve said:
Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Sure there is. -1 is a valid index; None is not. -1 as a sentinel is
specific to str.find(); None is used all over Python as a sentinel.

If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was originally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to trawl through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I have already called "Not It."

--
Robert Kern

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Paul Rubin · Aug 27, 2005

Steve Holden said:
Of course. But once you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.

Paul Rubin · Aug 27, 2005

Steve Holden said:
If you want an exception from your code when 'w' isn't in the string
you should consider using index() rather than find.

The idea is you expect w to be in the string. If w isn't in the
string, your code has a bug, and programs with bugs should fail as
early as possible so you can locate the bugs quickly and easily. That
is why, for example,

x = 'buggy'[None]

raises an exception instead of doing something stupid like returning 'g'.
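
The contrast being drawn here can be checked directly. The snippet below is a minimal sketch written for Python 3 (the thread itself predates it, hence print() as a function); the variable names are only illustrative.

```python
s = 'buggy'

# find() signals "not found" with -1, which is itself a valid index.
pos = s.find('w')
silent = s[pos]            # no error: -1 selects the last character, 'y'

# index() raises ValueError at the point of the failed search.
try:
    s.index('w')
    index_raised = False
except ValueError:
    index_raised = True

# None is not a valid subscript, so using it as one fails loudly.
try:
    s[None]
    none_raised = False
except TypeError:
    none_raised = True

print(pos, silent, index_raised, none_raised)   # -1 y True True
```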

Terry Reedy · Aug 27, 2005

Paul Rubin said:
Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.

I agree here that None is importantly different from -1 for the reason
stated. The use of -1 is, I am sure, a holdover from statically typed
languages (C, in particular) that require all return values to be of the
same type, even if the 'return value' is actually meant to indicate that
there is no valid return value.

Terry J. Reedy

Bryan Olson · Aug 27, 2005

Steve said:
>> [...] I see no good reason for the following
>> to happily print 'y'.
>>
>> s = 'buggy'
>> print s[s.find('w')]
>>

>> > Before using the result you always have to perform
>> > a test to discriminate between the found and not found cases. So I don't
>> > really see why this wart has put such a bug up your ass.

>>
>> The bug that got me was what a slice object reports as the
>> 'stop' bound when the step is negative and the slice includes
>> index 0. Took me hours to figure out why my code was failing.
>>
>> The double-meaning of -1, as both an exclusive stopping bound
>> and an alias for the highest valid index, is just plain whacked.
>> Unfortunately, as negative indexes are currently handled, there
>> is no it-just-works value that slice could return.
>>
>>

> If you want an exception from your code when 'w' isn't in the string you
> should consider using index() rather than find.

That misses the point. The code is a hypothetical example of
what a novice or imperfect Pythoners might have to deal with.
The exception isn't really wanted; it's just vastly superior to
silently returning a nonsensical value.

> Otherwise, whatever find() returns you will have to have an "if" in
> there to handle the not-found case.
>
> This just sounds like whining to me. If you want to catch errors, use a
> function that will raise an exception rather than relying on the
> invalidity of the result.

I suppose if you ignore the real problems and the proposed
solution, it might sound a lot like whining.

Steve Holden · Aug 27, 2005

Terry said:
I agree here that None is importantly different from -1 for the reason
stated. The use of -1 is, I am sure, a holdover from statically typed
languages (C, in particular) that require all return values to be of the
same type, even if the 'return value' is actually meant to indicate that
there is no valid return value.

While I agree that it would have been more sensible to choose None in
find()'s original design, there's really no reason to go breaking
existing code just to fix it.

Guido has already agreed that find() can change (or even disappear) in
Python 3.0, so please let's just leave things as they are for now.

A corrected find() that returns None on failure is a five-liner.

regards
Steve
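
For reference, a wrapper of the kind mentioned above might look like the following. This is only a sketch, not code from the thread: the name find_or_none and its signature are hypothetical, and it simply delegates to str.find().

```python
def find_or_none(s, sub, start=0, end=None):
    """Like str.find(), but return None instead of -1 when sub is absent."""
    if end is None:
        end = len(s)
    pos = s.find(sub, start, end)
    return None if pos == -1 else pos


print(find_or_none('buggy', 'g'))   # 2
print(find_or_none('buggy', 'w'))   # None
```

With this helper, a forgotten "if" shows up as a TypeError when the result is used as a subscript, instead of silently selecting the last character.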

Steve Holden · Aug 27, 2005

Paul said:
Steve Holden said:

If you want an exception from your code when 'w' isn't in the string
you should consider using index() rather than find.

The idea is you expect w to be in the string. If w isn't in the
string, your code has a bug, and programs with bugs should fail as
early as possible so you can locate the bugs quickly and easily. That
is why, for example,

x = 'buggy'[None]

raises an exception instead of doing something stupid like returning 'g'.

You did read the sentence you were replying to, didn't you?

regards
Steve

Paul Rubin · Aug 27, 2005

Steve Holden said:
A corrected find() that returns None on failure is a five-liner.

If I wanted to write five lines instead of one everywhere in a Python
program, I'd use Java.

skip · Aug 27, 2005

Paul> If I wanted to write five lines instead of one everywhere in a
Paul> Python program, I'd use Java.

+1 for QOTW.

Skip

Steve Holden · Aug 27, 2005

Paul said:
If I wanted to write five lines instead of one everywhere in a Python
program, I'd use Java.

We are arguing about trivialities here. Let's stop before it gets
interesting.

regards
Steve

Bryan Olson · Aug 28, 2005

Steve said:
> Paul Rubin wrote:
> We are arguing about trivialities here. Let's stop before it gets
> interesting.

Some of us are looking beyond the trivia of what string.find()
should return, at an unfortunate interaction of Python features,
brought on by the special-casing of negative indexes. The wart
bites novice or imperfect Python programmers in simple cases
such as string.find(), or when their subscripts accidentally
fall off the low end. It bites programmers who want to fully
implement Python slicing, because of the double-and-
contradictory- interpretation of -1, as both an exclusive ending
bound and the index of the last element. It bites documentation
authors who naturally think of the non-negative subscript as
*the* index of a sequence item.
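
The double interpretation of -1 can be reproduced in a few lines. The sketch below assumes the 'stop' bound in question is the one reported by slice.indices(), which the thread does not state explicitly; it is written for Python 3.

```python
s = 'buggy'
sl = slice(None, None, -1)           # the slice object behind s[::-1]

start, stop, step = sl.indices(len(s))
print(start, stop, step)             # 4 -1 -1

# As range() arguments the triple is right: -1 is an exclusive lower bound.
via_range = ''.join(s[i] for i in range(start, stop, step))

# Fed back into slice syntax, -1 means "the last index", so nothing is selected.
via_slice = s[start:stop:step]

print(repr(s[sl]), repr(via_range), repr(via_slice))   # 'yggub' 'yggub' ''
```

So the reported stop value is correct for range() but cannot be reused as a subscript, because no integer stop value means "one before index 0" in slice notation.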

bearophileHUGS · Aug 28, 2005

I agree with Bryan Olson.
I think it's a kind of bug, and it has to be fixed, like a few other
things.

But I understand that this change could cause some problems for
already-written code...

Bye,
bearophile

Steve Holden · Aug 28, 2005

Bryan said:
Some of us are looking beyond the trivia of what string.find()
should return, at an unfortunate interaction of Python features,
brought on by the special-casing of negative indexes. The wart
bites novice or imperfect Python programmers in simple cases
such as string.find(), or when their subscripts accidentally
fall off the low end. It bites programmers who want to fully
implement Python slicing, because of the double-and-
contradictory- interpretation of -1, as both an exclusive ending
bound and the index of the last element. It bites documentation
authors who naturally think of the non-negative subscript as
*the* index of a sequence item.

Sure. I wrote two days ago:

We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

While I agree it's a trap for the unwary I still don't regard it as a
major wart. But I'm all in favor of discussions to make 3.0 a better
language.

regards
Steve

Magnus Lycka · Aug 29, 2005

Robert said:
If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was originally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to trawl through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I'm afraid this didn't end with FORTRAN. It's not that long ago
that I wrote a program for my wife that combined a data editor
with a graph display, so that she could clean up time lines with
length and weight data for children (from an international research
project performed during the 90's). 99cm is not unreasonable as a
length, but if you see it in a graph with other length measurements,
it's easy to spot most of the false ones, just like a mistyped year part
in a date (common in the beginning of a new year).

Perhaps graphics can help this grad student too? It's certainly much
easier to spot deviations in curves than in an endless line of
numbers if the curves would normally be reasonably smooth.

Antoon Pardon · Aug 29, 2005

Op 2005-08-27 said:
Well, Guido no longer thinks it worth having and emphatically agreed that
it should be added to one of the 'To be removed' sections of PEP 3000.

I think a properly implemented find is better than an index.

If we only have index, then asking for permission is no longer a
possibility. If we have a find that returns None, we can either
ask permission before we index or be forgiven by the exception
that is raised.

Antoon Pardon · Aug 29, 2005

On 2005-08-27, Steve Holden said:
If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Sometimes it is convenient to have the exception thrown at a later
time.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.

So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
index. In another part you get a result of find and are happy
with -1 as an indication of an invalid index. Then these
data meet.
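
A small illustration of this scenario, as a sketch only: the list of pairs and the two "callers" are hypothetical stand-ins for separate parts of a program.

```python
# Hypothetical collector of (string, index) pairs fed by two unrelated callers.
pairs = []

# Caller 1 deliberately asks for the last character.
pairs.append(('spam', -1))

# Caller 2 stores the result of a search that failed.
pairs.append(('spam', 'spam'.find('z')))

# Downstream, the two cases can no longer be told apart.
print(pairs[0] == pairs[1])            # True
print([s[i] for s, i in pairs])        # ['m', 'm']
```

Had the failed search produced None, the second pair would be distinguishable from the first, and indexing with it would raise a TypeError.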

Robert Kern · Aug 29, 2005

Magnus said:
I'm afraid this didn't end with FORTRAN. It's not that long ago
that I wrote a program for my wife that combined a data editor
with a graph display, so that she could clean up time lines with
length and weight data for children (from an international research
project performed during the 90's). 99cm is not unreasonable as a
length, but if you see it in a graph with other length measurements,
it's easy to spot most of the false ones, just like a mistyped year part
in a date (common in the beginning of a new year).

Perhaps graphics can help this grad student too? It's certainly much
easier to spot deviations in curves than in an endless line of
numbers if the curves would normally be reasonably smooth.

Yes! In fact, that was the context of the discussion when my advisor
told me about this project. Another student had written an interactive
GUI for exploring bathymetry maps. My advisor: "That kind of thing would
be really great for this new project, etc. etc."

--
Robert Kern

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Steven Bethard · Aug 29, 2005

Antoon said:
I think a properly implemented find is better than an index.

See the current thread in python-dev[1], which proposes a new method,
str.partition(). I believe that Raymond Hettinger has shown that almost
all uses of str.find() can be more clearly represented with his
proposed function.

STeVe

[1]http://mail.python.org/pipermail/python-dev/2005-August/055781.html
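
To make the comparison concrete, here is a sketch using str.partition() as it later shipped in Python (2.5 onward); the exact form under discussion in the python-dev thread may have differed. The 'key=value' input is just an illustrative example.

```python
line = 'key=value'

# find()-based split: needs an explicit test against the -1 sentinel.
pos = line.find('=')
if pos != -1:
    key, value = line[:pos], line[pos + 1:]
else:
    key, value = line, ''

# partition()-based split: always returns a 3-tuple, with no index to misuse.
head, sep, tail = line.partition('=')
missing = 'novalue'.partition('=')

print((key, value))        # ('key', 'value')
print((head, sep, tail))   # ('key', '=', 'value')
print(missing)             # ('novalue', '', '')
```

The "not found" case is signalled by an empty separator in the result, so no index ever has to be checked before use.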
