Bug in slice type

S

Steve Holden

Bryan said:
Steve said:
Bryan said:
Antoon Pardon wrote:
It probably is too late now, but I always felt, find should
have returned None when the substring isn't found.

None is certainly a reasonable candidate. [...]
The really broken part is that unsuccessful searches return a
legal index.
We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

I don't think my proposal breaks historic Python code, and I
don't think it has the same kind of unfortunate subtle
consequences as the current indexing scheme. You may think the
wart is tiny, but the duct-tape* is available so let's cure it.

[*] http://www.google.com/search?as_q=warts+"duct+tape"

What I don't understand is why you want it to return something that
isn't a legal index.

In this case, so that errors are caught as close to their
occurrence as possible. I see no good reason for the following
to happily print 'y'.

s = 'buggy'
print s[s.find('w')]
Before using the result you always have to perform
a test to discriminate between the found and not found cases. So I don't
really see why this wart has put such a bug up your ass.

The bug that got me was what a slice object reports as the
'stop' bound when the step is negative and the slice includes
index 0. Took me hours to figure out why my code was failing.

The double-meaning of -1, as both an exclusive stopping bound
and an alias for the highest valid index, is just plain whacked.
Unfortunately, as negative indexes are currently handled, there
is no it-just-works value that slice could return.
If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

regards
Steve
 
S

Steve Holden

Torsten said:
Hallöchen!




Granted, try/except can be used for deliberate case discrimination
(which may even happen in the standard library in many places),
however, it is only the second most elegant method -- the most
elegant being "if". Where "if" does the job, it should be prefered
in my opinion.
Of course. But onc you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Which is what I've been trying to say all along.

regards
Steve
 
R

Robert Kern

Steve said:
Of course. But onc you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Sure there is. -1 is a valid index; None is not. -1 as a sentinel is
specific to str.find(); None is used all over Python as a sentinel.

If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was orginally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to traul through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I have already called "Not It."

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
P

Paul Rubin

Steve Holden said:
Of course. But onc you (sensibly) decide to use an "if" then there
really isn't much difference between -1, None, () and sys.maxint as
a sentinel value, is there?

Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.
 
P

Paul Rubin

Steve Holden said:
If you want an exception from your code when 'w' isn't in the string
you should consider using index() rather than find.

The idea is you expect w to be in the string. If w isn't in the
string, your code has a bug, and programs with bugs should fail as
early as possible so you can locate the bugs quickly and easily. That
is why, for example,

x = 'buggy'[None]

raises an exception instead of doing something stupid like returning 'g'.
 
T

Terry Reedy

Paul Rubin said:
Of course there is. -1 is (under Python's perverse semantics) a valid
subscript. sys.maxint is an artifact of Python's fixed-size int
datatype, which is fading away under int/long unification, so it's
something that soon won't exist and shouldn't be used. None and ()
are invalid subscripts so would be reasonable return values, unlike -1
and sys.maxint. Of those, None is preferable to () because of its
semantic connotations.

I agree here that None is importantly different from -1 for the reason
stated. The use of -1 is, I am sure, a holdover from statically typed
languages (C, in particular) that require all return values to be of the
same type, even if the 'return value' is actually meant to indicat that
there is no valid return value.

Terry J. Reedy
 
B

Bryan Olson

Steve said:
>> [...] I see no good reason for the following
>> to happily print 'y'.
>>
>> s = 'buggy'
>> print s[s.find('w')]
>>
>> > Before using the result you always have to perform
>> > a test to discriminate between the found and not found cases. So I >> don't
>> > really see why this wart has put such a bug up your ass.
>>
>> The bug that got me was what a slice object reports as the
>> 'stop' bound when the step is negative and the slice includes
>> index 0. Took me hours to figure out why my code was failing.
>>
>> The double-meaning of -1, as both an exclusive stopping bound
>> and an alias for the highest valid index, is just plain whacked.
>> Unfortunately, as negative indexes are currently handled, there
>> is no it-just-works value that slice could return.
>>
>>
> If you want an exception from your code when 'w' isn't in the string you
> should consider using index() rather than find.

That misses the point. The code is a hypothetical example of
what a novice or imperfect Pythoners might have to deal with.
The exception isn't really wanted; it's just vastly superior to
silently returning a nonsensical value.

> Otherwise, whatever find() returns you will have to have an "if" in
> there to handle the not-found case.
>
> This just sounds like whining to me. If you want to catch errors, use a
> function that will raise an exception rather than relying on the
> invalidity of the result.

I suppose if you ignore the real problems and the proposed
solution, it might sound a lot like whining.
 
S

Steve Holden

Terry said:
I agree here that None is importantly different from -1 for the reason
stated. The use of -1 is, I am sure, a holdover from statically typed
languages (C, in particular) that require all return values to be of the
same type, even if the 'return value' is actually meant to indicat that
there is no valid return value.

While I agree that it would have been more sensible to choose None in
find()'s original design, there's really no reason to go breaking
existing code just to fix it.

Guido has already agreed that find() can change (or even disappear) in
Python 3.0, so please let's just leave things as they are for now.

A corrected find() that returns None on failure is a five-liner.

regards
Steve
 
S

Steve Holden

Paul said:
Steve Holden said:
If you want an exception from your code when 'w' isn't in the string
you should consider using index() rather than find.


The idea is you expect w to be in the string. If w isn't in the
string, your code has a bug, and programs with bugs should fail as
early as possible so you can locate the bugs quickly and easily. That
is why, for example,

x = 'buggy'[None]

raises an exception instead of doing something stupid like returning 'g'.

You did read the sentence you were replying to, didn't you?

regards
Steve
 
P

Paul Rubin

Steve Holden said:
A corrected find() that returns None on failure is a five-liner.

If I wanted to write five lines instead of one everywhere in a Python
program, I'd use Java.
 
S

skip

Paul> If I wanted to write five lines instead of one everywhere in a
Paul> Python program, I'd use Java.

+1 for QOTW.

Skip
 
S

Steve Holden

Paul said:
If I wanted to write five lines instead of one everywhere in a Python
program, I'd use Java.

We are arguing about trivialities here. Let's stop before it gets
interesting :)

regards
Steve
 
B

Bryan Olson

Steve said:
> Paul Rubin wrote:
> We are arguing about trivialities here. Let's stop before it gets
> interesting :)

Some of us are looking beyond the trivia of what string.find()
should return, at an unfortunate interaction of Python features,
brought on by the special-casing of negative indexes. The wart
bites novice or imperfect Python programmers in simple cases
such as string.find(), or when their subscripts accidentally
fall off the low end. It bites programmers who want to fully
implement Python slicing, because of the double-and-
contradictory- interpretation of -1, as both an exclusive ending
bound and the index of the last element. It bites documentation
authors who naturally think of the non-negative subscript as
*the* index of a sequence item.
 
B

bearophileHUGS

I agree with Bryan Olson.
I think it's a kind of bug, and it has to be fixed, like few other
things.

But I understand that this change can give little problems to the
already written code...

Bye,
bearophile
 
S

Steve Holden

Bryan said:
Some of us are looking beyond the trivia of what string.find()
should return, at an unfortunate interaction of Python features,
brought on by the special-casing of negative indexes. The wart
bites novice or imperfect Python programmers in simple cases
such as string.find(), or when their subscripts accidentally
fall off the low end. It bites programmers who want to fully
implement Python slicing, because of the double-and-
contradictory- interpretation of -1, as both an exclusive ending
bound and the index of the last element. It bites documentation
authors who naturally think of the non-negative subscript as
*the* index of a sequence item.
Sure. I wrote two days ago:
We might agree, before further discussion, that this isn't the most
elegant part of Python's design, and it's down to history that this tiny
little wart remains.

While I agree it's a trap for the unwary I still don't regard it as a
major wart. But I'm all in favor of discussions to make 3.0 a better
language.

regards
Steve
 
M

Magnus Lycka

Robert said:
If I may digress for a bit, my advisor is currently working on a project
that is processing seafloor depth datasets starting from a few decades
ago. A lot of this data was orginally to be processed using FORTRAN
software, so in the idiom of much FORTRAN software from those days, 9999
is often used to mark missing data. Unfortunately, 9999 is a perfectly
valid datum in most of the unit systems used by the various datasets.

Now he has to find a grad student to traul through the datasets and
clean up the really invalid 9999's (as well as other such fun tasks like
deciding if a dataset that says it's using feet is actually using meters).

I'm afraid this didn't end with FORTRAN. It's not that long ago
that I wrote a program for my wife that combined a data editor
with a graph display, so that she could clean up time lines with
length and weight data for children (from an international research
project performed during the 90's). 99cm is not unreasonable as a
length, but if you see it in a graph with other length measurements,
it's easy to spot most of the false ones, just as mistyped year part
in a date (common in the beginning of a new year).

Perhaps graphics can help this grad student too? It's certainly much
easier to spot deviations in curves than in an endless line of
numbers if the curves would normally be reasonably smooth.
 
A

Antoon Pardon

Op 2005-08-27 said:
Well, Guido no longer thinks it worth having and emphatically agreed that
it should be added to one of the 'To be removed' sections of PEP 3000.

I think a properly implented find is better than an index.

If we only have index, Then asking for permission is no longer a
possibility. If we have a find that returns None, we can either
ask permission before we index or be forgiven by the exception
that is raised.
 
A

Antoon Pardon

Op 2005-08-27 said:
If you want an exception from your code when 'w' isn't in the string you
should consider using index() rather than find.

Sometimes it is convenient to have the exception thrown at a later
time.
Otherwise, whatever find() returns you will have to have an "if" in
there to handle the not-found case.

And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.
This just sounds like whining to me. If you want to catch errors, use a
function that will raise an exception rather than relying on the
invalidity of the result.

You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.

So what happens if you have a module that is collecting string-index
pair, colleted from various other parts. In one part you
want to select the last letter, so you pythonically choose -1 as
index. In an other part you get a result of find and are happy
with -1 as an indictation for an invalid index. Then these
data meet.
 
R

Robert Kern

Magnus said:
I'm afraid this didn't end with FORTRAN. It's not that long ago
that I wrote a program for my wife that combined a data editor
with a graph display, so that she could clean up time lines with
length and weight data for children (from an international research
project performed during the 90's). 99cm is not unreasonable as a
length, but if you see it in a graph with other length measurements,
it's easy to spot most of the false ones, just as mistyped year part
in a date (common in the beginning of a new year).

Perhaps graphics can help this grad student too? It's certainly much
easier to spot deviations in curves than in an endless line of
numbers if the curves would normally be reasonably smooth.

Yes! In fact, that was the context of the discussion when my advisor
told me about this project. Another student had written an interactive
GUI for exploring bathymetry maps. My advisor: "That kind of thing would
be really great for this new project, etc. etc."

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,120
Latest member
ShelaWalli
Top