Can't get around "IndexError: list index out of range"


Steven D'Aprano

More often and easier to implement than dict.has_key / get?

No, *less* often. That's the point -- it is fairly common for people to
want dictionary lookup to return a default value, but quite rare for them
to want sequence lookup to return a default value. A sequence with a
default value would be, in some sense, equivalent to an infinite list:

[A, B, ... , Z, default, default, default, ... ]

where the A...Z are actual values. (Obviously you don't actually have an
infinite list.)

I can't think of any practical use for such a thing; the only thing that
comes close is in some Cellular Automata it is sometimes useful to imagine
cells out of range to be always in some default state. But generally
speaking, you don't implement Cellular Automata with lists.


Uh, no. KeyError.

dict.get() doesn't raise KeyError. That's the whole point of get(), it
returns a default value instead of raising KeyError.

Agreed. but why implement a certain functionality in one place but leave
it out of another?

Because implementation requires time and effort. It makes the regression
tests more complicated and bloats the code base. If the feature doesn't
scratch anybody's itch, if there isn't a clear use for it, it won't be
implemented. If you don't care enough to even make a formal feature
request, let alone a PEP, then why should people who care even less
actually write the code?
 

MonkeeSage

No, *less* often. That's the point -- it is fairly common for people to
want dictionary lookup to return a default value, but quite rare for them
to want sequence lookup to return a default value. A sequence with a
default value would be, in some sense, equivalent to an infinite list:

Ah, yes. Infinite dictionaries are much better! I guess you could think
of it like providing an infinitely indexed list (or infinitely keyed
dict), but a better way to think of it is as providing a
non-terminating exception where the default value is the exceptional
case. And I don't see why it would be so rare to do something like:

if sys.argv.get(1): ...

With list.has_index() / get(), the following (pretty common I think)
idiom:

try:
    data = some_unknown_seq[2]
except IndexError:
    data = None
if data: ...

Can become:

data = some_unknown_seq.get(2)
if data: ...
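
For readers who want to experiment with the idea, the proposed methods can be approximated with plain functions. This is only a sketch, written for Python 3: `has_index` and `seq_get` are hypothetical names that exist in no Python release, and accepting negative indexes the way ordinary indexing does is an assumption rather than part of the proposal.

```python
def has_index(seq, idx):
    """Return True if seq[idx] would succeed (hypothetical helper)."""
    # Valid indexes run from -len(seq) up to len(seq) - 1.
    return -len(seq) <= idx < len(seq)


def seq_get(seq, idx, default=None):
    """Return seq[idx], or default when idx is out of range (hypothetical helper)."""
    try:
        return seq[idx]
    except IndexError:
        return default


args = ["prog.py"]                    # stand-in for sys.argv with no arguments
print(has_index(args, 1))             # False
print(seq_get(args, 1))               # None
print(seq_get(args, 0))               # prog.py
print(seq_get(args, 5, "fallback"))   # fallback
```

Note that, as with dict.get(), a stored value that happens to be falsy (0, '', None) cannot be told apart from a missing one by testing the result for truth.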

Perhaps list.get() wouldn't be used as often as dict.get(), but it
would be used a fair amount I think. Looking at the standard library
(2.5 source), I find 30 places where "except IndexError" appears, and
two places where a comment says that some method "raises IndexError" on
some condition. I haven't looked at the context of them, but I'd wager
that many of them would benefit from list.has_index() and / or get().
Here is my script to search the libs:

import os, re
found = {}
for path, dirs, files in os.walk('./Lib'):
    for afile in files:
        afile = open(os.path.join(path, afile))
        lines = afile.readlines()
        afile.close()
        for line in lines:
            match = re.search(r'((except|raises) IndexError)', line)
            if match:
                # keyed by file name, so each file is recorded once
                # (a later match in the same file overwrites an earlier one)
                found[afile.name] = match.group(1)
for item in found.items():
    print '%s (%s)' % item
print 'Found %d matches' % len(found)

dict.get() doesn't raise KeyError. That's the whole point of get(), it
returns a default value instead of raising KeyError.

Right. Exactly. Accessing a non-existent key raises a KeyError, but
dict.get() short-circuits the exception and gives you a default value
(which is None unless explicitly changed). So instead of trying the key
and catching a KeyError, you can use a simple conditional and ask if
d.has_key(key), or assign d.get(key) and test the assignee. So, why
isn't there a list.has_index() / get()?
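
The dict behaviour described above can be checked directly. A minimal sketch in Python 3, where has_key() no longer exists and the `in` operator is the only spelling of the membership test; the dict contents are made up for illustration.

```python
d = {"a": 1}   # illustrative data

# Direct access to a missing key raises KeyError.
try:
    d["b"]
except KeyError:
    missing_raised = True
else:
    missing_raised = False

print(missing_raised)    # True
print(d.get("b"))        # None: the default when none is given
print(d.get("b", 0))     # 0: an explicit default
print("b" in d)          # False: membership test, no exception involved
```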
If you don't care enough to even make a formal feature
request, let alone a PEP, then why should people who care even less
actually write the code?

I'm thinking about it. I just wanted to see if anyone knew of, or could
come up with, a good reason why it isn't / shouldn't be there.
Apparently not (at least not one that doesn't also bite the dict
convenience methods), so I'll probably go ahead and make a feature
request in the next few days.

Regards,
Jordan
 

Fredrik Lundh

MonkeeSage said:
With list.has_index() / get(), the following (pretty common I think)
idiom:

try:
    data = some_unknown_seq[2]
except IndexError:
    data = None
if data: ...

umm. you could at least write:

try:
    data = some_unknown_seq[2]
except IndexError:
    pass
else:
    ... deal with data ...

but "let's hypergeneralize and treat sequences and mappings as the same
thing" proposals are nothing new; a trip to the archives might be
helpful.

</F>
 

MonkeeSage

but "let's hypergeneralize and treat sequences and mappings as the same
thing" proposals are nothing new; a trip to the archives might be
helpful.

Huh? I don't want to treat sequences and mappings as the same thing.
I'm talking about adding two similar convenience methods for sequences
as already exist for mappings. That may make the two APIs closer, but
that's not necessarily a bad thing (think the square-bracket accessor).
Besides, has_index is sufficiently different already. If it's really a
problem, change get() to at() for sequences. seq.at(2).

So far the reasons offered against adding those convenience methods
are:

Reason: It's unnecessary / bloat.
- Then the same thing is true of the dict methods.

Reason: It's not useful.
- I know of several use cases and could probably find others.

Reason: It takes effort to implement it. Why don't you do it yourself
if it's such a great idea!
- Maybe I will. But that is only a reason why they aren't currently
implemented, not why they *shouldn't* be.

Reason: It makes sequences and mappings too much alike.
- Then change the names for the sequence methods.

That is to say, no good reason has been offered for why these methods
shouldn't be implemented.

Regards,
Jordan
 

Steve Holden

MonkeeSage said:
Huh? I don't want to treat sequences and mappings as the same thing.
I'm talking about adding two similar convenience methods for sequences
as already exist for mappings. That may make the two APIs closer, but
that's not necessarily a bad thing (think the square-bracket accessor).
Besides, has_index is sufficiently different already. If it's really a
problem, change get() to at() for sequences. seq.at(2).

So far the reasons offered against adding those convenience methods
are:

Reason: It's unnecessary / bloat.
- Then the same thing is true of the dict methods.

No: you are proposing to add features to the sequence interface for
which there are few demonstrable use cases.

Reason: It's not useful.
- I know of several use cases and could probably find others.

Well I certainly didn't find your last one particularly convincing: the
attempt to reference a non-existent sequence member is almost always a
programming error.

Reason: It takes effort to implement it. Why don't you do it yourself
if it's such a great idea!
- Maybe I will. But that is only a reason why they aren't currently
implemented, not why they *shouldn't* be.

Reason: It makes sequences and mappings too much alike.
- Then change the names for the sequence methods.

That is to say, no good reason has been offered for why these methods
shouldn't be implemented.

I would argue exactly the opposite: the reason why they shouldn't be
implemented is because no good reason has been presented why they *should*.

regards
Steve
 

Fredrik Lundh

MonkeeSage said:
Huh? I don't want to treat sequences and mappings as the same thing.
I'm talking about adding two similar convenience methods for sequences
as already exist for mappings.

so what makes you think you're the first one who's ever talked about that?

</F>
 

MonkeeSage

No: you are proposing to add features to the sequence interface for
which there are few demonstrable use cases.

If I really wanted to find them, how many instances do you think I
could find [in the standard lib and community-respected third-party
libs] of sequence index checking like "if 2 < len(seq)" and / or
try-excepting like "try: seq[2] ..."? Judging by the fact that there
isn't any other way to *safely* handle dynamic sequences (esp.
sequences which grow based on side-effects which may or may not be
present, depending on the execution path through the code); I'd guess
the number is a lot higher than you seem to think.

Well I certainly didn't find your last one particularly convincing: the
attempt to reference a non-existent sequence member is almost always a
programming error.

Unless you are interacting with user input, or other anomalous data
source. And in that case you need to do explicit index checking or wrap
your code with a try...except; that or you need a convenience function
or method that implements a non-terminating exception, and you just
check for the exceptional case, like dictionaries have with get(). I
find the latter approach to be easier to read and write (see link
below), as well as understand.

I would argue exactly the opposite: the reason why they shouldn't be
implemented is because no good reason has been presented why they *should*.

Pretend like there are no dict.has_key and dict.get methods. Can you
provide a good reason(s) why they should be implemented? Not necessity
-- since you can do the same thing more verbosely. Usefulness? --
Probably; but I think the list methods would also be useful (see
above). Succinctness [1]? -- The list methods have the same advantage.
Anything else?

[1] http://mail.python.org/pipermail/python-dev/1999-July/000594.html



I looked yesterday and only found a few posts. A couple involved
Marc-Andre Lemburg, and mxTools, where he has a get() function that
works for sequences and mappings; that's not what I'm suggesting.
However, I found one from 1997 where he mentioned a patch to python 1.5
which added list.get, but I couldn't find the patch or any discussion
of why it was (presumably) rejected. The only other post I found that
was relevant was one on the dev-python list (mentioned in the July 1-15
summary [1]). And the only thing mentioned there as a reason against it
is that "It encourages bad coding. You
shouldn't be searching lists and tuples like that unless you know what
you're doing." (Whatever that is supposed to mean!).

Just point me to the discussion where the good reasons (or any at all)
against my suggestion can be found and I'll be glad to read it. I
couldn't find it.

[1]
http://www.python.org/dev/summary/2006-07-01_2006-07-15/#adding-list-get-and-tuple-get

Regards,
Jordan
 

Steve Holden

MonkeeSage said:
No: you are proposing to add features to the sequence interface for
which there are few demonstrable use cases.


If I really wanted to find them, how many instances do you think I
could find [in the standard lib and community-respected third-party
libs] of sequence index checking like "if 2 < len(seq)" and / or
try-excepting like "try: seq[2] ..."? Judging by the fact that there
isn't any other way to *safely* handle dynamic sequences (esp.
sequences which grow based on side-effects which may or may not be
present, depending on the execution path through the code); I'd guess
the number is a lot higher than you seem to think.

Keep right on guessing.

Unless you are interacting with user input, or other anomalous data
source. And in that case you need to do explicit index checking or wrap
your code with a try...except; that or you need a convenience function
or method that implements a non-terminating exception, and you just
check for the exceptional case, like dictionaries have with get(). I
find the latter approach to be easier to read and write (see link
below), as well as understand.

OK, so now we appear to be arguing about whether a feature should go
into Python because *you* find it to be easier to read and write. But I
don't see a groundswell of support from other readers saying "Wow, I've
always wanted to do it like that".

I would argue exactly the opposite: the reason why they shouldn't be
implemented is because no good reason has been presented why they *should*.


Pretend like there are no dict.has_key and dict.get methods. Can you
provide a good reason(s) why they should be implemented? Not necessity
-- since you can do the same thing more verbosely. Usefulness? --
Probably; but I think the list methods would also be useful (see
above). Succinctness [1]? -- The list methods have the same advantage.
Anything else?

[1] http://mail.python.org/pipermail/python-dev/1999-July/000594.html

Nope. In fact d.has_key(k) is a historical spelling, retained only for
backwards compatibility, of k in dict. As to the d.get(k, default)
method I really don't see a compelling use case despite your
protestations, and I don't seem to be alone. Please feel free to start
recruiting support.

I looked yesterday and only found a few posts. A couple involved
Marc-Andre Lemburg, and mxTools, where he has a get() function that
works for sequences and mappings; that's not what I'm suggesting.
However, I found one from 1997 where he mentioned a patch to python 1.5
which added list.get, but I couldn't find the patch or any discussion
of why it was (presumably) rejected. The only other post I found that
was relevant was one on the dev-python list (mentioned in the July 1-15
summary [1]). And the only thing mentioned there as a reason against it
is that "It encourages bad coding. You
shouldn't be searching lists and tuples like that unless you know what
you're doing." (Whatever that is supposed to mean!).

Just point me to the discussion where the good reasons (or any at all)
against my suggestion can be found and I'll be glad to read it. I
couldn't find it.

[1]
http://www.python.org/dev/summary/2006-07-01_2006-07-15/#adding-list-get-and-tuple-get

The fact that nobody has listed the good reasons why I shouldn't try to
make a computer from mouldy cheese doesn't make this a good idea.

regards
Steve
 

Dennis Lee Bieber

Nope. In fact d.has_key(k) is a historical spelling, retained only for
backwards compatibility, of k in dict. As to the d.get(k, default)
method I really don't see a compelling use case despite your
protestations, and I don't seem to be alone. Please feel free to start
recruiting support.

Well, I sure don't see any use for a list equivalent of

d.get(k, default)

The main usage I've ever seen for that has been in statements like:

d[k] = d.get(k, default) + something #or other functions

which would make a list equivalent look like

l[i] = l.get(i, default) + something...

But how do you handle the case of:

l = []
i = 10

l[i] = l.get(i, 0) + 1

Do you suddenly extend l by 9 undefined entries?
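
To make the asymmetry concrete, here is a small Python 3 sketch; the word list and variable names are invented for illustration. The dict idiom works because assigning to a missing key creates it, whereas assigning to an index past the end of a list raises IndexError no matter what default the read side supplied.

```python
counts = {}
for word in ["spam", "eggs", "spam"]:
    # Read with a default, then assign: the assignment creates the key.
    counts[word] = counts.get(word, 0) + 1
print(counts)   # {'spam': 2, 'eggs': 1}

items = []
try:
    # An empty list has no slot 10 to assign to; lists do not grow on assignment.
    items[10] = 0 + 1
except IndexError:
    assign_failed = True
else:
    assign_failed = False
print(assign_failed)   # True
```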
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

MonkeeSage

Keep right on guessing.

I hope I'm not offending one whom I consider to be much more skilled
and versed than I am, not only in python, but in programming in
general; but I must say: it seems you are being rather obtuse here. I
think I laid out the principle clearly enough, and I know you have the
mental capacity to extrapolate from the principle to general use cases.
But even so, here is a simple use case from the standard library
(python 2.5 release source):

In Libs/site.py, lines 302-306:

try:
    for i in range(lineno, lineno + self.MAXLINES):
        print self.__lines[i]
except IndexError:
    break

With my proposal, that could be written as:

for i in range(lineno, lineno + self.MAXLINES):
    if self.__lines.has_index(i):
        print self.__lines[i]
    else:
        break

Granted, in this particular case the amount of code is not reduced, but
(and I would hope you'd agree) the control flow is certainly easier to
follow.

OK, so now we appear to be arguing about whether a feature should go
into Python because *you* find it to be easier to read and write. But I
don't see a groundswell of support from other readers saying "Wow, I've
always wanted to do it like that".

*Someone* (other than me!) obviously found it nice to have the dict
convenience methods. As for garnering support, I almost see that as
more of a cultural, rather than pragmatic issue. I.e., if it's not
there already, then it shouldn't be there: "what is is what should be".
Of course, consistently following that naive presumption would totally
stifle *any* extension to python. However, (I think) I do understand
the psychology of the matter, and don't fault those who cannot see past
what already is (not meaning to implicate you or Fredrik or anyone
else -- the comment is innocent).

In fact d.has_key(k) is a historical spelling, retained only for
backwards compatibility, of k in dict. As to the d.get(k, default)
method I really don't see a compelling use case despite your
protestations, and I don't seem to be alone. Please feel free to start
recruiting support.

As I stated to another poster; I'm not really concerned with
implementation details, only with the presence or absence of
convenience methods. You can write "if k in d" as easily as "if index <
len(seq)". But semantically, they are similar enough, in my (admittedly
lowly) estimation, to deserve similar convenience methods.

The fact that nobody has listed the good reasons why I shouldn't try to
make a computer from mouldy cheese doesn't make this a good idea.

Heh. True. I didn't mean to imply that I was arguing from the negative.
I was only trying to shift the perspective to include the reasons for
the dict convenience methods. If C is good for A, and A is sufficiently
similar to B, then C is good for B. But if C is just crud, then forget
it all around. ;)


But how do you handle the case of:

l = []
i = 10

l[i] = l.get(i, 0) + 1


You don't; you just let the IndexError fall through. Same as a KeyError
for d[k]. My proposal is in regard to convenience methods, not to
direct access.

Ps. Sorry if this comes through twice, Google is being weird right now.

Regards,
Jordan
 

Steve Holden

MonkeeSage said:
Keep right on guessing.


I hope I'm not offending one whom I consider to be much more skilled
and versed than I am, not only in python, but in programming in
general; but I must say: it seems you are being rather obtuse here. I
think I laid out the principle clearly enough, and I know you have the
mental capacity to extrapolate from the principle to general use cases.
But even so, here is a simple use case from the standard library
(python 2.5 release source):

In Libs/site.py, lines 302-306:

try:
    for i in range(lineno, lineno + self.MAXLINES):
        print self.__lines[i]
except IndexError:
    break

With my proposal, that could be written as:

for i in range(lineno, lineno + self.MAXLINES):
    if self.__lines.has_index(i):
        print self.__lines[i]
    else:
        break

Granted, in this particular case the amount of code is not reduced, but
(and I would hope you'd agree) the control flow is certainly easier to
follow.

OK, so now we appear to be arguing about whether a feature should go
into Python because *you* find it to be easier to read and write. But I
don't see a groundswell of support from other readers saying "Wow, I've
always wanted to do it like that".


*Someone* (other than me!) obviously found it nice to have the dict
convenience methods. As for garnering support, I almost see that as
more of a cultural, rather than pragmatic issue. I.e., if it's not
there already, then it shouldn't be there: "what is is what should be".
Of course, consistently following that naive presumption would totally
stifle *any* extension to python. However, (I think) I do understand
the psychology of the matter, and don't fault those who cannot see past
what already is (not meaning to implicate you or Fredrik or anyone
else -- the comment is innocent).

In fact d.has_key(k) is a historical spelling, retained only for
backwards compatibility, of k in dict. As to the d.get(k, default)
method I really don't see a compelling use case despite your
protestations, and I don't seem to be alone. Please feel free to start
recruiting support.


As I stated to another poster; I'm not really concerned with
implementation details, only with the presence or absence of
convenience methods. You can write "if k in d" as easily as "if index <
len(seq)". But semantically, they are similar enough, in my (admittedly
lowly) estimation, to deserve similar convenience methods.

The fact that nobody has listed the good reasons why I shouldn't try to
make a computer from mouldy cheese doesn't make this a good idea.


Heh. True. I didn't mean to imply that I was arguing from the negative.
I was only trying to shift the perspective to include the reasons for
the dict convenience methods. If C is good for A, and A is sufficiently
similar to B, then C is good for B. But if C is just crud, then forget
it all around. ;)


But how do you handle the case of:

l = []
i = 10

l[i] = l.get(i, 0) + 1



You don't; you just let the IndexError fall through. Same as a KeyError
for d[k]. My proposal is in regard to convenience methods, not to
direct access.

Ps. Sorry if this comes through twice, Google is being weird right now.

I think we'll just have to agree to differ in this respect, as I don't
see your suggestions for extending the sequence API as particularly
helpful. The one case you quote doesn't actually use the construct to
supply a default value, it just terminates a loop early if the sequence
length is below expectation (unless I've misread the code: that does
happen).

I don't really think it's a biggy, I just don't see the parallels that
you do. Your point about blindness to the need for change is one that
we do have to be careful of (not that I'm the person to consult about
what goes into the language anyway). But if you think you've had an
argument here, try writing a PEP for this and see if you can persuade
the denizens of python-dev to accept (and implement) it.

regards
Steve
 

Terry Reedy

MonkeeSage said:
But even so, here is a simple use case from the standard library
(python 2.5 release source):

In Libs/site.py, lines 302-306:

try:
    for i in range(lineno, lineno + self.MAXLINES):
        print self.__lines[i]
except IndexError:
    break


Is there an outer loop being 'break'ed? If not, it should be pass instead.

With my proposal, that could be written as:

for i in range(lineno, lineno + self.MAXLINES):
    if self.__lines.has_index(i):
        print self.__lines[i]
    else:
        break


This break is swallowed by the for loop, so not exactly equivalent, I
think. In any case, these are both clumsy and I believe I would instead
write something like

for i in range(lineno, len(self.__lines)):
    print self.__lines[i]

or even better, use islice -- for line in islice(...): print line
So not a persuasive use case to me.
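
The islice suggestion above is only hinted at, so here is one way it might look. This is a Python 3 sketch under stated assumptions: `lines`, `lineno` and `MAXLINES` are stand-ins for the attributes in the quoted snippet, not the actual site.py code.

```python
from itertools import islice

lines = ["line %d" % n for n in range(5)]   # stand-in for self.__lines
lineno = 3                                   # stand-in starting position
MAXLINES = 4                                 # stand-in page size

# islice stops quietly at the end of the sequence, so printing at most
# MAXLINES lines from lineno needs no IndexError handling and no bounds test.
page = list(islice(lines, lineno, lineno + MAXLINES))
for line in page:
    print(line)
```

Unlike the try/except version, this sketch does not by itself signal that the lines ran out; a caller that needs to know could compare len(page) with MAXLINES.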

Terry Jan Reedy
 

Fredrik Lundh

MonkeeSage said:
In Libs/site.py, lines 302-306:

try:
    for i in range(lineno, lineno + self.MAXLINES):
        print self.__lines[i]
except IndexError:
    break

With my proposal, that could be written as:

for i in range(lineno, lineno + self.MAXLINES):
    if self.__lines.has_index(i):
        print self.__lines[i]
    else:
        break

Granted, in this particular case the amount of code is not reduced, but
(and I would hope you'd agree) the control flow is certainly easier to
follow.


so to "improve" a piece of code that's been optimized for the common
case, you're adding an extra method call and a test to the inner loop?

and this because you think Python programmers don't understand try-
except statements ?

I think we can all safely ignore you now.

</F>
 

Fredrik Lundh

Terry said:
Is there an outer loop being 'break'ed?

yes.

This break is swallowed by the for loop, so not exactly equivalent, I
think.

the code is supposed to break out of the outer loop when it runs out of
lines, so yes, monkeeboy's code is broken in more than one way.

In any case, these are both clumsy and I believe I would instead
write something like

for i in range(lineno, len(self.__lines)):
    print self.__lines[i]


that doesn't do the same thing, either.

and you both seem to be missing that

try:
    for loop over something:
        do something with something
except IndexError:
    ...

is a common pydiom when you expect to process quite a few somethings,
and you don't want to waste time on calculating end conditions up front
or checking for end conditions inside the loop or otherwise looking
before you leap (LBYL); the pydiom lets you process things as quickly as
you possibly can, as long as you possibly can, and deal with the end of
the sequence when you hit it (EAFP*).

</F>

*) insert martelli essay here
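
For comparison, the two styles named above can be written side by side. A minimal Python 3 sketch: `take_eafp` and `take_lbyl` are hypothetical helper names, and both assume a non-negative `start`.

```python
def take_eafp(seq, start, count):
    """Collect up to count items from start; handle running off the end afterwards."""
    out = []
    try:
        for i in range(start, start + count):
            out.append(seq[i])    # no bounds test inside the loop
    except IndexError:
        pass                      # end of sequence reached: deal with it here
    return out


def take_lbyl(seq, start, count):
    """Same result, but test the index before every access."""
    out = []
    for i in range(start, start + count):
        if i >= len(seq):         # look before you leap
            break
        out.append(seq[i])
    return out


data = list(range(10))
print(take_eafp(data, 8, 5))   # [8, 9]
print(take_lbyl(data, 8, 5))   # [8, 9]
```

Both return the same items; the difference is where the end-of-sequence check lives, once in the except clause or on every pass through the loop.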
 

MonkeeSage

I think we'll just have to agree to differ in this respect, as I don't
see your suggestions for extending the sequence API as particularly
helpful.

No worries. :)


so to "improve" a piece of code that's been optimized for the common
case, you're adding an extra method call and a test to the inner loop?

and this because you think Python programmers don't understand try-
except statements ?

Uh, yeah, "You're all idiots and I'm not really a 'Python
Programmer'(TM)" -- that's exactly what I was meaning to say. I'm
surprised your telepathic powers let you pick up on that, since I didn't
say anything that could even remotely be construed that way. Freaky.

And, FWIW, the "optimized" version is not much faster than that put
forth by the stupid peon (me), even when my suggestion is implemented
in pure python:

$ cat test.py
import timeit
ary = ['blah'] * 10

def has_index(seq, idx):
    return idx < len(seq)
def at(seq, idx):
    if has_index(seq, idx):
        return seq[idx]

def t1():
    while 1:
        try:
            for i in range(11):
                ary[i]
        except IndexError:
            break
def t2():
    go = True
    while go:
        for i in range(11):
            if has_index(ary, i):
                ary[i]
            else:
                go = False
def t3():
    go = True
    while go:
        for i in range(11):
            val = at(ary, i)
            if val:
                val
            else:
                go = False

print 't1 time:'
print timeit.Timer('t1()', 'from __main__ import t1').timeit()
print 't2 time:'
print timeit.Timer('t2()', 'from __main__ import t2').timeit()
print 't3 time:'
print timeit.Timer('t3()', 'from __main__ import t3').timeit()

$ python test.py
t1 time:
15.9402189255
t2 time:
18.6002299786
t3 time:
23.2494211197

I think we can all safely ignore you now.

You could have done that all along. I'm no Max Planck, and this isn't
quantum mechanics. But more interesting than whether it's "safe" to
ignore suggestions for improvement is whether it's actually beneficial
to do so.



I don't think this is strictly a question of EAFP vs. LBYL, but more a
question of convenience. This very moment in python I can say both
"try: d[k] ..." and "if d.has_key(k) / k in d".

Regards,
Jordan
 
