"index" method only for mutable sequences??

Chris Mellon

Maybe someone had too much alcohol when they were coding? Maybe they
don't know better? Maybe they thought that an index method on a
sequence made sense? Who are you to spoil their fun? Could it be that
YOU are the B&D person?

If you want a language that just adds whatever methods anyone thinks
of, along with whatever aliases for it anyone can think of, to every data
type, you know where to find Ruby.
 
Steven D'Aprano

Lists are designed for sequences of homogeneous items, e.g.:

L = [1, 2, 4, 8, 16, 32]
while tuples are designed to be more like structs or records, with
heterogeneous items, e.g.:

T = ("Fred", 32, 12.789, {}, None, '\t')

I think you are confused.

Anything is possible.

Last time I heard this homogeneous items stuff,
it had nothing to do with the types being the same. They were homogeneous
because they somehow belonged together and heterogeneous because they
just happened to live together. Similarity of type played no part in
calling the data homogeneous or heterogeneous.

Nevertheless, regardless of whether the items have the same type or
different types, you don't need an index method for heterogeneous items.

Like I said, think of a tuple as a struct. Even if the fields of the
struct all have the same type, there is little reason to ever ask "which
field has such-and-such a value?".

Anyway, that was the reasoning. As I've said, tuples do double-duty as
both immutable lists and struct-like objects. I wouldn't object to them
growing index and count methods -- but you won't see me volunteering to
write the code for that, because I don't care that much.

So how about it? All you people who desperately want tuples to grow an
index method -- will any of you donate your time to write and maintain the
code?
 
Paul Boddie

If you want a language that just adds whatever methods anyone thinks
of, along with whatever aliases for it anyone can think of, to every data
type, you know where to find Ruby.

Nobody is asking for Ruby, as far as I can see. I even submitted a
quick patch to provide tuple.index (a method that has already been
thought of), given the triviality of the solution, but you won't find
me asking for a bundle of different convenience methods with all their
aliases on every object, regardless of whether you can monkey-patch
them after the fact or not. For example:

http://www.ruby-doc.org/core/classes/Array.html#M002235

There's a pretty big chasm between wanting to be able to apply
existing functionality exactly to a type which for some reason never
acquired it and embracing the method proliferation and other low-
hanging fruit-picking seemingly popular in Ruby. In observing this,
one can make objective decisions about things like this...

http://wiki.python.org/moin/AbstractBaseClasses

Note that, in that document, index and count are methods of
MutableSequence. Quite why this should be from a conceptual
perspective is baffling, but don't underestimate the legacy influence
in such matters.

Paul
 
Chris Mellon

Nobody is asking for Ruby, as far as I can see. I even submitted a
quick patch to provide tuple.index (a method that has already been
thought of), given the triviality of the solution, but you won't find
me asking for a bundle of different convenience methods with all their
aliases on every object, regardless of whether you can monkey-patch
them after the fact or not. For example:

Note that the mail I responded to was using being drunk, not knowing
any better, and having fun as use cases for the method. That sounds
like Ruby-style method proliferation to me ;)

Note that, in that document, index and count are methods of
MutableSequence. Quite why this should be from a conceptual
perspective is baffling, but don't underestimate the legacy influence
in such matters.

Well, I'm not Guido obviously, but here's why I don't find it baffling.

There are 2 main reasons why you'd use an immutable sequence for something:
1) You want to make sure it's not modified by a callee. This is
unPythonic and mostly unnecessary. Pass them a copy if you're that
paranoid.

2) Because you are representing a known, structured data type. This
can be either unordered, in which case index() is meaningless (as in
frozenset), or it can be ordered, in which case the order is an
implicit part of the structure. In such a case, index() is also
meaningless, and should never be necessary.

The primary use case for index on tuple is because people use them as
immutable lists. That's fine as far as it goes, but I want to know
what the justification is for using an immutable list, and if you have
one why you need to use index() on it. There's only 2 use cases I've
heard so far for that: certain types of binary parsing, which I don't
think is common enough to justify modification to the language core,
and third party libs returning silly things, which I *certainly* don't
think justifies changes to the language core.

I'm pretty against tuples growing index() ever. I think that if there
is a real need (and just because I haven't seen one doesn't mean it's
not there) for an immutable list, the introduction of frozenlist() to
the collections module (or something) would be a better solution. On
the other hand, if tuples grew list methods and we got a new immutable
sequence that had order, unpacking, and *named fields* along the lines
of Pascal records I'd be happy with that too.
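
For what it's worth, later Python releases (2.6 onward) added collections.namedtuple, which comes close to the ordered, unpackable, named-field record described above. A minimal sketch; the Employee type and its fields are made up for illustration:

```python
from collections import namedtuple

# Hypothetical record type; the name and fields are illustrative only.
Employee = namedtuple("Employee", ["name", "age", "salary"])

fred = Employee("Fred", 32, 12.789)

# Fields are reachable by name or by position...
assert fred.name == fred[0] == "Fred"

# ...and the record still unpacks like a plain tuple.
name, age, salary = fred

# It stays immutable: assigning to a field raises AttributeError.
try:
    fred.age = 33
except AttributeError:
    mutated = False
else:
    mutated = True
```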
 
Michael Zawrotny

The primary use case for index on tuple is because people use them as
immutable lists. That's fine as far as it goes, but I want to know
what the justification is for using an immutable list, and if you have
one why you need to use index() on it. There's only 2 use cases I've
heard so far for that: certain types of binary parsing, which I don't
think is common enough to justify modification to the language core,
and third party libs returning silly things, which I *certainly* don't
think justifies changes to the language core.

In regard to unpacking binary data and similar uses, those are
often performance sensitive. Here is some benchmark data from
timeit.Timer on looking for an item in a list/tuple via list.index() or
list(tuple).index(). The best case scenario (short list) takes about a
20% performance hit.

                     run time (sec)
   size      runs       list      tuple  ratio (t/l)
1000000       100   1.997614   5.526909     2.766755
 100000      1000   2.049710   5.291704     2.581684
  10000     10000   1.970400   2.714083     1.377428
   1000    100000   2.325089   3.013624     1.296133
    100   1000000   6.213748   7.661165     1.232938
     10  10000000  44.970536  53.685698     1.193797
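
A rough sketch of how numbers like these might be gathered with timeit.Timer. The original script was not posted, so the helper name time_index and the choice to search for the last item are assumptions, and absolute timings will differ by machine and Python version:

```python
import timeit

def time_index(size, runs):
    """Return (list_seconds, tuple_seconds) for `runs` lookups of the last item.

    Assumption: the target is the last element, i.e. the worst case for index().
    """
    setup = (
        "data_list = list(range(%d)); "
        "data_tuple = tuple(data_list); "
        "target = %d" % (size, size - 1)
    )
    list_time = timeit.Timer("data_list.index(target)", setup).timeit(runs)
    # Emulates a tuple without index(): copy to a list first, then search.
    tuple_time = timeit.Timer("list(data_tuple).index(target)", setup).timeit(runs)
    return list_time, tuple_time

if __name__ == "__main__":
    for size, runs in [(10, 100000), (1000, 1000)]:
        l, t = time_index(size, runs)
        print("%7d %9d %10.6f %10.6f %12.6f" % (size, runs, l, t, t / l))
```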

For whatever it's worth, I'm mildly in favor of index() and count() being
added to tuples.


Mike
 
Antoon Pardon

So how about it? All you people who desperately want tuples to grow an
index method -- will any of you donate your time to write and maintain the
code?

But as far as I understood the code is already there; the code for
list.index being usable almost as it is.

It doesn't seem to be a question of where to put valuable resources.
AFAIU it is simply a question of whether the developers want it or not.
 
Antoon Pardon

Note that the mail I responded to was using being drunk, not knowing
any better, and having fun as use cases for the method. That sounds
like Ruby-style method proliferation to me ;)



Well, I'm not Guido obviously, but here's why I don't find it baffling.

There are 2 main reasons why you'd use an immutable sequence for something:
1) You want to make sure it's not modified by a callee. This is
unPythonic and mostly unnecessary. Pass them a copy if you're that
paranoid.

Why then does python itself provide immutables? I find this reasoning
more than baffling. There have been all these arguments about why
it is best to use immutables as dictionary keys. But the moment
the topic changes, someone else comes with the comment that
wanting your sequence to be immutable is unpythonic.

I once had a problem I wanted to solve by having a dictionary
where the keys were multidimensional points on an integer grid.
For a number of reasons I thought it would be easier if I could
use lists, but most people argued that would be a bad idea and
that I should use tuples, because they are immutable.

Of course if I now wanted to find out if the point is on an
axis and which axis that is, I cannot use index because that is
not available.
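
One possible reading of that axis check, sketched without relying on tuple.index(). The helper name axis_of is hypothetical, and "on an axis" is assumed here to mean that exactly one coordinate is non-zero:

```python
def axis_of(point):
    """Return the index of the axis the point lies on, or None otherwise.

    Assumption: a point is "on an axis" when exactly one coordinate is non-zero.
    """
    nonzero = [i for i, coord in enumerate(point) if coord != 0]
    return nonzero[0] if len(nonzero) == 1 else None

# Tuples work as dictionary keys because they are hashable.
grid = {(0, 0, 5): "on the third axis", (1, 2, 0): "off axis"}
axes = {point: axis_of(point) for point in grid}
```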
 
Antoon Pardon

One might perversely allow extension to lists and tuples to allow

[3, 4] in [1, 2, 3, 4, 5, 6]

to succeed, but that's forcing the use case beyond normal limits.

I'd love to have that! There are at least one million use cases for
finding a sequence in a sequence and implementing it yourself is
non-trivial. Plus then both list and tuple's index methods would work
*exactly* like string's. It would be easier to document and more
useful. A big win.

=======================
It would be ambiguous: [3,4] in [[1,2], [3,4], [5,6]] is True now.

Strings are special in that s[i] can only be a (sub)string of length 1.
'b' in 'abc' is True. This makes looking for longer substrings easy.

However, [2] in [1,2,3] is False. I.e., list[i] is not normally a list. So
looking for sublists is different from looking for items.
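
The difference described above can be checked directly. A small sketch with made-up variable names, showing that "in" tests for substrings on strings but only for whole elements on lists:

```python
# Substring test: a multi-character string can be "in" another string.
substring_found = "bc" in "abcd"

# Each element of a string is itself a string of length 1.
element_is_str = isinstance("abcd"[1], str) and len("abcd"[1]) == 1

# List membership compares against whole elements only.
nested_found = [3, 4] in [[1, 2], [3, 4], [5, 6]]   # an element equals [3, 4]
flat_found = [3, 4] in [1, 2, 3, 4, 5, 6]           # no element equals [3, 4]
wrapped_found = [2] in [1, 2, 3]                    # 2 is an element, [2] is not
```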


Well I think this illustrates nicely what can happen if you design by
use cases.

Let us assume for a moment that finding out if one list is a sublist of
a second list gets considered something useful enough to be included
in Python. Now the in operator can't be used for this because it
would create ambiguities. So it would become either a new operator
or a new method. But whatever the solution it would be different
from the string solution.

Now if someone had thought about how "st1 in st2" would
generalize to other sequences if st1 contained more than one
character, they probably would have found the possible inconsistency
that could create and thought about using another way than
the in-operator for this with strings. A way that wouldn't create
ambiguities when it was considered to be extended to other sequences.
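
To make the "new method" option concrete, here is a naive sketch of a sublist search as a free function. The name find_sublist is hypothetical (nothing like it exists in the standard library), and it deliberately mirrors str.index() by raising ValueError on failure:

```python
def find_sublist(haystack, needle):
    """Return the first index where `needle` occurs as a contiguous run.

    Naive O(len(haystack) * len(needle)) scan over any pair of sequences.
    Raises ValueError when there is no match, like str.index().
    """
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        # Compare as lists so a tuple needle can match a list haystack.
        if list(haystack[start:start + n]) == list(needle):
            return start
    raise ValueError("sublist not found")
```

Because it is a separate name, find_sublist([[1, 2], [3, 4]], [3, 4]) is unambiguous: it looks for the run 3, 4 and not for the element [3, 4].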
 
Paul Boddie

Why then does python itself provide immutables? I find this reasoning
more than baffling. There have been all these arguments about why
it is best to use immutables as dictionary keys.

You've answered your own question. If you had a mutable dictionary
key, stored something in a dictionary using that key, then modified
the key and tried to retrieve the stored item using that same key
object, you might never find that item again. This is something of a
simplification (you'd have to look into details of things like
__hash__ and __eq__, I imagine), but this is just one area where
immutability is central to the operation of the feature concerned.
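
A small sketch of that failure mode. The MutableKey class is hypothetical and exists only to show what can happen when __hash__ depends on state that changes after insertion; the exact outcome relies on CPython's dict implementation:

```python
class MutableKey(object):
    """Hypothetical key type whose hash depends on mutable state."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value

key = MutableKey(1)
table = {key: "stored item"}

key.value = 2   # mutate the key after it was used for insertion

# The lookup now uses a different hash, so the item appears to be gone,
# even though the very same key object is still held by the dictionary.
found = key in table
still_present = any(k is key for k in table)
```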

Other languages provide some control over immutability with things
like "const", and there are good reasons for having such things,
although you do need to know what you're doing as a programmer when
using them. Some people might argue that the dictionary key example
given above is contrived: "Of course it won't work if you modify the
key!" they might say. Having some idea of which objects are immutable
can provide some protection from inadvertent mutation, however.

But the moment the topic changes, someone else comes with the comment that
wanting your sequence to be immutable is unpythonic.

As soon as "unpythonic" is mentioned we enter subjective territory.

Paul
 
Antoon Pardon

You've answered your own question.

Well since it was meant as a rhetorical question that is hardly
surprising. But maybe I should work harder on my rhetorical
skills since you missed that.
 
Steve Holden

Antoon said:
One might perversely allow extension to lists and tuples to allow

[3, 4] in [1, 2, 3, 4, 5, 6]

to succeed, but that's forcing the use case beyond normal limits.

I'd love to have that! There are at least one million use cases for
finding a sequence in a sequence and implementing it yourself is
non-trivial. Plus then both list and tuple's index methods would work
*exactly* like string's. It would be easier to document and more
useful. A big win.

=======================
It would be ambiguous: [3,4] in [[1,2], [3,4], [5,6]] is True now.

Strings are special in that s[i] can only be a (sub)string of length 1.
'b' in 'abc' is True. This makes looking for longer substrings easy.

However, [2] in [1,2,3] is False. I.e., list[i] is not normally a list. So
looking for sublists is different from looking for items.


Well I think this illustrates nicely what can happen if you design by
use cases.

Let us assume for a moment that finding out if one list is a sublist of
a second list gets considered something useful enough to be included
in Python. Now the in operator can't be used for this because it
would create ambiguities. So it would become either a new operator
or a new method. But whatever the solution it would be different
from the string solution.

That's because strings are different from other sequences. See below.

Now if someone had thought about how "st1 in st2" would
generalize to other sequences if st1 contained more than one
character, they probably would have found the possible inconsistency
that could create and thought about using another way than
the in-operator for this with strings. A way that wouldn't create
ambiguities when it was considered to be extended to other sequences.

The fact is that strings are the only sequences composed of subsequences
of length 1 - in other words the only sequences where type(s[i]) ==
type(s[0:1]) is an invariant condition.

This was discussed (at my instigation, IIRC) on python-dev when Python
(2.4?) adopted the enhanced semantics for "in" on strings - formerly
only tests for single characters were allowed - but wasn't thought
significant enough to deny what was felt to be a "natural" usage for
strings only.

regards
Steve
 
Antoon Pardon

Antoon said:
One might perversely allow extension to lists and tuples to allow

[3, 4] in [1, 2, 3, 4, 5, 6]

to succeed, but that's forcing the use case beyond normal limits.

I'd love to have that! There are at least one million use cases for
finding a sequence in a sequence and implementing it yourself is
non-trivial. Plus then both list and tuple's index methods would work
*exactly* like string's. It would be easier to document and more
useful. A big win.

=======================
It would be ambiguous: [3,4] in [[1,2], [3,4], [5,6]] is True now.

Strings are special in that s[i] can only be a (sub)string of length 1.
'b' in 'abc' is True. This makes looking for longer substrings easy.

However, [2] in [1,2,3] is False. I.e., list[i] is not normally a list. So
looking for sublists is different from looking for items.


Well I think this illustrates nicely what can happen if you design by
use cases.

Let us assume for a moment that finding out if one list is a sublist of
a second list gets considered something useful enough to be included
in Python. Now the in operator can't be used for this because it
would create ambiguities. So it would become either a new operator
or a new method. But whatever the solution it would be different
from the string solution.

That's because strings are different from other sequences. See below.

Now if someone had thought about how "st1 in st2" would
generalize to other sequences if st1 contained more than one
character, they probably would have found the possible inconsistency
that could create and thought about using another way than
the in-operator for this with strings. A way that wouldn't create
ambiguities when it was considered to be extended to other sequences.

The fact is that strings are the only sequences composed of subsequences
of length 1 - in other words the only sequences where type(s[i]) ==
type(s[0:1]) is an invariant condition.


Yes this allows you to do some things for strings in a way that would
be impossible or ambiguous should you want to do the same things for
other kind of sequences.

The question is: should you?

If you want to provide new functionality for strings and you have
the choice between doing it

1) in a way, that will make it easy to extend this functionality
in a consistent way to other sequences

2) in a way that will make it impossible to extend this functionality
in a consistent way to other sequences.

then I think that unless you have very good arguments you should pick (1).

Because if you pick (2), even if this functionality is never extended in
the language itself, you make it more difficult for programmers to add
this functionality in a consistent way to a subclass of list themselves.

People are always defending duck-typing in this news group and now python
has chosen the option that makes duck-typing more difficult.

This was discussed (at my instigation, IIRC) on python-dev when Python
(2.4?) adopted the enhanced semantics for "in" on strings - formerly
only tests for single characters were allowed - but wasn't thought
significant enough to deny what was felt to be a "natural" usage for
strings only.

Which I consider a pity.
 
Carsten Haese

People are always defending duck-typing in this news group and now python
has chosen the option that makes duck-typing more difficult.

Au contraire! The "inconsistent" behavior of "in" is precisely what
duck-typing is all about: Making the operator behave in a way that makes
sense in its context. Nobody seems to be complaining about "+" behaving
"inconsistently" depending on whether you're adding numbers or
sequences.

-Carsten
 
Steven D'Aprano

I once had a problem I wanted to solve by having a dictionary
where the keys were multidimensional points on an integer grid.
For a number of reasons I thought it would be easier if I could
use lists, but most people argued that would be a bad idea and
that I should use tuples, because they are immutable.

Also because code that raises "TypeError: list objects are unhashable" is
probably not going to work very well.

Of course if I now wanted to find out if the point is on an axis and
which axis that is, I cannot use index because that is not available.

If memory is more important to you than speed:

class IndexTuple(tuple):
    def index(self, target):
        # Linear scan; no temporary list is created.
        for i, x in enumerate(self):
            if x == target:
                return i
        raise ValueError

Or if speed is more important to you than memory:

class IndexTuple2(tuple):
    def index(self, target):
        # Copy to a list and reuse the built-in list.index().
        return list(self).index(target)

If you prefer not to subclass, you can write an index function:

def index(sequence_or_mapping, target):
    try:
        return sequence_or_mapping.index(target)
    except AttributeError:
        # No index() method available: fall back to a list copy.
        return list(sequence_or_mapping).index(target)


So much fuss over such a little thing... yes it would be nice if tuples
grew an index method, but it isn't hard to work around the lack.
 
Donn Cave

Antoon Pardon said:
On 2007-04-11, Steven D'Aprano wrote:
Lists are designed for sequences of homogeneous items, e.g.:

L = [1, 2, 4, 8, 16, 32]
while tuples are designed to be more like structs or records, with
heterogeneous items, e.g.:

T = ("Fred", 32, 12.789, {}, None, '\t')

I think you are confused. Last time I heard this homogeneous items stuff,
it had nothing to do with the types being the same. They were homogeneous
because they somehow belonged together and heterogeneous because they
just happened to live together. Similarity of type played no part in
calling the data homogeneous or heterogeneous.

Then you are confused. The typical use case for tuples is database
records. The columns in the table can have completely different types but
the values in a row, represented as a Python tuple, of course belong
together.

Don't blame me. I don't agree with the view. But that was sort of the
explanation that was given here last time I remember this topic came
up in defending why tuples and lists differ in a number of ways that
are less obvious.

They wrote about lists containing homogeneous items and tuples
containing heterogeneous items but stressed rather strongly that
this shouldn't be understood in terms of type similarities.

Well, yes - consider for example the "tm" tuple returned
from time.localtime() - it's all integers, but heterogeneous
as could be - tm[0] is Year, tm[1] is Month, etc., and it
turns out that not one of them is alike. The point is exactly
that we can't discover these differences from the items themselves -
so it isn't about Python types - but rather from the position
of the item in the struct/tuple. (For the person who is about
to write to me that localtime() doesn't exactly return a tuple: QED)
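
The localtime() point can be seen in a few lines. A small sketch using only the standard time module; the variable names are illustrative:

```python
import time

tm = time.localtime()

# Every field is an integer, so the Python types are homogeneous...
all_ints = all(isinstance(field, int) for field in tm)

# ...but the meaning comes from position: index 0 is the year, 1 the month.
year, month = tm[0], tm[1]
positions_match_names = (year == tm.tm_year and month == tm.tm_mon)

# The result is a struct_time: tuple-like, with named fields, not a plain tuple.
is_plain_tuple = type(tm) is tuple
```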

Donn Cave
 
Antoon Pardon

Au contraire! The "inconsistent" behavior of "in" is precisely what
duck-typing is all about: Making the operator behave in a way that makes
sense in its context.

No it isn't. Duck typing is about similar objects using a similar
interface to invoke similar behaviour and getting similar results.

So that if you write a function you don't concern yourself with
the type of the arguments but depend on the similar behaviour.

Suppose someone writes a function that acts on a sequence.
The algorithm used depends on the following invariant.

i = s.index(e) => s[i] == e

Then this algorithm is no longer guaranteed to work with strings.
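
A short sketch of that invariant in code; the helper name holds_invariant is made up for illustration. It holds for list elements and single characters but breaks once a multi-character substring is passed to str.index():

```python
def holds_invariant(s, e):
    """Check the invariant: if i = s.index(e), then s[i] == e."""
    i = s.index(e)
    return s[i] == e

list_ok = holds_invariant([10, 20, 30], 20)     # element lookup in a list
char_ok = holds_invariant("abcdef", "c")        # single character in a string
substr_ok = holds_invariant("abcdef", "cd")     # multi-character substring
```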


On the other hand, suppose I subclass list and add a sub method
to check for the argument being a sublist of the object.
Now I write a function that depends on this functionality.
But although strings have the functionality I can't use
them as argument because the functionality is invoked
in a different way.

Nobody seems to be complaining about "+" behaving
"inconsistently" depending on whether you're adding numbers or
sequences.

You are wrong. I already mentioned problems with it. The
problem is that there are structures that are numbers and
sequences at the same time. So I have a choice. Either I
overload the "+" to get an addition or to get a concatenation.

In the first case I can't trust my structure to work with
functions that expect a general sequence because they
may depend on the fact that "+" concatenates. In the
other case I can't trust my structure to work with
numbers because they may depend on the fact that "+"
behaves like an addition.
 
Antoon Pardon

Also because code that raises "TypeError: list objects are unhashable" is
probably not going to work very well.



If memory is more important to you than speed:

class IndexTuple(tuple):
    def index(self, target):
        # Linear scan; no temporary list is created.
        for i, x in enumerate(self):
            if x == target:
                return i
        raise ValueError

Or if speed is more important to you than memory:

class IndexTuple2(tuple):
    def index(self, target):
        # Copy to a list and reuse the built-in list.index().
        return list(self).index(target)

If you prefer not to subclass, you can write an index function:

def index(sequence_or_mapping, target):
    try:
        return sequence_or_mapping.index(target)
    except AttributeError:
        # No index() method available: fall back to a list copy.
        return list(sequence_or_mapping).index(target)


So much fuss over such a little thing... yes it would be nice if tuples
grew an index method, but it isn't hard to work around the lack.

Yes it is a little thing. But if it is such a little thing why
don't the developers simply add it?

Python like any other human product has its shortcomings. I can live
with them. If a wart turned up and the general answer were: yes,
we know it is a wart, but it is the result of how python grew, or
it was the best compromise we could think of and too much depends
on it now to change it, then this kind of thread would die quickly.

Instead we often enough see the warts getting defended as good design.
 
Steven D'Aprano

Yes it is a little thing. But if it is such a little thing why
don't the developers simply add it?

Perhaps because they've got better things to do than spend all their time
adding little things that are the work of thirty seconds for a developer
like yourself to create when you need it.
 
