"index" method only for mutable sequences??

Steve Holden

Antoon said:
No it isn't. Duck typing is about similar objects using a similar
interface to invoke similar behaviour and getting similar results.

So that if you write a function you don't concern yourself with
the type of the arguments but depend on the similar behaviour.
Please note that "similar" does not mean "exact".

The behavior of str.__contains__ and list.__contains__ is similar.

Duck-typing allows natural access to polymorphism. You appear to be
making semantic distinctions merely for the sake of continuing this
rather fatuous thread.
Suppose someone writes a function that acts on a sequence.
The algorithm used depends on the following invariant.

i = s.index(e) => s[i] == e

Then this algorithm is no longer guaranteed to work with strings.

Because strings have different properties than other sequences. I can't
help pointing out that your invariant is invalid for tuples also,
because tuples don't have a .index() method.
On the other hand, suppose I subclass list and add a sub method
to check for the argument being a sublist of the object.
Now I write a function that depends on this functionality.
But although strings have the functionality, I can't use
them as arguments because the functionality is invoked
in a different way.


You are wrong. I already mentioned problems with it. The
problem is that there are structures that are numbers and
sequences at the same time. So I have a choice: I overload
"+" either to get addition or to get concatenation.

In the first case I can't trust my structure to work with
functions that expect a general sequence because they
may depend on the fact that "+" concatenates. In the
other case I can't trust my structure to work with
numbers because they may depend on the fact that "+"
behaves like an addition.
Good grief.
 
Antoon Pardon

Please note that "similar" does not mean "exact".

That is because I don't want to get into an argument about
whether tp[:3] and ls[:3] are similar behaviour or exactly the
same behaviour when tp is a tuple and ls is a list.
The behavior of str.__contains__ and list.__contains__ is similar.

That would depend on how much you find things may differ and
still call them similar. IMO they are not similar enough
since "12" in "123" doesn't behave like [1,2] in [1,2,3]
Duck-typing allows natural access to polymorphism. You appear to be
making semantic distinctions merely for the sake of continuing this
rather fatuous thread.

I gave an argument that showed that the specific way the in
functionality was extended in strings makes duck-typing (and
by extension natural access to polymorphism) more difficult,
although it may do so in a way that is not significant to
you and the other developers.

Now if you don't agree with the argument presented that
is fine with me. If you think the problem is not big
enough to bother with, that is fine with me too.
But the argument doesn't disappear simply because you
think ill of my intentions.

And consider that each small inconsistency in itself
may not be important enough to remove. But if you
have enough of them, remembering all these special
cases can become tedious.
Suppose someone writes a function that acts on a sequence.
The algorithm used depends on the following invariant.

i = s.index(e) => s[i] == e

Then this algorithm is no longer guaranteed to work with strings.

Because strings have different properties than other sequences. I can't
help pointing out that your invariant is invalid for tuples also,
because tuples don't have a .index() method.


Strings have some properties that are different from and some
properties that are similar to those of other sequences. My argument
is that if you want to facilitate duck typing and natural access to
polymorphism in people's functions that work with sequences in general,
you'd better take care that the sequence API of strings resembles
the sequence API of other sequences as closely as possible.

You, on the other hand, seem to argue that since strings have
properties where they differ from other sequences, it is no longer
so important that the sequence API of strings resembles that
of other sequences.
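
For readers following along, the difference in "in" under discussion can be reproduced directly. This is only a minimal sketch; the variable names are illustrative and not from the thread.

```python
# Membership test on a string matches substrings of any length.
text = "123"
print("12" in text)        # substring test

# Membership test on a list matches single elements only.
numbers = [1, 2, 3]
print([1, 2] in numbers)   # looks for one element equal to [1, 2]
print(2 in numbers)        # element test

# A list that really does hold the list [1, 2] as one of its elements.
nested = [[1, 2], 3]
print([1, 2] in nested)
```

So the same operator answers "is this a sub-run?" for strings and "is this an element?" for lists, which is the asymmetry the argument turns on.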
 
Brian van den Broek

Antoon Pardon said unto the world upon 04/13/2007 02:46 AM:
Yes, it is a little thing. But if it is such a little thing, why don't
the developers simply add it?

It's wafer thin!
 
Steve Holden

Antoon said:
Please note that "similar" does not mean "exact".

That is because I don't want to get into an argument about
whether tp[:3] and ls[:3] are similar behaviour or exactly the
same behaviour when tp is a tuple and ls is a list.
The behavior of str.__contains__ and list.__contains__ is similar.

That would depend on how much you find things may differ and
still call them similar. IMO they are not similar enough
since "12" in "123" doesn't behave like [1,2] in [1,2,3]
And it never will, because of the property of strings I mentioned
previously. Unless you want to introduce a character type into Python
there is no way that you are ever going to be satisfied.
I gave an argument that showed that the specific way the in
functionality was extended in strings makes duck-typing (and
by extension natural access to polymorphism) more difficult,
although it may do so in a way that is not significant to
you and the other developers.
I am not "a developer".
Now if you don't agree with the argument presented that
is fine with me. If you think the problem is not big
enough to bother with, that is fine with me too.
But the argument doesn't disappear simply because you
think ill of my intentions.
Apparently.

And consider that each small inconsistency in itself
may not be important enough to remove. But if you
have enough of them, remembering all these special
cases can become tedious.
But not as tedious as this eternal discussion of already-decided issues.
Suppose someone writes a function that acts on a sequence.
The algorithm used depends on the following invariant.

i = s.index(e) => s[i] == e

Then this algorithm is no longer guaranteed to work with strings.

Because strings have different properties than other sequences. I can't
help pointing out that your invariant is invalid for tuples also,
because tuples don't have a .index() method.


Strings have some properties that are different from and some
properties that are similar to those of other sequences. My argument
is that if you want to facilitate duck typing and natural access to
polymorphism in people's functions that work with sequences in general,
you'd better take care that the sequence API of strings resembles
the sequence API of other sequences as closely as possible.

This is just a bald restatement of the same argument you feel makes it
desirable to add an index() method to tuples. If taken to its logical
(and ridiculous) extreme there should only be one sequence type in Python.
You, on the other hand, seem to argue that since strings have
properties where they differ from other sequences, it is no longer
so important that the sequence API of strings resembles that
of other sequences.
Well, of course. Programming languages are for human users, and they
should do what human users find most natural. Since humans can disagree
the developers (amongst whom I do not count myself, although I *am*
concerned about the development of Python) have to try and go by
consensus, which by and large they do reasonably successfully.

So what I suppose I *am* saying is that your opinions would seem to
differ from the consensus. While you are not in a minority of one you
are in a minority, and it would be nice if we could proceed without
having to continually revisit each small design decision on a continuous
basis.

regards
Steve
 
Steve Holden

Brian said:
Antoon Pardon said unto the world upon 04/13/2007 02:46 AM:

It's wafer thin!
Quite. [The Python language adds an index() method to tuples and
promptly EXPLODES]. Thank you, Mr. Creosote.

regards
Steve
 
Rhamphoryncus

Suppose someone writes a function that acts on a sequence.
The algorithm used depends on the following invariant.

i = s.index(e) => s[i] == e

Then this algorithm is no longer guaranteed to work with strings.


It never worked correctly on unicode strings anyway (which becomes the
canonical string in python 3.0). The base unit exposed by the
implementation is rarely what you want to operate upon.

The terminology is pretty confusing, but let's see if I can lay out
the relationships here:

byte ≤ code unit ≤ code point ≤ scalar value ≤ grapheme cluster ~
character ≤ syllable ≤ word ≤ sentence ≤ paragraph

"12" in "123" allows you to handle bytes through scalar values the
same way, glossing over the implementation details (such as UTF-32 on
linux and UTF-16 on windows).
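
The lower rungs of that ladder can be counted for a concrete string. The following is a minimal sketch that assumes Python 3.3 or later (where `str` iterates by code point); the sample text is an illustration, not taken from the thread.

```python
# "e" followed by COMBINING ACUTE ACCENT, then MATHEMATICAL BOLD CAPITAL A.
# A reader would perceive two characters here.
text = "e\u0301\U0001d400"

code_points = [ord(ch) for ch in text]            # str iterates code points
utf8_bytes = text.encode("utf-8")                 # 8-bit code units (bytes)
utf16_units = len(text.encode("utf-16-le")) // 2  # 16-bit code units
utf32_units = len(text.encode("utf-32-le")) // 4  # 32-bit code units

print(len(utf8_bytes), utf16_units, utf32_units, len(code_points))
```

Each level gives a different length for the same text, and none of the counts equals the two characters a reader sees.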
 
Paul Rubin

Steve Holden said:
This is just a bald restatement of the same argument you feel makes it
desirable to add an index() method to tuples. If taken to its logical
(and ridiculous) extreme there should only be one sequence type in
Python.

That doesn't sound ridiculous given type/class unification. There
could be a single sequence class that implements functions like index.
Subclasses like strings, tuples, lists, etc. would inherit from it.
Some of them might have optimized or customized implementations of
those standard operations, others might not.
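
Something close to this idea exists in later Python versions as the abstract base class `collections.abc.Sequence`, which supplies `index`, `count` and `__contains__` to any class that defines `__getitem__` and `__len__`. The built-in types are registered with it rather than inheriting from it. A minimal sketch follows; the `Pair` class is hypothetical and exists only for illustration.

```python
from collections.abc import Sequence


class Pair(Sequence):
    """Hypothetical immutable two-item sequence, for illustration only."""

    def __init__(self, first, second):
        self._items = (first, second)

    def __getitem__(self, i):
        return self._items[i]

    def __len__(self):
        return len(self._items)


p = Pair("a", "b")
print(p.index("b"))   # mixin method inherited from Sequence
print(p.count("a"))   # mixin method inherited from Sequence
print("a" in p)       # mixin __contains__ inherited from Sequence
print(isinstance((1, 2), Sequence), isinstance("ab", Sequence))
```

A subclass is free to override any of the inherited methods with a faster version, which matches the "optimized or customized implementations" mentioned above.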
 
Paul Rubin

Rhamphoryncus said:
i = s.index(e) => s[i] == e
Then this algorithm is no longer guaranteed to work with strings.

It never worked correctly on unicode strings anyway (which becomes the
canonical string in python 3.0).


What?! Are you sure? That sounds broken to me.
 
Hendrik van Rooyen

sense in its context. Nobody seems to be complaining about "+" behaving
"inconsistently" depending on whether you're adding numbers or
sequences.

I would if I thought it would do some good - the plus sign as a joiner
was, I think, a bad decision.

Just write a routine to calculate the checksum of an Intel Hex file record
to see what I mean.

- Hendrik
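
To make the Intel Hex remark concrete: the record arrives as text, so "+" on its pieces joins digits instead of adding them, and the values have to be converted to integers first. This is a minimal Python 3 sketch; the function name is hypothetical and the sample record is a commonly cited example, not one from this thread.

```python
def intel_hex_checksum_ok(record):
    """Return True if an Intel HEX record's checksum byte is consistent."""
    if not record.startswith(":"):
        raise ValueError("record must start with ':'")
    payload = bytes.fromhex(record[1:].strip())
    # All bytes, including the trailing checksum, must sum to 0 modulo 256.
    return sum(payload) % 256 == 0


print("10" + "01")             # on strings, "+" concatenates the hex digits
print(0x10 + 0x01)             # on integers, "+" adds

record = ":10010000214601360121470136007EFE09D2190140"
print(intel_hex_checksum_ok(record))
```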
 
Hendrik van Rooyen

Donn Cave said:
Well, yes - consider for example the "tm" tuple returned
from time.localtime() - it's all integers, but heterogeneous
as could be - tm[0] is Year, tm[1] is Month, etc., and it
turns out that not one of them is alike. The point is exactly
that we can't discover these differences from the items themselves -
so it isn't about Python types - but rather from the position
of the item in the struct/tuple. (For the person who is about
to write to me that localtime() doesn't exactly return a tuple: QED)

This is the point where the whole thing falls apart in my head and
I get real confused - I can't find a reason why, list or tuple, the first
item can't be something, the second something else, etc...

About the only reason you would use a tuple is if you want to
use it as a key to a dict - and then only because you have to,
you can't use a list as the language stands.

- Hendrik
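
The dict-key restriction mentioned above is easy to see. A minimal sketch, with illustrative names:

```python
grid = {}
grid[(2, 3)] = "treasure"      # a tuple is hashable, so it works as a key

try:
    grid[[2, 3]] = "treasure"  # a list is mutable and unhashable
except TypeError as exc:
    error = exc
    print("list key rejected:", exc)

print(grid)
```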
 
Martin v. Löwis

The use case has already been discussed. Removing the pointless
inconsistency between lists and tuples means you can stop having to
remember it, so you can free up brain cells for implementing useful
things. That increases your programming productivity.

So to increase consistency, the .index method should be removed
from lists, as well, IMO. If you find yourself doing a linear
search, something is wrong.

Regards,
Martin
 
Hendrik van Rooyen

So to increase consistency, the .index method should be removed
from lists, as well, IMO. If you find yourself doing a linear
search, something is wrong.
I agree.
You should at the very least make it a binary search.
To do that you have to sort the list.
Much more efficient.

; - )

Please Sir, can I have the DoubleDict I asked for elsewhere in this thread?

- Hendrik
 
Steven D'Aprano

This is the point where the whole thing falls apart in my head and
I get real confused - I can't find a reason why, list or tuple, the first
item can't be something, the second something else, etc...

It's not that they _can't_ be, but that the two sequence types were
designed for different uses:

Here are some tuples of statistics about people:

fred = (35, 66, 212) # age, height, weight
george = (42, 75, 316)

They are tuples because each one is like a Pascal record or a C struct.

It isn't likely that you'll be in a position where you know that Fred's
age, height or weight is 66, but you don't know which one and so need
fred.index() to find out. Hence, tuples weren't designed to have an index
method.

Here are some lists of statistics about people:

ages = [35, 42, 26, 17, 18]
heights = [66, 75, 70, 61, 59]
weights = [212, 316, 295, 247, 251]
# notice that the first column is fred, the second column is george, etc.

They are mutable lists rather than immutable tuples because you don't know
ahead of time how many data items you need to store.

Now, it is likely that you'll want to know which column(s) have an age of
26, so lists were designed to have an index method.

Now, that's the sort of things tuples and lists were designed for. If you
want to use them for something else, you're free to.
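
Using the lists above, the column lookup might look like this. It is only a sketch; note that `index()` reports the first match, so `enumerate()` is used to find every matching column.

```python
ages = [35, 42, 26, 17, 18]
heights = [66, 75, 70, 61, 59]
weights = [212, 316, 295, 247, 251]

column = ages.index(26)                 # first column whose age is 26
print(column, heights[column], weights[column])

# index() stops at the first match; enumerate() collects all of them.
all_columns = [i for i, age in enumerate(ages) if age == 26]
print(all_columns)
```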
 
Rhamphoryncus

Rhamphoryncus said:
i = s.index(e) => s[i] == e
Then this algorithm is no longer guaranteed to work with strings.

It never worked correctly on unicode strings anyway (which becomes the
canonical string in python 3.0).


What?! Are you sure? That sounds broken to me.


Nope, it's pretty fundamental to working with text, unicode only being
an extreme example: there's a wide number of ways to break down a
chunk of text, making the odds of "e" being any particular one fairly
low. Python's unicode type only makes this slightly worse, not
promising any particular one is available.

For example, if you had an algorithm designed for ascii that gathered
statistics on how common each "character" is, you'd want to redesign
it to use either grapheme clusters or scalar values, then improve it
to merge duplicate characters. You'd need to roll your own iterator
though, Python doesn't provide a method that's specifically grapheme
clusters or scalar values (and if I'm wrong I'd love to hear it!).
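
The "merge duplicate characters" step can be approximated with the standard library's `unicodedata.normalize`. This is a minimal Python 3 sketch, assuming that counting code points after NFC normalization is good enough for the purpose; it does not attempt grapheme-cluster segmentation, and the function name is hypothetical.

```python
import unicodedata
from collections import Counter


def code_point_frequencies(text):
    """Count code points after NFC normalization (a stdlib-only approximation)."""
    normalized = unicodedata.normalize("NFC", text)
    return Counter(normalized)


# "é" written two ways: precomposed, and "e" plus a combining accent.
sample = "\u00e9" + "e\u0301"
print(Counter(sample))                  # naive count sees three distinct items
print(code_point_frequencies(sample))   # normalized count merges both spellings
```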
 
Paul Rubin

Rhamphoryncus said:
i = s.index(e) => s[i] == e
Then this algorithm is no longer guaranteed to work with strings.
It never worked correctly on unicode strings anyway (which becomes the
canonical string in python 3.0).


What?! Are you sure? That sounds broken to me.


Nope, it's pretty fundamental to working with text, unicode only being
an extreme example: there's a wide number of ways to break down a
chunk of text, making the odds of "e" being any particular one fairly
low. Python's unicode type only makes this slightly worse, not
promising any particular one is available.


I don't understand this. I thought that unicode was a character
coding system like ascii, except with an enormous character set
combined with a bunch of different algorithms for encoding unicode
strings as byte sequences. But I've thought of those algorithms
(UTF-8 and so forth) as basically being kludgy data compression
schemes, and unicode strings are still just sequences of code points.
 
Antoon Pardon

Antoon said:
Antoon Pardon wrote:
On Thu, 2007-04-12 at 14:10 +0000, Antoon Pardon wrote:
People are always defending duck-typing in this newsgroup and now Python
has chosen the option that makes duck-typing more difficult.
Au contraire! The "inconsistent" behavior of "in" is precisely what
duck-typing is all about: Making the operator behave in a way that makes
sense in its context.
No it isn't. Duck typing is about similar objects using a similar
interface to invoke similar behaviour and getting similar results.

So that if you write a function you don't concern yourself with
the type of the arguments but depend on the similar behaviour.

Please note that "similar" does not mean "exact".

That is because I don't want to get into an argument about
whether tp[:3] and ls[:3] are similar behaviour or exactly the
same behaviour when tp is a tuple and ls is a list.
The behavior of str.__contains__ and list.__contains__ is similar.

That would depend on how much you find things may differ and
still call them similar. IMO they are not similar enough
since "12" in "123" doesn't behave like [1,2] in [1,2,3]
And it never will, because of the property of strings I mentioned
previously. Unless you want to introduce a character type into Python
there is no way that you are ever going to be satisfied.

The properties of strings didn't force the developers to make those
two behave differently. They could have made the choice that "12"
in "123" returned False and could have introduced a method that would
return True or False depending on whether the argument was a substring
or not. The same method could then eventually be used in other sequences
to test whether the argument was a subsequence or not. Either by
the python-developers themselves if they ever thought that useful
or by any programmer who could add this functionality to a subclass.

Yes the properties of strings allowed for the solution the python
developers have chosen, a solution not extendable to other
sequence types. So yes [1,2] in [1,2,3] will never behave like "12" in
"123" currently does and the properties of strings allowed it to evolve this way
but in the end it was a design choice that could have been made differently
and could have been made in a way to allow more duck typing and more
access to polymorphism.
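
A separate sub-run test of the kind described here can be sketched so that it works the same way for strings, lists and tuples. The helper name `contains_run` is hypothetical; it relies only on `len()` and slicing.

```python
def contains_run(seq, sub):
    """Return True if `sub` occurs as a contiguous run inside `seq`.

    Hypothetical helper: it uses only len() and slicing, so it accepts
    strings, lists and tuples alike.
    """
    n, m = len(seq), len(sub)
    return any(seq[i:i + m] == sub for i in range(n - m + 1))


print(contains_run("123", "12"))
print(contains_run([1, 2, 3], [1, 2]))
print(contains_run((1, 2, 3), (2, 3)))
print(contains_run([1, 2, 3], [1, 3]))   # elements present, but not adjacent
```

Because slices are compared with `==`, both arguments need to be of the same sequence type: a list slice never equals a tuple.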
But not as tedious as this eternal discussion of already-decided issues.

A number of those "decided" issues have been changed. Besides,
nobody is forcing you to participate. If you think this
kind of issue is too tedious for your taste, feel free
to no longer participate.
This is just a bald restatement of the same argument you feel makes it
desirable to add an index() method to tuples.

No, it is not a bald statement. If tuples had methods like index
and count, more functions could be written that are indifferent to
the argument being a tuple or a list, or at least it would make
writing such a function easier, so it would allow for more
duck typing and give more access to polymorphism.

You may think this kind of duck typing and polymorphism insignificant
but that doesn't change the truth about the above statement.
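
As an illustration of the kind of type-indifferent function meant here, a helper can avoid the missing method by iterating instead. The name `first_index` is hypothetical. For what it is worth, tuples did gain `index()` and `count()` in Python 2.6, after this thread.

```python
def first_index(seq, item):
    """Hypothetical helper: position of the first element equal to `item`."""
    for i, element in enumerate(seq):
        if element == item:
            return i
    raise ValueError(f"{item!r} is not in sequence")


print(first_index([10, 20, 30], 20))
print(first_index((10, 20, 30), 20))
print((10, 20, 30).index(20))   # built in on tuples since Python 2.6
```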
If taken to its logical
(and ridiculous) extreme there should only be one sequence type in Python.

No it doesn't. There is a big difference between having sequences with
different properties because there is a need for those different
properties and making things more different than needed and using
the need for different properties to introduce differences that are
unnecessary.
Well, of course. Programming languages are for human users, and they
should do what human users find most natural. Since humans can disagree
the developers (amongst whom I do not count myself, although I *am*
concerned about the development of Python) have to try and go by
consensus, which by and large they do reasonably successfully.

But the defence of not having tuple.index has never been about
what was natural to the user or not, but has always been about what
tuples were supposedly intended for.
So what I suppose I *am* saying is that your opinions would seem to
differ from the consensus. While you are not in a minority of one you
are in a minority, and it would be nice if we could proceed without
having to continually revisit each small design decision on a continuous
basis.

I am not so sure I'm in a minority. This kind of thing is not decided by
consensus, at least not among the Python users. It is the sole decision of
the BDFL. Besides, this is Usenet; all kinds of things get revisited here on
a continuous basis. Why should design decisions be an exception?
 
Rhamphoryncus

I don't understand this. I thought that unicode was a character
coding system like ascii, except with an enormous character set
combined with a bunch of different algorithms for encoding unicode
strings as byte sequences. But I've thought of those algorithms
(UTF-8 and so forth) as basically being kludgy data compression
schemes, and unicode strings are still just sequences of code points.

Indexing cost, memory efficiency, and canonical representation: pick
two. You can't use a canonical representation (scalar values) without
some sort of costly search when indexing (O(log n) probably) or by
expanding to the worst-case size (UTF-32). Python has taken the
approach of always providing efficient indexing (O(1)), but you can
compile it with either UTF-16 (better memory efficiency) or UTF-32
(canonical representation).

As an aside, I feel the need to clarify the terms "code points" and
"scalar values". The only difference is that "code points" includes
the surrogates, whereas "scalar values" does not. As the surrogates
are just an encoding detail of UTF-16 I feel this makes "scalar
values" the more canonical term. It's all quite confusing though x_x.
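
The distinction between code points and scalar values, and the size trade-off between the two encodings, can be sketched as follows. This is a minimal Python 3 sketch; the function name and sample text are illustrative.

```python
def is_scalar_value(code_point):
    """True for any code point outside the surrogate range U+D800..U+DFFF."""
    in_range = 0 <= code_point <= 0x10FFFF
    return in_range and not (0xD800 <= code_point <= 0xDFFF)


print(is_scalar_value(0x1D400))   # a supplementary-plane character
print(is_scalar_value(0xD835))    # a high surrogate: code point, not scalar value

# Storage for three BMP characters plus one supplementary-plane character.
text = "abc\U0001d400"
print(len(text.encode("utf-16-le")), len(text.encode("utf-32-le")))
```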
 
Paul Rubin

Rhamphoryncus said:
Indexing cost, memory efficiency, and canonical representation: pick
two. You can't use a canonical representation (scalar values) without
some sort of costly search when indexing (O(log n) probably) or by
expanding to the worst-case size (UTF-32). Python has taken the
approach of always providing efficient indexing (O(1)), but you can
compile it with either UTF-16 (better memory efficiency) or UTF-32
(canonical representation).

I still don't get it. UTF-16 is just a data compression scheme, right?
I mean, s[17] isn't the 17th character of the (unicode) string regardless
of which memory byte it happens to live at? It could be that accessing
it takes more than constant time, but that's hidden by the implementation.

So where does the invariant c==s[s.index(c)] fail, assuming s contains c?
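
One case where the invariant fails regardless of Unicode is the substring search raised earlier in the thread. A minimal sketch, with a hypothetical helper name:

```python
def invariant_holds(s, e):
    """Check that s[s.index(e)] == e, assuming e is found in s."""
    return s[s.index(e)] == e


print(invariant_holds([1, 2, 3], 2))    # list element
print(invariant_holds("123", "2"))      # single-character string
print(invariant_holds("123", "12"))     # multi-character substring
```

For "12" in "123", `index` returns 0, but `s[0]` is "1", so the comparison is false.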
 
Neil Hodgson

Paul Rubin:
I still don't get it. UTF-16 is just a data compression scheme, right?
I mean, s[17] isn't the 17th character of the (unicode) string regardless
of which memory byte it happens to live at? It could be that accessing
it takes more than constant time, but that's hidden by the implementation.

Python Unicode strings are arrays of code units which are either 16
or 32 bits wide with the width of a code unit determined when Python is
compiled. s[17] will be the 18th code unit of the string and is found by
indexing with no ancillary data structure or processing to interpret the
string as a sequence of code points.

This is the same technique used by other languages such as Java.
Implementing the Python string type with a data structure that can
switch between UTF-8, UTF-16 and UTF-32 while preserving the appearance
of a UTF-32 sequence has been proposed but has not gained traction due
to issues of complexity and cost.

Neil
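
For readers on a current interpreter: CPython 3.3 and later (PEP 393) did adopt a flexible internal representation, choosing 1, 2 or 4 bytes per character for each string while always presenting a sequence of code points. The sketch below assumes CPython 3.3 or newer; the sizes it prints are an implementation detail and vary by version.

```python
import sys

# On CPython 3.3+, sys.maxunicode is always 0x10FFFF.
print(hex(sys.maxunicode))

# Indexing is by code point, so a supplementary-plane character has length 1.
s = "\U0001d400"
print(len(s), s == s[0])

# Per-character storage grows with the widest code point in the string.
narrow = "a" * 1000
wide = "\U0001d400" * 1000
print(sys.getsizeof(narrow), sys.getsizeof(wide))
```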
 
Roel Schroeven

Paul Rubin wrote:
Rhamphoryncus said:
Indexing cost, memory efficiency, and canonical representation: pick
two. You can't use a canonical representation (scalar values) without
some sort of costly search when indexing (O(log n) probably) or by
expanding to the worst-case size (UTF-32). Python has taken the
approach of always providing efficient indexing (O(1)), but you can
compile it with either UTF-16 (better memory efficiency) or UTF-32
(canonical representation).

I still don't get it. UTF-16 is just a data compression scheme, right?
I mean, s[17] isn't the 17th character of the (unicode) string regardless
of which memory byte it happens to live at? It could be that accessing
it takes more than constant time, but that's hidden by the implementation.

So where does the invariant c==s[s.index(c)] fail, assuming s contains c?

I didn't get it either, but now I understand. Like you, I thought Python
Unicode strings contain a canonical representation (in interface, not
necessarily in implementation) but apparently that is not true; see
Neil's post and the reference manual
(http://docs.python.org/ref/types.html#l2h-22).

A simple example on my Python installation, apparently compiled to use
UTF-16 (sys.maxunicode == 65535):
>>> s = u'\u1d400'
>>> s.index(s)
0
>>> s[0]
u'\u1d40'
>>> s == s[0]
False


In this case s[0] is not the full Unicode scalar, but instead just the
first part of the surrogate pair consisting of 0x1D40 (in s[0]) and
0x0000 (in s[1]).
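
A caution for anyone reproducing this: the `\u` escape consumes exactly four hex digits, so `u'\u1d400'` denotes U+1D40 followed by the digit "0", which would also produce the output shown. A code point above U+FFFF needs the eight-digit `\U` escape. The sketch below assumes Python 3.3 or later, where `str` indexes by code point, and derives the UTF-16 surrogate pair that a 16-bit build would store for U+1D400.

```python
two_chars = "\u1d400"        # U+1D40 followed by the character "0"
one_char = "\U0001d400"      # MATHEMATICAL BOLD CAPITAL A

print(len(two_chars), [hex(ord(c)) for c in two_chars])
print(len(one_char), hex(ord(one_char)))

# Derive the UTF-16 surrogate pair for the supplementary-plane character.
raw = one_char.encode("utf-16-be")
high = int.from_bytes(raw[:2], "big")
low = int.from_bytes(raw[2:], "big")
print(hex(high), hex(low))
```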
 
