General question about Python design goals

  • Thread starter Christoph Zwerschke

Ed Singleton

Depends on how you understand "perfectly usable." My colleague always
carries his expensive racing bike to our office on the 3rd floor out of
fear it may get wet or stolen. But I think this is not very convenient
and I want to avoid discussions with our boss about skid marks on the
carpet and things like that. Probably that colleague would not complain
either if he had to cast tuples to lists for counting items - you see,
people are different ;-)

If you're leaving skid marks on the floor, maybe you need better underwear?

http://www.google.co.uk/search?q=skid+marks+slang

Sorry to lower the tone.

Ed
 
Christoph Zwerschke

Donn said:
As I'm sure everyone still reading has already heard, the natural usage
of a tuple is as a heterogenous sequence. I would like to explain this
using the concept of an "application type", by which I mean the set of
values that would be valid when applied to a particular context. For
example, os.spawnv() takes as one of its arguments a list of command
arguments, time.mktime() takes a tuple of time values. A homogeneous
sequence is one where a and a[x:y] (where x:y is not 0:-1) have
the same application type. A list of command arguments is clearly
homogeneous in this sense - any sequence of strings is a valid input,
so any slice of this sequence must also be valid. (Valid in the type
sense, obviously the value and thus the result must change.) A tuple
of time values, though, must have exactly 9 elements, so it's heterogeneous
in this sense, even though all the values are integer.

I understand what you want to say, but I would not use the terms
"homogeneous" or "heterogenous" since their more obvious meaning concerns
whether all elements of a collection have the same type.

What you are calling an "application type" is a range of values, and the
characteristic you are describing is that the range of values is not
left when you slice (or extend) an object. So what you are describing is
simply a slicable/extendable application type. It is obvious that you
would use lists for this purpose, and not tuples, I completely agree
with you here. But this is just a consequence of the immutability of
tuples which is their more fundamental characteristic.

Let me give an example: Take all nxn matrices as your application type.
That application type is clearly not slicable/extendable, because this
would change the dimension, thus it is "heterogenous" in your definition.

So would you use tuples (of tuples) or lists (of lists) here? Usually
you will use lists, because you want to be able to operate on the
matrices and transform them in place. So you see the more fundamental
characteristic and reason for preferring lists over tuples is mutability.

Let us assume you want to calculate the mathematical rank of such a
matrix. You would bring it in upper echelon shape (here, you are
operating on the rows, thus you would use lists) and then you would
count the all-zero rows. Ok, this is not an example for using count() on
tuples, but it is an example for using count() on a "heterogenous"
collection in your definition.
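
As a rough illustration of the rank example above, here is a minimal sketch.
The helper names row_echelon and rank are made up for this illustration, and
the exact Fraction arithmetic is an assumption chosen so that testing a row
for zero is not disturbed by floating-point noise.

```python
from fractions import Fraction


def row_echelon(matrix):
    """Return a row echelon form of matrix (a list of lists); the input is not modified."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == len(rows):
            break
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next(
            (r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        # Rows are lists, so they can be swapped and rewritten freely.
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(pivot_row + 1, len(rows)):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows


def rank(matrix):
    """Rank = number of rows minus the number of all-zero rows in echelon form."""
    echelon = row_echelon(matrix)
    if not echelon:
        return 0
    zero_row = [0] * len(echelon[0])
    return len(echelon) - echelon.count(zero_row)
```

Here count() is applied to the list of rows, which matches the point made
above: the collection has a fixed shape, yet counting its items is meaningful.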

I completely agree that you will need count() and index() much less
frequently on tuples because of their immutability. This is obvious.
(Tuples themselves are already used less frequently than lists for this
reason.) But I still cannot see why you would *never* use them or why it
would be bad style.

And I don't understand why those who smile at my insistence on the design
principle of consistency - advocating practicality instead - are
themselves insisting on some very philosophical and non-obvious design
principles or characteristics of tuples.

-- Christoph
 
Antoon Pardon

Quoth (e-mail address removed):
| Christoph Zwerschke wrote:
...
|> Sorry, but I still do not get it. Why is it a feature if I cannot count
|> or find items in tuples? Why is it bad program style if I do this? So
|> far I haven't got any reasonable explanation and I think there is no.
|
| I have no idea, I can understand their view, not necessarily agree. And
| reasonable explanation is not something I usually find on this group,
| for issues like this.

It's hard to tell from this how well you do understand it, and of
course it's hard to believe another explanation is going to make
any difference to those who are basically committed to the opposing
point of view. But what the hell.

Tuples and lists really are intended to serve two fundamentally different
purposes. We might guess that just from the fact that both are included
in Python, in fact we hear it from Guido van Rossum, and one might add
that other languages also make this distinction (more clearly than Python.)

As I'm sure everyone still reading has already heard, the natural usage
of a tuple is as a heterogenous sequence. I would like to explain this
using the concept of an "application type", by which I mean the set of
values that would be valid when applied to a particular context. For
example, os.spawnv() takes as one of its arguments a list of command
arguments, time.mktime() takes a tuple of time values. A homogeneous
sequence is one where a and a[x:y] (where x:y is not 0:-1) have
the same application type. A list of command arguments is clearly
homogeneous in this sense - any sequence of strings is a valid input,
so any slice of this sequence must also be valid. (Valid in the type
sense, obviously the value and thus the result must change.) A tuple
of time values, though, must have exactly 9 elements, so it's heterogeneous
in this sense, even though all the values are integer.

One doesn't count elements in this kind of a tuple, because it's presumed
to have a natural predefined number of elements. One doesn't search for
values in this kind of a tuple, because the occurrence of a value has
meaning only in conjunction with its location, e.g., t[4] is how many
minutes past the hour, but t[5] is how many seconds, etc.

I don't agree with this. Something can be a heterogenous sequence, but
the order can be arbitrary, so that any order of the elements can work
as long as it is well defined beforehand. Points on a 2D lattice are
by convention notated as (x,y) but (y,x) works just as well. When
working in such a lattice it is possible to be interested in those
points that lie on one of the axes. Since the X-axis and the Y-axis
play a symmetrical role, it is possible that it doesn't matter which
axis the point is on. So counting how many of the coordinates are
zero is a natural way to check if a point is on an axis. Doing a find
is a natural way to check if a point is on an axis and at the same
time find out which one.

So that a sequence is heterogenous in this sense doesn't imply
that count, find and other methods of such kind don't make sense.
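
To make the lattice example concrete, here is a small sketch. The function
names zero_coordinates and first_zero_coordinate are hypothetical; they only
stand in for the count and find operations discussed above and are written
with plain iteration so they do not depend on any tuple method.

```python
def zero_coordinates(point):
    """Count how many coordinates of the point are zero."""
    return sum(1 for coordinate in point if coordinate == 0)


def first_zero_coordinate(point):
    """Return the position of the first zero coordinate, or None if there is none."""
    for position, coordinate in enumerate(point):
        if coordinate == 0:
            return position
    return None


def on_an_axis(point):
    """A point lies on an axis if at least one coordinate is zero."""
    return zero_coordinates(point) > 0
```

Because both functions treat the coordinates symmetrically, they give the
same answer about being on an axis whether a point is written as (x,y) or
(y,x); only the reported position changes.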
 
Antoon Pardon

Mike said:
So why the $*@& (please excuse my Perl) does "for x in 1, 2, 3" work?

because the syntax says so:

http://docs.python.org/ref/for.html

Seriously. Why doesn't this have to be phrased as "for x in list((1,
2, 3))", just like you have to write list((1, 2, 3)).count(1), etc.?

because anything that supports [] can be iterated over.

This just begs the question. If tuples are supposed to be such
heterogenous sequences, one could indeed question why they
support [].

And even if good arguments are given why tuples should support
[], the fact that the intentions of tuples and lists are so
different casts doubt on the argument that supporting []
is enough reason to support iteration.

One could equally also argue that since iteration is at the heart
of methods like index, find and count, that supporting iteration
is sufficient reason to support these methods.
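
The claim that iteration is all these methods need can be sketched as
follows. The free functions count and index below are illustrative names,
not part of any library; they assume only that the argument can be iterated
over and that its items support ==.

```python
def count(iterable, value):
    """Number of items equal to value, using nothing but iteration."""
    return sum(1 for item in iterable if item == value)


def index(iterable, value):
    """Position of the first item equal to value; raises ValueError if absent."""
    for position, item in enumerate(iterable):
        if item == value:
            return position
    raise ValueError("value not found")
```

The same two functions work on a tuple, a list, a string or a generator,
which is the sense in which supporting iteration is enough.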
 
Christoph Zwerschke

I think this all boils down to the following:

* In their most frequent use case where tuples are used as lightweight
data structures keeping together heterogenous values (values with
different types or meanings), index() and count() do not make much sense.

I completely agree that this is the most frequent case. Still there are
cases where tuples are used to keep homogenous values together (for
instance, RGB values, points in space, rows of a matrix). In these cases
it would in principle be useful to have index() and count() methods.

But:

* Very frequently you will use only 2- or 3-tuples, where direct queries
may be faster than index() and count(). (That's probably why Antoon's RGB
example was rejected as a use case though it was in principle a good one).

* Very frequently you want to perform operations on these objects and
change their elements, so you would use lists instead of tuples anyway.
See my use case where you would determine whether a vector is zero by
count()ing its zero entries or the rank of a matrix by count()ing zero rows.

* You will use index() and count() in situations where you are dealing
with a small discrete range of values in your collection. Often you will
use strings instead of tuples in these cases, if you don't need to sum()
the items, for instance.

So, indeed, very few use cases will remain if you filter through the
above. But this does not mean that they do not exist. And "special cases
aren't special enough to break the rules." It should be easy to imagine
use cases now.

Take for example, a chess game. You are storing the pieces in a
64-tuple, where every piece has an integer value corresponding to its
value in the game (white positive, black negative). You can approximate
the value of a position by building the sum(). You want to use the tuple
as a key for a dictionary of stored board constellations (e.g. an
opening dictionary), therefore you don't use a list.

Now you want to find the field where the king is standing. Very easy
with the index() method. Or you want to find the number of pawns on the
board. Here you could use the count() method.
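
A minimal sketch of this use case follows. The piece values (pawn 1,
knight and bishop 3, rook 5, queen 9, king 100) and the names START and
openings are assumptions made up for the illustration, and the code assumes
a Python version in which tuples provide index() and count(), as current
releases do.

```python
# Hypothetical piece values; white is positive, black is negative.
PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = 1, 3, 3, 5, 9, 100

back_rank = (ROOK, KNIGHT, BISHOP, QUEEN, KING, BISHOP, KNIGHT, ROOK)

# The board as a flat 64-tuple: white back rank, white pawns, four empty
# ranks, black pawns, black back rank.
START = (
    back_rank
    + (PAWN,) * 8
    + (0,) * 32
    + (-PAWN,) * 8
    + tuple(-value for value in back_rank)
)

# A tuple is hashable, so a position can serve directly as a dictionary key.
openings = {START: "initial position"}

material_balance = sum(START)        # rough value of the position
white_king_field = START.index(KING)  # field where the white king stands
white_pawns = START.count(PAWN)       # number of white pawns on the board
```

Note that with these values a knight and a bishop are indistinguishable, so
a real program would need a different encoding; the sketch only shows the
sum(), index() and count() calls on a hashable board.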

-- Christoph
 
Fredrik Lundh

Christoph said:
It should be easy to imagine use cases now.

Take for example, a chess game. You are storing the pieces in a
64-tuple, where every piece has an integer value corresponding to its
value in the game (white positive, black negative). You can approximate
the value of a position by building the sum(). You want to use the tuple
as a key for a dictionary of stored board constellations (e.g. an
opening dictionary), therefore you don't use a list.

Now you want to find the field where the king is standing. Very easy
with the index() method. Or you want to find the number of pawns on the
board. Here you could use the count() method.

now, I'm no expert on data structures for chess games, but I find it hard to
believe that any chess game implementer would use a data structure that
requires linear searches for everything...

</F>
 
Christoph Zwerschke

Chris said:
First, remember that while your idea is obvious and practical and
straightforward to you, everybody's crummy idea is like that to them.
And while I'm not saying your idea is crummy, bear in mind that not
everyone is sharing your viewpoint.

That's completely ok. What I wanted to know is *why* people do not share
my viewpoint and whether I can understand their reasons. Often, there
are good reasons that I do understand after some discussion. In this
case, there were some arguments but they were not convincing for me.

I think the rest of your arguments have been already discussed in this
thread. People seem to have different opinions here.

-- Christoph
 
Christoph Zwerschke

Fredrik said:
Christoph Zwerschke wrote:
now, I'm no expert on data structures for chess games, but I find it hard to
believe that any chess game implementer would use a data structure that
requires linear searches for everything...

Using linear arrays to represent chess boards is pretty common in
computer chess. Often, the array is made larger than 64 elements to make
sure moves do not go off the board but hit unbeatable pseudo pieces
standing around the borders. But in principle, linear arrays of that
kind are used, and for good reasons.

I already feared that a discussion about the details and efficiency of
this implementation would follow. But this was not the point here.

-- Christoph
 
Steve Holden

Antoon said:
Mike Meyer wrote:

So why the $*@& (please excuse my Perl) does "for x in 1, 2, 3" work?

because the syntax says so:

http://docs.python.org/ref/for.html

Seriously. Why doesn't this have to be phrased as "for x in list((1,
2, 3))", just like you have to write list((1, 2, 3)).count(1), etc.?

because anything that supports [] can be iterated over.


This just begs the question. If tuples are supposed to be such
heterogenous sequences, one could indeed question why they
support [].

Presumably because it's necessary to extract the individual values
(though os.stat results recently became addressable by attribute name as
well as by index, and this is an indication of the originally intended
purpose of tuples).

And even if good arguments are given why tuples should support
[], the fact that the intentions of tuples and lists are so
different casts doubt on the argument that supporting []
is enough reason to support iteration.

One could equally also argue that since iteration is at the heart
of methods like index, find and count, that supporting iteration
is sufficient reason to support these methods.

One could even go so far as to prepare a patch to implement the required
methods and see if it were accepted (though wibbling is a much easier
alternative). Personally I find the collective wisdom of the Python
developers, while not infallible, a good guide as to what's pythonistic
and what's not. YMMV.

regards
Steve
 
Steve Holden

Christoph said:
I think this all boils down to the following:

* In their most frequent use case where tuples are used as lightweight
data structures keeping together heterogenous values (values with
different types or meanings), index() and count() do not make much sense.

I completely agree that this is the most frequent case. Still there are
cases where tuples are used to keep homogenous values together (for
instance, RGB values, points in space, rows of a matrix). In these cases
it would in principle be useful to have index() and count() methods.

Why? Why does it make sense to ask whether an RGB color has a particular
value for one of red, green or blue? Why does it make sense to ask how
many elements there are in an RGB color? It doesn't, so you must be
talking about (ordered) *collections* of such items.

If you want a list of RGB colors then use a list. If you want a list of
points in space then use a list. Why is a tuple preferable? [If the
answer is "because a tuple can't be changed" go to the bottom of the class].

But:

* Very frequently you will use only 2- or 3-tuples, where direct queries
may be faster than index() and count(). (That's probably why Antoon's RGB
example was rejected as a use case though it was in principle a good one).

* Very frequently you want to perform operations on these objects and
change their elements, so you would use lists instead of tuples anyway.
See my use case where you would determine whether a vector is zero by
count()ing its zero entries or the rank of a matrix by count()ing zero rows.

* You will use index() and count() in situations where you are dealing
with a small discrete range of values in your collection. Often you will
use strings instead of tuples in these cases, if you don't need to sum()
the items, for instance.

So, indeed, very few use cases will remain if you filter through the
above. But this does not mean that they do not exist. And "special cases
aren't special enough to break the rules." It should be easy to imagine
use cases now.

Take for example, a chess game. You are storing the pieces in a
64-tuple, where every piece has an integer value corresponding to its
value in the game (white positive, black negative). You can approximate
the value of a position by building the sum(). You want to use the tuple
as a key for a dictionary of stored board constellations (e.g. an
opening dictionary), therefore you don't use a list.

This is a pretty bogus use case. Seems to me like a special case it's
not worth breaking the rules for!

Now you want to find the field where the king is standing. Very easy
with the index() method. Or you want to find the number of pawns on the
board. Here you could use the count() method.

Bearing in mind the (likely) performance impact of using these items as
dict keys, don't you think some other representation would be preferable?

regards
Steve
 
Fredrik Lundh

Christoph said:
Using linear arrays to represent chess boards is pretty common in
computer chess. Often, the array is made larger than 64 elements to make
sure moves do not go off the board but hit unbeatable pseudo pieces
standing around the borders. But in principle, linear arrays of that
kind are used, and for good reasons.

really? a quick literature search only found clever stuff like bitboards,
pregenerated move tables, incremental hash algorithms, etc. the kind
of stuff you'd expect from a problem domain like chess.

I already feared that a discussion about the details and efficiency of
this implementation would follow. But this was not the point here.

so pointing out that the use case you provided has nothing to do with
reality is beside the point? you're quickly moving into kook-country
here.

</F>
 
Steve Holden

Mike said:
Did somebody actually use "Practicality beats purity" as an excuse for
not making list.count and string.count have the same arguments? If so,
I missed it. I certainly don't agree with that - count ought to work
right in this case.

I agree that the two methods are oddly inconsonant. But then since the
introduction of the "has substring" meaning for the "in" operator,
strings and lists are also inconsistent over that operation.

I don't think this would make much sense in present-day Python, mostly
because there's no such thing as a character, only a string of length
one. So a string is actually a sequence of sequences of length one. I
guess this is one of those cases where practicality beat purity. It's as
though an indexing operation on a list got you a list of length one
rather than the element at that index. Strings have always been
anomalous in this respect.

If it happens all the time, you shouldn't have any trouble naming a
number of things that a majority of users think are misfeatures that
aren't being fixed. Could you do that?

Doubtless he could. Paul's a smart guy. The missing element will be
unanimity on the rationale.

regards
Steve
 
Chris Mellon

I think this all boils down to the following:

* In their most frequent use case where tuples are used as lightweight
data structures keeping together heterogenous values (values with
different types or meanings), index() and count() do not make much sense.

I completely agree that this is the most frequent case. Still there are
cases where tuples are used to keep homogenous values together (for
instance, RGB values, points in space, rows of a matrix). In these cases
it would in principle be useful to have index() and count() methods.

But:

* Very frequently you will use only 2- or 3-tuples, where direct queries
may be faster than index() and count(). (That's probably why Antoon's RGB
example was rejected as a use case though it was in principle a good one).

* Very frequently you want to perform operations on these objects and
change their elements, so you would use lists instead of tuples anyway.
See my use case where you would determine whether a vector is zero by
count()ing its zero entries or the rank of a matrix by count()ing zero rows.

* You will use index() and count() in situations where you are dealing
with a small discrete range of values in your collection. Often you will
use strings instead of tuples in these cases, if you don't need to sum()
the items, for instance.

So, indeed, very few use cases will remain if you filter through the
above. But this does not mean that they do not exist. And "special cases
aren't special enough to break the rules." It should be easy to imagine
use cases now.

Take for example, a chess game. You are storing the pieces in a
64-tuple, where every piece has an integer value corresponding to its
value in the game (white positive, black negative). You can approximate
the value of a position by building the sum(). You want to use the tuple
as a key for a dictionary of stored board constellations (e.g. an
opening dictionary), therefore you don't use a list.

This really looks to me like you have your priorities inverted.
Practically everything you want to do with this structure is
list-like, so why not make it a list and convert it to a tuple when
you need to use it as an index? Even better, since you're doing a lot
of list operations, why not make it a list and define unique IDs or
something to use as indices?

A minor change in your design/thinking (not trying to use a tuple as a
frozen list) instead of a change in the language (making tuples more
like frozen lists) seems to be the warranted solution. But maybe that's
just me.
 
Christoph Zwerschke

Steve said:
Christoph said:
I completely agree that this is the most frequent case. Still there are
cases where tuples are used to keep homogenous values together (for
instance, RGB values, points in space, rows of a matrix). In these
cases it would in principle be useful to have index() and count() methods.

Why? Why does it make sense to ask whether an RGB color has a particular
value for one of red, green or blue? Why does it make sense to ask how
many elements there are in an RGB color? It doesn't, so you must be
talking about (ordered) *collections* of such items.

If you want a list of RGB colors then use a list. If you want a list of
points in space then use a list. Why is a tuple preferable? [If the
answer is "because a tuple can't be changed" go to the bottom of the
class].

I cannot follow you here. How would you store RGB values? I think they
are a perfect use case for tuples in the spirit of Guido's "lightweight
C structs." So, in the spirit of Guido I would store them as tuples, or
return them as tuples by a function getpixel(x,y). Why should I not be
allowed to check for getpixel(x,y).count(0) == n for black pixels in an
image with n layers? Yes, you could set BLACK=(0,)*n and test against
BLACK. You can always do things differently.

This is a pretty bogus use case. Seems to me like a special case it's
not worth breaking the rules for!

I already explained why use cases for count() and index() on tuples will
be rare in principle. So it will always be easy for you to call them
"special cases". But they are there, and they make sense.

Also, we are not talking about "breaking" a rule or any existing code
here, but about generalizing or *broadening* a rule. (What do you do if
a rule itself is broken?)

Here is another example illustrating the problem - coincidentally, from
Fredrik's blog, http://effbot.org/zone/image-histogram-optimization.htm
(found it when googling for getpixel):

def histogram4(image):
    # wait, use the list.count operator
    data = list(image.getdata())
    result = []
    for i in range(256):
        result.append(data.count(i))
    return result

Here, we could imagine the getdata() method returning a tuple. Again,
why must it be cast to a list, just to use count()? In reality,
getdata() returns a sequence. But again, wouldn't it be nice if all
sequences provided count() and index() methods? Then there would be no
need to create a list from the sequence before counting.
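
For comparison, here is a sketch of counting over an arbitrary sequence
without first copying it into a list. It is not taken from the linked
article; it assumes only that the pixel values can be iterated over, and the
name histogram_any and the use of collections.Counter are choices made for
this illustration.

```python
from collections import Counter


def histogram_any(values, levels=256):
    """Histogram of integer values in range(levels), for any iterable input."""
    # One pass over the data; Counter returns 0 for levels that never occur.
    counts = Counter(values)
    return [counts[level] for level in range(levels)]
```

This works the same for a tuple, a list or a generator, so the question of
which sequence type getdata() returns no longer matters to the caller.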

-- Christoph
 
Aahz

[Since part of my post seems to have gotten lost in this thread, I
figured I would repeat it]

That's a fair cop. Submit a patch and it'll probably get accepted.

This is one of those little things that happens in language evolution;
not everything gets done right the first time. But Python is developed
by volunteers: if you want this fixed, the first step is to submit a bug
report on SF (or go ahead and submit a patch if you have the expertise).
(I'm quite comfortable channeling Guido and other developers in saying a
patch will get accepted.)
 
Christoph Zwerschke

Fredrik said:
really? a quick literature search only found clever stuff like bitboards,
pregenerated move tables, incremental hash algorithms, etc. the kind
of stuff you'd expect from a problem domain like chess.

I don't know where you googled, but my sources do not say that bitboards
are the *only* possible or reasonable representation:

http://chess.verhelst.org/1997/03/10/representations/
http://en.wikipedia.org/wiki/Computer_chess#Board_representations
http://www.aihorizon.com/essays/chessai/boardrep.htm
http://www.oellermann.com/cftchess/notes/boardrep.html

Many programs still use the array representation. For example:
http://www.nothingisreal.com/cheops/
http://groups.msn.com/RudolfPosch/technicalprogamdescription1.msnw
Even GNU Chess did not use bitboards before version 5.
Here is an example in Python:
http://www.kolumbus.fi/jyrki.alakuijala/pychess.html

I did not say that there aren't more sophisticated and elaborate board
representations than linear or two-dimensional arrays. But arrays are the
simplest and most immediate and intuitive solution, and they have indeed
been used for a long time in the 8-bit era. Bitboards may be more
performant, particularly if you are directly programming in assembler or
C on a 64 bit machine, but not necessarily in Python. But they are also
more difficult to handle. Which representation to use also depends on
the algorithms you are using. You wouldn't write a performant chess
engine in Python anyway. But assume you want to test a particular chess
tree pruning algorithm (that does not depend on board representation)
and write a prototype for that in Python, later making a performant
implementation in assembler. You would not care so much about the
efficiency of your board representation in the prototype, but rather
about how easily it can be handled.

I think it is telling that you have to resort to a debate about
bitboards vs. arrays in order to dismiss my simple use case for index()
and count() as "unreal".

-- Christoph
 
Rick Wotnaz

[...]
Tuples and lists really are intended to serve two fundamentally
different purposes. We might guess that just from the fact that
both are included in Python, in fact we hear it from Guido van
Rossum, and one might add that other languages also make this
distinction (more clearly than Python.)

As I'm sure everyone still reading has already heard, the
natural usage of a tuple is as a heterogenous sequence. I would
like to explain this using the concept of an "application type",
by which I mean the set of values that would be valid when
applied to a particular context. For example, os.spawnv() takes
as one of its arguments a list of command arguments,
time.mktime() takes a tuple of time values. A homogeneous
sequence is one where a and a[x:y] (where x:y is not 0:-1)
have the same application type. A list of command arguments is
clearly homogeneous in this sense - any sequence of strings is a
valid input, so any slice of this sequence must also be valid.
(Valid in the type sense, obviously the value and thus the
result must change.) A tuple of time values, though, must have
exactly 9 elements, so it's heterogeneous in this sense, even
though all the values are integer.

One doesn't count elements in this kind of a tuple, because it's
presumed to have a natural predefined number of elements. One
doesn't search for values in this kind of a tuple, because the
occurrence of a value has meaning only in conjunction with its
location, e.g., t[4] is how many minutes past the hour, but t[5]
is how many seconds, etc.

I have to confess that this wasn't obvious to me, either, at
first, and in fact probably about half of my extant code is
burdened with the idea that a tuple is a smart way to economize
on the overhead of a list. Somewhere along the line, I guess
about 5 years ago? maybe from reading about it here, I saw the
light on this, and since then my code has gotten easier to read
and more robust. Lists really are better for all the kinds of
things that lists are for -- just for example, [1] reads a lot
better than (1,) -- and the savings on overhead is not worth the
cost to exploit it. My tendency to seize on this foolish
optimization is however pretty natural, as is the human tendency
to try to make two similar things interchangeable. So we're
happy to see that tuple does not have the features it doesn't
need, because it helps in a small way to make Python code
better. If only by giving us a chance to have this little chat
once in a while.

Donn, this is a reasonable argument, and in general I don't have a
problem with the distinction between tuples and lists. I have heard
and understand the argument that the intended purpose of tuple
creation is to mimic C structs, so it seems reasonable to suppose
that one knows what was placed in them. Lists are dynamic by
nature, so you need a little more help getting information about
their current state.

However, there is at least one area where this distinction is
bogus. Lists cannot be used as dictionary keys (as it now stands).
But in practice, it is often useful to create a list of values,
cast the list to a tuple, and use that as a dictionary key. It
makes little sense to keep a list of that same information around,
so in practice, the tuple/key is the container that retains the
original information. But that tuple was dynamically created, and
it isn't always true that items were placed in it deliberately.

In other words, the fact that the key is now a tuple is unrelated
to the essential nature of tuples. Not all of the tools used in
examining lists are available to the key as a tuple, though it is
really nothing more than a frozen list.

Sure, you can cast it to a list to use the list methods, but that
requires creating objects just to throw away, which seems a little
wasteful, especially since that's what you had to do to create the
key to begin with.
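
The round trip described above might look like the following sketch. The
names readings and cache are invented for the illustration; the only facts
relied on are that a list cannot be used as a dictionary key while a tuple
built from it can.

```python
readings = [3, 0, 7, 0]  # values collected dynamically into a list

# A list is unhashable, so using it as a key fails with TypeError.
try:
    {readings: "seen"}
    list_usable_as_key = True
except TypeError:
    list_usable_as_key = False

# First conversion: freeze the list into a tuple to use it as the key.
key = tuple(readings)
cache = {key: "seen"}

# Second conversion: copy the key back into a list just to call count().
zeros_via_copy = list(key).count(0)

# The same answer by iterating over the key directly, without the extra list.
zeros_direct = sum(1 for value in key if value == 0)
```

The temporary list in the second conversion exists only to be thrown away,
which is the waste being pointed out.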

I'm sure Antoon wouldn't object if lists were to be allowed as
dictionary keys, which would eliminate the multiple castings for
that situation. I wouldn't, either.

I'd extend this a little to say that tuples are (at least
potentially) created dynamically quite often in other contexts as
well, so that despite their designed intent, in practice they are
used a little differently a good bit of the time. So why not adjust
the available features to the practice?
 
Fredrik Lundh

Christoph said:
I think it is telling that you have to resort to a debate about
bitboards vs. arrays in order to dismiss my simple use case for
index() and count() as "unreal".

kook.

</F>
 
