Help understanding the decisions *behind* python?

I

Inky 788

[snip] We
understand that lists are mutable and tuples are not, but we're a
little lost as to why the two were kept separate from the start. They
both perform a very similar job as far as we can tell.
My guess is that it was probably for optimization reasons long ago.
I've never heard a *good* reason why Python needs both.

The good reason is the immutability, which lets you use
a tuple as a dict key.  

Thanks for the reply Hendrik (and Steven (other reply)). Perhaps I'm
just not sophisticated enough, but I've never wanted to use a list/
tuple as a dict key. This sounds like obscure usage, and a bit
contrived as a reason for having *both* lists and tuples.
 
L

Luis Alberto Zarrabeitia Gomez

Quoting Inky 788 said:
Thanks for the reply Hendrik (and Steven (other reply)). Perhaps I'm
just not sophisticated enough, but I've never wanted to use a list/
tuple as a dict key. This sounds like obscure usage, and a bit
contrived as a reason for having *both* lists and tuples.

I don't seem to understand your definition of obscure and contrived. It seems
that you got late to this thread, and you missed the examples. I'd suggest you
to go back on this thread and look at them.


heights = {}
heights[1,2] = 5
heights[1,3] = 7
heights[3,5] = 1

addresses[lastname, firstname] = my_home_address

census[location, age] = census.get((location, age), 0) + 1

All those are using tuples as dict keys.

Regards,
 
G

Gabriel Genellina

On Jul 22, 2:36 am, Hendrik van Rooyen <[email protected]>
wrote:

Thanks for the reply Hendrik (and Steven (other reply)). Perhaps I'm
just not sophisticated enough, but I've never wanted to use a list/
tuple as a dict key. This sounds like obscure usage, and a bit
contrived as a reason for having *both* lists and tuples.

Many people posted useful examples of tuples as dictionary keys in this
thread. Just to add another one (emulate SQL GROUP BY):

ope_by_dept = defaultdict(int)
total_times = defaultdict(float)

for dept_name, ope_name, ope_date, engineer in list_of_operations:
ope_by_dept[dept_name, ope_start.month] += 1
total_times[dept_name, engineer] += ope_end - ope_start

print "Operations per department per month"
for dept_name, month in sorted(ope_by_dept):
print dept_name, month, ope_by_dept[dept_name, month]
 
H

Hendrik van Rooyen

On Jul 22, 2:36 am, Hendrik van Rooyen <[email protected]>

wrote:

Thanks for the reply Hendrik (and Steven (other reply)). Perhaps I'm
just not sophisticated enough, but I've never wanted to use a list/
tuple as a dict key. This sounds like obscure usage, and a bit
contrived as a reason for having *both* lists and tuples.

Steven showed why you cannot have a mutable thing
as a key in a dict.

if you think it is contrived, then please consider how you would
keep track of say the colour of a pixel on a screen at position
(x,y) - this is about the simplest "natural" tuple format and
example.

There are other equally valid examples, as has been pointed
out. (may have been in another thread - am a bit confused
about that right now)

- Hendrik
 
T

Tim Rowe

2009/7/22 Inky 788 said:
Thanks for the reply Hendrik (and Steven (other reply)). Perhaps I'm
just not sophisticated enough, but I've never wanted to use a list/
tuple as a dict key. This sounds like obscure usage, and a bit
contrived as a reason for having *both* lists and tuples.

If you are used to working in a language that doesn't allow it then
you'll probably carry on using the work-arounds that you have always
used. It almost certainly only seems obscure because you're not
considering it when it would be a natural solution. In a language that
builds *very* heavily on the concept of dictionaries it's not obscure
at all!
 
I

Inky 788

Steven showed why you cannot have a mutable thing
as a key in a dict.

if you think it is contrived, then please consider how you would
keep track of say the colour of a pixel on a screen at position
(x,y) - this is about the simplest "natural" tuple format and
example.

My guess is that this is probably the way most people do it:

~~~~
#!/usr/bin/env python

import sys
import random

if len( sys.argv ) != 3:
print "Please pass exactly 2 ints. Exiting."
sys.exit(1)

NUM_COLUMNS = int( sys.argv[1] )
NUM_ROWS = int( sys.argv[2] )

print "Making array of %s columns by %s rows." % (NUM_COLUMNS,
NUM_ROWS)

def rand():
return int( 255 * random.random())

def make_a_pixel():
# red green blue
return [rand(), rand(), rand()]

def make_a_row(num_columns):
temp_row = []
for i in range(num_columns):
temp_row.append( make_a_pixel() )
return temp_row

def make_array_of_pixels(num_columns, num_rows):
rows = []
for i in range(num_rows):
rows.append( make_a_row(num_columns) )
return rows

def show_pixels(pixel_array):
for row in pixel_array:
for pixel in row:
print pixel, ' ',
print


rows_of_pixels = make_array_of_pixels(NUM_COLUMNS, NUM_ROWS)

show_pixels(rows_of_pixels)
~~~~
 
U

Uncle Roastie

My colleagues and I have been working with python for around 6 months
now, and while we love a lot of what python has done for us and what
it enables us to do some of the decisions behind such certain
data-types and their related methods baffle us slightly (when compared
to the decisions made in other, similarly powerful languages).

Specifically the "differences" between lists and tuples have us
confused and have caused many "discussions" in the office. We
understand that lists are mutable and tuples are not, but we're a
little lost as to why the two were kept separate from the start. They
both perform a very similar job as far as we can tell.

Consider the following:
x = [2,1,3]
x.sort()
print x

[1, 2, 3]

Now, if the sort operations were unable to affect the original
structure of the list (as in JavaScript) you'd effectively have a
tuple which you could add/remove from, and the example above would
look more like:
x = [2,1,3]
print x.sort() [1, 2, 3]
print x

[2,1,3]

This make a lot more sense to us, and follows the convention from
other languages. It would also mean chaining methods to manipulate
lists would be easier:
x = [2,1,3]
print x.sort()[0] 3
print x

[2,1,3]

We often find we need to do manipulations like the above without
changing the order of the original list, and languages like JS allow
this. We can't work out how to do this in python though, other than
duplicating the list, sorting, reversing, then discarding.

We're not looking to start any arguments or religious wars and we're
not asking that python be changed into something its not. We'd simply
like to understand the decision behind the lists and tuple structures.
We feel that in not "getting" the difference between the two types we
may be missing out on using these data structures to their full
potential.

A tuple can be used like a struct in C - the number of fields is meant
to
be fixed and should not be dynamically changed.
 
H

Hendrik van Rooyen

On Jul 23, 3:42 am, Hendrik van Rooyen <[email protected]>

wrote: 8<------------------------
Steven showed why you cannot have a mutable thing
as a key in a dict.

if you think it is contrived, then please consider how you would
keep track of say the colour of a pixel on a screen at position
(x,y) - this is about the simplest "natural" tuple format and
example.

My guess is that this is probably the way most people do it:

~~~~
#!/usr/bin/env python

import sys
import random

if len( sys.argv ) != 3:
print "Please pass exactly 2 ints. Exiting."
sys.exit(1)

NUM_COLUMNS = int( sys.argv[1] )
NUM_ROWS = int( sys.argv[2] )

print "Making array of %s columns by %s rows." % (NUM_COLUMNS,
NUM_ROWS)

def rand():
return int( 255 * random.random())

def make_a_pixel():
# red green blue
return [rand(), rand(), rand()]

def make_a_row(num_columns):
temp_row = []
for i in range(num_columns):
temp_row.append( make_a_pixel() )
return temp_row

def make_array_of_pixels(num_columns, num_rows):
rows = []
for i in range(num_rows):
rows.append( make_a_row(num_columns) )
return rows

def show_pixels(pixel_array):
for row in pixel_array:
for pixel in row:
print pixel, ' ',
print


rows_of_pixels = make_array_of_pixels(NUM_COLUMNS, NUM_ROWS)

show_pixels(rows_of_pixels)
~~~~

Good grief! - Not a dictionary in sight!

I had something in mind where point was an (x,y) tuple,
and point_colours was a dict of possibly named points
with a list of RGB values.
(The colour can change, the point is fixed)

If I were doing it your way, I would use the Array module's Array.
But then, strictly speaking it would not be your way any more,
would it?

- Hendrik
 
J

John Nagle

Beni said:
The *technical* reason is immutability for dict keys.
Dict could allow mutable objects as keys by comparing *by value*,
making a copy on insertion and hashing the current value on lookup.
Prior art: the 2.3 sets module allows mutable Sets as elements in
Sets, by making ImmutableSet copies on insertion, and hashing Sets as
if they are temporarily immutable on lookup.

This inspired PEP 351 and ambitious proposals to expand the approach
to all Python with a copy-on-write scheme. But these ideas were
rejected, and the 2.4 builtin sets only allow frozenset elements.
Half the reason is technical: copy-on-write and harder than it sounds,
and without it you pay a performance price.

But the deeper reason is style: immutable types are convenient!
The allow a pure-functional style of code, which can be simpler.
Of course, sometimes an imperative style is simpler. Depends on the
problem.

An interesting issue is Python objects, which are always mutable.
A "dict" of Python objects is allowed, but doesn't consider
the contents of the objects, just their identity (address). Only
built-in types are immutable; one cannot create a class of immutable objects.
So things like "frozenset" had to be built into the language.
A tuple is really a frozen list. Arguably, frozen objects
should have been a general concept. Conceptually, they're
simple - once "__init__" has run, there can be no more changes
to fields of the object.

Compare the C++ approach, where you can have a "map" of
any object that defines the "<" operator. Python has
"__cmp__" for objects, but built-in dictionaries don't use it.
Of course, one could create a "map" class in Python that
uses "__cmp__", which would allow dictionary-type operations
on arbitrary objects.

There are some interesting uses for immutable objects in
multithreaded programs. They can be shared between threads
without locking. If it weren't for the infamous Global
Interpreter Lock, which basically limits Python programs to using
one CPU at a time, Python would have to deal with locking
in a more advanced way. A system where immutable objects
can be shared and passed between threads, and mutable objects
must be "synchronized" (in the Java sense) to be shared would
be useful.

John Nagle
 
S

Steven D'Aprano

An interesting issue is Python objects, which are always mutable.
A "dict" of Python objects is allowed, but doesn't consider the contents
of the objects, just their identity (address). Only built-in types are
immutable; one cannot create a class of immutable objects.

Yes you can, for some definition of "can":

http://northernplanets.blogspot.com/2007/01/immutable-instances-in-python.html


Admittedly pure Python objects are only "cooperatively immutable". The
immutability relies on the caller not going out of its way to break the
instance, so you can mutate it if you work at it. One could, I suppose,
try putting in complicated tricks to prevent that, but anything written
in Python can ultimately be modified in Python.
 
J

John Nagle

Steven said:
Yes you can, for some definition of "can":

http://northernplanets.blogspot.com/2007/01/immutable-instances-in-python.html


Admittedly pure Python objects are only "cooperatively immutable".

Right. I've been thinking about this as a way of improving
concurrency handling. The general idea is that objects shared
across threads would have to be either be immutable or synchronized.
Regular objects would be limited to a single thread. It's a path
to multithread programs without a global lock. Needs more work
to become a serious proposal. (It may not be worth the trouble;
a few years ago we were hearing about how 64-core CPUs were going
to become mainstream, but instead, the industry is going in the
direction of the same compute power for less money and with
less power consumption.)

John Nagle
 
B

Benjamin Kaplan

   An interesting issue is Python objects, which are always mutable.
A "dict" of Python objects is allowed, but doesn't consider
the contents of the objects, just their identity (address). Only
built-in types are immutable; one cannot create a class of immutable
objects.
So things like "frozenset" had to be built into the language.
A tuple is really a frozen list.  Arguably, frozen objects
should have been a general concept.  Conceptually, they're
simple - once "__init__" has run, there can be no more changes
to fields of the object.

   Compare the C++ approach, where you can have a "map" of
any object that defines the "<" operator.  Python has
"__cmp__" for objects, but built-in dictionaries don't use it.
Of course, one could create a "map" class in Python that
uses "__cmp__", which would allow dictionary-type operations
on arbitrary objects.


Python dictionaries are hash maps so they use hashes, not comparisons,
to store objects. By default the hash is equal to the object's
identity but all you need to do to change it is define your own
__hash__ method. If you were to make a C++ hash map, it wouldn't use
comparisons either.
 
D

Dave Angel

Benjamin said:
Python dictionaries are hash maps so they use hashes, not comparisons,
to store objects. By default the hash is equal to the object's
identity but all you need to do to change it is define your own
__hash__ method. If you were to make a C++ hash map, it wouldn't use
comparisons either.

<snip>
I think you'll find that to make a proper dict of user-objects, you need
to define not only __hash__(), but also __eq__() methods. A hash is
only a starting place. In case of collision, the dict class calls
another method to make sure.

And of course, for the dict to function reasonably, you have to make
sure that the hash and eq functions return consistent results, at least
pretending that the objects involved are immutable.

DaveA
 
J

Joshua Bronson

I find it interesting that the heapq functions tell you in the
documentation that they aren't suitable for use where n==1 or where n is
near the total size of the sequence whereas random.sample() chooses what it
thinks is the best algorithm based on the input. At the very least I would
have thought the heapq functions could check for n==1 and call min/max when
it is.

I've had the same thought myself a number of times, so I filed a bug:
http://bugs.python.org/issue6614
 
R

Raymond Hettinger

Steven D'Aprano said:
But that's the wrong solution to the problem. The OP wants the largest
(or smallest) item, which he expects to get by sorting, then grabbing
the first element:
sorted(alist)[0]

That requires sorting the entire list, only to throw away everything
except the first item. A better solution is to use min() and max(),
neither of which sort the list, so they are much faster.

And if they want the n smallest or largest items (where n is significantly
less than the length of the list[*]) they might consider using
heapq.nsmallest() and heapq.nlargest()

I find it interesting that the heapq functions tell you in the
documentation that they aren't suitable for use where n==1 or where n is
near the total size of the sequence whereas random.sample() chooses what it
thinks is the best algorithm based on the input. At the very least I would
have thought the heapq functions could check for n==1 and call min/max when
it is.

The heapq.py code in Py2.7 and Py3.1 already does some automatic
algorithm selection:
http://svn.python.org/view/python/branches/release31-maint/Lib/heapq.py?revision=73579&view=markup

The automatic seletion of alternatives only occurs in clear-cut cases
(such as n==1
or where n==len(iterable) when the iterable has a known length). For
the
remaining cases, the switchover point from heapq to sorted needs a
programmer's judgment based on whether the input iterable has a known
length, the cost of comparison function, and whether input is already
partially ordered.

The advice in the docs helps the reader understand the
relationships between min, max, nsmallest, nlargest, and sorted.


Raymond
 
R

Raymond Hettinger

Specifically the "differences" between lists and tuples have us
confused and have caused many "discussions" in the office. We
understand that lists are mutable and tuples are not, but we're a
little lost as to why the two were kept separate from the start. They
both perform a very similar job as far as we can tell.

The underlying C code for the two is substantially the same. In terms
of code, tuples are in effect just immutable lists that have the
added
feature of being hashable (and therefore usable as dictionary keys or
elements of sets).

Beyond the mutable/hashable distinction, there is an important
philosophical distinction articulated by Guido. He deems tuples
to be useful for struct like groupings of non-homogenous fields
and lists to be useful for sequences of homogenous data suitable
for looping.

While nothing in the list/tuple code requires you to make that
distinction,
it is important because that philosophy pervades the language. If you
follow Guido's direction, you'll find that the various parts of the
language fit together better. Do otherwise and you'll be going
against
the grain.


Raymond

P.S. The C code for lists and tuples have a slightly difference
internal
structure that makes tuples a little more space efficient (since their
size is known at the outset and doesn't change during use).
 
E

Emmanuel Surleau

The underlying C code for the two is substantially the same. In terms
of code, tuples are in effect just immutable lists that have the
added
feature of being hashable (and therefore usable as dictionary keys or
elements of sets).

Beyond the mutable/hashable distinction, there is an important
philosophical distinction articulated by Guido. He deems tuples
to be useful for struct like groupings of non-homogenous fields
and lists to be useful for sequences of homogenous data suitable
for looping.

While nothing in the list/tuple code requires you to make that
distinction,
it is important because that philosophy pervades the language. If you
follow Guido's direction, you'll find that the various parts of the
language fit together better. Do otherwise and you'll be going
against
the grain.

I might be wrong, but I get the impression that many people don't indeed "get
it" and use tuples as static arrays and lists when the array needs to be
dynamically resized. This is certainly how I use them as well.

This would tend to show that Guido's notion here was not particularly
intuitive.

Cheers,

Emm
 
M

Masklinn

I might be wrong, but I get the impression that many people don't
indeed "get
it" and use tuples as static arrays and lists when the array needs
to be
dynamically resized. This is certainly how I use them as well.

This would tend to show that Guido's notion here was not particularly
intuitive.
It's intuitive if you come to Python knowing other languages with
tuples (which are mostly functional, and in which tuples are *never*
sequences/iterables). At the end of the day, and if Guido's intention
truly was what Raymond says, implementing tuples as immutable sequence
was a mistake. And the documentation is to blame as well: in it,
tuples are clearly described as a *sequence* type, not a *structure*
type.
 
T

Terry Reedy

I think the use of the with 'homegeneous' in this context is wrong
because 1) it is ambiguous or even sometimes not the case, and 2) it
misleads. Everything is an object, so all collections are in that sense
homogeneous. If the meaning is restricted to type(ob), than a list
'mixing' ints and floats would be non-homogeneous and not 'legitimate',
but Guido does not really mean that.

The word tuple comes from relational databases as a generalization of
single, double, triple, quadruple, quintuple, sextuple, sestuple,
octuple, etc. A tuple is a data record with a fixed number of fields
with individual meaning. There is nothing about homogeneity of data type
in that definition. A triple of floats is legitimately a tuple when each
is a coordinate (the individual meanings). In other contexts, the same
triple might properly be a list (such as of heights of people
arbitrarily ordered).

which needs to be properly articulated...

Understanding the idea certainly helps understanding the design.
For instance, it explains why no tuple comprehensions.
I might be wrong, but I get the impression that many people don't
indeed "get it" and use tuples as static arrays and lists when the
array needs to be dynamically resized. This is certainly how I use
them as well.

You also *need* lists if you want to dynamically re-arrange or replace.
If you *must not* re-arrange or edit, then a tuple will prevent that.

I would also refer back to the idea of tuple as database record.
But a list is needed to edit without copying, so lists can be used for
writable database records.

I believe tuple(iterable) for indefinite-length iterables is somewhat
equivalent to tuple(list(iterable)), so there is usually no point to
constructing a tuple that way from such an input.

For collections defined by constants, there is a real advantage to using
tuples, since tuple constants are compiled when the bytecode is
compiled, rather than being dynamically constructed. Inside an inner
loop, this might make enough time difference that "Practicality beats
purity" would apply.
This would tend to show that Guido's notion here was not particularly
intuitive.

The problem is that it *is* intuitive, on his part, and usually not well
explained rationally.

Terry Jan Reedy
 
E

Emmanuel Surleau

I think the use of the with 'homegeneous' in this context is wrong
because 1) it is ambiguous or even sometimes not the case, and 2) it
misleads. Everything is an object, so all collections are in that sense
homogeneous. If the meaning is restricted to type(ob), than a list
'mixing' ints and floats would be non-homogeneous and not 'legitimate',
but Guido does not really mean that.

The word tuple comes from relational databases as a generalization of
single, double, triple, quadruple, quintuple, sextuple, sestuple,
octuple, etc. A tuple is a data record with a fixed number of fields
with individual meaning. There is nothing about homogeneity of data type
in that definition. A triple of floats is legitimately a tuple when each
is a coordinate (the individual meanings). In other contexts, the same
triple might properly be a list (such as of heights of people
arbitrarily ordered).

My understanding is that, in this context, it's not so much data types which
are heterogeneous, but the semantic meaning of the data. For instance, a tuple
containing (first_name, last_name, address) would be a "legitimate" tuple, but
not a tuple containing (address, address, address), which, if we follow
Guido's philosophy, ought to be represented as a list.

Whether including the distinction in the language offers any practical benefit
is questionable.

[snip]
The problem is that it *is* intuitive, on his part, and usually not well
explained rationally.

Excellent point.

Cheers,

Emm
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top