Tuple question

Paul Rubin

Donn Cave said:
I expect it's used relatively infrequently, and for different
reasons. "if len(info) == 5", for example - just from that
line from a relatively popular Python application, would you
guess info is a list, or a tuple?

if I say:

def f(*args):
    print len(args)

I'd certainly expect to receive args as a tuple, and would still want
to be able to find its length.
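
For instance (just spelling the point out), a call such as:

f(1, 2, 3)    # prints 3; inside f, type(args) is <type 'tuple'>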
 
greg

Alex said:
And your class's instances wouldn't then be hashable any more unless
they defined a __hash__ method -- have you tried?

And even if you did give it a __hash__ method, you wouldn't
be able to get it to work properly as a dict key. It's an
inescapable fact of the way dicts work.

It wouldn't be a valid dictionary key. You might be able to fool
Python into accepting it, but it would malfunction.

Greg
 
Donn Cave

Quoth (e-mail address removed) (Alex Martelli):
|> Quoth (e-mail address removed) (Alex Martelli):
....
|> | So by this argument len(t) should not work if t is a tuple...
|>
|> I expect it's used relatively infrequently, and for different
|> reasons. "if len(info) == 5", for example - just from that
|> line from a relatively popular Python application, would you
|> guess info is a list, or a tuple?
|
| No idea -- could just as well be a dict, an array.array, whatever.
| That's the very point and the beauty of polymorphism - I don't CARE what
| kind of container 'info' is, I know by this snippet that we're testing
| if it has exactly 5 items, or not.

I can't tell if you're just being playfully obtuse, or you really
don't recognize the pattern. For me, usage like this is a fairly
familiar sight with tuples, where it's part of a kind of poor man's
user defined data type - we get this tuple, and if it's 5 items it
would be last version's info, 6 items is the present version. This
use for len() is totally about the intact tuple and has nothing to
do with sequential access.
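
For example, a sketch of the sort of code I mean -- the field layouts
here are invented purely for illustration:

def handle(info):
    # hypothetical record layouts: the old format had 5 fields,
    # the current one grew a sixth
    if len(info) == 5:
        name, host, port, user, password = info
        timeout = 30                      # default for old-style records
    elif len(info) == 6:
        name, host, port, user, password, timeout = info
    else:
        raise ValueError("unexpected record size: %d" % len(info))
    return name, host, port, user, password, timeout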

|> There you go, they shouldn't be indexed or sliced, that's right!
|> Named attributes would be nice, but otherwise you use pattern
|> matching (to the extent supported in Python -- key, value = item.)
|> Makes for more readable code.
|
| Not for sufficiently long tuples. Take the 9-item tuples that the time
| module used to use before they grew names: having to unpack the tuple to
| access a single item of it would be exceedingly tedious and heavily
| boilerplatey to boot.
|
| Besides, I _do_ need to have immutable sequences that are suitable as
| dict keys. Today, x=tuple(mylist) performs that role admirably. So,
| say I have a dict d, indexed by such tuples -- of different lengths --
| and I want all the keys into d that have 23 as their first item. Today
| this is a trivial task -- but if I couldn't index tuples I WOULD have a
| problem... basically I would need "frozen lists" (and "frozen dicts"
| where I today use tuple(d.iteritems())...). Today, tuples serve all of
| these roles -- poor man's structs (lacking names), immutable 'frozen'
| lists for dict-keying roles, etc. We need at least two new builtin
| types to take their place if we want to remove tuple indexing...

Well, it's not like I'm really proposing to take away the ersatz
list properties that tuples have. It's just that if they were gone,
I think you'd have roughly the tuple that languages of the Lisp family
have had for so long, and apparently been so happy with while discontent
has festered in the Python world. (It has been a long while since I
wrote my last Lisp program, but I'm assuming that's where all the FP
languages got it from.)

Donn
 
Donn Cave

Quoth Bryan Olson <[email protected]>:
....
| Alas, that's ML, not Python. Were that Python's designers'
| intent, why isn't it part of Python's design? Why would we want
| to live within the confines of static typing, but without the
| safety and efficiency advantages of a type-checking compiler?

That is not what is homogeneous about a list. That would indeed
be an absurd contradiction, so it should be easy to convince you
that it isn't how anyone proposes you should use a list. So, what
do they mean?

A homogeneous sequence, in the sense that makes sense in Python,
is one where any slice has the same functional meaning to the
application. Of course the data is different and that can have
fundamental consequences, but it's different than (key, value)
for example where a[:] is the only slice that preserves its
meaning.

Whether it was a well-chosen word for it or not, this notion
of how lists are designed to be used, as opposed to tuples, is
evidently why there is no index() function. That's all.
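
A tiny illustration of that sense of the word (the values are invented
for the example):

names = ['ann', 'bob', 'cid']    # homogeneous: any slice is still a list of names
print names[1:]                  # ['bob', 'cid']

pair = ('hostname', 8080)        # heterogeneous: each position has its own role
print pair[:1]                   # ('hostname',) -- no longer a (key, value) pair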

Donn Cave, (e-mail address removed)
 
Bryan Olson

Donn said:
> Quoth Bryan Olson:
> ...
> | Alas, that's ML, not Python. Were that Python's designers'
> | intent, why isn't it part of Python's design? Why would we want
> | to live within the confines of static typing, but without the
> | safety and efficiency advantages of a type-checking compiler?
>
> That is not what is homogeneous about a list. That would indeed
> be an absurd contradiction, so it should be easy to convince you
> that it isn't how anyone proposes you should use a list. So, what
> do they mean?

I can't tell to what you are responding.
> A homogeneous sequence, in the sense that makes sense in Python,
> is one where any slice has the same functional meaning to the
> application. Of course the data is different and that can have
> fundamental consequences, but it's different than (key, value)
> for example where a[:] is the only slice that preserves its
> meaning.

That's pretty much what lists in the ML family offer, with
static but polymorphic typing. Lisp uses lists for both
homogeneous and heterogeneous sequences. Python allows lists or
tuples to be treated either way. The rule to use them based on
the homogeneous/heterogeneous distinctions strikes me more as
programming language naivete than Python expertise.
 
Alex Martelli

greg said:
And even if you did give it a __hash__ method, you wouldn't
be able to get it to work properly as a dict key. It's an
inescapable fact of the way dicts work.

Well, it depends on how you define 'properly'. Semantics CAN be
respected, it's _performance_ that may disappoint;-).

It wouldn't be a valid dictionary key. You might be able to fool
Python into accepting it, but it would malfunction.

It might be made to function, just VERY slowly:

def __hash__(self): return 42

Containing this will make ANY class hashable no matter what. But you'd
better buy a new and very fast machine if you're gonna use multiple
instances of such a class to key into a dictionary...;-)
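
Something along these lines (an illustrative class, not anyone's real
code) is perfectly legal, but every lookup degenerates into a linear
scan with an __eq__ call per colliding key:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        return 42                 # every instance lands in the same bucket

d = {}
for i in range(1000):
    d[Point(i, i)] = i            # each insert compares against prior keys
print d[Point(3, 3)]              # works (prints 3), just very slowly at scale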


Alex
 
Peter Hansen

Bryan said:
x = (1, 2)
y = (3, 4)
print x + y

In answer to this and to Alex' response, I should say that
in the context of whether using a tuple for the final
storage saves memory or not, neither response means much.

I said "without having the info in a list" but what I
meant to say was "without having the information already
stored elsewhere before it is put in a tuple". The
above just has two tuples which add up in memory usage
to the same amount as the tuple you end up with, meaning
that just prior to the deletion of the two temporary
tuples (if that even happens) you are using *twice*
the memory you need to use. Clearly that doesn't
help you much if memory is scarce.

Alex shows use of a generator... fine, but how do you
build the tuple without storing up the results of the
generator first somewhere else? You can't preallocate the
space for the tuple if you don't know how long it will
be, but you have to preallocate the space for a tuple
(I believe, in the interpreter anyway, if not at the
programmer level) so you must therefore be storing the
entire output of the generator somewhere just prior
to the tuple creation: same problem as above.

I know Alex knows all this (or has some additional
info that I don't have and which he'll shortly provide),
so I can only assume he was reacting only to my poor
choice of wording with 'list' and/or was ignoring the
context of the discussion (memory usage).

-Peter
 
Alex Martelli

Peter Hansen said:
Alex shows use of a generator... fine, but how do you
build the tuple without storing up the results of the
generator first somewhere else? You can't preallocate the

You, programming in Python, can't, but the interpreter, internally, can
-- because function _PyTuple_Resize (not being directly callable from
Python) _CAN_ resize a tuple more efficiently than you imply:
space for the tuple if you don't know how long it will
be, but you have to preallocate the space for a tuple
(I believe, in the interpreter anyway, if not at the
programmer level) so you must therefore be storing the
entire output of the generator somewhere just prior
to the tuple creation: same problem as above.

No problem at all -- if your available memory is in one chunk, the
resize can work in place. (If your available memory is fragmented you
can of course be hosed, since Python's allocator can't move allocated blocks and
thus can't compact things up to cure your fragmentation; but even when
memory's tight it's not unusual for it to not be fragmented).
I know Alex knows all this (or has some additional
info that I don't have and which he'll shortly provide),

You don't have the sources of the Python interpreter? They're freely
available for download, why would I need to provide them?! Anyway, just
follow, with your C debugger or whatever (I think mere code inspection
will be fine), what happens when you call x = tuple(someiterator()).

Moreover, if the 'someiterator()' iterator can return a reasonable
__len__, I believe Python 2.4 should now be able to use that to estimate
the needed length in advance for greater efficiency, but I haven't
looked deeply into that, yet... indeed, some simple timeit.py tests
suggest to me that 2.4 alpha 3 only implements that optimization to
preallocate lists, not tuples. But even if so, that's clearly a matter
of mere expediency, not one of any intrinsic problem as you make it
sound.
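
One rough way to poke at this from the Python side (an illustrative
timing harness, not from the original post; absolute numbers will vary
by version and machine):

import timeit

# xrange reports a length, which list() can use to pre-size its storage;
# whether tuple() exploits the same hint is exactly the 2.4 question above.
for stmt in ("list(xrange(10000))",
             "tuple(xrange(10000))",
             "tuple(x for x in xrange(10000))"):
    print stmt, timeit.Timer(stmt).timeit(1000)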

so I can only assume he was reacting only to my poor
choice of wording with 'list' and/or was ignoring the
context of the discussion (memory usage).

Claiming that you have to have all info in memory before a tuple can be
built is simply wrong -- your previous claim that the info had to be in
a list was even "wronger", sure, but that doesn't make your current
weaker claims correct in the least.


Alex
 
Peter Hansen

Alex said:
Claiming that you have to have all info in memory before a tuple can be
built is simply wrong -- your previous claim that the info had to be in
a list was even "wronger", sure, but that doesn't make your current
weaker claims correct in the least.

So, back in the original context here... would you agree that
use of a tuple is "quite realistic" (i.e. a technique that will
save significant amounts of memory) when "there is a huge amount
of static data that you need to access quickly"?

(Quotations from the OP of this part of the thread.)

-Peter
 
Mel Wilson

So, to restate my original question, why should my mutable,
content-based-equality class instance be a valid dictionary key, when a
list is not? Which part of a list's behavior makes it inherently
unusable as a key? I'm not asking about design philosophy, I'm asking
about observable behavior.

a = [1,2,3,4,5]
D = {a:'First five natural numbers'}

The hash of the list should be based on value, so that after

b = [1,2,3,4,5]
c = D[b]

c would be 'First five natural numbers', even though b is
not the same object as a.

Now suppose

a.append(6)

What then happens on

e = D[b]

b no longer equals a, so the existing dictionary item can't
match b as a dictionary key. Moreover, the relevant
dictionary item is filed under hash([1,2,3,4,5]) and not
under hash([1,2,3,4,5,6]), so trying to access D with a key
equal to the appended-to value of a won't hash to the right
place to find a's item. Trouble.

If you roll your own hashable class, it's assumed you have
thought about these issues.
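
The same trouble is easy to demonstrate with a hand-rolled hashable
class standing in for the list (illustrative code only):

class Key:
    def __init__(self, items):
        self.items = list(items)     # mutable, like the list above
    def __eq__(self, other):
        return self.items == other.items
    def __hash__(self):
        return hash(tuple(self.items))

a = Key([1, 2, 3, 4, 5])
D = {a: 'First five natural numbers'}
a.items.append(6)                    # mutate the key after it has been filed

print Key([1, 2, 3, 4, 5]) in D      # False: stored hash matches, but == no longer does
print Key([1, 2, 3, 4, 5, 6]) in D   # False: == matches now, but hashes to the wrong bucket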

Regards. Mel.
 
Alex Martelli

Peter Hansen said:
So, back in the original context here... would you agree that
use of a tuple is "quite realistic" (i.e. a technique that will
save significant amounts of memory) when "there is a huge amount
of static data that you need to access quickly"?

A list of N items over-allocates a bit more than 12% extra slots (N>>3 + 6 for
large N). As to whether that saving can be all that significant, I have
my doubts -- a typical list or tuple will uniquely hold a different
object per slot, and an object is way larger than a slot -- the saving
is only in slots, since the objects will be around anyway whether held
in a list or tuple. So the overall savings of memory should be a few
percent at most, excepting unlikely cases where many slots of the
container are referring to the same (relatively few) objects, and even
in the best case can't be more than 10% or so. I would not normally
call even 10% a "significant" amount of memory saving.
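
To put rough numbers on it (using the growth formula above; pointer and
object sizes are ballpark figures):

N = 1000000
extra = (N >> 3) + 6                  # over-allocated slots for a large list
print "extra slots: %d (%.1f%% of N)" % (extra, 100.0 * extra / N)
# Each slot is only a pointer (4 or 8 bytes), while the objects those
# pointers refer to are typically several times larger, hence the small
# overall percentage saving mentioned above.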

This has nothing to do with the issue of where you're keeping the data
before building the container, really. You could have plenty of memory
_at the time you're building the container_, towards the start of your
program, and yet be quite interested in having the long-lasting
container eat up as little memory as feasible, in order to leave memory
available for other uses later, in those crunch-times where you know
your program can end up memory-cramped. Unfortunately, the savings
available this way are not big, I believe.


Alex
 
