Tuple slices

G

George Sakkis

Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that
tuples are immutable ? I ended up writing a custom ImmutableSequence class that does this, but I
wonder why it is not implemented for tuples.

George
 
S

Steven Bethard

Fredrik said:
George said:
Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that
tuples are immutable ?
really?

a = 1, 2, 3
b = a[:]
a is b
True

My impression was that full tuple copies didn't actually copy, but that
slicing a subset of a tuple might. Not exactly sure how to test this, but:

py> a = 1, 2, 3
py> a[:2] is a[:2]
False

So _something_ at least is different between the two slices...

Steve
 
F

Fredrik Lundh

Steven said:
a = 1, 2, 3
b = a[:]
a is b
True

My impression was that full tuple copies didn't actually copy, but that slicing a subset of a
tuple might. Not exactly sure how to test this, but:

py> a = 1, 2, 3
py> a[:2] is a[:2]
False

yup. and to figure out why things are done this way, consider this case:
>>> a = give_me_a_huge_tuple()
>>> len(a) (a rather large number)
>>> b = a[:2]
>>> del a

(IIRC, I proposed to add "substrings" when I implemented the Unicode string
type, but that idea was rejected, for the very same "and how do you get rid of
the original object" reason)

</F>
 
S

Steven Bethard

Fredrik said:
Steven said:
My impression was that full tuple copies didn't actually copy, but that slicing a subset of a
tuple might. Not exactly sure how to test this, but:

py> a = 1, 2, 3
py> a[:2] is a[:2]
False

yup. and to figure out why things are done this way, consider this case:
a = give_me_a_huge_tuple()
len(a) (a rather large number)
b = a[:2]
del a

(IIRC, I proposed to add "substrings" when I implemented the Unicode string
type, but that idea was rejected, for the very same "and how do you get rid of
the original object" reason)

Ahh. Yeah, that seems sensible. I don't think I've ever written code
like that, but if I did, I'd almost certainly want it to work as it does
now...

Steve
 
J

Jeff Epler

The cpython implementation stores tuples in memory like this:
[common fields for all Python objects]
[common fields for all variable-size python objects, including tuple size]
[fields specific to tuple objects, if any]
[array of PyObject*, one for each item in the tuple]
This way of storing variable-size Python objects was chosen in part
because it reuqires only one allocation for an object, not two.
However, there is no way for one tuple to point to a slice of another
tuple.

there's no reason that some other python implementation couldn't make a
different choice.

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFB9V1/Jd01MZaTXX0RAnDMAJ0f2v26tba9j46KsYV3SkylB51KlQCfeTtK
YYuTzz1nulvDc8Ad9p78AGo=
=+SO0
-----END PGP SIGNATURE-----
 
G

George Sakkis

Fredrik Lundh said:
Steven said:
a = 1, 2, 3
b = a[:]
a is b
True

My impression was that full tuple copies didn't actually copy, but that slicing a subset of a
tuple might. Not exactly sure how to test this, but:

py> a = 1, 2, 3
py> a[:2] is a[:2]
False

yup. and to figure out why things are done this way, consider this case:
a = give_me_a_huge_tuple()
len(a) (a rather large number)
b = a[:2]
del a

(IIRC, I proposed to add "substrings" when I implemented the Unicode string
type, but that idea was rejected, for the very same "and how do you get rid of
the original object" reason)

</F>

Fair enough. So perhaps the question is whether such cases are more regular than something like:
a = give_me_a_huge_tuple()
slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)]

George
 
P

Peter Hansen

George said:
Fair enough. So perhaps the question is whether such cases are more regular than something like:
a = give_me_a_huge_tuple()
slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)]

I believe the general usage of tuples tends to mean that
"give_me_a_huge_tuple()" just doesn't happen except with
those who insist on using tuples as though they were nothing
other than read-only lists.

If you use a tuple the way it was apparently intended, you
are extraordinarily unlikely to find yourself with a
huge one requiring slicing in such a way that you care
whether it is a "view" or a new object.

-Peter
 
T

Terry Reedy

George Sakkis said:
Why does slicing a tuple returns a new tuple instead of a
view of the existing one, given that
tuples are immutable ? I ended up writing a custom
ImmutableSequence class that does this, but I
wonder why it is not implemented for tuples.

Numpy and Numarray both do this -- generate views into arrays -- because
they are specifically aimed at large datasets for which the memory savings
overrides the complications.

Aside from the problem of not being able to delete the underlying object,
the view object for a tuple would have to be a new type of object with a
new set of methods. So there would be one more thing to learn. And so
'type(o) is tuple' would generally have to be replaced by
'isinstance(type(o), (tuple,tupleview)).

For slices that are used within expressions and then discarded,
a= (1,2,3)
b = a(1:) + a:)1)
an internal tuple view might be an interesting optimization.

Terry J. Reedy
 
G

George Sakkis

Peter Hansen said:
George said:
Fair enough. So perhaps the question is whether such cases are more regular than something like:
a = give_me_a_huge_tuple()
slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)]

I believe the general usage of tuples tends to mean that
"give_me_a_huge_tuple()" just doesn't happen except with
those who insist on using tuples as though they were nothing
other than read-only lists.

If you use a tuple the way it was apparently intended, you
are extraordinarily unlikely to find yourself with a
huge one requiring slicing in such a way that you care
whether it is a "view" or a new object.

-Peter

Actually my initial motivation was not a huge tuple I had to slice many times. It was something much
less extraordinarily unlikely, a recursive function with a sequence parameter:

def foo(sequence):
# base_case
# do_stuff()
combine(foo(sequence[:n]),
foo(sequence[n:]))

Having each slice be a view of the original sequence instead of a fresh copy would be a Good Thing
(tm).

George
 
G

George Sakkis

Terry Reedy said:
Aside from the problem of not being able to delete the underlying object,
the view object for a tuple would have to be a new type of object with a
new set of methods.

It *could*, but it doesn't have to. One can represent a view as essentially an object with a pointer
to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the
triple being (0, len(sequence), 1).

George
 
J

Jeff Shannon

[My newsreader crapped out on sending this; apologies if it appears
twice.]

George said:
It *could*, but it doesn't have to. One can represent a view as essentially an object with a pointer
to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the
triple being (0, len(sequence), 1).

Except that that's not how Python tuples *are* constructed, and it'd
be a pretty big deal to redo the architecture of all tuples to support
this relatively special case.

Even if this re-architecting were done, you're still constructing a
new object -- the difference is that you're creating this
(start,stop,step) triple instead of duplicating a set of PyObject*
pointers, and then doing math based on those values instead of
straightforward pointer access. I'm not at all convinced that this
really saves you a significant amount for tuple slices (really, you're
still constructing a new tuple anyhow, aren't you?), and it's going to
cost a bit in both execution time and complexity in the common case
(accessing tuples without slicing). If there's a big cost in object
construction, it's probably going to be the memory allocation, and for
a reasonable tuple the size of the memory required is not going to
significantly affect the allocation time.

Jeff Shannon
Technician/Programmer
Credit International
 
G

George Sakkis

Jeff Shannon said:
[My newsreader crapped out on sending this; apologies if it appears
twice.]

George said:
It *could*, but it doesn't have to. One can represent a view as essentially an object with a pointer
to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the
triple being (0, len(sequence), 1).

Except that that's not how Python tuples *are* constructed, and it'd
be a pretty big deal to redo the architecture of all tuples to support
this relatively special case.

Even if this re-architecting were done, you're still constructing a
new object -- the difference is that you're creating this
(start,stop,step) triple instead of duplicating a set of PyObject*
pointers, and then doing math based on those values instead of
straightforward pointer access. I'm not at all convinced that this
really saves you a significant amount for tuple slices (really, you're
still constructing a new tuple anyhow, aren't you?), and it's going to
cost a bit in both execution time and complexity in the common case
(accessing tuples without slicing). If there's a big cost in object
construction, it's probably going to be the memory allocation, and for
a reasonable tuple the size of the memory required is not going to
significantly affect the allocation time.

Jeff Shannon
Technician/Programmer
Credit International

You're probably right about the allocation time, but my main concern is the memory required for each
slice, which can be O(n) wrt the length of the whole tuple. I would like this to be constant (at
least if there was a way around to the problem of deleting the underlying sequence).

George
 
T

Terry Reedy

George Sakkis said:
Actually my initial motivation was not a huge tuple I had to slice many
times. It was something much
less extraordinarily unlikely, a recursive function with a sequence
parameter:

def foo(sequence):
# base_case
# do_stuff()
combine(foo(sequence[:n]),
foo(sequence[n:]))

Having each slice be a view of the original sequence instead of a fresh
copy would be a Good Thing

Why? To save time? memory? Either would require more that a few bytes per
slice. If they are, you can probably use virtual slices as follows.

def foo(sequence):
def _foo(seq, start, stop)
# base_case
# do_stuff()
combine(_foo(seq, start, n), _foo(seq, n, stop))
_foo(sequence, 0, len(sequence)

In other words, if you don't really want slices copied out of the sequence,
then don't slice! Just use 2 ints to indicate the working region or view.
Both this and using a nested function with additional params are standard
techniques. This also works when the seq is mutable and you want changes
to the 'slice' to change the original, as in quicksort.

Terry J. Reedy
 
N

Nick Coghlan

George said:
You're probably right about the allocation time, but my main concern is the memory required for each
slice, which can be O(n) wrt the length of the whole tuple. I would like this to be constant (at
least if there was a way around to the problem of deleting the underlying sequence).

If you really want a view into a tuple (or any sequence for that matter):

from itertools import islice
islice(iter(seq), start, end, step)

Then use the various functions in itertools to work with the resulting iterator
(since the syntactic support for working with iterators is currently quite
limited - slicing, concatenation and repetition are spelt with
itertools.islice(), itertools.chain() and itertools.repeat(), rather than with
their standard syntactic equivalents "[x:y:z]", "+" and "*").

The above approach does suffer from the problem of holding a reference to the
original sequence, though.

Cheers,
Nick.
 
G

gsakkis

Terry said:
George Sakkis said:
Actually my initial motivation was not a huge tuple I had to slice many
times. It was something much
less extraordinarily unlikely, a recursive function with a sequence
parameter:

def foo(sequence):
# base_case
# do_stuff()
combine(foo(sequence[:n]),
foo(sequence[n:]))

Having each slice be a view of the original sequence instead of a fresh
copy would be a Good Thing

Why? To save time? memory? Either would require more that a few bytes per
slice. If they are, you can probably use virtual slices as follows.

def foo(sequence):
def _foo(seq, start, stop)
# base_case
# do_stuff()
combine(_foo(seq, start, n), _foo(seq, n, stop))
_foo(sequence, 0, len(sequence)

In other words, if you don't really want slices copied out of the sequence,
then don't slice! Just use 2 ints to indicate the working region or view.
Both this and using a nested function with additional params are standard
techniques. This also works when the seq is mutable and you want changes
to the 'slice' to change the original, as in quicksort.

Terry J. Reedy


I would say these are standard *C/C++* techniques; slicing simplifies
things and thus seems more 'pythonic' to me.

George
 
G

George Sakkis

An iterator is perfectly ok if all you want is to iterate over the
elements of a view, but as you noted, iterators are less flexible than
the underlying sequence. The view should be (or at least appear)
identical in functionality (i.e. public methods) with its underlying
sequence.

George
 
T

Terry Reedy

I would say these are standard *C/C++* techniques; slicing simplifies
things and thus seems more 'pythonic' to me.

Unless you are GvR or Tim Peters, throwing 'pythonic' at me doesn't cut it
with me, especially when you use it so shallowly. The current Pythonic
meaning of 'slice', as defined by GvR and implemented in core Python
sequence objects and documented in the reference manuals and reiterated by
GvR on PyDev in just the last day, is to make an independent in-memory
#copy# of the indicated part of a sequence. Both aspects are intentional
and desired features, not accidents.

Yes, slicing, even in the Pythonic sense, may well simplify the OP's
algorithm (of which he gave almost no detail), but the whole point of this
thread is that he does not want to do that (make copy slices). While he
might shortsightedly think that he wants Guido to redefine slice and
replace the current implementation of tuple with a more spacious, possibly
slower one that would allow that definition, that will not solve his
current problem, if indeed he has one.

As George Sakkis the OP noted, the essential data constituting a contiguous
section view are the underlying sequence and two position markers. Whether
one works with these directly or packages them into a tuple or user class
instance is a matter of relative conveniences. As it turns out, I was
thinking about the design choices involved in a generic sequence view class
just the morning before reading the original post. But I have no idea
whether GS's foo function would justify the added overhead of such a thing.
It partly depends on what he wishes to optimize, which I asked about, but
have not yet seen an answer about. So I suggested the simplest approach
that would work. And that, to me, *is* pythonic!

Terry J. Reedy
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top