Tuple slices

Discussion in 'Python' started by George Sakkis, Jan 24, 2005.

1. George SakkisGuest

Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that
tuples are immutable ? I ended up writing a custom ImmutableSequence class that does this, but I
wonder why it is not implemented for tuples.

George

George Sakkis, Jan 24, 2005

2. Fredrik LundhGuest

George Sakkis wrote:

> Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that
> tuples are immutable ?

really?

>>> a = 1, 2, 3
>>> b = a[:]
>>> a is b

True

</F>

Fredrik Lundh, Jan 24, 2005

3. Steven BethardGuest

Fredrik Lundh wrote:
> George Sakkis wrote:
>
>>Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that
>>tuples are immutable ?

>
> really?
>
>
>>>>a = 1, 2, 3
>>>>b = a[:]
>>>>a is b

> True

My impression was that full tuple copies didn't actually copy, but that
slicing a subset of a tuple might. Not exactly sure how to test this, but:

py> a = 1, 2, 3
py> a[:2] is a[:2]
False

So _something_ at least is different between the two slices...

Steve

Steven Bethard, Jan 24, 2005
4. Pedro WerneckGuest

On Mon, 24 Jan 2005 18:45:46 +0100
"Fredrik Lundh" <> wrote:

> George Sakkis wrote:
>
> > Why does slicing a tuple returns a new tuple instead of a view of
> > the existing one, given that tuples are immutable ?

>
> really?

Well... seems like this case is optimized to return the original tuple
just incrementing its reference count and returning

tupleobject.c, 330-335

if (ilow == 0 && ihigh == a->ob_size && PyTuple_CheckExact(a)) {
Py_INCREF(a);
return (PyObject *)a;
}

>
> >>> a = 1, 2, 3
> >>> b = a[:]
> >>> a is b

> True
>
> </F>
>
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list

Pedro Werneck, Jan 24, 2005
5. Pedro WerneckGuest

On Mon, 24 Jan 2005 18:45:46 +0100
"Fredrik Lundh" <> wrote:

> George Sakkis wrote:
>
> > Why does slicing a tuple returns a new tuple instead of a view of
> > the existing one, given that tuples are immutable ?

>
> really?

Well... seems like this case (slicing the whole tuple) is optimized to
return the original tuple just incrementing its reference count and returning

tupleobject.c, 330-335

if (ilow == 0 && ihigh == a->ob_size && PyTuple_CheckExact(a)) {
Py_INCREF(a);
return (PyObject *)a;
}

>
> >>> a = 1, 2, 3
> >>> b = a[:]
> >>> a is b

> True
>
> </F>
>
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list

Pedro Werneck, Jan 24, 2005
6. Fredrik LundhGuest

Steven Bethard wrote:

>>>>>a = 1, 2, 3
>>>>>b = a[:]
>>>>>a is b

>> True

>
> My impression was that full tuple copies didn't actually copy, but that slicing a subset of a
> tuple might. Not exactly sure how to test this, but:
>
> py> a = 1, 2, 3
> py> a[:2] is a[:2]
> False

yup. and to figure out why things are done this way, consider this case:

>>> a = give_me_a_huge_tuple()
>>> len(a)

(a rather large number)
>>> b = a[:2]
>>> del a

(IIRC, I proposed to add "substrings" when I implemented the Unicode string
type, but that idea was rejected, for the very same "and how do you get rid of
the original object" reason)

</F>

Fredrik Lundh, Jan 24, 2005
7. Steven BethardGuest

Fredrik Lundh wrote:
> Steven Bethard wrote:
>>
>>My impression was that full tuple copies didn't actually copy, but that slicing a subset of a
>>tuple might. Not exactly sure how to test this, but:
>>
>>py> a = 1, 2, 3
>>py> a[:2] is a[:2]
>>False

>
> yup. and to figure out why things are done this way, consider this case:
>
> >>> a = give_me_a_huge_tuple()
> >>> len(a)

> (a rather large number)
> >>> b = a[:2]
> >>> del a

>
> (IIRC, I proposed to add "substrings" when I implemented the Unicode string
> type, but that idea was rejected, for the very same "and how do you get rid of
> the original object" reason)

Ahh. Yeah, that seems sensible. I don't think I've ever written code
like that, but if I did, I'd almost certainly want it to work as it does
now...

Steve

Steven Bethard, Jan 24, 2005
8. Jeff EplerGuest

The cpython implementation stores tuples in memory like this:
[common fields for all Python objects]
[common fields for all variable-size python objects, including tuple size]
[fields specific to tuple objects, if any]
[array of PyObject*, one for each item in the tuple]
This way of storing variable-size Python objects was chosen in part
because it reuqires only one allocation for an object, not two.
However, there is no way for one tuple to point to a slice of another
tuple.

there's no reason that some other python implementation couldn't make a
different choice.

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFB9V1/Jd01MZaTXX0RAnDMAJ0f2v26tba9j46KsYV3SkylB51KlQCfeTtK
=+SO0
-----END PGP SIGNATURE-----

Jeff Epler, Jan 24, 2005
9. George SakkisGuest

"Fredrik Lundh" <> wrote in message
news:...
> Steven Bethard wrote:
>
> >>>>>a = 1, 2, 3
> >>>>>b = a[:]
> >>>>>a is b
> >> True

> >
> > My impression was that full tuple copies didn't actually copy, but that slicing a subset of a
> > tuple might. Not exactly sure how to test this, but:
> >
> > py> a = 1, 2, 3
> > py> a[:2] is a[:2]
> > False

>
> yup. and to figure out why things are done this way, consider this case:
>
> >>> a = give_me_a_huge_tuple()
> >>> len(a)

> (a rather large number)
> >>> b = a[:2]
> >>> del a

>
> (IIRC, I proposed to add "substrings" when I implemented the Unicode string
> type, but that idea was rejected, for the very same "and how do you get rid of
> the original object" reason)
>
> </F>

Fair enough. So perhaps the question is whether such cases are more regular than something like:
a = give_me_a_huge_tuple()
slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)]

George

George Sakkis, Jan 24, 2005
10. Peter HansenGuest

George Sakkis wrote:
> Fair enough. So perhaps the question is whether such cases are more regular than something like:
> a = give_me_a_huge_tuple()
> slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)]

I believe the general usage of tuples tends to mean that
"give_me_a_huge_tuple()" just doesn't happen except with
those who insist on using tuples as though they were nothing

If you use a tuple the way it was apparently intended, you
are extraordinarily unlikely to find yourself with a
huge one requiring slicing in such a way that you care
whether it is a "view" or a new object.

-Peter

Peter Hansen, Jan 24, 2005
11. Terry ReedyGuest

"George Sakkis" <> wrote in message
news:...
> Why does slicing a tuple returns a new tuple instead of a
> view of the existing one, given that
> tuples are immutable ? I ended up writing a custom
> ImmutableSequence class that does this, but I
> wonder why it is not implemented for tuples.

Numpy and Numarray both do this -- generate views into arrays -- because
they are specifically aimed at large datasets for which the memory savings
overrides the complications.

Aside from the problem of not being able to delete the underlying object,
the view object for a tuple would have to be a new type of object with a
new set of methods. So there would be one more thing to learn. And so
'type(o) is tuple' would generally have to be replaced by
'isinstance(type(o), (tuple,tupleview)).

For slices that are used within expressions and then discarded,
a= (1,2,3)
b = a(1 + a1)
an internal tuple view might be an interesting optimization.

Terry J. Reedy

Terry Reedy, Jan 24, 2005
12. George SakkisGuest

"Peter Hansen" <> wrote in message news:...
> George Sakkis wrote:
> > Fair enough. So perhaps the question is whether such cases are more regular than something like:
> > a = give_me_a_huge_tuple()
> > slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)]

>
> I believe the general usage of tuples tends to mean that
> "give_me_a_huge_tuple()" just doesn't happen except with
> those who insist on using tuples as though they were nothing
>
> If you use a tuple the way it was apparently intended, you
> are extraordinarily unlikely to find yourself with a
> huge one requiring slicing in such a way that you care
> whether it is a "view" or a new object.
>
> -Peter

Actually my initial motivation was not a huge tuple I had to slice many times. It was something much
less extraordinarily unlikely, a recursive function with a sequence parameter:

def foo(sequence):
# base_case
# do_stuff()
combine(foo(sequence[:n]),
foo(sequence[n:]))

Having each slice be a view of the original sequence instead of a fresh copy would be a Good Thing
(tm).

George

George Sakkis, Jan 24, 2005
13. George SakkisGuest

"Terry Reedy" <> wrote in message
news:...
>
> Aside from the problem of not being able to delete the underlying object,
> the view object for a tuple would have to be a new type of object with a
> new set of methods.

It *could*, but it doesn't have to. One can represent a view as essentially an object with a pointer
to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the
triple being (0, len(sequence), 1).

George

George Sakkis, Jan 24, 2005
14. Jeff ShannonGuest

[My newsreader crapped out on sending this; apologies if it appears
twice.]

George Sakkis wrote:

> "Terry Reedy" <> wrote in message
> news:...
>
>>Aside from the problem of not being able to delete the underlying object,
>>the view object for a tuple would have to be a new type of object with a
>>new set of methods.

>
> It *could*, but it doesn't have to. One can represent a view as essentially an object with a pointer
> to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the
> triple being (0, len(sequence), 1).

Except that that's not how Python tuples *are* constructed, and it'd
be a pretty big deal to redo the architecture of all tuples to support
this relatively special case.

Even if this re-architecting were done, you're still constructing a
new object -- the difference is that you're creating this
(start,stop,step) triple instead of duplicating a set of PyObject*
pointers, and then doing math based on those values instead of
straightforward pointer access. I'm not at all convinced that this
really saves you a significant amount for tuple slices (really, you're
still constructing a new tuple anyhow, aren't you?), and it's going to
cost a bit in both execution time and complexity in the common case
(accessing tuples without slicing). If there's a big cost in object
construction, it's probably going to be the memory allocation, and for
a reasonable tuple the size of the memory required is not going to
significantly affect the allocation time.

Jeff Shannon
Technician/Programmer
Credit International

Jeff Shannon, Jan 25, 2005
15. George SakkisGuest

"Jeff Shannon" <> wrote in message news:...
> [My newsreader crapped out on sending this; apologies if it appears
> twice.]
>
> George Sakkis wrote:
>
> > "Terry Reedy" <> wrote in message
> > news:...
> >
> >>Aside from the problem of not being able to delete the underlying object,
> >>the view object for a tuple would have to be a new type of object with a
> >>new set of methods.

> >
> > It *could*, but it doesn't have to. One can represent a view as essentially an object with a

pointer
> > to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the
> > triple being (0, len(sequence), 1).

>
> Except that that's not how Python tuples *are* constructed, and it'd
> be a pretty big deal to redo the architecture of all tuples to support
> this relatively special case.
>
> Even if this re-architecting were done, you're still constructing a
> new object -- the difference is that you're creating this
> (start,stop,step) triple instead of duplicating a set of PyObject*
> pointers, and then doing math based on those values instead of
> straightforward pointer access. I'm not at all convinced that this
> really saves you a significant amount for tuple slices (really, you're
> still constructing a new tuple anyhow, aren't you?), and it's going to
> cost a bit in both execution time and complexity in the common case
> (accessing tuples without slicing). If there's a big cost in object
> construction, it's probably going to be the memory allocation, and for
> a reasonable tuple the size of the memory required is not going to
> significantly affect the allocation time.
>
> Jeff Shannon
> Technician/Programmer
> Credit International
>

You're probably right about the allocation time, but my main concern is the memory required for each
slice, which can be O(n) wrt the length of the whole tuple. I would like this to be constant (at
least if there was a way around to the problem of deleting the underlying sequence).

George

George Sakkis, Jan 25, 2005
16. Terry ReedyGuest

"George Sakkis" <> wrote in message
news:...
> Actually my initial motivation was not a huge tuple I had to slice many
> times. It was something much
> less extraordinarily unlikely, a recursive function with a sequence
> parameter:
>
> def foo(sequence):
> # base_case
> # do_stuff()
> combine(foo(sequence[:n]),
> foo(sequence[n:]))
>
> Having each slice be a view of the original sequence instead of a fresh
> copy would be a Good Thing

Why? To save time? memory? Either would require more that a few bytes per
slice. If they are, you can probably use virtual slices as follows.

def foo(sequence):
def _foo(seq, start, stop)
# base_case
# do_stuff()
combine(_foo(seq, start, n), _foo(seq, n, stop))
_foo(sequence, 0, len(sequence)

In other words, if you don't really want slices copied out of the sequence,
then don't slice! Just use 2 ints to indicate the working region or view.
Both this and using a nested function with additional params are standard
techniques. This also works when the seq is mutable and you want changes
to the 'slice' to change the original, as in quicksort.

Terry J. Reedy

Terry Reedy, Jan 25, 2005
17. Nick CoghlanGuest

George Sakkis wrote:
> You're probably right about the allocation time, but my main concern is the memory required for each
> slice, which can be O(n) wrt the length of the whole tuple. I would like this to be constant (at
> least if there was a way around to the problem of deleting the underlying sequence).

If you really want a view into a tuple (or any sequence for that matter):

from itertools import islice
islice(iter(seq), start, end, step)

Then use the various functions in itertools to work with the resulting iterator
(since the syntactic support for working with iterators is currently quite
limited - slicing, concatenation and repetition are spelt with
itertools.islice(), itertools.chain() and itertools.repeat(), rather than with
their standard syntactic equivalents "[x:y:z]", "+" and "*").

The above approach does suffer from the problem of holding a reference to the
original sequence, though.

Cheers,
Nick.

--
Nick Coghlan | | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

Nick Coghlan, Jan 25, 2005
18. Guest

Terry Reedy wrote:
> "George Sakkis" <> wrote in message
> news:...
> > Actually my initial motivation was not a huge tuple I had to slice

many
> > times. It was something much
> > less extraordinarily unlikely, a recursive function with a sequence

> > parameter:
> >
> > def foo(sequence):
> > # base_case
> > # do_stuff()
> > combine(foo(sequence[:n]),
> > foo(sequence[n:]))
> >
> > Having each slice be a view of the original sequence instead of a

fresh
> > copy would be a Good Thing

>
> Why? To save time? memory? Either would require more that a few

bytes per
> slice. If they are, you can probably use virtual slices as follows.
>
> def foo(sequence):
> def _foo(seq, start, stop)
> # base_case
> # do_stuff()
> combine(_foo(seq, start, n), _foo(seq, n, stop))
> _foo(sequence, 0, len(sequence)
>
> In other words, if you don't really want slices copied out of the

sequence,
> then don't slice! Just use 2 ints to indicate the working region or

view.
> Both this and using a nested function with additional params are

standard
> techniques. This also works when the seq is mutable and you want

changes
> to the 'slice' to change the original, as in quicksort.
>
> Terry J. Reedy

I would say these are standard *C/C++* techniques; slicing simplifies
things and thus seems more 'pythonic' to me.

George

, Jan 25, 2005
19. George SakkisGuest

An iterator is perfectly ok if all you want is to iterate over the
elements of a view, but as you noted, iterators are less flexible than
the underlying sequence. The view should be (or at least appear)
identical in functionality (i.e. public methods) with its underlying
sequence.

George

George Sakkis, Jan 25, 2005
20. Terry ReedyGuest

<> wrote in message
news:...
>
> Terry Reedy wrote:
>> In other words, if you don't really want slices copied out of the
>> sequence,
>> then don't slice! Just use 2 ints to indicate the working region or
>> view.
>> Both this and using a nested function with additional params are
>> standard
>> techniques. This also works when the seq is mutable and you want
>> changes
>> to the 'slice' to change the original, as in quicksort.

>
> I would say these are standard *C/C++* techniques; slicing simplifies
> things and thus seems more 'pythonic' to me.

Unless you are GvR or Tim Peters, throwing 'pythonic' at me doesn't cut it
with me, especially when you use it so shallowly. The current Pythonic
meaning of 'slice', as defined by GvR and implemented in core Python
sequence objects and documented in the reference manuals and reiterated by
GvR on PyDev in just the last day, is to make an independent in-memory
#copy# of the indicated part of a sequence. Both aspects are intentional
and desired features, not accidents.

Yes, slicing, even in the Pythonic sense, may well simplify the OP's
algorithm (of which he gave almost no detail), but the whole point of this
thread is that he does not want to do that (make copy slices). While he
might shortsightedly think that he wants Guido to redefine slice and
replace the current implementation of tuple with a more spacious, possibly
slower one that would allow that definition, that will not solve his
current problem, if indeed he has one.

As George Sakkis the OP noted, the essential data constituting a contiguous
section view are the underlying sequence and two position markers. Whether
one works with these directly or packages them into a tuple or user class
instance is a matter of relative conveniences. As it turns out, I was
thinking about the design choices involved in a generic sequence view class
just the morning before reading the original post. But I have no idea
whether GS's foo function would justify the added overhead of such a thing.
It partly depends on what he wishes to optimize, which I asked about, but
have not yet seen an answer about. So I suggested the simplest approach
that would work. And that, to me, *is* pythonic!

Terry J. Reedy

Terry Reedy, Jan 25, 2005