Feature Proposal: Sequence .join method

David Murmann

Hi all!

I could not find out whether this has been proposed before (there are
too many discussions on join as a sequence method with different
semantics). So, I propose a generalized .join method on all sequences
with these semantics:

def join(self, seq):
    T = type(self)
    result = T()
    if len(seq):
        result = T(seq[0])
        for item in seq[1:]:
            result = result + self + T(item)
    return result

This would allow code like the following:

[0].join([[5], [42, 5], [1, 2, 3], [23]])

resulting in:
[5, 0, 42, 5, 0, 1, 2, 3, 0, 23]
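
Since built-in types such as list cannot be given new methods from Python code, the semantics above can be tried out as a free function. This is only a sketch: the name `seq_join` and passing the separator as the first argument are assumptions made for illustration, not part of the proposal.

```python
def seq_join(sep, seq):
    """Join the items of seq with sep, converting each item to type(sep)."""
    T = type(sep)
    result = T()
    if len(seq):
        result = T(seq[0])
        for item in seq[1:]:
            result = result + sep + T(item)
    return result


print(seq_join([0], [[5], [42, 5], [1, 2, 3], [23]]))
# prints [5, 0, 42, 5, 0, 1, 2, 3, 0, 23]
print(seq_join((0,), [[5], (42, 5)]))
# prints (5, 0, 42, 5)
```

The second call shows the effect of the T(item) conversion: a list item is accepted even though the separator is a tuple.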

You might have noticed that this actually contains two proposals, so if
you don't like str.join applying str() on each item in the sequence,
replace the line
    result = result + self + T(item)
with
    result = result + self + item
My version has been turned down in the past, as far as I have read; yet I
find the first version more useful in the new context: you can pass a
sequence of lists or tuples or really any sequence to the method and it
does what you think (at least what I think :).

Any comments welcome,
David.
 
David Murmann

David said:
replace the line
    result = result + self + T(item)
with
    result = result + self + item

and of course the line
    result = T(seq[0])
with
    result = seq[0]
 
Steven Bethard

David said:
Hi all!

I could not find out whether this has been proposed before (there are
too many discussions on join as a sequence method with different
semantics). So, I propose a generalized .join method on all sequences
with these semantics:

def join(self, seq):
    T = type(self)
    result = T()
    if len(seq):
        result = T(seq[0])
        for item in seq[1:]:
            result = result + self + T(item)
    return result

This would allow code like the following:

[0].join([[5], [42, 5], [1, 2, 3], [23]])

I don't like the idea of having to put this on all sequences. If you
want this, I'd instead propose it as a function (perhaps builtin,
perhaps in some other module).

Also, this particular implementation is a bad idea. The repeated += to
result is likely to result in O(N**2) behavior.

STeVe
 
David Murmann

Steven said:
I don't like the idea of having to put this on all sequences. If you
want this, I'd instead propose it as a function (perhaps builtin,
perhaps in some other module).

Also, this particular implementation is a bad idea. The repeated += to
result is likely to result in O(N**2) behavior.

STeVe

Hi and thanks for the fast reply,

I just figured out that the following implementation is probably much
faster, and short enough to be used inline in every one of my use cases:

def join(sep, seq):
    return reduce(lambda x, y: x + sep + y, seq, type(sep)())

So I'm withdrawing my proposal, and instead propose to keep reduce and
lambda in py3k ;).

thanks again,
David.
 
David Murmann

def join(sep, seq):
    return reduce(lambda x, y: x + sep + y, seq, type(sep)())

Damn, I wanted too much. Proper implementation:

def join(sep, seq):
    if len(seq):
        return reduce(lambda x, y: x + sep + y, seq)
    return type(sep)()

but still short enough
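
For anyone trying this on Python 3, where reduce lives in the functools module, the corrected version can be sketched as below. The problem with the first attempt is that the initial value type(sep)() makes the result start with a separator: joining [[5]] with [0] would give [0, 5] rather than [5].

```python
from functools import reduce


def join(sep, seq):
    """Join the items of seq with sep; an empty seq gives an empty type(sep)."""
    if len(seq):
        return reduce(lambda x, y: x + sep + y, seq)
    return type(sep)()


print(join([0], [[5], [42, 5], [23]]))   # prints [5, 0, 42, 5, 0, 23]
print(join(", ", ["a", "b", "c"]))       # prints a, b, c
print(join([0], []))                     # prints []
```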

see you,
David.
 
Fredrik Lundh

David said:
I could not find out whether this has been proposed before (there are
too many discussions on join as a sequence method with different
semantics). So, I propose a generalized .join method on all sequences

so all you have to do now is to find the sequence base class, and
you're done...

</F>
 
Guest

Steven said:
I don't like the idea of having to put this on all sequences. If you
want this, I'd instead propose it as a function (perhaps builtin,
perhaps in some other module).

The itertools module seems like the right place for it.

list(itertools.chain(*a))

gives the same result as the proposed

[].join(a)
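
A short sketch of that equivalence, reusing the example data from the first post. Note that itertools.chain returns a lazy iterator rather than a list, so it has to be wrapped in list() before comparing.

```python
import itertools

a = [[5], [42, 5], [1, 2, 3], [23]]

# chain() yields the items of each argument in turn.
flat = list(itertools.chain(*a))
print(flat)  # prints [5, 42, 5, 1, 2, 3, 23]

# chain.from_iterable takes the outer sequence directly, without unpacking.
flat_again = list(itertools.chain.from_iterable(a))
print(flat_again == flat)  # prints True
```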
 
Terry Reedy

David Murmann said:
damn, i wanted too much. Proper implementation:

def join(sep, seq):
    if len(seq):
        return reduce(lambda x, y: x + sep + y, seq)
    return type(sep)()

but still short enough

For general use, this is both too general and not general enough.

If len(seq) exists then seq is probably reiterable, in which case it may be
possible to determine the output length and preallocate to make the process
O(n) instead of O(n**2). I believe str.join does this. A user written
join for lists could also. A tuple function could make a list first and
then tuple(it) at the end.
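
One way the preallocation idea might look for lists is sketched below. The function names `join_lists` and `join_tuples` are hypothetical, and the sketch assumes every item supports len(); it is not how str.join is implemented.

```python
def join_lists(sep, seqs):
    """Join sized sequences with sep in one pass over a preallocated list."""
    seqs = list(seqs)
    if not seqs:
        return []
    # The output length is known up front: all items plus the separators.
    total = sum(len(s) for s in seqs) + len(sep) * (len(seqs) - 1)
    result = [None] * total
    pos = 0
    for i, s in enumerate(seqs):
        if i:
            result[pos:pos + len(sep)] = sep
            pos += len(sep)
        result[pos:pos + len(s)] = s
        pos += len(s)
    return result


def join_tuples(sep, seqs):
    """Build a list first, then convert to a tuple once at the end."""
    return tuple(join_lists(sep, seqs))


print(join_lists([0], [[5], [42, 5], [1, 2, 3], [23]]))
# prints [5, 0, 42, 5, 0, 1, 2, 3, 0, 23]
print(join_tuples((0,), [(1,), (2, 3)]))
# prints (1, 0, 2, 3)
```

Each element is copied once, so the work grows linearly with the output length instead of quadratically.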

If seq is a general (non-empty) iterable, len(seq) may raise an exception
even though the reduce would work fine.
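
A small illustration of that point on Python 3 (where reduce is imported from functools); the generator `pieces` is made up for the example.

```python
from functools import reduce

sep = [0]


def pieces():
    """A one-shot iterable: it can be iterated, but it has no length."""
    yield [5]
    yield [42, 5]
    yield [23]


try:
    len(pieces())
    len_worked = True
except TypeError:
    len_worked = False   # generators do not support len()

# reduce only needs iteration, so it handles the generator fine.
joined = reduce(lambda x, y: x + sep + y, pieces())
print(len_worked, joined)  # prints False [5, 0, 42, 5, 0, 23]
```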

Terry J. Reedy
 
Michael Spencer

Terry said:
For general use, this is both too general and not general enough.

If len(seq) exists then seq is probably reiterable, in which case it may be
possible to determine the output length and preallocate to make the process
O(n) instead of O(n**2). I believe str.join does this. A user written
join for lists could also. A tuple function could make a list first and
then tuple(it) at the end.

If seq is a general (non-empty) iterable, len(seq) may raise an exception
even though the reduce would work fine.

Terry J. Reedy
For the general iterable case, you could have something like this:
>>> def interleave(sep, iterable):
...     it = iter(iterable)
...     next = it.next()
...     try:
...         while 1:
...             item = next
...             next = it.next()
...             yield item
...             yield sep
...     except StopIteration:
...         yield item
...
>>> list(interleave(100,range(10)))
[0, 100, 1, 100, 2, 100, 3, 100, 4, 100, 5, 100, 6, 100, 7, 100, 8, 100, 9]
>>>

but I can't think of a use for it ;-)
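
For readers on Python 3, the same generator might be sketched as follows. There it.next() is spelled next(it), and since Python 3.7 a StopIteration escaping a generator body is turned into a RuntimeError, so the empty-input case is handled explicitly here. This is a rewrite for illustration, not the code as originally posted.

```python
def interleave(sep, iterable):
    """Yield the items of iterable with sep between consecutive items."""
    it = iter(iterable)
    try:
        item = next(it)
    except StopIteration:
        return          # empty input: yield nothing
    yield item
    for item in it:
        yield sep
        yield item


print(list(interleave(100, range(10))))
# prints [0, 100, 1, 100, 2, 100, 3, 100, 4, 100, 5, 100, 6, 100, 7, 100, 8, 100, 9]
```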

Michael
 
David Murmann

Michael said:
For the general iterable case, you could have something like this:
>>> def interleave(sep, iterable):
...     it = iter(iterable)
...     next = it.next()
...     try:
...         while 1:
...             item = next
...             next = it.next()
...             yield item
...             yield sep
...     except StopIteration:
...         yield item
...
>>> list(interleave(100,range(10)))
[0, 100, 1, 100, 2, 100, 3, 100, 4, 100, 5, 100, 6, 100, 7, 100, 8, 100, 9]

Well, as (e-mail address removed) pointed out, there is already
itertools.chain, which almost does this. In my opinion it could be useful
to add an optional keyword argument to it (like "connector" or "link"),
which is iterated between the other arguments.

Michael said:
but I can't think of a use for it ;-)

Of course, I have a use case, but I don't know whether it is useful
enough to be added to the standard library. (Yet this would be a much
smaller change than changing all sequences ;)

thanks for all replies,
David.
 
David Murmann

Hi again,

I wrote a small patch that changes itertools.chain to take a "link"
keyword argument. If given, it is iterated between the normal arguments;
otherwise the behavior is unchanged.
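
The intended behavior can be approximated in pure Python as sketched below. The wrapper name `chain_with_link` is hypothetical (the patch changes itertools.chain itself), and the keyword-only argument syntax assumes Python 3. As in the patch, the link is iterated afresh between each pair of arguments, so it should be something reiterable such as a list.

```python
import itertools


def chain_with_link(*iterables, link=None):
    """Chain iterables; if link is given, iterate it between each pair."""
    if link is None:
        return itertools.chain(*iterables)
    parts = []
    for i, iterable in enumerate(iterables):
        if i:
            parts.append(link)   # link goes between arguments, never at the ends
        parts.append(iterable)
    return itertools.chain(*parts)


print(list(chain_with_link([5], [42, 5], [23], link=[0])))
# prints [5, 0, 42, 5, 0, 23]
print(list(chain_with_link([1], [2])))
# prints [1, 2]
```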

I'd like to hear your opinions on both the functionality and the actual
implementation (as this is one of the first things I ever wrote in C).

till then,
David.

Index: python/dist/src/Modules/itertoolsmodule.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/itertoolsmodule.c,v
retrieving revision 1.41
diff -c -r1.41 itertoolsmodule.c
*** python/dist/src/Modules/itertoolsmodule.c 26 Aug 2005 06:42:30 -0000 1.41
--- python/dist/src/Modules/itertoolsmodule.c 30 Sep 2005 22:28:38 -0000
***************
*** 1561,1587 ****
      int tuplesize = PySequence_Length(args);
      int i;
      PyObject *ittuple;

!     if (!_PyArg_NoKeywords("chain()", kwds))
!         return NULL;

      /* obtain iterators */
      assert(PyTuple_Check(args));
      ittuple = PyTuple_New(tuplesize);
      if(ittuple == NULL)
          return NULL;
!     for (i=0; i < tuplesize; ++i) {
!         PyObject *item = PyTuple_GET_ITEM(args, i);
!         PyObject *it = PyObject_GetIter(item);
!         if (it == NULL) {
!             if (PyErr_ExceptionMatches(PyExc_TypeError))
!                 PyErr_Format(PyExc_TypeError,
!                     "chain argument #%d must support iteration",
!                     i+1);
!             Py_DECREF(ittuple);
!             return NULL;
          }
-         PyTuple_SET_ITEM(ittuple, i, it);
      }

      /* create chainobject structure */
--- 1561,1621 ----
      int tuplesize = PySequence_Length(args);
      int i;
      PyObject *ittuple;
+     PyObject *link = NULL;

!     if (kwds != NULL && PyDict_Check(kwds)) {
!         link = PyDict_GetItemString(kwds, "link");
!         if (link != NULL)
!             /* create more space for the link iterators */
!             tuplesize = tuplesize*2-1;
!     }

      /* obtain iterators */
      assert(PyTuple_Check(args));
      ittuple = PyTuple_New(tuplesize);
      if(ittuple == NULL)
          return NULL;
!     if (link == NULL) {
!         /* no keyword argument provided */
!         for (i=0; i < tuplesize; ++i) {
!             PyObject *item = PyTuple_GET_ITEM(args, i);
!             PyObject *it = PyObject_GetIter(item);
!             if (it == NULL) {
!                 if (PyErr_ExceptionMatches(PyExc_TypeError))
!                     PyErr_Format(PyExc_TypeError,
!                         "chain argument #%d must support iteration",
!                         i+1);
!                 Py_DECREF(ittuple);
!                 return NULL;
!             }
!             PyTuple_SET_ITEM(ittuple, i, it);
!         }
!     }
!     else {
!         for (i=0; i < tuplesize; ++i) {
!             PyObject *it = NULL;
!             if (i%2 == 0) {
!                 PyObject *item = PyTuple_GET_ITEM(args, i/2);
!                 it = PyObject_GetIter(item);
!             }
!             else {
!                 it = PyObject_GetIter(link);
!             }
!             if (it == NULL) {
!                 if (PyErr_ExceptionMatches(PyExc_TypeError)) {
!                     if (i%2 == 0)
!                         PyErr_Format(PyExc_TypeError,
!                             "chain argument #%d must support iteration",
!                             i/2+1);
!                     else
!                         PyErr_Format(PyExc_TypeError,
!                             "chain keyword argument link must support iteration");
!                 }
!                 Py_DECREF(ittuple);
!                 return NULL;
!             }
!             PyTuple_SET_ITEM(ittuple, i, it);
          }
      }

      /* create chainobject structure */
 
