transforming a list into a string


jblazi

Let us assume I have a list like

['1','2','7','8','12','13]

and would like to transform it into the string

'{1,2},{7,8},{12,13}'

Which is the simplest way of achieving this? (The list is in fact much
longer and I may have to cut the resulting strings into chunks of 100 or
so.)

TIA,

jb
 

Peter Otten

jblazi said:
Let us assume I have a list like

['1','2','7','8','12','13]

and would like to transform it into the string

'{1,2},{7,8},{12,13}'

Which is the simplest way of achieving this? (The list is in fact much
longer and I may have to cut the resulting strings into chunks of 100 or
so.)

>>> from itertools import izip
>>> items = ['1','2','7','8','12','13']
>>> it = iter(items)
>>> ",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'
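
For readers on Python 3, where itertools.izip no longer exists and the built-in zip() is already lazy, the same idea can be sketched roughly as follows. The function name pair_up is made up for this illustration and does not come from the thread.

```python
def pair_up(items):
    """Format consecutive items as '{a,b}' groups joined by commas."""
    it = iter(items)
    # zip() draws from the same iterator twice per tuple, so it yields
    # (items[0], items[1]), (items[2], items[3]), ...
    # A trailing unpaired item is silently dropped.
    return ",".join("{%s,%s}" % pair for pair in zip(it, it))


result = pair_up(['1', '2', '7', '8', '12', '13'])
print(result)
```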


Peter
 

Roy Smith

jblazi said:
Let us assume I have a list like

['1','2','7','8','12','13]

I'm assuming there's supposed to be another single-quote after the 13?
and would like to transform it into the string

'{1,2},{7,8},{12,13}'

This works, and is pretty straight-forward:

source = ['1','2','7','8','12','13']
temp = []

while source:
    x = source.pop(0)
    y = source.pop(0)
    temp.append ('{%s,%s}' % (x, y))

result = ','.join (temp)
print result

This prints what you want:

$ /tmp/list2string.py
{1,2},{7,8},{12,13}

Accumulating the repeating bits of the result string in a list, and then
putting them together with a join operation is a common idiom in Python.
You can build up strings by doing string addition:

temp = temp + '{%s,%s}' % (x, y)

This will have the same result, but suffers from quadratic run times as
it keeps building and destroying immutable strings.
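
A small side-by-side sketch of the two approaches, written for Python 3 and using index arithmetic instead of pop(0) so the input list is left alone. The variable names are chosen for this illustration only.

```python
source = ['1', '2', '7', '8', '12', '13']

# Idiom 1: collect the pieces in a list and join once at the end.
pieces = []
for i in range(0, len(source) - 1, 2):
    pieces.append('{%s,%s}' % (source[i], source[i + 1]))
joined = ','.join(pieces)

# Idiom 2: repeated string addition. Same result, but each step may
# copy the whole string built so far, which is where the quadratic
# behaviour comes from.
added = ''
for i in range(0, len(source) - 1, 2):
    if added:
        added += ','
    added += '{%s,%s}' % (source[i], source[i + 1])

print(joined)
print(added)
```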
Which is the simplest way of achieving this? (The list is in fact much
longer and I may have to cut the resulting strings into chunks of 100 or
so.)

How long is "much longer", and how important is it that this runs fast?
The code above runs in O(n). You can probably play some tricks to tweak
the speed a little, but in the big picture, you're not going to do any
better than O(n).

The above code also assumes you have an even number of items in source,
and will bomb if you don't. You probably want to fix that :)
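
One possible way to deal with the odd-length case is to check up front and fail with a clear message. This is only a sketch in Python 3; the function name format_pairs_strict is hypothetical, and raising ValueError is an assumption about what the caller would want (padding or dropping the last item are equally plausible choices).

```python
def format_pairs_strict(source):
    """Format pairs as '{a,b}', refusing input with an odd item count."""
    if len(source) % 2:
        raise ValueError(
            "expected an even number of items, got %d" % len(source))
    it = iter(source)
    return ','.join('{%s,%s}' % pair for pair in zip(it, it))


print(format_pairs_strict(['1', '2', '7', '8', '12', '13']))
```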
 

Jp Calderone

[snip]

This works, and is pretty straight-forward:

source = ['1','2','7','8','12','13']
temp = []

while source:
    x = source.pop(0)
    y = source.pop(0)
    temp.append ('{%s,%s}' % (x, y))

result = ','.join (temp)
print result

[snip]

How long is "much longer", and how important is it that this runs fast?
The code above runs in O(n). You can probably play some tricks to tweak
the speed a little, but in the big picture, you're not going to do any
better than O(n).

Are you sure? Did you consider the complexity of list.pop(0)?

Jp
 

Roy Smith

Jp Calderone said:
Are you sure? Did you consider the complexity of list.pop(0)?

I'm assuming list.pop() is O(1), i.e. constant time. Is it not?

Of course, one of my pet peeves about Python is that the complexity of
the various container operations is not documented. This leaves users
needing to guess what they are, based on assumptions of how the
containers are probably implemented.
 

Terry Reedy

Roy Smith said:
I'm assuming list.pop() is O(1), i.e. constant time. Is it not?

Yes, list.pop() (from the end) is O(1) (which is why -1 is the default arg
;-).
However, list.pop(0) (from the front) was O(n) thru 2.3.
The listobject.c code has been rewritten for 2.4 and making the latter O(1)
also *may* have been one of the results. (This was discussed as desirable
but I don't know the result.)
Of course, one of my pet peeves about Python is that the complexity of
the various container operations is not documented.

So volunteer a doc patch, or contribute $ to hire someone ;-).

Terry J. Reedy
 

Tim Peters

[Terry Reedy]
However, list.pop(0) (from the front) was O(n) thru 2.3.
The listobject.c code has been rewritten for 2.4 and making the latter O(1)
also *may* have been one of the results. (This was discussed as desirable
but I don't know the result.)

Didn't happen -- it would have been too disruptive to the basic list
type. Instead, Raymond Hettinger added a cool new deque type for
2.4, with O(1) inserts and deletes at both ends, regardless of access
pattern (all the hacks suggested for the base list type appeared to
suffer under *some* pathological (wrt the hack in question) access
pattern). However, general indexing into a deque is O(N) (albeit with
a small constant factor). deque[0] and deque[-1] are O(1).
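
As a rough illustration of how the pop-from-the-front loop could look with collections.deque, here is a Python 3 sketch. The function name format_pairs_deque is invented for this example; popleft() is the deque method that removes from the left end.

```python
from collections import deque


def format_pairs_deque(items):
    """Pop pairs off the front of a deque instead of a list."""
    queue = deque(items)      # a copy, so the caller's list is untouched
    pieces = []
    while len(queue) >= 2:    # a trailing unpaired item is ignored
        x = queue.popleft()   # O(1) at the left end
        y = queue.popleft()
        pieces.append('{%s,%s}' % (x, y))
    return ','.join(pieces)


print(format_pairs_deque(['1', '2', '7', '8', '12', '13']))
```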

[Roy Smith]
Lists and tuples and array.arrays are contiguous vectors, dicts are
hash tables. Everything follows from that in obvious ways -- but you
already knew that. The *language* doesn't guarantee any of it,
though; that's just how CPython is implemented. FYI, the deque type
is implemented as a doubly-linked list of blocks, each block a
contiguous vector of (at most) 46 elements.
 

Roy Smith

Tim Peters said:
Lists and tuples and array.arrays are contiguous vectors, dicts are
hash tables. Everything follows from that in obvious ways -- but you
already knew that.

OK, so it sounds like you want to reverse() the list before the loop,
then pop() items off the back. Two O(n) passes [I'm assuming reverse()
is O(N)] beats O(n^2).
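
A minimal Python 3 sketch of that reverse-then-pop idea, assuming it is acceptable to work on a copy of the list. The name format_pairs_reversed is made up for the example.

```python
def format_pairs_reversed(items):
    """Reverse once, then pop pairs off the end of the list."""
    stack = list(items)      # copy, so the caller's list is untouched
    stack.reverse()          # one O(n) pass
    pieces = []
    while len(stack) >= 2:   # a trailing unpaired item is ignored
        x = stack.pop()      # O(1): removes from the end
        y = stack.pop()
        pieces.append('{%s,%s}' % (x, y))
    return ','.join(pieces)


print(format_pairs_reversed(['1', '2', '7', '8', '12', '13']))
```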
 

Tim Peters

[Roy Smith]
OK, so it sounds like you want to reverse() the list before the loop,
then pop() items off the back. Two O(n) passes [I'm assuming reverse()
is O(N)]
Yes.

beats O(n^2).

Absolutely. Note that Peter Otten previously posted a lovely O(N)
solution in this thread, although it may be too clever for some
tastes:
>>> from itertools import izip
>>> items = ['1','2','7','8','12','13']
>>> it = iter(items)
>>> ",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'
 

Roy Smith

Tim Peters said:
Note that Peter Otten previously posted a lovely O(N)
solution in this thread, although it may be too clever for some
tastes:
>>> from itertools import izip
>>> items = ['1','2','7','8','12','13']
>>> it = iter(items)
>>> ",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'

Personally, I'm not a big fan of clever one-liners. They never seem
like such a good idea 6 months from now when you're trying to figure out
what you meant when you wrote it 6 months ago.
 

Andrew Bennetts

Tim Peters said:
Note that Peter Otten previously posted a lovely O(N)
solution in this thread, although it may be too clever for some
tastes:
>>> from itertools import izip
>>> items = ['1','2','7','8','12','13']
>>> it = iter(items)
>>> ",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'

Personally, I'm not a big fan of clever one-liners. They never seem
like such a good idea 6 months from now when you're trying to figure out
what you meant when you wrote it 6 months ago.

It's a two-liner, not a one-liner (although it could be made into a
one-liner with enough contortions...).

The only other way I could see to expand this solution would be to write it
as:
it = iter(items)
pairs = izip(it, it)
s = ",".join(["{%s,%s}" % i for i in pairs])

I don't know if three-liners meet your threshold for verbosity ;)

Well, you could write it as:

pairs = []
it = iter(items)
while True:
    try:
        pair = it.next(), it.next()
    except StopIteration:
        break
    pairs.append(pair)
s = ",".join(["{%s,%s}" % i for i in pairs])

But at that point, the scaffolding is large enough that it obscures the
purpose -- I definitely find this harder to read than the two-liner.

I find Peter's original form easy to read -- if you understand how "izip(it,
it)" works (which is a very simple and elegant way to iterate over (it[n],
it[n+1]) pairs), the rest is very clear.
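
For anyone unsure why passing the same iterator twice produces pairs, this Python 3 sketch does by hand what zip() does for each tuple: two next() calls on one shared iterator. The variable names are for illustration only.

```python
items = ['1', '2', '7', '8', '12', '13']
it = iter(items)

# Each tuple consumes two consecutive items from the single iterator.
first_pair = (next(it), next(it))
second_pair = (next(it), next(it))

# zip(it, it) carries on from wherever the iterator currently stands.
remaining = list(zip(it, it))

print(first_pair, second_pair, remaining)
```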

-Andrew.
 

Roy Smith

Andrew Bennetts said:
It's a two-liner, not a one-liner (although it could be made into a
one-liner with enough contortions...).

The only other way I could see to expand this solution would be to write it
as:
it = iter(items)
pairs = izip(it, it)
s = ",".join(["{%s,%s}" % i for i in pairs])

I don't know if three-liners meet your threshold for verbosity ;)

It's better, but I'd unroll it one more step and write:

it = iter (items)
pairs = izip (it, it)
strings = ["{%s,%s}" % i for i in pairs]
s = ",".join (strings)

I'm also not particularly happy about the choice of "it" as a variable
name. The "izip (it, it)" construct makes me think of Dr. Evil :)
I find Peter's original form easy to read -- if you understand how "izip(it,
it)" works (which is a very simple and elegant way to iterate over (it[n],
it[n+1]) pairs), the rest is very clear.

It's not the izip bit that bothers me in the original, it's the deeply
nested construct of

",".join(["{%s,%s}" % i for i in izip(it, it)])

There's too much going on in that one line to get your head around
easily. I suppose people who are really into functional programming
might find it understandable, but I find it rather obtuse.
 

Bengt Richter

Tim Peters said:
Note that Peter Otten previously posted a lovely O(N)
solution in this thread, although it may be too clever for some
tastes:

>>> from itertools import izip
>>> items = ['1','2','7','8','12','13']
>>> it = iter(items)
>>> ",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'

Personally, I'm not a big fan of clever one-liners. They never seem
like such a good idea 6 months from now when you're trying to figure out
what you meant when you wrote it 6 months ago.

It's a two-liner, not a one-liner (although it could be made into a
one-liner with enough contortions...).
Assuming items definition doesn't count in the line count,
>>> items = ['1','2','7','8','12','13']

then one line seems to do it, not that obscurely (depending on your glasses ;-)
>>> ",".join(["{%s,%s}"%(n(),n()) for n in [iter(items).next] for i in xrange(0,len(items),2)])
'{1,2},{7,8},{12,13}'
The only other way I could see to expand this solution would be to write it
as:
it = iter(items)
pairs = izip(it, it)
s = ",".join(["{%s,%s}" % i for i in pairs])

I don't know if three-liners meet your threshold for verbosity ;)

Well, you could write it as:

pairs = []
it = iter(items)
while True:
    try:
        pair = it.next(), it.next()
    except StopIteration:
        break
    pairs.append(pair)
s = ",".join(["{%s,%s}" % i for i in pairs])

But at that point, the scaffolding is large enough that it obscures the
purpose -- I definitely find this harder to read than the two-liner.

I find Peter's original form easy to read -- if you understand how "izip(it,
it)" works (which is a very simple and elegant way to iterate over (it[n],
it[n+1]) pairs), the rest is very clear.
Agreed.

Regards,
Bengt Richter
 

Peter Otten

Roy said:
I'm also not particularly happy about the choice of "it" as a variable
name. The "izip (it, it)" construct makes me think of Dr. Evil :)

Using "it" is just my convention (contradictio in adiecto :) for iterators
used in a pure algorithm rather than with a meaning determined by a
concrete use case. It's similar to the traditional i in counting loops,
having grown an additional "t" to disambiguate it. If you have a better
name for the purpose, don't hesitate to tell me...
It's not the izip bit that bothers me in the original, it's the deeply
nested construct of

",".join(["{%s,%s}" % i for i in izip(it, it)])

There's too much going on in that one line to get your head around
easily. I suppose people who are really into functional programming
might find it understandable, but I find it rather obtuse.

I don't think three levels can be called "deep nesting". In particular the
"somestr".join() construct is so ubiquitous that your brain will "optimize"
it away after reading a small amount of Python code. But I see my one-liner
meets serious opposition. Well, sometimes a few self-explanatory names and
a helper function can do wonders:

import itertools

def pairs(seq):
    it = iter(seq)
    return itertools.izip(it, it)

coords = ['1','2','7','8','12','13']
points = []
for xy in pairs(coords):
    points.append("{%s, %s}" % xy)

print ", ".join(points)

That should be clear even to someone who has never heard of generators. Note
that pairs() is only called once and therefore does not affect the speed of
execution. Personally, I'd still go with the list comprehension instead of
the above for-loop.

By the way - expanding on Michele Simionato's chop(),
http://mail.python.org/pipermail/python-list/2004-May/222673.html
I've written a generalized version of pairs():

_missing = object()

def ntuples(seq, N=2, filler=_missing):
    """ Yield a sequence in portions of N-tuples.
    [('a', 'b', 'c'), ('d', 'e', 'f')]
    [('a', 'b'), ('c', 'x')]
    """
    if filler is _missing:
        it = iter(seq)
    else:
        it = itertools.chain(iter(seq), itertools.repeat(filler, N-1))
    iters = (it,) * N
    return itertools.izip(*iters)
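
A Python 3 rendering of the same idea might look like the sketch below, with the built-in zip() standing in for izip. The sample inputs "abcdef" and "abc" are assumptions picked to reproduce the two results shown in the docstring; the calls that originally produced them are not preserved in the post.

```python
import itertools

_missing = object()  # sentinel meaning "no filler was given"


def ntuples(seq, n=2, filler=_missing):
    """Yield the items of seq in portions of n-tuples."""
    it = iter(seq)
    if filler is not _missing:
        # n - 1 fillers are enough to complete a partial last tuple
        # without ever producing a tuple made only of fillers.
        it = itertools.chain(it, itertools.repeat(filler, n - 1))
    # The same iterator repeated n times: each tuple takes n items.
    return zip(*[it] * n)


full = list(ntuples("abcdef", 3))
padded = list(ntuples("abc", filler="x"))
dropped = list(ntuples("abc"))  # no filler: the leftover 'c' is discarded
print(full, padded, dropped)
```

With a larger n, for example 100, the same function could serve the original poster's need to cut a long list into chunks.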

Enjoy :)

Peter
 

Steven Rumbalski

Peter Otten wrote:

Peter's solution:
>>> from itertools import izip
>>> items = ['1','2','7','8','12','13']
>>> it = iter(items)
>>> ",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'

My first thought was:
>>> items = ['1','2','7','8','12','13']
>>> ",".join(["{%s,%s}" % i for i in zip(items[::2], items[1::2])])
'{1,2},{7,8},{12,13}'

Two lines less, but it creates three unnecessary lists.  I like
Peter's better.

--Steven Rumbalski
 

Christopher T King

Absolutely. Note that Peter Otten previously posted a lovely O(N)
solution in this thread, although it may be too clever for some
tastes:
>>> from itertools import izip
>>> items = ['1','2','7','8','12','13']
>>> it = iter(items)
>>> ",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'

A bit too clever for mine, mostly because neither izip() nor zip() is
guaranteed to process its arguments in a left-to-right order (although
there's no reason for them not to). I'd rather do this:

>>> from itertools import izip, islice
>>> items = ['1','2','7','8','12','13']
>>> it1 = islice(items,0,None,2)
>>> it2 = islice(items,1,None,2)
>>> ",".join(["{%s,%s}" % i for i in izip(it1, it2)])
'{1,2},{7,8},{12,13}'

Although it doesn't improve efficiency any (or have that 'slick' feel), it
does prevent needless head-scratching :)
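
In Python 3 the two-iterator variant could be written roughly like this, with the built-in zip() in place of izip. The names evens and odds are chosen for this sketch.

```python
from itertools import islice

items = ['1', '2', '7', '8', '12', '13']

# Two independent iterators over the same list: one over positions
# 0, 2, 4, ... and one over positions 1, 3, 5, ...
evens = islice(items, 0, None, 2)
odds = islice(items, 1, None, 2)

# Because the iterators do not share state, the result does not depend
# on the order in which zip() advances them.
result = ",".join("{%s,%s}" % pair for pair in zip(evens, odds))
print(result)
```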

Curious, why isn't slicing of generators defined, using islice(), so "it1
= iter(items)[0::2]" is valid?
 

Tim Peters

[Christopher T King]
...
Curious, why isn't slicing of generators defined, using islice(), so "it1
= iter(items)[0::2]" is valid?

The real question then is why iterators don't, because a
generator-function returns a generator-iterator, and the latter
supplies only the methods in the general iterator protocol (next() and
__iter__()). That protocol was deliberately minimal, to make it easy
for all kinds of objects to play along. islice() was invented long
after. Now that islice() exists, it may indeed make sense to use it
to give a meaning to slice notation applied to iterators. But doing
so requires that iterators implement the appropriate type slots to
"look like" they're also "sequences" (that's how dispatching for slice
notation works), and that's a lot more to ask of objects. If only
some iterators implement it (like generator-iterators), then the
general interchangeability of iterable objects we enjoy today would be
damaged too.
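
To make the idea concrete, here is a hypothetical Python 3 wrapper class that gives slice notation to one particular iterator type by delegating to islice(). SliceableIter is not an existing library class; it only illustrates what opting in would look like for a single type, and plain iterators remain unsliceable.

```python
from itertools import islice


class SliceableIter:
    """Hypothetical wrapper: slice notation on an iterator via islice()."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("only slices are supported")
        # islice() rejects negative values, so wrapper[-2:] fails here
        # too. Slicing consumes items from the underlying iterator.
        return SliceableIter(
            islice(self._it, index.start, index.stop, index.step))


it1 = SliceableIter(['1', '2', '7', '8', '12', '13'])[0::2]
every_other = list(it1)
print(every_other)
```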
 
