flatten a level one list


Robin Becker

Is there some smart/fast way to flatten a level one list using the
latest iterator/generator idioms?

The problem arises in converting lists of (x,y) coordinates into a
single list of coordinates, e.g.

f([(x0,y0),(x1,y1),....]) --> [x0,y0,x1,y1,....] or

g([x0,x1,x2,......],[y0,y1,y2,....]) --> [x0,y0,x1,y1,....]

Clearly if f is doable then g can be done using zip. I suppose this is a
special case of flatten, but can flatten be done fast? The Python recipes
seem rather slow compared to the builtin functions.
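The reduction from g to f mentioned above can be sketched as follows. This is a minimal illustration, assuming Python 3; the names f and g follow the question, while the body of f is just one possible choice and not a claim about the fastest option.

```python
def f(pairs):
    """Flatten one level: [(x0, y0), (x1, y1), ...] -> [x0, y0, x1, y1, ...]."""
    # Illustrative body only; any one-level flatten would do here.
    return [value for pair in pairs for value in pair]


def g(xs, ys):
    """Interleave two sequences by pairing them with zip and reusing f."""
    return f(zip(xs, ys))


print(f([(0, 10), (1, 11)]))
print(g([0, 1, 2], [10, 11, 12]))
```

Note that zip stops at the shorter input, so g silently drops unpaired trailing items.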
 

bonono

Robin said:
> Is there some smart/fast way to flatten a level one list using the
> latest iterator/generator idioms?
>
> The problem arises in converting lists of (x,y) coordinates into a
> single list of coordinates eg
>
> f([(x0,y0),(x1,y1),....]) --> [x0,y0,x1,y1,....] or
>
> g([x0,x1,x2,......],[y0,y1,y2,....]) --> [x0,y0,x1,y1,....]
>
> clearly if f is doable then g can be done using zip. I suppose this is a
> special case flatten, but can flatten be done fast? The python recipes
> seem rather slow compared to the builtin functions.

How fast is fast?

For this case, is the following good enough?

def flat(li):
    for x, y in li:
        yield x
        yield y
 

David Murmann

Robin said:
> Is there some smart/fast way to flatten a level one list using the
> latest iterator/generator idioms?
>
> The problem arises in converting lists of (x,y) coordinates into a
> single list of coordinates eg
>
> f([(x0,y0),(x1,y1),....]) --> [x0,y0,x1,y1,....] or
>
> g([x0,x1,x2,......],[y0,y1,y2,....]) --> [x0,y0,x1,y1,....]
>
> clearly if f is doable then g can be done using zip. I suppose this is a
> special case flatten, but can flatten be done fast? The python recipes
> seem rather slow compared to the builtin functions.

well then:

first of all, i need to say, if speed really matters, do it in C.
that being said, python can be fast, too. for this task psyco is your
friend. i got this output from the script given below:

without psyco:

flatten1: 2.78046748059
flatten2: 2.90226239686
flatten3: 4.91070862996
goopy_flatten1: 8.22951110963
goopy_flatten2: 8.56373180172

with psyco:

flatten1: 1.17390339924
flatten2: 1.7209583052
flatten3: 1.18490295558
goopy_flatten1: 1.34892236194
goopy_flatten2: 1.68568386584

the goopy function is taken from the google-functional package (but is
treated a bit unfairly, i must admit, being wrapped in a lambda)

so, what does that show us? izip seems a bit faster than zip with this
input data. you'll want to do your own timings with more realistic data.
and all these functions are just what came to my mind, i'm sure they
can be improved.

hope this helps,

--
David.

used script:
----------------------------------------------------------------

from itertools import izip

xdata = range(1000)
ydata = range(1000)[::-1]

def flatten1():
    return [x for pair in izip(xdata, ydata) for x in pair]

def flatten2():
    return [x for pair in zip(xdata, ydata) for x in pair]

def flatten3():
    res = []
    for pair in izip(xdata, ydata):
        for x in pair:
            res.append(x)
    return res

def goopy_flatten(seq):
    lst = []
    for x in seq:
        if type(x) is list or type(x) is tuple:
            for val in x:
                lst.append(val)
        else:
            lst.append(x)
    return lst

goopy_flatten1 = lambda: goopy_flatten(izip(xdata, ydata))
goopy_flatten2 = lambda: goopy_flatten(zip(xdata, ydata))

if __name__=='__main__':
    from timeit import Timer

    functions = ['flatten1', 'flatten2', 'flatten3', 'goopy_flatten1', 'goopy_flatten2']

    print 'without psyco:'
    print

    for fn in functions:
        t = Timer(fn+'()', 'from __main__ import '+fn)
        print fn+':', t.timeit(5000)

    try: import psyco; psyco.full()
    except ImportError: pass

    print
    print 'with psyco:'
    print

    for fn in functions:
        t = Timer(fn+'()', 'from __main__ import '+fn)
        print fn+':', t.timeit(5000)
 

Michael Spencer

....

David said:
> Some functions and timings
....

Here are some more timings of David's functions, and a couple of additional
contenders that time faster on my box (I don't have psyco):

# From David Murmann
from itertools import izip

xdata = range(1000)
ydata = range(1000)[::-1]

def flatten1(x, y):
    return [i for pair in izip(x, y) for i in pair]

def flatten2(x, y):
    return [i for pair in zip(x, y) for i in pair]

def flatten3(x, y):
    res = []
    for pair in izip(x, y):
        for i in pair:
            res.append(i)
    return res


# New attempts:
from itertools import imap
def flatten4(x, y):
    l = []
    list(imap(l.extend, izip(x, y)))
    return l


from Tkinter import _flatten
def flatten5(x, y):
    return list(_flatten(zip(x, y)))

flatten_funcs = [flatten1, flatten2, flatten3, flatten4, flatten5]

def testthem():
    flatten1res = flatten_funcs[0](xdata, ydata)
    for func in flatten_funcs:
        assert func(xdata, ydata) == flatten1res

def timethem():
    for func in flatten_funcs:
        # shell.timefunc: timing helper not shown in the post
        print shell.timefunc(func, xdata, ydata)

flatten1(...) 704 iterations, 0.71msec per call
flatten2(...) 611 iterations, 0.82msec per call
flatten3(...) 344 iterations, 1.46msec per call
flatten4(...) 1286 iterations, 389.08usec per call
flatten5(...) 1219 iterations, 410.24usec per call

Michael
 

Tim Hochberg

Michael said:
> Here are some more timings of David's functions, and a couple of additional
> contenders that time faster on my box (I don't have psyco):
> ....

Here's one more that's quite fast using Psyco, but only average without it.


def flatten6():
    n = min(len(xdata), len(ydata))
    result = [None] * (2*n)
    for i in xrange(n):
        result[2*i] = xdata[i]
        result[2*i+1] = ydata[i]

-tim
 

Michael Spencer

Tim said:
> Here's one more that's quite fast using Psyco, but only average without it.
>
> def flatten6():
>     n = min(len(xdata), len(ydata))
>     result = [None] * (2*n)
>     for i in xrange(n):
>         result[2*i] = xdata[i]
>         result[2*i+1] = ydata[i]
>
> -tim

Indeed:

I added yours to the list (after adding the appropriate return):

flatten1(...) 702 iterations, 0.71msec per call
flatten2(...) 641 iterations, 0.78msec per call
flatten3(...) 346 iterations, 1.45msec per call
flatten4(...) 1447 iterations, 345.66usec per call
flatten5(...) 1218 iterations, 410.55usec per call
flatten6(...) 531 iterations, 0.94msec per call

(See earlier post for flatten1-5)

Michael
 

Robin Becker

Paul Rubin said:
> import operator
> a=[(1,2),(3,4),(5,6)]
> reduce(operator.add,a)
>
> (1, 2, 3, 4, 5, 6)
>
> (Note that the above is probably terrible if the lists are large and
> you're after speed.)

Yes, and it is all in C and so could be a contender for the speed champ.
I guess what you're saying is that it's doing

(1,2)
(1,2)+(3,4)
(1,2,3,4)+(5,6)

i.e. we do n or n-1 tuple additions, each of which requires a tuple
allocation etc.

A fast implementation would probably allocate the output list just once
and then stream the values into place with a simple index.
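The "allocate once, then stream values in by index" idea can be sketched in pure Python as below. This is only an illustration in Python 3, not the C implementation being imagined; the function name flatten_prealloc is made up here, and it assumes every tuple has the same length as the first one.

```python
def flatten_prealloc(pairs):
    """Flatten a list of equal-length tuples, allocating the result once."""
    if not pairs:
        return []
    width = len(pairs[0])                    # assumption: all tuples share this length
    result = [None] * (width * len(pairs))   # single allocation up front
    k = 0                                    # running write index
    for pair in pairs:
        for value in pair:
            result[k] = value
            k += 1
    return result


a = [(1, 2), (3, 4), (5, 6)]
print(flatten_prealloc(a))
```

Unlike the repeated tuple additions, no intermediate sequence is built; each value is written exactly once.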
 

Cyril Bazin

Another try:

from itertools import chain, izip

def flatten6(x, y):
    return list(chain(*izip(x, y)))

(in any case, this is shorter ;-)

Cyril

 

Peter Otten

Tim said:
> Here's one more that's quite fast using Psyco, but only average without
> it.
>
> def flatten6():
>     n = min(len(xdata), len(ydata))
>     result = [None] * (2*n)
>     for i in xrange(n):
>         result[2*i] = xdata[i]
>         result[2*i+1] = ydata[i]

If you require len(xdata) == len(ydata) there's an easy way to move the loop
into C:

def flatten7():
    n = len(xdata)
    assert len(ydata) == n
    result = [None] * (2*n)
    result[::2] = xdata
    result[1::2] = ydata
    return result

$ python -m timeit 'from flatten import flatten6 as f' 'f()'
1000 loops, best of 3: 847 usec per loop
$ python -m timeit 'from flatten import flatten7 as f' 'f()'
10000 loops, best of 3: 43.9 usec per loop
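The same slice-assignment trick can be applied to the list-of-pairs form f from the original question. The sketch below assumes Python 3 and that every element is a 2-tuple; the name flatten_pairs is hypothetical, and no timing claim is made for it.

```python
def flatten_pairs(pairs):
    """Flatten [(x0, y0), (x1, y1), ...] using two extended-slice assignments."""
    if not pairs:
        return []
    xs, ys = zip(*pairs)              # unzip; fails unless every item is a 2-tuple
    result = [None] * (2 * len(pairs))
    result[::2] = xs                  # even slots get the x values
    result[1::2] = ys                 # odd slots get the y values
    return result


print(flatten_pairs([(1, 2), (3, 4), (5, 6)]))
```

Extended-slice assignment raises ValueError when the right-hand side does not have exactly as many items as the slice, which is why the equal-length requirement matters.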

Peter
 

Paul Rubin

Robin Becker said:
> ...
> A fast implementation would probably allocate the output list just
> once and then stream the values into place with a simple index.

That's what I hoped "sum" would do, but instead it barfs with a type
error. So much for duck typing.
 

bonono

Robin said:
> Paul Rubin said:
> > import operator
> > a=[(1,2),(3,4),(5,6)]
> > reduce(operator.add,a)
> >
> > (1, 2, 3, 4, 5, 6)
> >
> > (Note that the above is probably terrible if the lists are large and
> > you're after speed.)
> yes, and it is all in C and so could be a contender for the speed champ.
> I guess what you're saying is that it's doing

That is what I thought too, but it seems that [x for pair in li for x in
pair] is the fastest on my machine. What is even stranger is that if
I use psyco.full(), I get a 10x speed-up for this solution (the list
comprehension), which is head and shoulders above all the others suggested
so far.
 

bearophileHUGS

Well, maybe it's time to add an n-level flatten() function to the
language (or to add it to itertools). Python is open source, but I am
not able to modify its C sources yet... Maybe Raymond Hettinger can
find some time to do it for Py 2.5.

Bye,
bearophile
 

Raymond Hettinger

[Robin Becker]
> Is there some smart/fast way to flatten a level one list using the
> latest iterator/generator idioms.
>
> The problem arises in converting lists of (x,y) coordinates into a
> single list of coordinates eg
>
> f([(x0,y0),(x1,y1),....]) --> [x0,y0,x1,y1,....]

Here's one way:

from itertools import chain
d = [('x0','y0'), ('x1','y1'), ('x2','y2'), ('x3', 'y3')]
list(chain(*d))
['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3']

FWIW, if you're into working out puzzles, there's no end of interesting
iterator algebra tricks. Here are a few identities for your
entertainment:

from itertools import chain, groupby, izip, tee
from operator import itemgetter

# Given s (any sequence) and n (a non-negative integer):
assert zip(*izip(*tee(s,n))) == [tuple(s)]*n
assert list(chain(*tee(s,n))) == list(s)*n
assert map(itemgetter(0),groupby(sorted(s))) == sorted(set(s))
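For readers on Python 3, the same ideas can be checked with the sketch below. It assumes Python 3, where zip and map return iterators (hence the list() wrappers) and izip no longer exists; chain.from_iterable, added in Python 2.6, is used as an alternative to chain(*d). The sample values of s and n are arbitrary.

```python
from itertools import chain, groupby, tee
from operator import itemgetter

d = [('x0', 'y0'), ('x1', 'y1'), ('x2', 'y2'), ('x3', 'y3')]

# chain.from_iterable(d) avoids unpacking d into separate arguments.
flat = list(chain.from_iterable(d))

s = [3, 1, 2, 3, 1]   # arbitrary sample sequence
n = 3                 # arbitrary non-negative integer

# The three identities, with lazy results materialised for comparison.
identity1 = list(zip(*zip(*tee(s, n)))) == [tuple(s)] * n
identity2 = list(chain(*tee(s, n))) == list(s) * n
identity3 = list(map(itemgetter(0), groupby(sorted(s)))) == sorted(set(s))

print(flat)
print(identity1, identity2, identity3)
```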


Raymond
 

Sion Arrowsmith

> That's what I hoped "sum" would do, but instead it barfs with a type
> error. So much for duck typing.

sum(...)
sum(sequence, start=0) -> value

If you're using sum() as a 1-level flatten you need to give it
start=[].
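A small sketch of the point about start, written for Python 3. The variable names are illustrative. One detail worth noting: the start value has to match the element type, so a list of tuples needs start=() rather than start=[].

```python
lists = [[1, 2], [3, 4], [5, 6]]
pairs = [(1, 2), (3, 4), (5, 6)]

flat_from_lists = sum(lists, [])    # [] + [1, 2] + [3, 4] + ...
flat_from_tuples = sum(pairs, ())   # () + (1, 2) + (3, 4) + ...

# With the default start=0 the first addition is 0 + (1, 2), which fails.
try:
    sum(pairs)
    default_start_failed = False
except TypeError:
    default_start_failed = True

print(flat_from_lists, flat_from_tuples, default_start_failed)
```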
 

Paul Rubin

Sion Arrowsmith said:
> sum(sequence, start=0) -> value
>
> If you're using sum() as a 1-level flatten you need to give it
> start=[].

Oh, right, I should have remembered that. Thanks. Figuring out
whether it's quadratic or linear would still take an experiment or
code inspection which I'm not up for at the moment.
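The experiment could look something like the sketch below (Python 3; the helper names make_lists, sum_flatten and seconds_per_call are made up for illustration). If the time roughly doubles when n doubles, the behaviour is linear; if it roughly quadruples, it is quadratic. No results are shown here because they depend on the machine and interpreter.

```python
import timeit


def make_lists(n):
    """Build n two-element lists to flatten."""
    return [[i, i] for i in range(n)]


def sum_flatten(lists):
    """One-level flatten via sum() with an empty-list start value."""
    return sum(lists, [])


def seconds_per_call(n, repeats=5):
    """Best-of-repeats wall time for flattening n two-element lists."""
    data = make_lists(n)
    return min(timeit.repeat(lambda: sum_flatten(data), number=1, repeat=repeats))


if __name__ == "__main__":
    for n in (500, 1000, 2000, 4000):
        print(n, seconds_per_call(n))
```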
 

Robin Becker

Peter said:
> Tim Hochberg wrote:
> > Here's one more that's quite fast using Psyco, but only average without
> > it.
> >
> > def flatten6():
> >     n = min(len(xdata), len(ydata))
> >     result = [None] * (2*n)
> >     for i in xrange(n):
> >         result[2*i] = xdata[i]
> >         result[2*i+1] = ydata[i]
>
> If you require len(xdata) == len(ydata) there's an easy way to move the loop
> into C:
>
> def flatten7():
>     n = len(xdata)
>     assert len(ydata) == n
>     result = [None] * (2*n)
>     result[::2] = xdata
>     result[1::2] = ydata
>     return result
>
> $ python -m timeit 'from flatten import flatten6 as f' 'f()'
> 1000 loops, best of 3: 847 usec per loop
> $ python -m timeit 'from flatten import flatten7 as f' 'f()'
> 10000 loops, best of 3: 43.9 usec per loop
>
> Peter


That's the winner for my machine and it works in the case I need :)

The numbers are microseconds/call

I used 20 reps and n from 10 up to 1000.

no psyco
Name              10         20        100        200        500       1000
flatten1     111.383    189.298    745.709   1397.300   3499.579   6628.775
flatten2     142.923    209.496    907.182   1521.618   3565.397   7197.228
flatten3     176.224    314.342   1385.958   2733.560   6726.693  12879.067
flatten4     112.696    163.010    518.250    901.288   1979.749   3657.364
flatten5      78.334    110.768    386.949    711.794   1617.664   3255.749
flatten6     142.867    230.420    894.639   1767.012   4499.734   9017.906
flatten6a    163.093    263.330   1071.337   2084.287   5209.433  10383.610
flatten6b    180.582    275.761   1063.794   2074.705   5057.123  10043.567
flatten6c    167.898    253.664    974.202   1948.181   4821.339   9562.780
flatten6d    132.475    201.702    738.194   1406.659   3612.107   7242.038
flatten7      59.030     62.354     90.347    130.771    254.613    438.994
flatten8      88.978    173.737   1667.111   5674.297  28907.501 106330.749
flatten8a    107.388    225.951   2323.563   7088.136  34254.381 114538.384


psyco
Name              10         20        100        200        500       1000
flatten1      84.424    114.596    393.374    714.728   1809.448   3197.837
flatten2     102.387    136.302    507.243    942.494   2276.770   4451.990
flatten3      85.206    111.020    379.713    715.957   1607.104   3191.188
flatten4     102.667    144.599    509.255    856.450   1839.591   3425.128
flatten5      79.898    115.490    383.904    730.484   1739.411   3515.978
flatten6      54.560     61.293    183.012    332.109    837.146   1604.366
flatten6a     79.647    108.114    405.107    752.917   1873.674   3824.620
flatten6b    111.746    132.978    473.189    907.378   2217.600   4257.357
flatten6c     98.756    110.629    376.724    730.037   1772.963   3524.247
flatten6d     59.253     69.199    172.731    295.820    717.577   1402.720
flatten7      51.291     39.754     65.707    104.902    233.214    405.694
flatten8      87.050    166.837   1665.407   5410.576  28459.567 107847.422
flatten8a    122.753    251.457   2766.944   7931.204  36353.503 120773.674

###############################
from itertools import izip
import timeit

_R=100

def flatten1(x, y):
    '''D Murman'''
    return [i for pair in izip(x, y) for i in pair]

def flatten2(x, y):
    '''D Murman'''
    return [i for pair in zip(x, y) for i in pair]

def flatten3(x, y):
    '''D Murman'''
    res = []
    for pair in izip(x, y):
        for i in pair:
            res.append(i)
    return res

# New attempts:
from itertools import imap
def flatten4(x, y):
    '''D Murman'''
    l = []
    list(imap(l.extend, izip(x, y)))
    return l


from Tkinter import _flatten
def flatten5(x, y):
    '''D Murman'''
    return list(_flatten(zip(x, y)))

def flatten6(x,y):
    '''Tim Hochberg'''
    n = min(len(x), len(y))
    result = [None] * (2*n)
    for i in xrange(n):
        result[2*i] = xdata[i]
        result[2*i+1] = ydata[i]
    return result

def flatten6a(x,y):
    '''Robin Becker variant of 6'''
    n = min(len(x), len(y))
    result = [None] * (2*n)
    for i in xrange(n):
        result[2*i:2*i+2] = xdata[i],ydata[i]
    return result

def flatten6b(x,y):
    '''Robin Becker variant of 6'''
    n = min(len(x), len(y))
    result = [None] * (2*n)
    for i,pair in enumerate(zip(xdata,ydata)):
        result[2*i:2*i+2] = pair
    return result

def flatten6c(x,y):
    '''Robin Becker variant of 6'''
    n = min(len(x), len(y))
    result = [None] * (2*n)
    for i,pair in enumerate(izip(xdata,ydata)):
        result[2*i:2*i+2] = pair
    return result

def flatten6d(x,y):
    '''Robin Becker variant of 6'''
    n = min(len(x), len(y))
    result = [None] * (2*n)
    j = 0
    for i in xrange(n):
        result[j] = xdata[i]
        result[j+1] = ydata[i]
        j+=2
    return result

from operator import add as operator_add
def flatten8(x,y):
    '''Paul Rubin'''
    return reduce(operator_add,zip(x,y),())

def flatten8a(x,y):
    '''Robin Becker variant of 8'''
    return reduce(operator_add,(xy for xy in izip(x,y)),())

def flatten7(x,y):
    '''Peter Otten special case equal lengths'''
    n = len(x)
    assert len(y) == n
    result = [None] * (2*n)
    result[::2] = x
    result[1::2] = y
    return result

funcs = [(n,v) for n,v in globals().items() if callable(v) and
         n.startswith('flatten')]
funcs.sort()

def testthem():
    res0 = funcs[0][1](xdata, ydata)
    for name,func in funcs:
        res = list(func(xdata, ydata))
        if res!=res0:
            print name,' fails', type(res0), type(res), res0[:5],res[:5], res0[-5:],res[-5:]

def timethem(D,n):
    for name,func in funcs:
        t = timeit.Timer(name+"(xdata,ydata)",'from __main__ import xdata,ydata,'+name)
        D.setdefault(name,{})[n] = 1e7*t.timeit(_R)/float(_R)

if __name__=='__main__':
    N = [10, 20, 100, 200, 500, 1000]
    xdata = range(N[-1])
    ydata = xdata[::-1]
    testthem()
    for p in 'no psyco','psyco':
        D={}
        if p=='psyco':
            import psyco
            psyco.full()
        for n in N:
            xdata = range(n)
            ydata = xdata[::-1]
            timethem(D,n)
        print '\n',p
        fmt = '%%%ds' % max(map(len,[x[0] for x in funcs]))
        print (fmt + len(N)*' %9d') % (('Name',)+tuple(N))
        fmt1 = fmt + len(N)*' %9.3f'
        for name,func in funcs:
            print fmt1 % tuple([name]+[D[name][n] for n in N])
        print
###############################
 

David Murmann

Robin said:
> # New attempts:
> from itertools import imap
> def flatten4(x, y):
>     '''D Murman'''
>     l = []
>     list(imap(l.extend, izip(x, y)))
>     return l
>
>
> from Tkinter import _flatten
> def flatten5(x, y):
>     '''D Murman'''
>     return list(_flatten(zip(x, y)))

well, i would really like to take credit for these, but they're
not mine ;) (credit goes to Michael Spencer). i especially like
flatten4, even if it's not as fast as the phenomenally fast
flatten7.
 

Nick Craig-Wood

Sion Arrowsmith said:
> sum(...)
> sum(sequence, start=0) -> value
>
> If you're using sum() as a 1-level flatten you need to give it
> start=[].

Except if you are trying to sum arrays of strings...
Traceback (most recent call last):

I've no idea why this limitation is here... perhaps it is because pre
python2.4 calling += on strings was very slow?
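The limitation can be reproduced with the sketch below, written for Python 3, where sum() still refuses a string start value and its error message points to ''.join() as the alternative. The variable names are illustrative.

```python
words = ["ab", "cd", "ef"]

# sum() rejects a str start value outright instead of concatenating.
try:
    sum(words, "")
    message = None
except TypeError as exc:
    message = str(exc)

# The alternative suggested by the error message itself.
joined = "".join(words)

print(message)
print(joined)
```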
 
