flatten a list of list

T

Terry

Hi,

Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:

new_list=[]
for l in list_of_list:
new_list.extend(l)

or,

new_list=reduce(lambda x,y:x.extend(y), list_of_list)

br, Terry
 
C

Chris Rebert

Hi,

Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:

new_list=[]
for l in list_of_list:
   new_list.extend(l)

or,

new_list=reduce(lambda x,y:x.extend(y), list_of_list)

#only marginally better:
from operator import add
new_list = reduce(add, list_of_list)

#from the itertools recipes:
from itertools import chain
def flatten(listOfLists):
return list(chain.from_iterable(listOfLists))

Cheers,
Chris
 
M

Michael Fötsch

Terry said:
Is there a simple way (the pythonic way) to flatten a list of list?

This is probably the shortest it can get:

sum(list_of_lists, [])


Kind Regards,
M.F.
 
S

Steven D'Aprano

Hi,

Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:

new_list=[]
for l in list_of_list:
   new_list.extend(l)

or,

new_list=reduce(lambda x,y:x.extend(y), list_of_list)

#only marginally better:
from operator import add
new_list = reduce(add, list_of_list)

Surely that's going to be O(N**2)?

I'd predict that would perform even worse than O(N**2) string
concatenation, on account that a string of size N has to allocate space
for N bytes, but a list of size N allocates space for N pointers (each 4
bytes, or 8 depending on the system), rounded up to the nearest power of
two. Also CPython can optimize string concatenation to O(N) under some
circumstances, but not lists.

.... L = [ ([None]*5000) for x in xrange(%d) ]
.... from operator import add
.... """
Timer("reduce(add, L)", setup % 4).repeat(number=1000) [0.53808808326721191, 0.51404905319213867, 0.51157188415527344]
Timer("reduce(add, L)", setup % 8).repeat(number=1000) [2.1178171634674072, 3.8830602169036865, 4.72245192527771]
Timer("reduce(add, L)", setup % 16).repeat(number=1000)
[17.190337896347046, 21.572744131088257, 21.584265947341919]


Looks like O(N**2) behaviour to me.
 
C

Chris Rebert

Hi,

Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:

new_list=[]
for l in list_of_list:
   new_list.extend(l)

or,

new_list=reduce(lambda x,y:x.extend(y), list_of_list)

#only marginally better:
from operator import add
new_list = reduce(add, list_of_list)

Surely that's going to be O(N**2)?

The OP asked for "simple", not "best", "most proper", or "fastest". My
comment was intended to mean that the code was marginally *simpler*,
not faster.

Cheers,
Chris
 
S

Steven D'Aprano

Terry said:
Is there a simple way (the pythonic way) to flatten a list of list?

This is probably the shortest it can get:

sum(list_of_lists, [])


That's also O(N**2).

from timeit import Timer
setup = "L = [ ([None]*5000) for x in xrange(%d) ]"
Timer("sum(L, [])", setup % 4).repeat(number=1000) [0.6070549488067627, 0.54354715347290039, 0.54686999320983887]
Timer("sum(L, [])", setup % 8).repeat(number=1000) [2.1285719871520996, 3.6722278594970703, 4.0785009860992432]
Timer("sum(L, [])", setup % 16).repeat(number=1000)
[18.370341062545776, 20.40509295463562, 21.871708869934082]
 
T

Terry

Hi,
Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:
new_list=[]
for l in list_of_list:
   new_list.extend(l)
or,
new_list=reduce(lambda x,y:x.extend(y), list_of_list)
#only marginally better:
from operator import add
new_list = reduce(add, list_of_list)
Surely that's going to be O(N**2)?

The OP asked for "simple", not "best", "most proper", or "fastest". My
comment was intended to mean that the code was marginally *simpler*,
not faster.

Cheers,
Chris
--http://blog.rebertia.com

Well, if possible, I'd like not only to know a simple solution, but
also the 'best', the 'most proper' and the 'fastest':)

If they are not the same.

br, Terry
 
S

Steven D'Aprano

Hi,

Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:

new_list=[]
for l in list_of_list:
new_list.extend(l)

I don't think that scales terribly well. In my testing, it performs about
as badly as the O(N**2) behaviours others have suggested (using sum or
reduce with add) -- perhaps a bit worse for small N, but not quite as
badly for large N.

new_list=reduce(lambda x,y:x.extend(y), list_of_list)

That doesn't even work.
list_of_list = [ [1,2,3], [2, 4, 8] ]
new_list=reduce(lambda x,y:x.extend(y), list_of_list)
new_list is None
True



Chris' suggestion using itertools seems pretty good:
.... L = [ [None]*5000 for _ in xrange(%d) ]
.... from itertools import chain
.... """
Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000) [0.61839914321899414, 0.61799716949462891, 0.62065696716308594]
Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000) [1.2618398666381836, 1.3385050296783447, 3.9113419055938721]
Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000)
[3.1349358558654785, 4.8554730415344238, 5.4319999217987061]
 
S

Steven D'Aprano

The OP asked for "simple", not "best", "most proper", or "fastest". My
comment was intended to mean that the code was marginally *simpler*, not
faster.

Fair enough, but he also asked for Pythonic, and while some people might
argue that "terrible performance" is Pythonic, I hope you wouldn't be one
of them! :)

If it soothes your ruffled sense of honour *wink*, I think your solution
with itertools.chain is probably the best so far.
 
C

Chris Rebert

Fair enough, but he also asked for Pythonic, and while some people might
argue that "terrible performance" is Pythonic, I hope you wouldn't be one
of them! :)

Indeed not. :) I expected it would be worse performance-wise than the
OP's original due to all the intermediate lists that get produced; it
shouldn't be used in optimized production code.
If it soothes your ruffled sense of honour *wink*, I think your solution
with itertools.chain is probably the best so far.

Except it's not really my solution, it's whoever put it in the
itertools docs's. :(
But I'm glad to been able to help by pointing the recipe out. :)

Cheers,
Chris
 
B

Bearophile

Chris Rebert:
The OP asked for "simple", not "best", "most proper", or "fastest". My
comment was intended to mean that the code was marginally *simpler*,
not faster.

Yep, the OP has asked for simple code. But often this is not the right
way to solve this situation. A better way is to create (or copy) a
flatten that's efficient and well tested & debugged, put it into some
library of utilities, and then use import&use it when it's needed.

Bye,
bearophile
 
F

Francesco Bochicchio

On Aug 16, 1:25 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

....
Chris' suggestion using itertools seems pretty good:

... L = [ [None]*5000 for _ in xrange(%d) ]
... from itertools import chain
... """>>> Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000)

[0.61839914321899414, 0.61799716949462891, 0.62065696716308594]>>> Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000)

[1.2618398666381836, 1.3385050296783447, 3.9113419055938721]>>> Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000)

[3.1349358558654785, 4.8554730415344238, 5.4319999217987061]

I had a peek at itertools ( which is a C module btw) and realized that
chain solves the problem by creating a chain object,
which is a sort of generator. Both creating the chain object and
converting the chain object to a list seem to be O(N), so
the whole is O(N) too ...

Then I tried this pure python version:

# ----- CODE
from timeit import Timer
setup = """\\
L = [ [None]*5000 for _ in range(%d) ]
def pychain( list_of_list ):
for l in list_of_list:
for elem in l:
yield elem
"""

print( Timer("list(pychain(L))", setup % 4).repeat(number=1000))
print( Timer("list(pychain(L))", setup % 8).repeat(number=1000))
print( Timer("list(pychain(L))", setup % 16).repeat(number=1000))
# ----- END CODE


and got times that seem to confirm it :

[2.818755865097046, 2.7880589962005615, 2.79232120513916]
[5.588631868362427, 5.588244915008545, 5.587780952453613]
[11.620548009872437, 11.39465618133545, 11.40834903717041]

For reference, here are the times of the itertools.chain solution on
my laptop:

[0.6518809795379639, 0.6491332054138184, 0.6483590602874756]
[1.3188841342926025, 1.3173959255218506, 1.3207998275756836]
[2.7200729846954346, 2.5402050018310547, 2.543621063232422]

All this with Python 3.1 compiled from source on Xubuntu 8.10.

Ciao
 
A

Alan G Isaac

Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:
new_list=[]
for l in list_of_list:
new_list.extend(l)


new_list = list(xi for lst in list_of_list for xi in lst)

hth,
Alan Isaac
 
P

Paul Rubin

Terry said:
Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:

new_list=[]
for l in list_of_list:
new_list.extend(l)

from itertools import chain
new_list = list(chain(list_of_list))
 
T

Tim Cook

Hi,

Is there a simple way (the pythonic way) to flatten a list of list?
rather than my current solution:

new_list=[]
for l in list_of_list:
    new_list.extend(l)

or,

new_list=reduce(lambda x,y:x.extend(y), list_of_list)

br, Terry

Well, This is not simple but it is comprhensive in that it has to do
several things. I am using it to decompose deeply nested lists from
Pyparsing output that may have strings in a variety of languages.
Performance wise I do not know how it stacks up against the other
examples but it works for me. :)

def flatten(x):
"""flatten(sequence) -> list

Returns a single, flat list which contains all elements retrieved
from the sequence and all recursively contained sub-sequences
(iterables). All strings are converted to unicode.

"""
result = []
for el in x:
#if isinstance(el, (list, tuple)):
if hasattr(el, "__iter__") and not isinstance(el, basestring):
result.extend(flatten(el))
else:
result.append(el)


# all strings must be unicode
rtnlist=[]
for x in result:
if isinstance(x,str):
# replace any brackets so Python doesn't think it's a list
and we still have a seperator.
x=x.replace('[','_')
x=x.replace(']','_')
try:
x=unicode(x, "utf8") # need more decode types here
except UnicodeDecodeError:
x=unicode(x, "latin1")
except UnicodeDecodeError:
x=unicode(x,"iso-8859-1")
except UnicodeDecodeError:
x=unicode(x,"eucJP")

rtnlist.append(x)

return rtnlist
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,226
Latest member
KristanTal

Latest Threads

Top