unique elements from list of lists

T

Tekkaman

I have a list of lists and I want to define an iterator (let's call
that uniter) over all unique elements, in any order. For example,
calling:

sorted(uniter([['a', 'b', 'd'], ['b', 'c'], ['a', 'c', 'd']]))

must return ['a', 'b', 'c', 'd']. I tried the following
implementations:

from itertools import chain
def uniter1(listOfLists):
for item in set(chain(*listOfLists)): yield item

def uniter2(listOfLists):
for item in reduce(
lambda x,y: x|y,
[set(list_) for list_ in listOfLists]
): yield item

speed test with timeit says the first one is slightly faster. What
bothers me is that it builds a set from an iterator and then another
iterator from the set. Is there a way to implement this using only
iterators? I also tried a "full python" implementation (by that I mean
one that does not use the built-in set and keeps track of previously
yielded items in a list) but the one I pulled out is about 180 times
slower. Here it is:

def uniter3(listOfLists):
done = []
for list_ in listOfLists:
for item in list_:
if not item in done:
done.append(item)
yield item

Thanks in advance for any contribution.

-- Simone
 
P

Peter Otten

Tekkaman said:
I have a list of lists and I want to define an iterator (let's call
that uniter) over all unique elements, in any order. For example,
calling:

sorted(uniter([['a', 'b', 'd'], ['b', 'c'], ['a', 'c', 'd']]))

must return ['a', 'b', 'c', 'd']. I tried the following
implementations:

from itertools import chain
def uniter1(listOfLists):
for item in set(chain(*listOfLists)): yield item

def uniter(lists):
return iter(set(chain(*lists)))

This avoids the explicit for-loop.

Peter
 
A

azrael

try something else. im posting the code from a kiosk which has no
python, sooo..... no code. only explanation

if my memory works well there is a function in python that takes a
multidimensional list and returns its values as a one-dimension list.

def main():
list =unknownFunction([['a', 'b', 'd'], ['b', 'c'], ['a', 'c',
'd']) # =a, b, d, b, c, a, c, d
temp = []
for i in list:
check(i, temp, list)
sort(list)

def check(pos, temp, list ):
for i in temp:
if temp== list[pos]
del list[pos]
check(pos, temp, list)
temp.append(list[pos])

im not sure if this should work but the meaning is:
the first three elements will be appended into the list directly
because there was no like this in the temp
while there are elements in the list take a pivot value and check if
they are unique in the list.

check:
while there are elements in the temporary list check if the pivot
exists in the temp. if false then append it to temp.

if true delete the element and go into the recursion on the same index
why?

temp a, b, d
list a, b, d, (b), c, a, c, d

temp a, b, d
list a, b, d, (c) a, c, d

temp a, b, d, c
list a, b, d, c, (a) c, d

temp a, b, d, c
list a, b, d, c, (c), d

temp a, b, d, c
list a, b, d, c, (d)


temp a, b, d, c
list a, b, d, c

list a, b, c, d

one way to do it. this works well if you need the list in the order
they appear. sort it and it works well


but i think that there is the possibility to do it somthing like the
merge sort. maybee it would work. try it. Merge the sublists and
remove the duplicates. (L1,L2,L3) -> (L12, L3) -> (L123)
this should work pretty well




Tekkaman je napisao/la:
I have a list of lists and I want to define an iterator (let's call
that uniter) over all unique elements, in any order. For example,
calling:

sorted(uniter([['a', 'b', 'd'], ['b', 'c'], ['a', 'c', 'd']]))

must return ['a', 'b', 'c', 'd']. I tried the following
implementations:

from itertools import chain
def uniter1(listOfLists):
for item in set(chain(*listOfLists)): yield item

def uniter2(listOfLists):
for item in reduce(
lambda x,y: x|y,
[set(list_) for list_ in listOfLists]
): yield item

speed test with timeit says the first one is slightly faster. What
bothers me is that it builds a set from an iterator and then another
iterator from the set. Is there a way to implement this using only
iterators? I also tried a "full python" implementation (by that I mean
one that does not use the built-in set and keeps track of previously
yielded items in a list) but the one I pulled out is about 180 times
slower. Here it is:

def uniter3(listOfLists):
done = []
for list_ in listOfLists:
for item in list_:
if not item in done:
done.append(item)
yield item

Thanks in advance for any contribution.

-- Simone
 
A

azrael

tra using the firs sublist (list[1]) as cell.then take zhe second
sublist and take a value from it at once and if the value from list[2]
doesnt exist in list[1] then insert it into list[1] at the correct
place. Something like the insertionsort.
 
B

bearophileHUGS

Tekkaman:

If the sublists contain hashable elements you can use this:

def uniter(lists):
merge = set()
for sub in lists:
merge = merge.union(sub)
for el in merge:
yield el

data = [['a', 'b', 'd'], ['b', 'c'], ['a', 'c', 'd']]
print list(uniter(data))

But often this too may be enough:

def uniter(lists):
merge = set()
for sub in lists:
merge = merge.union(sub)
return merge

Bye to the gentle Pegas too,
bearophile
 
T

Tekkaman

Thanks everybody!
Azrael: your suggestions involve python-level membership testing and
dummy list construction just like my uniter3 example, so I'm afraid
they would not go any faster. There's no built-in sequence flattening
function that I know of btw.
bearophile: your implementation is very similar to my uniter2
(subsequent set unions) but without the additional lambda def
overhead, and in fact it goes faster. Not faster than uniter though.
Peter: your solution is the fastest, removing the explicit for loop
resulted in a 30% speed gain. Looks like using set() is a must.

-- Simone
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,535
Members
45,008
Latest member
obedient dusk

Latest Threads

Top