A way to re-organize a list

beginner · Jul 19, 2007

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.

If there is a 'functional' way to do this, it would be even greater.

Thanks,
Geoffrey

Alex Martelli · Jul 19, 2007

beginner said:
Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.

If there is a 'functional' way to do this, it would be even greater.

If it's already sorted as in your example, itertools.groupby may be what
you want. To pick a key, you can use operator.itemgetter, which builds
functions picking a specified N-th item of a sequence.

Consider...:
[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3), ('A', 'b', 1), ('A', 'b',
2), ('B', 'a', 1), ('B', 'b', 1)][('A', <itertools._grouper object at 0x4d110>), ('B',
<itertools._grouper object at 0x4d120>)]

the items in the sequence groupby returns are (key, subsequence) pairs;
you don't care about the key, so here's a first step (grouping by first
item only):

[list(ss) for k, ss in gb(l, itm(0))]

Click to expand...

Click to expand...

[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3), ('A', 'b', 1), ('A', 'b',
2)], [('B', 'a', 1), ('B', 'b', 1)]]

However, you want a second level of grouping -- so, instead of the plain
list(ss), you need to apply this concept again:

[[list(sss) for k,sss in gb(ss,itm(1))] for k, ss in gb(l, itm(0))]

Click to expand...

Click to expand...

[[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3)], [('A', 'b', 1), ('A',
'b', 2)]], [[('B', 'a', 1)], [('B', 'b', 1)]]]

....and there you are.

May be more readable with a little auxiliary function to capture what I
just called "this concept" with a readable name:

def group(s, i): return [list(ss) for k, ss in gb(s, itm(i))] ....
[group(ss,1) for ss in group(l,0)]

Click to expand...

Click to expand...

[[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3)], [('A', 'b', 1), ('A',
'b', 2)]], [[('B', 'a', 1)], [('B', 'b', 1)]]]

This does one more "list(...)" step, but if your lists aren't huge the
readability may be worth the small slow-down.

Alex

Carsten Haese · Jul 19, 2007

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

That's easy with the magic of itertools.groupby and operator.itemgetter:

from itertools import groupby
from operator import itemgetter
from pprint import pprint
first = itemgetter(0)
second = itemgetter(1)
l = [ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A", .... "b", 2), ("B", "a", 1), ("B", "b", 1)]
result = [[list(g2) for _,g2 in groupby(g1,second)] for _,g1 in groupby(l,first)]
pprint(result)

Click to expand...

Click to expand...

[[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3)],
[('A', 'b', 1), ('A', 'b', 2)]],
[[('B', 'a', 1)], [('B', 'b', 1)]]]

HTH,

marduk · Jul 19, 2007

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

One way of doing it:

def group_by(i, my_list):
d = {}
for t in my_list:
d[t] = d.get(t, []) + [t]

return d.values()

Gordon Airporte · Jul 19, 2007

beginner said:
What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

itertools.groupby has already been mentioned, but it has a very specific
and complex behavior which may not be exactly what you need. I found
this handy class:

class groupby(dict):
def __init__(self, seq, key=lambda x:x):
for value in seq:
k = key(value)
self.setdefault(k, []).append(value)
__iter__ = dict.iteritems

which you would use like this:

byfirst = groupby(l, lambda x: x[0])
print byfirst['A']
print byfirst['B']
bysecond = groupby(l, lambda x: x[1])
print bysecond['a']
print bysecond['b']

beginner · Aug 1, 2007

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.

If there is a 'functional' way to do this, it would be even greater.

Thanks,
Geoffrey

I guess I still don't quite get functional programming. Here is my
imperitive way to do it in O(n).

"""Put a sorted hirerchical structure into nested groups"""

def group_items(source_list, f_list, target=[]):
"""Group_items: Group a list based on the grouping functions.

source_list is a list or iterator that produces a stream of source
records.

f_list is a list of functions. Each function takes the form
f(a,b), where
a and b are two records. If the two records should be in the same
group, it returns
True, otherwise it returns False.

If target is not provided, a new list will be created. Otherwise,
the records are
appended to the provided list.
"""

last_item=None

for new_item in source_list:
level=len(f_list)
t=target
for fn in f_list:
if t==[] or last_item==None or not fn(last_item,
new_item):
ng=new_item
for i in range(level): ng=[ng]
t.append(ng)
break
else:
t=t[-1]
level -= 1
else:
t.append(new_item)
last_item=new_item

return target

if __name__=="__main__":
import pprint
def f(a,b):
return a[0]==b[0]
def g(a,b):
return a[1]==b[1]

mydata=[[1,2,3,4],[1,2,4,5],[1,2,"A","B"],[2,2,"A","C"]]
t=group_items(mydata, [f,g])
pprint.pprint(t)

Marc 'BlackJack' Rintsch · Aug 1, 2007

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.

If there is a 'functional' way to do this, it would be even greater.

Thanks,
Geoffrey

Click to expand...

I guess I still don't quite get functional programming. Here is my
imperitive way to do it in O(n).

I didn't try to follow it but are you sure about O(n)? With the inner
loop going from 0 to `level` it looks suspiciously quadratic to me. You
really should try to grasp the `itertools.groupby()` solution.

And the result of that script looks strange:

[[[[1, 2, 3, 4], [1, 2, 4, 5], [1, 2, 'A', 'B']]], [[[2, 2, 'A', 'C']]]]

Aren't the two top level elements wrapped in one list too much?
Furthermore the test data doesn't contain an example where the first item
is the same but the second item is different.

def group_items(source_list, f_list, target=[]):

Default arguments are evaluated *once* when the ``def`` is executed. So
all calls to this function share the same list object bound to `target`.
Call this function twice without an explicit `target` and you'll see the
problem.

Ciao,
Marc 'BlackJack' Rintsch

beginner · Aug 1, 2007

Hi Everyone,
I have a simple list reconstruction problem, but I don't really know
how to do it.
I have a list that looks like this:
l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]
What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:
out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]
All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.
If there is a 'functional' way to do this, it would be even greater.
Thanks,
Geoffrey

Click to expand...

Click to expand...

I guess I still don't quite get functional programming. Here is my
imperitive way to do it in O(n).

Click to expand...

I didn't try to follow it but are you sure about O(n)? With the inner
loop going from 0 to `level` it looks suspiciously quadratic to me. You
really should try to grasp the `itertools.groupby()` solution.

And the result of that script looks strange:

[[[[1, 2, 3, 4], [1, 2, 4, 5], [1, 2, 'A', 'B']]], [[[2, 2, 'A', 'C']]]]

Aren't the two top level elements wrapped in one list too much?
Furthermore the test data doesn't contain an example where the first item
is the same but the second item is different.

def group_items(source_list, f_list, target=[]):

Click to expand...

Default arguments are evaluated *once* when the ``def`` is executed. So
all calls to this function share the same list object bound to `target`.
Call this function twice without an explicit `target` and you'll see the
problem.

Ciao,
Marc 'BlackJack' Rintsch- Hide quoted text -

- Show quoted text -

Wow. Thanks for the life-saving response. It uncovers a serious bug
about the default argument! Thank you Marc!

It is actually O(n) because it scans the list of records once and
exactly once. The inner loop is trying to make multi-level grouping.
In f_list, you define each level of grouping. For example,
f_list=[f,g] in the above code means 1) you want to put records into
sub-lists so that within each sub-list, the first element of the
records are the same. then 2) within each group, you want to further
group them by the second element of the records. That is why the
results look like the following:

[
[ #first level grouping
[[1, 2, 3, 4], [1, 2, 4, 5], [1, 2, 'A', 'B']] #second level
grouping
],
[ #first level grouping
[[2, 2, 'A', 'C']] #second level grouping
]
]

In the example, the inner loop is executed twice. So it loops a total
of 2*n times and it is O(n). If you want to make m level grouping, the
execution time would be O(m*n).

I looked at itertools. It is really cool. But if I just want to make
multiple nested groups, reusing the above function is going to take
less code than using itertools.groupby every time. Also, in
itertools.groupby, I think you have to scan the list of records
multiple times to make nested groups. Although it is still O(m*n),
each operation of groupby is going to be slower.

The one thing I still haven't figured out, is to make nested
generators of multiple nested grouping, so that I can use a multi-loop
to traverse them. A loop like below:

for i in big_group:
for j in i:
for k in j:
.....

i, j, k are generators.

I think short of changing my function to be recursive and the outer
function 'yield' to the inner function, which calls yield to make an
inner generator, it will be difficult to do.

Thanks again for pointing out the default value problem.

Geoffrey

Average of MultiMode of a list of a list	1	Oct 28, 2022
Calculate rang and derang of ordering of subsets	0	Feb 6, 2024
Calculate rang and derang of ordering of subsets	1	Feb 6, 2024
Range / empty list issues??	1	Dec 11, 2023
How to automatically append x,y values to a matrix	2	Jan 30, 2023
Is there a way to add strings to a list without the quotation marks in C++?	1	Nov 9, 2020
Iterate through a list and try log in to a website with urllib and re	8	Mar 3, 2014
Is there a way to input a unique number for each array output?	4	Aug 31, 2022

A way to re-organize a list

beginner

Alex Martelli

Carsten Haese

marduk

Gordon Airporte

beginner

Marc 'BlackJack' Rintsch

beginner

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads