A way to re-organize a list

B

beginner

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]


All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.

If there is a 'functional' way to do this, it would be even greater.


Thanks,
Geoffrey
 
A

Alex Martelli

beginner said:
Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]


All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.

If there is a 'functional' way to do this, it would be even greater.

If it's already sorted as in your example, itertools.groupby may be what
you want. To pick a key, you can use operator.itemgetter, which builds
functions picking a specified N-th item of a sequence.

Consider...:
[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3), ('A', 'b', 1), ('A', 'b',
2), ('B', 'a', 1), ('B', 'b', 1)][('A', <itertools._grouper object at 0x4d110>), ('B',
<itertools._grouper object at 0x4d120>)]

the items in the sequence groupby returns are (key, subsequence) pairs;
you don't care about the key, so here's a first step (grouping by first
item only):
[list(ss) for k, ss in gb(l, itm(0))]
[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3), ('A', 'b', 1), ('A', 'b',
2)], [('B', 'a', 1), ('B', 'b', 1)]]

However, you want a second level of grouping -- so, instead of the plain
list(ss), you need to apply this concept again:
[[list(sss) for k,sss in gb(ss,itm(1))] for k, ss in gb(l, itm(0))]
[[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3)], [('A', 'b', 1), ('A',
'b', 2)]], [[('B', 'a', 1)], [('B', 'b', 1)]]]

....and there you are.

May be more readable with a little auxiliary function to capture what I
just called "this concept" with a readable name:
def group(s, i): return [list(ss) for k, ss in gb(s, itm(i))] ....
[group(ss,1) for ss in group(l,0)]
[[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3)], [('A', 'b', 1), ('A',
'b', 2)]], [[('B', 'a', 1)], [('B', 'b', 1)]]]

This does one more "list(...)" step, but if your lists aren't huge the
readability may be worth the small slow-down.


Alex
 
C

Carsten Haese

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

That's easy with the magic of itertools.groupby and operator.itemgetter:
from itertools import groupby
from operator import itemgetter
from pprint import pprint
first = itemgetter(0)
second = itemgetter(1)
l = [ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A", .... "b", 2), ("B", "a", 1), ("B", "b", 1)]
result = [[list(g2) for _,g2 in groupby(g1,second)] for _,g1 in groupby(l,first)]
pprint(result)
[[[('A', 'a', 1), ('A', 'a', 2), ('A', 'a', 3)],
[('A', 'b', 1), ('A', 'b', 2)]],
[[('B', 'a', 1)], [('B', 'b', 1)]]]

HTH,
 
M

marduk

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

One way of doing it:

def group_by(i, my_list):
d = {}
for t in my_list:
d[t] = d.get(t, []) + [t]

return d.values()
 
G

Gordon Airporte

beginner said:
What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

itertools.groupby has already been mentioned, but it has a very specific
and complex behavior which may not be exactly what you need. I found
this handy class:

class groupby(dict):
def __init__(self, seq, key=lambda x:x):
for value in seq:
k = key(value)
self.setdefault(k, []).append(value)
__iter__ = dict.iteritems

which you would use like this:

byfirst = groupby(l, lambda x: x[0])
print byfirst['A']
print byfirst['B']
bysecond = groupby(l, lambda x: x[1])
print bysecond['a']
print bysecond['b']
 
B

beginner

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.

If there is a 'functional' way to do this, it would be even greater.

Thanks,
Geoffrey

I guess I still don't quite get functional programming. Here is my
imperitive way to do it in O(n).

"""Put a sorted hirerchical structure into nested groups"""

def group_items(source_list, f_list, target=[]):
"""Group_items: Group a list based on the grouping functions.

source_list is a list or iterator that produces a stream of source
records.

f_list is a list of functions. Each function takes the form
f(a,b), where
a and b are two records. If the two records should be in the same
group, it returns
True, otherwise it returns False.

If target is not provided, a new list will be created. Otherwise,
the records are
appended to the provided list.
"""

last_item=None

for new_item in source_list:
level=len(f_list)
t=target
for fn in f_list:
if t==[] or last_item==None or not fn(last_item,
new_item):
ng=new_item
for i in range(level): ng=[ng]
t.append(ng)
break
else:
t=t[-1]
level -= 1
else:
t.append(new_item)
last_item=new_item

return target


if __name__=="__main__":
import pprint
def f(a,b):
return a[0]==b[0]
def g(a,b):
return a[1]==b[1]

mydata=[[1,2,3,4],[1,2,4,5],[1,2,"A","B"],[2,2,"A","C"]]
t=group_items(mydata, [f,g])
pprint.pprint(t)
 
M

Marc 'BlackJack' Rintsch

Hi Everyone,

I have a simple list reconstruction problem, but I don't really know
how to do it.

I have a list that looks like this:

l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]

What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:

out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]

All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.

If there is a 'functional' way to do this, it would be even greater.

Thanks,
Geoffrey

I guess I still don't quite get functional programming. Here is my
imperitive way to do it in O(n).

I didn't try to follow it but are you sure about O(n)? With the inner
loop going from 0 to `level` it looks suspiciously quadratic to me. You
really should try to grasp the `itertools.groupby()` solution.

And the result of that script looks strange:

[[[[1, 2, 3, 4], [1, 2, 4, 5], [1, 2, 'A', 'B']]], [[[2, 2, 'A', 'C']]]]

Aren't the two top level elements wrapped in one list too much?
Furthermore the test data doesn't contain an example where the first item
is the same but the second item is different.
def group_items(source_list, f_list, target=[]):

Default arguments are evaluated *once* when the ``def`` is executed. So
all calls to this function share the same list object bound to `target`.
Call this function twice without an explicit `target` and you'll see the
problem.

Ciao,
Marc 'BlackJack' Rintsch
 
B

beginner

Hi Everyone,
I have a simple list reconstruction problem, but I don't really know
how to do it.
I have a list that looks like this:
l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
"b", 2), ("B", "a", 1), ("B", "b", 1)]
What I want to do is to reorganize it in groups, first by the middle
element of the tuple, and then by the first element. I'd like the
output look like this:
out=[
[ #group by first element "A"
[("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
second element "a"
[ ("A", "b", 1), ("A", "b", 2)], #group by second element
"b"
],
[ #group by first element "B"
[("B", "a", 1)],
[("B", "b", 1)]
]
]
All the solutions I came up with are difficult to read and even harder
to go back and change, and I just feel are too complicated for such a
simple problem. I am wondering if anyone here has some insight.
If there is a 'functional' way to do this, it would be even greater.
Thanks,
Geoffrey
I guess I still don't quite get functional programming. Here is my
imperitive way to do it in O(n).

I didn't try to follow it but are you sure about O(n)? With the inner
loop going from 0 to `level` it looks suspiciously quadratic to me. You
really should try to grasp the `itertools.groupby()` solution.

And the result of that script looks strange:

[[[[1, 2, 3, 4], [1, 2, 4, 5], [1, 2, 'A', 'B']]], [[[2, 2, 'A', 'C']]]]

Aren't the two top level elements wrapped in one list too much?
Furthermore the test data doesn't contain an example where the first item
is the same but the second item is different.
def group_items(source_list, f_list, target=[]):

Default arguments are evaluated *once* when the ``def`` is executed. So
all calls to this function share the same list object bound to `target`.
Call this function twice without an explicit `target` and you'll see the
problem.

Ciao,
Marc 'BlackJack' Rintsch- Hide quoted text -

- Show quoted text -


Wow. Thanks for the life-saving response. It uncovers a serious bug
about the default argument! Thank you Marc!

It is actually O(n) because it scans the list of records once and
exactly once. The inner loop is trying to make multi-level grouping.
In f_list, you define each level of grouping. For example,
f_list=[f,g] in the above code means 1) you want to put records into
sub-lists so that within each sub-list, the first element of the
records are the same. then 2) within each group, you want to further
group them by the second element of the records. That is why the
results look like the following:

[
[ #first level grouping
[[1, 2, 3, 4], [1, 2, 4, 5], [1, 2, 'A', 'B']] #second level
grouping
],
[ #first level grouping
[[2, 2, 'A', 'C']] #second level grouping
]
]

In the example, the inner loop is executed twice. So it loops a total
of 2*n times and it is O(n). If you want to make m level grouping, the
execution time would be O(m*n).

I looked at itertools. It is really cool. But if I just want to make
multiple nested groups, reusing the above function is going to take
less code than using itertools.groupby every time. Also, in
itertools.groupby, I think you have to scan the list of records
multiple times to make nested groups. Although it is still O(m*n),
each operation of groupby is going to be slower.

The one thing I still haven't figured out, is to make nested
generators of multiple nested grouping, so that I can use a multi-loop
to traverse them. A loop like below:

for i in big_group:
for j in i:
for k in j:
.....

i, j, k are generators.

I think short of changing my function to be recursive and the outer
function 'yield' to the inner function, which calls yield to make an
inner generator, it will be difficult to do.

Thanks again for pointing out the default value problem.

Geoffrey
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,218
Latest member
JolieDenha

Latest Threads

Top