# toy list processing problem: collect similar terms

Discussion in 'Python' started by Xah Lee, Sep 26, 2010.

1. ### Xah LeeGuest

here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists
sharing
the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

a Mathematica solution is here:
http://xahlee.org/UnixResource_dir/writ/notations_mma.html

R5RS Scheme lisp solution:
http://xahlee.org/UnixResource_dir/writ/Sourav_Mukherjee_sourav.work_gmail.scm
by Sourav Mukherjee

also, a Common Lisp solution can be found here:

anyone care to give a solution in Python, Perl, javascript, or other
lang? am guessing the scheme solution can be much improved... perhaps
using some lib but that seems to show scheme is pretty weak if the lib
is non-standard.

Xah âˆ‘ xahlee.org â˜„

Xah Lee, Sep 26, 2010

2. ### Gary HerronGuest

On 09/25/2010 09:05 PM, Xah Lee wrote:
> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> a Mathematica solution is here:
> http://xahlee.org/UnixResource_dir/writ/notations_mma.html
>
> R5RS Scheme lisp solution:
> http://xahlee.org/UnixResource_dir/writ/Sourav_Mukherjee_sourav.work_gmail.scm
> by Sourav Mukherjee
>
> also, a Common Lisp solution can be found here:
>
> anyone care to give a solution in Python, Perl, javascript, or other
> lang? am guessing the scheme solution can be much improved... perhaps
> using some lib but that seems to show scheme is pretty weak if the lib
> is non-standard.
>
> Xah âˆ‘ xahlee.org â˜„
>

Python 3: (I have not tried to match the exact format of your output,
but I get the right things is the right order.)

data = ((0,'a','b'), (1,'c','d'), (2,'e','f'), (3,'g','h'),
(1,'i','j'), (2,'k','l'), (4,'m','n'), (2,'o','p'),
(4,'q','r'), (5,'s','t'))

from collections import OrderedDict
r = OrderedDict()
for label,*rest in data:
r.setdefault(label, []).extend(rest)
print(list(r.values()))

produces:

(['a', 'b'], ['c', 'd', 'i', 'j'], ['e', 'f', 'k', 'l', 'o', 'p'], ['g',
'h'], ['m', 'n', 'q', 'r'], ['s', 't'])

--
Gary Herron, PhD.
Department of Computer Science
DigiPen Institute of Technology
(425) 895-4418

Gary Herron, Sep 26, 2010

3. ### Alexander BurgerGuest

In PicoLisp:

(mapcar
'((X) (apply conc (cdr X)))
(group List) )

Cheers,
- Alex

Alexander Burger, Sep 26, 2010
4. ### Paul RubinGuest

Python solution follows. Removed all crossposts since massive
crossposting is a standard trolling tactic.

from collections import defaultdict

def collect(xss):
d = defaultdict(list)
for xs in xss:
d[xs[0]].extend(xs[1:])
return sorted(v for k,v in d.items())

y = [['0','a','b'], ['1','c','d'], ['2','e','f'], ['3','g','h'],
['1','i','j'], ['2','k','l'], ['4','m','n'], ['2','o','p'],
['4','q','r'], ['5','s','t']]

print collect(y)

Paul Rubin, Sep 26, 2010
5. ### Paul RubinGuest

Python solution follows (earlier one with an error cancelled). All
crossposting removed since crossposting is a standard trolling tactic.

from collections import defaultdict

def collect(xss):
d = defaultdict(list)
for xs in xss:
d[xs[0]].extend(xs[1:])
return list(v for k,v in sorted(d.items()))

y = [[0,'a','b'], [1,'c','d'], [2,'e','f'], [3,'g','h'], [1,'i','j'],
[2,'k','l'], [4,'m','n'], [2,'o','p'], [4,'q','r'], [5,'s','t']]

print collect(y)

Paul Rubin, Sep 26, 2010
6. ### livibetterGuest

Here is mine for Python:

l = [[0, 'a', 'b'], [1, 'c', 'd'], [2, 'e', 'f'], [3, 'g', 'h'], [1,
'i', 'j'], [2, 'k', 'l'], [4, 'm', 'n'], [2, 'o', 'p'], [4, 'q', 'r'],
[5, 's', 't']]
d = {}
for idx, items in [(e[0], e[1:]) for e in l]: d[idx] = d[idx] + items
if idx in d else items
print d.values()

Output:
[['a', 'b'], ['c', 'd', 'i', 'j'], ['e', 'f', 'k', 'l', 'o', 'p'],
['g', 'h'], ['m', 'n', 'q', 'r'], ['s', 't']]

livibetter, Sep 26, 2010
7. ### Arnaud DelobelleGuest

On 26 Sep, 08:47, livibetter <> wrote:
> Here is mine for Python:
>
> l = [[0, 'a', 'b'], [1, 'c', 'd'], [2, 'e', 'f'], [3, 'g', 'h'], [1,
> 'i', 'j'], [2, 'k', 'l'], [4, 'm', 'n'], [2, 'o', 'p'], [4, 'q', 'r'],
> [5, 's', 't']]
> d = {}
> for idx, items in [(e[0], e[1:]) for e in l]: d[idx] = d[idx] + items
> if idx in d else items
> print d.values()
>
> Output:
> [['a', 'b'], ['c', 'd', 'i', 'j'], ['e', 'f', 'k', 'l', 'o', 'p'],
> ['g', 'h'], ['m', 'n', 'q', 'r'], ['s', 't']]

from itertools import groupby
from operator import itemgetter

l = [[0, 'a', 'b'], [1, 'c', 'd'], [2, 'e', 'f'], [3, 'g', 'h'], [1,
'i', 'j'], [2, 'k', 'l'], [4, 'm', 'n'], [2, 'o', 'p'], [4, 'q',
'r'],
[5, 's', 't']]

[
[x for g in gs for x in g[1:]]
for _, gs in groupby(sorted(l), itemgetter(0))
]

--
Arnaud

Arnaud Delobelle, Sep 26, 2010
8. ### Dr.RuudGuest

On 2010-09-26 06:05, Xah Lee wrote:

> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

The input is a string on STDIN,
and the output is a string on STDOUT?

Use a hash:

perl -MData:umper -wle '\$Data:umper::Sortkeys = 1;
my \$t = "((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)"
. " (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))";

push @{ \$h{ \$1 } }, \$2 while \$t =~ /(\w+)([^)]*)/g; # gist

print Dumper \%h;
'

or an array:

perl -wle '
my \$t = "((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)"
. " (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))";

push @{\$a[\$1]},\$2 while \$t =~ /(\w+)\s+([^)]*)/g; # gist.1
print "((".join(") (",map join(" ",@\$_),@a )."))"; # gist.2
'

Or if the list is not just a string, but a real data structure in the
script:

perl -wle'
my \$t = [ [qw/0 a b/], [qw/1 c d/], [qw/2 e f/], [qw/3 g h/],
[qw/1 i j/], [qw/2 k l/], [qw/4 m n/], [qw/2 o p/],
[qw/4 q r/], [qw/5 s t/] ];

push @{ \$a[ \$_->[0] ] }, [ @\$_[ 1, 2 ] ] for @\$t; # AoAoA

printf "((%s))\n", join ") (",
map join( " ",
map join( " ", @\$_ ), @\$_
), @a;
'

Etc.

--
Ruud

Dr.Ruud, Sep 26, 2010
9. ### Jürgen ExnerGuest

Alexander Burger <> wrote:
>In PicoLisp:

What the f**** does PicoLisp have to with Perl?

jue

Jürgen Exner, Sep 26, 2010
10. ### Jürgen ExnerGuest

livibetter <> wrote:
>Here is mine for Python:

What the f*** does Python have to do with Perl?

jue

Jürgen Exner, Sep 26, 2010
11. ### Pascal J. BourguignonGuest

Xah Lee <> writes:

> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> a Mathematica solution is here:
> http://xahlee.org/UnixResource_dir/writ/notations_mma.html
>
> R5RS Scheme lisp solution:
> http://xahlee.org/UnixResource_dir/writ/Sourav_Mukherjee_sourav.work_gmail.scm
> by Sourav Mukherjee
>
> also, a Common Lisp solution can be found here:

It's too complex. Just write:

(let ((list '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n)
(2 o p) (4 q r) (5 s t))))

(mapcar (lambda (class) (reduce (function append) class :key (function rest)))
(com.informatimago.common-lisp.list:equivalence-classes list :key (function first)))

)

--> ((S T) (Q R M N) (G H) (O P K L E F) (I J C D) (A B))

--
__Pascal Bourguignon__ http://www.informatimago.com/

Pascal J. Bourguignon, Sep 26, 2010
12. ### John BokmaGuest

JÃ¼rgen Exner <> writes:

> livibetter <> wrote:
>>Here is mine for Python:

>
> What the f*** does Python have to do with Perl?

Clueless ****. I unsubscribed from comp.lang.perl.misc to avoid the
retards like you and now you come in via the backdoor. If you have a
problem with Xah report him with Google Groups and with his hosting
provider 1&1 like I do. Dreamhost kicked him out that way.

--
John Bokma j3b

Freelance Perl & Python Development: http://castleamber.com/

John Bokma, Sep 26, 2010
13. ### Xah LeeGuest

2010-09-26

On Sep 25, 11:17Â pm, Paul Rubin <> wrote:
> Python solution follows (earlier one with an error cancelled). Â All
> crossposting removed since crossposting is a standard trolling tactic.
>
> Â  Â  from collections import defaultdict
>
> Â  Â  def collect(xss):
> Â  Â  Â  Â  d = defaultdict(list)
> Â  Â  Â  Â  for xs in xss:
> Â  Â  Â  Â  Â  Â  d[xs[0]].extend(xs[1:])
> Â  Â  Â  Â  return list(v for k,v in sorted(d.items()))
>
> Â  Â  y = [[0,'a','b'], [1,'c','d'], [2,'e','f'], [3,'g','h'], [1,'i','j'],
> Â  Â  Â  Â  Â [2,'k','l'], [4,'m','n'], [2,'o','p'], [4,'q','r'], [5,'s','t']]
>
> Â  Â  print collect(y)

Hi Paul,

thanks for your solution, and thanks to many other who provided
solutions. It'll take some time to go thru them.

interested, you can read in the following:

â€¢ ã€ˆCross-posting &amp; Language Factionsã€‰
http://xahlee.org/Netiquette_dir/cross-post.html

that.

i'll go over the solutions and post if i have anything interesting to
say. â˜º Possbly will select some to show on my site with credit of
course.

Xah âˆ‘ xahlee.org â˜„

Xah Lee, Sep 26, 2010
14. ### Xah LeeGuest

On Sep 26, 7:56Â am, Sherm Pendley <> wrote:
> JÃ¼rgen Exner <> writes:
> > Alexander Burger <> wrote:
> >>In PicoLisp:

>
> > What the f**** does PicoLisp have to with Perl?

>
> It's Xah. He cross-posts in an attempt to start a language feud.
>
> Please don't feed the troll.

sorry i disagree. And please don't randomly accuse... I posted the
following in reply to Paul Rubin's very valuable post, to
comp.lang.python only. But since you cross-posted your accusation, and
there are about 3 other posts of similar nature accusing me cross-
posted to all groups, so am posting a response to all groups too.

--------------------------------------------------
Paul,

....

interested, you can read in the following:

â€¢ ã€ˆCross-posting &amp; Language Factionsã€‰
http://xahlee.org/Netiquette_dir/cross-post.html

that.

i'll go over the solutions and post if i have anything interesting to
say. â˜º Possbly will select some to show on my site with credit of
course.

Xah âˆ‘ xahlee.org â˜„

Xah Lee, Sep 26, 2010
15. ### Bakul ShahGuest

On 9/25/10 9:05 PM, Xah Lee wrote:
> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

...
> anyone care to give a solution in Python, Perl, javascript, or other
> lang? am guessing the scheme solution can be much improved... perhaps
> using some lib but that seems to show scheme is pretty weak if the lib
> is non-standard.

In Q (from kx.com)[1]:

x(0; "a"; "b");(1; "c"; "d");(2; "e"; "f");(3; "g"; "h");(1; "i"; "j")
(2; "k"; "l");(4; "m"; "n");(2; "o"; "p");(4; "q"; "r");(5; "s"; "t"))
f:{each [,/] each [each [1 _]] x @ value group x[;0]}
f x

outputs

"ab"
"cdij"
"efklop"
"gh"
"mnqr"
"st"

Note that this is actually a pretty general solution in that *all*
the functions used in it operate on a variety of types.
- The label could be anything, not just an integer.
- What follows a label can be a list of anything.
- Everything, except for = (the group function), has to do with
*shaping* the output in the way you want. It is all basically
cut-n-paste or pulling apart your Lego blocks and constructing
a new toy from them! And most languages are really poor at that.
*This* is why proponents of various languages should pay attention
to such problems.

----
[1] In k3 (an older language from kx.com that you may be able to find
online), x is the same but f is as follows:

f:{,/'1_''x@=x[;0]}

Bakul Shah, Sep 26, 2010
16. ### Jürgen ExnerGuest

John Bokma <> wrote:
>Jürgen Exner <> writes:
>
>> livibetter <> wrote:
>>>Here is mine for Python:

>>
>> What the f*** does Python have to do with Perl?

>
>Clueless ****. I unsubscribed from comp.lang.perl.misc to avoid the
>retards like you and now you come in via the backdoor. If you have a
>problem with Xah report him with Google Groups and with his hosting
>provider 1&1 like I do. Dreamhost kicked him out that way.

I have solved my problems with Xah years ago, that's what killfiles are
for. And I have no idea why you are bringing up Xah in your current
rant.

It was livibetter who without any motivation or reasoning posted Python
code in CLPM. If at least he had asked something like "How can I write
something similar in Perl?" or _ANYTHING_ that would have made this even
remotely meaningful for CLPM, then ok.
But he didn't. He just dumped 7 lines of Python code to CLPM and his
only comment was "Here is mine for Python". Yeah, great comment. Why
would anyone in CLPM possibly care about those 7 lines of Phython code?

jue

jue

Jürgen Exner, Sep 26, 2010
17. ### Ertugrul SÃ¶ylemezGuest

Xah Lee <> wrote:

> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by a number. I
> need to collect together the contents of all sublists sharing the same
> label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> [...]
>
> anyone care to give a solution in Python, Perl, javascript, or other
> lang? am guessing the scheme solution can be much improved... perhaps
> using some lib but that seems to show scheme is pretty weak if the lib
> is non-standard.

In Haskell the solution looks like this:

import qualified Data.Map as M
import qualified Data.Set as S
import Data.Map (Map)
import Data.Set (Set)

collect :: (Ord a, Ord k) => [Map k (Set a)] -> Map k (Set a)
collect = M.unionsWith S.union

Greets,
Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/

Ertugrul SÃ¶ylemez, Sep 27, 2010
18. ### Ertugrul SÃ¶ylemezGuest

Ertugrul SÃ¶ylemez <> wrote:

> In Haskell the solution looks like this:
>
> [...]

And before anyone starts to rant, I didn't pay attention to where I'm
X-posting this stuff. Sorry for that. But on the other hand you could
say that I'm giving the Perl people (apparently the only people feeling
the urge to rant anyway) the chance to come up with a more elegant
solution. =)

Greets,
Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/

Ertugrul SÃ¶ylemez, Sep 27, 2010
19. ### MirkoGuest

reduced-tagged (was Re: toy list processing problem: collect similar terms)

On Sep 26, 12:05 am, Xah Lee <> wrote:

I am hijacking the following post and driving it to Cuba (the Monthy
Python fans will know what I refer to). I want to create a `reduce'-
like function that can handle similar problems.

Xah said:
> here's a interesting toy list processing problem.
>
> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing
> the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> stuffed deleted.

Here is my Common Lisp (and I only care about Common Lisp answers)
attempt to create a `reduce'-like function to handle this kind of a
problem (you will see that I am still struggling with the code and the
documentation).

(defun reduce-tagged (function sequence &key
(key-tag #'first)
(key-datum #'rest))
"Use a binary operation, `function' to combine a sequence of tagged
elements. like-tagged elements are `reduce'd according to `function'
and returned in a list ...

`sequence' is a sequence of tagged elements. reduce-m will reduce
like-tagged-elements.

If `key-tag' is supplied it is used to extract the element tag. If
`key-tag' is not supplied, the function `first' is used.

If `key-datum' is supplied, it is used to extract the element datum.
If `key-datum' is not supplied, the function `rest' is used.

"
(let ((hash (make-hash-table)))
(dolist (datum sequence)
(let ((tag (funcall key-tag datum))
(values (funcall key-datum datum)))
(multiple-value-bind (it present)
(gethash tag hash)
(if present
(setf (gethash tag hash)
(apply function (gethash tag hash) values))
(setf (gethash tag hash) values)))))
(let (result)
(dohash (key value hash)
(push (list key value) result))
result)))

Comments, improvements? I am looking for a general purpose function
like reduce that I
can apply in various situations.

Thanks,

Mirko

Mirko, Sep 27, 2010
20. ### MirkoGuest

Re: reduce-tagged (was Re: toy list processing problem: collectsimilar terms)

On Sep 27, 11:18 am, Mirko <> wrote:
> On Sep 26, 12:05 am, Xah Lee <> wrote:
>
> I am hijacking the following post and driving it to Cuba (the Monthy
> Python fans will know what I refer to).  I want to create a `reduce'-
> like function that can handle similar problems.
>
> Xah said:
>
>
>
> > here's a interesting toy list processing problem.

>
> > I have a list of lists, where each sublist is labelled by
> > a number. I need to collect together the contents of all sublists
> > sharing
> > the same label. So if I have the list

>
> > ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> > r) (5 s t))

>
> > where the first element of each sublist is the label, I need to
> > produce:

>
> > output:
> > ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

>
> > stuffed deleted.

>
> Here is my Common Lisp (and I only care about Common Lisp answers)
> attempt to create a `reduce'-like function to handle this kind of a
> problem (you will see that I am still struggling with the code and the
> documentation).
>
> (defun reduce-tagged (function sequence &key
>                  (key-tag #'first)
>                  (key-datum #'rest))
>   "Use a binary operation, `function' to combine a sequence of tagged
> elements.  like-tagged elements are `reduce'd according to `function'
> and returned in a list ...
>
> `sequence' is a sequence of tagged elements.  reduce-m will reduce
> like-tagged-elements.
>
> If `key-tag' is supplied it is used to extract the element tag.  If
> `key-tag' is not supplied, the function `first' is used.
>
> If `key-datum' is supplied, it is used to extract the element datum.
> If `key-datum' is not supplied, the function `rest' is used.
>
> "
>   (let ((hash (make-hash-table)))
>     (dolist (datum sequence)
>       (let ((tag (funcall key-tag datum))
>             (values (funcall key-datum datum)))
>         (multiple-value-bind (it present)
>             (gethash tag hash)
>           (if present
>               (setf (gethash tag hash)
>                     (apply function (gethash tag hash) values))
>               (setf (gethash tag hash) values)))))
>     (let (result)
>       (dohash (key value hash)
>         (push (list key value) result))
>       result)))
>
> Comments, improvements?  I am looking for a general purpose function
> like reduce that I
> can apply in various situations.
>
> Thanks,
>
> Mirko

Correction: the previous code used a non-portable clisp macro
`dohash' (looks nice, doesn't it?)

Here is the version with maphash:

(defun reduce-tagged (function sequence &key
(key-tag #'first)
(key-datum #'rest))
"Use a binary operation, `function' to combine a sequence of tagged
elements. like-tagged elements are `reduce'd according to `function'

`sequence' is a sequence of tagged elements. reduce-m will reduce
like-tagged-elements.

If `key-tag' is supplied it is used to extract the element tag. If
`key-tag' is not supplied, the function `first' is used.

If `key-datum' is supplied, it is used to extract the element datum.
If `key-datum' is not supplied, the function `rest' is used.

"
(let ((hash (make-hash-table)))
(dolist (datum sequence)
(let ((tag (funcall key-tag datum))
(values (funcall key-datum datum)))
(multiple-value-bind (it present)
(gethash tag hash)
(declare (ignore it))
(if present
(setf (gethash tag hash)
(apply function (gethash tag hash) values))
(setf (gethash tag hash) values)))))
(let (result)
(maphash #'(lambda(key value)
(push (list key value) result))
hash)
result)))

Mirko

Mirko, Sep 27, 2010