toy list processing problem: collect similar terms

X

Xah Lee

here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists
sharing
the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

a Mathematica solution is here:
http://xahlee.org/UnixResource_dir/writ/notations_mma.html

R5RS Scheme lisp solution:
http://xahlee.org/UnixResource_dir/writ/Sourav_Mukherjee_sourav.work_gmail.scm
by Sourav Mukherjee

also, a Common Lisp solution can be found here:
http://groups.google.com/group/comp.lang.lisp/browse_frm/thread/5d1ded8824bc750b?

anyone care to give a solution in Python, Perl, javascript, or other
lang? am guessing the scheme solution can be much improved... perhaps
using some lib but that seems to show scheme is pretty weak if the lib
is non-standard.

Xah ∑ xahlee.org ☄
 
G

Gary Herron

here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists
sharing
the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

a Mathematica solution is here:
http://xahlee.org/UnixResource_dir/writ/notations_mma.html

R5RS Scheme lisp solution:
http://xahlee.org/UnixResource_dir/writ/Sourav_Mukherjee_sourav.work_gmail.scm
by Sourav Mukherjee

also, a Common Lisp solution can be found here:
http://groups.google.com/group/comp.lang.lisp/browse_frm/thread/5d1ded8824bc750b?

anyone care to give a solution in Python, Perl, javascript, or other
lang? am guessing the scheme solution can be much improved... perhaps
using some lib but that seems to show scheme is pretty weak if the lib
is non-standard.

Xah ∑ xahlee.org ☄


Python 3: (I have not tried to match the exact format of your output,
but I get the right things is the right order.)

data = ((0,'a','b'), (1,'c','d'), (2,'e','f'), (3,'g','h'),
(1,'i','j'), (2,'k','l'), (4,'m','n'), (2,'o','p'),
(4,'q','r'), (5,'s','t'))

from collections import OrderedDict
r = OrderedDict()
for label,*rest in data:
r.setdefault(label, []).extend(rest)
print(list(r.values()))

produces:

(['a', 'b'], ['c', 'd', 'i', 'j'], ['e', 'f', 'k', 'l', 'o', 'p'], ['g',
'h'], ['m', 'n', 'q', 'r'], ['s', 't'])
 
A

Alexander Burger

In PicoLisp:

(mapcar
'((X) (apply conc (cdr X)))
(group List) )

Cheers,
- Alex
 
P

Paul Rubin

Python solution follows. Removed all crossposts since massive
crossposting is a standard trolling tactic.

from collections import defaultdict

def collect(xss):
d = defaultdict(list)
for xs in xss:
d[xs[0]].extend(xs[1:])
return sorted(v for k,v in d.items())

y = [['0','a','b'], ['1','c','d'], ['2','e','f'], ['3','g','h'],
['1','i','j'], ['2','k','l'], ['4','m','n'], ['2','o','p'],
['4','q','r'], ['5','s','t']]

print collect(y)
 
P

Paul Rubin

Python solution follows (earlier one with an error cancelled). All
crossposting removed since crossposting is a standard trolling tactic.

from collections import defaultdict

def collect(xss):
d = defaultdict(list)
for xs in xss:
d[xs[0]].extend(xs[1:])
return list(v for k,v in sorted(d.items()))

y = [[0,'a','b'], [1,'c','d'], [2,'e','f'], [3,'g','h'], [1,'i','j'],
[2,'k','l'], [4,'m','n'], [2,'o','p'], [4,'q','r'], [5,'s','t']]

print collect(y)
 
L

livibetter

Here is mine for Python:

l = [[0, 'a', 'b'], [1, 'c', 'd'], [2, 'e', 'f'], [3, 'g', 'h'], [1,
'i', 'j'], [2, 'k', 'l'], [4, 'm', 'n'], [2, 'o', 'p'], [4, 'q', 'r'],
[5, 's', 't']]
d = {}
for idx, items in [(e[0], e[1:]) for e in l]: d[idx] = d[idx] + items
if idx in d else items
print d.values()

Output:
[['a', 'b'], ['c', 'd', 'i', 'j'], ['e', 'f', 'k', 'l', 'o', 'p'],
['g', 'h'], ['m', 'n', 'q', 'r'], ['s', 't']]
 
A

Arnaud Delobelle

Here is mine for Python:

l = [[0, 'a', 'b'], [1, 'c', 'd'], [2, 'e', 'f'], [3, 'g', 'h'], [1,
'i', 'j'], [2, 'k', 'l'], [4, 'm', 'n'], [2, 'o', 'p'], [4, 'q', 'r'],
[5, 's', 't']]
d = {}
for idx, items in [(e[0], e[1:]) for e in l]: d[idx] = d[idx] + items
if idx in d else items
print d.values()

Output:
[['a', 'b'], ['c', 'd', 'i', 'j'], ['e', 'f', 'k', 'l', 'o', 'p'],
['g', 'h'], ['m', 'n', 'q', 'r'], ['s', 't']]

from itertools import groupby
from operator import itemgetter

l = [[0, 'a', 'b'], [1, 'c', 'd'], [2, 'e', 'f'], [3, 'g', 'h'], [1,
'i', 'j'], [2, 'k', 'l'], [4, 'm', 'n'], [2, 'o', 'p'], [4, 'q',
'r'],
[5, 's', 't']]

[
[x for g in gs for x in g[1:]]
for _, gs in groupby(sorted(l), itemgetter(0))
]
 
D

Dr.Ruud

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists
sharing the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

The input is a string on STDIN,
and the output is a string on STDOUT?


Use a hash:

perl -MData::Dumper -wle '$Data::Dumper::Sortkeys = 1;
my $t = "((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)"
. " (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))";

push @{ $h{ $1 } }, $2 while $t =~ /(\w+)([^)]*)/g; # gist

print Dumper \%h;
'

or an array:

perl -wle '
my $t = "((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)"
. " (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))";

push @{$a[$1]},$2 while $t =~ /(\w+)\s+([^)]*)/g; # gist.1
print "((".join(") (",map join(" ",@$_),@a )."))"; # gist.2
'


Or if the list is not just a string, but a real data structure in the
script:

perl -wle'
my $t = [ [qw/0 a b/], [qw/1 c d/], [qw/2 e f/], [qw/3 g h/],
[qw/1 i j/], [qw/2 k l/], [qw/4 m n/], [qw/2 o p/],
[qw/4 q r/], [qw/5 s t/] ];

push @{ $a[ $_->[0] ] }, [ @$_[ 1, 2 ] ] for @$t; # AoAoA

printf "((%s))\n", join ") (",
map join( " ",
map join( " ", @$_ ), @$_
), @a;
'

Etc.
 
P

Pascal J. Bourguignon

here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists
sharing
the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

a Mathematica solution is here:
http://xahlee.org/UnixResource_dir/writ/notations_mma.html

R5RS Scheme lisp solution:
http://xahlee.org/UnixResource_dir/writ/Sourav_Mukherjee_sourav.work_gmail.scm
by Sourav Mukherjee

also, a Common Lisp solution can be found here:
http://groups.google.com/group/comp.lang.lisp/browse_frm/thread/5d1ded8824bc750b?

It's too complex. Just write:

(let ((list '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n)
(2 o p) (4 q r) (5 s t))))

(mapcar (lambda (class) (reduce (function append) class :key (function rest)))
(com.informatimago.common-lisp.list:equivalence-classes list :key (function first)))

)

--> ((S T) (Q R M N) (G H) (O P K L E F) (I J C D) (A B))
 
J

John Bokma

Jürgen Exner said:
What the f*** does Python have to do with Perl?

Clueless ****. I unsubscribed from comp.lang.perl.misc to avoid the
retards like you and now you come in via the backdoor. If you have a
problem with Xah report him with Google Groups and with his hosting
provider 1&1 like I do. Dreamhost kicked him out that way.
 
X

Xah Lee

2010-09-26

Python solution follows (earlier one with an error cancelled).  All
crossposting removed since crossposting is a standard trolling tactic.

    from collections import defaultdict

    def collect(xss):
        d = defaultdict(list)
        for xs in xss:
            d[xs[0]].extend(xs[1:])
        return list(v for k,v in sorted(d.items()))

    y = [[0,'a','b'], [1,'c','d'], [2,'e','f'], [3,'g','h'], [1,'i','j'],
         [2,'k','l'], [4,'m','n'], [2,'o','p'], [4,'q','r'], [5,'s','t']]

    print collect(y)

Hi Paul,

thanks for your solution, and thanks to many other who provided
solutions. It'll take some time to go thru them.

btw, i disagree about your remark on crossposting. For those
interested, you can read in the following:

• 〈Cross-posting & Language Factions〉
http://xahlee.org/Netiquette_dir/cross-post.html

if anyone wants to argue with me about this, there's a my blog link at
the bottom of the page where you can leave a comment. Feel free to use
that.

i'll go over the solutions and post if i have anything interesting to
say. ☺ Possbly will select some to show on my site with credit of
course.

Xah ∑ xahlee.org ☄
 
X

Xah Lee

It's Xah. He cross-posts in an attempt to start a language feud.

Please don't feed the troll.

sorry i disagree. And please don't randomly accuse... I posted the
following in reply to Paul Rubin's very valuable post, to
comp.lang.python only. But since you cross-posted your accusation, and
there are about 3 other posts of similar nature accusing me cross-
posted to all groups, so am posting a response to all groups too.

--------------------------------------------------
Paul,

....

btw, i disagree about your remark on crossposting. For those
interested, you can read in the following:

• 〈Cross-posting & Language Factions〉
http://xahlee.org/Netiquette_dir/cross-post.html

if anyone wants to argue with me about this, there's a my blog link at
the bottom of the page where you can leave a comment. Feel free to use
that.

i'll go over the solutions and post if i have anything interesting to
say. ☺ Possbly will select some to show on my site with credit of
course.

Xah ∑ xahlee.org ☄
 
B

Bakul Shah

here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists
sharing
the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t)) ...
anyone care to give a solution in Python, Perl, javascript, or other
lang? am guessing the scheme solution can be much improved... perhaps
using some lib but that seems to show scheme is pretty weak if the lib
is non-standard.

In Q (from kx.com)[1]:

x:((0; "a"; "b");(1; "c"; "d");(2; "e"; "f");(3; "g"; "h");(1; "i"; "j")
(2; "k"; "l");(4; "m"; "n");(2; "o"; "p");(4; "q"; "r");(5; "s"; "t"))
f:{each [,/] each [each [1 _]] x @ value group x[;0]}
f x

outputs

"ab"
"cdij"
"efklop"
"gh"
"mnqr"
"st"

Note that this is actually a pretty general solution in that *all*
the functions used in it operate on a variety of types.
- The label could be anything, not just an integer.
- What follows a label can be a list of anything.
- Everything, except for = (the group function), has to do with
*shaping* the output in the way you want. It is all basically
cut-n-paste or pulling apart your Lego blocks and constructing
a new toy from them! And most languages are really poor at that.
*This* is why proponents of various languages should pay attention
to such problems.
 
J

Jürgen Exner

John Bokma said:
Clueless ****. I unsubscribed from comp.lang.perl.misc to avoid the
retards like you and now you come in via the backdoor. If you have a
problem with Xah report him with Google Groups and with his hosting
provider 1&1 like I do. Dreamhost kicked him out that way.

I have solved my problems with Xah years ago, that's what killfiles are
for. And I have no idea why you are bringing up Xah in your current
rant.

It was livibetter who without any motivation or reasoning posted Python
code in CLPM. If at least he had asked something like "How can I write
something similar in Perl?" or _ANYTHING_ that would have made this even
remotely meaningful for CLPM, then ok.
But he didn't. He just dumped 7 lines of Python code to CLPM and his
only comment was "Here is mine for Python". Yeah, great comment. Why
would anyone in CLPM possibly care about those 7 lines of Phython code?

jue

jue
 
E

Ertugrul Söylemez

Xah Lee said:
here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by a number. I
need to collect together the contents of all sublists sharing the same
label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

[...]

anyone care to give a solution in Python, Perl, javascript, or other
lang? am guessing the scheme solution can be much improved... perhaps
using some lib but that seems to show scheme is pretty weak if the lib
is non-standard.

In Haskell the solution looks like this:

import qualified Data.Map as M
import qualified Data.Set as S
import Data.Map (Map)
import Data.Set (Set)

collect :: (Ord a, Ord k) => [Map k (Set a)] -> Map k (Set a)
collect = M.unionsWith S.union


Greets,
Ertugrul
 
E

Ertugrul Söylemez

Ertugrul Söylemez said:
In Haskell the solution looks like this:

[...]

And before anyone starts to rant, I didn't pay attention to where I'm
X-posting this stuff. Sorry for that. But on the other hand you could
say that I'm giving the Perl people (apparently the only people feeling
the urge to rant anyway) the chance to come up with a more elegant
solution. =)


Greets,
Ertugrul
 
M

Mirko

I am hijacking the following post and driving it to Cuba (the Monthy
Python fans will know what I refer to). I want to create a `reduce'-
like function that can handle similar problems.

Xah said:
here's a interesting toy list processing problem.

I have a list of lists, where each sublist is labelled by
a number. I need to collect together the contents of all sublists
sharing
the same label. So if I have the list

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce:

output:
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

stuffed deleted.

Here is my Common Lisp (and I only care about Common Lisp answers)
attempt to create a `reduce'-like function to handle this kind of a
problem (you will see that I am still struggling with the code and the
documentation).

(defun reduce-tagged (function sequence &key
(key-tag #'first)
(key-datum #'rest))
"Use a binary operation, `function' to combine a sequence of tagged
elements. like-tagged elements are `reduce'd according to `function'
and returned in a list ...

`sequence' is a sequence of tagged elements. reduce-m will reduce
like-tagged-elements.

If `key-tag' is supplied it is used to extract the element tag. If
`key-tag' is not supplied, the function `first' is used.

If `key-datum' is supplied, it is used to extract the element datum.
If `key-datum' is not supplied, the function `rest' is used.

"
(let ((hash (make-hash-table)))
(dolist (datum sequence)
(let ((tag (funcall key-tag datum))
(values (funcall key-datum datum)))
(multiple-value-bind (it present)
(gethash tag hash)
(if present
(setf (gethash tag hash)
(apply function (gethash tag hash) values))
(setf (gethash tag hash) values)))))
(let (result)
(dohash (key value hash)
(push (list key value) result))
result)))

Comments, improvements? I am looking for a general purpose function
like reduce that I
can apply in various situations.

Thanks,

Mirko
 
M

Mirko

I am hijacking the following post and driving it to Cuba (the Monthy
Python fans will know what I refer to).  I want to create a `reduce'-
like function that can handle similar problems.

Xah said:









Here is my Common Lisp (and I only care about Common Lisp answers)
attempt to create a `reduce'-like function to handle this kind of a
problem (you will see that I am still struggling with the code and the
documentation).

(defun reduce-tagged (function sequence &key
                 (key-tag #'first)
                 (key-datum #'rest))
  "Use a binary operation, `function' to combine a sequence of tagged
elements.  like-tagged elements are `reduce'd according to `function'
and returned in a list ...

`sequence' is a sequence of tagged elements.  reduce-m will reduce
like-tagged-elements.

If `key-tag' is supplied it is used to extract the element tag.  If
`key-tag' is not supplied, the function `first' is used.

If `key-datum' is supplied, it is used to extract the element datum.
If `key-datum' is not supplied, the function `rest' is used.

"
  (let ((hash (make-hash-table)))
    (dolist (datum sequence)
      (let ((tag (funcall key-tag datum))
            (values (funcall key-datum datum)))
        (multiple-value-bind (it present)
            (gethash tag hash)
          (if present
              (setf (gethash tag hash)
                    (apply function (gethash tag hash) values))
              (setf (gethash tag hash) values)))))
    (let (result)
      (dohash (key value hash)
        (push (list key value) result))
      result)))

Comments, improvements?  I am looking for a general purpose function
like reduce that I
can apply in various situations.

Thanks,

Mirko

Correction: the previous code used a non-portable clisp macro
`dohash' (looks nice, doesn't it?)

Here is the version with maphash:

(defun reduce-tagged (function sequence &key
(key-tag #'first)
(key-datum #'rest))
"Use a binary operation, `function' to combine a sequence of tagged
elements. like-tagged elements are `reduce'd according to `function'

`sequence' is a sequence of tagged elements. reduce-m will reduce
like-tagged-elements.

If `key-tag' is supplied it is used to extract the element tag. If
`key-tag' is not supplied, the function `first' is used.

If `key-datum' is supplied, it is used to extract the element datum.
If `key-datum' is not supplied, the function `rest' is used.

"
(let ((hash (make-hash-table)))
(dolist (datum sequence)
(let ((tag (funcall key-tag datum))
(values (funcall key-datum datum)))
(multiple-value-bind (it present)
(gethash tag hash)
(declare (ignore it))
(if present
(setf (gethash tag hash)
(apply function (gethash tag hash) values))
(setf (gethash tag hash) values)))))
(let (result)
(maphash #'(lambda(key value)
(push (list key value) result))
hash)
result)))

Mirko
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top