Partitioning with Set.divide

Peter Szinek · Apr 29, 2007

Hello all,

I have been playing with partitioning a set recently and I am stuck with
an issue. The whole story is here:

http://www.rubyrailways.com/partitioning-sets-in-ruby/

A quick version for those who would not like to read the article:

Consider this input:

a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34

The goal is to create a partition based on the character in the first
column, i.e.:

<Set: {<Set: {"a 9 0 31", "a 7 83 1 23 3", "a 53 2 3", "a 1 3 22 "}>,
<Set: {"b 1 23 ", "b 1 14 4 15 16 2", "b 8 62 1 23", "b 4 45 4 16 7"}>,
<Set: {"c 5 16 2 34", "c 3 42 2 31 4 6"}>}>

Which is exactly what Set.divide does. However, there is one problem: I
would like to know whether there are duplicate lines. Divide returns the
same result whether the input is this:

c 5 16 2 34
c 5 16 2 34
c 5 16 2 34

or this:

c 5 16 2 34

What I need is a modified divide that also returns the count of each
element in the input (at least for the elements that occur more than
once). Is this doable, or do I have to write some additional code for
this myself?
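To make the problem concrete, here is a minimal sketch (assuming `require 'set'`; the sample lines are taken from the input above). The duplicates disappear when the Set is built, before `divide` is even called, so `divide` on its own has nothing left to count.

```ruby
require 'set'

# A few lines from the sample input; "c 5 16 2 34" appears three times.
lines = [
  "a 53 2 3",
  "b 8 62 1 23",
  "c 5 16 2 34",
  "c 5 16 2 34",
  "c 5 16 2 34"
]

# Set.new keeps a single copy of each distinct line.
set = Set.new(lines)

# With a one-argument block, divide groups elements that share the block's
# value, here the first character of each line.
partition = set.divide { |line| line[0, 1] }

puts lines.size     # 5 input lines
puts set.size       # 3 distinct lines: the repetitions are already gone
puts partition.size # 3 groups: "a", "b" and "c"
```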

Cheers,
Peter

__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby

SonOfLilit · Apr 29, 2007

Well, you could first count how many repetitions there are of each
line and then partition the set of pairs [line, count].

e.g.

require 'set'

h = Hash.new(0) # Hash.new(0) gives missing keys a default value of 0

STDIN.each_line {|l| h[l] += 1}

partitioning = Set.new(h.to_a).divide{|a| a[0][0]}

Aur
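A runnable sketch of the count-then-partition idea, assuming an in-memory array (hypothetical sample lines) in place of STDIN. On setting a default value: `Hash.new(0)` is the usual way to make missing keys read as 0; a block form such as `Hash.new { 0 }` also works here, because the block's return value is used for missing keys. The sketch takes the first character with `[0, 1]` rather than `[0]`, since `str[0]` returned an Integer character code in Ruby 1.8 and returns a one-character String from 1.9 on.

```ruby
require 'set'

# Hypothetical sample input; in the post the lines come from STDIN.
lines = ["c 5 16 2 34", "a 1 3 22", "c 5 16 2 34", "c 5 16 2 34"]

# Hash.new(0): missing keys read as 0, so += 1 works on the first occurrence.
counts = Hash.new(0)
lines.each { |l| counts[l] += 1 }

# Partition the [line, count] pairs by the line's first character.
# l[0, 1] is a one-character String in every Ruby version.
partitioning = Set.new(counts.to_a).divide { |pair| pair[0][0, 1] }

partitioning.each do |group|
  group.each { |line, count| puts "#{line} (x#{count})" }
end
```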

SonOfLilit · Apr 29, 2007

Even better:
require 'set'
h = Hash.new(0) # missing keys default to 0
STDIN.each_line {|l| h[l] += 1}
partitioning = h.to_set.divide{|a| a[0][0]} # changed to .to_set, which is great; on a Hash it yields [key, value] pairs, so no block is needed
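What `Hash#to_set` produces can be checked directly. A minimal sketch, with a hypothetical hash of counts standing in for the STDIN tally (`to_set` is added to Enumerable by the `set` library):

```ruby
require 'set'

counts = { "a 1 3 22" => 1, "c 5 16 2 34" => 3 }

# Enumerable#to_set walks the hash, whose elements are [key, value] pairs,
# so no block is needed to keep both the line and its count.
pairs = counts.to_set
p pairs.include?(["c 5 16 2 34", 3]) # => true

# Group the pairs by the first character of the line (the key).
groups = pairs.divide { |pair| pair[0][0, 1] }
p groups.size # => 2
```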

SonOfLilit · Apr 29, 2007

Sorry for the triple post, but I've read the article and propose a
completely different approach to the problem you present (also posted as a
comment there, with typos):

h = Hash.new(0)
open('input.txt').each_line{|l| h[l[0..0]] += l[2..-1].split(' ').inject(0) {|c,x| c + x.to_i}}
p h.map{|k,v| {k => v}} # to turn {"a" => 80, "b" => 60} into [{"a" => 80}, {"b" => 60}]
# which is a pretty weird data structure IMHO, but I'll play by your rules

Let's look at your code:
#############
input = Set.new open('input.txt').readlines.map{|e| e.chomp}
groups = input.divide {|x,y| x.map[0][0] == y.map[0][0] } # what's map for?
#build the array of hashes
p groups.map.inject([]) {|a,g| # what's map for?
  #build the hashes for the number sequences with same letters
  a << g.map.inject(Hash.new(0)) {|h,v|
    #for every sequence, sum the numbers it contains
    h[v[0..0]] += v[2..-1].split(' ').inject(0) {|c,x|
      c+=x.to_i; c}; h
  }; a
}
#############
divide() seems redundant in your code: you both (1) call divide() and
(2) implement divide yourself, in the same code. You make four passes
over the lines (maybe more in the map() calls; I can't figure those
out), one of which passes twice over each line's content, and all the
data has to stay in memory the whole time.

My code, which isn't debugged but shows my idea, makes a single pass
over the lines and, within that pass, two passes over each line's
contents. It doesn't keep the data in memory.
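The single-pass idea can be made runnable as a sketch. Assumptions: `StringIO` replaces `open('input.txt')` so the block is self-contained, the sample input is the one from the original post, and `Hash.new(0)` supplies the 0 that `+=` needs the first time a letter is seen. Only the per-letter totals are kept; each line is discarded once it has been added up.

```ruby
require 'stringio'

# StringIO stands in for open('input.txt'); any IO with each_line would do.
input = StringIO.new(<<~INPUT)
  a 53 2 3
  b 8 62 1 23
  a 9 0 31
  b 4 45 4 16 7
  b 1 23
  c 3 42 2 31 4 6
  a 1 3 22
  a 7 83 1 23 3
  b 1 14 4 15 16 2
  c 5 16 2 34
INPUT

sums = Hash.new(0) # letter => running total, defaulting to 0
input.each_line do |l|
  # The first character is the group letter; the rest are numbers to add.
  sums[l[0, 1]] += l[2..-1].split.inject(0) { |c, x| c + x.to_i }
end

# The array-of-single-entry-hashes shape from the post above.
per_letter = sums.map { |k, v| { k => v } }

# Prints "a: 241", "b: 246" and "c: 145", one per line.
sums.each { |k, v| puts "#{k}: #{v}" }
```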

BTW I couldn't understand the use of map() without a block. Mind explaining?

Aur Saraf

SonOfLilit · Apr 29, 2007

Ah, sorry. tryruby.hobix.com helped me find out that hash.map() without a
block is identical to hash.to_a
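That observation reflects Ruby 1.8, current when this thread was written. In Ruby 1.9 and later, `map` without a block returns an Enumerator rather than an array, although converting it still gives the same pairs. A small check, using a hypothetical two-entry hash:

```ruby
h = { "a" => 1, "b" => 2 }

# Ruby 1.9 and later: map without a block returns an Enumerator...
e = h.map
puts e.class # Enumerator

# ...which still produces the same [key, value] pairs as Hash#to_a.
p e.to_a == h.to_a # true
```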

Peter Szinek · Apr 29, 2007

SonOfLilit said:
Ah, sorry. tryruby.hobix.com helped me find that hash.map() is
identical to hash.to_a

Well, the problem was that I had just come across Set.divide and wanted to
demonstrate it with an example, so I dug out an older problem of mine
where I thought I could show it off. You have seen the result.

I think this was a typical manifestation of the 'If you have a hammer,
everything looks like a nail' problem.

I guess if my goal had been to solve the task at hand rather than to
put 'divide' into action, I would have come up with a solution similar
to yours...

Thanks for the suggestion!

Cheers,
Peter

Robert Klemme · Apr 30, 2007

Basically you need bags (multisets). Since a quick check does not
reveal any existing implementation, you can roll your own pretty easily
with a Hash whose default value is 0. This is what I'd do (see the
script at the end). Of course, you could save another line by inlining
"key".

Kind regards

robert

8<-----------------

#!/usr/bin/ruby

require 'pp'

parts = Hash.new {|h,k| h[k] = Hash.new(0)}

DATA.each do |line|
  line.chomp!
  key = line[/^\w+/]
  parts[key][line] += 1
end

pp parts

__END__
a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34
c 5 16 2 34
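Peter asked specifically about elements that occur more than once. A minimal sketch, assuming the nested `parts` hash built by the script above (rebuilt here from a few hypothetical sample lines so the block runs on its own), that keeps only the repeated lines in each group:

```ruby
# Same nested shape as the script above: first word => { line => count }.
parts = Hash.new { |h, k| h[k] = Hash.new(0) }
["c 5 16 2 34", "a 1 3 22", "c 5 16 2 34"].each do |line|
  parts[line[/^\w+/]][line] += 1
end

# Keep only the lines that occur more than once, grouped by key.
dups = {}
parts.each do |key, bag|
  repeated = bag.select { |_line, count| count > 1 }
  dups[key] = repeated unless repeated.empty?
end

p dups
```

On Ruby 2.7 or later, `lines.group_by { |l| l[/^\w+/] }.transform_values(&:tally)` builds the same nested counts in one expression; that is a newer alternative, not what was suggested in the thread.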
