Grouping elements of an array

Steve Wilhelm · Mar 18, 2010

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

- Steve W.

Josh Cheek · Mar 19, 2010

[Note: parts of this message were removed to make it a legal post.]

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

- Steve W.

What about
A 0
B 20
C 40

Does that become
[[A,B],[B,C]] or [[A,B,C]] or something else? The congruence class here is
unclear.

Steve Wilhelm · Mar 19, 2010

Josh said:
A 0

Any help on how to do this in the "Ruby Way" would be appreciated.

- Steve W.

Click to expand...

What about
A 0
B 20
C 40

Does that become
[[A,B],[B,C]] or [[A,B,C]] or something else? The congruence class here
is
unclear.

It would be [[A,B,C]].

- Steve W.

Roger Braun · Mar 19, 2010

Hi

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

How about this:

1 arr = [0,15,35,100,205,300]
2 arr2 = [0, 20, 40]
3
4 def group(array)
5 array.map!{|e| [e]}
6 array.inject([]) do |r, e|
7 if r == [] or e[0] - r.last.last > 30 then
8 r.push(e)
9 else
10 r[-1].push(e[0])
11 end
12 r
13 end
14 end
15
16 puts arr.inspect
17 puts group(arr).inspect
18 puts arr2.inspect
19 puts group(arr2).inspect

Josh Cheek · Mar 19, 2010

[Note: parts of this message were removed to make it a legal post.]

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

- Steve W.

Here is my solution, it's conceptually similar to Roger's, though differs in
implementation http://gist.github.com/337195

Josh Cheek · Mar 19, 2010

[Note: parts of this message were removed to make it a legal post.]

Roger said:
Roger said:

How about this:

Click to expand...

You should really know about Enumerable#group_by.

irb(main):001:0> [0,15,35,100,205,300].group_by {|i| i/100 }
=> {0=>[0, 15, 35], 1=>[100], 2=>[205], 3=>[300]}

Your results are correct only because of a happenstance of the data. Add 99
in there, it should group with 100, but it groups with the 0...100
congruence class

ruby-1.9.1-p378 > [0,15,35,99,100,205,300].group_by {|i| i/100 }
=> {0=>[0, 15, 35, 99], 1=>[100], 2=>[205], 3=>[300]}

Because these groups are relative to each other, I think you must do
something like Roger or I did, where you iterate through the list and
compare it to the groups.

Roger Braun · Mar 19, 2010

Roger said:
Roger said:

How about this:

Click to expand...

You should really know about Enumerable#group_by.

irb(main):001:0> [0,15,35,100,205,300].group_by {|i| i/100 }
=> {0=>[0, 15, 35], 1=>[100], 2=>[205], 3=>[300]}

This does not solve the problem.

irb(main):011:0> [0,15,35,99,100,205,300].group_by{|i| i/100}
=> {0=>[0, 15, 35, 99], 1=>[100], 2=>[205], 3=>[300]}

but should be

[[0, 15, 35], [99, 100], [205], [300]]

at least if I understood the problem correctly.

Robert Klemme · Mar 19, 2010

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

Assuming records are ordered already - otherwise you need a sort in between.

require 'pp'

dat = <<DDD.each_line.map {|l|r = l.split;r[1]=r[1].to_i;r}
A 0
B 15
C 35
D 100
E 205
F 215
G 300
DDD

pp dat

gr = dat.inject [] do |agg, rec|
if agg.last && rec[1] - agg.last.last[1] <= 15
agg.last << rec
agg
else
agg << [rec]
end
end

pp gr

Note: I don't claim that this is *the* Ruby way.

Kind regards

robert

Steve Wilhelm · Mar 19, 2010

I have an array of records that...

Thank you all for your solutions.

The time they saved me, I'll spend learning about the techniques you
employed.

- Steve W.

¿½ ¤å¤¯ · Mar 19, 2010

[Note: parts of this message were removed to make it a legal post.]

Hello, I am new to Ruby. Below is my solution:

a=[0, 15, 35, 100, 205, 215, 300]
b=[]
c=[]
d=a[0]
a.each do |i|
if i - d < 30
c << i
else
b << c
c=
end
d =i
end
if c.size > 0
b << c
end

p b

Robert Klemme · Mar 19, 2010

At said:
At said:

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

Click to expand...

Lots of verbose answers. It can be quite short:

a = [['a',0],['b',15],['c',35],['d',100],['e',205],['f',215],['g',300]]

a.group_by {|name,val| val/100} .
collect do |key,list|
list.collect {|name, time| name}
end

# => [["a", "b", "c"], ["d"], ["e", "f"], ["g"]]

Yeah, but as far as I can see that's not a solution to the problem the
OP wanted to solve. He wants to build chains based on the delta and not
based on the fact that they fall in the same interval. Am I missing
something?

Kind regards

robert

Steve Wilhelm · Mar 19, 2010

After reviewing the suggestions, here is my code. records is the result
of a ActiveRecord::find with a member called timestamp containing a UNIX
timestamp.

I introduced an addition of a default value for the find_index call and
included a descending sort of the groups.

edges = records.each_cons(2).group_by {|(record_x, record_y)|
record_y.timestamp - record_x.timestamp < 30 }[false].map{ |edge|
edge[0] }

groups = records.group_by { |record| edges.find_index {|edge|
record.timestamp <= edge.timestamp } || edges.count }.values.sort_by {
|e| -e.count }

Thanks again for everyone's help.

- Steve W.

brabuhr · Mar 20, 2010

I have an array of records that contain timestamps at random intervals.
The records are ordered by timestamp.

I would like to convert the array into an array of arrays; each subarray
would contain "grouped records." Grouping would occur if the timestamp
of the next element in the original array is within thirty seconds of
the current element.

Example (second column is timestamp in seconds starting from zero).

A 0, B 15, C 35, D 100, E 205, F 215, G 300

would result in

[[A, B, C], [D], [E, F], [G]]

Any help on how to do this in the "Ruby Way" would be appreciated.

Here's another way:

require 'pp'
require 'set'

Struct.new("Record", :value, :timestamp)

data = [
Struct::Record.new('A', 0),
Struct::Record.new('B', 15),
Struct::Record.new('C', 35),
Struct::Record.new('D', 100),
Struct::Record.new('E', 205),
Struct::Record.new('F', 215),
Struct::Record.new('G', 300),
]

pp data

pp data.
to_set.
divide{|i, j| (i.timestamp - j.timestamp.abs) < 30}.
map{|s| s.to_a}

$ ruby -v z.rb
ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
[#<struct Struct::Record value="A", timestamp=0>,
#<struct Struct::Record value="B", timestamp=15>,
#<struct Struct::Record value="C", timestamp=35>,
#<struct Struct::Record value="D", timestamp=100>,
#<struct Struct::Record value="E", timestamp=205>,
#<struct Struct::Record value="F", timestamp=215>,
#<struct Struct::Record value="G", timestamp=300>]
[[#<struct Struct::Record value="D", timestamp=100>],
[#<struct Struct::Record value="G", timestamp=300>],
[#<struct Struct::Record value="F", timestamp=215>,
#<struct Struct::Record value="E", timestamp=205>],
[#<struct Struct::Record value="A", timestamp=0>,
#<struct Struct::Record value="C", timestamp=35>,
#<struct Struct::Record value="B", timestamp=15>]]

(Depending on the amount of data converting from array to sets to
arrays may be expensive

Tanaka Akira · Mar 21, 2010

2010/3/19 Steve Wilhelm said:
Example (second column is timestamp in seconds starting from zero).

A 0
B 15
C 35
D 100
E 205
F 215
G 300

would result in

[[A, B, C], [D], [E, F], [G]]

I think this kind of problems which slices consecutive elements in an array
is not well supported in Ruby.

Enumerable#each_slice is not usable because it slices for each fixed number
of elements.

Ruby 1.9.2 has Enumerable#slice_before and it is usable but not so elegant
because it needs to maintain previous element.

% ruby -e '
a = [
["A", 0],
["B", 15],
["C", 35],
["D", 100],
["E", 205],
["F", 215],
["G", 300]
]
prev = nil
p a.slice_before {|s,t|
tmp, prev = prev, t
tmp && (t-tmp) > 30
}.map {|es|
es.map {|s,t| s }
}
'
[["A", "B", "C"], ["D"], ["E", "F"], ["G"]]

We may need Enumerable#slice_between.

% ruby -e '
module Enumerable
def slice_between(&b)
Enumerator.new {|y|
first = true
buf = []
prev = nil
self.each {|elt|
if first
first = false
buf << elt
prev = elt
else
if b.call(prev, elt)
y << buf
buf = [elt]
else
buf << elt
end
prev = elt
end
}
if !buf.empty?
y << buf
end
}
end
end
a = [
["A", 0],
["B", 15],
["C", 35],
["D", 100],
["E", 205],
["F", 215],
["G", 300]
]
p a.slice_between {|(s1,t1),(s2,t2)|
(t2-t1) < 30
}.map {|es|
es.map {|s,t| s }
}
'
[["A"], ["B"], ["C", "D", "E"], ["F", "G"]]

Colin Bartlett · Mar 21, 2010

+1. There seems to be some real applications where that kind of tools are nifty.

Is a really elegant method possible for this sort of thing?
Some time ago I was trying to find duplicate files,
and set up an array of files with elements (paths, sizes, checksums),
and then wrote an Enumerable#each_group method.
The only way I could think of differentiating a yield for "comparison"
from a yield of the wanted array of "similar" objects
was to have the first item of each yield to be a boolean
with true meaning compare, and false meaning it's the wanted array.

I'd be interested in seeing a better Enumerable#each_group method,
if one is possible.

In the meantime, below is what I wrote for Enumerable#each_group,
with its application to Steve Wilhelm's problem.

module Enumerable
# Deeming successive enumerable objects to be "equivalent"
# using a block compare, yields an array of the equivalent items.
def each_group_use_block_compare()
arr = nil ; obj_g = nil
self.each do | obj |
if arr then
# first item in yield is "true" indicating a yield for comparison
if ( yield true, obj_g, obj ) == 0 then
arr << obj
obj_g = obj # group by adjacent objects, not by first in group
next
end
# first item in yield is "false" indicating a yield of the group
yield false, arr
end
obj_g = obj ; arr = [ obj ]
end
if arr then
# first item in yield is "false" indicating a yield of the group
yield false, arr
end
return self
end
end

# for the problem of Steve Wilhelm
def group_for_sw( arr )
arrg = []
arr.each_group_use_block_compare do | q_compare, aa, bb |
if q_compare then
if bb[1] - aa[1] < 30 then 0 else -1 end
else
aa.map! { |cc| cc[0] } # to preserve A, B, C objects, omit this
arrg << aa
end
end
arrg
end

arr = [ [ :A, 0 ], [ :B, 15 ], [ :C, 35 ],
[

, 100 ],
[ :E, 205 ], [ :F, 215 ],
[ :G, 300 ]
]
arrg = group_for_sw( arr )
p arrg #=> [[:A, :B, :C], [

], [:E, :F], [:G]]

Tanaka Akira · Mar 21, 2010

2010/3/21 Colin Bartlett said:
Is a really elegant method possible for this sort of thing?
Some time ago I was trying to find duplicate files,
and set up an array of files with elements (paths, sizes, checksums),
and then wrote an Enumerable#each_group method.
The only way I could think of differentiating a yield for "comparison"
from a yield of the wanted array of "similar" objects
was to have the first item of each yield to be a boolean
with true meaning compare, and false meaning it's the wanted array.

We need two blocks.
One for slice condition and one for sliced array.
But Ruby's method call can take only one block at most.

Enumerable#slice_before solves this problem by returning enumerator.

enum.slice_before {|elt| condition }.each {|ary| ... }

slice_between presented in [ruby-talk:359384] is similar.

enum.slice_between {|elt1, elt2| condition }.each {|ary| ... }

Another possible idea is using Proc argument.
enum.foo(lambda {|elt| condition }) {|ary| ... }

Colin Bartlett · Mar 21, 2010

We need two blocks.
One for slice condition and one for sliced array.
But Ruby's method call can take only one block at most.

Enumerable#slice_before solves this problem by returning enumerator.

enum.slice_before {|elt| condition }.each {|ary| ... }

slice_between presented in [ruby-talk:359384] is similar.

enum.slice_between {|elt1, elt2| condition }.each {|ary| ... }

Another possible idea is using Proc argument.
enum.foo(lambda {|elt| condition }) {|ary| ... }

I must admit that when I made my post I was having difficulty
following your code for
Enumerable#slice_between
and the result you show for it of
[["A"], ["B"], ["C", "D", "E"], ["F", "G"]]
isn't (I think) what the original poster wanted.
But I've just changed your condition of
(t2-t1) < 30
to
! ( (t2-t1) < 30 )
and that gives the wanted result of:
[["A", "B", "C"], ["D"], ["E", "F"], ["G"]]
and it was that that made my realise the meaning of "slice_between".

One issue here is is it going to be usually better to "group by
similarity" or "slice at difference",
which might affect what one calls such a method, and what condition is tested?
For what I was doing it made sense to me to think in terms of grouping.
As posted, Steve Wilhelm's problem was in terms of grouping, but could
be thought of as slicing at "difference"?

I hadn't thought of using a Proc argument for the comparison.
I've just amended my Enumerable#each_group to do that,
and that looks (to my tired eyes at the moment!) cleaner than what I had before.
So thanks for that Proc idea. (And it runs in 1.8 as well as in 1.9,
which isn't the case if you use Enumerable::Enumerator?)

def group_for_sw2( arr )
lambda_compare = lambda { |aa, bb|
if bb[1] - aa[1] < 30 then 0 else -1 end
}
arrg = []
arr.each_group( lambda_compare ) do | aa |
aa.map! { |cc| cc[0] } # to preserve A, B, C objects, omit this
arrg << aa
end
arrg
end

Colin Bartlett

Tanaka Akira · Mar 21, 2010

2010/3/21 Colin Bartlett said:
I must admit that when I made my post I was having difficulty
following your code for
Enumerable#slice_between
and the result you show for it of
[["A"], ["B"], ["C", "D", "E"], ["F", "G"]]
isn't (I think) what the original poster wanted.
But I've just changed your condition of
(t2-t1) < 30
to
! ( (t2-t1) < 30 )
and that gives the wanted result of:
[["A", "B", "C"], ["D"], ["E", "F"], ["G"]]
and it was that that made my realise the meaning of "slice_between".

Oops. You are right.

Grouping array elements while preserving order?	2	Sep 20, 2008
An empty initializer is invalid for an array with unspecified bound	0	Jul 1, 2020
grouping data of array	2	Aug 1, 2011
Program to find the largest integer element of an array.	1	Mar 2, 2022
What's the ruby way of grouping elements of an array?	5	Oct 19, 2007
Grouping elements of an array?	6	Jan 8, 2007
Sort from an array of objects into stacks without using java.util	0	Jul 6, 2021
Grouping/Sorting an Array of objects	5	Jul 21, 2008

Grouping elements of an array

Steve Wilhelm

Josh Cheek

Steve Wilhelm

Roger Braun

Josh Cheek

Josh Cheek

Roger Braun

Robert Klemme

Steve Wilhelm

¿½ ¤å¤¯

Robert Klemme

Steve Wilhelm

brabuhr

Tanaka Akira

Colin Bartlett

Tanaka Akira

Colin Bartlett

Tanaka Akira

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads