Grouping elements of an array

Discussion in 'Ruby' started by Steve Wilhelm, Mar 18, 2010.

  1. I have an array of records that contain timestamps at random intervals.
    The records are ordered by timestamp.

    I would like to convert the array into an array of arrays; each subarray
    would contain "grouped records." Grouping would occur if the timestamp
    of the next element in the original array is within thirty seconds of
    the current element.

    Example (second column is timestamp in seconds starting from zero).

    A 0
    B 15
    C 35
    D 100
    E 205
    F 215
    G 300

    would result in

    [[A, B, C], [D], [E, F], [G]]

    Any help on how to do this in the "Ruby Way" would be appreciated.

    - Steve W.
    --
    Posted via http://www.ruby-forum.com/.
     
    Steve Wilhelm, Mar 18, 2010
    #1
    1. Advertising

  2. Steve Wilhelm

    Josh Cheek Guest

    [Note: parts of this message were removed to make it a legal post.]

    On Thu, Mar 18, 2010 at 4:50 PM, Steve Wilhelm <> wrote:

    > I have an array of records that contain timestamps at random intervals.
    > The records are ordered by timestamp.
    >
    > I would like to convert the array into an array of arrays; each subarray
    > would contain "grouped records." Grouping would occur if the timestamp
    > of the next element in the original array is within thirty seconds of
    > the current element.
    >
    > Example (second column is timestamp in seconds starting from zero).
    >
    > A 0
    > B 15
    > C 35
    > D 100
    > E 205
    > F 215
    > G 300
    >
    > would result in
    >
    > [[A, B, C], [D], [E, F], [G]]
    >
    > Any help on how to do this in the "Ruby Way" would be appreciated.
    >
    > - Steve W.
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >

    What about
    A 0
    B 20
    C 40

    Does that become
    [[A,B],[B,C]] or [[A,B,C]] or something else? The congruence class here is
    unclear.
     
    Josh Cheek, Mar 19, 2010
    #2
    1. Advertising

  3. Josh Cheek wrote:
    > On Thu, Mar 18, 2010 at 4:50 PM, Steve Wilhelm <>
    > wrote:
    >
    >> A 0
    >>
    >> Any help on how to do this in the "Ruby Way" would be appreciated.
    >>
    >> - Steve W.
    >> --
    >> Posted via http://www.ruby-forum.com/.
    >>
    >>

    > What about
    > A 0
    > B 20
    > C 40
    >
    > Does that become
    > [[A,B],[B,C]] or [[A,B,C]] or something else? The congruence class here
    > is
    > unclear.


    It would be [[A,B,C]].

    - Steve W.
    --
    Posted via http://www.ruby-forum.com/.
     
    Steve Wilhelm, Mar 19, 2010
    #3
  4. Steve Wilhelm

    Roger Braun Guest

    Hi

    On Thu, Mar 18, 2010 at 11:50 PM, Steve Wilhelm <> wrote:
    > I have an array of records that contain timestamps at random intervals.
    > The records are ordered by timestamp.
    >
    > I would like to convert the array into an array of arrays; each subarray
    > would contain "grouped records." Grouping would occur if the timestamp
    > of the next element in the original array is within thirty seconds of
    > the current element.
    >
    > Example (second column is timestamp in seconds starting from zero).
    >
    > A 0
    > B 15
    > C 35
    > D 100
    > E 205
    > F 215
    > G 300
    >
    > would result in
    >
    > [[A, B, C], [D], [E, F], [G]]
    >
    > Any help on how to do this in the "Ruby Way" would be appreciated.


    How about this:

    1 arr = [0,15,35,100,205,300]
    2 arr2 = [0, 20, 40]
    3
    4 def group(array)
    5 array.map!{|e| [e]}
    6 array.inject([]) do |r, e|
    7 if r == [] or e[0] - r.last.last > 30 then
    8 r.push(e)
    9 else
    10 r[-1].push(e[0])
    11 end
    12 r
    13 end
    14 end
    15
    16 puts arr.inspect
    17 puts group(arr).inspect
    18 puts arr2.inspect
    19 puts group(arr2).inspect

    --
    Roger Braun
    http://yononaka.de
    -tuebingen.de
     
    Roger Braun, Mar 19, 2010
    #4
  5. Steve Wilhelm

    Josh Cheek Guest

    [Note: parts of this message were removed to make it a legal post.]

    On Thu, Mar 18, 2010 at 4:50 PM, Steve Wilhelm <> wrote:

    > I have an array of records that contain timestamps at random intervals.
    > The records are ordered by timestamp.
    >
    > I would like to convert the array into an array of arrays; each subarray
    > would contain "grouped records." Grouping would occur if the timestamp
    > of the next element in the original array is within thirty seconds of
    > the current element.
    >
    > Example (second column is timestamp in seconds starting from zero).
    >
    > A 0
    > B 15
    > C 35
    > D 100
    > E 205
    > F 215
    > G 300
    >
    > would result in
    >
    > [[A, B, C], [D], [E, F], [G]]
    >
    > Any help on how to do this in the "Ruby Way" would be appreciated.
    >
    > - Steve W.
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >

    Here is my solution, it's conceptually similar to Roger's, though differs in
    implementation http://gist.github.com/337195
     
    Josh Cheek, Mar 19, 2010
    #5
  6. Steve Wilhelm

    Josh Cheek Guest

    [Note: parts of this message were removed to make it a legal post.]

    On Thu, Mar 18, 2010 at 9:29 PM, Urabe Shyouhei <>wrote:

    > Roger Braun wrote:
    > > How about this:

    >
    > You should really know about Enumerable#group_by.
    >
    > irb(main):001:0> [0,15,35,100,205,300].group_by {|i| i/100 }
    > => {0=>[0, 15, 35], 1=>[100], 2=>[205], 3=>[300]}
    >
    >
    >

    Your results are correct only because of a happenstance of the data. Add 99
    in there, it should group with 100, but it groups with the 0...100
    congruence class

    ruby-1.9.1-p378 > [0,15,35,99,100,205,300].group_by {|i| i/100 }
    => {0=>[0, 15, 35, 99], 1=>[100], 2=>[205], 3=>[300]}

    Because these groups are relative to each other, I think you must do
    something like Roger or I did, where you iterate through the list and
    compare it to the groups.
     
    Josh Cheek, Mar 19, 2010
    #6
  7. Steve Wilhelm

    Roger Braun Guest

    On Fri, Mar 19, 2010 at 4:29 AM, Urabe Shyouhei <> wrote:
    > Roger Braun wrote:
    >> How about this:

    >
    > You should really know about Enumerable#group_by.
    >
    > irb(main):001:0> [0,15,35,100,205,300].group_by {|i| i/100 }
    > => {0=>[0, 15, 35], 1=>[100], 2=>[205], 3=>[300]}


    This does not solve the problem.

    irb(main):011:0> [0,15,35,99,100,205,300].group_by{|i| i/100}
    => {0=>[0, 15, 35, 99], 1=>[100], 2=>[205], 3=>[300]}

    but should be

    [[0, 15, 35], [99, 100], [205], [300]]

    at least if I understood the problem correctly.

    --
    Roger Braun
    http://yononaka.de
    -tuebingen.de
     
    Roger Braun, Mar 19, 2010
    #7
  8. On 03/18/2010 11:50 PM, Steve Wilhelm wrote:
    > I have an array of records that contain timestamps at random intervals.
    > The records are ordered by timestamp.
    >
    > I would like to convert the array into an array of arrays; each subarray
    > would contain "grouped records." Grouping would occur if the timestamp
    > of the next element in the original array is within thirty seconds of
    > the current element.
    >
    > Example (second column is timestamp in seconds starting from zero).
    >
    > A 0
    > B 15
    > C 35
    > D 100
    > E 205
    > F 215
    > G 300
    >
    > would result in
    >
    > [[A, B, C], [D], [E, F], [G]]
    >
    > Any help on how to do this in the "Ruby Way" would be appreciated.


    Assuming records are ordered already - otherwise you need a sort in between.

    require 'pp'

    dat = <<DDD.each_line.map {|l|r = l.split;r[1]=r[1].to_i;r}
    A 0
    B 15
    C 35
    D 100
    E 205
    F 215
    G 300
    DDD

    pp dat

    gr = dat.inject [] do |agg, rec|
    if agg.last && rec[1] - agg.last.last[1] <= 15
    agg.last << rec
    agg
    else
    agg << [rec]
    end
    end

    pp gr

    Note: I don't claim that this is *the* Ruby way.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Mar 19, 2010
    #8
  9. > I have an array of records that...

    Thank you all for your solutions.

    The time they saved me, I'll spend learning about the techniques you
    employed.

    - Steve W.

    --
    Posted via http://www.ruby-forum.com/.
     
    Steve Wilhelm, Mar 19, 2010
    #9
  10. [Note: parts of this message were removed to make it a legal post.]


    Hello, I am new to Ruby. Below is my solution:

    a=[0, 15, 35, 100, 205, 215, 300]
    b=[]
    c=[]
    d=a[0]
    a.each do |i|
    if i - d < 30
    c << i
    else
    b << c
    c=
    end
    d =i
    end
    if c.size > 0
    b << c
    end

    p b
     
    ¿½ ¤å¤¯, Mar 19, 2010
    #10
  11. On 03/19/2010 03:33 PM, Glenn Jackman wrote:
    > At 2010-03-18 06:50PM, "Steve Wilhelm" wrote:
    >> I have an array of records that contain timestamps at random intervals.
    >> The records are ordered by timestamp.
    >>
    >> I would like to convert the array into an array of arrays; each subarray
    >> would contain "grouped records." Grouping would occur if the timestamp
    >> of the next element in the original array is within thirty seconds of
    >> the current element.
    >>
    >> Example (second column is timestamp in seconds starting from zero).
    >>
    >> A 0
    >> B 15
    >> C 35
    >> D 100
    >> E 205
    >> F 215
    >> G 300
    >>
    >> would result in
    >>
    >> [[A, B, C], [D], [E, F], [G]]
    >>
    >> Any help on how to do this in the "Ruby Way" would be appreciated.

    >
    > Lots of verbose answers. It can be quite short:
    >
    > a = [['a',0],['b',15],['c',35],['d',100],['e',205],['f',215],['g',300]]
    >
    > a.group_by {|name,val| val/100} .
    > collect do |key,list|
    > list.collect {|name, time| name}
    > end
    >
    > # => [["a", "b", "c"], ["d"], ["e", "f"], ["g"]]
    >


    Yeah, but as far as I can see that's not a solution to the problem the
    OP wanted to solve. He wants to build chains based on the delta and not
    based on the fact that they fall in the same interval. Am I missing
    something?

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Mar 19, 2010
    #11
  12. After reviewing the suggestions, here is my code. records is the result
    of a ActiveRecord::find with a member called timestamp containing a UNIX
    timestamp.

    I introduced an addition of a default value for the find_index call and
    included a descending sort of the groups.


    edges = records.each_cons(2).group_by {|(record_x, record_y)|
    record_y.timestamp - record_x.timestamp < 30 }[false].map{ |edge|
    edge[0] }

    groups = records.group_by { |record| edges.find_index {|edge|
    record.timestamp <= edge.timestamp } || edges.count }.values.sort_by {
    |e| -e.count }


    Thanks again for everyone's help.

    - Steve W.
    --
    Posted via http://www.ruby-forum.com/.
     
    Steve Wilhelm, Mar 19, 2010
    #12
  13. Steve Wilhelm

    Guest

    On Thu, Mar 18, 2010 at 6:50 PM, Steve Wilhelm <> wrote:
    > I have an array of records that contain timestamps at random intervals.
    > The records are ordered by timestamp.
    >
    > I would like to convert the array into an array of arrays; each subarray
    > would contain "grouped records." Grouping would occur if the timestamp
    > of the next element in the original array is within thirty seconds of
    > the current element.
    >
    > Example (second column is timestamp in seconds starting from zero).
    >
    > A 0, B 15, C 35, D 100, E 205, F 215, G 300
    >
    > would result in
    >
    > [[A, B, C], [D], [E, F], [G]]
    >
    > Any help on how to do this in the "Ruby Way" would be appreciated.


    Here's another way:

    require 'pp'
    require 'set'

    Struct.new("Record", :value, :timestamp)

    data = [
    Struct::Record.new('A', 0),
    Struct::Record.new('B', 15),
    Struct::Record.new('C', 35),
    Struct::Record.new('D', 100),
    Struct::Record.new('E', 205),
    Struct::Record.new('F', 215),
    Struct::Record.new('G', 300),
    ]

    pp data

    pp data.
    to_set.
    divide{|i, j| (i.timestamp - j.timestamp.abs) < 30}.
    map{|s| s.to_a}

    $ ruby -v z.rb
    ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
    [#<struct Struct::Record value="A", timestamp=0>,
    #<struct Struct::Record value="B", timestamp=15>,
    #<struct Struct::Record value="C", timestamp=35>,
    #<struct Struct::Record value="D", timestamp=100>,
    #<struct Struct::Record value="E", timestamp=205>,
    #<struct Struct::Record value="F", timestamp=215>,
    #<struct Struct::Record value="G", timestamp=300>]
    [[#<struct Struct::Record value="D", timestamp=100>],
    [#<struct Struct::Record value="G", timestamp=300>],
    [#<struct Struct::Record value="F", timestamp=215>,
    #<struct Struct::Record value="E", timestamp=205>],
    [#<struct Struct::Record value="A", timestamp=0>,
    #<struct Struct::Record value="C", timestamp=35>,
    #<struct Struct::Record value="B", timestamp=15>]]

    (Depending on the amount of data converting from array to sets to
    arrays may be expensive :)
     
    , Mar 20, 2010
    #13
  14. Steve Wilhelm

    Tanaka Akira Guest

    2010/3/19 Steve Wilhelm <>:

    > Example (second column is timestamp in seconds starting from zero).
    >
    > A 0
    > B 15
    > C 35
    > D 100
    > E 205
    > F 215
    > G 300
    >
    > would result in
    >
    > [[A, B, C], [D], [E, F], [G]]


    I think this kind of problems which slices consecutive elements in an array
    is not well supported in Ruby.

    Enumerable#each_slice is not usable because it slices for each fixed number
    of elements.

    Ruby 1.9.2 has Enumerable#slice_before and it is usable but not so elegant
    because it needs to maintain previous element.

    % ruby -e '
    a = [
    ["A", 0],
    ["B", 15],
    ["C", 35],
    ["D", 100],
    ["E", 205],
    ["F", 215],
    ["G", 300]
    ]
    prev = nil
    p a.slice_before {|s,t|
    tmp, prev = prev, t
    tmp && (t-tmp) > 30
    }.map {|es|
    es.map {|s,t| s }
    }
    '
    [["A", "B", "C"], ["D"], ["E", "F"], ["G"]]

    We may need Enumerable#slice_between.

    % ruby -e '
    module Enumerable
    def slice_between(&b)
    Enumerator.new {|y|
    first = true
    buf = []
    prev = nil
    self.each {|elt|
    if first
    first = false
    buf << elt
    prev = elt
    else
    if b.call(prev, elt)
    y << buf
    buf = [elt]
    else
    buf << elt
    end
    prev = elt
    end
    }
    if !buf.empty?
    y << buf
    end
    }
    end
    end
    a = [
    ["A", 0],
    ["B", 15],
    ["C", 35],
    ["D", 100],
    ["E", 205],
    ["F", 215],
    ["G", 300]
    ]
    p a.slice_between {|(s1,t1),(s2,t2)|
    (t2-t1) < 30
    }.map {|es|
    es.map {|s,t| s }
    }
    '
    [["A"], ["B"], ["C", "D", "E"], ["F", "G"]]
    --
    Tanaka Akira
     
    Tanaka Akira, Mar 21, 2010
    #14
  15. On Sun, Mar 21, 2010 at 12:57 AM, Tanaka Akira wrote:
    >> I think this kind of problems which slices consecutive elements in an array
    >> is not well supported in Ruby.


    On Sun, Mar 21, 2010 at 6:42 AM, Urabe Shyouhei wrote:
    > +1. There seems to be some real applications where that kind of tools are nifty.


    On Sun, Mar 21, 2010 at 12:57 AM, Tanaka Akira also wrote:
    >> Enumerable#each_slice is not usable because it slices for each fixed number
    >> of elements.
    >> Ruby 1.9.2 has Enumerable#slice_before and it is usable but not so elegant
    >> because it needs to maintain previous element.

    ...
    >> We may need Enumerable#slice_between.


    Is a really elegant method possible for this sort of thing?
    Some time ago I was trying to find duplicate files,
    and set up an array of files with elements (paths, sizes, checksums),
    and then wrote an Enumerable#each_group method.
    The only way I could think of differentiating a yield for "comparison"
    from a yield of the wanted array of "similar" objects
    was to have the first item of each yield to be a boolean
    with true meaning compare, and false meaning it's the wanted array.

    I'd be interested in seeing a better Enumerable#each_group method,
    if one is possible.

    In the meantime, below is what I wrote for Enumerable#each_group,
    with its application to Steve Wilhelm's problem.


    module Enumerable
    # Deeming successive enumerable objects to be "equivalent"
    # using a block compare, yields an array of the equivalent items.
    def each_group_use_block_compare()
    arr = nil ; obj_g = nil
    self.each do | obj |
    if arr then
    # first item in yield is "true" indicating a yield for comparison
    if ( yield true, obj_g, obj ) == 0 then
    arr << obj
    obj_g = obj # group by adjacent objects, not by first in group
    next
    end
    # first item in yield is "false" indicating a yield of the group
    yield false, arr
    end
    obj_g = obj ; arr = [ obj ]
    end
    if arr then
    # first item in yield is "false" indicating a yield of the group
    yield false, arr
    end
    return self
    end
    end

    # for the problem of Steve Wilhelm
    def group_for_sw( arr )
    arrg = []
    arr.each_group_use_block_compare do | q_compare, aa, bb |
    if q_compare then
    if bb[1] - aa[1] < 30 then 0 else -1 end
    else
    aa.map! { |cc| cc[0] } # to preserve A, B, C objects, omit this
    arrg << aa
    end
    end
    arrg
    end

    arr = [ [ :A, 0 ], [ :B, 15 ], [ :C, 35 ],
    [ :D, 100 ],
    [ :E, 205 ], [ :F, 215 ],
    [ :G, 300 ]
    ]
    arrg = group_for_sw( arr )
    p arrg #=> [[:A, :B, :C], [:D], [:E, :F], [:G]]
     
    Colin Bartlett, Mar 21, 2010
    #15
  16. Steve Wilhelm

    Tanaka Akira Guest

    2010/3/21 Colin Bartlett <>:

    > Is a really elegant method possible for this sort of thing?
    > Some time ago I was trying to find duplicate files,
    > and set up an array of files with elements (paths, sizes, checksums),
    > and then wrote an Enumerable#each_group method.
    > The only way I could think of differentiating a yield for "comparison"
    > from a yield of the wanted array of "similar" objects
    > was to have the first item of each yield to be a boolean
    > with true meaning compare, and false meaning it's the wanted array.


    We need two blocks.
    One for slice condition and one for sliced array.
    But Ruby's method call can take only one block at most.

    Enumerable#slice_before solves this problem by returning enumerator.

    enum.slice_before {|elt| condition }.each {|ary| ... }

    slice_between presented in [ruby-talk:359384] is similar.

    enum.slice_between {|elt1, elt2| condition }.each {|ary| ... }

    Another possible idea is using Proc argument.
    enum.foo(lambda {|elt| condition }) {|ary| ... }
    --
    Tanaka Akira
     
    Tanaka Akira, Mar 21, 2010
    #16
  17. On Sun, Mar 21, 2010 at 7:58 AM, Tanaka Akira <> wrote:
    > We need two blocks.
    > One for slice condition and one for sliced array.
    > But Ruby's method call can take only one block at most.
    >
    > Enumerable#slice_before solves this problem by returning enumerator.
    >
    > enum.slice_before {|elt| condition }.each {|ary| ... }
    >
    > slice_between presented in [ruby-talk:359384] is similar.
    >
    > enum.slice_between {|elt1, elt2| condition }.each {|ary| ... }
    >
    > Another possible idea is using Proc argument.
    > enum.foo(lambda {|elt| condition }) {|ary| ... }
    > --
    > Tanaka Akira


    I must admit that when I made my post I was having difficulty
    following your code for
    Enumerable#slice_between
    and the result you show for it of
    [["A"], ["B"], ["C", "D", "E"], ["F", "G"]]
    isn't (I think) what the original poster wanted.
    But I've just changed your condition of
    (t2-t1) < 30
    to
    ! ( (t2-t1) < 30 )
    and that gives the wanted result of:
    [["A", "B", "C"], ["D"], ["E", "F"], ["G"]]
    and it was that that made my realise the meaning of "slice_between".

    One issue here is is it going to be usually better to "group by
    similarity" or "slice at difference",
    which might affect what one calls such a method, and what condition is tested?
    For what I was doing it made sense to me to think in terms of grouping.
    As posted, Steve Wilhelm's problem was in terms of grouping, but could
    be thought of as slicing at "difference"?

    I hadn't thought of using a Proc argument for the comparison.
    I've just amended my Enumerable#each_group to do that,
    and that looks (to my tired eyes at the moment!) cleaner than what I had before.
    So thanks for that Proc idea. (And it runs in 1.8 as well as in 1.9,
    which isn't the case if you use Enumerable::Enumerator?)

    def group_for_sw2( arr )
    lambda_compare = lambda { |aa, bb|
    if bb[1] - aa[1] < 30 then 0 else -1 end
    }
    arrg = []
    arr.each_group( lambda_compare ) do | aa |
    aa.map! { |cc| cc[0] } # to preserve A, B, C objects, omit this
    arrg << aa
    end
    arrg
    end

    Colin Bartlett
     
    Colin Bartlett, Mar 21, 2010
    #17
  18. Steve Wilhelm

    Tanaka Akira Guest

    2010/3/21 Colin Bartlett <>:

    > I must admit that when I made my post I was having difficulty
    > following your code for
    > Enumerable#slice_between
    > and the result you show for it of
    > [["A"], ["B"], ["C", "D", "E"], ["F", "G"]]
    > isn't (I think) what the original poster wanted.
    > But I've just changed your condition of
    > (t2-t1) < 30
    > to
    > ! ( (t2-t1) < 30 )
    > and that gives the wanted result of:
    > [["A", "B", "C"], ["D"], ["E", "F"], ["G"]]
    > and it was that that made my realise the meaning of "slice_between".


    Oops. You are right.
    --
    Tanaka Akira
     
    Tanaka Akira, Mar 21, 2010
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. kcwolle
    Replies:
    0
    Views:
    389
    kcwolle
    Nov 27, 2003
  2. Dimitre Novatchev

    .Re: Grouping neighboring elements with xsl

    Dimitre Novatchev, Nov 28, 2003, in forum: XML
    Replies:
    0
    Views:
    500
    Dimitre Novatchev
    Nov 28, 2003
  3. Sam Kong

    Grouping elements of an array?

    Sam Kong, Jan 8, 2007, in forum: Ruby
    Replies:
    6
    Views:
    139
    Sam Kong
    Jan 8, 2007
  4. Sam Kong
    Replies:
    5
    Views:
    350
    Peña, Botp
    Oct 22, 2007
  5. Matt Constantine
    Replies:
    2
    Views:
    106
    Robert Klemme
    Sep 22, 2008
Loading...

Share This Page