Group by unique entries of a hash

Discussion in 'Ruby' started by Ne Scripter, Sep 29, 2009.

  1. Ne Scripter

    Ne Scripter Guest

    I have two data sets loaded into a hash to give the following output

    "2efa4ba470", "00000005"
    "2efa4ba470", "00000004"
    "02adecfd5c", "00000002"
    "c0784b5de101", "00000006"
    "68c4bf10539", "00000003"
    "c0784b5de101", "00000001"

    My code to get this is as follows:

    source = "C:\\dummyFile.txt"
    hashMapping = Hash.new
    ocrIDMapping = Hash.new

    IO.foreach(source.to_s) do |data|
      fields = data.split(",")
      hash = fields[0]
      ocrID = fields[1]
      hashMapping[ocrID] = hash
    end

    hashMapping.sort { |a, b| a[1] <=> b[1] }.each { |elem|
      puts "#{elem[1]}, #{elem[0]}"
    }

    I would like to alter my output to group by the first value to give an
    output like this:

    "2efa4ba470", "00000005", "00000004"
    "02adecfd5c", "00000002"
    "c0784b5de101", "00000006", "00000001"
    "68c4bf10539", "00000003"

    As you can see, only unique values are now shown in the first field,
    while the corresponding second-field values are collected into a list,
    grouping the results. I could do something like this in SQL, but I have
    never come across it in Ruby. Does anyone have any pointers?

    Many thanks
    --
    Posted via http://www.ruby-forum.com/.
     
    Ne Scripter, Sep 29, 2009
    #1

  2. Paul Smith

    Paul Smith Guest

    On Tue, Sep 29, 2009 at 12:43 PM, Ne Scripter
    <> wrote:
    > As you can see now only unique values are shown in the first field
    > however a list of the corresponding second field is formed, grouping the
    > results. Something like this I could do in SQL however I have never come
    > across it in Ruby so does anyone have any pointers?


    You want a hash where the key is the element you want to group on, and
    the 'item' is an array of all items with the shared key. A bit like
    (untested):

    hashMapping = {}

    IO.foreach(source.to_s) do |data|
      fields = data.split(",")
      hash = fields[0]
      ocrID = fields[1]

      # If hashMapping has never seen this key before, make an empty array
      hashMapping[ocrID] ||= []

      # Add the new element to the array for this key
      hashMapping[ocrID] << hash
    end
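
A minimal sketch of the grouping idea, under a few assumptions: the sample lines stand in for the contents of the file, the names `lines`, `grouped`, `code` and `ocr_id` are made up for illustration, and the hash is keyed on the first field (the snippet above stores under ocrID; keying on the first field instead appears to be what the requested output needs). It also relies on hashes keeping insertion order, which holds for Ruby 1.9 and later.

```ruby
# Sample lines standing in for the file contents (assumed format).
lines = [
  '"2efa4ba470", "00000005"',
  '"2efa4ba470", "00000004"',
  '"02adecfd5c", "00000002"',
  '"c0784b5de101", "00000006"',
  '"68c4bf10539", "00000003"',
  '"c0784b5de101", "00000001"'
]

grouped = {}

lines.each do |line|
  # Split on the comma and trim surrounding whitespace and newlines.
  code, ocr_id = line.split(",").map { |field| field.strip }

  # Key on the first field so each code collects all of its IDs.
  grouped[code] ||= []
  grouped[code] << ocr_id
end

# One output line per unique code, followed by its IDs.
output = grouped.map { |code, ids| ([code] + ids).join(", ") }
puts output
```

With real input, `lines.each` could be swapped for `IO.foreach(source)` as in the original code.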

    --
    Paul Smith
    http://www.nomadicfun.co.uk

     
    Paul Smith, Sep 29, 2009
    #2

  3. 2009/9/29 Paul Smith <>:
    >   hashMapping[ocrID] ||= [] # If hashMapping has never seen this key before, make an empty array
    >
    >   hashMapping[ocrID] << hash # Add the new element to the array for this key
    >
    > end


    It is slightly more efficient to do it in one step:

    (hashMapping[ocrID] ||= []) << hash

    Even nicer:

    hashMapping = Hash.new {|h,k| h[k] = []}
    ...
    hashMapping[ocrID] << hash
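
To see that the two forms behave the same, here is a small sketch. The `pairs` array and the names `one_step` and `with_block` are hypothetical stand-ins for the parsed file fields, and the hashes are keyed on the first field, which is what the requested output groups by.

```ruby
# Hypothetical in-memory pairs standing in for the parsed file fields.
pairs = [
  ["2efa4ba470", "00000005"],
  ["2efa4ba470", "00000004"],
  ["02adecfd5c", "00000002"]
]

# One-step idiom: create the array on first sight, then append to it.
one_step = {}
pairs.each { |code, id| (one_step[code] ||= []) << id }

# Default block: looking up a missing key stores and returns a fresh array.
with_block = Hash.new { |h, k| h[k] = [] }
pairs.each { |code, id| with_block[code] << id }

p one_step
p one_step == with_block   # Hash#== compares contents, not default blocks
```

The default-block form keeps the loop body free of initialization logic, at the cost of creating an entry whenever a missing key is merely read.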

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Sep 29, 2009
    #3
  4. Paul Smith

    Paul Smith Guest

    On Tue, Sep 29, 2009 at 2:38 PM, Robert Klemme
    <> wrote:
    > hashMapping = Hash.new {|h,k| h[k] = []}


    Is this defining a default element for the hash? I had a vague
    recollection you could do this but completely forgot how.

    I'd also rename the 'hash' variable to 'key' or something, I think
    it's less confusing. Then your hashMapping can either be given the
    name 'hash', because that's what it is, or a name that's actually
    useful for describing what the mystical contents of the hash are.

    --
    Paul Smith
    http://www.nomadicfun.co.uk

     
    Paul Smith, Sep 29, 2009
    #4
  5. On Tue, Sep 29, 2009 at 4:22 PM, Paul Smith <> wrote:
    > On Tue, Sep 29, 2009 at 2:38 PM, Robert Klemme
    > <> wrote:

    >> Even nicer
    >>
    >> hashMapping = Hash.new {|h,k| h[k] = []}

    >
    > Is this defining a default element for the hash? I had a vague
    > recollection you could do this but completely forgot how.


    Using that constructor you pass a block which will be executed every
    time a missing key is looked up. The value of the block is returned as
    the default value for that lookup and, as in this case, the block can
    also have the side effect of modifying the hash. If you don't modify
    the hash inside the block, the hash is not modified, as you can see in
    the first example:

    Two examples:

    irb(main):001:0> h = Hash.new {|h,k| 0}
    => {}
    irb(main):002:0> h[:a]
    => 0
    irb(main):003:0> h[:a] += 1
    => 1
    irb(main):004:0> h
    => {:a=>1}
    irb(main):005:0> h2 = Hash.new {|h,k| h[k] = []}
    => {}
    irb(main):007:0> h2[:a]
    => []
    irb(main):008:0> h2
    => {:a=>[]}

    Jesus.
     
    Jesús Gabriel y Galán, Sep 29, 2009
    #5
  6. 2009/9/29 Paul Smith <>:
    >
    > Is this defining a default element for the hash? I had a vague
    > recollection you could do this but completely forgot how.


    No, this is defining a hook which is executed each time a key is
    requested which is not present. In this case the hook stores a new
    Array in the Hash but you could do other things as well.

    A default value is defined via Hash.new([]), which does not work in
    this case: every missing key would return the same Array object, and
    the key itself would never be stored.
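
A short sketch of the difference between a default value and a default block. The variable names `shared` and `fresh` are only illustrative.

```ruby
shared = Hash.new([])   # one Array object is the default for every missing key
shared[:a] << 1         # mutates that default object; nothing is stored under :a
shared[:b] << 2         # the same object again

p shared                # the hash itself is still empty
p shared[:c]            # any missing key returns the shared, mutated array

fresh = Hash.new { |h, k| h[k] = [] }   # block stores a new Array per key
fresh[:a] << 1
fresh[:b] << 2
p fresh
```

With the default value, all appends land in one array that the hash never records; with the block, each key gets its own array and the hash ends up holding the grouped data.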

    > I'd also rename the 'hash' variable to 'key' or something, I think
    > it's less confusing. Then Your hashMapping can either be given the
    > name 'hash', because that's what it is, or a name that's actually
    > useful for describing what the mystical contents of the hash are.


    Absolutely. I just did not want to cause extra confusion by starting
    to rename everything. :)

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Sep 29, 2009
    #6
  7. Ne Scripter

    Ne Scripter Guest

    Ah yes, that makes perfect sense.

    Thanks

     
    Ne Scripter, Sep 29, 2009
    #7
  8. Ne Scripter

    Ne Scripter Guest

    So, to take this further: we now have our hashMapping list

    "2efa4ba470", "00000005""00000004"
    "c0784b5de101", "00000006""00000003"
    "02adecfd5c", "00000002"
    "c0784b5de101", "00000001"

    Now we split the hash to give only the ID values (column 2) again, doing
    something like this:

    hashMapping.each do |itemDetail|
    newID = itemDetail[1].to_s.delete("\"").strip
    end

    I have another declared array in my code, with many more ID values like
    those shown above

    moreIDs = ["00000001", "00000003", "00000004", "00000005", "00000007",
    "00000008"]

    What I want to do is search for all newIDs that match moreIDs and output
    the matching ID and the corresponding code shown in column one. So the
    sample output would be like this:

    "2efa4ba470", "00000005""00000004"
    "c0784b5de101", "00000003"
    "c0784b5de101", "00000001"

    So this shows the string code for an ID that is present in both of my
    lists. I had thought of something like moreID == newID, but that creates
    a lot of do loops. Also, the thought of array intersections crossed my
    mind, but we have a hash and an array here, so I was unsure of how to
    make this work.

    Any assistance is greatly appreciated.
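
One way the intersection idea might be made to work, sketched under assumptions: the grouped hash maps each code to an array of clean ID strings (the data below follows the first post, not the list in this one), and the names `hash_mapping`, `more_ids` and `matches` are made up for illustration.

```ruby
# Grouped data keyed by code; values assumed to be arrays of plain ID strings.
hash_mapping = {
  "2efa4ba470"   => ["00000005", "00000004"],
  "c0784b5de101" => ["00000006", "00000001"],
  "02adecfd5c"   => ["00000002"],
  "68c4bf10539"  => ["00000003"]
}

more_ids = ["00000001", "00000003", "00000004", "00000005", "00000007", "00000008"]

matches = {}
hash_mapping.each do |code, ids|
  common = ids & more_ids                      # Array#& keeps elements found in both arrays
  matches[code] = common unless common.empty?  # skip codes with no shared IDs
end

matches.each { |code, ids| puts "#{code}: #{ids.join(', ')}" }
```

Because each hash value is itself an array, the intersection can be taken per key, so no nested comparison loops are needed.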

     
    Ne Scripter, Sep 29, 2009
    #8
  9. * Paul Smith <> (2009-09-29) wrote:

    >>    fields = data.split(",")
    >>    hash = fields[0]
    >>    ocrID = fields[1]


    BTW, you could write this as:

    hash, ocrID = *data.split(",")

    or

    hash, ocrID, *ignored = *data.split(",")

    if you want to ignore everything behind a second comma, or

    hash, ocrID = *data.split(",", 2)

    if you want to include a second comma and everything behind it into the
    ocrID string.
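
A small sketch of how these assignment forms treat extra fields. The input string and the variable names are invented for illustration, and the forms are written without the leading splat, which should not change the result when the right-hand side is already an array.

```ruby
data = "2efa4ba470,00000005,extra,more"

a, b = data.split(",")          # extra fields are silently dropped
c, d, *rest = data.split(",")   # extra fields are collected in rest
e, f = data.split(",", 2)       # at most two pieces; f keeps the remainder

p [a, b]
p [c, d, rest]
p [e, f]
```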

    Regards, simon .... which color is the green bill? blue!
     
    Simon Krahnke, Sep 30, 2009
    #9
  10. Ne Scripter

    Ne Scripter Guest

    I am having real problems with my array. As expected, my array's
    contents are like so:

    "00000004 00000005"
    "00000003 00000006"
    "00000001"
    "00000002"

    I now want 6 individual elements instead of 4. I have tried to split the
    double entries up with split, however they remain together. Is there
    something fundamental I am missing? I want an array like so:

    array = ["00000004", "00000005", "00000003", "00000006", "00000001",
    "00000002"]

    Sorry if this is simple, I have spent several hours chopping and
    changing.

    Thanks

    S

    David A. Black wrote:
    > On Thu, 1 Oct 2009, Simon Krahnke wrote:
    >
    >> or
    >>
    >> hash, ocrID, *ignored = *data.split(",")
    >>
    >> if you want to ignore everything behind a second comma, or
    >>
    >> hash, ocrID = *data.split(",", 2)
    >>
    >> if you want to include a second comma and everything behind it into the
    >> ocrID string.

    >
    > I don't think you need the * for any of them. This:
    >
    > hash, ocrID = data.split(',')
    >
    > should be fine for getting the first two values.
    >
    >
    > David


     
    Ne Scripter, Oct 1, 2009
    #10
  11. Josh Cheek

    Josh Cheek Guest

    On Thu, Oct 1, 2009 at 8:15 AM, Ne Scripter <> wrote:

    > I am having real problems with my array. As expected, my array's
    > contents are like so:
    >
    > "00000004 00000005"
    > "00000003 00000006"
    > "00000001"
    > "00000002"
    >
    > I now want 6 individual elements instead of 4. I have tried to split the
    > double entries up with split, however they remain together. Is there
    > something fundamental I am missing? I want an array like so:
    >
    > array = ["00000004", "00000005", "00000003", "00000006", "00000001",
    > "00000002"]
    x = [
    "00000004 00000005" ,
    "00000003 00000006" ,
    "00000001" ,
    "00000002" ,
    ]

    x.map!{|str| str.split(/\s+/) }.flatten!

    x # => ["00000004", "00000005", "00000003", "00000006", "00000001",
    "00000002"]
     
    Josh Cheek, Oct 1, 2009
    #11
  12. Hi --

    On Thu, 1 Oct 2009, Josh Cheek wrote:

    > x = [
    > "00000004 00000005" ,
    > "00000003 00000006" ,
    > "00000001" ,
    > "00000002" ,
    > ]
    >
    > x.map!{|str| str.split(/\s+/) }.flatten!
    >
    > x # => ["00000004", "00000005", "00000003", "00000006", "00000001",
    > "00000002"]


    You can even dispense with the argument to split if it's just
    whitespace.

    Another option, though perhaps a slightly memory-wasting one:

    x.join(' ').split
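
For comparison, a sketch of the variants discussed here side by side, using non-mutating calls. The names `via_map`, `via_flat` and `via_join` are illustrative, and `flat_map` is only available on Rubies that provide it (it is not in the oldest 1.8 releases).

```ruby
x = ["00000004 00000005", "00000003 00000006", "00000001", "00000002"]

via_map  = x.map { |str| str.split }.flatten   # split each entry, then flatten the nested arrays
via_flat = x.flat_map { |str| str.split }      # split and flatten in a single pass
via_join = x.join(" ").split                   # builds one temporary string first

p via_map
p via_map == via_flat && via_flat == via_join
```

`split` with no argument splits on runs of whitespace, which is why the regular expression can be dropped.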


    David

    --
    David A. Black, Director
    Ruby Power and Light, LLC (http://www.rubypal.com)
    Ruby/Rails training, consulting, mentoring, code review
    Book: The Well-Grounded Rubyist (http://www.manning.com/black2)
     
    David A. Black, Oct 1, 2009
    #12
  13. Ne Scripter

    Ne Scripter Guest

    Ah map. Perfect.

    Josh Cheek wrote:

    > x.map!{|str| str.split(/\s+/) }.flatten!
    >
    > x # => ["00000004", "00000005", "00000003", "00000006", "00000001",
    > "00000002"]

     
    Ne Scripter, Oct 1, 2009
    #13
  14. * David A. Black <> (2009-09-30) wrote:

    > On Thu, 1 Oct 2009, Simon Krahnke wrote:
    >
    > I don't think you need the * for any of them.


    I always include it anyway, to be explicit.

    > This:
    >
    > hash, ocrID = data.split(',')
    >
    > should be fine for getting the first two values.


    Right, I just tried it. I thought you might be getting an array into
    ocrID, but you don't.

    Regards, simon .... l
     
    Simon Krahnke, Oct 1, 2009
    #14