How to do this complicated logic in ruby

Discussion in 'Ruby' started by Valentino Lun, Feb 16, 2009.

  1. Dear all

    I have an array of around 1000 records, and I want to perform some data
    checking and correction on it.

    For instance, the first record of this array is a hash, as follows:
    my_array[0] = {"server"=>"AHN", "hosp"=>"AHN", "loc"=>"PC1",
    "pspec"=>"ANA", "number"=>"1", "pcat"=>"1"}

    server  hosp  loc  pspec  pcat
    AHN     AHN   PC1  ANA    1
    PWH     AHN   PC1  ANA    1
    NDH     AHN   PC1  ANA    2    <= this pcat value needs updating in the array
    TMH     AHN   PC1  ANA    2    <= this pcat value needs updating in the array
    ...
    (around 1000 records)

    When the keys hosp, loc and pspec have the same values, their pcat must be
    identical. So there is a problem in the last two records: their pcat
    should be 1, because a record's pcat is correct when its "server" equals
    its "hosp".

    I cannot figure out the logic for doing this in Ruby (or in any other
    language). Can someone give me some hints on this? Thanks

    Many thanks
    Valentino
    --
    Posted via http://www.ruby-forum.com/.
     
    Valentino Lun, Feb 16, 2009
    #1

  2. Loop the array, changing the values of the hash as you go based on
    some conditional. It's not complex at all. What are you finding
    difficult?

    Blog: http://random8.zenunit.com/
    Learn rails: http://sensei.zenunit.com/

     
    Julian Leviston, Feb 16, 2009
    #2

  3. On Mon, Feb 16, 2009 at 3:30 PM, Valentino Lun <> wrote:
    > When keys hosp, loc, pspec has the same values, their pcat must be
    > identical. So, there is problem in the last two records, the key pcat
    > should be 1, because the pcat is correct if array["server"] equal to
    > array["hosp"].


    Simple way:

    1. Have a 'signature' for each row, composed of the hosp, loc and
    pspec. Could be as simple as

    def signature(row)
      %w(hosp loc pspec).map {|k| row[k]}.join(",")
    end

    2. Collect all the rows with the same signature

    verify = Hash.new {|h,k| h[k] = []}
    ary.each_with_index {|row, i|
      # each_with_index yields the row itself, so pass it straight through
      verify[signature(row)] << [i, row['pcat']]
    }

    3. See if there are any problems

    verify.each_pair {|k, v|
      if v.length > 1
        fix_array_for(v)
      end
    }

    4. Write fix_array_for(v)

    Note that v is an array of [index, pcat] pairs. So for your
    example, it would be
    [[0, "1"], [1, "1"], [2, "2"], [3, "2"]]

    You basically need to iterate over that array, see which pcat is
    right, then iterate over it once more and set all the pcats to the
    right value.

    There are probably more efficient ways to do all this, but this has
    the advantage of being straightforward.
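
    The four steps above can be sketched end to end as follows. This is
    only an illustration: the sample rows come from the table in the
    original post, fix_array_for is given an extra rows argument so it can
    write back, and its body (trust the row whose server equals its hosp)
    is an assumption based on the rule stated in the question.

    ```ruby
    # Sample rows modelled on the table in the original post.
    rows = [
      { "server" => "AHN", "hosp" => "AHN", "loc" => "PC1", "pspec" => "ANA", "pcat" => "1" },
      { "server" => "PWH", "hosp" => "AHN", "loc" => "PC1", "pspec" => "ANA", "pcat" => "1" },
      { "server" => "NDH", "hosp" => "AHN", "loc" => "PC1", "pspec" => "ANA", "pcat" => "2" },
      { "server" => "TMH", "hosp" => "AHN", "loc" => "PC1", "pspec" => "ANA", "pcat" => "2" }
    ]

    # Step 1: a signature built from the columns that must agree.
    def signature(row)
      %w(hosp loc pspec).map { |k| row[k] }.join(",")
    end

    # Step 2: collect [index, pcat] pairs per signature.
    verify = Hash.new { |h, k| h[k] = [] }
    rows.each_with_index { |row, i| verify[signature(row)] << [i, row["pcat"]] }

    # Step 4 (hypothetical body): take the trusted pcat from the row whose
    # server equals its hosp; a group without such a row is left untouched.
    def fix_array_for(rows, pairs)
      trusted = pairs.find { |i, _| rows[i]["server"] == rows[i]["hosp"] }
      return unless trusted
      pairs.each { |i, _| rows[i]["pcat"] = trusted[1] }
    end

    # Step 3: only groups with more than one row can disagree.
    verify.each_pair { |_, pairs| fix_array_for(rows, pairs) if pairs.length > 1 }

    p rows.map { |r| r["pcat"] }   # prints ["1", "1", "1", "1"]
    ```

    What to do with a group that has no trusted row is a policy decision
    the thread does not settle; the sketch simply skips it.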

    martin
     
    Martin DeMello, Feb 16, 2009
    #3
  4. Using Symbols makes a lot of sense here. Try to structure your array like:

    my_array[0] = {:server => "AHN", :hosp => "AHN", :loc => "PC1",
    :pspec => "ANA", :number => "1", :pcat => "1"}

    And use Symbols for all the values that are frequently repeated. Basically,
    a Symbol is created once, and every later use of the same name is a
    reference to that one object and NOT another object. That saves memory.
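
    The object-identity point can be checked with a short sketch. The
    variable names are made up for illustration, and String.new is used
    only to make sure two separate string objects are built:

    ```ruby
    # Two separately built strings with the same content are distinct objects.
    a = String.new("hosp")
    b = String.new("hosp")
    same_string_object = a.equal?(b)          # equal? compares identity, not content

    # Every use of the same symbol literal refers to one object.
    same_symbol_object = :hosp.equal?(:hosp)

    puts "strings equal by content: #{a == b}"
    puts "strings same object:      #{same_string_object}"
    puts "symbols same object:      #{same_symbol_object}"
    ```

    Symbols are a natural fit for hash keys such as :hosp; whether the
    data values themselves should be Symbols is more of a judgment call.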

    Regards,
    Luiz Vitor.

    --
    Regards,

    Luiz Vitor Martinez Cardoso
    cel.: (11) 8187-8662
    blog: rubz.org
    engineering student at maua.br

    "I may never become the best engineer in the world, but be sure that
    I will fight with all my strength to be the best engineer I can be"
     
    Luiz Vitor Martinez Cardoso, Feb 16, 2009
    #4
  5. On Mon, Feb 16, 2009 at 4:43 PM, Luiz Vitor Martinez Cardoso
    <> wrote:
    > Using Symbols makes a lot of sense here. Try to structure your array like:
    >
    > my_array[0] = {:server => "AHN", :hosp => "AHN", :loc => "PC1",
    > :pspec => "ANA", :number => "1", :pcat => "1"}
    >
    > And use Symbols for all the values that are frequently repeated. Basically,
    > a Symbol is created once, and every later use of the same name is a
    > reference to that one object and NOT another object. That saves memory.


    Even better: http://www.codeforpeople.com/lib/ruby/arrayfields/

    martin
     
    Martin DeMello, Feb 16, 2009
    #5
  6. 2009/2/16 Valentino Lun <>:
    > When keys hosp, loc, pspec has the same values, their pcat must be
    > identical. So, there is problem in the last two records, the key pcat
    > should be 1, because the pcat is correct if array["server"] equal to
    > array["hosp"].
    >
    > I cannot figure out the logic to doing this in ruby (even in other
    > language). Can someone give me some hints on this? Thanks


    IMHO this is plainly the wrong data structure for the task. Since you
    identify entries by their hosp, loc, pspec you should *index* the
    whole thing by these columns. Also, since your Hashes seem to be
    uniform I would rather define a particular type for this, e.g.

    Entry = Struct.new :server, :hosp, :loc, :pspec, :pcat

    # the key holds the identifying columns: hosp, loc, pspec
    EntryKey = Struct.new :hosp, :loc, :pspec do
      def self.create(entry)
        new(*members.map {|m| entry[m]})
      end
    end

    index = Hash.new {|h,k| h[k] = []}
    # loop reading input
    entry = ...
    index[EntryKey.create(entry)] << entry

    # now you can process them or do it while reading

    See also Martin's reply, which goes in the same direction, just with a
    different approach.
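
    A runnable sketch of this indexing idea, keyed on hosp, loc and pspec
    as described above. The three sample entries are made up and stand in
    for the "loop reading input" step:

    ```ruby
    Entry    = Struct.new(:server, :hosp, :loc, :pspec, :pcat)
    EntryKey = Struct.new(:hosp, :loc, :pspec) do
      # Build a key from any object that answers [] for the key's members.
      def self.create(entry)
        new(*members.map { |m| entry[m] })
      end
    end

    # Hypothetical input; real code would build these while reading data.
    entries = [
      Entry.new("AHN", "AHN", "PC1", "ANA", "1"),
      Entry.new("NDH", "AHN", "PC1", "ANA", "2"),
      Entry.new("PWH", "PWH", "PC2", "ANA", "3")
    ]

    # Index every entry under its key; equal keys land in the same bucket.
    index = Hash.new { |h, k| h[k] = [] }
    entries.each { |entry| index[EntryKey.create(entry)] << entry }

    index.each do |key, group|
      pcats = group.map(&:pcat).uniq
      puts "#{key.to_a.join('/')}: #{group.size} entries, pcat values #{pcats.inspect}"
    end
    ```

    Any bucket that reports more than one pcat value is a group that needs
    correcting.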

    Cheers

    robert


    --
    remember.guy do |as, often| as.you_can - without end
     
    Robert Klemme, Feb 16, 2009
    #6
  7. On Mon, Feb 16, 2009 at 5:37 PM, Robert Klemme
    <> wrote:
    > See also Martin's reply which goes into the same direction just with a
    > different approach.


    The different approach is mostly due to the fact that I'm
    uncomfortable using objects with mutable fields as hash keys. I prefer
    to explicitly map them to a string, and then use that string as a hash
    key.

    martin
     
    Martin DeMello, Feb 16, 2009
    #7
  8. 2009/2/16 Martin DeMello <>:
    > The different approach is mostly due to the fact that I'm
    > uncomfortable using objects with mutable fields as hash keys. I prefer
    > to explicitly map them to a string, and then use that string as a hash
    > key.


    Hehe, that would be something *I* would be uncomfortable with. :) It
    is interesting that you advertise this approach as the more robust one,
    because IMHO it is more on the hackish side of things: instead of
    using a structured type you lump everything into a single
    unstructured object. This can break awfully (e.g. in your example, if
    fields contain "," in different places).

    The nice thing about Struct is that it defines #==, #eql? and #hash
    properly, making generated classes suitable as Hash keys. If you are
    afraid of mutations you can always freeze keys.
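
    Both points can be demonstrated with a short sketch. The field values
    containing commas are contrived on purpose, and the three-field
    EntryKey here is an assumption keyed on hosp, loc and pspec:

    ```ruby
    # Joining with "," loses the field boundaries: two different records
    # produce the same string key.
    key_a = ["AH,N", "PC1", "ANA"].join(",")
    key_b = ["AH", "N,PC1", "ANA"].join(",")
    puts key_a == key_b          # the string keys collide

    EntryKey = Struct.new(:hosp, :loc, :pspec)

    # Struct keys compare field by field, so the same two records stay distinct.
    struct_a = EntryKey.new("AH,N", "PC1", "ANA")
    struct_b = EntryKey.new("AH", "N,PC1", "ANA")
    puts struct_a == struct_b

    # Equal field values give eql? keys with equal hashes, so Hash lookup works.
    index = { EntryKey.new("AHN", "PC1", "ANA").freeze => "1" }
    puts index[EntryKey.new("AHN", "PC1", "ANA")]

    # A frozen key rejects later mutation (FrozenError is a RuntimeError subclass).
    begin
      index.keys.first.hosp = "PWH"
    rescue RuntimeError => e
      puts "mutation rejected: #{e.class}"
    end
    ```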

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
     
    Robert Klemme, Feb 16, 2009
    #8
  9. 2009/2/16 Valentino Lun <>:
    > I cannot figure out the logic to doing this in ruby (even in other
    > language). Can someone give me some hints on this? Thanks


    While I agree on what the others have said, that you should create a
    better data structure, here's a way to do what you wanted with your
    array of hashes. But look at the other posts. It's easy to build good
    data structures in Ruby.

    # create a key for the given record to be used in the pcat hash
    def pcat_key(record)
      [record["hosp"], record["loc"], record["pspec"]]
    end

    # build hash with valid pcat values
    pcat = {}
    my_array.each do |record|
      next unless record["server"] == record["hosp"]
      pcat[pcat_key(record)] = record["pcat"]
    end

    # look for invalid records
    my_array.each do |record|
      next if record["pcat"] == pcat[pcat_key(record)]
      # do something with the invalid record
      p record
    end
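
    As one possible way to fill in the "do something" step, the sketch
    below overwrites the wrong pcat in place. It keys on "pspec", uses
    made-up sample rows, and assumes that a group with no reference row
    (no record where server equals hosp) should be left alone rather than
    reported:

    ```ruby
    my_array = [
      { "server" => "AHN", "hosp" => "AHN", "loc" => "PC1", "pspec" => "ANA", "pcat" => "1" },
      { "server" => "PWH", "hosp" => "AHN", "loc" => "PC1", "pspec" => "ANA", "pcat" => "1" },
      { "server" => "NDH", "hosp" => "AHN", "loc" => "PC1", "pspec" => "ANA", "pcat" => "2" },
      { "server" => "TMH", "hosp" => "AHN", "loc" => "PC1", "pspec" => "ANA", "pcat" => "2" },
      # This group has no row where server == hosp, so nothing is trusted.
      { "server" => "NDH", "hosp" => "TMH", "loc" => "PC2", "pspec" => "ANA", "pcat" => "3" }
    ]

    # Array keys keep the three fields separate; Array defines eql? and hash.
    def pcat_key(record)
      [record["hosp"], record["loc"], record["pspec"]]
    end

    # First pass: remember the pcat of every trusted record.
    pcat = {}
    my_array.each do |record|
      pcat[pcat_key(record)] = record["pcat"] if record["server"] == record["hosp"]
    end

    # Second pass: correct records that disagree with the trusted value.
    my_array.each do |record|
      valid = pcat[pcat_key(record)]
      next if valid.nil? || record["pcat"] == valid
      record["pcat"] = valid
    end

    p my_array.map { |r| r["pcat"] }   # prints ["1", "1", "1", "1", "3"]
    ```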

    Regards,
    Pit
     
    Pit Capitain, Feb 16, 2009
    #9
  10. Dear all

    Thank you for your help. In the end I spent about 5 hours (>_<) figuring
    out my solution, and it works. But it takes a long time to execute.

    Below is my code to share with you all. I would welcome your expert
    advice on whether any optimization can be done. Thank you.


    # data collection: about 5000 records for each variable (lis, gcrs)
    lis = ActiveRecord::Base.connection.execute(
      "select * from lis_requests order by hosp, spec, loc, pspec")
    gcrs = ActiveRecord::Base.connection.execute(
      "select * from gcrs_requests order by hosp, spec, loc, pspec")

    def find_correct_pcat(arr)

      server_ref = {"AHN" => "AHN", "TPH" => "AHN",
                    "NDH" => "NDH", "BBH" => "NDH", "CHS" => "NDH",
                    "PWH" => "PWH", "SH" => "PWH"}

      # prefer the pcat of a record that comes from the hospital's own server
      arr.each do |x|
        return x["pcat"] if x["server"] == server_ref[x["hosp"]]
      end

      # if there is none, take the pcat of the record with the largest "number"
      arr.sort_by {|y| y["number"].to_i}.last["pcat"]

    end

    # The result will be put in this hash
    result = {}

    # loop over every index key and compute the result
    lis.collect {|x| [x["hosp"], x["spec"], x["loc"], x["pspec"]]}.uniq.each do |index_key|

      lis_record = lis.select {|x| x["hosp"] == index_key[0] and
        x["spec"] == index_key[1] and x["loc"] == index_key[2] and
        x["pspec"] == index_key[3]}
      gcrs_record = gcrs.select {|x| x["hosp"] == index_key[0] and
        x["spec"] == index_key[1] and x["loc"] == index_key[2] and
        x["pspec"] == index_key[3]}
      lis_req_count = lis_record.inject(0) {|sum,n| sum + n["number"].to_i }
      gcrs_req_count = gcrs_record.inject(0) {|sum,n| sum + n["number"].to_i }

      if lis_record.collect {|x| x["pcat"]}.uniq.size == 1
        pcat = lis_record.first["pcat"]
      else
        pcat = find_correct_pcat(lis_record)
      end

      result[index_key] = [pcat, gcrs_req_count, lis_req_count]

    end
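
    One likely reason for the slow run is that both select calls rescan
    the full lis and gcrs lists for every index key. A possible
    alternative, sketched here with made-up sample rows in place of the
    database results, is to group each list once with group_by and then
    look the groups up by key. KEY_COLUMNS, SERVER_REF and key_of are
    names introduced for this sketch only:

    ```ruby
    KEY_COLUMNS = %w(hosp spec loc pspec)
    SERVER_REF  = { "AHN" => "AHN", "TPH" => "AHN",
                    "NDH" => "NDH", "BBH" => "NDH", "CHS" => "NDH",
                    "PWH" => "PWH", "SH"  => "PWH" }

    # Hypothetical rows standing in for the two query results.
    lis = [
      { "server" => "AHN", "hosp" => "AHN", "spec" => "S1", "loc" => "PC1",
        "pspec" => "ANA", "number" => "3", "pcat" => "1" },
      { "server" => "NDH", "hosp" => "AHN", "spec" => "S1", "loc" => "PC1",
        "pspec" => "ANA", "number" => "5", "pcat" => "2" }
    ]
    gcrs = [
      { "hosp" => "AHN", "spec" => "S1", "loc" => "PC1", "pspec" => "ANA", "number" => "4" }
    ]

    def find_correct_pcat(records)
      trusted = records.find { |r| r["server"] == SERVER_REF[r["hosp"]] }
      return trusted["pcat"] if trusted
      # Otherwise fall back to the record with the largest "number".
      records.sort_by { |r| r["number"].to_i }.last["pcat"]
    end

    # Group each list once instead of scanning it again for every key.
    key_of      = lambda { |row| KEY_COLUMNS.map { |c| row[c] } }
    lis_by_key  = lis.group_by(&key_of)
    gcrs_by_key = gcrs.group_by(&key_of)

    result = {}
    lis_by_key.each do |index_key, lis_records|
      gcrs_records = gcrs_by_key.fetch(index_key, [])
      pcats = lis_records.map { |r| r["pcat"] }.uniq
      pcat  = pcats.size == 1 ? pcats.first : find_correct_pcat(lis_records)
      result[index_key] = [pcat,
                           gcrs_records.inject(0) { |sum, r| sum + r["number"].to_i },
                           lis_records.inject(0) { |sum, r| sum + r["number"].to_i }]
    end

    p result
    ```

    Each list is walked once to build the groups, so the work grows with
    the number of rows rather than with rows times keys. Whether this
    removes the whole delay depends on where the time is actually spent,
    which only a measurement on the real data can show.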

    Thanks again
    Valentino
     
    Valentino Lun, Feb 17, 2009
    #10