Gathering ngrams with the highest probability

Discussion in 'Ruby' started by Minkoo Seo, Apr 2, 2006.

  1. Minkoo Seo

    Minkoo Seo Guest

    Hi group.

    I'm writing some scientific applications with Ruby, and found a
    frequent problem that I want to solve with Ruby.

    I have tons of NGram instances, where NGram is defined as follows:

    NGram = Struct.new :seq, :prob

    I have a list of instances of NGram like:

    ....
    #<struct NGram seq=["AO", "S"], prob=-139918.174804688>
    #<struct NGram seq=["AY", "T"], prob=-46389.6875>
    #<struct NGram seq=["HH", "IH"], prob=18983.1796875>
    #<struct NGram seq=["OW", "Z", "AH"], prob=-326323.640625>
    #<struct NGram seq=["OW", "Z", "AH"], prob=-35945.25>
    #<struct NGram seq=["T", "AH", "L"], prob=20778.7421875>
    #<struct NGram seq=["HH", "IH", "S"], prob=37747.3046875>
    #<struct NGram seq=["IH", "S", "T"], prob=-17305.6640625>
    #<struct NGram seq=["IH", "S", "T"], prob=-17477.390625>
    #<struct NGram seq=["IH", "S", "T"], prob=34243.34375>
    #<struct NGram seq=["IH", "S", "T"], prob=-2125.265625>
    #<struct NGram seq=["IH", "S", "T"], prob=-9046.7890625>
    #<struct NGram seq=["IH", "S", "T"], prob=-18200.265625>
    #<struct NGram seq=["K", "L", "AH"], prob=-110206.140625>
    #<struct NGram seq=["K", "L", "AH"], prob=-92664.984375>
    ....

    What I want to derive from this data is a list of NGram instances
    in which each seq appears exactly once. For each seq, the instance
    kept must be the one with the highest prob among all instances
    sharing that seq.

    For example, from the ngram list shown above, I want to derive a
    list like the following:

    ....
    #<struct NGram seq=["AO", "S"], prob=-139918.174804688>
    #<struct NGram seq=["AY", "T"], prob=-46389.6875>
    #<struct NGram seq=["HH", "IH"], prob=18983.1796875>
    #<struct NGram seq=["OW", "Z", "AH"], prob=-35945.25>
    #<struct NGram seq=["T", "AH", "L"], prob=20778.7421875>
    #<struct NGram seq=["HH", "IH", "S"], prob=37747.3046875>
    #<struct NGram seq=["IH", "S", "T"], prob=34243.34375>
    #<struct NGram seq=["K", "L", "AH"], prob=-92664.984375>
    ....

    What I've written so far is

    # Sort by seq, then by prob in descending order
    ngrams.sort_by { |ngram|
      # Compare seq

      # Then, compare prob
    }

    result = []

    # Collect unique ngrams with the highest prob.
    ngrams.inject(nil) { |prev, cur|
      if prev.nil? || prev.seq != cur.seq
        result << cur
      end
      cur  # the block's value becomes prev on the next iteration
    }

    return result

    It doesn't look good even to me. Besides the unwritten sort_by
    block, I used a result = [] statement that could probably be
    eliminated.

    Any idea for better code?

    Sincerely,
    Minkoo Seo
     
    Minkoo Seo, Apr 2, 2006
    #1

  2. Robert Feldt

    Robert Feldt Guest

    ngrams.inject({}) do |highest, ngram|
      seq = ngram.seq
      best_now = highest[seq]
      highest[seq] = ngram unless (best_now && best_now.prob > ngram.prob)
      highest
    end.values

    /RF
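
For anyone who wants to try the inject approach above, here is a minimal self-contained sketch run against a subset of the sample data from the original post. The method name best_by_seq and the shortened sample array are assumptions made for illustration; they are not part of the reply.

```ruby
NGram = Struct.new(:seq, :prob)

# A subset of the sample data from the original post.
ngrams = [
  NGram.new(["OW", "Z", "AH"], -326323.640625),
  NGram.new(["OW", "Z", "AH"], -35945.25),
  NGram.new(["IH", "S", "T"], -17305.6640625),
  NGram.new(["IH", "S", "T"], 34243.34375),
  NGram.new(["IH", "S", "T"], -2125.265625),
  NGram.new(["K", "L", "AH"], -110206.140625),
  NGram.new(["K", "L", "AH"], -92664.984375),
]

# best_by_seq is a hypothetical helper name wrapping the inject from the reply.
def best_by_seq(ngrams)
  ngrams.inject({}) do |highest, ngram|
    best_now = highest[ngram.seq]
    # Store this ngram unless a strictly better one is already recorded for its seq.
    highest[ngram.seq] = ngram unless best_now && best_now.prob > ngram.prob
    highest
  end.values
end

best = best_by_seq(ngrams)
best.each { |ngram| p ngram }
```

This relies on Ruby arrays hashing by content, so two separate ["IH", "S", "T"] arrays land on the same hash key. It makes a single pass over the list and needs no sorting.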
     
    Robert Feldt, Apr 2, 2006
    #2

  3. Sylvain Joyeux

    ngrams.inject({}) do |table, ngram|
      if old = table[ngram.seq]
        table[ngram.seq] = ngram if ngram.prob > old.prob
      else
        table[ngram.seq] = ngram
      end
      table
    end.values  # .values returns the NGram list rather than the seq-keyed hash
    --
    Sylvain Joyeux
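
The table-building idea above can also be expressed with group_by and max_by, which are available in newer Ruby versions. This is a different formulation from the one posted, sketched here under the assumption that those methods exist in your Ruby; the sample array is a shortened copy of the data from the original post.

```ruby
NGram = Struct.new(:seq, :prob)

# A subset of the sample data from the original post.
ngrams = [
  NGram.new(["OW", "Z", "AH"], -326323.640625),
  NGram.new(["OW", "Z", "AH"], -35945.25),
  NGram.new(["IH", "S", "T"], -17305.6640625),
  NGram.new(["IH", "S", "T"], 34243.34375),
  NGram.new(["K", "L", "AH"], -110206.140625),
  NGram.new(["K", "L", "AH"], -92664.984375),
]

# Group the NGrams by seq, then keep the highest-prob member of each group.
groups = ngrams.group_by { |ngram| ngram.seq }
best = groups.map { |_seq, group| group.max_by { |ngram| ngram.prob } }

best.each { |ngram| p ngram }
```

It builds an intermediate array per seq, so it uses more memory than the inject version, but it states the intent (one group per seq, best of each group) fairly directly.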
     
    Sylvain Joyeux, Apr 2, 2006
    #3
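
Returning to the sort-based idea from the original post: one possible way to fill in the sort_by block and drop the result = [] accumulator is sketched below. It assumes a Ruby version where Array#uniq accepts a block, and it is only an illustration of that approach, not something posted in the thread. The sample array is a shortened copy of the original data.

```ruby
NGram = Struct.new(:seq, :prob)

# A subset of the sample data from the original post.
ngrams = [
  NGram.new(["OW", "Z", "AH"], -326323.640625),
  NGram.new(["OW", "Z", "AH"], -35945.25),
  NGram.new(["IH", "S", "T"], -17305.6640625),
  NGram.new(["IH", "S", "T"], 34243.34375),
  NGram.new(["K", "L", "AH"], -110206.140625),
  NGram.new(["K", "L", "AH"], -92664.984375),
]

# Sort by seq first, then by prob descending (negating prob reverses its order).
sorted = ngrams.sort_by { |ngram| [ngram.seq, -ngram.prob] }

# Within each run of equal seq the first NGram now has the highest prob,
# and uniq with a block keeps the first element seen for each key.
result = sorted.uniq { |ngram| ngram.seq }

result.each { |ngram| p ngram }
```

Unlike the hash-based replies, this returns the list ordered by seq, at the cost of an O(n log n) sort.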