n-gram based & edit distance based comparisons

Discussion in 'Java' started by Ezee, Jul 26, 2005.

  1. Ezee

    Ezee Guest

    Hi

    Can anybody please suggest me any useful idea about how to perform
    n-gram based comparisons and edit-distance based comparisons between
    words.(Words are strings and they are in Vectors).

    Thanx in anticipation :)
    Ezee, Jul 26, 2005
    #1
    1. Advertising

  2. Ezee

    Harald Guest

    "Ezee" <> writes:

    > Can anybody please suggest me any useful idea about how to perform
    > n-gram based comparisons and edit-distance based comparisons between
    > words.(Words are strings and they are in Vectors).


    n-gram: use java.util.Set to build the intersection of n-grams.

    Your question is not exactly specific. Are you looking for code
    examples, pointers to algorithm descriptions, class libraries, ...?

    Harald.

    --
    ---------------------+---------------------------------------------
    Harald Kirsch (@home)|
    Java Text Crunching: http://www.ebi.ac.uk/Rebholz-srv/whatizit/software
    Harald, Jul 26, 2005
    #2
    1. Advertising

  3. Ezee

    Ezee Guest

    Harald wrote:

    > Your question is not exactly specific. Are you looking for code
    > examples, pointers to algorithm descriptions, class libraries, ...?


    I am looking for code examples. Actually I have to perform comparison
    b/w two words e.g peace & piece on base of n-grams. i-e if we consider
    all 3-grams of these two words (peace = pea pec pee eac eae ace) and
    (piece = pie pic pie iec ice). So that, even if these two words are not
    exactly similar, but if compared on basis of n-grams, then they are
    similar to some extent, and this degree of similarity is to be
    calcultaed. (may be in %age, like piece is 60% similar to peace). I am
    not sure if I am right in calling this n-gram based comparison...

    Ezee
    --
    You can't run away forever,
    But there's nothing wrong with getting a good head start.
    Ezee, Jul 27, 2005
    #3
  4. Hi,

    Ezee wrote:
    > I am looking for code examples. Actually I have to perform comparison
    > b/w two words e.g peace & piece on base of n-grams. i-e if we consider
    > all 3-grams of these two words (peace = pea pec pee eac eae ace) and
    > (piece = pie pic pie iec ice). ... I am
    > not sure if I am right in calling this n-gram based comparison...


    AFAIK, this is only called NGram, if you take neighboured letters (for N=3):

    piece=pie iec ece
    peace=pea esc ace

    (In this special case, piece and peace are not similar at all.)

    As for your question: Sorry, I don't have source for that, and I think,
    there is no official standard package for that, but it should be easy to
    implement or to google for "java" and "ngram".

    Ciao,
    Ingo
    Ingo R. Homann, Jul 27, 2005
    #4
  5. Ezee

    HK Guest

    Ezee wrote:
    > Harald wrote:
    >
    > > Your question is not exactly specific. Are you looking for code
    > > examples, pointers to algorithm descriptions, class libraries, ...?

    >
    > I am looking for code examples. Actually I have to perform comparison
    > b/w two words e.g peace & piece on base of n-grams. i-e if we consider
    > all 3-grams of these two words (peace = pea pec pee eac eae ace) and
    > (piece = pie pic pie iec ice). So that, even if these two words are not
    > exactly similar, but if compared on basis of n-grams, then they are
    > similar to some extent, and this degree of similarity is to be
    > calcultaed. (may be in %age, like piece is 60% similar to peace). I am
    > not sure if I am right in calling this n-gram based comparison...


    Well, then at least for the n-gram approach my previous
    comment is just right. Foreach word, create its n-grams
    and put them in a Set. Then use set intersection to find
    the common ones. Count them to get a ranking. If you
    want, you give weights to different n-grams depending
    on how often you find them in the unique words of
    a corpus. More frequent means they should be less
    decisive.

    See, for example http://www.cs.ualberta.ca/~lindek/papers/sim.pdf

    Harald.
    HK, Jul 27, 2005
    #5
  6. Ezee

    Ezee Guest

    Thanks for the help. I am gonna try it & if I need some help, perhaps I
    will bother you again :).
    Ezee
    Ezee, Jul 27, 2005
    #6
  7. Ezee

    Guest

    Ezee,

    Search on sourceforge for this. If not found, add to appropriate
    project or create a new project.

    Tim
    , Jul 27, 2005
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. San Diego Guy
    Replies:
    0
    Views:
    534
    San Diego Guy
    Aug 7, 2003
  2. Rogan Dawes

    edit distance algorithms

    Rogan Dawes, Jun 7, 2005, in forum: Java
    Replies:
    7
    Views:
    771
    Robert Maas, see http://tinyurl.com/uh3t
    Jun 11, 2005
  3. rembremading
    Replies:
    5
    Views:
    305
    Tim Prince
    Feb 19, 2009
  4. varun
    Replies:
    0
    Views:
    292
    varun
    Jul 28, 2009
  5. Perry Smith

    Looking for gram.y

    Perry Smith, Jul 31, 2007, in forum: Ruby
    Replies:
    1
    Views:
    82
    Jano Svitok
    Aug 1, 2007
Loading...

Share This Page