I need a more sophisticated string comparison method

Keith

Hello,

I need a Java method that will take two strings and compare them for
similarities. For instance, given the strings "avaritari" and
"avartari", I would like my Java method to return a score of some sort
indicating how similar they are. This is a simple example, but I
would hope the method could handle more complex string
relationships.

Can anyone point me in the right direction?

Keith
 
Roland

The Soundex or Metaphone algorithms may provide what you need. The
Apache Commons Codec project offers an implementation of both.
<http://jakarta.apache.org/commons/codec/>
<http://jakarta.apache.org/commons/c...e/commons/codec/language/package-summary.html>
<http://en.wikipedia.org/wiki/Soundex>
<http://en.wikipedia.org/wiki/Metaphone>
--
Regards,

Roland de Ruiter
 
Joan


http://www.psychcrawler.com/plweb/info/help/fuzadv.html
 
Nicholas Clarke

You could look for a description of the LCS algorithm.
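
For reference, LCS here means the longest common subsequence. Below is a minimal sketch of the usual dynamic-programming version. The class name LcsSketch and the normalised score() helper are assumptions made for illustration; they are not part of any library and the thread does not prescribe a particular scoring formula.

```java
final class LcsSketch {
    // Length of the longest common subsequence of a and b.
    static int length(String a, String b) {
        int[][] len = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Matching characters extend the subsequence by one.
                    len[i][j] = len[i - 1][j - 1] + 1;
                } else {
                    // Otherwise drop one character from either string.
                    len[i][j] = Math.max(len[i - 1][j], len[i][j - 1]);
                }
            }
        }
        return len[a.length()][b.length()];
    }

    // Hypothetical score in [0, 1]: 1.0 means identical strings.
    static double score(String a, String b) {
        int total = a.length() + b.length();
        return total == 0 ? 1.0 : 2.0 * length(a, b) / total;
    }

    public static void main(String[] args) {
        System.out.println(length("avaritari", "avartari"));
        System.out.println(score("avaritari", "avartari"));
    }
}
```

For the two strings in the question, the whole of "avartari" is a subsequence of "avaritari", so the LCS length is 8.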

-Nicholas
 
Roedy Green

One is Knuth's soundex. See http://mindprod.com/jgloss/soundex.

Another would be to measure distance, e.g.:

permutation of adjacent letters counts 2 points
wrong letter counts 2 points
dropped letter counts 2 points
excess letter counts 2 points
1 point if same when you chop standard suffixes from both.

You figure out the least-points way to transform A into B to give the
distance.


--
Canadian Mind Products, Roedy Green.
 
Eric Sosman

Roedy said:

"Knuth's" Soundex? The method was invented by Margaret
Odell and Robert Russell, and was granted a patent twenty
years before Knuth was born (if we're thinking of the same
Knuth).

Also, the O.P. will probably not find Soundex very
helpful. The two example strings he gives (which he clearly
considers different, albeit similar) would both generate
the Soundex code A163. Some other strings that produce A163
are avert, affordable, apart, abortion ... Something tells
me he's looking for a similarity measure with a little
finer resolution.
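
To see the collision concretely, here is a rough sketch of a simplified American Soundex encoder. The class name SoundexSketch is made up for illustration, and this is not the Apache Commons Codec implementation mentioned earlier in the thread; it only handles ASCII letters and is meant to show why the codes collide, not to replace a tested library.

```java
final class SoundexSketch {
    // Digit for each letter A..Z; '0' marks vowels and H, W, Y, which produce no digit.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char raw : word.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c < 'A' || c > 'Z') continue;          // skip anything that is not an ASCII letter
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                          // the first letter is kept as-is
            } else if (code != '0' && code != last) {
                out.append(code);                       // new consonant group
            }
            if (c != 'H' && c != 'W') last = code;      // H and W do not separate equal codes
            if (out.length() == 4) break;               // a code is one letter plus three digits
        }
        while (out.length() > 0 && out.length() < 4) out.append('0');
        return out.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"avaritari", "avartari", "avert", "affordable"}) {
            System.out.println(w + " -> " + encode(w));
        }
    }
}
```

All four words in main() come out as A163, which is why the code alone cannot tell the two example strings apart.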

There have been a couple of posts recently about a
similarity measure called the Levenshtein distance, which
is more along the lines of the count-the-modifications plan
Roedy goes on to suggest. It might be worth a look.
 
googmeister

Edit distance might be what you want. Here distance is measured
in terms of gaps you need to insert and the number of mismatched
characters. You can adjust the penalties for aligning certain
pairs of characters, e.g., b-v has a low penalty, while e-x has
a high one.
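
A rough sketch of that idea, with made-up numbers: the gap penalty of 2, the b-v penalty of 1 and the default mismatch penalty of 3 are all assumptions chosen for illustration, as is the class name WeightedAlignmentSketch. In practice you would tune the penalties to your data.

```java
final class WeightedAlignmentSketch {
    static final int GAP = 2;   // hypothetical penalty for aligning a character with a gap

    // Hypothetical penalty table: identical characters are free,
    // the easily confused pair b/v is cheap, everything else is expensive.
    static int mismatch(char x, char y) {
        if (x == y) return 0;
        boolean bv = (x == 'b' && y == 'v') || (x == 'v' && y == 'b');
        return bv ? 1 : 3;
    }

    // Minimum total penalty over all alignments of a and b.
    static int cost(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) d[i][0] = i * GAP;
        for (int j = 1; j <= b.length(); j++) d[0][j] = j * GAP;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int align = d[i - 1][j - 1] + mismatch(a.charAt(i - 1), b.charAt(j - 1));
                int gapInB = d[i - 1][j] + GAP;
                int gapInA = d[i][j - 1] + GAP;
                d[i][j] = Math.min(align, Math.min(gapInB, gapInA));
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(cost("avaritari", "avartari"));  // one gap
        System.out.println(cost("habana", "havana"));       // one b-v mismatch
    }
}
```

With these penalties the two example strings differ by a single gap, while "habana" and "havana" differ only by the cheap b-v substitution.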
 
Lee Weiner

What you're looking for is the Levenshtein distance. Other posters have made
reference to the "edit distance". What Levenshtein did was create a
mathematical model that can be programmed to determine the edit distance. In
general, the numeric output represents the number of changes (additions,
deletions and substitutions) that have to be made to change one string into
another string. About two years ago, I assigned this as a project to my
second-semester programming students as an exercise in using a two-dimensional
array. Google for "Levenshtein distance". You'll find what you need.
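
A minimal sketch of the two-dimensional-array approach, assuming unit cost for every addition, deletion and substitution. The class name LevenshteinSketch and the similarity() helper that turns the distance into a 0..1 score are illustrative assumptions, not a standard API.

```java
final class LevenshteinSketch {
    // Number of single-character additions, deletions and substitutions
    // needed to turn a into b.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;  // delete every character of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;  // add every character of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1,          // deletion
                                 d[i][j - 1] + 1),         // addition
                        d[i - 1][j - 1] + substitution);   // substitution or match
            }
        }
        return d[a.length()][b.length()];
    }

    // Hypothetical score: 1.0 for identical strings, approaching 0.0 as they diverge.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(distance("avaritari", "avartari"));
        System.out.println(similarity("avaritari", "avartari"));
    }
}
```

For the strings in the question the distance is 1, since dropping the first "i" of "avaritari" gives "avartari".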

Lee Weiner
lee AT leeweiner DOT org
 
Ezee

Apart from edit distance, you can use another method, "n-gram based
comparisons". I recently implemented this string comparison technique for
my semester project.
It's simple: you generate bi- or tri-grams from both strings. For example,
avartari = ava var art rta ....
avarti = ava var art rti ...

Put these tri-grams into Sets and take the intersection of the two. For the
Sets, you can use the Java Set interface. Hope it helps.
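
A small sketch of the set-based approach using java.util.HashSet. The post only says to take the intersection; dividing by the size of the union (the Jaccard coefficient) to get a 0..1 score is an assumption added here, and the class name TrigramSketch is made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class TrigramSketch {
    // All distinct substrings of length n.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Shared n-grams divided by all n-grams seen in either string.
    static double jaccard(String a, String b, int n) {
        Set<String> common = ngrams(a, n);
        common.retainAll(ngrams(b, n));        // set intersection
        Set<String> union = ngrams(a, n);
        union.addAll(ngrams(b, n));            // set union
        return union.isEmpty() ? 1.0 : (double) common.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("avartari", 3));
        System.out.println(jaccard("avaritari", "avartari", 3));
    }
}
```

For "avaritari" and "avartari" the two trigram sets share ava, var, tar and ari out of eight distinct trigrams, so the score is 0.5.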

Ezee
 
