Re: String similarity

Discussion in 'Python' started by Tim Churches, Oct 10, 2003.

  1. Tim Churches

    Tim Churches Guest

    Luca Montecchiani wrote:
    >
    > Introduction
    > ------------
    > The need to find files with similar names pushed me to write a
    > utility that, unlike the others, is based not on the content of
    > the files but on their names. Initially I started adding this
    > functionality to a C program for Unix called "fdupes", which gave
    > me good performance and good precision.
    > The algorithm I chose for comparing strings was the
    > "Levenshtein Distance".
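
For readers who have not met the Levenshtein distance mentioned in the quoted post, here is a minimal Python sketch of the classic dynamic-programming formulation. It is not the code from the fdupes-based utility. The function names `levenshtein` and `similarity`, and the choice to normalise by the longer string's length, are assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop (less memory)
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: map the distance to a 0.0-1.0 score
    by dividing by the length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

The running time is proportional to len(a) * len(b) per pair, which is why comparing every file name against every other one gets slow quickly.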


    This starts to look like a probabilistic record linkage (matching) problem. See the
    Febrl project at http://datamining.anu.edu.au/projects/linkage.html - amongst
    other things it contains a library of string comparators written in Python.
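
To give a feel for what a string comparator does when applied to file names, here is a small sketch that uses only the Python standard library (`difflib.SequenceMatcher`). It is not Febrl's API. The function name `similar_names` and the 0.8 threshold are assumptions picked for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_names(names, threshold=0.8):
    """Return (name_a, name_b, score) for every pair of names whose
    SequenceMatcher ratio is at least `threshold`, best matches first."""
    pairs = []
    for a, b in combinations(names, 2):
        # ratio() is 2 * matches / (len(a) + len(b)), a value in [0, 1]
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


if __name__ == "__main__":
    files = ["report_2003.txt", "report_2004.txt", "holiday.jpg"]
    for a, b, score in similar_names(files):
        print(f"{a} ~ {b} ({score:.2f})")
```

Like any all-pairs comparison, this does a quadratic number of comparisons, so it suits a few thousand names rather than a whole file system.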

    Tim C
     
    Tim Churches, Oct 10, 2003
    #1

  2. Tim Churches wrote:

    > This starts to look like a probabilistic record linkage (matching) problem. See the
    > Febrl project at http://datamining.anu.edu.au/projects/linkage.html - amongst
    > other things it contains a library of string comparators written in Python.


    Thanks for the link. stringcmp.py contains some cool code that I'll try later.
    Unfortunately it won't gain me any speed, but it gives me another bunch of algorithms to improve the quality of the results ;)

    ciao,
    luca
     
    Luca Montecchiani, Oct 10, 2003
    #2
