Matching filenames with typos

Discussion in 'Perl' started by Peter v.d. Berger, Dec 4, 2006.

  1. Hello,

    I'm working on a script that places the results of soccer games from
    different seasons in a row, to show the history of a fixture.
    I've gathered a lot of scores from different websites on a FreeBSD
    webserver. The scores are stored in one directory per season, named after
    the season, with the names of the two teams as the filename.
    So for example results of the game 'AC Milan - Ajax' are in different files
    for different seasons:

    ../0405/AC Milan - Ajax.txt
    ../0304/AC Milan - Ajax.txt
    ../0203/AC Milan - Ajax.txt
    (team names separated with '-')

    My script creates an HTML page with an overview of the results of all
    seasons.
    The problem is that I gathered the names of the teams for the results from
    different websites, and some websites use 'AC Milan', others just
    'Milan'.
    Some websites use the name 'Ajax', others 'Ajax FC', others 'Ajax
    Amsterdam'.
    Since I gathered results for hundreds of teams, in tens of thousands of
    files, renaming all the files by hand is not an option.
    Is there a way to improve the matching of these files, with the knowledge
    that:

    - two- or three-character strings can be left out (like FC, Utd.)
    - a match should be made when, for example, two out of three names in the
    filename match
    (like: the game 'name1 name2 - name3' matches both 'name1 - name3' and
    'name2 - name3')

    I hope I have made my question clear and that someone can help me.

    Thanks!
    Peter v.d. Berger, Dec 4, 2006
    #1

  2. Jim Gibson (Guest)

    Create an array of unique team names and use a regular expression to
    test if each name occurs in the file name. Generate a new name that
    contains the two team names and either use that name as a key or rename
    the old file to the new name. Example (untested):

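    use strict;
    use warnings;

    my $name  = 'AC Milan - Ajax FC';
    my @teams = qw( Ajax Milan );    # canonical team names

    my $newname = '';
    for my $team ( @teams ) {
        # \Q...\E quotes any regex metacharacters in the team name
        if ( $name =~ /\Q$team\E/i ) {
            $newname .= $team;
        }
    }
    print "New name is '$newname'\n";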

    should produce

    New name is 'AjaxMilan'
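
If keeping a list of canonical team names is not practical, the two rules from the question (ignore very short words, match on shared words) can be tried directly on the filenames. The following is only a sketch under stated assumptions: the helper names team_tokens, same_team, split_fixture and same_fixture are made up for this example, the "three characters or fewer" cutoff is one reading of the first rule, and the filenames are assumed to follow the 'Home - Away.txt' pattern shown above.

```perl
use strict;
use warnings;

# Lowercase a team name and split it into words. Punctuation is stripped,
# then words of three characters or fewer (FC, AC, Utd.) are dropped.
sub team_tokens {
    my ($team) = @_;
    my @words = grep { length }
                map  { ( my $w = $_ ) =~ s/[[:punct:]]//g; $w }
                split ' ', lc $team;
    my @long = grep { length($_) > 3 } @words;
    # Fall back to all words so a name made only of short words keeps tokens.
    return @long ? @long : @words;
}

# Two team names are treated as the same team when they share at least
# one significant word (assumption: this is what "names match" means).
sub same_team {
    my ( $x, $y ) = @_;
    my %seen   = map { $_ => 1 } team_tokens($x);
    my $shared = grep { $seen{$_} } team_tokens($y);
    return $shared > 0 ? 1 : 0;
}

# Split 'dir/Home - Away.txt' into the two team names.
sub split_fixture {
    my ($file) = @_;
    ( my $base = $file ) =~ s{^.*/}{};    # drop the directory part
    $base =~ s/\.txt$//i;                 # drop the extension
    return split /\s+-\s+/, $base, 2;
}

# Two files describe the same fixture when home matches home and
# away matches away.
sub same_fixture {
    my ( $f1, $f2 ) = @_;
    my ( $home1, $away1 ) = split_fixture($f1);
    my ( $home2, $away2 ) = split_fixture($f2);
    return 0 unless defined $away1 && defined $away2;
    return same_team( $home1, $home2 ) && same_team( $away1, $away2 ) ? 1 : 0;
}

my $reference = '../0405/AC Milan - Ajax.txt';
for my $other ( '../0304/Milan - Ajax FC.txt',
                '../0203/AC Milan - Ajax Amsterdam.txt',
                '../0203/AC Milan - Juventus.txt' ) {
    printf "%-40s %s\n", $other,
        same_fixture( $reference, $other ) ? 'match' : 'no match';
}
```

Be aware that the shared-word rule is deliberately loose: with this sketch 'Inter Milan' would also match 'AC Milan', and 'Manchester United' would match 'Manchester City'. A short exception list, or falling back to the canonical-name array above for such clubs, is probably needed before trusting the output.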

    FYI: this newsgroup is defunct. Try comp.lang.perl.misc in the future.

    Jim Gibson, Dec 5, 2006
    #2