Re: Arrange files according to a text file

Discussion in 'Python' started by Emile van Sebille, Aug 27, 2011.

  1. On 8/27/2011 10:03 AM said...
    > Hello,
    >
    > What would be the best way to accomplish this task?


    I'd do something like:


    usernames = """Adler, Jack
    Smith, John
    Smith, Sally
    Stone, Mark""".split('\n')

    filenames = """Smith, John - 02-15-75 - business files.doc
    Random Data - Adler Jack - expenses.xls
    More Data Mark Stone files list.doc""".split('\n')

    from difflib import SequenceMatcher as SM


    def ignore(x):
        # spaces, commas and periods are treated as junk when matching
        return x in ' ,.'


    for filename in filenames:
        # score every known name against this file name and keep the best
        ratios = [SM(ignore, filename, username).ratio()
                  for username in usernames]
        best = max(ratios)
        owner = usernames[ratios.index(best)]
        print(filename + " : " + owner)
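The score being compared here is SequenceMatcher.ratio(), which the difflib documentation defines as 2.0*M/T, where M is the number of matched characters and T is the combined length of both strings. A small sketch that recomputes that figure from get_matching_blocks() for one of the sample pairs (explain_ratio is just an illustrative helper name):

```python
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk test as the loop above: spaces, commas and periods.
    return ch in ' ,.'


def explain_ratio(filename, name):
    """Return (M, T, ratio) for one file name / list name pair."""
    sm = SequenceMatcher(ignore, filename, name)
    # get_matching_blocks() ends with a dummy block of size 0,
    # so summing every block's size gives M directly.
    matched = sum(block.size for block in sm.get_matching_blocks())
    total = len(filename) + len(name)
    return matched, total, sm.ratio()


m, t, r = explain_ratio("Smith, John - 02-15-75 - business files.doc",
                        "Smith, John")
print(m, t, round(r, 3))
```

This prints 11 54 0.407: every character of "Smith, John" is matched, and the long remainder of the file name only adds to T.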


    Emile



    > I have many files in separate directories, and each file name
    > contains a person's name, but never in the same spot.
    > I need to find that name, which is listed in a large
    > text file in the following format: last name, comma,
    > first name. Last names may be duplicated.
    >
    > Adler, Jack
    > Smith, John
    > Smith, Sally
    > Stone, Mark
    > etc.
    >
    >
    > The file names don't necessarily follow any standard
    > format.
    >
    > Smith, John - 02-15-75 - business files.doc
    > Random Data - Adler Jack - expenses.xls
    > More Data Mark Stone files list.doc
    > etc
    >
    > I need some way to pull the name from the file name, find it in the
    > text list, and then create a directory based on the name in the list
    > (e.g. "Smith, John") and move all files named with that client's name
    > into that directory.
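The loop above only prints each file's best match; the last step of the question (one folder per name, files moved into it) could be sketched roughly as below. This assumes Python 3, a flat source directory, and that the highest ratio always identifies the right owner; sort_into_folders and best_owner are hypothetical helper names, and nothing here guards against a file of the same name already existing in the target folder.

```python
import os
import shutil
from difflib import SequenceMatcher


def ignore(ch):
    # Spaces, commas and periods are treated as junk, as in the thread.
    return ch in ' ,.'


def best_owner(filename, names):
    # max() with a key keeps the first name with the highest ratio,
    # the same tie-break as usernames[ratios.index(best)].
    return max(names, key=lambda n: SequenceMatcher(ignore, filename, n).ratio())


def sort_into_folders(src_dir, names, dest_root):
    """Move every file in src_dir into dest_root/<best-matching name>/.

    Returns a dict mapping each moved file name to the chosen name.
    """
    moved = {}
    for filename in os.listdir(src_dir):
        src = os.path.join(src_dir, filename)
        if not os.path.isfile(src):
            continue  # leave subdirectories alone
        owner = best_owner(filename, names)
        target_dir = os.path.join(dest_root, owner)  # e.g. .../Smith, John
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(src, os.path.join(target_dir, filename))
        moved[filename] = owner
    return moved
```

Since a wrong match moves a file into the wrong folder, it may be worth running this on a copy of the directory and checking the returned mapping first.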
    Emile van Sebille, Aug 27, 2011
    #1

  2. On 8/27/2011 1:15 PM said...
    >
    > Hello Emile,
    >
    > Thank you for the code below. I have not encountered SequenceMatcher
    > before and will have to take a closer look at it.
    >
    > My question: would it work for a text file list of about 25k
    > names and a directory with, say, 100 files inside?


    Sure.
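With 25k names and 100 files, the loop makes roughly 2.5 million full ratio() calls. A sketch of one way to cut that down, assuming the same ignore() junk test: the difflib documentation describes real_quick_ratio() and quick_ratio() as cheap upper bounds on ratio(), so a name whose bound cannot beat the best score found so far can be skipped. It returns the same name as the max()/index() version (the first name with the highest ratio); best_owner is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk test as in the thread.
    return ch in ' ,.'


def best_owner(filename, names):
    """First name with the highest ratio(), skipping hopeless candidates.

    real_quick_ratio() and quick_ratio() are upper bounds on ratio(),
    so if either bound is no better than the best score so far, the
    full ratio() call cannot win and is skipped.
    """
    best_name, best_ratio = None, -1.0
    for name in names:
        sm = SequenceMatcher(ignore, filename, name)
        if sm.real_quick_ratio() <= best_ratio:
            continue
        if sm.quick_ratio() <= best_ratio:
            continue
        score = sm.ratio()
        if score > best_ratio:  # strict '>' keeps the first of any ties
            best_name, best_ratio = name, score
    return best_name
```

How much this saves depends on the data: the pruning only kicks in once a reasonably good match has been found early in the list.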

    Emile


    >
    > Thank you once again.
    >
    Emile van Sebille, Aug 27, 2011
    #2

  3. On 28/08/2011 00:18, wrote:
    > Thank you so much. The code worked perfectly.
    >
    > This is what I tried using Emile's code. The only time it picked the
    > wrong name from the list was when the file was named like this:
    >
    > Data Mark Stone.doc
    >
    > How can I fix this? I hope I am not asking too much.
    >

    Have you tried the alternative word orders, "Mark Stone" as well as
    "Stone, Mark", picking whichever name has the best ratio for either?
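A minimal sketch of that idea, assuming every list entry has the "Last, First" form: score each name in both its listed order and as "First Last", keep the better of the two, and still report the name as it appears in the list. name_variants and best_owner are hypothetical helper names.

```python
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk test as in the thread.
    return ch in ' ,.'


def name_variants(name):
    """'Stone, Mark' -> ['Stone, Mark', 'Mark Stone']."""
    variants = [name]
    last, sep, first = name.partition(',')
    if sep:  # only entries that actually contain a comma get a second order
        variants.append(first.strip() + ' ' + last.strip())
    return variants


def best_owner(filename, names):
    def score(name):
        # best ratio over every word order of this name
        return max(SequenceMatcher(ignore, filename, v).ratio()
                   for v in name_variants(name))
    # returns the list entry itself, so folders stay "Last, First"
    return max(names, key=score)
```

For "Data Mark Stone.doc" the "Mark Stone" variant appears verbatim in the file name, which gives it a much higher ratio than "Stone, Mark" alone.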
    >
    > import os
    > from difflib import SequenceMatcher as SM
    >
    > path = r'D:\Files '
    > txt_names = []
    >
    >
    > with open(r'D:/python/log1.txt') as f:
    >     for txt_name in f:
    >         txt_names.append(txt_name.strip())
    >
    > def ignore(x):
    >     return x in ' ,.'
    >
    > for filename in os.listdir(path):
    >     ratios = [SM(ignore, filename, txt_name).ratio()
    >               for txt_name in txt_names]
    >     best = max(ratios)
    >     owner = txt_names[ratios.index(best)]
    >     print(filename + " : " + owner)
    MRAB, Aug 28, 2011
    #3
  4. On 8/27/2011 4:18 PM said...
    > Thank you so much. The code worked perfectly.
    >
    > This is what I tried using Emile's code. The only time it picked the
    > wrong name from the list was when the file was named like this:
    >
    > Data Mark Stone.doc
    >
    > How can I fix this? I hope I am not asking too much.


    What name did it pick? I imagine that if you're picking a name from a
    list of 25000 names, some subset of combinations may yield similar ratios.

    But if you double up on the file name side, you may get closer:

    for filename in filenames:
        # compare each name against the file name repeated twice
        ratios = [SM(ignore, filename + filename, username).ratio()
                  for username in usernames]
        best = max(ratios)
        owner = usernames[ratios.index(best)]
        print(filename + " : " + owner)
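Why doubling can help: get_matching_blocks() only pairs up characters that appear in the same order in both strings, so "Stone, Mark" can match "Stone" or "Mark" in "Data Mark Stone.doc", but not both. Repeating the file name puts a second "Mark" after the first "Stone". A small sketch comparing the matched-character counts (matched_chars is an illustrative helper, not part of difflib):

```python
from difflib import SequenceMatcher


def ignore(ch):
    # Same junk test as in the thread.
    return ch in ' ,.'


def matched_chars(a, b):
    """Number of characters SequenceMatcher pairs up between a and b."""
    sm = SequenceMatcher(ignore, a, b)
    return sum(block.size for block in sm.get_matching_blocks())


name = "Stone, Mark"
single = "Data Mark Stone.doc"
doubled = single + single
print(matched_chars(single, name), matched_chars(doubled, name))
```

This prints 5 10. Because doubling also lengthens T, the ratio improves less than the match count, from 10/30 (about 0.333) to 20/49 (about 0.408).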

    ... On the other hand, if you've only got 100 files to sort out, you
    should already be done.

    :)

    Emile
    Emile van Sebille, Aug 28, 2011
    #4
