Matching Directory Names and Grouping Them

Discussion in 'Python' started by J, Jan 11, 2007.

  1. J

    J Guest

    Hello Group-

    I have limited programming experience, but I'm looking for a generic
    way to search through a root directory for subdirectories with similar
    names, organize and group them by matching their subdirectory path, and
    then output their full paths into a text file. For example, the
    contents of the output text file may look like this:

    <root>\Input1\2001\01\
    <root>\Input2\2001\01\
    <root>\Input3\2001\01\

    <root>\Input1\2002\03\
    <root>\Input2\2002\03\
    <root>\Input3\2002\03\

    <root>\Input2\2005\05\
    <root>\Input3\2005\05\

    <root>\Input1\2005\12\
    <root>\Input3\2005\12\

    I tried working with python regular expressions, but so far haven't
    found code that can do the trick. Any help would be greatly
    appreciated. Thanks!


    J.
     
    J, Jan 11, 2007
    #1
    1. Advertising

  2. J

    Steve Holden Guest

    J wrote:
    > Hello Group-
    >
    > I have limited programming experience, but I'm looking for a generic
    > way to search through a root directory for subdirectories with similar
    > names, organize and group them by matching their subdirectory path, and
    > then output their full paths into a text file. For example, the
    > contents of the output text file may look like this:
    >
    > <root>\Input1\2001\01\
    > <root>\Input2\2001\01\
    > <root>\Input3\2001\01\
    >
    > <root>\Input1\2002\03\
    > <root>\Input2\2002\03\
    > <root>\Input3\2002\03\
    >
    > <root>\Input2\2005\05\
    > <root>\Input3\2005\05\
    >
    > <root>\Input1\2005\12\
    > <root>\Input3\2005\12\
    >
    > I tried working with python regular expressions, but so far haven't
    > found code that can do the trick. Any help would be greatly
    > appreciated. Thanks!
    >

    Define "similar".

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden
    Blog of Note: http://holdenweb.blogspot.com
     
    Steve Holden, Jan 11, 2007
    #2
    1. Advertising

  3. J

    J Guest

    Steve-

    Thanks for the reply. I think what I'm trying to say by similar is
    pattern matching. Essentially, walking through a directory tree
    starting at a specified root folder, and returning a list of all
    folders that matches a pattern, in this case, a folder name containing
    a four digit number representing year and a subdirectory name
    containing a two digit number representing a month. The matches are
    grouped together and written into a text file. I hope this helps.

    Kind Regards,
    J

    Steve Holden wrote:
    > J wrote:
    > > Hello Group-
    > >
    > > I have limited programming experience, but I'm looking for a generic
    > > way to search through a root directory for subdirectories with similar
    > > names, organize and group them by matching their subdirectory path, and
    > > then output their full paths into a text file. For example, the
    > > contents of the output text file may look like this:
    > >
    > > <root>\Input1\2001\01\
    > > <root>\Input2\2001\01\
    > > <root>\Input3\2001\01\
    > >
    > > <root>\Input1\2002\03\
    > > <root>\Input2\2002\03\
    > > <root>\Input3\2002\03\
    > >
    > > <root>\Input2\2005\05\
    > > <root>\Input3\2005\05\
    > >
    > > <root>\Input1\2005\12\
    > > <root>\Input3\2005\12\
    > >
    > > I tried working with python regular expressions, but so far haven't
    > > found code that can do the trick. Any help would be greatly
    > > appreciated. Thanks!
    > >

    > Define "similar".
    >
    > regards
    > Steve
    > --
    > Steve Holden +44 150 684 7255 +1 800 494 3119
    > Holden Web LLC/Ltd http://www.holdenweb.com
    > Skype: holdenweb http://del.icio.us/steve.holden
    > Blog of Note: http://holdenweb.blogspot.com
     
    J, Jan 11, 2007
    #3
  4. >From your example, if you want to group every path that has the same
    last 9 characters, a simple solution could be something like:

    groups = {}
    for path in paths:
    group = groups.setdefault(path[-9:],[])
    group.append(path)

    I didn't actually test it, there ight be syntax errors.

    J wrote:
    > Steve-
    >
    > Thanks for the reply. I think what I'm trying to say by similar is
    > pattern matching. Essentially, walking through a directory tree
    > starting at a specified root folder, and returning a list of all
    > folders that matches a pattern, in this case, a folder name containing
    > a four digit number representing year and a subdirectory name
    > containing a two digit number representing a month. The matches are
    > grouped together and written into a text file. I hope this helps.
    >
    > Kind Regards,
    > J
    >
    > Steve Holden wrote:
    > > J wrote:
    > > > Hello Group-
    > > >
    > > > I have limited programming experience, but I'm looking for a generic
    > > > way to search through a root directory for subdirectories with similar
    > > > names, organize and group them by matching their subdirectory path, and
    > > > then output their full paths into a text file. For example, the
    > > > contents of the output text file may look like this:
    > > >
    > > > <root>\Input1\2001\01\
    > > > <root>\Input2\2001\01\
    > > > <root>\Input3\2001\01\
    > > >
    > > > <root>\Input1\2002\03\
    > > > <root>\Input2\2002\03\
    > > > <root>\Input3\2002\03\
    > > >
    > > > <root>\Input2\2005\05\
    > > > <root>\Input3\2005\05\
    > > >
    > > > <root>\Input1\2005\12\
    > > > <root>\Input3\2005\12\
    > > >
    > > > I tried working with python regular expressions, but so far haven't
    > > > found code that can do the trick. Any help would be greatly
    > > > appreciated. Thanks!
    > > >

    > > Define "similar".
    > >
    > > regards
    > > Steve
    > > --
    > > Steve Holden +44 150 684 7255 +1 800 494 3119
    > > Holden Web LLC/Ltd http://www.holdenweb.com
    > > Skype: holdenweb http://del.icio.us/steve.holden
    > > Blog of Note: http://holdenweb.blogspot.com
     
    Virgil Dupras, Jan 12, 2007
    #4
  5. J

    Neil Cerutti Guest

    On 2007-01-11, J <> wrote:
    > Steve-
    >
    > Thanks for the reply. I think what I'm trying to say by similar
    > is pattern matching. Essentially, walking through a directory
    > tree starting at a specified root folder, and returning a list
    > of all folders that matches a pattern, in this case, a folder
    > name containing a four digit number representing year and a
    > subdirectory name containing a two digit number representing a
    > month. The matches are grouped together and written into a text
    > file. I hope this helps.


    Here's a solution using itertools.groupby, just because this is
    the first programming problem I've seen that seemed to call for
    it. Hooray!

    from itertools import groupby

    def print_by_date(dirs):
    r""" Group a directory list according to date codes.

    >>> data = [

    ... "<root>/Input2/2002/03/",
    ... "<root>/Input1/2001/01/",
    ... "<root>/Input3/2005/05/",
    ... "<root>/Input3/2001/01/",
    ... "<root>/Input1/2002/03/",
    ... "<root>/Input3/2005/12/",
    ... "<root>/Input2/2001/01/",
    ... "<root>/Input3/2002/03/",
    ... "<root>/Input2/2005/05/",
    ... "<root>/Input1/2005/12/"]
    >>> print_by_date(data)

    <root>/Input1/2001/01/
    <root>/Input2/2001/01/
    <root>/Input3/2001/01/
    <BLANKLINE>
    <root>/Input1/2002/03/
    <root>/Input2/2002/03/
    <root>/Input3/2002/03/
    <BLANKLINE>
    <root>/Input2/2005/05/
    <root>/Input3/2005/05/
    <BLANKLINE>
    <root>/Input1/2005/12/
    <root>/Input3/2005/12/
    <BLANKLINE>

    """
    def date_key(path):
    return path[-7:]
    groups = [list(g) for _,g in groupby(sorted(dirs, key=date_key), date_key)]
    for g in groups:
    print '\n'.join(path for path in sorted(g))
    print

    if __name__ == "__main__":
    import doctest
    doctest.testmod()

    I really wanted nested join calls for the output, to suppress
    that trailing blank line, but I kept getting confused and
    couldn't sort it out.

    It would better to use the os.path module, but I couldn't find
    the function in there lets me pull out path tails.

    I didn't filter out stuff that didn't match the date path
    convention you used.

    --
    Neil Cerutti
     
    Neil Cerutti, Jan 12, 2007
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Carl
    Replies:
    0
    Views:
    526
  2. Anonieko

    HttpHandlers - Learn Them. Use Them.

    Anonieko, Jun 15, 2006, in forum: ASP .Net
    Replies:
    5
    Views:
    525
    tdavisjr
    Jun 16, 2006
  3. fBechmann
    Replies:
    0
    Views:
    403
    fBechmann
    Jun 10, 2004
  4. Max Williams

    Grouping results by similar names

    Max Williams, Apr 22, 2008, in forum: Ruby
    Replies:
    1
    Views:
    126
    Joachim (M√ľnchen)
    Apr 23, 2008
  5. Wang, Jay
    Replies:
    9
    Views:
    233
Loading...

Share This Page