Parsing problems: A journey from a text file to a directory tree

Discussion in 'Python' started by Martin M., Sep 16, 2007.

  1. Martin M.

    Martin M. Guest

    Hi everybody,

    Some of my colleagues want me to write a script for easy folder and
    subfolder creation on the Mac.

    The script is supposed to scan a text file containing directory trees
    in the following format:

    [New client]
    |-Invoices
    |-Offers
    |--Denied
    |--Accepted
    |-Delivery notes

    As you can see, the folder hierarchy is expressed by the number of
    minus signs, and each section header is enclosed in brackets (as in
    Windows config files).
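As a minimal sketch of the format described above, assuming one entry per line and that the depth is simply the count of `-` after the `|`, each line can be classified on its own before any tree is built. The helper `classify_line` and both regexes are hypothetical names:

```python
import re

# Hypothetical patterns for the two kinds of lines in the folder list.
HEADER_RE = re.compile(r'^\[(.+)\]$')   # "[Section name]"
ENTRY_RE = re.compile(r'^\|(-+)(.+)$')  # "|--Folder name"

def classify_line(line):
    """Return ('header', name), ('entry', depth, name), or None."""
    line = line.rstrip()  # drop the newline and trailing blanks
    m = HEADER_RE.match(line)
    if m:
        return ('header', m.group(1).strip())
    m = ENTRY_RE.match(line)
    if m:
        # Depth 1 means a direct child of the section folder.
        return ('entry', len(m.group(1)), m.group(2).strip())
    return None

print(classify_line('[New client]'))  # ('header', 'New client')
print(classify_line('|--Denied'))     # ('entry', 2, 'Denied')
```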

    After the scan process, the script is supposed to show a dialog in
    which the user can choose from the different sections (e.g. 'Alphabet',
    'Months', 'New client', etc.). Then the script will create the
    corresponding folder hierarchy in the currently selected folder (done
    via AppleScript).

    But currently I simply don't know how to parse these folder lists and
    how to save them in an array accordingly.

    First I thought of an array like this:

    dirtreedb = {'New client': {'Invoices': {}, 'Offers': {'Denied': {},
    'Accepted': {}}, 'Delivery notes': {}}}

    But this doesn't do the trick, as I also have to save the hierarchy
    level of the current folder as well...

    Argh, I really can't get my head around this problem and I need your
    help. I have the feeling that the answer is not that complicated, but
    I just don't get it right now...

    Desperately yours,

    Martin
    Martin M., Sep 16, 2007
    #1

  2. Martin M.

    Neil Cerutti Guest

    On 2007-09-16, Martin M. <> wrote:
    > Hi everybody,
    >
    > Some of my colleagues want me to write a script for easy folder and
    > subfolder creation on the Mac.
    >
    > The script is supposed to scan a text file containing directory trees
    > in the following format:
    >
    > [New client]
    >|-Invoices
    >|-Offers
    >|--Denied
    >|--Accepted
    >|-Delivery notes


    Would it make sense to store it like this?

    [('New client',
    [('Invoices', []),
    ('Offers', [('Denied', []), ('Accepted', [])]),
    ('Delivery notes', [])]]
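Here is a minimal sketch of a parser that produces exactly that list-of-tuples shape, so the folders keep their file order. It assumes one entry per line and keeps a stack of child lists, one per nesting level; `parse_ordered` and the two regexes are hypothetical names:

```python
import re

HEADER_RE = re.compile(r'^\[(.+)\]$')   # "[Section name]"
ENTRY_RE = re.compile(r'^\|(-+)(.+)$')  # "|--Folder name"

def parse_ordered(lines):
    """Build [(name, children), ...]; children has the same shape."""
    result = []
    stack = [result]  # stack[d] is the list that depth-d entries go into
    for line in lines:
        line = line.rstrip()
        m = HEADER_RE.match(line)
        if m:
            children = []
            result.append((m.group(1).strip(), children))
            stack = [result, children]
            continue
        m = ENTRY_RE.match(line)
        if m:
            depth = len(m.group(1))
            del stack[depth + 1:]  # forget levels deeper than this entry
            children = []
            stack[-1].append((m.group(2).strip(), children))
            stack.append(children)
    return result

sample = """[New client]
|-Invoices
|-Offers
|--Denied
|--Accepted
|-Delivery notes"""
print(parse_ordered(sample.splitlines()))
```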

    > First I thought of an array like this:
    >
    > dirtreedb = {'New client': {'Invoices': {}, 'Offers': {'Denied': {},
    > 'Accepted': {}}, 'Delivery notes': {}}}


    A dictionary approach is fine if it's OK for the directories to
    be unordered, which doesn't appear to be the case.
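A note for current Python versions: since Python 3.7 the built-in `dict` is guaranteed to keep its keys in insertion order, so a nested-dictionary tree filled while reading the file keeps the folders in file order. On older interpreters, `collections.OrderedDict` (added in Python 2.7 and 3.1) gives the same guarantee. A quick check:

```python
tree = {}
for name in ['Invoices', 'Offers', 'Delivery notes']:
    tree[name] = {}  # each folder starts out empty

print(list(tree))  # ['Invoices', 'Offers', 'Delivery notes']
```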

    > But this doesn't do the trick, as I also have to save the
    > hierarchy level of the current folder as well...


    The above does store the hierarchy, implicitly, as the depth of
    nesting:

    dirtreedb['New client']['Offers']['Denied']
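Since the level is implied by the nesting, it can be recovered when walking the tree. A minimal sketch, with a hypothetical generator `walk` that relies on the insertion-ordered `dict` of Python 3.7+, yields each folder with its depth and prints the tree back in the original `|-` notation:

```python
dirtreedb = {'New client': {'Invoices': {},
                            'Offers': {'Denied': {}, 'Accepted': {}},
                            'Delivery notes': {}}}

def walk(tree, depth=0):
    """Yield (depth, name) pairs in depth-first order."""
    for name, subtree in tree.items():
        yield depth, name
        yield from walk(subtree, depth + 1)  # children are one level deeper

for depth, name in walk(dirtreedb):
    # Depth 0 is a section header; deeper levels get one '-' per level.
    print('|' + '-' * depth + name if depth else '[%s]' % name)
```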

    --
    Neil Cerutti
    Neil Cerutti, Sep 16, 2007
    #2

  3. Martin M.

    Larry Bates Guest

    Since you are going to need a dialog anyway, I would use the tree
    control from wxPython (the Python binding for wxWidgets, formerly
    wxWindows). It already handles this kind of hierarchy. Then you can
    just walk all the branches and create the folders.
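Walking all the branches of a nested dictionary such as `dirtreedb` and creating the folders can be sketched in plain Python, as an alternative to the AppleScript step the original poster planned. The function name `create_folders` is hypothetical, and the demo runs in a throwaway temporary directory:

```python
import os
import tempfile

def create_folders(tree, base_dir):
    """Create one folder per key of the nested dict, recursing into values."""
    for name, subtree in tree.items():
        path = os.path.join(base_dir, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        create_folders(subtree, path)

# Try it in a temporary directory instead of a real client folder.
with tempfile.TemporaryDirectory() as tmp:
    create_folders({'New client': {'Offers': {'Denied': {}, 'Accepted': {}}}}, tmp)
    print(sorted(os.listdir(os.path.join(tmp, 'New client', 'Offers'))))
```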

    -Larry

    Larry Bates, Sep 17, 2007
    #3
  4. In article <>,
    "Martin M." <> wrote:

    > Hi everybody,
    >
    > Some of my colleagues want me to write a script for easy folder and
    > subfolder creation on the Mac.
    >
    > The script is supposed to scan a text file containing directory trees
    > in the following format:
    >
    > [New client]
    > |-Invoices
    > |-Offers
    > |--Denied
    > |--Accepted
    > |-Delivery notes


    Hello, Martin,

    A good way to approach this problem is to recognize that each section of
    your proposed configuration represents a kind of depth-first traversal
    of the tree structure you propose to create. Thus, you can reconstruct
    the tree by keeping track at all times of the path from the "root" of
    the tree to the "current location" in the tree.

    Below is one possible implementation of this idea in Python. In short,
    the function keeps track of a stack of dictionaries, each of which
    represents the contents of some directory in your hierarchy. As you
    encounter "|--" lines, entries are pushed to or popped from the stack
    according to whether the nesting level has increased or decreased.

    This code is not heavily tested, but hopefully it should be clear:

    import re

    def parse_folders(input):
        """Read input from a file-like object that describes directory
        structures to be created. The input format is:

        [Top-level name]
        |-Subdirectory1
        |--SubSubDirectory1
        |--SubSubDirectory2
        |---SubSubSubDirectory1
        |-Subdirectory2
        |-Subdirectory3

        The input may consist of any number of such groups. The result is
        a dictionary structure in which each key names a directory, and
        the corresponding value is a dictionary structure showing the
        contents of that directory, possibly empty.
        """

        # This expression matches "header" lines, defining a new section.
        new_re = re.compile(r'\[([\w ]+)\]\s*$')

        # This expression matches "nesting" lines, defining subdirectories.
        more_re = re.compile(r'(\|-+)([\w ]+)$')

        out = {}        # Root: Maps section names to subtrees.
        state = [out]   # Stack of dictionaries, current path.

        for line in input:
            m = new_re.match(line)
            if m:  # New section begins here...
                key = m.group(1).strip()
                out[key] = {}
                state = [out, out[key]]
                continue

            m = more_re.match(line)
            if m:  # Add a directory to an existing section
                assert state

                new_level = len(m.group(1))
                key = m.group(2).strip()

                while new_level < len(state):
                    state.pop()

                state[-1][key] = {}
                state.append(state[-1][key])

        return out

    To call this, pass a file-like object to parse_folders(), e.g.:

    test1 = '''
    [New client].
    |-Invoices
    |-Offers
    |--Denied
    |--Accepted
    |---Reasons
    |---Rhymes
    |-Delivery notes
    '''

    from io import StringIO
    result = parse_folders(StringIO(test1))

    As the documentation suggests, the result is a nested dictionary
    structure, representing the folder structure you encoded. I hope this
    helps.

    Cheers,
    -M

    --
    Michael J. Fromberger | Lecturer, Dept. of Computer Science
    http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
    Michael J. Fromberger, Sep 18, 2007
    #4
  5. Martin M.

    John Machin Guest

    On Sep 19, 4:51 am, "Michael J. Fromberger"
    <> wrote:
    > .
    > . # This expression matches "header" lines, defining a new section.
    > . new_re = re.compile(r'\[([\w ]+)\]\s*$')


    Directory names can contain many more characters than those matched
    by [\w ], and which ones are allowed depends on the OS; you might as
    well just allow anything and leave it to the OS to complain. Also
    consider using line.rstrip() (usually a handy precaution on ANY input
    text file) instead of having \s*$ at the end of your regex.
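A sketch of that suggestion: patterns that accept any characters in the names, applied to `line.rstrip()` so the trailing `\s*$` is no longer needed. The names `header_re` and `entry_re` are hypothetical stand-ins for `new_re` and `more_re`:

```python
import re

# Accept any characters in names; let the OS reject invalid ones later.
header_re = re.compile(r'\[(.+)\]$')
entry_re = re.compile(r'(\|-+)(.+)$')

for raw in ['[Client (2007)]  \n', '|--Offers & Quotes\n']:
    line = raw.rstrip()  # strips trailing blanks and the newline
    m = header_re.match(line) or entry_re.match(line)
    print(m.groups())
```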

    > .
    > . while new_level < len(state):
    > . state.pop()


    Hmmm ... consider rewriting that as the slightly less obfuscatory

    while len(state) > new_level:
        state.pop()

    If you really want to make the reader slow down and think, try this:

    del state[new_level:]
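Both forms truncate the stack to its first `new_level` entries; deleting a slice is simply a no-op when the list is already that short. A small check of the equivalence, with two hypothetical wrapper functions:

```python
def trim_loop(state, new_level):
    state = list(state)  # work on a copy
    while len(state) > new_level:
        state.pop()
    return state

def trim_slice(state, new_level):
    state = list(state)  # work on a copy
    del state[new_level:]  # remove everything from index new_level on
    return state

for level in range(6):
    assert trim_loop([1, 2, 3, 4], level) == trim_slice([1, 2, 3, 4], level)
print(trim_slice([1, 2, 3, 4], 2))  # [1, 2]
```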

    A warning message if there are too many "-" characters might be a good
    idea:

    [foo]
    |-bar
    |-zot
    |---plugh
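One way to add that warning, sketched as a standalone check under the assumption that an entry may be at most one level deeper than the entry before it (the function name `check_depths` is hypothetical):

```python
import warnings

def check_depths(lines):
    """Warn when an entry is nested more than one level below its predecessor."""
    previous = 0  # a section header counts as depth 0
    for lineno, line in enumerate(lines, 1):
        line = line.rstrip()
        if line.startswith('['):
            previous = 0
        elif line.startswith('|'):
            rest = line[1:]
            depth = len(rest) - len(rest.lstrip('-'))  # number of '-'
            if depth > previous + 1:
                warnings.warn('line %d: %r skips a level' % (lineno, line))
            previous = depth

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    check_depths(['[foo]', '|-bar', '|-zot', '|---plugh'])
print(len(caught))  # 1
```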

    > .
    > . state[-1][key] = {}
    > . state.append(state[-1][key])
    > .


    And if the input line matches neither regex?
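If silently ignoring such lines is not wanted, here is a sketch of a stricter policy: skip blank lines and reject anything else that is neither a header nor an entry. The function and pattern names are hypothetical, and raising `ValueError` is just one reasonable choice:

```python
import re

header_re = re.compile(r'\[(.+)\]$')
entry_re = re.compile(r'(\|-+)(.+)$')

def check_line(lineno, line):
    """Return the regex match for a line, None for blank lines, else raise."""
    line = line.rstrip()
    if not line:
        return None  # blank lines are harmless
    m = header_re.match(line) or entry_re.match(line)
    if m is None:
        raise ValueError('line %d not understood: %r' % (lineno, line))
    return m

try:
    check_line(1, '[New client].')
except ValueError as exc:
    print(exc)  # line 1 not understood: '[New client].'
```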

    > . return out
    >
    > To call this, pass a file-like object to parse_folders(), e.g.:
    >
    > test1 = '''
    > [New client].


    Won't work with the dot on the end.

    > Michael J. Fromberger | Lecturer, Dept. of Computer Science
    John Machin, Sep 18, 2007
    #5
  6. Hi, John,

    Your comments below are all reasonable. However, I would like to point
    out that the purpose of my example was to provide a demonstration of an
    algorithm, not an industrial-grade solution to every aspect of the
    original poster's problem. I am confident the original poster can deal
    with these aspects of his problem space on his own.

    In article <>,
    John Machin <> wrote:

    > [...]
    > > . while new_level < len(state):
    > > . state.pop()

    >
    > Hmmm ... consider rewriting that as the slightly less obfuscatory
    >
    > while len(state) > new_level:
    >     state.pop()


    This seems to me to be an aesthetic consideration only; I'm not sure I
    understand your rationale for reversing the sense of the comparison.
    Since it does not change the functionality, it's hardly worthy of
    complaint, but I don't see any improvement, either.

    > A warning message if there are too many "-" characters might be a good
    > idea:
    >
    > [foo]
    > |-bar
    > |-zot
    > |---plugh


    Perhaps so. Again, the original poster will have to decide what should
    be the correct response to input of this sort; at present, the
    implementation is tolerant of such variations, without loss of
    generality.

    > And if the input line matches neither regex?


    I believe it should be clear that such lines are ignored. Again, this
    is an opportunity for the original poster to determine an alternative
    response -- perhaps an exception could be raised, if that is his desire.
    The problem specification did not constrain this case.

    > > To call this, pass a file-like object to parse_folders(), e.g.:
    > >
    > > test1 = '''
    > > [New client].

    >
    > Won't work with the dot on the end.


    My mistake. The period was a copy-and-paste artifact, which I missed.

    Cheers,
    -M

    --
    Michael J. Fromberger | Lecturer, Dept. of Computer Science
    http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
    Michael J. Fromberger, Sep 19, 2007
    #6
