Help parsing a text file

Discussion in 'Python' started by William Gill, Aug 29, 2011.

  1. William Gill

    William Gill Guest

    I haven't done much with Python for a couple years, bouncing around
    between other languages and scripts as needs suggest, so I have some
    minor difficulty keeping Python functionality in my
    head, but I can overcome that as the cobwebs clear. Though I do seem to
    keep tripping over the same Py2 -> Py3 syntax changes (old habits die hard).

    I have a text file with XML-like records that I need to parse. By
    XML-like I mean records have proper opening and closing tags, but fields
    don't have closing tags (they rely on line ends). Not all fields appear
    in all records, but they do adhere to a defined sequence.

    My initial passes into Python have been very unfocused (a scatter gun of
    too many possible directions, yielding very messy results), so I'm
    asking for some suggestions, or algorithms (possibly even examples) that
    may help me focus.

    I'm not asking anyone to write my code, just to nudge me toward a more
    disciplined approach to a common task, and I promise to put in the
    effort to understand the underlying fundamentals.
    William Gill, Aug 29, 2011
    #1

  2. Philip Semanchuk

    On Aug 29, 2011, at 2:21 PM, William Gill wrote:

    > I haven't done much with Python for a couple years, bouncing around between other languages and scripts as needs suggest, so I have some minor difficulty keeping Python functionality in my head, but I can overcome that as the cobwebs clear. Though I do seem to keep tripping over the same Py2 -> Py3 syntax changes (old habits die hard).
    >
    > I have a text file with XML-like records that I need to parse. By XML-like I mean records have proper opening and closing tags, but fields don't have closing tags (they rely on line ends). Not all fields appear in all records, but they do adhere to a defined sequence.
    >
    > My initial passes into Python have been very unfocused (a scatter gun of too many possible directions, yielding very messy results), so I'm asking for some suggestions, or algorithms (possibly even examples) that may help me focus.
    >
    > I'm not asking anyone to write my code, just to nudge me toward a more disciplined approach to a common task, and I promise to put in the effort to understand the underlying fundamentals.


    If the syntax really is close to XML, would it be all that difficult to convert it to proper XML? Then you have nice libraries like ElementTree to use for parsing.


    Cheers
    Philip
    Philip Semanchuk, Aug 29, 2011
    #2

  3. William Gill

    William Gill Guest

    On 8/29/2011 2:31 PM, Philip Semanchuk wrote:
    >
    > If the syntax really is close to XML, would it be all that difficult to convert it to proper XML? Then you have nice libraries like ElementTree to use for parsing.
    >


    Possibly, but I would still need the same search algorithms to find the
    opening tag for the field, then find and replace the next line end with
    a matching closing tag. So it seems to me that the starting point is
    the same, and then it's my choice to either process the substrings
    myself or employ something like ElementTree.
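
    A minimal sketch of that conversion, assuming a made-up layout where
    each field sits on its own line as "<tag>value" and records are wrapped
    in <record> ... </record>. The tag names, the sample data, and the
    to_xml helper are all hypothetical, since the real file format is not
    shown in this thread:

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample; the real tag names and layout are not given in the thread.
SAMPLE = """\
<record>
<name>Alice
<city>Paris & environs
</record>
<record>
<name>Bob
</record>
"""

RECORD_TAG = "record"  # assumed name of the record element
FIELD_RE = re.compile(r"^<(\w+)>(.*)$")  # "<tag>value" running to the line end


def to_xml(text):
    """Close every field tag at its line end and wrap everything in one root."""
    out = ["<records>"]
    for line in text.splitlines():
        line = line.strip()
        match = FIELD_RE.match(line)
        if match and match.group(1) != RECORD_TAG:
            tag, value = match.groups()
            # Escape &, < and > so the field text is legal XML character data.
            out.append(f"<{tag}>{escape(value)}</{tag}>")
        elif line:
            out.append(line)  # record open/close tags pass through unchanged
    out.append("</records>")
    return "\n".join(out)


root = ET.fromstring(to_xml(SAMPLE))
# One dict per record; fields missing from a record are simply absent.
records = [{field.tag: field.text for field in rec} for rec in root]
print(records)
```

    Working line by line means no searching for "the next line end" is
    needed: each line is either a record tag or a field, and the closing
    tag is appended to the latter before ElementTree sees the text.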
    William Gill, Aug 29, 2011
    #3
  4. Thomas Jollans

    On 29/08/11 20:21, William Gill wrote:
    > I haven't done much with Python for a couple years, bouncing around
    > between other languages and scripts as needs suggest, so I have some
    > minor difficulty keeping Python functionality in my
    > head, but I can overcome that as the cobwebs clear. Though I do seem to
    > keep tripping over the same Py2 -> Py3 syntax changes (old habits die
    > hard).
    >
    > I have a text file with XML-like records that I need to parse. By
    > XML-like I mean records have proper opening and closing tags, but fields
    > don't have closing tags (they rely on line ends). Not all fields appear
    > in all records, but they do adhere to a defined sequence.
    >
    > My initial passes into Python have been very unfocused (a scatter gun of
    > too many possible directions, yielding very messy results), so I'm
    > asking for some suggestions, or algorithms (possibly even examples) that
    > may help me focus.
    >
    > I'm not asking anyone to write my code, just to nudge me toward a more
    > disciplined approach to a common task, and I promise to put in the
    > effort to understand the underlying fundamentals.


    A name that is often thrown around on this list for this kind of
    question is pyparsing. Now, I don't know anything about it myself, but
    it may be worth looking into.

    Otherwise, if you say it's similar to XML, you might want to take a cue
    from XML processing when it comes to dealing with the file. You could
    emulate the stream-based approach taken by SAX or Expat: have methods
    that handle the different events that can occur. For XML these are "start
    tag", "end tag", "text node", "processing instruction", etc.; in your
    case, they might be "start/end record", "field data", etc. That way, you
    could separate the code that keeps track of the current record and how
    the data fits together into an object structure from the parsing
    code, which knows how to convert a line of data into something meaningful.
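
    As a rough sketch of that split, assuming a hypothetical format with
    one "<tag>value" field per line inside <record> ... </record> (the
    names RecordHandler, parse, and the sample data are invented for
    illustration, not taken from the real file):

```python
import re

RECORD_TAG = "record"  # assumed name of the record element
TAG_RE = re.compile(r"^<(/?)(\w+)>(.*)$")  # optional "/", tag name, rest of line


class RecordHandler:
    """Receives parse events and builds one dict per record."""

    def __init__(self):
        self.records = []
        self.current = None

    def start_record(self):
        self.current = {}

    def field(self, name, value):
        self.current[name] = value

    def end_record(self):
        self.records.append(self.current)
        self.current = None


def parse(lines, handler):
    """Turn each line into an event; knows the syntax, not the data model."""
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = TAG_RE.match(line)
        if match is None:
            raise ValueError(f"line {number}: unrecognised input {line!r}")
        closing, tag, rest = match.groups()
        if tag == RECORD_TAG and closing:
            handler.end_record()
        elif tag == RECORD_TAG:
            handler.start_record()
        elif closing:
            raise ValueError(f"line {number}: unexpected closing tag {tag!r}")
        else:
            handler.field(tag, rest.strip())


sample = """\
<record>
<name>Alice
<city>Paris
</record>
<record>
<name>Bob
</record>
"""

handler = RecordHandler()
parse(sample.splitlines(), handler)
print(handler.records)
```

    The parser never touches the record dicts and the handler never sees
    raw text, so checks such as enforcing the defined field sequence could
    go into the handler without changing parse(). This sketch does not
    guard against a field appearing outside a record.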

    Thomas
    Thomas Jollans, Aug 29, 2011
    #4
  5. Waldek M.

    Waldek M. Guest

    On Mon, 29 Aug 2011 23:05:23 +0200, Thomas Jollans wrote:
    > A name that is often thrown around on this list for this kind of
    > question is pyparsing. Now, I don't know anything about it myself, but
    > it may be worth looking into.


    Definitely. I have used it, and even though it's not perfect, it's very
    useful indeed. Due to its nature it is no speed demon when parsing
    complex and big structures, so you might want to keep that in mind.
    But I whole-heartedly recommend it.

    Br.
    Waldek
    Waldek M., Aug 30, 2011
    #5
  6. JT

    JT Guest

    On Monday, August 29, 2011 1:21:48 PM UTC-5, William Gill wrote:
    >
    > I have a text file with XML-like records that I need to parse. By
    > XML-like I mean records have proper opening and closing tags, but fields
    > don't have closing tags (they rely on line ends). Not all fields appear
    > in all records, but they do adhere to a defined sequence.


    lxml can parse XML and broken HTML (see http://lxml.de/parsing.html).

    - James

    --
    Bulbflow: A Python framework for graph databases (http://bulbflow.com)
    JT, Sep 1, 2011
    #6
  7. William Gill

    William Gill Guest

    On 9/1/2011 1:58 PM, JT wrote:
    > On Monday, August 29, 2011 1:21:48 PM UTC-5, William Gill wrote:
    >>
    >> I have a text file with XML-like records that I need to parse. By
    >> XML-like I mean records have proper opening and closing tags, but fields
    >> don't have closing tags (they rely on line ends). Not all fields appear
    >> in all records, but they do adhere to a defined sequence.

    >
    > lxml can parse XML and broken HTML (see http://lxml.de/parsing.html).
    >
    > - James
    >

    Thanks to everyone.

    Though I didn't get what I expected, it made me think more about the
    reason I need to parse these files to begin with. So I'm going to do
    some more homework on the overall business application and work backward
    from there. Once I know how the data fits in the scheme of things, I
    will create an appropriate abstraction layer, either from scratch, or
    using one of the existing parsers mentioned, but I won't really know
    that until I have finished modeling.
    William Gill, Sep 1, 2011
    #7