elementtree: line numbers and iterparse

Discussion in 'Python' started by Stuart McGraw, Sep 13, 2006.

  1. I have a broad (~200K nodes) but shallow XML file
    that I want to parse with ElementTree. There are too many
    nodes to read into memory simultaneously, so I use
    iterparse() to process each node sequentially.

    Now I find I need to get and save the input file line
    number of each node. Googling turned up a way
    to do it by subclassing FancyTreeBuilder
    (http://groups.google.com/group/comp.lang.python/msg/45f5313409553b4b?hl=en&),
    but that tries to read everything at once.

    Is there a way to do something similar with iterparse()?
     
    Stuart McGraw, Sep 13, 2006
    #1

  2. Stuart McGraw wrote:

    > I have a broad (~200K nodes) but shallow XML file
    > that I want to parse with ElementTree. There are too many
    > nodes to read into memory simultaneously, so I use
    > iterparse() to process each node sequentially.
    >
    > Now I find I need to get and save the input file line
    > number of each node. Googling turned up a way
    > to do it by subclassing FancyTreeBuilder
    > (http://groups.google.com/group/comp.lang.python/msg/45f5313409553b4b?hl=en&),
    > but that tries to read everything at once.
    >
    > Is there a way to do something similar with iterparse()?


    something like this could work:

    import elementtree.ElementTree as ET
    import StringIO

    data = """\
    <doc>
    <tag>
    <subtag>text</subtag>
    <subtag>text</subtag>
    </tag>
    </doc>
    """

    class FileWrapper:
        def __init__(self, source):
            self.source = source
            self.lineno = 0
        def read(self, bytes):
            # ignore the requested size; hand the parser one line per call
            s = self.source.readline()
            self.lineno += 1
            return s

    # f = FileWrapper(open("source.xml"))
    f = FileWrapper(StringIO.StringIO(data))

    for event, elem in ET.iterparse(f, events=["start", "end"]):
        if event == "start":
            # f.lineno is the number of lines fed to the parser so far
            print f.lineno, event, elem

    </F>
     
    Fredrik Lundh, Sep 13, 2006
    #2
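
For readers on Python 3, the wrapper idea from reply #2 can be sketched against the standard-library xml.etree.ElementTree. This is an adaptation, not code from the thread: the names LineCountingReader and iter_start_lines are hypothetical, and the sketch assumes that iterparse() pulls its input only through the source's read() method and that each start tag closes on the line where it opens. The number reported is the count of lines handed to the parser when the start event is delivered.

```python
import io
import xml.etree.ElementTree as ET


class LineCountingReader:
    """Hypothetical wrapper that feeds the parser one line per read() call."""

    def __init__(self, source):
        self.source = source
        self.lineno = 0  # number of lines handed to the parser so far

    def read(self, size=-1):
        # The requested size is ignored on purpose: returning a single
        # line keeps lineno in step with the events the parser emits.
        line = self.source.readline()
        if line:
            self.lineno += 1
        return line


def iter_start_lines(source):
    """Yield (lineno, tag) for every start event in the document."""
    reader = LineCountingReader(source)
    for event, elem in ET.iterparse(reader, events=("start", "end")):
        if event == "start":
            yield reader.lineno, elem.tag
        else:
            # Drop text, attributes and children of the finished element.
            elem.clear()


# Same sample document as in the reply above, as bytes.
sample = b"""\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""

results = list(iter_start_lines(io.BytesIO(sample)))
for lineno, tag in results:
    print(lineno, tag)
```

Because the wrapper ignores the requested read size, the parser sees one line at a time; that keeps the count aligned with the events at the price of many small reads. Note that elem.clear() only empties each finished element. The emptied elements stay attached to their parents, so a very large file may need further pruning.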

  3. Fredrik Lundh wrote:

    > Stuart McGraw wrote:
    > > Now I find I need to get and save the input file line
    > > number of each node. Googling turned up a way
    > > to do it by subclassing FancyTreeBuilder
    > > (http://groups.google.com/group/comp.lang.python/msg/45f5313409553b4b?hl=en&),
    > > but that tries to read everything at once.
    > >
    > > Is there a way to do something similar with iterparse()?
    >
    > something like this could work:
    > ...snip...


    Indeed it does. Many thanks!
     
    Stuart McGraw, Sep 13, 2006
    #3
