help with recursive whitespace filter in

Discussion in 'Python' started by Rustom Mody, May 10, 2009.

  1. Rustom Mody

    Rustom Mody Guest

    I am trying to write a recursive filter to remove whitespace-only
    nodes for minidom.
    The code is below.

    Strangely, it deletes some whitespace nodes and leaves others.
    If I keep calling it -- like so: fws(fws(fws(doc))) -- then at some
    stage all the ws nodes disappear.

    Does anybody have a clue?


    from xml.dom.minidom import parse

    # The input to fws is the output of parse("something.xml")


    def fws(ele):
        """ filter white space (recursive)"""

        for c in ele.childNodes:
            if isWsNode(c):
                ele.removeChild(c)
                #c.unlink() Makes no diff whether this is there or not
            elif c.nodeType == ele.ELEMENT_NODE:
                fws(c)


    def isWsNode(ele):
        return (ele.nodeType == ele.TEXT_NODE and not ele.data.strip())
    Rustom Mody, May 10, 2009
    #1

  2. Rustom Mody

    Steve Howell Guest

    On May 10, 9:10 am, Rustom Mody <> wrote:
    > I am trying to write a recursive filter to remove whitespace-only
    > nodes for minidom.
    > The code is below.
    >
    > Strangely it deletes some whitespace nodes and leaves some.
    > If I keep calling it -- like so: fws(fws(fws(doc)))  then at some
    > stage all the ws nodes disappear
    >
    > Does anybody have a clue?
    >
    > from xml.dom.minidom import parse
    >
    > #The input to fws is the output of parse("something.xml")
    >
    > def fws(ele):
    >     """ filter white space (recursive)"""
    >
    >     for c in ele.childNodes:
    >         if isWsNode(c):
    >             ele.removeChild(c)
    >             #c.unlink() Makes no diff whether this is there or not
    >         elif c.nodeType == ele.ELEMENT_NODE:
    >             fws(c)
    >
    > def isWsNode(ele):
    >     return (ele.nodeType == ele.TEXT_NODE and not ele.data.strip())


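    I would avoid doing things like delete/remove in a loop over the same
    list. Each removeChild shifts the remaining children down one slot, so
    the loop skips whichever node followed the one just removed. Instead,
    build a list of things to delete and remove them after the loop.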
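
Here is a small sketch of why removing inside the loop misses nodes. It assumes minidom's childNodes behaves like an ordinary Python list while you iterate over it (in CPython it is a list subclass). The sample XML and the helper names is_ws, naive_fws and count_ws are made up for illustration. In whitespace-indented XML the children alternate between whitespace text and elements, so the node skipped after a removal is typically an element, and its subtree is never visited on that pass.

```python
from xml.dom.minidom import parseString

def is_ws(node):
    """True for text nodes that hold only whitespace."""
    return node.nodeType == node.TEXT_NODE and not node.data.strip()

def naive_fws(ele):
    """Same logic as the original fws: removes while iterating (buggy)."""
    for c in ele.childNodes:
        if is_ws(c):
            ele.removeChild(c)
        elif c.nodeType == c.ELEMENT_NODE:
            naive_fws(c)

def count_ws(node):
    """Count whitespace-only text nodes anywhere below node."""
    total = 0
    for c in node.childNodes:
        if is_ws(c):
            total += 1
        elif c.nodeType == c.ELEMENT_NODE:
            total += count_ws(c)
    return total

# Hypothetical sample: two levels, every element surrounded by whitespace.
doc = parseString("<root> <a> <b/> </a> </root>")
root = doc.documentElement

before = count_ws(root)      # 4
naive_fws(root)
after_one = count_ws(root)   # 2: <a> was skipped, so its children survive
naive_fws(root)
after_two = count_ws(root)   # 0: the second pass finally reaches <a>
print(before, after_one, after_two)
```

That matches the symptom in the first post, where repeated calls eventually clear everything.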
    Steve Howell, May 10, 2009
    #2

  3. Rustom Mody

    rustom Guest

    On May 10, 9:49 pm, Steve Howell <> wrote:
    > I would avoid doing things like delete/remove in a loop.  Instead
    > build a list of things to delete.


    Yeah, I know. I would write the whole damn thing functionally if I knew
    how, but I can't figure out the API.
    I actually started out writing a (Haskell-style) copy of the whole
    tree minus the unwanted nodes, but could not figure it out.
    rustom, May 10, 2009
    #3
  4. Rustom Mody

    MRAB Guest

    rustom wrote:
    > On May 10, 9:49 pm, Steve Howell <> wrote:
    >> I would avoid doing things like delete/remove in a loop. Instead
    >> build a list of things to delete.

    >
    > Yeah I know. I would write the whole damn thing functionally if I knew
    > how. But cant figure out the API.
    > I actually started out to write a (haskell-style) copy out the whole
    > tree minus the unwanted nodes but could not figure it out
    >

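    def fws(ele):
        """ filter white space (recursive)"""
        empty_nodes = []
        # First pass: only collect, so childNodes is not modified mid-loop.
        for c in ele.childNodes:
            if isWsNode(c):
                empty_nodes.append(c)
            elif c.nodeType == ele.ELEMENT_NODE:
                fws(c)
        # Second pass: remove the collected whitespace-only nodes.
        for c in empty_nodes:
            ele.removeChild(c)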
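
A variant that is not from the thread, sketched here for comparison: iterate over a snapshot of the child list instead of collecting the nodes to delete. It assumes list(ele.childNodes) gives an independent copy, which holds for an ordinary Python list. The names strip_ws and is_ws_node and the sample XML are made up for illustration.

```python
from xml.dom.minidom import parseString

def is_ws_node(node):
    """True for text nodes that contain only whitespace."""
    return node.nodeType == node.TEXT_NODE and not node.data.strip()

def strip_ws(ele):
    """Remove whitespace-only text nodes below ele, in place."""
    # list(...) takes a snapshot, so removals do not disturb the iteration.
    for c in list(ele.childNodes):
        if is_ws_node(c):
            ele.removeChild(c)
        elif c.nodeType == c.ELEMENT_NODE:
            strip_ws(c)

# Hypothetical sample document.
doc = parseString("<root> <a> <b>text</b> </a> </root>")
strip_ws(doc)
result = doc.documentElement.toxml()
print(result)  # <root><a><b>text</b></a></root>
```

Both versions do the same job in a single call; the snapshot form just folds the two loops into one.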
    MRAB, May 10, 2009
    #4
  5. Rustom Mody

    Steve Howell Guest

    On May 10, 10:23 am, rustom <> wrote:
    > On May 10, 9:49 pm, Steve Howell <> wrote:
    > > I would avoid doing things like delete/remove in a loop.  Instead
    > > build a list of things to delete.

    >
    > Yeah I know. I would write the whole damn thing functionally if I knew
    > how.  But cant figure out the API.
    > I actually started out to write a (haskell-style) copy out the whole
    > tree minus the unwanted nodes but could not figure it out


    You can use list comprehensions for a more functional style.

    Instead of deleting the unwanted nodes in place, try to create new
    lists of just the wanted results.
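
One way this could look with minidom, as a rough sketch rather than the only way to do it: copy each node without its children using cloneNode(False), then attach copies of just the wanted children, picked with a list comprehension. The function name copy_without_ws, the helper is_ws_node and the sample XML are made up for illustration. Note that it starts from documentElement, because minidom's Document.cloneNode returns None for a shallow clone.

```python
from xml.dom.minidom import parseString

def is_ws_node(node):
    """True for text nodes that contain only whitespace."""
    return node.nodeType == node.TEXT_NODE and not node.data.strip()

def copy_without_ws(node):
    """Return a copy of node whose subtree has no whitespace-only text."""
    # Shallow clone: keeps the tag name and attributes, drops the children.
    clone = node.cloneNode(False)
    # Build the list of wanted children instead of deleting unwanted ones.
    kept = [copy_without_ws(c) for c in node.childNodes if not is_ws_node(c)]
    for child in kept:
        clone.appendChild(child)
    return clone

# Hypothetical sample document.
doc = parseString('<root id="r"> <a> <b>text</b> </a> </root>')
clean = copy_without_ws(doc.documentElement)

print(clean.toxml())                # <root id="r"><a><b>text</b></a></root>
print(doc.documentElement.toxml())  # original tree is left untouched
```

Since nothing is removed from a list that is being iterated, the skipping problem cannot come up here, at the cost of building a second tree.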
    Steve Howell, May 11, 2009
    #5
