replacing xml elements with other elements using lxml

Discussion in 'Python' started by Ultrus, Aug 29, 2007.

  1. Ultrus

    Ultrus Guest

    Hello,
    I'm attempting to generate a random story using xml as the document,
    and lxml as the parser. I want the document to be simplified before
    processing it further, and am very close to accomplishing my goal.
    Below is what I have so far. Any ideas on how to move forward?

    The goal:
    read and edit xml file, replacing random elements with randomly picked
    content from within

    Completed:
    [x] read xml
    [x] access first random tag
    [x] pick random content within random item
    [o] need to replace <random> tag with picked contents

    xml sample:
    <contents>Here is some content.</contents>
    <random>
    <item><contents>Here is some random content.</contents></item>
    <item><contents>Here is some more random content.</contents></item>
    </random>
    <contents>Here is some content.</contents>

    Python code:
    from lxml import etree
    from StringIO import StringIO
    import random

    theXml = ("<contents>Here is some content.</contents>"
              "<random>"
              "<item><contents>Here is some random content.</contents></item>"
              "<item><contents>Here is some more random content.</contents></item>"
              "</random>"
              "<contents>Here is some content.</contents>")

    f = StringIO(theXml)
    tree = etree.parse(f)
    r = tree.xpath('//random')

    if len(r) > 0:
        randInt = random.randint(0, len(r[0]) - 1)
        randContents = r[0][randInt][0]
        # replace parent random tag with picked content here

    Now that I have the contents tag randomly chosen, how do I delete the
    parent <random> tag and replace it with the chosen tag, so the result
    looks like this:

    final xml sample (goal):
    <contents>Here is some content.</contents>
    <contents>Here is some random content.</contents>
    <contents>Here is some content.</contents>

    Any idea on how to do this? So close! Thanks for the help in
    advance. :)
    Ultrus, Aug 29, 2007
    #1

  2. Ultrus wrote:
    > I'm attempting to generate a random story using xml as the document,
    > and lxml as the parser. I want the document to be simplified before
    > processing it further, and am very close to accomplishing my goal.
    > Below is what I have so far. Any ideas on how to move forward?
    >
    > The goal:
    > read and edit xml file, replacing random elements with randomly picked
    > content from within
    >
    > Completed:
    > [x] read xml
    > [x] access first random tag
    > [x] pick random content within random item
    > [o] need to replace <random> tag with picked contents
    >
    > xml sample:
    > <contents>Here is some content.</contents>
    > <random>
    > <item><contents>Here is some random content.</contents></item>
    > <item><contents>Here is some more random content.</contents></item>
    > </random>
    > <contents>Here is some content.</contents>


    Hmm, this is not well-formed XML, so I assume you stripped the example. The
    root element is missing.


    > Python code:
    > from lxml import etree
    > from StringIO import StringIO
    > import random
    >
    > theXml = ("<contents>Here is some content.</contents>"
    >           "<random>"
    >           "<item><contents>Here is some random content.</contents></item>"
    >           "<item><contents>Here is some more random content.</contents></item>"
    >           "</random>"
    >           "<contents>Here is some content.</contents>")
    >
    > f = StringIO(theXml)
    > tree = etree.parse(f)


    ^^^^^
    This would raise an exception if the above really *was* your input.
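
To illustrate the point about the missing root element, here is a minimal sketch written for Python 3. It uses `etree.fromstring` instead of the thread's `etree.parse(StringIO(...))` call for brevity, and the `<story>` wrapper and the `parses` helper are hypothetical names chosen for this example, not part of the original post.

```python
from lxml import etree

# The sample from the post: three top-level elements and no single root.
fragment = (
    "<contents>Here is some content.</contents>"
    "<random>"
    "<item><contents>Here is some random content.</contents></item>"
    "<item><contents>Here is some more random content.</contents></item>"
    "</random>"
    "<contents>Here is some content.</contents>"
)


def parses(xml_text):
    """Return True if lxml's default parser accepts xml_text as a document."""
    try:
        etree.fromstring(xml_text)
        return True
    except etree.XMLSyntaxError:
        return False


bare_ok = parses(fragment)                              # several top-level elements
wrapped_ok = parses("<story>" + fragment + "</story>")  # exactly one root element
print(bare_ok, wrapped_ok)  # prints: False True
```

With the default (non-recovering) parser, the bare fragment is rejected, while the same content wrapped in a single root element parses fine.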


    > r = tree.xpath('//random')
    >
    > if len(r) > 0:
    >     randInt = random.randint(0, len(r[0]) - 1)
    >     randContents = r[0][randInt][0]
    >     # replace parent random tag with picked content here
    >
    > now that I have the contents tag randomly chosen, how do I delete the
    > parent <random> tag, and replace it to look like this:
    >
    > final xml sample (goal):
    > <contents>Here is some content.</contents>
    > <contents>Here is some random content.</contents>
    > <contents>Here is some content.</contents>


    what about:

    r.getparent().replace(r, random.choice(r))

    ?
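
A minimal sketch of how that one-liner might be applied, written for Python 3. It rests on a few assumptions: the `<story>` root is hypothetical (the posted sample has none), the call is made on the first element of the list that `xpath()` returns rather than on the list itself, and the sketch descends one level further than `random.choice` alone, to the `<contents>` child of the chosen `<item>`, so that the result matches the goal sample in the question.

```python
import random
from lxml import etree

# "story" is a hypothetical root element added so the sample is well-formed.
root = etree.fromstring(
    "<story>"
    "<contents>Here is some content.</contents>"
    "<random>"
    "<item><contents>Here is some random content.</contents></item>"
    "<item><contents>Here is some more random content.</contents></item>"
    "</random>"
    "<contents>Here is some content.</contents>"
    "</story>"
)

matches = root.xpath("//random")         # xpath() returns a list of elements
if matches:
    rand_elem = matches[0]
    item = random.choice(list(rand_elem))  # pick one <item> child at random
    picked = item[0]                       # the <contents> inside that <item>
    # replace() moves `picked` into the position held by the <random> element
    rand_elem.getparent().replace(rand_elem, picked)

result = [(child.tag, child.text) for child in root]
print(result)
```

After the call, the `<random>` element is gone and the root holds three `<contents>` children, the middle one being whichever alternative was picked.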

    Stefan
    Stefan Behnel, Aug 29, 2007
    #2

  3. Ultrus

    Ultrus Guest

    Stefan,
    I'm honored by your response.

    You are correct about the bad xml. I attempted to shorten the xml for
    this example as there are other tags unrelated to this issue in the
    mix. Based on your feedback, I was able to write the following fully
    functional code using some different techniques:

    from lxml import etree
    from StringIO import StringIO
    import random

    sourceXml = "\
    <theroot>\
    <contents>Stefan's fortune cookie:</contents>\
    <random>\
    <item>\
    <random>\
    <item>\
    <contents>You will always know love.</contents>\
    </item>\
    <item>\
    <contents>You will spend it all in one place.</contents>\
    </item>\
    </random>\
    </item>\
    <item>\
    <contents>Your life comes with a lifetime warranty.</contents>\
    </item>\
    </random>\
    <contents>The end.</contents>\
    </theroot>"

    parser = etree.XMLParser(ns_clean=True, recover=True,
                             remove_blank_text=True, remove_comments=True)
    tree = etree.parse(StringIO(sourceXml), parser)
    xml = tree.getroot()

    def reduceRandoms(xml):
        # Replace each <random> child with the first child of a randomly
        # chosen <item>, then rescan in case the replacement is a <random>.
        for elem in xml:
            if elem.tag == "random":
                elem.getparent().replace(elem, random.choice(elem)[0])
                reduceRandoms(xml)

    reduceRandoms(xml)
    for elem in xml:
        print elem.tag, ":", elem.text




    One challenge that I face now is that I can only replace a parent
    element with a single element. This isn't a problem if an <item>
    element only has 1 <contents> element, or just 1 <random> element
    (this works above). However, if <item> elements have more than one
    child element such as a <contents> element, followed by a <random>
    element (like children of <theroot>), only the first element is used.

    Any thoughts on how to replace+append after the replaced element, or
    clear+append multiple elements to the cleared position?

    Thanks again :)
    Ultrus, Aug 30, 2007
    #3
  4. Ultrus

    Ultrus Guest

    Ah! I figured it out. I forgot that the tree is treated like a list.
    The solution was to replace the <random> element with the first <item>
    child, then use Python's insert(i,x) function to insert elements after
    the first one.
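
The post does not include the final code, so the following is only a sketch of the replace-then-insert approach it describes, written for Python 3. The function name `expand_random` and the sample texts are hypothetical; `index`, `replace`, `insert` and `remove` are the list-style methods lxml elements provide.

```python
import random
from lxml import etree

# Hypothetical sample in which each <item> holds more than one child.
root = etree.fromstring(
    "<theroot>"
    "<contents>Intro.</contents>"
    "<random>"
    "<item><contents>Part A1.</contents><contents>Part A2.</contents></item>"
    "<item><contents>Part B1.</contents><contents>Part B2.</contents></item>"
    "</random>"
    "<contents>The end.</contents>"
    "</theroot>"
)


def expand_random(rand_elem):
    """Replace rand_elem with all children of one randomly chosen <item>."""
    parent = rand_elem.getparent()
    position = parent.index(rand_elem)
    # Copy the children into a list first, since moving them empties the <item>.
    children = list(random.choice(list(rand_elem)))
    if not children:
        parent.remove(rand_elem)
        return
    # The first child takes over the slot of the <random> element ...
    parent.replace(rand_elem, children[0])
    # ... and the remaining children are inserted right after it, in order.
    for offset, child in enumerate(children[1:], start=1):
        parent.insert(position + offset, child)


expand_random(root.find("random"))
texts = [child.text for child in root]
print(texts)
```

Because an lxml element can only have one parent, `replace` and `insert` move the chosen children out of their `<item>` rather than copying them.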

    lxml rocks!
    Ultrus, Aug 30, 2007
    #4
  5. Ultrus wrote:
    > Ah! I figured it out. I forgot that the tree is treated like a list.
    > The solution was to replace the <random> element with the first <item>
    > child, then use Python's insert(i,x) function to insert elements after
    > the first one.


    You could also use slicing, something like:

    parent[2:3] = child[1:5]

    should work.
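
As a rough sketch of the slicing idea for the `<random>` case (Python 3; the sample texts are hypothetical, and the indices are computed rather than hard-coded as in the one-liner above), a one-element slice holding the `<random>` element can be assigned the children of the chosen `<item>`:

```python
import random
from lxml import etree

# Hypothetical sample in which each <item> holds more than one child.
root = etree.fromstring(
    "<theroot>"
    "<contents>Intro.</contents>"
    "<random>"
    "<item><contents>Part A1.</contents><contents>Part A2.</contents></item>"
    "<item><contents>Part B1.</contents><contents>Part B2.</contents></item>"
    "</random>"
    "<contents>The end.</contents>"
    "</theroot>"
)

rand_elem = root.find("random")
item = random.choice(list(rand_elem))   # pick one <item> at random
position = root.index(rand_elem)

# Replace the one-element slice containing <random> with every child of the
# chosen <item>; item[:] is a plain list of those child elements.
root[position:position + 1] = item[:]

texts = [child.text for child in root]
print(texts)
```

This does in one assignment what the replace-then-insert loop does step by step, at the cost of being slightly less obvious to read.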


    > lxml rocks!


    I know, but it feels good to read it once in a while. :)

    Stefan
    Stefan Behnel, Aug 30, 2007
    #5
