How to find and replace sequences of elements

Discussion in 'XML' started by Bernd.Moos@gmail.com, Sep 6, 2005.

  1. Guest

    Given the following XML document:

    <text>
    <p>
    <w>Ronaldo</w>
    <w>scoredw>
    <w>the</w>
    <w>1</w>
    <c>:</c>
    <w>1</w>
    <w>opener</w>
    </p>
    ...
    <text>

    I need to

    1) find patterns like <w>...</w><c>:</c><w>...</w>, i.e. any
    'w'-element, followed by a 'c'-element with text ':', followed by any
    'w'-element
    2) replace this pattern with <w>...:...</w>, i.e. a single element in
    which the text of the found elements is concatenated

    Right now, I am doing this with the following stylesheet:

    -----------------------------------------------------------
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="no"/>

    <!-- go through the whole document, copy everything -->
    <xsl:template match="/ | @* | node()">
    <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    </xsl:template>

    <!-- for c-elements whose text is ':'... -->
    <xsl:template match="//c[text()=':']">
    <xsl:if test="name(preceding-sibling::*[1])='w' and
    name(following-sibling::*[1])='w'">
    <!-- ... make a new element and put the text of the matching
    pattern inside -->
    <w type='score'>
    <xsl:value-of select="preceding-sibling::*[1]/text()"/>
    <xsl:text>:</xsl:text>
    <xsl:value-of select="following-sibling::*[1]/text()"/>
    </w>
    </xsl:if>
    </xsl:template>

    <!-- make sure not to copy w-elements that have been taken care of by
    the former template -->
    <xsl:template match="//w[name(following-sibling::*[1])='c' and
    following-sibling::*[1]/text()=':' and
    name(following-sibling::*[2])='w']">
    </xsl:template>
    <xsl:template match="//w[name(preceding-sibling::*[1])='c' and
    preceding-sibling::*[1]/text()=':' and
    name(preceding-sibling::*[2])='w']">
    </xsl:template>
    </xsl:stylesheet>
    -----------------------------------------------------------

    This works OK, but it looks so awkward! What's more, if the patterns I
    want to replace get longer, it becomes increasingly difficult to take
    care of all the things that must not be copied a second time (i.e. the
    last two templates above). Can anybody point me to a more elegant
    way of doing this?

    Thanks very much,

    Thomas
     
    , Sep 6, 2005
    #1

  2. William Park Guest

    The key insight is to "split" and then "join". In an extended Bash shell:
    a=`< file.xml`
    set -- "${a|-</w>[[:space:]]*<c>:</c>[[:space:]]*<w>}"
    echo "${*|,:}"
    where
    ${var|-regex} splits the string on 'regex'
    ${list|,sep} joins the elements using 'sep' string.

    You can do the same thing in Python also.
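
    A minimal Python sketch of the same split-and-join idea, using only the
    standard "re" module. The sample input and the variable names are
    illustrative assumptions, not part of the original post. Because this
    treats the XML as plain text, it is only reasonable when the markup is as
    regular as in the example.

```python
import re

# Hypothetical sample input, modelled on the document in the question.
xml_text = """<text>
<p>
<w>the</w>
<w>1</w>
<c>:</c>
<w>1</w>
<w>opener</w>
</p>
</text>"""

# Split on the boundary "</w> <c>:</c> <w>" (whitespace between tags optional) ...
SEPARATOR = re.compile(r"</w>\s*<c>:</c>\s*<w>")
parts = SEPARATOR.split(xml_text)

# ... then join the pieces with a bare ":" so the two w-elements become one.
merged = ":".join(parts)
print(merged)
```

    The split removes the closing tag of the first w-element, the whole
    c-element, and the opening tag of the second w-element in one go, so the
    join leaves a single w-element containing both texts.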

    --
    William Park <>, Toronto, Canada
    ThinFlash: Linux thin-client on USB key (flash) drive
    http://home.eol.ca/~parkw/thinflash.html
    BashDiff: Super Bash shell
    http://freshmeat.net/projects/bashdiff/
     
    William Park, Sep 6, 2005
    #2

  3. Dimitre Novatchev

    Use the "Tree Visitor" pattern like this:

    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes"/>

    <xsl:template match="@* | node()">
    <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="node()[1]"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]"/>
    </xsl:template>

    <xsl:template match="w[following-sibling::*[1][self::c]
    and
    following-sibling::c[1]=':'
    and
    following-sibling::*[2][self::w]
    ]">
    <w>
    <xsl:value-of select=
    "concat(.,following-sibling::*[1],following-sibling::*[2])"/>
    </w>
    <xsl:apply-templates select=
    "following-sibling::*[2]/following-sibling::node()[1]"/>
    </xsl:template>

    </xsl:stylesheet>

    When this transformation is applied on your source xml (corrected to be at
    least well-formed):

    <text>
    <p>
    <w>Ronaldo</w>
    <w>scoredw</w>
    <w>the</w>
    <w>1</w>
    <c>:</c>
    <w>1</w>
    <w>opener</w>
    </p>
    ...
    </text>

    the wanted result is produced:

    <text>
    <p>
    <w>Ronaldo</w>
    <w>scoredw</w>
    <w>the</w>
    <w>1:1</w>
    <w>opener</w>
    </p>
    ...
    </text>

    Here I use a variation of the identity rule that applies templates to only
    one node at a time. This gives maximum flexibility, because the nodes of the
    XML document are processed strictly sequentially.


    Hope this helped.

    Cheers,
    Dimitre Novatchev
     
    Dimitre Novatchev, Sep 6, 2005
    #3
  4. Guest

    Thanks, Dimitre, this is very, very helpful!
    Just one more question: if the sequence pattern I want to replace
    consists of, (e.g.) five elements (and not three like in the example),
    I would
    1) change the match-value of the last template accordingly and
    2) change the arguments of the concat function accordingly and
    3) what else?

    (I do not quite understand what these statements in your code do:

    <xsl:apply-templates select="following-sibling::node()[1]"/> and
    <xsl:apply-templates
    select="following-sibling::*[2]/following-sibling::node()[1]"/>)

    Thanks again...
     
    , Sep 7, 2005
    #4
  5. Dimitre Novatchev

    > Thanks, Dimitre, this is very, very helpful!
    > Just one more question: if the sequence pattern I want to replace
    > consists of, (e.g.) five elements (and not three like in the example),
    > I would
    > 1) change the match-value of the last template accordingly and
    > 2) change the arguments of the concat function accordingly and
    > 3) what else?


    Change:


    <xsl:apply-templates
    select="following-sibling::*[2]/following-sibling::node()[1]"/>

    to

    <xsl:apply-templates
    select="following-sibling::*[4]/following-sibling::node()[1]"/>




    >
    > (I do not quite understand what these statements in your code do:
    >
    > <xsl:apply-templates select="following-sibling::node()[1]"/>



    Apply the templates but not to all children of the current node -- just to
    its immediate following sibling

    > and
    > <xsl:apply-templates
    > select="following-sibling::*[2]/following-sibling::node()[1]"/>)



    Continue to apply templates sequentially, but skipping the three elements
    that we processed in the current template, therefore restarting at the
    immediate following sibling of the last element we processed in the current
    template.
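
    The "process one node, then continue after the last consumed sibling" idea
    can also be sketched outside XSLT. The following Python version is an
    illustration only, not the stylesheet above: the function names, the
    (tag, text) pattern format and the sample document are assumptions made
    for this sketch. It walks the children of each element by index and, when
    a run matches, jumps ahead by the length of the pattern, which is the
    counterpart of changing following-sibling::*[2] to following-sibling::*[4].

```python
import xml.etree.ElementTree as ET

def matches(run, pattern):
    """True if the elements in run fit pattern, a list of (tag, text) pairs.

    A text of None means "any text".
    """
    return len(run) == len(pattern) and all(
        el.tag == tag and (text is None or el.text == text)
        for el, (tag, text) in zip(run, pattern)
    )

def merge_runs(parent, pattern):
    """Replace every run of siblings that fits pattern with a single <w>."""
    children = list(parent)
    result = []
    i = 0
    while i < len(children):
        run = children[i:i + len(pattern)]
        if matches(run, pattern):
            merged = ET.Element("w")
            merged.text = "".join(el.text or "" for el in run)
            merged.tail = run[-1].tail   # keep whitespace that followed the run
            result.append(merged)
            i += len(pattern)            # skip everything just consumed
        else:
            merge_runs(children[i], pattern)  # descend, then move one sibling on
            result.append(children[i])
            i += 1
    parent[:] = result

# Hypothetical five-element pattern: w, c(':'), w, c(':'), w
FIVE = [("w", None), ("c", ":"), ("w", None), ("c", ":"), ("w", None)]
doc = ET.fromstring(
    "<p><w>at</w><w>12</w><c>:</c><w>30</w><c>:</c><w>05</w><w>sharp</w></p>"
)
merge_runs(doc, FIVE)
result_xml = ET.tostring(doc, encoding="unicode")
print(result_xml)
```

    With this shape, a longer pattern only changes the pattern list; the skip
    distance follows from its length, so nothing has to be suppressed a second
    time.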


    Cheers,
    Dimitre Novatchev.
     
    Dimitre Novatchev, Sep 7, 2005
    #5
  6. Guest

    Hi again,

    I've been working successfully with this "Tree Visiting pattern" for a
    week now. Yesterday, however, I added Stylesheet assignments, i.e.
    things like

    <?xml-stylesheet href="file:../../../2HTML.xsl" type="text/xsl"?>

    to my XML documents.

    When I now apply the pattern, it does what it is supposed to do, but,
    on top of this, duplicates the entire document (leading to
    non-well-formed XML because there are two root elements). What's even
    stranger: when I add a comment before the root element on top of the
    XSL assignment, the output triples the input. I know how to avoid this
    (take out the XSL assignments etc.), but I'd like to understand it.
    What on earth is going on here?

    Kind regards,

    Thomas
     
    , Sep 19, 2005
    #6
  7. Dimitre Novatchev

    As you haven't provided any code, the reason is probably the bad weather...
    :o)

    Cheers,
    Dimitre Novatchev.
     
    Dimitre Novatchev, Sep 20, 2005
    #7
  8. Guest

    > As you haven't provided any code, the reason is probably the bad weather...
    > :o)
    >
    > Cheers,
    > Dimitre Novatchev.


    The weather actually couldn't have been any better ;-) The strange
    things happen with *every* variant of the TreeVisitor pattern, for
    instance:

    1) XSL-File

    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="no"/>
    <xsl:template match="@* | node()">
    <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="node()[1]"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]"/>
    </xsl:template>
    </xsl:stylesheet>

    2) XML input

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- any old comment -->
    <x>
    <y>
    <w>le</w>
    </y>
    </x>

    3) XML output

    <?xml version="1.0" encoding="UTF-8"?><!-- any old comment --><x>
    <y>
    <w>le</w>
    </y>
    </x><x>
    <y>
    <w>le</w>
    </y>
    </x>

    In this example, I did the transformation with whatever is built into
    the EditiX editor. But the same phenomenon occurs when I use JDOM to do
    my transformations.

    Thanks for your help,

    Thomas
     
    , Sep 20, 2005
    #8
  9. Dimitre Novatchev

    Yes, there was a subtle bug in my code.

    The solution is to replace:

    <xsl:template match="@* | node()">

    with:

    <xsl:template match="/ | @* | node()">


    What's happening?

    Someone (like me) would think that the node test:

    node()

    matches the root node (document node in XPath 2.0 lingo).

    And yes, this *is* true in XPath.

    And no, this *isn't true* for a match pattern.

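    A node matches a pattern only if evaluating the pattern as an expression,
    with the node itself or one of its ancestors as the context node, selects
    that node. As a pattern,

    node()

    is short for child::node(), so it can only select nodes that are the child
    of some node. It will therefore not match the document node "/", because the
    document node has no parent by definition.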

    Because the document node is not matched by any template in our code, the
    default template rule is used:


    <xsl:template match="*|/">
    <xsl:apply-templates/>
    </xsl:template>

    The xsl:apply-templates instruction causes templates to be applied to *both*
    children of the document node (the comment node and the element node), which
    produces two almost identical sequences (one including the comment, the
    other not) and thus we have every node repeated in the output, with the
    exception of the comment node.
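
    The duplication can be reproduced with a toy model in Python. This is a
    deliberate simplification and not an XSLT processor: the function names
    are hypothetical, nodes are represented as plain strings, and only the
    top-level children of the document node are modelled.

```python
def visit_chain(siblings, start):
    """Model of the custom template: copy one node, then go to its next sibling."""
    out = []
    i = start
    while i < len(siblings):
        out.append(siblings[i])
        i += 1
    return out

def builtin_root_rule(children):
    """Model of the built-in rule for "/": apply templates to *every* child."""
    out = []
    for i in range(len(children)):
        out.extend(visit_chain(children, i))
    return out

def fixed_root_rule(children):
    """Model of match="/ | @* | node()": start the chain at the first child only."""
    return visit_chain(children, 0)

top_level = ["<!-- any old comment -->", "<x>...</x>"]
buggy = builtin_root_rule(top_level)
fixed = fixed_root_rule(top_level)
print(buggy)
print(fixed)
```

    In this model every top-level child starts its own chain, so the root
    element is emitted once per node that precedes it plus once for itself.
    That is consistent with the doubled output for a comment plus the root
    element, and with the tripled output once a processing instruction is
    added as well.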


    Cheers,
    Dimitre Novatchev.


    Dimitre Novatchev, Sep 21, 2005
    #9
  10. Guest

    Got it, changed it, it did the trick.

    Thank you so much,

    Thomas
     
    , Sep 21, 2005
    #10