printing XML file with XSLT code

Discussion in 'XML' started by Stu, Jun 11, 2008.

  1. Stu

    Stu Guest

    Being a newbie with XSLT transformation code, please excuse my naivete.
    In addition, I am not sure whether what I want to do can be done with
    XSLT, so I apologize up front for asking anything stupid.

    I have a shell script that needs to get values from an XML file. What
    I want to do is transform the XML into something more KSH friendly so
    it can be easily parsed in my KSH script.

    I would like to go through an entire XML document and for every
    "element" and "element/attribute" print the associated value in NVP
    (name value pair).

    Assume the following XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <abc>
    <def>
    <mno>2008-06-11-13:15:59</mno>
    <pqr stu="World">Hello</pqr>
    </def>
    <ghi>
    <jkl vwx="12345678" > </jkl>
    </ghi>
    </abc>

    Below is my desired output. As you can see, for each element I print
    "element=value" and for each attribute within an element I print
    "element_attribute=value"

    mno=2008-06-11-13:15:59
    pqr=Hello
    pqr_stu=World
    jkl_vwx=12345678

    Can somebody point me in the right direction or provide me with some
    sample XSLT transformation code that can do this.

    Keep in mind, I would like to keep this as generic as possible. That
    is I don't want to reference element or attributes by names. I would
    like something like this

    for each element
    do
    if attribute
    print element_attribute=value
    else
    print element=value
    done

    As opposed to, say, searching for element "pqr" and printing its value.

    Thanks to all who answer
    Stu, Jun 11, 2008
    #1

  2. Stu wrote:

    > Assume the following XML file:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <abc>
    > <def>
    > <mno>2008-06-11-13:15:59</mno>
    > <pqr stu="World">Hello</pqr>
    > </def>
    > <ghi>
    > <jkl vwx="12345678" > </jkl>
    > </ghi>
    > </abc>
    >
    > Below is my desired output. As you can see, for each element I print
    > "element=value" and for each attribute within an element I print
    > "element_attribute=value"
    >
    > mno=2008-06-11-13:15:59
    > pqr=Hello
    > pqr_stu=World
    > jkl_vwx=12345678


    The following script in XMLgawk does it:

    @load xml
    XMLCHARDATA { data = $0 }
    XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
    XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }

    It produces the output you wanted (except for a change in sequence).

    > Can somebody point me in the right direction or provide me with some
    > sample XSLT transformation code that can do this.
    >
    > Keep in mind, I would like to keep this as generic as possible. That
    > is I don't want to reference element or attributes by names.


    The solution above _is_ generic in the sense you described.
    Follow this link to the XMLgawk doc:

    http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html
    Jürgen Kahrs, Jun 11, 2008
    #2

  3. Peter Flynn

    Peter Flynn Guest

    Stu wrote:
    > Being a newbie with XSLT transformation code, please excuse my naivete.
    > In addition, I am not sure whether what I want to do can be done with
    > XSLT, so I apologize up front for asking anything stupid.
    >
    > I have a shell script that needs to get values from an XML file. What
    > I want to do is transform the XML into something more KSH friendly so
    > it can be easily parsed in my KSH script.
    >
    > I would like to go through an entire XML document and for every
    > "element" and "element/attribute" print the associated value in NVP
    > (name value pair).
    >
    > Assume the following XML file:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <abc>
    > <def>
    > <mno>2008-06-11-13:15:59</mno>
    > <pqr stu="World">Hello</pqr>
    > </def>
    > <ghi>
    > <jkl vwx="12345678" > </jkl>
    > </ghi>
    > </abc>
    >
    > Below is my desired output. As you can see, for each element I print
    > "element=value" and for each attribute within an element I print
    > "element_attribute=value"
    >
    > mno=2008-06-11-13:15:59
    > pqr=Hello
    > pqr_stu=World
    > jkl_vwx=12345678
    >
    > Can somebody point me in the right direction or provide me with some
    > sample XSLT transformation code that can do this.
    >
    > Keep in mind, I would like to keep this as generic as possible. That
    > is I don't want to reference element or attributes by names. I would
    > like something like this
    >
    > for each element
    > do
    > if attribute
    > print element_attribute=value
    > else
    > print element=value
    > done
    >
    > As opposed to, say, searching for element "pqr" and printing its value.
    >
    > Thanks to all who answer


    You could run the onsgmls validating parser to output ESIS (below)
    which can trivially be processed by (eg) awk or similar.

    ?xml version="1.0" encoding="UTF-8"
    (abc
    -\n\012
    (def
    -\n\012
    (mno
    -2008-06-11-13:15:59
    )mno
    -\n\012
    Astu CDATA World
    (pqr
    -Hello
    )pqr
    -\n\012
    )def
    -\n\012
    (ghi
    -\n\012
    Avwx CDATA 12345678
    (jkl
    -
    )jkl
    -\n\012
    )ghi
    -\n\012
    )abc
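
    As a rough illustration of the "trivially processed" step, here is a minimal Python sketch that folds the ESIS lines above into name=value pairs. The function name esis_to_pairs is made up for this example, and only the line types that appear above ("A", "(", "-", ")") and the \n and \012 escapes are handled; anything else onsgmls can emit is ignored.

```python
# ESIS output copied from the post above (raw string keeps the backslashes).
ESIS = r"""
?xml version="1.0" encoding="UTF-8"
(abc
-\n\012
(def
-\n\012
(mno
-2008-06-11-13:15:59
)mno
-\n\012
Astu CDATA World
(pqr
-Hello
)pqr
-\n\012
)def
-\n\012
(ghi
-\n\012
Avwx CDATA 12345678
(jkl
-
)jkl
-\n\012
)ghi
-\n\012
)abc
"""


def esis_to_pairs(lines):
    """Convert ESIS lines into 'name=value' strings (illustrative sketch)."""
    pending_attrs = []  # attributes are reported before their element's "(" line
    stack = []          # names of the currently open elements
    pairs = []
    for line in lines:
        kind, rest = line[:1], line[1:]
        if kind == "A":                       # "Aname TYPE value"
            parts = rest.split(" ", 2)
            if len(parts) == 3:               # skips e.g. "Aname IMPLIED"
                pending_attrs.append((parts[0], parts[2]))
        elif kind == "(":                     # element start
            stack.append(rest)
            pairs.extend(f"{rest}_{n}={v}" for n, v in pending_attrs)
            pending_attrs = []
        elif kind == "-" and stack:           # character data
            text = rest.replace("\\n", "\n").replace("\\012", "\n")
            if text.strip():                  # ignore whitespace-only data
                pairs.append(f"{stack[-1]}={text}")
        elif kind == ")" and stack:           # element end
            stack.pop()
    return pairs


pairs = esis_to_pairs(ESIS.splitlines())
print("\n".join(pairs))
```

    With the ESIS shown above this should print the attribute line for pqr before its text line, because ESIS reports attributes ahead of the element start.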

    ///Peter
    Peter Flynn, Jun 11, 2008
    #3
  4. $ saxon l.xml l.xsl

    mno=2008-06-11-13:15:59
    pqr_stu=World
    pqr=Hello
    jkl_vwx=12345678

    is produced by:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:strip-space elements="*"/>
      <!-- text node: newline, then parent element name, '=', text -->
      <xsl:template match="text()">
        <xsl:value-of select="concat('&#10;',name(..),'=',.)"/>
      </xsl:template>
      <!-- attribute: newline, then element name, '_', attribute name, '=', value -->
      <xsl:template match="@*">
        <xsl:value-of select="concat('&#10;',name(..),'_',name(.),'=',.)"/>
      </xsl:template>
      <!-- element: visit attributes first, then child nodes -->
      <xsl:template match="*">
        <xsl:apply-templates select="@*|node()"/>
      </xsl:template>
    </xsl:stylesheet>
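
    For anyone without an XSLT processor at hand, the same traversal order (attributes first, then non-blank text, then children) can be sketched in Python with the standard xml.etree.ElementTree module. This is only an approximation of the stylesheet, not a replacement for it: the function name name_value_pairs is invented here, and namespaced element names would appear in ElementTree's {uri}local form.

```python
import xml.etree.ElementTree as ET

# Sample document from the original question (XML declaration omitted).
SAMPLE = """<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
</def>
<ghi>
<jkl vwx="12345678" > </jkl>
</ghi>
</abc>"""


def name_value_pairs(elem):
    """Yield 'name=value' strings in document order, attributes first."""
    for attr, value in elem.attrib.items():
        yield f"{elem.tag}_{attr}={value}"
    # Whitespace-only text is skipped, similar to xsl:strip-space.
    if elem.text and elem.text.strip():
        yield f"{elem.tag}={elem.text}"
    for child in elem:
        yield from name_value_pairs(child)
        # Text that follows a child still belongs to the parent element.
        if child.tail and child.tail.strip():
            yield f"{elem.tag}={child.tail}"


pairs = list(name_value_pairs(ET.fromstring(SAMPLE)))
print("\n".join(pairs))
```

    On the sample document this yields the same four lines, in the same order, as the saxon run above.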

    David

    --
    http://dpcarlisle.blogspot.com
    David Carlisle, Jun 12, 2008
    #4
  5. Jürgen Kahrs wrote:
    > Stu wrote:
    >
    >> Assume the following XML file:
    >>
    >> <?xml version="1.0" encoding="UTF-8"?>
    >> <abc>
    >> <def>
    >> <mno>2008-06-11-13:15:59</mno>
    >> <pqr stu="World">Hello</pqr>
    >> </def>
    >> <ghi>
    >> <jkl vwx="12345678" > </jkl>
    >> </ghi>
    >> </abc>
    >>
    >> Below is my desired output. As you can see, for each element I print
    >> "element=value" and for each attribute within an element I print
    >> "element_attribute=value"
    >>
    >> mno=2008-06-11-13:15:59
    >> pqr=Hello
    >> pqr_stu=World
    >> jkl_vwx=12345678

    >
    > The following script in XMLgawk does it:
    >
    > @load xml
    > XMLCHARDATA { data = $0 }
    > XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
    > XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }
    >
    > It produces the output you wanted (except for a change in sequence).


    [:alnum:] delivers correct results with the given test data, but I guess you meant:

    XMLENDELEM && data ~ /[[:alnum:]]/ ...

    Hermann
    Hermann Peifer, Jun 12, 2008
    #5
  6. Hermann Peifer wrote:

    >> The following script in XMLgawk does it:
    >>
    >> @load xml
    >> XMLCHARDATA { data = $0 }
    >> XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "="
    >> XMLATTR[i] }
    >> XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }
    >>
    >> It produces the output you wanted (except for a change in sequence).

    >
    > [:alnum:] delivers correct results with the given test data, but I guess
    > you meant:
    > XMLENDELEM && data ~ /[[:alnum:]]/ ...


    No, I thought [:alnum:] was sufficient.
    Does it really make a difference in this example ?
    Jürgen Kahrs, Jun 12, 2008
    #6
  7. Jürgen Kahrs wrote:
    > Hermann Peifer wrote:
    >
    >>> The following script in XMLgawk does it:
    >>>
    >>> @load xml
    >>> XMLCHARDATA { data = $0 }
    >>> XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "="
    >>> XMLATTR[i] }
    >>> XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }
    >>>
    >>> It produces the output you wanted (except for a change in sequence).

    >>
    >> [:alnum:] delivers correct results with the given test data, but I
    >> guess you meant:
    >> XMLENDELEM && data ~ /[[:alnum:]]/ ...

    >
    > No, I thought [:alnum:] was sufficient.
    > Does it really make a difference in this example ?


    Inside a regex, [:alnum:] is treated as a list of characters: 'a', 'l', 'm', 'n', 'u', and ':', whereas [[:alnum:]] is treated as a character class. The former will match 'Hello', but not 'HELLO', whereas the latter will match both. However, this doesn't make any difference with the given test data.
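
    The difference can be checked outside awk as well. The following Python sketch is only an analogy: Python's re module has no POSIX classes, so the six literal characters are spelled out as [:almnu] and the character class is approximated by the ASCII range [A-Za-z0-9]. The variable names are made up for the example.

```python
import re

# What /[:alnum:]/ amounts to: a set of the six literal characters : a l m n u
literal_set = re.compile(r"[:almnu]")
# ASCII-only stand-in for the POSIX class [[:alnum:]]
alnum_class = re.compile(r"[A-Za-z0-9]")

samples = ["Hello", "HELLO", "2008-06-11-13:15:59", ".,-?(){}[]"]

# For each sample: (matched by the literal set, matched by the character class)
results = {
    s: (bool(literal_set.search(s)), bool(alnum_class.search(s)))
    for s in samples
}

for s, (as_set, as_class) in results.items():
    print(f"{s!r}: literal set={as_set}, character class={as_class}")
```

    'Hello' matches the literal set only because it contains 'l', and the timestamp only because it contains ':'; 'HELLO' is matched by the class alone.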

    To make your script a bit more generic and robust (in case of empty elements), I would go for:

    $ cat hermann.awk
    @load xml
    XMLCHARDATA { data = $0 }
    XMLSTARTELEM { data = ""; for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
    XMLENDELEM && data !~ /^[[:space:]]*$/ { print XMLENDELEM "=" data }

    See below the different results for this sample data:

    $ cat file1
    <?xml version="1.0" encoding="UTF-8"?>
    <abc>
    <def>
    <mno>2008-06-11-13:15:59</mno>
    <pqr stu="World">Hello</pqr>
    <a1>.,-?(){}[]</a1>
    <a2>ABC</a2><a3/>
    </def>
    <ghi>
    <jkl vwx="12345678" > </jkl>
    </ghi>
    </abc>

    $ xgawk -f hermann.awk file1
    mno=2008-06-11-13:15:59
    pqr_stu=World
    pqr=Hello
    a1=.,-?(){}[]
    a2=ABC
    jkl_vwx=12345678

    $ xgawk -f juergen.awk file1
    mno=2008-06-11-13:15:59
    pqr_stu=World
    pqr=Hello
    a2=ABC
    a3=ABC
    jkl_vwx=12345678
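
    The effect of resetting data at each start tag can also be sketched with Python's standard xml.sax module, which is event-driven in a similar way. This is an assumption-laden analogy, not a port of hermann.awk: the class name PairHandler is invented, character data is accumulated rather than overwritten (SAX may deliver text in several pieces), and the buffer is additionally cleared at each end tag.

```python
import xml.sax

# Sample data from the post above.
FILE1 = """<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
<a1>.,-?(){}[]</a1>
<a2>ABC</a2><a3/>
</def>
<ghi>
<jkl vwx="12345678" > </jkl>
</ghi>
</abc>"""


class PairHandler(xml.sax.ContentHandler):
    """Collect 'name=value' strings from SAX events (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.data = ""
        self.pairs = []

    def startElement(self, name, attrs):
        # Reset here so an empty element such as <a3/> cannot
        # inherit the text of the element before it.
        self.data = ""
        for attr in attrs.getNames():
            self.pairs.append(f"{name}_{attr}={attrs.getValue(attr)}")

    def characters(self, content):
        self.data += content

    def endElement(self, name):
        # Print only if the text is not whitespace-only.
        if self.data.strip():
            self.pairs.append(f"{name}={self.data}")
        self.data = ""


handler = PairHandler()
xml.sax.parseString(FILE1.encode("utf-8"), handler)
print("\n".join(handler.pairs))
```

    Without the reset in startElement, a3 would be reported with the leftover text of a2, which is the a3=ABC line in the second run above.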
    Hermann Peifer, Jun 12, 2008
    #7
  8. Hermann Peifer wrote:
    > Jürgen Kahrs wrote:
    >> Hermann Peifer wrote:
    >>
    >>>> The following script in XMLgawk does it:
    >>>>
    >>>> @load xml
    >>>> XMLCHARDATA { data = $0 }
    >>>> XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "="
    >>>> XMLATTR[i] }
    >>>> XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }
    >>>>
    >>>> It produces the output you wanted (except for a change in sequence).
    >>>
    >>> [:alnum:] delivers correct results with the given test data, but I
    >>> guess you meant:
    >>> XMLENDELEM && data ~ /[[:alnum:]]/ ...

    >>
    >> No, I thought [:alnum:] was sufficient.
    >> Does it really make a difference in this example ?

    >
    > [:alnum:] is treated as list of characters: 'a', 'l', 'm', 'n', 'u', and
    > ':', whereas [[:alnum:]] is treated as a character class. The former
    > will match 'Hello', but not 'HELLO', whereas the latter will match both.


    Thanks for the reminder.

    You know both languages equally well (XSL and XMLgawk).
    Would you prefer the XSL solution that was posted here ?
    Jürgen Kahrs, Jun 12, 2008
    #8
  9. On Jun 12, 10:53 pm, Jürgen Kahrs wrote:

    > You know both languages equally well (XSL and XMLgawk).
    > Would you prefer the XSL solution that was posted here ?


    My rule of thumb is:

    Big files (say: 100+ MB), with a flat, regular structure -> XMLgawk
    Small files with many optional and/or empty elements -> XSL

    Hermann
    Hermann Peifer, Jun 13, 2008
    #9
  10. Hermann Peifer wrote:
    > Big files (say: 100+ MB), with a flat, regular structure -> XMLgawk
    > Small files with many optional and/or empty elements -> XSL


    Depends in part on the XSLT processor, of course. Some handle large
    documents better than others.
    Joseph J. Kesselman, Jun 13, 2008
    #10
  11. On Jun 13, 3:55 pm, Joseph J. Kesselman wrote:
    > Hermann Peifer wrote:
    > > Big files (say: 100+ MB), with a flat, regular structure -> XMLgawk
    > > Small files with many optional and/or empty elements -> XSL

    >
    > Depends in part on the XSLT processor, of course. Some handle large
    > documents better than others.


    Of course. Reality is not as black and white as my rule of thumb
    suggests. Would you have any pointer to some helpful XSLT processor
    comparison/benchmarking?

    BTW, another rule of thumb is:

    Transformation: XML to text, with regex string processing -> XMLgawk
    Transformation: XML to XML (in my context usually: XML to KML) -> XSL

    Hermann
    Hermann Peifer, Jun 13, 2008
    #11
  12. Hermann Peifer wrote:
    > Of course. Reality is not as black and white as my rule of thumb
    > suggests. Would you have any pointer to some helpful XSLT processor
    > comparison/benchmarking?


    Most of what I've been doing has been using the W3C/NIST XPath and XSLT
    conformance suites (pointed to from http://www.w3.org/QA/TheMatrix),
    test sets such as the DataPower (now IBM) XSLTMark kernels (described at
    http://www.xml.com/pub/a/2001/03/28/xsltmark/), or customer datasets
    (which for obvious reasons I can't share).

    I do know that the XSLT processor in the DataPower product can recognize
    at least some cases where a document can be processed in a streaming
    manner rather than reading it all into memory at once. That depends on
    the nature of the stylesheet, of course; I'm not sure exactly where the
    current limits are. But when this optimization works, it permits
    handling huge documents and reduces latency, both of which are good
    things. Websearch on "DataPower streaming" finds some discussion of this.
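
    To make the idea of streaming concrete (this says nothing about how DataPower implements it), here is a small Python sketch using ElementTree's iterparse, which hands over each element as soon as its end tag is seen so that it can be processed and then emptied. The function name stream_pairs is hypothetical, and the emptied elements still stay attached to their parents, so memory use is reduced rather than constant.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
</def>
<ghi>
<jkl vwx="12345678" > </jkl>
</ghi>
</abc>"""


def stream_pairs(source):
    """Yield 'name=value' strings while the document is still being parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Note: with "end" events, a parent's attributes are emitted
        # after its children have been handled.
        for attr, value in elem.attrib.items():
            yield f"{elem.tag}_{attr}={value}"
        if elem.text and elem.text.strip():
            yield f"{elem.tag}={elem.text}"
        elem.clear()  # drop this element's children, text and attributes


pairs = list(stream_pairs(io.BytesIO(SAMPLE)))
print("\n".join(pairs))
```

    Whether an XSLT stylesheet can be handled this way depends on whether it ever needs to look back at, or ahead of, the node currently being processed.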

    I don't think Apache Xalan has any true streaming capability yet, though
    we've wanted it for many years. However, Xalan's internal data model
    (DTM) is considerably more space-efficient than a standard Java DOM,
    which improves its ability to handle large documents. (We had a version
    of DTM which reduced overhead to only 16 bytes per XML node -- but
    compressing things that far cost us some performance and imposed some
    limitations we didn't like, so we had to let it grow a bit.)

    I haven't used XMLgawk. But part of the point of XML is precisely that
    adopting a shared (and relatively simple) syntax eases the task of
    writing useful and reusable tools, and there's certainly a large amount
    of "let a thousand flowers bloom" built into that assumption. I prefer
    to stick to the W3C's standardized tools as much as possible, both to
    push those to improve and for best portability of my work, but if
    another tool does something XSLT really can't, or does it far better
    than the copy of XSLT you have available to you, I'm not going to tell
    you not to use it.
    Joseph J. Kesselman, Jun 13, 2008
    #12
  13. Joseph J. Kesselman wrote:
    > Hermann Peifer wrote:
    >> Of course. Reality is not as black and white as my rule of thumb
    >> suggests. Would you have any pointer to some helpful XSLT processor
    >> comparison/benchmarking?

    >
    > Most of what I've been doing has been using the W3C/NIST XPath and XSLT
    > conformance suites (pointed to from http://www.w3.org/QA/TheMatrix),
    > test sets such as the DataPower (now IBM) XSLTMark kernels (described at
    > http://www.xml.com/pub/a/2001/03/28/xsltmark/), or customer datasets
    > (which for obvious reasons I can't share).
    >
    > I do know that the XSLT processor in the DataPower product can recognize
    > at least some cases where a document can be processed in a streaming
    > manner rather than reading it all into memory at once. That depends on
    > the nature of the stylesheet, of course; I'm not sure exactly where the
    > current limits are. But when this optimization works, it permits
    > handling huge documents and reduces latency, both of which are good
    > things. Websearch on "DataPower streaming" finds some discussion of this.
    >
    > I don't think Apache Xalan has any true streaming capability yet, though
    > we've wanted it for many years. However, Xalan's internal data model
    > (DTM) is considerably more space-efficient than a standard Java DOM,
    > which improves its ability to handle large documents. (We had a version
    > of DTM which reduced overhead to only 16 bytes per XML node -- but
    > compressing things that far cost us some performance and imposed some
    > limitations we didn't like, so we had to let it grow a bit.)
    >
    > I haven't used XMLgawk. But part of the point of XML is precisely that
    > adopting a shared (and relatively simple) syntax eases the task of
    > writing useful and reusable tools, and there's certainly a large amount
    > of "let a thousand flowers bloom" built into that assumption. I prefer
    > to stick to the W3C's standardized tools as much as possible, both to
    > push those to improve and for best portability of my work, but if
    > another tool does something XSLT really can't, or does it far better
    > than the copy of XSLT you have available to you, I'm not going to tell
    > you not to use it.


    Thanks for the information.

    I can't remember ever coming across something that XSLT really can't do, but string processing is obviously not a strength of XSLT 1.0. I have read that this improved with version 2.0, but I have no experience of my own with it. For transforming large XML documents into text format, which in my context often includes some regex-based string processing, XMLgawk continues to be my favourite tool.

    Hermann
    Hermann Peifer, Jun 15, 2008
    #13