printing XML file with XSLT code

Stu · Jun 11, 2008

Being a newbie with XSLT, please excuse my naivety. In addition, I am
not sure whether what I want to do can be done with XSLT, so I
apologize up front for asking anything stupid.

I have a shell script that needs to get values from an XML file. What
I want to do is transform the XML into something more KSH friendly so
it can be easily parsed in my KSH script.

I would like to go through an entire XML document and for every
"element" and "element/attribute" print the associated value as an
NVP (name-value pair).

Assume the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
</def>
<ghi>
<jkl vwx="12345678" > </jkl>
</ghi>
</abc>

Below is my desired output. As you can see, for each element I print
"element=value" and for each attribute within an element I print
"element_attribute=value".

mno=2008-06-11-13:15:59
pqr=Hello
pqr_stu=World
jkl_vwx=12345678

Can somebody point me in the right direction or provide me with some
sample XSLT transformation code that can do this?

Keep in mind, I would like to keep this as generic as possible. That
is, I don't want to reference elements or attributes by name. I would
like something like this:

for each element
do
    if attribute
        print element_attribute=value
    else
        print element=value
done

As opposed to, say, searching for element "pqr" and printing its value.

Thanks to all who answer

Jürgen Kahrs · Jun 11, 2008

The following script in XMLgawk does it:

@load xml
XMLCHARDATA { data = $0 }
XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }

It produces the output you wanted (except for a change in sequence).

Stu said:
Can somebody point me in the right direction or provide me with some
sample XSLT transformation code that can do this.

Keep in mind, I would like to keep this as generic as possible. That
is I don't want to reference element or attributes by names.

The solution above _is_ generic in the sense you described.
Follow this link to the XMLgawk doc:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html

Peter Flynn · Jun 11, 2008

You could run the onsgmls validating parser to output ESIS (below)
which can trivially be processed by (eg) awk or similar.

?xml version="1.0" encoding="UTF-8"
(abc
-\n\012
(def
-\n\012
(mno
-2008-06-11-13:15:59
)mno
-\n\012
Astu CDATA World
(pqr
-Hello
)pqr
-\n\012
)def
-\n\012
(ghi
-\n\012
Avwx CDATA 12345678
(jkl
-
)jkl
-\n\012
)ghi
-\n\012
)abc

///Peter
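
As a rough illustration of how ESIS output like the above could be turned into name=value pairs, here is a minimal Python sketch. It assumes only the line types that appear in the post ("A" for attributes, "(" and ")" for element start and end, "-" for data); the helper name esis_to_nvp is made up for this example, and the escaped line breaks are simply treated as whitespace.

```python
# ESIS text copied from the post above (raw string keeps the backslash escapes).
ESIS = r"""?xml version="1.0" encoding="UTF-8"
(abc
-\n\012
(def
-\n\012
(mno
-2008-06-11-13:15:59
)mno
-\n\012
Astu CDATA World
(pqr
-Hello
)pqr
-\n\012
)def
-\n\012
(ghi
-\n\012
Avwx CDATA 12345678
(jkl
-
)jkl
-\n\012
)ghi
-\n\012
)abc"""


def esis_to_nvp(esis_text):
    """Turn ESIS lines into name=value strings (hypothetical helper)."""
    pairs = []
    pending_attrs = []  # "A" lines come before the "(" line of their element
    text = ""
    for line in esis_text.splitlines():
        kind, rest = line[:1], line[1:]
        if kind == "A":
            parts = rest.split(" ", 2)  # attribute name, type, value
            if len(parts) == 3:
                pending_attrs.append((parts[0], parts[2]))
        elif kind == "(":
            pairs.extend(f"{rest}_{name}={value}" for name, value in pending_attrs)
            pending_attrs.clear()
            text = ""
        elif kind == "-":
            # Treat the escaped line breaks as plain whitespace.
            text = rest.replace(r"\n", "\n").replace(r"\012", "\n")
        elif kind == ")":
            if text.strip():
                pairs.append(f"{rest}={text.strip()}")
            text = ""
    return pairs


pairs = esis_to_nvp(ESIS)
print("\n".join(pairs))
```

Attributes are printed when the element opens and text when it closes, so pqr_stu comes out before pqr.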

David Carlisle · Jun 12, 2008

$ saxon l.xml l.xsl

mno=2008-06-11-13:15:59
pqr_stu=World
pqr=Hello
jkl_vwx=12345678

is produced by:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="text()">
    <xsl:value-of select="concat('&#10;',name(..),'=',.)"/>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:value-of select="concat('&#10;',name(..),'_',name(.),'=',.)"/>
  </xsl:template>
  <xsl:template match="*">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>
</xsl:stylesheet>

David
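
For readers who want to see the same generic walk outside XSLT, the following is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not a translation of the stylesheet: it only looks at each element's attributes and leading text, strips surrounding whitespace, and ignores text that follows child elements. The function name name_value_pairs is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# The sample document from the original question, as bytes.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
</def>
<ghi>
<jkl vwx="12345678" > </jkl>
</ghi>
</abc>"""


def name_value_pairs(elem):
    """Yield element_attribute=value and element=value strings, depth first."""
    # Attributes first, mirroring select="@*|node()".
    for attr, value in elem.attrib.items():
        yield f"{elem.tag}_{attr}={value}"
    # Skip whitespace-only text, similar in effect to xsl:strip-space.
    text = (elem.text or "").strip()
    if text:
        yield f"{elem.tag}={text}"
    for child in elem:
        yield from name_value_pairs(child)


pairs = list(name_value_pairs(ET.fromstring(SAMPLE)))
print("\n".join(pairs))
```

No element or attribute name is hard-coded, which is the generic behaviour the question asked for. With namespaced documents, elem.tag would include the namespace URI in braces, so the names would need extra handling.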

Hermann Peifer · Jun 12, 2008

Jürgen Kahrs said:
The following script in XMLgawk does it:

@load xml
XMLCHARDATA { data = $0 }
XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }

It produces the output you wanted (except for a change in sequence).

[:alnum:] delivers correct results with the given test data, but I guess you meant:

XMLENDELEM && data ~ /[[:alnum:]]/ ...

Hermann

Jürgen Kahrs · Jun 12, 2008

Hermann said:
[:alnum:] delivers correct results with the given test data, but I guess
you meant:
XMLENDELEM && data ~ /[[:alnum:]]/ ...

No, I thought [:alnum:] was sufficient.
Does it really make a difference in this example?

Hermann Peifer · Jun 12, 2008

Jürgen Kahrs said:
No, I thought [:alnum:] was sufficient.
Does it really make a difference in this example?

[:alnum:] is treated as a list of characters: 'a', 'l', 'm', 'n', 'u', and ':', whereas [[:alnum:]] is treated as a character class. The former will match 'Hello', but not 'HELLO', whereas the latter will match both. However, this doesn't make any difference with the given test data.
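
The character-list behaviour can be checked quickly in Python. This is only a sketch: Python's re module has no POSIX classes, so [0-9A-Za-z] is used here as an ASCII-only stand-in for [[:alnum:]], and the variable names are invented for the example.

```python
import re

# Single brackets: a plain list of the characters ':', 'a', 'l', 'n', 'u', 'm'.
char_list = re.compile(r"[:alnum:]")
# Stand-in for the POSIX class [[:alnum:]] (ASCII letters and digits only).
alnum_class = re.compile(r"[0-9A-Za-z]")

for sample in ("Hello", "HELLO", "2008-06-11-13:15:59"):
    print(sample, bool(char_list.search(sample)), bool(alnum_class.search(sample)))
```

'Hello' matches the list only because it contains 'l', and the timestamp only because it contains ':'.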

To make your script a bit more generic and robust (resetting the data for each new element, so that empty elements don't print stale text), I would go for:

$ cat hermann.awk
@load xml
XMLCHARDATA { data = $0 }
XMLSTARTELEM { data = ""; for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
XMLENDELEM && data !~ /^[[:space:]]*$/ { print XMLENDELEM "=" data }

See below the different results for this sample data:

$ cat file1
<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
<a1>.,-?(){}[]</a1>
<a2>ABC</a2><a3/>
</def>
<ghi>
<jkl vwx="12345678" > </jkl>
</ghi>
</abc>

$ xgawk -f hermann.awk file1
mno=2008-06-11-13:15:59
pqr_stu=World
pqr=Hello
a1=.,-?(){}[]
a2=ABC
jkl_vwx=12345678

$ xgawk -f juergen.awk file1
mno=2008-06-11-13:15:59
pqr_stu=World
pqr=Hello
a2=ABC
a3=ABC
jkl_vwx=12345678
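
The reset-on-start idea is not specific to XMLgawk. As a hedged illustration, here is the same event-driven logic sketched with Python's standard xml.parsers.expat module; the function name nvp_lines is made up, and unlike the awk script this version accumulates character data because expat may deliver text in several chunks.

```python
import xml.parsers.expat

FILE1 = """<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
<a1>.,-?(){}[]</a1>
<a2>ABC</a2><a3/>
</def>
<ghi>
<jkl vwx="12345678" > </jkl>
</ghi>
</abc>""".encode("utf-8")


def nvp_lines(xml_bytes):
    """Collect name=value lines from start, end and character data events."""
    lines = []
    data = []  # character data seen since the most recent tag

    def start(name, attrs):
        data.clear()  # reset, so an empty element cannot inherit stale text
        for attr, value in attrs.items():
            lines.append(f"{name}_{attr}={value}")

    def chars(text):
        data.append(text)

    def end(name):
        text = "".join(data)
        if text.strip():  # same intent as data !~ /^[[:space:]]*$/
            lines.append(f"{name}={text}")
        data.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_bytes, True)
    return lines


lines = nvp_lines(FILE1)
print("\n".join(lines))
```

Because the data is cleared on every start tag, the empty a3 element prints nothing instead of repeating the text of a2.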

Jürgen Kahrs · Jun 12, 2008

Hermann said:
[:alnum:] is treated as a list of characters: 'a', 'l', 'm', 'n', 'u', and
':', whereas [[:alnum:]] is treated as a character class. The former
will match 'Hello', but not 'HELLO', whereas the latter will match both.

Thanks for the reminder.

You know both languages equally well (XSL and XMLgawk).
Would you prefer the XSL solution that was posted here?

Hermann Peifer · Jun 13, 2008

Jürgen Kahrs said:
You know both languages equally well (XSL and XMLgawk).
Would you prefer the XSL solution that was posted here?

My rule of thumb is:

Big files (say: 100+ MB), with a flat, regular structure -> XMLgawk
Small files with many optional and/or empty elements -> XSL

Hermann

Joseph J. Kesselman · Jun 13, 2008

Hermann said:
Big files (say: 100+ MB), with a flat, regular structure -> XMLgawk
Small files with many optional and/or empty elements -> XSL

Depends in part on the XSLT processor, of course. Some handle large
documents better than others.

Hermann Peifer · Jun 13, 2008

Joseph J. Kesselman said:
Depends in part on the XSLT processor, of course. Some handle large
documents better than others.

Of course. Reality is not as black and white as my rule of thumb
suggests. Would you have any pointer to some helpful XSLT processor
comparison/benchmarking?

BTW, another rule of thumb is:

Transformation: XML to text, with regex string processing -> XMLgawk
Transformation: XML to XML (in my context usually: XML to KML) -> XSL

Hermann

Joseph J. Kesselman · Jun 13, 2008

Hermann said:
Of course. Reality is not as black and white as my rule of thumb
suggests. Would you have any pointer to some helpful XSLT processor
comparison/benchmarking?

Most of what I've been doing has been using the W3C/NIST XPath and XSLT
conformance suites (pointed to from http://www.w3.org/QA/TheMatrix),
test sets such as the DataPower (now IBM) XSLTMark kernels (described at
http://www.xml.com/pub/a/2001/03/28/xsltmark/), or customer datasets
(which for obvious reasons I can't share).

I do know that the XSLT processor in the DataPower product can recognize
at least some cases where a document can be processed in a streaming
manner rather than reading it all into memory at once. That depends on
the nature of the stylesheet, of course; I'm not sure exactly where the
current limits are. But when this optimization works, it permits
handling huge documents and reduces latency, both of which are good
things. Websearch on "DataPower streaming" finds some discussion of this.

I don't think Apache Xalan has any true streaming capability yet, though
we've wanted it for many years. However, Xalan's internal data model
(DTM) is considerably more space-efficient than a standard Java DOM,
which improves its ability to handle large documents. (We had a version
of DTM which reduced overhead to only 16 bytes per XML node -- but
compressing things that far cost us some performance and imposed some
limitations we didn't like, so we had to let it grow a bit.)

I haven't used XMLgawk. But part of the point of XML is precisely that
adopting a shared (and relatively simple) syntax eases the task of
writing useful and reusable tools, and there's certainly a large amount
of "let a thousand flowers bloom" built into that assumption. I prefer
to stick to the W3C's standardized tools as much as possible, both to
push those to improve and for best portability of my work, but if
another tool does something XSLT really can't, or does it far better
than the copy of XSLT you have available to you, I'm not going to tell
you not to use it.
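
To make the streaming idea a little more concrete: the sketch below uses Python's standard xml.etree.ElementTree.iterparse to emit name=value lines while the document is still being read, clearing each element once it has been handled. This only illustrates the general technique and says nothing about how DataPower or Xalan work internally. The function name stream_nvp is an assumption, and the cleared elements stay attached to their parent, so memory use is reduced rather than strictly constant.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
</def>
<ghi>
<jkl vwx="12345678" > </jkl>
</ghi>
</abc>"""


def stream_nvp(source):
    """Yield name=value lines as each element is completed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        for attr, value in elem.attrib.items():
            yield f"{elem.tag}_{attr}={value}"
        text = (elem.text or "").strip()
        if text:
            yield f"{elem.tag}={text}"
        elem.clear()  # drop this element's text, attributes and children


lines = list(stream_nvp(io.BytesIO(SAMPLE)))
print("\n".join(lines))
```

Since only "end" events are used, a parent's attributes would be reported after those of its children, which is a difference from document order worth keeping in mind.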

Hermann Peifer · Jun 15, 2008

Thanks for the information.

I can't remember ever coming across something that XSLT really can't do, but string processing is obviously not a strength of XSLT 1.0. I have read that this improved with version 2.0, but I have no experience of my own with it. For transforming large XML documents into text format, which in my context often includes some regex-based string processing, XMLgawk continues to be my favourite tool.

Hermann
