Convert xml to CSV using xsltproc

loc

I'm trying to convert an XML file into CSV using xsltproc.

#file.xml
<?xml-version ="1.0"standalone="no"?>
<NAXML-POSJournal version="3.3">
<TransmissionHeader>
<StoreLocationID>207</StoreLocationID>
</TransmissionHeader>
<JournalReport>
<JournalHeader>
<ReportSequenceNumber>74</ReportSequenceNumber>
<PrimaryReportPeriod>2</PrimaryReportPeriod>
<SecondaryReportPeriod>1</SecondaryReportPeriod>
<BeginDate>2010-02-11</BeginDate>
<BeginTime>03:58:42</BeginTime>
<EndDate>2100-01-01</EndDate>
<EndTime>00:00:00</EndTime>
</JournalHeader>
<SaleEvent>
<BusinessDate>2010-02-11</BusinessDate>
<TransactionDetailGroup>
<TransactionLine status="normal">
<ItemLine>
<ItemCode>
<POSCodeFormat format="upcA"></POSCodeFormat>
<POSCode>028400079037</POSCode>
<POSCodeModifier name="pc">1</POSCodeModifier>
</ItemCode>
</ItemLine>
</TransactionLine>
<TransactionLine status="normal">
<ItemLine>
<ItemCode>
<POSCodeFormat format="upcA"></POSCodeFormat>
<POSCode>049000051148</POSCode>
<POSCodeModifier name="pc">1</POSCodeModifier>
</ItemCode>
</ItemLine>
</TransactionLine>
</TransactionDetailGroup>
</SaleEvent>
</JournalReport>
</NAXML-POSJournal>


Here is the stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="NAXML-POSJournal/JournalReport/SaleEvent/TransactionDetailGroup">
    <xsl:for-each select="*">
      <xsl:value-of select="."/>
      <xsl:text>,</xsl:text>
      <xsl:if test="not(position() = last())">
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>


The output I'm looking for is the values of the following elements:

<StoreLocationID>,<BusinessDate>,<POSCodeFormat>,<POSCode>

I'd like to get a new line with that info for each <TransactionLine>.
How can I make this work?
 
Martin Honnen

loc said:
I'd like to get a new line with that info for each <TransactionLine>.
How can I make this work?

Then process each 'TransactionLine' element and output the values you want:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:strip-space elements="*"/>
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates
      select="NAXML-POSJournal/JournalReport/SaleEvent/TransactionDetailGroup/TransactionLine"/>
  </xsl:template>

  <xsl:template match="TransactionLine">
    <xsl:value-of select="/NAXML-POSJournal/TransmissionHeader/StoreLocationID"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="/NAXML-POSJournal/JournalReport/SaleEvent/BusinessDate"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="ItemLine/ItemCode/POSCodeFormat/@format"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="ItemLine/ItemCode/POSCode"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
 
loc

Thanks, it works great, just what I wanted. One question: I'm trying to
get a better understanding of how this works. Why isn't a for-each
needed, even though there are multiple matches for <TransactionLine> and
the data under it?
 
Martin Honnen

The xsl:apply-templates with
select="NAXML-POSJournal/JournalReport/SaleEvent/TransactionDetailGroup/TransactionLine"
selects all 'TransactionLine' elements for processing, and the template
matching 'TransactionLine' is then instantiated once for each of them.
That is why you do not need a for-each.
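For comparison, here is a sketch of the same transformation written with an explicit xsl:for-each in place of the separate TransactionLine template; it should produce the same two CSV lines. It is wrapped in a small shell function so it can be run with xsltproc. The function name naxml_to_csv_foreach is hypothetical, and the input file is assumed to start with a legal XML declaration (or none at all).

```shell
#!/bin/sh
# Sketch: the CSV extraction from this thread, rewritten with xsl:for-each
# instead of a separate match="TransactionLine" template.
# naxml_to_csv_foreach is a hypothetical helper name; $1 is the input file.
naxml_to_csv_foreach() {
  # Write the stylesheet to a temporary file for xsltproc.
  xsl=$(mktemp) || return 1
  cat > "$xsl" <<'EOF'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- One iteration per TransactionLine, in document order. -->
    <xsl:for-each select="NAXML-POSJournal/JournalReport/SaleEvent/TransactionDetailGroup/TransactionLine">
      <!-- The context node here is the current TransactionLine, just as it
           is inside a match="TransactionLine" template. -->
      <xsl:value-of select="/NAXML-POSJournal/TransmissionHeader/StoreLocationID"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="/NAXML-POSJournal/JournalReport/SaleEvent/BusinessDate"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="ItemLine/ItemCode/POSCodeFormat/@format"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="ItemLine/ItemCode/POSCode"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
  # Run the transformation, then clean up and keep xsltproc's exit status.
  xsltproc "$xsl" "$1"
  rc=$?
  rm -f "$xsl"
  return "$rc"
}
```

Usage would be `naxml_to_csv_foreach sale.xml`. Because the context node inside the for-each is each TransactionLine in turn, the relative paths such as ItemLine/ItemCode/POSCode work unchanged; the for-each is effectively an inline version of the matching template.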
 
loc

It's giving me the data I want, but there is an error: it doesn't like
the first line of my XML file:

bash$ xsltproc style.xsl sale.xml
post.xml:1: parser warning : xmlParsePITarget: invalid name prefix
'xml'
<?xml-version ="1.0"standalone="no"?>
^
207,2010-02-11,upcA,028400079037
207,2010-02-11,upcA,049000051148

Should I just cut that line off? Without it, it works with no error. I
don't have control over how the XML file is generated, but I could
modify it with `sed' or just delete that line.
 
Martin Honnen

loc said:
<?xml-version ="1.0"standalone="no"?>

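I am afraid that's not XML. A legal XML declaration is
<?xml version="1.0" standalone="no"?>
With the hyphen, the parser reads `xml-version' as the target of a
processing instruction rather than as an XML declaration, and targets
starting with 'xml' are reserved, hence the warning. So you will need to
fix that line if you want to parse the file as XML (which you need to do
to process it with an XSLT processor).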
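If the generator cannot be changed, a sed filter can repair the declaration in the pipeline instead of deleting it. This is a sketch that assumes the first line always has exactly the broken form shown in this thread; the helper name fix_xml_decl is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the malformed first line seen in this thread,
#   <?xml-version ="1.0"standalone="no"?>
# into the legal declaration
#   <?xml version="1.0" standalone="no"?>
# Only line 1 is edited, and only when it matches the broken form;
# every other line passes through unchanged. Reads stdin, writes stdout.
fix_xml_decl() {
  # In a sed basic regular expression, '<' and '?' are literal characters.
  sed '1s/^<?xml-version *= *"1.0" *standalone/<?xml version="1.0" standalone/'
}
```

For example, `fix_xml_decl < sale.xml | xsltproc style.xsl -` reads the repaired document from standard input via `-`. Unlike deleting the line, this keeps the standalone="no" information in the document.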
 
Joe Kesselman

loc said:
Why isn't a for-each needed, even though there are multiple matches for
<TransactionLine> and the data under it?

xsl:apply-templates operates on all the nodes selected by its select
expression. Effectively, that's an implied for-each.

(Actually, it may be better to reverse that and think of for-each as
applying a private inline template, but it isn't obvious why that's true
until you've worked with XSLT for a while.)

--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Joe Kesselman

loc said:
<?xml-version ="1.0"standalone="no"?>
^

The space it's pointing to, after xml-version, is legal per the XML
Recommendation (see http://www.w3.org/TR/REC-xml/#NT-XMLDecl,
particularly production 25), assuming it really is a space character
rather than an &nbsp;.

But the fact that you're missing a space before "standalone" is
definitely an error. See http://www.w3.org/TR/REC-xml/#NT-SDDecl
(production 32); note that it requires a leading whitespace character.

So: If the latter is really in your file, whatever's producing that
document is not generating well-formed XML. Fix it (preferable, since it
will continue to upset everything that has to interface with it), or
preprocess to fix this problem.

If that doesn't cure the problem, and you're sure the space after
xml-version really is an XML whitespace character, that would appear to
be a bug in your XML parser.

 
Hermann Peifer

One "quick fix" for the malformed first line would be to simply drop it:

bash$ awk 'NR>1' sale.xml | xsltproc style.xsl -

Hermann