xsltproc question - I am clueless and a newbie, so don't be too rough on me!

Discussion in 'XML' started by Glen Millard, Mar 11, 2012.

  1. Glen Millard

    Glen Millard Guest

    Okay, I have an XML file that I get from a provider:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/css" href="basic.xsl"?>
    <content>
    <status>ok</status>
    <faxes>
    <faxid>21404974</faxid>
    <date>2012-01-05 07:34:10</date>
    <source>5194852368</source>
    <destination>8885216725</destination>
    <status>read</status>
    <faxid>23223059</faxid>
    <date>2012-03-01 07:27:52</date>
    <source>5194211862</source>
    <destination>8885216725</destination>
    <status>new</status>
    <faxid>23223164</faxid>
    <date>2012-03-01 07:29:45</date>
    <source>5194211862</source>
    <destination>8885210692</destination>
    <status>new</status>
    <faxid>23224287</faxid>
    <date>2012-03-01 07:51:07</date>
    <source>8885216725</source>
    <destination>8885210692</destination>
    <status>new</status>
    </faxes></content>

    I want to be able to import/parse this into a MySQL database. However, I need to reformat it so that it is 'database-centric'. I was going to use a simple script to use find/replace with a sed and regular expressions - which works.

    However, I am going to need to do this multiple times per day and figured that an xslt processor would be more efficient.

    So, can someone get me started? I think that the root element being <database></database> would be a start.

    I am just kind of clueless on this - I am not looking for someone to do it for me, just a little hand-holding on how to create an xsl stylesheet to rename the elements.

    This is the type of format that I want to achieve:


    <database>
    <status>ok</status>
    <faxes>
    <row>
    <faxid>21404974</faxid>
    <date>2012-01-05 07:34:10</date>
    <source>5194852368</source>
    <destination>8885216725</destination>
    <status>read</status>
    </row>
    <row>
    <faxid>23223059</faxid>
    <date>2012-03-01 07:27:52</date>
    <source>5194211862</source>
    <destination>8885216725</destination>
    <status>new</status>
    </row>
    <row>
    <faxid>23223164</faxid>
    <date>2012-03-01 07:29:45</date>
    <source>5194211862</source>
    <destination>8885210692</destination>
    <status>new</status>
    </row>
    <row>
    <faxid>23224287</faxid>
    <date>2012-03-01 07:51:07</date>
    <source>8885216725</source>
    <destination>8885210692</destination>
    <status>new</status>
    </row>
    </faxes>
    </database>

    This way, I can use a parser (XML::Parser in Perl) to bring it into a MySQL database.

    So, I guess substituting/replacing tags is what I need to do.

    I guess I just don't understand the syntax of xsl stylesheets.

    Thanks - Glen
    Glen Millard, Mar 11, 2012
    #1

  2. Simon Wright

    Simon Wright Guest

    Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Glen Millard <> writes:

    > Okay, I have an XML file that I get from a provider:


    > I want to be able to import/parse this into a MySQL database. However,
    > I need to reformat it so that it is 'database-centric'. I was going to
    > use a simple script to use find/replace with a sed and regular
    > expressions - which works.


    > So, can someone get me started? I think that the root element being
    > <database></database> would be a start.


    This should be a start - it doesn't do the <status> element (so your
    first 'ok' comes out in the wrong place), and I've only done a couple of
    the elements. Also, it relies on your provider (who should really know
    better and put in the <row> (or <fax>) elements in the first place!) not
    to omit any elements; if one of the <source> elements were missing, all
    the <source>s would be one row out from then on.

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <xsl:output method="xml" encoding="iso-8859-1" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="/content/faxes">
        <xsl:element name="database">
          <xsl:element name="faxes">
            <xsl:for-each select="faxid">
              <xsl:variable name="i" select="position()"/>
              <xsl:element name="row">
                <xsl:element name="faxid">
                  <xsl:value-of select="."/>
                </xsl:element>
                <xsl:element name="date">
                  <xsl:value-of select="../date[$i]"/>
                </xsl:element>
              </xsl:element>
            </xsl:for-each>
          </xsl:element>
        </xsl:element>
      </xsl:template>

    </xsl:stylesheet>
    Simon Wright, Mar 11, 2012
    #2

  3. Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Haven't checked the logic, but I'd note that by using literal result
    elements you can simplify that slightly. Also, I'd leave out the
    encoding and let it stay in UTF8 unless you have some specific reason
    for doing otherwise. Finally, you might want to explicitly process only
    the <faxes> elements, and make sure anything else doesn't contribute, by
    adding a root template which selects only those for processing.

    But, yeah, whoever created that XML in the first place should be ashamed
    of themselves for not structuring it better.


    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="/">
        <xsl:apply-templates select="/content/faxes"/>
      </xsl:template>

      <xsl:template match="/content/faxes">
        <database>
          <faxes>
            <xsl:for-each select="faxid">
              <xsl:variable name="i" select="position()"/>
              <row>
                <faxid>
                  <xsl:value-of select="."/>
                </faxid>
                <date>
                  <xsl:value-of select="../date[$i]"/>
                </date>
              </row>
            </xsl:for-each>
          </faxes>
        </database>
      </xsl:template>

    </xsl:stylesheet>


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Mar 12, 2012
    #3
  4. Simon Wright

    Simon Wright Guest

    Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Joe Kesselman <> writes:

    > Haven't checked the logic, but I'd note that by using literal result
    > elements you can simplify that slightly. Also, I'd leave out the
    > encoding and let it stay in UTF8 unless you have some specific reason
    > for doing otherwise. Finally, you might want to explicitly process
    > only the <faxes> elements, and make sure anything else doesn't
    > contribute, by adding a root template which selects only those for
    > processing.


    All good points; the encoding change was a copy-and-paste thing.

    OP wanted the content/status to be copied too, so I tried

    ....
    <xsl:template match="/">
    <database>
    <xsl:apply-templates select="/content/status"/>
    <xsl:apply-templates select="/content/faxes"/>
    </database>
    </xsl:template>

    <xsl:template match="/content/status">
    <status><xsl:value-of select="."/></status>
    </xsl:template>

    <xsl:template match="/content/faxes">
    <faxes>
    ....
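
    Putting the pieces posted so far together, a complete version of the
    positional approach might look like the sketch below. It assumes that
    every record carries all five children (faxid, date, source, destination,
    status) in the same order, as in the sample input; if any of them can be
    missing, the positions drift and the rows come out wrong.

    ```xml
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <!-- Root: wrap everything in <database>, top-level status first. -->
      <xsl:template match="/">
        <database>
          <xsl:apply-templates select="/content/status"/>
          <xsl:apply-templates select="/content/faxes"/>
        </database>
      </xsl:template>

      <!-- The overall status ("ok") is copied through unchanged. -->
      <xsl:template match="/content/status">
        <status><xsl:value-of select="."/></status>
      </xsl:template>

      <xsl:template match="/content/faxes">
        <faxes>
          <xsl:for-each select="faxid">
            <!-- i is this faxid's 1-based position among the faxids. -->
            <xsl:variable name="i" select="position()"/>
            <row>
              <faxid><xsl:value-of select="."/></faxid>
              <!-- The i-th date, source, ... are assumed to belong to the i-th faxid. -->
              <date><xsl:value-of select="../date[$i]"/></date>
              <source><xsl:value-of select="../source[$i]"/></source>
              <destination><xsl:value-of select="../destination[$i]"/></destination>
              <status><xsl:value-of select="../status[$i]"/></status>
            </row>
          </xsl:for-each>
        </faxes>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Saved under a name such as faxes.xsl (a made-up file name), it could be
    run as "xsltproc faxes.xsl input.xml".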
    Simon Wright, Mar 12, 2012
    #4
  5. Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Glen Millard <> writes:

    > <?xml version="1.0" encoding="UTF-8"?>
    > <?xml-stylesheet type="text/css" href="basic.xsl"?>
    > <content>
    > <status>ok</status>
    > <faxes>
    > <faxid>21404974</faxid>
    > <date>2012-01-05 07:34:10</date>
    > <source>5194852368</source>
    > <destination>8885216725</destination>
    > <status>read</status>
    > <faxid>23223059</faxid>
    > <date>2012-03-01 07:27:52</date>
    > <source>5194211862</source>
    > <destination>8885216725</destination>
    > <status>new</status>

    [...]
    > </faxes></content>

    [to]

    > <database>
    > <status>ok</status>
    > <faxes>
    > <row>
    > <faxid>21404974</faxid>
    > <date>2012-01-05 07:34:10</date>
    > <source>5194852368</source>
    > <destination>8885216725</destination>
    > <status>read</status>
    > </row>
    > <row>
    > <faxid>23223059</faxid>
    > <date>2012-03-01 07:27:52</date>
    > <source>5194211862</source>
    > <destination>8885216725</destination>
    > <status>new</status>
    > </row>

    [...]
    > </faxes>
    > </database>


    XPath also has the nice concept of an "axis", which lets you traverse the
    tree along various, well, axes. In your case, it means that whenever
    you've found a <faxid>, you can find the following <date> etc. by
    moving along the "following-sibling" axis, so you can write:

    <xsl:template match="faxid">
      <xsl:variable name="date" select="./following-sibling::date[1]"/>
      <xsl:variable name="source" select="./following-sibling::source[1]"/>

      <row>
        <faxid><xsl:value-of select="."/></faxid>
        <date><xsl:value-of select="$date"/></date>
        <source><xsl:value-of select="$source"/></source>
        ...
      </row>
    </xsl:template>

    The stylesheet then only needs to apply this template to all <faxid>
    elements.
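
    As a rough sketch of what that could look like as a whole stylesheet
    (the wrapper template for <content> is an assumption based on the output
    format requested in the first post, and the five child names are taken
    from the sample input):

    ```xml
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="/content">
        <database>
          <!-- Top-level status of the whole response. -->
          <status><xsl:value-of select="status"/></status>
          <faxes>
            <!-- Only <faxid> elements are dispatched; each one builds a row. -->
            <xsl:apply-templates select="faxes/faxid"/>
          </faxes>
        </database>
      </xsl:template>

      <xsl:template match="faxid">
        <row>
          <faxid><xsl:value-of select="."/></faxid>
          <!-- [1] on a forward axis picks the nearest following sibling. -->
          <date><xsl:value-of select="following-sibling::date[1]"/></date>
          <source><xsl:value-of select="following-sibling::source[1]"/></source>
          <destination><xsl:value-of select="following-sibling::destination[1]"/></destination>
          <status><xsl:value-of select="following-sibling::status[1]"/></status>
        </row>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Because the document node has no template of its own here, the built-in
    rule hands processing on to <content>, which is where the wrapper is
    produced.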

    As others have said, the format of your input document should be
    changed. All solutions in this thread would stop working if some <faxid>
    "qualifiers" (like <date>) become optional.
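
    If changing the input is not an option, one possible way to tolerate
    missing qualifiers is to keep only those following siblings whose nearest
    preceding <faxid> is the current one. This is just a sketch: it assumes
    that <faxid> itself is always present and always starts a record, and it
    simply leaves out of the row whatever the provider left out.

    ```xml
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="/content">
        <database>
          <xsl:copy-of select="status"/>
          <faxes>
            <xsl:apply-templates select="faxes/faxid"/>
          </faxes>
        </database>
      </xsl:template>

      <xsl:template match="faxid">
        <!-- Identify this faxid so that siblings can be tested against it. -->
        <xsl:variable name="id" select="generate-id(.)"/>
        <row>
          <xsl:copy-of select="."/>
          <!-- Copy each following non-faxid sibling whose closest preceding
               faxid is this one, i.e. everything up to the next faxid. -->
          <xsl:copy-of select="following-sibling::*[not(self::faxid)]
                               [generate-id(preceding-sibling::faxid[1]) = $id]"/>
        </row>
      </xsl:template>

    </xsl:stylesheet>
    ```

    The price is that every <faxid> rescans its following siblings, which
    should not matter for files of this size.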

    -- Alain.
    Alain Ketterlin, Mar 12, 2012
    #5
  6. Glen Millard

    Glen Millard Guest

    Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Sheldon;

    That is a HUGE help!

    Now that I see how the syntax works, I think I can take it from there. I don't know if the status tag is relevant, but I will check with my client.

    I think I get the idea - thank you much again.

    Glen

    Glen Millard, Mar 12, 2012
    #6
  7. Glen Millard

    Glen Millard Guest

    Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Okay, now for my next dumb question - ha.

    When I get the XML document delivered, this is EXACTLY what it looks like.

    <?xml version="1.0" encoding="UTF-8"?>
    <content>
    <status>ok</status><faxes><faxid>21404974</faxid><date>2012-01-05 07:34:10</date><source>5194852368</source><destination>8885216725</destination><status>read</status><faxid>23223059</faxid><date>2012-03-01 07:27:52</date><source>5194211862</source><destination>8885216725</destination><status>new</status><faxid>23223164</faxid><date>2012-03-01 07:29:45</date><source>5194211862</source><destination>8885210692</destination><status>new</status><faxid>23224287</faxid><date>2012-03-01 07:51:07</date><source>8885216725</source><destination>8885210692</destination><status>new</status></faxes></content>

    Now, again, I am not asking for anyone to do my work for me, but what would be the best way (besides using an editor or fancy shell script).

    Looking at the provider's API calls, there does not seem to be any way to format it.

    So, I pose this question also - what would be the best way to format this XML document so that it is something workable?

    Thanks again, everyone.

    Glen

    Glen Millard, Mar 12, 2012
    #7
  8. Simon Wright

    Simon Wright Guest

    Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Glen Millard <> writes:

    > Okay, now for my next dumb question - ha.
    >
    > When I get the XML document delivered, this is EXACTLY what it looks
    > like.
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <content>
    > <status>ok</status><faxes><faxid>21404974</faxid><date>2012-01-05 07:34:10</date><source>5194852368</source><destination>8885216725</destination><status>read</status><faxid>23223059</faxid><date>2012-03-01 07:27:52</date><source>5194211862</source><destination>8885216725</destination><status>new</status><faxid>23223164</faxid><date>2012-03-01 07:29:45</date><source>5194211862</source><destination>8885210692</destination><status>new</status><faxid>23224287</faxid><date>2012-03-01 07:51:07</date><source>8885216725</source><destination>8885210692</destination><status>new</status></faxes></content>
    >
    > Now, again, I am not asking for anyone to do my work for me, but what
    > would be the best way (besides using an editor or fancy shell script).
    >
    > Looking at the provider's API calls, there does not seem to be any way
    > to format it.
    >
    > So, I pose this question also - what would be the best way to format
    > this XML document so that it is something workable?


    The XSLT processor shouldn't care about line breaks or other white space
    in the input; the collapsed input above produces the same output as the
    version in your original post.
    Simon Wright, Mar 12, 2012
    #8
  9. Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Glen Millard wrote:

    > So, I pose this question also - what would be the best way to format this XML document so that it is something workable?


    Well if you load such a document into a browser like IE or Firefox or
    Opera or Chrome then they show it in a formatted way (they show the
    document tree where you can collapse or expand levels).

    And XML editors or plugins for editors usually have an option to indent
    a document.

    If you want to do it yourself with XSLT then running the document
    through a stylesheet doing

    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0">

      <xsl:output method="xml" indent="yes"/>

      <xsl:template match="/">
        <xsl:copy-of select="."/>
      </xsl:template>

    </xsl:stylesheet>

    is also a way.
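
    A variation that may help if the output still comes out partly on one
    line: some processors appear to hold back on indenting inside elements
    that already contain whitespace text (the delivered file has a line break
    directly after <content>, for instance). Stripping whitespace-only text
    nodes from the input first, as the other stylesheets in this thread do,
    should sidestep that. This is a sketch under that assumption, not a
    guarantee about any particular processor.

    ```xml
    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0">

      <xsl:output method="xml" indent="yes"/>

      <!-- Drop whitespace-only text nodes so the serializer is free to indent. -->
      <xsl:strip-space elements="*"/>

      <!-- Copy the whole (stripped) input document unchanged. -->
      <xsl:template match="/">
        <xsl:copy-of select="."/>
      </xsl:template>

    </xsl:stylesheet>
    ```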


    --

    Martin Honnen --- MVP Data Platform Development
    http://msmvps.com/blogs/martin_honnen/
    Martin Honnen, Mar 12, 2012
    #9
  10. Glen Millard

    Glen Millard Guest

    Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    Hey all - I discovered an easier way around this. A utility called tidy - it is even in the Linux distro.

    It takes my XML file and formats it quite nicely upon download.

    This is one of the things I was looking for.

    Thanks again

    Glen

    Glen Millard, Mar 12, 2012
    #10
  11. Peter Flynn

    Peter Flynn Guest

    Re: xsltproc question - I am clueless and a newbie, so don't be too rough on me!

    On 12/03/12 03:45, Joe Kesselman wrote:
    > Haven't checked the logic, but I'd note that by using literal result
    > elements you can simplify that slightly. Also, I'd leave out the
    > encoding and let it stay in UTF8 unless you have some specific reason
    > for doing otherwise. Finally, you might want to explicitly process only
    > the <faxes> elements, and make sure anything else doesn't contribute, by
    > adding a root template which selects only those for processing.
    >
    > But, yeah, whoever created that XML in the first place should be ashamed
    > of themselves for not structuring it better.


    Amen. Unfortunately many people don't think before creating XML.

    [OP]
    >> I am clueless and a newbie, so don't be too rough on me!


    You are not the clueless one: that is your client, unfortunately.

    When I get a large quantity (or many repeat instances) of files like
    this, where the markup itself is regular enough to be SGML, but the
    design defective, I resort to invoking omissibility. This SGML document:

    <!doctype content [
    <!element content - - (status,faxes)>
    <!element status - - (#pcdata)>
    <!element faxes - - (fax)+>
    <!element fax o o (faxid,date,source,destination,status)>
    <!element faxid - - (#pcdata)>
    <!element date - - (#pcdata)>
    <!element source - - (#pcdata)>
    <!element destination - - (#pcdata)>
    ]>
    <content>
    <status>ok</status>
    <faxes>
    <faxid>21404974</faxid>
    <date>2012-01-05 07:34:10</date>
    <source>5194852368</source>
    <destination>8885216725</destination>
    <status>read</status>
    <faxid>23223059</faxid>
    <date>2012-03-01 07:27:52</date>
    <source>5194211862</source>
    <destination>8885216725</destination>
    <status>new</status>
    <faxid>23223164</faxid>
    <date>2012-03-01 07:29:45</date>
    <source>5194211862</source>
    <destination>8885210692</destination>
    <status>new</status>
    <faxid>23224287</faxid>
    <date>2012-03-01 07:51:07</date>
    <source>8885216725</source>
    <destination>8885210692</destination>
    <status>new</status>
    </faxes></content>

    can be processed with sgmlnorm to normalize it so that the missing <fax>
    and </fax> tags are inserted. Usually this is only worth doing as part
    of a workflow, so that it will create fully-normalized SGML which an XML
    processor will accept.

    >> When I get the XML document delivered, this is EXACTLY what it looks like.
    >>
    >> <?xml version="1.0" encoding="UTF-8"?>
    >> <content>
    >> <status>ok</status><faxes><faxid>21404974</faxid><date>2012-01-05
    >> 07:34:10</date><source>5194852368</source><destination>8885216725</destination><status>read</status><faxid>23223059</faxid><date>2012-03-01
    >> 07:27:52</date><source>5194211862</source><destination>8885216725</destination><status>new</status><faxid>23223164</faxid><date>2012-03-01
    >> 07:29:45</date><source>5194211862</source><destination>8885210692</destination><status>new</status><faxid>23224287</faxid><date>2012-03-01
    >> 07:51:07</date><source>8885216725</source><destination>8885210692</destination><status>new</status></faxes></content>


    That's fine. XML doesn't need to be pretty-printed unless you want to
    show it to a human. As Martin indicated, there are ways to pretty-print
    it if you need (and from a later post you discovered Tidy). But it's
    usually more effective to concentrate on making the markup processable
    rather than on making it look attractive.

    ///Peter
    Peter Flynn, Mar 12, 2012
    #11