XSLT to transform a "flat" XML file into a structured text file

Discussion in 'XML' started by R. P., Jun 21, 2006.

  1. R. P.

    R. P. Guest

    Subject: XSLT to transform a flat XML file into a structured text file

    I have an XML file that lists the PDF file segment names and titles of a
    larger document and looks something like this:

    <DOCUMENT>
    ......
    ...... some lead elements
    ......
    <SEGMENT_LIST>
    <SEGMENT FILE="fwd.pdf">Foreword</SEGMENT>
    <SEGMENT FILE="chap1.pdf">Chapter 1</SEGMENT>
    <SEGMENT FILE="chap2.pdf">Chapter 2</SEGMENT>
    <SEGMENT FILE="chap3.pdf">Chapter 3</SEGMENT>
    <SEGMENT FILE="v1fwd.pdf" VOLUME="Volume 1">Foreword</SEGMENT>
    <SEGMENT FILE="v1defs.pdf" VOLUME="Volume 1">Definitions</SEGMENT>
    <SEGMENT FILE="v1meth.pdf" VOLUME="Volume 1">Methodology</SEGMENT>
    <SEGMENT FILE="v1sachap1.pdf" VOLUME="Volume 1" GROUP="Section A">Chapter 1A</SEGMENT>
    <SEGMENT FILE="v1sachap2.pdf" VOLUME="Volume 1" GROUP="Section A">Chapter 2A</SEGMENT>
    <SEGMENT FILE="v1sachap3.pdf" VOLUME="Volume 1" GROUP="Section A">Chapter 3A</SEGMENT>
    <SEGMENT FILE="v1sbchap1.pdf" VOLUME="Volume 1" GROUP="Section B">Chapter 1B</SEGMENT>
    <SEGMENT FILE="v1sbchap2.pdf" VOLUME="Volume 1" GROUP="Section B">Chapter 2B</SEGMENT>
    <SEGMENT FILE="v1sbchap3.pdf" VOLUME="Volume 1" GROUP="Section B">Chapter 3B</SEGMENT>
    <SEGMENT FILE="appa.pdf" GROUP="Appendices">Appendix A</SEGMENT>
    <SEGMENT FILE="appb.pdf" GROUP="Appendices">Appendix B</SEGMENT>
    <SEGMENT FILE="appc.pdf" GROUP="Appendices">Appendix C</SEGMENT>
    </SEGMENT_LIST>
    </DOCUMENT>

    I need to transform the SEGMENT_LIST elements into a structured text
    file for use by another application to construct the table of contents
    (TOC). The file would be a vertical-bar (|) separated list of PDF file
    segment names and their titles, with a single-digit TOC indentation
    level indicator in the first position, like so:

    1|fwd.pdf|Foreword
    1|chap1.pdf|Chapter 1
    1|chap2.pdf|Chapter 2
    1|chap3.pdf|Chapter 3
    1||Volume 1
    2|v1fwd.pdf|Foreword
    2|v1defs.pdf|Definitions
    2|v1meth.pdf|Methodology
    2||Section A
    3|v1sachap1.pdf|Chapter 1A
    3|v1sachap2.pdf|Chapter 2A
    3|v1sachap3.pdf|Chapter 3A
    2||Section B
    3|v1sbchap1.pdf|Chapter 1B
    3|v1sbchap2.pdf|Chapter 2B
    3|v1sbchap3.pdf|Chapter 3B
    1||Appendices
    2|appa.pdf|Appendix A
    2|appb.pdf|Appendix B
    2|appc.pdf|Appendix C

    I think you can imagine from the transformed file what the TOC would
    look like:

    Foreword
    Chapter 1
    Chapter 2
    Chapter 3
    Volume 1
      Foreword
      Definitions
      Methodology
      Section A
        Chapter 1A
        Chapter 2A
        Chapter 3A
      Section B
        Chapter 1B
        Chapter 2B
        Chapter 3B
    Appendices
      Appendix A
      Appendix B
      Appendix C

    My problem is that while I find it easy to write an XSLT stylesheet to
    create the first 4 lines of the output file where the source XML does
    not have either of the optional VOLUME and GROUP attributes:

    <xsl:template match="/">
    <xsl:apply-templates select="/DOCUMENT/SEGMENT_LIST/*" />
    </xsl:template>

    <xsl:template match="SEGMENT">
    <xsl:text>1|</xsl:text>
    <xsl:value-of select="@FILE"/>
    <xsl:text>|</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>
    </xsl:text>
    </xsl:template>

    I have no idea, however, how to transform the rest of the XML, because
    I don't know how to process those attributes to turn them into Volume,
    Section and Appendices headers in the output file, one header for all
    the segments with the same attribute value, each with the proper indent
    level number.

    Any suggestion would be greatly appreciated.

    Rudy
    R. P., Jun 21, 2006
    #1

  2. Joris Gillis

    Joris Gillis Guest

    On Wed, 21 Jun 2006 04:07:44 +0200, R. P. wrote:

    > Subject: XSLT to transform a flat XML file into a structured text file

    You should probably look for a solution involving 'multi-level grouping',
    possibly with the Muenchian technique...

    In the meantime, you could try out this quick and dirty solution
    (I wouldn't use it in a production environment):

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="text"/>

    <xsl:template match="SEGMENT">
    <xsl:variable name="this" select="@*[not(name()='FILE')]"/>
    <xsl:variable name="that"
    select="preceding-sibling::SEGMENT[1]/@*[not(name()='FILE')]"/>

    <xsl:if test="$this[not(.=$that)] or count($this)!=count($that)">
    <xsl:value-of select="count($this)"/>||<xsl:value-of
    select="$this[not(.=$that)]"/>
    <xsl:text>
    </xsl:text>
    </xsl:if>

    <xsl:value-of select="count($this) + 1"/>|<xsl:value-of select="@FILE"/>
    <xsl:text>|</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>
    </xsl:text>
    </xsl:template>

    </xsl:stylesheet>


    regards,
    --
    Joris Gillis (http://users.telenet.be/root-jg/me.html)
    Gaudiam omnibus traderat W3C, nec vana fides
    Joris Gillis, Jun 21, 2006
    #2

  3. R. P.

    R. P. Guest

    "Joris Gillis" wrote:
    >
    > You should probably look for a solution involving 'multi-level
    > grouping', possibly with the Muenchian technique...
    >
    > In the meantime, you could try out this quick and dirty solution
    > (I wouldn't use it in a production environment):


    Thanks Joris, I wouldn't do it either. If for no other reason, it did
    not produce the desired results on my first attempt. :-( However, you
    gave me some tips on the direction I should be looking in for a
    solution, especially the term that describes my problem: "multi-level
    grouping." I didn't know there was a name for it.

    Regards,
    Rudy
    R. P., Jun 22, 2006
    #3
  4. Joe Kesselman

    Joe Kesselman Guest

    In case you aren't aware of it: Check the XSLT FAQ website's grouping
    and indexing pages; some of the techniques there are quite useful but
    not at all obvious.

    http://www.dpawson.co.uk/xsl/sect2/sect21.html

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Jun 22, 2006
    #4
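
For readers who land on this thread: the multi-level grouping suggested above can be sketched in XSLT 1.0 roughly as follows. This is not a solution posted by anyone in the thread, only a minimal sketch under some assumptions: the input is well-formed, the attribute is spelled GROUP on every grouped segment, and all segments of a volume or group sit next to each other in the file. The key names by-volume and by-group are made up for illustration. It borrows the key() plus generate-id() test from the Muenchian method to detect the first segment of each volume and group, and emits the header line just before that segment.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical key names: index segments by volume, and by the
       volume + group pair so equal group names in different volumes
       stay separate. A missing VOLUME contributes an empty string. -->
  <xsl:key name="by-volume" match="SEGMENT[@VOLUME]" use="@VOLUME"/>
  <xsl:key name="by-group" match="SEGMENT[@GROUP]"
           use="concat(@VOLUME, '|', @GROUP)"/>

  <!-- Visit only the segments, so text inside the lead elements is
       not copied to the output by the built-in template rules. -->
  <xsl:template match="/">
    <xsl:apply-templates select="DOCUMENT/SEGMENT_LIST/SEGMENT"/>
  </xsl:template>

  <xsl:template match="SEGMENT">
    <!-- Each attribute that is present pushes the segment one level deeper. -->
    <xsl:variable name="level"
                  select="1 + count(@VOLUME) + count(@GROUP)"/>

    <!-- Volume header, written by the first segment of each volume. -->
    <xsl:if test="@VOLUME and generate-id() =
                  generate-id(key('by-volume', @VOLUME)[1])">
      <xsl:value-of select="concat('1||', @VOLUME, '&#10;')"/>
    </xsl:if>

    <!-- Group header, written by the first segment of each group;
         it sits one level above the segments it introduces. -->
    <xsl:if test="@GROUP and generate-id() =
                  generate-id(key('by-group',
                                  concat(@VOLUME, '|', @GROUP))[1])">
      <xsl:value-of select="concat($level - 1, '||', @GROUP, '&#10;')"/>
    </xsl:if>

    <!-- The segment line itself: level|file|title. -->
    <xsl:value-of select="concat($level, '|', @FILE, '|', ., '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample SEGMENT_LIST from the first post, this should produce the bar-separated listing requested there. Because the segments are processed in document order and only the header test uses the keys, the output order follows the input; if segments of one group were scattered through the file, the header would appear only before the first of them, so a nested for-each over the key groups would be the safer choice in that case.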