Split element with another one

Discussion in 'XML' started by Eivind, Apr 29, 2005.

  1. Eivind

    Eivind Guest

    Hi,

    I'm creating XML files from printed documents. According to the DTD I
    have to use, there have to be pagebreaks in the XML file, located
    wherever a new page begins in the printed version. This is fairly
    simple to accomplish.
    The problem, however, is that the DTD states that a pagebreak cannot
    occur inside a paragraph element, but must be placed between them.
    Is it possible, using XSLT, to end the paragraph element before the
    pagebreak and start a new one after it?

    To illustrate:

    Illegal text block:
    <para>Blah blah
    <pagebreak/>
    more blah blah</para>

    Must become:
    <para>Blah blah</para>
    <pagebreak/>
    <para>more blah blah</para>

    I'm grateful for any help!

    regards,
    Eivind Andersen
    Eivind, Apr 29, 2005
    #1

  2. David Carlisle

    XSLT can do essentially arbitrary tree transformations, so the answer is
    yes, but in this case the transformation may be more or less hard
    depending on where pagebreak can appear. Do you know that it is always at
    the top level of para (which makes it fairly easy), or can it be nested
    anywhere, as in:

    <para>Blah blah <italic> xxx <bold> zzz</bold>
    <pagebreak/> yyy</italic>
    more blah blah</para>

    In the latter case things are "interesting", as you have to close an
    arbitrary number of elements, and things get more interesting still if
    the pagebreak appears in table markup and you have to correctly close all
    the elements and open up everything needed for a new table...

    Assuming the simple case, this is a grouping problem: you just want to
    group all children of para according to their position relative to
    pagebreak. Searching for "xslt grouping" on Google will show lots of
    possibilities.

    e.g.:

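    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- identity template: copy everything else unchanged -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- a para with no pagebreak child is copied unchanged -->
      <xsl:template match="para">
        <xsl:copy-of select="."/>
      </xsl:template>

      <!-- a para with pagebreak children is split at each one -->
      <xsl:template match="para[pagebreak]">
        <!-- attributes plus everything before the first pagebreak -->
        <para>
          <xsl:copy-of select="@*|pagebreak[1]/preceding-sibling::node()"/>
        </para>
        <xsl:for-each select="pagebreak">
          <xsl:copy-of select="."/>
          <para>
            <!-- re-copy attributes, you might not want that -->
            <xsl:copy-of select="../@*"/>
            <xsl:apply-templates select="following-sibling::node()[1]" mode="p"/>
          </para>
        </xsl:for-each>
      </xsl:template>

      <!-- mode p: copy this node, then move on to the next sibling -->
      <xsl:template match="node()" mode="p">
        <xsl:copy-of select="."/>
        <xsl:apply-templates select="following-sibling::node()[1]" mode="p"/>
      </xsl:template>

      <!-- the next pagebreak stops the sibling walk -->
      <xsl:template match="pagebreak" mode="p"/>

    </xsl:stylesheet>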
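
    For comparison, a sketch of the same top-level split using XSLT 2.0's
    xsl:for-each-group. This is a swapped-in technique, not the approach
    above, and needs a 2.0 processor such as Saxon. Like the 1.0 version it
    assumes every pagebreak is a direct child of para, and it re-copies the
    para attributes onto each new para:

    ```xslt
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- identity template for everything else -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="para[pagebreak]">
        <xsl:variable name="para" select="."/>
        <!-- each group starts at a pagebreak, except possibly the first,
             which holds whatever precedes the first pagebreak -->
        <xsl:for-each-group select="node()" group-starting-with="pagebreak">
          <!-- the context item is the first node of the current group -->
          <xsl:if test="self::pagebreak">
            <xsl:copy-of select="."/>
          </xsl:if>
          <para>
            <xsl:copy-of select="$para/@*"/>
            <!-- the group minus its leading pagebreak, if any -->
            <xsl:copy-of select="current-group() except self::pagebreak"/>
          </para>
        </xsl:for-each-group>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Grouping with group-starting-with does in one instruction what the
    mode="p" sibling walk does by hand. A pagebreak that is the last child
    of para still yields an empty para after it, as in the 1.0 version.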

    David
    David Carlisle, Apr 29, 2005
    #2

  3. Eivind

    Eivind Guest

    Wow! Thank you!

    Fortunately the pagebreaks won't occur inside a table, but it's possible
    to have one inside an <italic> or <bold> element.

    I haven't been able to test this code yet, but I'll get on it first thing
    Monday morning, and I'll report back a little bit later :).

    Again, thank you for the incredibly quick and helpful reply!

    Eivind
    Eivind, Apr 29, 2005
    #3

  4. > Fortunately the pagebreaks won't occur inside a table, but it's possible
    > to have one inside an <italic> or <bold> element.


    It's really a lot harder if that can happen.
    The general case, where you have to close an arbitrary number of elements,
    would need a completely different approach: essentially walking over
    the whole tree one node at a time, building up a data structure of
    currently open elements as you go along, i.e. implementing a parser in
    XSLT. This is certainly possible but probably not a lot of fun (it would
    be a bit more fun in XSLT 2 than in XSLT 1). But if you can tie down a
    specific list of bad things that can happen, in practice most cases can
    be done fairly easily in XSLT, usually, on a good day...
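
    A sketch of one way to cover the inline-nesting case in XSLT 1.0. It is
    not the tree-walking parser described above: it emits one para per
    segment between pagebreaks and, for each segment, copies the para's
    content filtered down to that segment, repeating any element (such as
    italic) that spans a break. It assumes, as Eivind says, that pagebreak
    never occurs inside table markup, and also that para elements are not
    nested. The names segment, seg, base and i are made up for illustration:

    ```xslt
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- identity template: everything outside split paragraphs is copied -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- a para with a pagebreak at any depth becomes one para per segment -->
      <xsl:template match="para[.//pagebreak]">
        <xsl:variable name="p" select="."/>
        <!-- pagebreaks before this para, subtracted to get local counts -->
        <xsl:variable name="base" select="count(preceding::pagebreak)"/>
        <!-- segment 0: content before the first pagebreak -->
        <xsl:call-template name="segment">
          <xsl:with-param name="p" select="$p"/>
          <xsl:with-param name="base" select="$base"/>
          <xsl:with-param name="i" select="0"/>
        </xsl:call-template>
        <!-- after the n-th pagebreak (document order) comes segment n -->
        <xsl:for-each select=".//pagebreak">
          <xsl:copy-of select="."/>
          <xsl:call-template name="segment">
            <xsl:with-param name="p" select="$p"/>
            <xsl:with-param name="base" select="$base"/>
            <xsl:with-param name="i" select="position()"/>
          </xsl:call-template>
        </xsl:for-each>
      </xsl:template>

      <!-- one output para holding only the content of segment $i -->
      <xsl:template name="segment">
        <xsl:param name="p"/>
        <xsl:param name="base"/>
        <xsl:param name="i"/>
        <para>
          <xsl:copy-of select="$p/@*"/>
          <xsl:apply-templates select="$p/node()" mode="seg">
            <xsl:with-param name="base" select="$base"/>
            <xsl:with-param name="i" select="$i"/>
          </xsl:apply-templates>
        </para>
      </xsl:template>

      <!-- leaf nodes: keep only if exactly $i pagebreaks of this para precede them -->
      <xsl:template match="text()|comment()|processing-instruction()" mode="seg">
        <xsl:param name="base"/>
        <xsl:param name="i"/>
        <xsl:if test="count(preceding::pagebreak) - $base = $i">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:template>

      <!-- elements: copy shallowly if their extent overlaps segment $i,
           then filter their children the same way -->
      <xsl:template match="*" mode="seg">
        <xsl:param name="base"/>
        <xsl:param name="i"/>
        <xsl:variable name="k" select="count(preceding::pagebreak) - $base"/>
        <xsl:if test="$k &lt;= $i and $i &lt;= $k + count(.//pagebreak)">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()" mode="seg">
              <xsl:with-param name="base" select="$base"/>
              <xsl:with-param name="i" select="$i"/>
            </xsl:apply-templates>
          </xsl:copy>
        </xsl:if>
      </xsl:template>

      <!-- pagebreaks are emitted between the paras, never inside them -->
      <xsl:template match="pagebreak" mode="seg"/>

    </xsl:stylesheet>
    ```

    On the nested italic/bold example earlier in the thread this should
    give, whitespace aside, <para>Blah blah <italic> xxx <bold> zzz</bold>
    </italic></para>, then the pagebreak, then <para><italic> yyy</italic>
    more blah blah</para>. An element that begins or ends exactly at a
    pagebreak is emitted empty in the neighbouring segment, and the repeated
    preceding:: counts make this slow on very large documents.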

    David
    David Carlisle, Apr 29, 2005
    #4
  5. Eivind

    Eivind Guest

    Hi,

    I've tried using the xsl templates you provided, and they seem to work
    quite well. However, the templates insert some new attributes into the
    para and pagebreak elements:

    <pagebreak xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:mml="http://www.w3.org/1998/Math/MathML">116</pagebreak>

    How can you remove these? (I must admit I don't entirely understand
    what's going on in the templates you gave me, so I don't see where the
    new attributes are inserted, or how to remove them.)

    Eivind
    Eivind, May 4, 2005
    #5
  6. > I've tried using the xsl templates you provided, and they seem to work
    > quite well. However, the templates insert some new attributes into the
    > para and pagebreak elements:
    >
    > <pagebreak xmlns:xlink="http://www.w3.org/1999/xlink"
    > xmlns:mml="http://www.w3.org/1998/Math/MathML">116</pagebreak>

    These namespace declarations do not come from the templates I provided in
    this thread; they must be declared either elsewhere in your stylesheet
    or in your source file. How to get rid of them depends on where they
    came from.

    They may have come from me originally (I quite often use mml as the
    MathML namespace prefix), but MathML hasn't been mentioned so far in this
    thread, has it?

    David
    David Carlisle, May 4, 2005
    #6
  7. Eivind

    Eivind Guest

    It seems they come from the root-element of the source file.
    (I tried to delete them from the source file, and then run the xslt
    again. Result: no namespace declarations throughout the resulting
    xml-file)

    Thank you for all your help!

    Eivind
    Eivind, May 4, 2005
    #7
  8. In general, of course, removing namespace declarations from the input will
    break the input. If your document has any MathML in it, then you
    can't remove the MathML declaration.

    To avoid copying them, just don't use copy-of. I think I originally said
    something like:

    <xsl:for-each select="pagebreak">
    <xsl:copy-of select="."/>

    whereas doing


    <xsl:for-each select="pagebreak">
    <pagebreak/>

    would generate a new pagebreak element rather than copying one from the
    source, so it wouldn't copy any namespace nodes from the source
    (but it would use any in-scope namespaces from the stylesheet).

    <xsl:for-each select="pagebreak">
    <xsl:element name="pagebreak"/>


    is similar, but wouldn't use any namespaces from the stylesheet either
    (other than the default namespace, if one has been declared).
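
    Note that the pagebreak in Eivind's output carries content (116), which
    an empty <pagebreak/> would drop. A minimal XSLT 1.0 sketch that rebuilds
    elements instead of copying them, so the source's namespace nodes are not
    carried over while attributes and content are kept. The mode name
    strip-ns is made up for illustration, and the templates are meant to be
    added to the stylesheet given earlier in the thread:

    ```xslt
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- rebuild an element with the same name and namespace URI;
           xsl:element does not copy the source element's namespace nodes -->
      <xsl:template match="*" mode="strip-ns">
        <xsl:element name="{name()}" namespace="{namespace-uri()}">
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="node()" mode="strip-ns"/>
        </xsl:element>
      </xsl:template>

      <!-- text, comments and processing instructions are copied as they are -->
      <xsl:template match="text()|comment()|processing-instruction()" mode="strip-ns">
        <xsl:copy/>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Inside the for-each, <xsl:copy-of select="."/> would then become
    <xsl:apply-templates select="." mode="strip-ns"/>. With an XSLT 2.0
    processor, <xsl:copy-of select="." copy-namespaces="no"/> does the same
    job directly.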

    David
    David Carlisle, May 4, 2005
    #8

Similar Threads
  1. Y.S. (Replies: 3, Views: 985), strajan, Sep 17, 2003
  2. loveNUNO (Replies: 2, Views: 892), loveNUNO, Nov 20, 2003
  3. Robert Cohen (Replies: 3, Views: 249), Andrew Durstewitz, Jul 15, 2003
  4. trans. (T. Onoma): split on '' (and another for split -1)
     trans. (T. Onoma), Dec 27, 2004, in forum: Ruby (Replies: 10, Views: 201), Florian Gross, Dec 28, 2004
  5. OccasionalFlyer (Replies: 6, Views: 238), Garrett Smith, Jul 29, 2009