Split element with another one

Eivind

Hi,

I'm creating XML files from printed documents. According to the DTD I
have to use, there have to be pagebreaks in the XML file. These
pagebreaks must be located wherever a new page occurs in the printed
version. This is fairly simple to accomplish.
The problem, however, is that the DTD states that a pagebreak cannot
occur inside a paragraph element, but must be in between them.
Is it possible, using XSLT, to end the paragraph element before the
pagebreak, and start a new one after it?

To illustrate:

Illegal text block:
<para>Blah blah
<pagebreak/>
more blah blah</para>

Must become:
<para>Blah blah</para>
<pagebreak/>
<para>more blah blah</para>

I'm grateful for any help!

regards,
Eivind Andersen
 
David Carlisle

XSLT can do essentially arbitrary tree transformations, so the answer is
yes, but in this case the transformation may be more or less hard
depending on where pagebreak can be. Do you know that it's at the top
level of para (this makes it fairly easy), or can it be nested anywhere,
as in:

<para>Blah blah <italic> xxx <bold> zzz</bold>
<pagebreak/> yyy</italic>
more blah blah</para>

In the latter case things are "interesting", as you have to close an
arbitrary number of elements, and things get more interesting if
the pagebreak appears in table markup and you have to correctly close all
the elements and open up everything needed for a new table...

Assuming the simple case, this is a grouping problem: you just want to
group all children of para depending on their position relative to
pagebreak. Searching for "xslt grouping" on Google will show lots of
possibilities,

e.g.

<!-- Paragraphs without a pagebreak child are copied unchanged. -->
<xsl:template match="para">
  <xsl:copy-of select="."/>
</xsl:template>

<!-- Paragraphs with a pagebreak child are split around each pagebreak. -->
<xsl:template match="para[pagebreak]">
  <para>
    <!-- everything before the first pagebreak -->
    <xsl:copy-of select="@*|pagebreak[1]/preceding-sibling::node()"/>
  </para>
  <xsl:for-each select="pagebreak">
    <xsl:copy-of select="."/>
    <para>
      <xsl:copy-of select="../@*"/><!-- re-copy attributes, you might not want that -->
      <xsl:apply-templates select="following-sibling::node()[1]" mode="p"/>
    </para>
  </xsl:for-each>
</xsl:template>

<!-- Copy one node, then move on to its next sibling. -->
<xsl:template match="node()" mode="p">
  <xsl:copy-of select="."/>
  <xsl:apply-templates select="following-sibling::node()[1]" mode="p"/>
</xsl:template>

<!-- Stop the sibling walk at the next pagebreak. -->
<xsl:template match="pagebreak" mode="p"/>

David
 
Eivind

Wow! Thank you!

Fortunately the pagebreaks won't occur inside a table, but it's possible
to have one inside an <italic> or <bold> element.

I haven't been able to test this code yet, but I'll get on it first thing
Monday morning, and I'll report back a little bit later :).

Again, thank you for the incredibly quick and helpful reply!

Eivind
 
David Carlisle

Eivind said:
Fortunately the pagebreaks won't occur inside a table, but it's possible
to have one inside an <italic> or <bold> element.

It's really a lot harder if that can happen.
The general case, where you have to close an arbitrary number of elements,
would need a completely different approach: essentially walking over
the whole tree one node at a time, building up a data structure of
currently open elements as you go along, i.e. implementing a parser in
XSLT. This is certainly possible but probably not a lot of fun (it would
be a bit more fun in XSLT 2 than XSLT 1). But if you can tie down a
specific list of bad things that can happen, in practice most cases can
be done fairly easily in XSLT, usually, on a good day...
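
For the restricted case mentioned here (pagebreak nested inside inline
elements such as <italic> or <bold>, never inside a table), one possible
XSLT 1.0 approach is to number the segments of a para by how many of its
pagebreaks precede each node, and to emit one filtered copy of the para
per segment. The following is only a sketch under assumptions that are
not confirmed in this thread: para elements are not nested in each other,
the elements are in no namespace, and the mode name "seg" and the
parameters "n" and "pid" are made up for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: everything else is copied through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A para with a pagebreak at any depth becomes one para per segment. -->
  <xsl:template match="para[.//pagebreak]">
    <xsl:variable name="p" select="."/>
    <xsl:variable name="pid" select="generate-id(.)"/>
    <!-- Segment 0: content before the first pagebreak. -->
    <para>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="seg">
        <xsl:with-param name="n" select="0"/>
        <xsl:with-param name="pid" select="$pid"/>
      </xsl:apply-templates>
    </para>
    <!-- Segment k: content after the k-th pagebreak, up to the next one. -->
    <xsl:for-each select=".//pagebreak">
      <xsl:copy-of select="."/>
      <para>
        <xsl:copy-of select="$p/@*"/>
        <xsl:apply-templates select="$p/node()" mode="seg">
          <xsl:with-param name="n" select="position()"/>
          <xsl:with-param name="pid" select="$pid"/>
        </xsl:apply-templates>
      </para>
    </xsl:for-each>
  </xsl:template>

  <!-- A node with no pagebreak inside it lies in exactly one segment:
       copy it only when that segment is the one being built. -->
  <xsl:template match="node()" mode="seg">
    <xsl:param name="n"/>
    <xsl:param name="pid"/>
    <xsl:if test="count(preceding::pagebreak
                    [generate-id(ancestor::para[1]) = $pid]) = $n">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>

  <!-- An element with a pagebreak inside it straddles several segments:
       re-open it in each segment it touches and filter its children. -->
  <xsl:template match="*[.//pagebreak]" mode="seg">
    <xsl:param name="n"/>
    <xsl:param name="pid"/>
    <xsl:variable name="first" select="count(preceding::pagebreak
                    [generate-id(ancestor::para[1]) = $pid])"/>
    <xsl:if test="$n &gt;= $first and
                  $n &lt;= $first + count(.//pagebreak)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select="node()" mode="seg">
          <xsl:with-param name="n" select="$n"/>
          <xsl:with-param name="pid" select="$pid"/>
        </xsl:apply-templates>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- The pagebreaks themselves are emitted between the paras, not inside. -->
  <xsl:template match="pagebreak" mode="seg"/>

</xsl:stylesheet>
```

Applied to the nested example earlier in this thread, this should close
<italic> before the pagebreak and re-open it in the following para. A
pagebreak at the very start or end of a para (or of an inline element)
still yields an empty element, and the repeated use of the preceding axis
may be slow on large documents.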

David
 
Eivind

Hi,

I've tried using the XSL templates you provided, and they seem to work
quite well. However, the templates insert some new attributes into the
para and pagebreak elements:

<pagebreak xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:mml="http://www.w3.org/1998/Math/MathML">116</pagebreak>

How can you remove these? (I must admit I don't entirely understand
what's going on in the templates you gave me, so I don't see where the
new attributes are inserted, or how to remove them.)

Eivind
 
David Carlisle

Eivind said:
I've tried using the XSL templates you provided, and they seem to work
quite well. However, the templates insert some new attributes into the
para and pagebreak elements:

<pagebreak xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:mml="http://www.w3.org/1998/Math/MathML">116</pagebreak>

These namespace declarations do not come from the templates I provided in
this thread; they must be declared either elsewhere in your stylesheet
or in your source file. How to get rid of them depends on where they
came from.

They may have come from me originally, as I quite often use mml as the
MathML namespace prefix, but MathML hasn't been mentioned so far in this
thread, has it?

David
 
Eivind

It seems they come from the root element of the source file.
(I tried deleting them from the source file and then running the XSLT
again. Result: no namespace declarations anywhere in the resulting
XML file.)

Thank you for all your help!

Eivind
 
David Carlisle

In general, of course, removing namespace declarations from the input will
break the input. If your document has any MathML in it then you
can't remove the MathML declaration.

To avoid copying the namespace nodes, just don't use copy-of.

I think I originally said something like:

<xsl:for-each select="pagebreak">
  <xsl:copy-of select="."/>

Doing

<xsl:for-each select="pagebreak">
  <pagebreak/>

would generate a new pagebreak element rather than copying one from the
source, so it wouldn't copy any namespace nodes from the source
(but it would use any in-scope namespaces from the stylesheet).

<xsl:for-each select="pagebreak">
  <xsl:element name="pagebreak"/>

is similar, but wouldn't use any namespaces from the stylesheet either
(other than the default namespace, if that has been declared).
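
Note that an empty <pagebreak/> also drops whatever the source element
contained, such as the 116 in the example earlier in this thread. If that
content has to survive, one common XSLT 1.0 idiom is to rebuild each
element by name instead of copying it. The following is only a sketch:
the mode name "strip-ns" is made up for illustration and is not something
used elsewhere in this thread.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rebuild the element from its name and namespace URI, so namespace
       nodes that are merely in scope in the source are not carried over. -->
  <xsl:template match="*" mode="strip-ns">
    <xsl:element name="{name()}" namespace="{namespace-uri()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="strip-ns"/>
    </xsl:element>
  </xsl:template>

  <!-- Text, comments and processing instructions carry no namespace nodes. -->
  <xsl:template match="text()|comment()|processing-instruction()"
                mode="strip-ns">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

With these two templates in the stylesheet, replacing
<xsl:copy-of select="."/> by
<xsl:apply-templates select="." mode="strip-ns"/> should keep the content
of pagebreak while leaving out the unused xlink and mml declarations. Any
element that really is in the MathML namespace still gets its declaration,
since the serializer has to emit it for the output to be namespace
well-formed.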

David
 
