Detecting CDATA sections with XSLT

Dave Matthews

Hi folks,

I'm writing a web-page editing tool for my company which will allow
staff (with no "technical" expertise) to maintain their own Intranet sites.
The content for each webpage is stored in the form of XHTML in an XML
document (which, in turn, is stored in an XML database). So far so good.
However the editing tool must allow users to paste in the contents of MS
Word documents. I soon discovered that Word does not generate
properly-formed HTML, the main problem being that tags that should be nested
are often "overlapped" (as my example below shows). My solution is to store
this "bad" data as CDATA sections, thereby preventing the finished XML
document from being invalidated. My finished XML document looks something
like this:


<page id="0001">
    <content>
        <p>
            <i>
                <font face="Arial">Properly-formed HTML</font>
            </i>
        </p>
        <![CDATA[<p><i><font face="Arial">The 'i' and 'font' end-tags are
wrong and there is no end-tag for 'p'</i></font>]]>
        <p>
            <i>
                <font face="Arial">This is OK.</font>
            </i>
        </p>
    </content>
</page>


On retrieving a document for formatting and display within the client
browser, my XSL template for the <content> nodes needs to be able to detect
whether each of its children can be regarded as proper XML (and, therefore,
to transform it into HTML) or a CDATA section whose contents will simply
be passed straight to the browser. So my template needs to look something
like this:


<xsl:template match="content">
<xsl:for-each select="*">
<xsl:choose>
<xsl:when test="nodetype(.)=cdata()">
<xsl:value-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>


Of course it's the fourth line of code - <xsl:when
test="nodetype(.)=cdata()"> - that is giving me problems. Unfortunately I am
stuck with a fairly basic XSLT engine that has none of the fancy additional
functions MSXML, SAXON or Xalan offer. Try as I might, I can't find a way of
getting XSLT to tell when it's dealing with a CDATA section.

(I could simply hold everything as CDATA but in the future I am going to
have to interface with other systems that will demand as much content as
possible be presented as proper XML/XHTML.)

Any ideas would be very much appreciated!

--

Many thanks in advance!


Dave Matthews

'New Avengers' and 'Professionals' sites at:
http://www.mark-1.co.uk
 
Rolf Magnus

Dave said:
On retrieving a document for formatting and display within the client
browser, my XSL template for the <content> nodes needs to be able to
detect whether each of its children can be regarded as proper XML
(and, therefore, to transform it into HTML) or a CDATA section
whose contents will simply be passed straight to the browser.

XSLT doesn't see CDATA sections. The XML parser converts them to ordinary
text nodes before the stylesheet ever sees them. But couldn't you just wrap
an element around it?

<page id="0001">
    <content>
        <p>
            <i>
                <font face="Arial">Properly-formed HTML</font>
            </i>
        </p>
        <non-wellformed>
            <![CDATA[<p><i><font face="Arial">The 'i' and 'font' end-tags are
wrong and there is no end-tag for 'p'</i></font>]]>
        </non-wellformed>
        <p>
            <i>
                <font face="Arial">This is OK.</font>
            </i>
        </p>
    </content>
</page>
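
Here is one way the stylesheet side of this might look, as a minimal sketch assuming XSLT 1.0. Each child of <content> is dispatched by template matching instead of a test inside xsl:for-each, and the wrapped text is written out with disable-output-escaping="yes". Support for disable-output-escaping is optional in XSLT 1.0, and it only takes effect when the processor itself serializes the result, so check that your engine honours it. The identity template at the end is only a stand-in for whatever templates you already use for the well-formed markup.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hand each child of <content> to whichever template matches it. -->
  <xsl:template match="content">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Wrapped markup: write the text out as-is, without escaping
       "<" and "&". Assumes the processor supports the optional
       disable-output-escaping attribute. -->
  <xsl:template match="non-wellformed">
    <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

  <!-- Well-formed markup: placeholder identity copy. Replace this with
       the real XHTML-to-HTML templates. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With the sample document above, the text inside <non-wellformed> should reach the browser as raw markup, while the other <p> elements go through the normal templates. If the engine ignores disable-output-escaping, the wrapped text will show up escaped (&lt;p&gt; and so on) instead.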
 
Dave Matthews

Thanks for your help, guys. You've confirmed what I was rapidly coming to
suspect!

Rolf - your idea of using a "wrapping" element seems an ideal way around
the problem - many thanks!

--
Cheers,

Dave Matthews

'New Avengers' and 'Professionals' sites at:
http://www.mark-1.co.uk
 
