Inlining or de-XMLifying XML using XSLT


Simon Brooke

I'm trying to write an XSL template to generate Google's data input
format for Blogger, which is documented here: http://goo.gl/tydYA

As you can see, it has the particularly delightful property that the
content of the 'content' element actually /is/ markup, but has been
inlined or entified so that it appears as raw text. Yes, I know this is
bizarre and ugly, but I don't control it - Google do. And as there's no
documentation on how to import ordinary well-formed XML, this is what I
need to generate.
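A rough illustration of what "entified" means here, assuming nothing about Blogger's schema beyond a `content` element: if the markup is stored as a single text node rather than as child elements, any standard XML serializer writes it out with `<` and `&` escaped. This sketch uses only the JDK's DOM and `javax.xml.transform` APIs; the class name `EntifiedContentDemo` is made up for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EntifiedContentDemo {
    // Stores a markup string inside <content> as character data and serializes the result.
    static String wrapAsText(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element content = doc.createElement("content");
            // One text node holding the markup, not a tree of child elements.
            content.setTextContent(markup);
            doc.appendChild(content);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The serializer escapes '<' and '&' in text, which yields the "entified" form.
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapAsText("<p>Hello <b>world</b></p>"));
    }
}
```

For `<p>Hello <b>world</b></p>` this prints a `content` element whose text begins `&lt;p`, which is the shape Simon describes.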

I've been trying to write a template which does this awful mangling,
and what I've come up with seems to work:

<xsl:template name="mangle-xml">
  <xsl:param name="content"/>
  <xsl:choose>
    <!-- An element: write its start tag, attributes, children and end tag as text -->
    <xsl:when test="$content/self::*">
      <xsl:text>&lt;</xsl:text>
      <xsl:value-of select="local-name($content)"/>
      <xsl:for-each select="$content/@*">
        <xsl:value-of select="concat(' ', local-name(),
          '=&quot;', ., '&quot;')"/>
      </xsl:for-each>
      <xsl:text>&gt;</xsl:text>
      <xsl:for-each select="$content/node()">
        <xsl:call-template name="mangle-xml">
          <xsl:with-param name="content" select="."/>
        </xsl:call-template>
      </xsl:for-each>
      <xsl:text>&lt;/</xsl:text>
      <xsl:value-of select="local-name($content)"/>
      <xsl:text>&gt;</xsl:text>
    </xsl:when>
    <!-- Anything else (normally a text node): copy its string value -->
    <xsl:otherwise>
      <xsl:value-of select="$content"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

This mangles the XML correctly(!), but it's pretty ugly, dark and
mysterious. Is there a cleaner way of doing this?

--
http://www.journeyman.cc/~simon/ :: PGP public key on home page

;; USER ERROR: replace user and press any key to continue



Mayeul


Ugh. Personally I'd give up on doing it in a single XSL
transformation. I'd write one for the <content>, another for the
whole document, and feed the result of the first to the second as a
string parameter.

In principle, this is probably what was intended.
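Mayeul's two-pass idea might look something like this, sketched in Java with the JDK's built-in XSLT 1.0 processor. The first pass is only an identity copy standing in for whatever stylesheet builds the post body; its serialized output is passed as the hypothetical `body` parameter to a second stylesheet, where `xsl:value-of` writes it as ordinary text, so the serializer does all of the escaping. The `entry`/`content` wrapper is a placeholder, not Blogger's real schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TwoPassDemo {
    // Second pass (hypothetical): receives the already-serialized post body as a string.
    static final String WRAPPER_XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:param name='body'/>",
        "  <xsl:template match='/'>",
        "    <entry><content><xsl:value-of select='$body'/></content></entry>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static String run(String postXml) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();

            // Pass 1: an identity copy stands in for the stylesheet that builds the body;
            // its result is captured as a plain string.
            Transformer first = tf.newTransformer();
            first.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter body = new StringWriter();
            first.transform(new StreamSource(new StringReader(postXml)), new StreamResult(body));

            // Pass 2: the string goes in as a parameter; xsl:value-of emits it as a text
            // node, so the serializer escapes every '<' and '&' without any extra code.
            Transformer second = tf.newTransformer(new StreamSource(new StringReader(WRAPPER_XSL)));
            second.setParameter("body", body.toString());
            StringWriter out = new StringWriter();
            // The second stylesheet ignores its input, so any well-formed document will do.
            second.transform(new StreamSource(new StringReader("<unused/>")), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<p>Hello <b>world</b></p>"));
    }
}
```

Because the body is escaped once when it is first serialized and again when it is written as text, a literal `&` in the post text also comes out correctly double-escaped, which a hand-written mangling template has to arrange explicitly.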
 

Martin Honnen


Some XSLT processors supply extension functions to serialize nodes as
XML, for instance Saxon has
http://www.saxonica.com/documentation/extensions/functions/serialize.xml. I
would use that if available. If it needs to be done in XSLT itself then
I would probably do it with templates in a particular mode, not with a
single named template. And sophisticated approaches like
http://lenzconsulting.com/xml-to-string/xml-to-string.xsl to deal with
more complex problems like namespaces exist. But I am not sure you need
that; the markup is probably escaped because it is supposed to be some
text/html tag soup and not clean XML.
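A minimal sketch of the template-per-node-type approach Martin suggests, using a mode named `escape`. The mode name, the `post` wrapper element and the Java class are illustrative assumptions; the stylesheet is run through the JDK's XSLT 1.0 processor. Each node type gets its own template, so there is no `xsl:choose` and no parameter passing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EscapeModeDemo {
    // XSLT 1.0 stylesheet: templates in mode "escape" write each node out as text.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/post'>",
        "    <content><xsl:apply-templates select='node()' mode='escape'/></content>",
        "  </xsl:template>",
        "  <!-- element: textual start tag, attributes, children, end tag -->",
        "  <xsl:template match='*' mode='escape'>",
        "    <xsl:text>&lt;</xsl:text><xsl:value-of select='local-name()'/>",
        "    <xsl:apply-templates select='@*' mode='escape'/>",
        "    <xsl:text>&gt;</xsl:text>",
        "    <xsl:apply-templates select='node()' mode='escape'/>",
        "    <xsl:text>&lt;/</xsl:text><xsl:value-of select='local-name()'/><xsl:text>&gt;</xsl:text>",
        "  </xsl:template>",
        "  <!-- attribute: name=\"value\", value copied as-is -->",
        "  <xsl:template match='@*' mode='escape'>",
        "    <xsl:value-of select=\"concat(' ', local-name(), '=&quot;', ., '&quot;')\"/>",
        "  </xsl:template>",
        "  <!-- text: copied as-is; the serializer escapes it once -->",
        "  <xsl:template match='text()' mode='escape'>",
        "    <xsl:value-of select='.'/>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static String escapeBody(String postXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(postXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escapeBody("<post><p class=\"intro\">Hello <b>world</b></p></post>"));
    }
}
```

This sketch copies text and attribute values without a second round of escaping, so a literal `<` or `&` in the source text would come out unescaped once the consumer decodes the content. Fuller serializers such as the xml-to-string.xsl stylesheet linked above are meant to cover such cases, along with namespaces.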
 

Simon Brooke

But I am not sure you need
that, the markup is probably escaped as it is supposed to be some
text/html tag soup and not clean XML.

Yes, probably the only reason they adopted this horrible kluge in the
first place was to deal with tag soup.

[fx: a look of ineffable disgust and derision plays across his features]


Joe Kesselman

Yes, probably the only reason they adopted this horrible kluge in the
first place was to deal with tag soup.

.... It would accomplish that, I suppose. I've also seen this sort of
thing done simply because people didn't understand that a single tree --
especially if namespaced -- is actually _easier_ to handle than
reparsing the content.

I second the suggestion that this be generated via a mode. Much of it
could be handled by rewriting the Identity Transform to output text
rather than XML. The really hideous thing is handling the case of nested
<![CDATA[...]]> sections; detecting and handling those requires string
manipulation on the contained text, which is not XSLT's strongest point.
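XSLT 1.0 has no string-replace function, so the string manipulation Joe mentions is normally done with a recursive named template built on `substring-before()` and `substring-after()`. The sketch below (template names `escape-text` and `replace-all` and the `post` wrapper are hypothetical; run via the JDK's XSLT 1.0 processor) applies it to re-escaping `&` and `<` in text so that they survive the extra level of unescaping. Splitting `]]>` for hand-built CDATA sections would use the same `replace-all` template with different arguments.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ReplaceAllDemo {
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/post'>",
        "    <content>",
        "      <xsl:call-template name='escape-text'>",
        "        <xsl:with-param name='s' select='string(.)'/>",
        "      </xsl:call-template>",
        "    </content>",
        "  </xsl:template>",
        "  <!-- replace '&' first, otherwise the '&' added for '<' would be escaped again -->",
        "  <xsl:template name='escape-text'>",
        "    <xsl:param name='s'/>",
        "    <xsl:variable name='step1'>",
        "      <xsl:call-template name='replace-all'>",
        "        <xsl:with-param name='s' select='$s'/>",
        "        <xsl:with-param name='from' select=\"'&amp;'\"/>",
        "        <xsl:with-param name='to' select=\"'&amp;amp;'\"/>",
        "      </xsl:call-template>",
        "    </xsl:variable>",
        "    <xsl:call-template name='replace-all'>",
        "      <xsl:with-param name='s' select='string($step1)'/>",
        "      <xsl:with-param name='from' select=\"'&lt;'\"/>",
        "      <xsl:with-param name='to' select=\"'&amp;lt;'\"/>",
        "    </xsl:call-template>",
        "  </xsl:template>",
        "  <!-- no replace() in XSLT 1.0: recurse on substring-before/after -->",
        "  <xsl:template name='replace-all'>",
        "    <xsl:param name='s'/>",
        "    <xsl:param name='from'/>",
        "    <xsl:param name='to'/>",
        "    <xsl:choose>",
        "      <xsl:when test='contains($s, $from)'>",
        "        <xsl:value-of select='concat(substring-before($s, $from), $to)'/>",
        "        <xsl:call-template name='replace-all'>",
        "          <xsl:with-param name='s' select='substring-after($s, $from)'/>",
        "          <xsl:with-param name='from' select='$from'/>",
        "          <xsl:with-param name='to' select='$to'/>",
        "        </xsl:call-template>",
        "      </xsl:when>",
        "      <xsl:otherwise><xsl:value-of select='$s'/></xsl:otherwise>",
        "    </xsl:choose>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static String escapeText(String postXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(postXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Source text is "Tom & Jerry < 3"; the output holds it escaped twice.
        System.out.println(escapeText("<post>Tom &amp; Jerry &lt; 3</post>"));
    }
}
```

For that input the result is `<content>Tom &amp;amp; Jerry &amp;lt; 3</content>`: the template turns the text into `Tom &amp; Jerry &lt; 3`, and the serializer escapes it once more.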


--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 

Peter Flynn

But I am not sure you need
that, the markup is probably escaped as it is supposed to be some
text/html tag soup and not clean XML.

Yes, probably the only reason they adopted this horrible kluge in the
first place was to deal with tag soup.

[fx: a look of ineffable disgust and derision plays across his features]

Cheer up, the recent announcements about HTML5 will only make things
worse :)

///Peter
 

Joe Kesselman

Cheer up, the recent announcements about HTML5 will only make things
worse :)

Yeah, it seems the term "HTML5" has been hijacked from its original
intent, which was to bring HTML back to being well-formed, make it
XML-based rather than SGML-based, and make it extendable via namespaces.
Sigh. That's the web for ya -- never mind what would be useful and
robust, just let me make it pretty.
