Inlining or de-xmlifying xml using XSLT

Discussion in 'XML' started by Simon Brooke, Jan 17, 2011.

  1. Simon Brooke

    Simon Brooke Guest

    I'm trying to write an XSL template to generate Google's data input
    format for blogger, which is documented here: http://goo.gl/tydYA

    As you can see, it has the particularly delightful property that the
    content of the 'content' element actually /is/ markup, but has been
    inlined or entified so that it appears as raw text. Yes, I know this is
    bizarre and ugly, but I don't control it - Google do. And as there's no
    documentation on how to import normally-well-formed XML, this is what I
    need to generate.

    I've been trying to write a template which does this awful mangling,
    and what I've come up with seems to work:

    <xsl:template name="mangle-xml">
    <xsl:param name="content"/>
    <xsl:choose>
    <xsl:when test="$content/*">
    &lt;<xsl:value-of select="local-name()"/>
    <xsl:for-each select="$content/@*">
    <xsl:value-of select="concat(' ', local-name(),
    '=&quot;', ., '&quot;')"/>
    </xsl:for-each>&gt;
    <xsl:for-each select="$content/node()">
    <xsl:call-template name="mangle-xml">
    <xsl:with-param name="content" select="."/>
    </xsl:call-template>
    </xsl:for-each>
    &lt;/<xsl:value-of select="local-name()"/>&gt;
    </xsl:when>
    <xsl:otherwise>?text?
    <xsl:value-of select="$content"/>
    </xsl:otherwise>
    </xsl:choose>
    </xsl:template>

    This mangles the XML correctly(!), but it's pretty ugly, dark and
    mysterious. Is there a cleaner way of doing this?

    --
    http://www.journeyman.cc/~simon/ :: PGP public key on home page

    ;; USER ERROR: replace user and press any key to continue


    Simon Brooke, Jan 17, 2011
    #1

  2. Mayeul

    Mayeul Guest

    On 17/01/2011 15:58, Simon Brooke wrote:
    > I'm trying to write an XSL template to generate Google's data input
    > format for blogger, which is documented here: http://goo.gl/tydYA
    >
    > As you can see, it has the particularly delightful property that the
    > content of the 'content' element actually /is/ markup, but has been
    > inlined or entified so that it appears as raw text. Yes, I know this is
    > bizarre and ugly, but I don't control it - Google do. And as there's no
    > documentation on how to import normally-well-formed XML, this is what I
    > need to generate.
    >
    > I've been trying to write a template which does this awful mangling,
    > and what I've come up with seems to work:
    >
    > <xsl:template name="mangle-xml">
    > <xsl:param name="content"/>
    > <xsl:choose>
    > <xsl:when test="$content/*">
    > &lt;<xsl:value-of select="local-name()"/>
    > <xsl:for-each select="$content/@*">
    > <xsl:value-of select="concat(' ', local-name(),
    > '=&quot;', ., '&quot;')"/>
    > </xsl:for-each>&gt;
    > <xsl:for-each select="$content/node()">
    > <xsl:call-template name="mangle-xml">
    > <xsl:with-param name="content" select="."/>
    > </xsl:call-template>
    > </xsl:for-each>
    > &lt;/<xsl:value-of select="local-name()"/>&gt;
    > </xsl:when>
    > <xsl:otherwise>?text?
    > <xsl:value-of select="$content"/>
    > </xsl:otherwise>
    > </xsl:choose>
    > </xsl:template>
    >
    > This mangles the XML correctly(!), but it's pretty ugly, dark and
    > mysterious. Is there a cleaner way of doing this?
    >


    Ugh. Personally I'd give up on doing it in one integrated XSL
    transformation. I'd write one for the <content>, another one for the
    whole document, and feed the result of the first as a string parameter
    to the second.

    In principle, this is probably what was intended.
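
    A minimal sketch of the second, outer stylesheet in that two-pass
    approach. The parameter name content-markup, the <post> and <title>
    input elements, and the use of the Atom namespace for the output are
    assumptions made for illustration, not details given in this thread.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.w3.org/2005/Atom">

      <xsl:output method="xml" indent="yes"/>

      <!-- Serialized markup produced by the first pass, supplied by the
           caller as a plain string (hypothetical parameter name) -->
      <xsl:param name="content-markup" select="''"/>

      <!-- Hypothetical input: a <post> element with a <title> child -->
      <xsl:template match="/post">
        <entry>
          <title type="text"><xsl:value-of select="title"/></title>
          <!-- value-of creates a text node, so the serializer escapes
               the < and & characters of the string once on output -->
          <content type="html"><xsl:value-of select="$content-markup"/></content>
        </entry>
      </xsl:template>
    </xsl:stylesheet>
    ```

    With xsltproc, for example, a string can be handed in with the
    --stringparam option; other processors have their own way of setting
    top-level parameters.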

    --
    Mayeul
    Mayeul, Jan 17, 2011
    #2

  3. Simon Brooke wrote:
    > I'm trying to write an XSL template to generate Google's data input
    > format for blogger, which is documented here: http://goo.gl/tydYA
    >
    > As you can see, it has the particularly delightful property that the
    > content of the 'content' element actually /is/ markup, but has been
    > inlined or entified so that it appears as raw text. Yes, I know this is
    > bizarre and ugly, but I don't control it - Google do. And as there's no
    > documentation on how to import normally-well-formed XML, this is what I
    > need to generate.
    >
    > I've been trying to write a template which does this awful mangling,
    > and what I've come up with seems to work:
    >
    > <xsl:template name="mangle-xml">
    > <xsl:param name="content"/>
    > <xsl:choose>
    > <xsl:when test="$content/*">
    > &lt;<xsl:value-of select="local-name()"/>
    > <xsl:for-each select="$content/@*">
    > <xsl:value-of select="concat(' ', local-name(),
    > '=&quot;', ., '&quot;')"/>
    > </xsl:for-each>&gt;
    > <xsl:for-each select="$content/node()">
    > <xsl:call-template name="mangle-xml">
    > <xsl:with-param name="content" select="."/>
    > </xsl:call-template>
    > </xsl:for-each>
    > &lt;/<xsl:value-of select="local-name()"/>&gt;
    > </xsl:when>
    > <xsl:otherwise>?text?
    > <xsl:value-of select="$content"/>
    > </xsl:otherwise>
    > </xsl:choose>
    > </xsl:template>
    >
    > This mangles the XML correctly(!), but it's pretty ugly, dark and
    > mysterious. Is there a cleaner way of doing this?


    Some XSLT processors supply extension functions to serialize nodes as
    XML, for instance Saxon has
    http://www.saxonica.com/documentation/extensions/functions/serialize.xml. I
    would use that if available. If it needs to be done in XSLT itself then
    I would probably do it with templates in a particular mode, not with a
    single named template. And sophisticated approaches like
    http://lenzconsulting.com/xml-to-string/xml-to-string.xsl to deal with
    more complex problems like namespaces exist. But I am not sure you need
    that, the markup is probably escaped as it is supposed to be some
    text/html tag soup and not clean XML.
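
    A rough sketch of the "templates in a particular mode" idea, in XSLT
    1.0. The mode name mangle and the /post/body entry point are made up
    for illustration. Namespaces are ignored, comments and processing
    instructions are dropped, and & or < inside text or attribute values
    get no extra escaping, so this is only a starting point.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Elements: write start tag, attributes, children and end tag
           as text rather than as result elements -->
      <xsl:template match="*" mode="mangle">
        <xsl:text>&lt;</xsl:text>
        <xsl:value-of select="local-name()"/>
        <xsl:apply-templates select="@*" mode="mangle"/>
        <xsl:text>&gt;</xsl:text>
        <xsl:apply-templates select="node()" mode="mangle"/>
        <xsl:text>&lt;/</xsl:text>
        <xsl:value-of select="local-name()"/>
        <xsl:text>&gt;</xsl:text>
      </xsl:template>

      <!-- Attributes: a space, then name="value" -->
      <xsl:template match="@*" mode="mangle">
        <xsl:value-of select="concat(' ', local-name(), '=&quot;', ., '&quot;')"/>
      </xsl:template>

      <!-- Text nodes: copied through as they are -->
      <xsl:template match="text()" mode="mangle">
        <xsl:value-of select="."/>
      </xsl:template>

      <!-- Comments and processing instructions: dropped in this sketch -->
      <xsl:template match="comment() | processing-instruction()" mode="mangle"/>

      <!-- Hypothetical entry point: serialize the children of /post/body -->
      <xsl:template match="/">
        <content type="html">
          <xsl:apply-templates select="/post/body/node()" mode="mangle"/>
        </content>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Because each node type has its own template, the choose/otherwise
    test on the node goes away, and an element that contains only text
    still gets its tags. Empty elements come out as a start tag followed
    by an end tag, which may or may not suit an HTML consumer.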

    --

    Martin Honnen
    http://msmvps.com/blogs/martin_honnen/
    Martin Honnen, Jan 17, 2011
    #3
  4. Simon Brooke

    Simon Brooke Guest

    On Mon, 17 Jan 2011 18:00:14 +0100
    Martin Honnen wrote:

    > But I am not sure you need
    > that, the markup is probably escaped as it is supposed to be some
    > text/html tag soup and not clean XML.


    Yes, probably the only reason they adopted this horrible kluge in the
    first place was to deal with tag soup.

    [fx: a look of ineffable disgust and derision plays across his features]

    --
    http://www.journeyman.cc/~simon/ :: PGP public key on home page

    ;; USER ERROR: replace user and press any key to continue


    Simon Brooke, Jan 17, 2011
    #4
  5. On 1/17/2011 2:17 PM, Simon Brooke wrote:
    > Yes, probably the only reason they adopted this horrible kluge in the
    > first place was to deal with tag soup.


    ... It would accomplish that, I suppose. I've also seen this sort of
    thing done simply because people didn't understand that a single tree --
    especially if namespaced -- is actually _easier_ to handle than
    reparsing the content.

    I second the suggestion that this be generated via a mode. Much of it
    could be handled by rewriting the identity transform to output text
    rather than XML. The really hideous thing is handling the case of nested
    <![CDATA[ ]]> sections; detecting and handling those requires string
    manipulation on the contained text, which is not XSLT's strongest point.
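
    To give an idea of what string manipulation looks like in XSLT 1.0,
    which has no replace function: a sketch of the usual recursive named
    template, used here to escape & and < in text nodes one extra time so
    that they survive being unescaped by the consumer. It does not try to
    detect CDATA sections, which the XPath data model presents as
    ordinary text anyway. The template name replace-all and the mode
    name mangle are made up, and the fragment assumes other templates in
    that mode take care of elements and attributes.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Replace every occurrence of $from in $text with $to by
           recursing on the part of the string after each match -->
      <xsl:template name="replace-all">
        <xsl:param name="text"/>
        <xsl:param name="from"/>
        <xsl:param name="to"/>
        <xsl:choose>
          <xsl:when test="$from != '' and contains($text, $from)">
            <xsl:value-of select="substring-before($text, $from)"/>
            <xsl:value-of select="$to"/>
            <xsl:call-template name="replace-all">
              <xsl:with-param name="text" select="substring-after($text, $from)"/>
              <xsl:with-param name="from" select="$from"/>
              <xsl:with-param name="to" select="$to"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$text"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

      <!-- Text nodes: escape & first, then <, so the ampersands added
           by the second step are not escaped again by the first -->
      <xsl:template match="text()" mode="mangle">
        <xsl:variable name="amp-escaped">
          <xsl:call-template name="replace-all">
            <xsl:with-param name="text" select="."/>
            <xsl:with-param name="from" select="'&amp;'"/>
            <xsl:with-param name="to" select="'&amp;amp;'"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:call-template name="replace-all">
          <xsl:with-param name="text" select="$amp-escaped"/>
          <xsl:with-param name="from" select="'&lt;'"/>
          <xsl:with-param name="to" select="'&amp;lt;'"/>
        </xsl:call-template>
      </xsl:template>
    </xsl:stylesheet>
    ```

    The serializer then escapes the result once more on output, so the
    text ends up escaped twice while the tags written in the same mode
    are escaped once.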


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Jan 19, 2011
    #5
  6. Peter Flynn

    Peter Flynn Guest

    On 17/01/11 19:17, Simon Brooke wrote:
    > On Mon, 17 Jan 2011 18:00:14 +0100
    > Martin Honnen wrote:
    >
    >> But I am not sure you need
    >> that, the markup is probably escaped as it is supposed to be some
    >> text/html tag soup and not clean XML.

    >
    > Yes, probably the only reason they adopted this horrible kluge in the
    > first place was to deal with tag soup.
    >
    > [fx: a look of ineffable disgust and derision plays across his features]


    Cheer up, the recent announcements about HTML5 will only make things
    worse :)

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
    Peter Flynn, Jan 20, 2011
    #6
  7. > Cheer up, the recent announcements about HTML5 will only make things
    > worse :)


    Yeah, it seems the term "HTML5" has been hijacked from its original
    intent, which was to bring HTML back to being well-formed, make it
    XML-based rather than SGML-based, and make it extendable via namespaces.
    Sigh. That's the web for ya -- never mind what would be useful and
    robust, just let me make it pretty.

    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Jan 21, 2011
    #7
  8. Peter Flynn

    Peter Flynn Guest

    On 21/01/11 01:56, Joe Kesselman wrote:
    >> Cheer up, the recent announcements about HTML5 will only make things
    >> worse :)

    >
    > Yeah, it seems the term "HTML5" has been hijacked from its original
    > intent, which was to bring HTML back to being well-formed, make it
    > XML-based rather than SGML-based, and make it extendable via namespaces.
    > Sigh. That's the web for ya -- never mind what would be useful and
    > robust, just let me make it pretty.


    http://blog.whatwg.org/html-is-the-new-html5#comments

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
    Peter Flynn, Jan 21, 2011
    #8
  9. On 1/24/2011 6:53 PM, William F Hammond wrote:
    > For example,<p> ...<a href="..."> ...</p><p> ...</p><p>...</a>...</p>


    Ill-formed in either XML or SGML, right?


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Jan 25, 2011
    #9
