XSL for recursive transformation

Discussion in 'XML' started by Indy, Feb 15, 2006.

  1. Indy

    Indy Guest

    Hi,
    I have an XHTML input file with a custom tag which specifies HTML
    fragments to include.
    For example:
    <html>
    ....
    <include frag1="frag1.html" frag2="frag2.html">
    More html here
    </include>
    ....html...
    <include frag1="frag3.html" ....>...

    </html>
    The include tag can be nested. The contents of an include tag would be
    combined with the fragments [frag1.html and frag2.html] to produce the
    output xml which would replace the currently processed include tag.
    After that the whole output has to be checked for valid XML. And the
    process is continued until there are no more include tags.

    I was wondering about the best way to go about doing this. Is XSL
    suitable? If so how?

    Thanks
    Indy
     
    Indy, Feb 15, 2006
    #1

  2. Indy wrote:
    > I was wondering about the best way to go about doing this. Is XSL
    > suitable? If so how?


    Given that XHTML is an XML language, the *right* way to do this would be
    to use XInclude tags. Assuming your XHTML processor supports XInclude,
    of course.
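
    For illustration, a minimal sketch of what XInclude processing looks
    like, using Python's standard-library xml.etree.ElementInclude module.
    The fragment name "frag1.xml", the FRAGMENTS table and the
    memory_loader helper are made up for the example; a real setup would
    load files or URLs. Note that the fragment has to be well-formed XML
    on its own.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory fragment store (stands in for files on disk).
FRAGMENTS = {
    "frag1.xml": "<table><tr><td>This is a header</td></tr></table>",
}


def memory_loader(href, parse, encoding=None):
    """Loader callback for ElementInclude: resolve href from FRAGMENTS."""
    text = FRAGMENTS[href]
    if parse == "xml":
        return ET.fromstring(text)  # must be well-formed XML
    return text  # parse="text": include as plain character data


def expand(document_text):
    """Parse a document and replace its xi:include elements in place."""
    root = ET.fromstring(document_text)
    ElementInclude.include(root, loader=memory_loader)
    return ET.tostring(root, encoding="unicode")


PAGE = (
    '<html xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<body><xi:include href="frag1.xml"/><p>More html here</p></body>'
    '</html>'
)

RESULT = expand(PAGE)
```

    RESULT holds the serialized document with the xi:include element
    replaced by the table from the fragment.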

    If it doesn't -- yes, you can implement XInclude, or similar
    functionality, in XSLT if you want to. One such implementation can be
    seen at http://www.dpawson.co.uk/xsl/sect2/include.html

    (It's always worth checking Dave Pawson's XSLT FAQ website. He's done a
    very good job of collecting many of the best answers from the XSLT
    user's mailing list. Which, by the way, is also worth subscribing to if
    you're looking for a deeper understanding of stylesheets.)


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Feb 15, 2006
    #2

  3. Nick Kew

    Nick Kew Guest

    Joe Kesselman wrote:
    > Indy wrote:
    >
    >> I was wondering about the best way to go about doing this. Is XSL
    >> suitable? If so how?

    >
    >
    > Given that XHTML is an XML language, the *right* way to do this would be
    > to use XInclude tags. Assuming your XHTML processor supports XInclude,
    > of course.


    FWIW, mod_transform for Apache is an XSLT filter that supports XInclude
    (based on libxml2/libxslt). So it's a solved problem on the Web.

    However, XSLT is not a good solution to this, except for small
    documents. Inclusion can be streamed, so it'll be hugely faster
    and more scalable using a SAX-based parser. mod_publisher would
    be a better choice.

    --
    Nick Kew
     
    Nick Kew, Feb 15, 2006
    #3
  4. Peter Flynn

    Peter Flynn Guest

    Indy wrote:
    > Hi,
    > I have an XHTML input file with a custom tag which specifies HTML
    > fragments to include.
    > For example:
    > <html>
    > ...
    > <include frag1="frag1.html" frag2="frag2.html">
    > More html here
    > </include>
    > ...html...
    > <include frag1="frag3.html" ....>...
    >
    > </html>
    > The include tag can be nested. The contents of an include tag would be
    > combined with the fragments [frag1.html and frag2.html] to produce the
    > output xml which would replace the currently processed include tag.
    > After that the whole output has to be checked for valid XML. And the
    > process is continued until there are no more include tags.
    >
    > I was wondering about the best way to go about doing this.


    Why not just use entity declarations?

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Feb 15, 2006
    #4
  5. Peter Flynn wrote:
    > Why not just use entity declarations?


    Parsed entities are pretty much dying as XML Schema replaces DTDs.
    Schemas don't have any equivalent. XInclude/XLink were supposed to take
    over that role.


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Feb 16, 2006
    #5
  6. Indy

    Indy Guest

    Hi,
    Thanks for your comments, I tried using XInclude tags but came across
    some problems.
    The fragments that I'm trying to include are not valid XML themselves;
    they could, for example, be:
    ---sof---
    <table><tr><td>This is a header</td></tr>
    ---eof---

    and only when the fragments are assembled do they form valid XML.

    Do you think XInclude can still be used to achieve this?

    Thanks again,
    Indeera
     
    Indy, Feb 16, 2006
    #6
  7. Indy wrote:
    >The fragments that I'm trying to include are not valid XML themselves,

    ....
    >and only when the fragments are assembled it forms a valid XML.


    >Do you think XInclude can still be used to achieve this?


    No. XInclude operates at the level of the XML Infoset, not on
    characters. You will need to use a non-XML tool to put them together.

    -- Richard
     
    Richard Tobin, Feb 16, 2006
    #7
  8. Indy wrote:
    > The fragments that I'm trying to include are not valid XML themselves,


    In which case XML-aware tools aren't going to handle them. Write
    something text-based.
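
    A rough sketch of such a text-based tool in Python, under stated
    assumptions: the original post does not say how the fragments are
    combined with the tag's contents, so this version assumes frag1 goes
    before the contents and frag2 after. The `load` callback (fragment
    name in, fragment text out) and the function names are hypothetical.
    Each pass splices the innermost include tags as plain text, then
    checks that the whole result is well-formed, and the loop repeats
    until no include tags are left.

```python
import re
import xml.etree.ElementTree as ET

# An <include ...>...</include> whose body contains no further <include>,
# i.e. an innermost include. Nested ones are handled on later passes.
INNERMOST = re.compile(
    r'<include\b([^>]*)>((?:(?!<include\b).)*?)</include>', re.DOTALL
)
ATTR = re.compile(r'(\w+)="([^"]*)"')


def expand_includes(text, load, max_passes=50):
    """Expand include tags textually until none remain.

    Assumption: frag1 is placed before the tag's contents, frag2 after.
    """
    def splice(match):
        attrs = dict(ATTR.findall(match.group(1)))
        before = load(attrs["frag1"]) if "frag1" in attrs else ""
        after = load(attrs["frag2"]) if "frag2" in attrs else ""
        return before + match.group(2) + after

    for _ in range(max_passes):
        text, count = INNERMOST.subn(splice, text)
        ET.fromstring(text)  # raises ET.ParseError if not well-formed
        if count == 0:
            return text
    raise RuntimeError("include expansion did not terminate")
```

    The fragments themselves may be unbalanced (an opening <table> in one,
    the closing </table> in another); only the assembled text is parsed.
    The max_passes guard is there because a fragment that includes itself
    would otherwise loop forever.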



    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Feb 16, 2006
    #8
  9. .... or redesign the whole problem so you're working with XML structure
    rather than text fragments.
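
    One possible shape for such a redesign, sketched in Python: instead of
    two unbalanced text fragments, each include names a single well-formed
    template that marks where the tag's contents go. The `template`
    attribute, the <slot/> placeholder element and both function names are
    inventions for this sketch, not anything from the original setup. It
    also assumes the include holds only child elements (no loose text) and
    that <slot/> is not the template's root.

```python
import copy
import xml.etree.ElementTree as ET


def fill_template(template, include):
    """Return a copy of template with <slot/> replaced by include's children."""
    result = copy.deepcopy(template)
    for parent in result.iter():
        for index, child in enumerate(list(parent)):
            if child.tag == "slot":
                parent.remove(child)
                for offset, node in enumerate(list(include)):
                    parent.insert(index + offset, node)
                return result
    return result  # no slot found: the template is used as-is


def expand(elem, templates):
    """Replace every <include template="..."> below elem, innermost first."""
    for index, child in enumerate(list(elem)):
        expand(child, templates)  # nested includes are resolved first
        if child.tag == "include":
            filled = fill_template(templates[child.get("template")], child)
            filled.tail = child.tail
            elem[index] = filled
```

    Because every template is a complete element, the document stays
    well-formed at every step and ordinary XML tools (XInclude, XSLT) can
    take over the job.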

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Feb 16, 2006
    #9
  10. Andy Dingley

    Andy Dingley Guest

    Indy wrote:

    > I have an XHTML input file with a custom tag which specifies HTML
    > fragments to include.


    Other posters have suggested ways to include XML fragments in XML.

    However I'd advise against this, because you're trying to embed HTML as
    the fragment and HTML is _not_ XML. HTML needs to be processed with
    text or SGML aware tools, not XML. What happens if you encounter an HTML
    fragment that's not well-formed? What happens if you _want_ to use a
    fragment that's not well-formed?

    RSS has addressed this same problem before now. Worth reading the
    background.
     
    Andy Dingley, Feb 16, 2006
    #10
  11. Peter Flynn

    Peter Flynn Guest

    Joe Kesselman wrote:
    > Peter Flynn wrote:
    >> Why not just use entity declarations?

    >
    > Parsed entities are pretty much dying as XML Schema replaces DTDs.


    I think you'll find them alive and kicking in many places. Reports
    of the death of DTDs are greatly exaggerated.

    > Schemas don't have any equivalent.


    QED

    > XInclude/XLink were supposed to take over that role.


    Oooh look, flying pigs :)

    ///Peter
     
    Peter Flynn, Feb 16, 2006
    #11
  12. >> Parsed entities are pretty much dying as XML Schema replaces DTDs.
    >
    > I think you'll find them alive and kicking in many places. Reports
    > of the death of DTDs are greatly exaggerated.


    Uhm. I agree that schemas are taking longer to find their way in than
    might have been expected, partly because they're a syntax only a
    database expert or computer science geek could love. (Though frankly the
    DTD syntax is also pretty hideous.)

    However, entities are definitely on the way out. The problem is that
    they really aren't all that useful unless there's a fragment that will
    appear in a huge number of instances of this kind of document, and even
    then they're only a significant advantage when producing the document by
    hand; it is a significant pain for software to recognize that the
    opportunity exists to take advantage of a parsed entity, and there
    usually isn't much to be gained by doing so.

    Entities had value when most docs were produced by humans pounding on
    raw XML text; they really aren't useful for docs produced by smarter
    editors. Most of the things you might still want to use them for can be
    handled better by an appropriate tool -- an editor that lets you see and
    enter the actual characters rather than their named equivalents, for
    example, or a syntax that's actually defined in the document rather than
    in a non-tag-language secondary file. Among other things, that permits
    different documents to reference different resources rather than having
    only a single set, hard-wired into the DTD, that they can name.

    >> XInclude/XLink were supposed to take over that role.

    > Oooh look, flying pigs :)


    I did put it in the imperfect tense... Part of the problem is that we're
    finding that the need for a portable syntax for documents referencing
    other documents isn't as universal as we expected. Or at least isn't so
    right now.

    If we'd designed XML completely before releasing it to the public, we
    would have started with the infoset (including namespaces and schemas
    and includes and links), then designed the syntax and APIs from that.
    Instead the W3C started with the syntax and a known-inadequate schema
    language (DTDs), and has built everything out from there. The upside is
    that folks had a chance to start using XML much earlier, and we've
    gotten some benefit from seeing which directions everyone has gone with
    it. The downside is that there have been some warts and hiccups and
    direction changes along the way, and tools have not always been quick to
    catch up -- and even when they have, folks who have working solutions
    using the old stopgaps are often reluctant to make the effort to move
    over. Which leaves all of us with the job of supporting multiple ways of
    doing things and trying to gently push folks toward the ones that will
    make their life -- and ours -- easier in the long run.

    Oh well. The cutting edge usually has a few nicks in it.



    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Feb 17, 2006
    #12
  13. Peter Flynn

    Peter Flynn Guest

    Joe Kesselman wrote:
    >>> Parsed entities are pretty much dying as XML Schema replaces DTDs.

    >>
    >> I think you'll find them alive and kicking in many places. Reports
    >> of the death of DTDs are greatly exaggerated.

    >
    > Uhm. I agree that schemas are taking longer to find their way in than
    > might have been expected, partly because they're a syntax only a
    > database expert or computer science geek could love. (Though frankly the
    > DTD syntax is also pretty hideous.)


    Only a syntax geek would love it, but it has the advantage of being very
    terse, and once learned, quite expressive. RelaxNG seems to be the way
    forward, but I still feel we did the community a disservice by not
    properly investigating the possibility of adding datatyping to DTDs
    before running amok with W3C Schemas. Ah well. Another time.

    > However, entities are definitely on the way out. The problem is that
    > they really aren't all that useful unless there's a fragment that will
    > appear in a huge number of instances of this kind of document, and even
    > then they're only a significant advantage when producing the document by
    > hand;


    Actually there is rather a lot of stuff out there that does this.

    > it is a significant pain for software to recognize that the
    > opportunity exists to take advantage of a parsed entity, and there
    > usually isn't much to be gained by doing so.


    For parsed entities, yes. Legal boilerplate, tech doc, and chapter
    files for long documents are the only real candidates.

    Parameter entities are a different matter.

    > Entities had value when most docs were produced by humans pounding on
    > raw XML text; they really aren't useful for docs produced by smarter
    > editors. Most of the things you might still want to use them for can be
    > handled better by an appropriate tool -- an editor that lets you see and
    > enter the actual characters rather than their named equivalents, for


    This refers to character entities. Sadly, editors are still in their
    infancy when it comes to the interface (hence my thesis topic), and
    there are still a gazillion so-called plaintext editors (non-XML) out
    there that XML beginners use, which seriously screws up their chances
    when they start editing UTF-8. For this reason, several companies and
    projects I have been dealing with have made it policy for the moment
    to create ISO-8859-1 files only, and ALL other characters go in as
    character entity references or numeric references (fortunately for them
    they deal only with western languages in Latin scripts).
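
    As a sketch of how that policy can be automated rather than typed by
    hand: Python's built-in "xmlcharrefreplace" error handler turns every
    character the target encoding cannot represent into a numeric
    character reference. The function name and sample text are made up
    for the example, and the input is assumed to be already-serialized
    XML (the handler does not escape markup characters).

```python
import xml.etree.ElementTree as ET

DECLARATION = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'


def to_latin1_xml(markup):
    """Encode serialized XML as ISO-8859-1 with numeric character references.

    Characters outside Latin-1 become &#NNNN; references; Latin-1
    characters are written as single bytes.
    """
    body = markup.encode("iso-8859-1", errors="xmlcharrefreplace")
    return DECLARATION + body


# e-acute is in Latin-1; the euro sign, pi and the approximation sign are not.
SAMPLE = "<p>caf\u00e9 costs 3\u20ac, \u03c0 \u2248 3.14</p>"
ENCODED = to_latin1_xml(SAMPLE)

# Parsing the bytes back recovers the original characters.
ROUND_TRIP = ET.fromstring(ENCODED).text
```

    The resulting file is safe to open in any Latin-1 plaintext editor,
    and an XML parser still sees the original characters.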

    > example, or a syntax that's actually defined in the document rather than
    > in a non-tag-language secondary file. Among other things, that permits
    > different documents to reference different resources rather than having
    > only a single set, hard-wired into the DTD, that they can name.
    >
    >>> XInclude/XLink were supposed to take over that role.

    >> Oooh look, flying pigs :)

    >
    > I did put it in the imperfect tense...


    Sorry, I was being deliberately provocative.

    > Part of the problem is that we're
    > finding that the need for a portable syntax for documents referencing
    > other documents isn't as universal as we expected. Or at least isn't so
    > right now.


    Ahead of the curve as usual :) Although the demand for a syntax to
    refer from one document to another is slowly approaching FAQ-level.
    It's just embarrassing that we had multi-way bidirectional 3rd-party
    linking in the Panorama plugin a decade ago, and still nothing to
    replace it.

    > If we'd designed XML completely before releasing it to the public,


    We'd still be discussing it.

    > would have started with the infoset (including namespaces and schemas
    > and includes and links), then designed the syntax and APIs from that,
    > Instead the W3C started with the syntax and a known-inadequate schema
    > language (DTDs), and has built everything out from there. The upside is
    > that folks had a chance to start using XML much earlier, and we've
    > gotten some benefit from seeing which directions everyone has gone with


    I like the description, although I disagree about the infoset. Coming
    from the tech doc background, I would have preferred to see some of the
    useful SGML features retained and more attention paid to the usability
    of markup. Pretending that a document is a tree when it's not (it's a
    document!) was a mistake we are still paying for. Starting with the
    syntax was OK, IMHO, and pretty much 99% of what we did was right. But
    schemas were a later development, a bolt-on which only came when the
    XML-Data folks saw the market for the syntax (and that's something else
    we'll end up paying for -- I see way too many slabs of data being done
    into XML when CSV would be much more sensible).

    > it. The downside is that there have been some warts and hiccups and
    > direction changes along the way, and tools have not always been quick to
    > catch up -- and even when they have, folks who have working solutions
    > using the old stopgaps are often reluctant to make the effort to move
    > over.


    This is going to be the interesting bit. New tools -- *really good* new
    tools -- are few and far between. And there are too many good old tools
    which have become unavailable just at the point when they were most
    needed, because of corporate buyouts resulting in technically-unaware
    people dropping the ball.

    > Which leaves all of us with the job of supporting multiple ways of
    > doing things and trying to gently push folks toward the ones that will
    > make their life -- and ours -- easier in the long run.


    It does work eventually. I've only had one breakage so far, and that was
    due to sabotage.

    > Oh well. The cutting edge usually has a few nicks in it.


    Mind that axe, Eugene.

    ///Peter
     
    Peter Flynn, Feb 17, 2006
    #13
  14. Harrie

    Harrie Guest

    Harrie, Feb 17, 2006
    #14
  15. >> If we'd designed XML completely before releasing it to the public,
    > We'd still be discussing it.


    Which is why they went the other way around. Unfortunately that left us
    with some warts where the afterthoughts were tacked on (including some
    that could have been avoided, but... oh well; too much water over the
    dam at this point).

    > I like the description, although I disagree about the infoset. Coming
    > from the tech doc background, I would have preferred to see some of the
    > useful SGML features retained


    Trimming away everything that wasn't absolutely required is what made
    implementing XML easy. If you've ever written an SGML processor, you
    know getting it right is messy at best. XML was deliberately restricted
    to the point where the parser is implementable by an average student in
    a week or less.

    > This is going to be the interesting bit. New tools -- *really good* new
    > tools -- are few and far between.


    They're starting to appear, though. If you see a market not being
    adequately served, think of it as a marketing opportunity. That's what
    got us started on Xerces and Xalan...<grin/>

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Feb 22, 2006
    #15
  16. Peter Flynn

    Peter Flynn Guest

    Joe Kesselman wrote:
    > Trimming away everything that wasn't absolutely required is what made
    > implementing XML easy. If you've ever written an SGML processor, you
    > know getting it right is messy at best. XML was deliberately restricted
    > to the point where the parser is implementable by an average student in
    > a week or less.


    I think Tim Bray's comment was "implementable in 'just a few' 30-hour
    Perl hacking sessions" :)

    > They're starting to appear, though. If you see a market not being
    > adequately served, think of it as a marketing opportunity.


    Oh I am, believe me :)

    ///Peter
     
    Peter Flynn, Feb 23, 2006
    #16
  17. Peter Flynn wrote:
    > I think Tim Bray's comment was "implementable in 'just a few' 30-hour
    > Perl hacking sessions" :)


    The concept of the DPH -- Desperate Perl Hacker -- has been invoked a
    number of times as an argument for why everything should be kept as
    simple as possible. (But not simpler.)



    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
     
    Joseph Kesselman, Feb 24, 2006
    #17