XSLT Extract Text from Nodes

Discussion in 'XML' started by gregmcmullinjr@gmail.com, Oct 10, 2006.

  1. Guest

    Hello,

    I am new to the concept of XSL and am looking for some assistance.

    Take the following XML document:

    <binder>
    <author>Greg</author>
    <notes>
    <time>11:45</time>
    <content>
    This would be some content... every once in a while you may run
    into
    <heading>A Heading!</heading>
    Which could be followed by more content... and possible
    <heading>More Headings.</heading>
    and even more content!
    </content>
    </notes>
    </binder>

    What I would like to do is to be able to extract the value of the
    <content> node, and have special formatting for the headings.

    When I do something like:

    <xsl:value-of select="content" />

    I receive the data within <content> - including the values of the
    nested <heading> nodes, but what I really want to be able to do is
    to have XSLT read the text of the <content> node until a <heading>
    node is reached, at which point the value of the heading node is
    formatted correctly and displayed, and then continued by the text of
    the <content> node after the <heading> until another <heading> is
    reached... etc etc...

    Could someone give me some pointers as to how this can be accomplished?
     
    , Oct 10, 2006
    #1

  2. wrote:


    > <content>
    > This would be some content... every once in a while you may run
    > into
    > <heading>A Heading!</heading>
    > Which could be followed by more content... and possible
    > <heading>More Headings.</heading>
    > and even more content!
    > </content>



    > What I would like to do is to be able to extract the value of the
    > <content> node, and have special formatting for the headings.


    Use templates and xsl:apply-templates e.g.

    <xsl:template match="content">
    <div>
    <xsl:apply-templates/>
    </div>
    </xsl:template>

    <xsl:template match="heading">
    <h1>
    <xsl:apply-templates/>
    </h1>
    </xsl:template>

    There is a built-in template for text nodes
    <http://www.w3.org/TR/xslt#built-in-rule>
    so you don't have to do anything for them, they end up in the result
    tree anyway with the above approach.
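
    For reference, the two templates above could sit in a complete
    stylesheet roughly like the sketch below. The xsl:output setting, the
    html/body wrapper and the template for /binder are assumptions added
    for illustration; they are not part of the reply above.

    ```xslt
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Assumption: HTML output is wanted -->
      <xsl:output method="html"/>

      <!-- Assumption: only notes/content is rendered; author and time are skipped -->
      <xsl:template match="/binder">
        <html>
          <body>
            <xsl:apply-templates select="notes/content"/>
          </body>
        </html>
      </xsl:template>

      <!-- Wrap the content in a div and process its children in document order -->
      <xsl:template match="content">
        <div>
          <xsl:apply-templates/>
        </div>
      </xsl:template>

      <!-- Each heading becomes an h1 -->
      <xsl:template match="heading">
        <h1>
          <xsl:apply-templates/>
        </h1>
      </xsl:template>

      <!-- No template for text nodes: the built-in rule copies their text -->

    </xsl:stylesheet>
    ```

    Applied to the sample <binder> document, this should give a <div> in
    which the plain text passes through in document order and each
    <heading> is wrapped in an <h1>.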


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Oct 10, 2006
    #2

  3. Guest

    Thanks for your quick reply Martin,

    This has brought me closer to what I would like to accomplish, however
    I now have the following issue.

    I was using the xsl:value-of element with disable-output-escaping="yes"
    to produce HTML formatted text in the browser screen. You see within
    the <content> node there may be HTML that should be displayed as such.
    Your method produces all of the text in the correct order and formatted
    according to tag name, but produces HTML tags which should be hidden.

    i.e.

    <content>
    There may be some <i>italicized</i> text...
    <heading>Maybe even <u>formatting in a heading</u></heading>
    ...
    </content>

    Is there some way to overcome this?

     
    , Oct 10, 2006
    #3
  4. Guest

    I should say that the HTML tags within my XML document are stored as
    entities (at least the < character is) i.e.

    <content>
    This is some &lt;i>italicized&lt;/i> text...
    ...
    </content>

    Thanks.


     
    , Oct 10, 2006
    #4
  5. Guest

    I have found a solution. The following is the built-in template for
    text nodes:

    <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
    </xsl:template>

    It can be overridden simply by creating a new custom template, which I
    did as the following:

    <xsl:template match="text()|@*">
    <xsl:value-of select="." disable-output-escaping="yes"/>
    </xsl:template>

    The result is that the HTML in the text nodes outputs as desired.
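
    A narrower variant, as a sketch only: restricting the override to text
    inside <content> leaves normal escaping in place for the rest of the
    document. The match pattern content//text() and the xsl:output setting
    are assumptions for illustration and do not come from the posts above.

    ```xslt
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Assumption: HTML output is wanted -->
      <xsl:output method="html"/>

      <xsl:template match="content">
        <div><xsl:apply-templates/></div>
      </xsl:template>

      <xsl:template match="heading">
        <h1><xsl:apply-templates/></h1>
      </xsl:template>

      <!-- Text anywhere inside content (headings included) is written
           without escaping, so &lt;i> in the source is emitted as <i> -->
      <xsl:template match="content//text()">
        <xsl:value-of select="." disable-output-escaping="yes"/>
      </xsl:template>

      <!-- Everything else falls through to the built-in rules -->

    </xsl:stylesheet>
    ```

    This only has an effect when the processor serializes the result and
    supports disable-output-escaping.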

     
    , Oct 10, 2006
    #5
  6. roy axenov Guest

    Please don't top-post.

    wrote:
    > Martin Honnen wrote:
    > > wrote:
    > > > <content>
    > > > This would be some content... every once in a
    > > > while you may run into
    > > > <heading>A Heading!</heading>
    > > > Which could be followed by more content... and
    > > > possible
    > > > <heading>More Headings.</heading>
    > > > and even more content!
    > > > </content>

    > >
    > > Use templates and xsl:apply-templates e.g.
    > >
    > > <xsl:template match="content">
    > > <div>
    > > <xsl:apply-templates/>
    > > </div>
    > > </xsl:template>
    > >
    > > <xsl:template match="heading">
    > > <h1>
    > > <xsl:apply-templates/>
    > > </h1>
    > > </xsl:template>

    >
    > This has brought me closer to what I would like to
    > accomplish, however I now have the following issue.
    >
    > I was using the xsl:value-of element with
    > disable-output-escaping="yes" to produce HTML formatted
    > text in the browser screen. You see within the <content>
    > node there may be HTML that should be displayed as such.
    > Your method produces all of the text in the correct order
    > and formatted according to tag name, but produces HTML
    > tags which should be hidden.
    >
    > ie.
    >
    > <content>
    > There may be some <i>italicized</i> text...
    > <heading>Maybe even <u>formatting in a
    > heading</u></heading>
    > ...
    > </content>
    >
    > Is there some way to overcome this?
    >
    > I should say that the HTML tags within my XML document
    > are stored as entities (at least the < character is) i.e.
    >
    > <content>
    > This is some &lt;i>italicized&lt;/i> text...
    > ...
    > </content>


    Don't do that, it seems to lead to innumerable problems.
    Store your mark-up as XML instead:

    <content>
    This is some <i>italicized</i> text...
    ...
    </content>

    ...and use the identity transformation to convert it into
    HTML:

    <xsl:template match="@*|node()">
    <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    </xsl:template>

    This also has the virtue of fitting neatly with the
    solution for your original problem that Martin Honnen has
    proposed.

    You might also need to write exclusion templates for some
    nodes, but that's hardly a problem.
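
    Put together, that combination might look like the following sketch.
    The xsl:output setting, the exclusion template for author and time,
    and the pass-through template for binder and notes are hypothetical
    choices made for illustration.

    ```xslt
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Assumption: HTML output is wanted -->
      <xsl:output method="html"/>

      <!-- Identity transformation: copies inline mark-up such as i and u -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Name-specific templates have a higher default priority than the
           identity rule, so they win for content and heading -->
      <xsl:template match="content">
        <div><xsl:apply-templates/></div>
      </xsl:template>

      <xsl:template match="heading">
        <h1><xsl:apply-templates/></h1>
      </xsl:template>

      <!-- Hypothetical: process the children of the wrapper elements
           without copying the wrappers themselves -->
      <xsl:template match="binder|notes">
        <xsl:apply-templates/>
      </xsl:template>

      <!-- Hypothetical exclusion template: drop author and time entirely -->
      <xsl:template match="author|time"/>

    </xsl:stylesheet>
    ```

    This relies on the embedded mark-up being well-formed XML in the
    source document.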

    --
    roy axenov
     
    roy axenov, Oct 10, 2006
    #6
  7. Guest

    Not sure what a top-post is...

    While I see what you're saying Roy, the problem is that the contained
    HTML is not necessarily well-formed because of the way that it's formed
    at this time. Perhaps when I have figured out how to force it to be
    well-formed I can use this solution.

    Thanks.

     
    , Oct 10, 2006
    #7
  8. wrote:
    > roy axenov wrote:
    >> Please don't top-post.


    > Not sure what a top-post is...


    Then ask a search engine. It will lead you to some documents like
    <http://www.catb.org/~esr/jargon/html/T/top-post.html>.

    --
    Johannes Koch
    Spem in alium nunquam habui praeter in te, Deus Israel.
    (Thomas Tallis, 40-part motet)
     
    Johannes Koch, Oct 10, 2006
    #8
  9. wrote:


    > It can be overridden simply by creating a new custom template, which I
    > did as the following:
    >
    > <xsl:template match="text()|@*">
    > <xsl:value-of select="." disable-output-escaping="yes"/>
    > </xsl:template>
    >
    > The result is that the HTML in the text nodes outputs as desired.


    If that works for you then you can use it. But you should be aware that
    disable-output-escaping is an optional feature that applies only when
    the result tree is serialized. An XSLT processor might not support it at
    all, and it has no effect when the result tree is not serialized (e.g.
    when you chain transformations, or in a browser like Mozilla where the
    result tree is rendered directly without any serialization happening).

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Oct 11, 2006
    #9
  10. Guest

    I think this will suffice for my needs as I am doing the
    transformations on the server.

    Thanks again.

     
    , Oct 11, 2006
    #10