Looking for suggestions (xslt?) on stripping specified elements/attributes from XHTML

Discussion in 'XML' started by Foxpointe, Jul 26, 2006.

  1. Foxpointe

    Foxpointe Guest

    Given some arbitrary XHTML, I'd like to obtain a 'simplified' XHTML
    result which strips out a large subset of standard elements and
    attributes - but not all. The main things I would like to accomplish:

    1) Provide a list of elements/attributes to be stripped (i.e. everything
    else should be passed through) or those that should be passed through
    (i.e. everything else should be stripped) which would be applied
    recursively.
    2) If an element is to be stripped, pass through any enclosed text
    and/or elements (the elements should in turn be processed recursively by
    step 1.)
    3) If after stripping the resulting element is empty, eliminate it
    completely.

    For example, this snippet:

    <h1>
    <a href='chap2.htm'>
    <img src="image.gif" alt="Thumbnail" border=0>
    </a>
    </h1>
    <table width=515 border=0 cellpadding=0 cellspacing=0>
    <tr>
    <td width=172 align=left valign=top>
    <a href="chap1.htm">
    <img src="prev.gif" alt="Previous" border=0>
    </a>
    </td>
    <td>
    <style type="text/css">
    </style>
    </td>
    <td width=171 align=center valign=top>
    <b>
    <font face="ariel,helvetica,helv,sanserif" size="-1">Chapter 2 Getting
    Started</font>
    </b>
    </td>
    <td width=172 align=right valign=top>
    <a href="chap3.htm">
    <img src="next.gif" alt="Next" border=0>
    </a>
    </td>
    </tr>
    </table>

    Would become:

    <a href='chap2.htm'>
    <img src="image.gif">
    </a>
    <table>
    <tr>
    <td>
    <a href="chap1.htm">
    <img src="prev.gif" alt="Previous">
    </a>
    </td>
    <td>
    Chapter 2 Getting Started
    </td>
    <td>
    <a href="chap3.htm">
    <img src="next.gif" alt="Next">
    </a>
    </td>
    </tr>
    </table>

    Is XSLT the best means to accomplish this? Suggestions on how to get
    this done (esp. examples that could be used as a starting point) are
    appreciated.

    Thanks,
    Phil
    Foxpointe, Jul 26, 2006
    #1

  2. Re: Looking for suggestions (xslt?) on stripping specified elements/attributes from XHTML

    Search for and read about "XSLT identity rule" or "XSLT identity
    transformation".

    Overriding the identity rule is the most fundamental design pattern in
    XSLT. It lets you globally delete or replace a chosen subset of nodes
    while leaving the general structure and all other nodes of the document
    unchanged.


    Cheers,
    Dimitre Novatchev
    Dimitre Novatchev, Jul 27, 2006
    #2