Normalizing XHTML with XML

Discussion in 'XML' started by Ryan Stewart, May 11, 2006.

  1. Ryan Stewart

    Ryan Stewart Guest

    I'm getting XHTML input that can be in a number of formats, and I'm
    trying to get it into a consistent format for later use. "Consistent"
    in this case means everything in the root/body is in either a p, table,
    img, ol, or ul tag. I'm processing just the body text. There is no head
    section or anything. So the body is the root of the tree that I'm
    processing. I've got almost everything working except one thing. If I
    get input like the following:
    some text<br/>some more text

    then I need that to become two paragraphs, like:
    <p>some text</p>
    <p>some more text</p>

    That's easy enough. But if I get this input:
    some text <a href="blah">link</a> some more text

    that should all become one paragraph:
    <p>some text <a href="blah">link</a> some more text</p>

    And if a table, list, or image is encountered, that should be the end
    of a paragraph if there is one:
    some text<table> ... </table>some more text

    becomes
    <p>some text</p>
    <table> ... </table>
    <p>some more text</p>

    Again, wrapping the text nodes in p tags is easy, but a
    problem arises if there is a link or other tag inside some of that
    text. (At this point other tags don't actually matter because I'm
    stripping them out, but links need to be passed through.)

    Basically, my problem boils down to this:
    1) I need to select any text node child of the root and surround it
    with p tags, but
    2) if an a element is a child of the root, it should be joined with any
    adjacent text nodes and the whole thing should be surrounded with p
    tags.

    Can someone give me an example of how to do this with XSL?
     
    Ryan Stewart, May 11, 2006
    #1

  2. > 1) I need to select any text node child of the root and surround it
    > with p tags, but


    > 2) if an a element is a child of the root, it should be joined with any
    > adjacent text nodes and the whole thing should be surrounded with p
    > tags.


    If I put those two rules together, I get "I want to wrap a <p>
    element around all the root's children". Since that's trivial, I presume
    there's some case where you don't want to do that....?
     
    Joe Kesselman, May 11, 2006
    #2

  3. Ryan Stewart

    Ryan Stewart Guest

    Yes, only text nodes and links should be inside p tags. Tables, lists,
    and images will also be present and must not be wrapped, especially
    since tables and lists are block elements and p tags may only contain
    inline elements. Maybe a more complex example:
    some text <a href="blah">a link</a> some more text<br/>
    third text node<table>...</table>final text node

    should become:
    <p>some text <a href="blah">a link</a> some more text</p>
    <p>third text node</p>
    <table>...</table>
    <p>final text node</p>

    Notice that the <br/> causes a new p element, the first two root-level
    text nodes and the a element in between them become one paragraph, the
    third text node becomes a paragraph, the table is not touched, and the
    last text node becomes a paragraph.
     
    Ryan Stewart, May 11, 2006
    #3
  4. Ryan Stewart

    Ryan Stewart Guest

    From looking around some more, I'm seeing that XSLT should be viewed as
    transforming nodes from a source tree into nodes in a result tree. So a
    different way of looking at my problem might be, "How do I grab
    consecutive text and inline nodes (besides the br and img elements)
    that are children of the root node from the source tree and put them
    inside one node (a p element) in the result tree?"
     
    Ryan Stewart, May 11, 2006
    #4
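
The grouping described in post #4 could be approached along the following lines. This is a minimal XSLT 1.0 sketch, not an answer taken from the thread. It assumes that body is the document element, that the only inline elements left at root level are a and br (other tags already stripped, as mentioned in post #1), and that every other root-level element (table, img, ol, ul) should be copied through untouched. The key name "run" and the 'k' prefix are illustrative choices.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Hypothetical key "run": each root-level text node or a element is
       indexed by the id of the nearest preceding sibling element that is
       not an a (that is, a br, table, img, ol or ul). Nodes sharing that
       id form one paragraph. The 'k' prefix keeps the key value non-empty
       for the first run, which has no such preceding sibling. -->
  <xsl:key name="run" match="body/text() | body/a"
           use="concat('k', generate-id(preceding-sibling::*[not(self::a)][1]))"/>

  <xsl:template match="/body">
    <body>
      <xsl:apply-templates select="node()"/>
    </body>
  </xsl:template>

  <!-- Only the first node of each run emits a p; it copies the whole run.
       The remaining nodes of the run produce no output of their own. -->
  <xsl:template match="body/text() | body/a">
    <xsl:variable name="k"
        select="concat('k', generate-id(preceding-sibling::*[not(self::a)][1]))"/>
    <xsl:if test="generate-id() = generate-id(key('run', $k)[1])">
      <p><xsl:copy-of select="key('run', $k)"/></p>
    </xsl:if>
  </xsl:template>

  <!-- A br only separates paragraphs, so it is dropped from the result. -->
  <xsl:template match="body/br"/>

  <!-- Any other root-level element is copied as-is. The lower priority
       lets the a and br templates above win for those two elements. -->
  <xsl:template match="body/*" priority="0">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the example in post #3 wrapped in a body element, this is intended to produce one p for the first two text nodes plus the link, one p for the third text node, the table unchanged, and one p for the final text node. A run consisting only of whitespace (for instance a line break between two tables) would still be wrapped in a p; adding `<xsl:strip-space elements="body"/>` is one possible way to avoid that, at the cost of also dropping whitespace-only text between two adjacent links.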