HTML to XML transform

Discussion in 'XML' started by FrankIsHere@gmail.com, Jul 13, 2005.

  1. Guest

    I'm having some problems converting HTML to XML. Below is the source
    document.



    <html xmlns:wf="http://sometest/wf">
    <head>
    <span wf:class="cookie.equals.EntryURL" wf:values="region">
    <span class="text_red">
    <span wf:class="term.insert" wf:term="Master">United
    States</span>
    </span>
    </span>
    </head>
    </html>

    The logic is: find any HTML elements that carry an attribute in the
    "wf" namespace, then take the "class" attribute and use it as a parent
    node, with the other attribute values as child nodes. It's probably
    easier if I show you what I want the output to look like.

    Below is the format I'm trying to convert the source to:


    <docRoot>
    <Class>
    <classname>cookie.equals.EntryURL</classname>
    <attribute aName="values">region</attribute>
    <Class>
    <classname>term.insert</classname>
    <attribute aName="term">Master</attribute>
    </Class>
    </Class>
    </docRoot>



    This is the stylesheet I have so far:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>


    <xsl:template match="/ | node() | @* ">
    <xsl:copy>
    <xsl:apply-templates select="@wf:* | node()"
    xmlns:wf="http://sometest/wf"/>
    </xsl:copy>
    </xsl:template>

    <xsl:template match="@*">
    <xsl:element name="{local-name(.)}" namespace="{namespace-uri(..)}">
    <xsl:value-of select="."/>
    </xsl:element>
    </xsl:template>

    </xsl:stylesheet>


    It converts it to this:


    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns:wf="http://sometest/wf">
    <head>
    <span>
    <class>cookie.equals.EntryURL</class>
    <values>region</values>
    <span>
    <span>
    <class>term.insert</class>
    <term>Master</term>
    United States</span>
    </span>
    </span>
    </head>
    </html>


    Thanks!
     
    , Jul 13, 2005
    #1

  2. Joris Gillis Guest

    Hi,

    At 20:41:01 on Wednesday, 13 July 2005 AD, in the forum {comp.text.xml},
    <> wrote:

    > The logic is: find any HTML elements that carry an attribute in the
    > "wf" namespace, then take the "class" attribute and use it as a parent
    > node, with the other attribute values as child nodes. It's probably
    > easier if I show you what I want the output to look like.


    Here's one method; I'm not sure it will work universally.

    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:wf="http://sometest/wf" xmlns="http://sometest/wf">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>


    <xsl:template match="/">
    <docRoot>
    <xsl:apply-templates />
    </docRoot>
    </xsl:template>

    <xsl:template match="*[@wf:*]">
    <xsl:element name="{local-name(@wf:*[1])}" namespace="{namespace-uri(@wf:*[1])}">
    <classname><xsl:value-of select="@wf:*[1]"/></classname>
    <xsl:apply-templates select="* | @wf:*[position() &gt; 1]"/>
    </xsl:element>
    </xsl:template>

    <xsl:template match="@wf:*">
    <attribute aName="{local-name()}"><xsl:value-of select="."/></attribute>
    </xsl:template>

    </xsl:stylesheet>


    regards,
    --
    Joris Gillis (http://users.telenet.be/root-jg/me.html)
    Spread the wiki (http://www.wikipedia.org)
     
    Joris Gillis, Jul 13, 2005
    #2

  3. Joris,

    <classname><xsl:value-of select="@wf:*[1]"/></classname>
    <xsl:apply-templates select="* | @wf:*[position() &gt; 1]"/>

    Attribute order is undefined (but stable within a given transformation),
    so you can't rely on [1] being the class attribute; you need to pull it
    out by name.


    <classname><xsl:value-of select="@wf:class"/></classname>
    <xsl:apply-templates select="* | @wf:*[local-name()!='class']"/>
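
    Putting Joris's stylesheet together with the by-name lookup above, a
    complete XSLT 1.0 stylesheet might look roughly like the sketch below.
    It assumes the output should match the requested shape exactly:
    Class, classname and attribute in no namespace (hence no default
    namespace declaration and exclude-result-prefixes="wf"), and text
    content such as "United States" dropped by an empty text() template.
    It also assumes that only elements with a wf:class attribute should
    become Class elements, and it has not been tried on real-world HTML.

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:wf="http://sometest/wf"
        exclude-result-prefixes="wf">
      <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

      <!-- Wrap the whole result in a single docRoot element. -->
      <xsl:template match="/">
        <docRoot>
          <xsl:apply-templates/>
        </docRoot>
      </xsl:template>

      <!-- Assumption: only elements with wf:class become Class elements. -->
      <xsl:template match="*[@wf:class]">
        <Class>
          <!-- Select the class attribute by name; attribute order is undefined. -->
          <classname><xsl:value-of select="@wf:class"/></classname>
          <!-- The remaining wf attributes, then any nested elements. -->
          <xsl:apply-templates select="@wf:*[local-name() != 'class']"/>
          <xsl:apply-templates select="*"/>
        </Class>
      </xsl:template>

      <!-- Each remaining wf attribute becomes <attribute aName="...">value</attribute>. -->
      <xsl:template match="@wf:*">
        <attribute aName="{local-name()}"><xsl:value-of select="."/></attribute>
      </xsl:template>

      <!-- Suppress text nodes, which the built-in templates would otherwise copy. -->
      <xsl:template match="text()"/>
    </xsl:stylesheet>
    ```

    Elements without wf:class (html, head, the text_red span) fall through
    to the built-in template, which just processes their children, so the
    nesting of the wf-marked spans is preserved. Attribute values are
    copied verbatim, so the first source should yield "region" in
    lower case.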

    David
     
    David Carlisle, Jul 13, 2005
    #3
  4. Guest

    Thank you, guys, that works great! I just had to change

    <xsl:element name="{local-name(@wf:*[1])}"
    namespace="{namespace-uri(@wf:*[1])}">

    to

    <xsl:element name="{local-name(@wf:class)}"
    namespace="{namespace-uri(@wf:class)}">

    just in case some other attribute comes before the class attribute.

    I have another question, though.
    How can I incorporate an ignore list?
    For instance, if "term.insert" appears in the source, I don't want
    to process that node, but any child nodes of the ignored node should
    still be processed. So far I have created a separate XML file to serve
    as the ignore list, but I don't know how to load it and process it in
    the XSLT code.

    Here is the ignore XML I have:

    <?xml version="1.0" encoding="UTF-8"?>
    <ignoreList>
    <classes>
    <class>term.insert</class>
    <class>term.delete</class>
    </classes>
    </ignoreList>


    So if I have this as the source:

    <html xmlns:wf="http://sometest/wf">
    <head>
    <span wf:class="cookie.equals.EntryURL" wf:values="region">
    <span class="text_red">
    <span wf:class="term.insert" wf:term="Master">United
    States</span>
    <span wf:class="term.delete" wf:term="Master2">Some text
    <span wf:class="term.append" wf:term="Global">Other
    text</span>
    </span>
    </span>
    </span>
    </head>
    </html>

    And we ignore "term.insert" and "term.delete" then we should be left
    with:

    <docRoot>
    <Class>
    <classname>cookie.equals.EntryURL</classname>
    <attribute aName="values">region</attribute>
    <Class>
    <classname>term.append</classname>
    <attribute aName="term">Global</attribute>
    </Class>
    </Class>
    </docRoot>

    Thanks again!
    Frank
     
    , Jul 14, 2005
    #4

  5. > just in case some other attribute comes before the class attribute.


    Yes, in fact "before" doesn't really mean much in the case of attributes
    (the order used by the processor often isn't the order used in the file).


    > For instance if "term.insert" were to appear in the source then I


    You could just do

    <xsl:template match="*[not(@wf:class='term.insert')][not(@wf:class='term.delete')][@wf:*]">
    <xsl:element name="{local-name(@wf:class)}" namespace="{namespace-uri(@wf:class)}">
    <classname><xsl:value-of select="@wf:class"/></classname>
    <xsl:apply-templates select="* | @wf:*[not(local-name()='class')]"/>
    </xsl:element>
    </xsl:template>



    or if you have a long list and want that lookup file then


    <xsl:template match="*[not(@wf:class=document('ignore.xml')/ignoreList/classes/class)][@wf:*]">
    <xsl:element name="{local-name(@wf:class)}" namespace="{namespace-uri(@wf:class)}">
    <classname><xsl:value-of select="@wf:class"/></classname>
    <xsl:apply-templates select="* | @wf:*[not(local-name()='class')]"/>
    </xsl:element>
    </xsl:template>
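
    One caveat with the lookup-file template: an ignored element falls
    through to XSLT's built-in element template, and the built-in text
    template then copies its text (for example "United States") into the
    output. A sketch that loads the ignore list once into a global variable
    and handles ignored classes explicitly might look like this. It assumes
    the list is saved as ignore.xml next to the stylesheet, in the
    ignoreList/classes/class format from the question, and that only
    elements with wf:class are of interest. The check uses xsl:choose
    rather than a variable in the match pattern, because XSLT 1.0 does not
    allow variable references in patterns.

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:wf="http://sometest/wf"
        exclude-result-prefixes="wf">
      <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

      <!-- Load the ignore list once; a relative URI resolves against the stylesheet. -->
      <xsl:variable name="ignored"
          select="document('ignore.xml')/ignoreList/classes/class"/>

      <xsl:template match="/">
        <docRoot>
          <xsl:apply-templates/>
        </docRoot>
      </xsl:template>

      <xsl:template match="*[@wf:class]">
        <xsl:choose>
          <!-- Ignored class: emit nothing for this element, but still process child elements. -->
          <xsl:when test="@wf:class = $ignored">
            <xsl:apply-templates select="*"/>
          </xsl:when>
          <xsl:otherwise>
            <Class>
              <classname><xsl:value-of select="@wf:class"/></classname>
              <xsl:apply-templates select="@wf:*[local-name() != 'class']"/>
              <xsl:apply-templates select="*"/>
            </Class>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

      <xsl:template match="@wf:*">
        <attribute aName="{local-name()}"><xsl:value-of select="."/></attribute>
      </xsl:template>

      <!-- Drop text content so ignored elements do not leak their text. -->
      <xsl:template match="text()"/>
    </xsl:stylesheet>
    ```

    The test @wf:class = $ignored is true if the attribute's value equals
    the string value of any class element in the list. With the second
    source document, term.insert and term.delete produce nothing
    themselves, but the term.append span nested inside term.delete is
    still reached and becomes a Class element inside the
    cookie.equals.EntryURL one.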

    David
     
    David Carlisle, Jul 14, 2005
    #5