HTML to XML transform

FrankIsHere

I'm having some problems converting HTML to XML. Below is the source
document.



<html xmlns:wf="http://sometest/wf">
<head>
<span wf:class="cookie.equals.EntryURL" wf:values="region">
<span class="text_red">
<span wf:class="term.insert" wf:term="Master">United
States</span>
</span>
</span>
</head>
</html>

The logic is to find any HTML tags that contain an attribute in the
"wf" namespace. Then take the "class" attribute and use it as a parent
node, with the remaining attribute values as child nodes. It's probably
easier if I show you what I want the output to look like.

Below is the format I'm trying to convert the source to:


<docRoot>
<Class>
<classname>cookie.equals.EntryURL</classname>
<attribute aName="values">region</attribute>
<Class>
<classname>term.insert</classname>
<attribute aName="term">Master</attribute>
</Class>
</Class>
</docRoot>



This is the stylesheet I have so far:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>


<xsl:template match="/ | node() | @* ">
<xsl:copy>
<xsl:apply-templates select="@wf:* | node()"
xmlns:wf="http://sometest/wf"/>
</xsl:copy>
</xsl:template>

<xsl:template match="@*">
<xsl:element name="{local-name(.)}" namespace="{namespace-uri(..)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>

</xsl:stylesheet>


It converts it to this:


<?xml version="1.0" encoding="UTF-8"?>
<html xmlns:wf="http://sometest/wf">
<head>
<span>
<class>cookie.equals.EntryURL</class>
<values>region</values>
<span>
<span>
<class>term.insert</class>
<term>Master</term>
United States</span>
</span>
</span>
</head>
</html>


Thanks!
 
Joris Gillis

Hi,

On Wednesday 13 July 2005 at 20:41:01, in the forum {comp.text.xml}, the original poster wrote:
The logic is to find any HTML tags that contain an attribute in the
"wf" namespace. Then take the "class" attribute and use it as a parent
node, with the remaining attribute values as child nodes. It's probably
easier if I show you what I want the output to look like.

Here's one method; I'm not sure it will work universally.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:wf="http://sometest/wf" xmlns="http://sometest/wf">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>


<xsl:template match="/">
<docRoot>
<xsl:apply-templates />
</docRoot>
</xsl:template>

<xsl:template match="*[@wf:*]">
<xsl:element name="{local-name(@wf:*[1])}" namespace="{namespace-uri(@wf:*[1])}">
<classname><xsl:value-of select="@wf:*[1]"/></classname>
<xsl:apply-templates select="* | @wf:*[position() &gt; 1]"/>
</xsl:element>
</xsl:template>

<xsl:template match="@wf:*">
<attribute aName="{local-name()}"><xsl:value-of select="."/></attribute>
</xsl:template>

</xsl:stylesheet>


regards,
 
David Carlisle

Joris,

<classname><xsl:value-of select="@wf:*[1]"/></classname>
<xsl:apply-templates select="* | @wf:*[position() &gt; 1]"/>

Attribute order is undefined (though stable within a given transformation),
so you can't rely on [1] being the class attribute; you need to pull it
out by name.


<classname><xsl:value-of select="@wf:class"/></classname>
<xsl:apply-templates select="* | @wf:*[local-name()!='class']"/>

David
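
Putting the pieces from this thread together, a complete stylesheet might look like the sketch below. It is not anyone's posted solution, and it makes a few assumptions that are not stated in the thread: that the literal element name Class from the desired output is wanted (rather than a name derived from the attribute), that the result should be in no namespace, and that stray text such as "United States" should be dropped. It matches on @wf:class directly, so the class is always selected by name and never by position.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wf="http://sometest/wf"
    exclude-result-prefixes="wf">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- Wrap the whole result in a single root element. -->
  <xsl:template match="/">
    <docRoot>
      <xsl:apply-templates/>
    </docRoot>
  </xsl:template>

  <!-- Assumption: only elements that carry wf:class become Class elements. -->
  <xsl:template match="*[@wf:class]">
    <Class>
      <!-- Select the class attribute by name, not by position. -->
      <classname><xsl:value-of select="@wf:class"/></classname>
      <!-- The other wf:* attributes first, then any nested elements. -->
      <xsl:apply-templates select="@wf:*[local-name() != 'class']"/>
      <xsl:apply-templates select="*"/>
    </Class>
  </xsl:template>

  <!-- Each remaining wf:* attribute becomes an attribute element. -->
  <xsl:template match="@wf:*">
    <attribute aName="{local-name()}"><xsl:value-of select="."/></attribute>
  </xsl:template>

  <!-- Assumption: text content is not wanted in the result, so drop it. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Elements without wf:class (html, head, the text_red span) fall through to the built-in element rule, which simply continues into their children. Run against the first source document, this should yield the nested Class structure from the original post, with the value "region" in lowercase as it appears in the source.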
 
FrankIsHere

Thank you guys, that works great! I just had to change

<xsl:element name="{local-name(@wf:*[1])}"
namespace="{namespace-uri(@wf:*[1])}">

to

<xsl:element name="{local-name(@wf:class)}"
namespace="{namespace-uri(@wf:class)}">

just in case any other attribute comes before the class attribute.

I have another question though...
How can I incorporate an ignore list?
For instance if "term.insert" were to appear in the source then I
wouldn't want to process that node. Any child nodes of the ignored
node would be processed though. So far I have created a separate XML
file that would serve as the ignore list but I don't know how to call
it and process it in the XSLT code.

Here is the ignore XML I have:

<?xml version="1.0" encoding="UTF-8"?>
<ignoreList>
<classes>
<class>term.insert</class>
<class>term.delete</class>
</classes>
</ignoreList>


So if I have this as the source:

<html xmlns:wf="http://sometest/wf">
<head>
<span wf:class="cookie.equals.EntryURL" wf:values="region">
<span class="text_red">
<span wf:class="term.insert" wf:term="Master">United
States</span>
<span wf:class="term.delete" wf:term="Master2">Some text
<span wf:class="term.append" wf:term="Global">Other
text</span>
</span>
</span>
</span>
</head>
</html>

And we ignore "term.insert" and "term.delete" then we should be left
with:

<docRoot>
<Class>
<classname>cookie.equals.EntryURL</classname>
<attribute aName="values">region</attribute>
<Class>
<classname>term.append</classname>
<attribute aName="term">Global</attribute>
</Class>
</Class>
</docRoot>

Thanks again!
Frank
 
David Carlisle

just in case any other attribute comes before the class attribute.

Yes, in fact "before" doesn't really mean much in the case of attributes
(the order used by the processor often isn't the order used in the file).

For instance if "term.insert" were to appear in the source then I

You could just do

<xsl:template match="*[not(@wf:class='term.insert')][not(@wf:class='term.delete')][@wf:*]">
<xsl:element name="{local-name(@wf:class)}" namespace="{namespace-uri(@wf:class)}">
<classname><xsl:value-of select="@wf:class"/></classname>
<!-- local-name() is used because name() would return the prefixed name, e.g. wf:class -->
<xsl:apply-templates select="* | @wf:*[not(local-name()='class')]"/>
</xsl:element>
</xsl:template>



Or, if you have a long list and want that lookup file, then:


<!-- Element names are case-sensitive, so the path must use ignoreList as in the lookup file. -->
<xsl:template match="*[not(@wf:class=document('ignore.xml')/ignoreList/classes/class)][@wf:*]">
<xsl:element name="{local-name(@wf:class)}" namespace="{namespace-uri(@wf:class)}">
<classname><xsl:value-of select="@wf:class"/></classname>
<xsl:apply-templates select="* | @wf:*[not(local-name()='class')]"/>
</xsl:element>
</xsl:template>

David
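
For reference, here is how the lookup-file approach could be assembled into a complete stylesheet. Treat it as a hedged sketch: the file name ignore.xml comes from the snippet above and is assumed to sit next to the stylesheet, and the literal Class element, the no-namespace output, and the suppression of text nodes are assumptions taken from the desired output rather than anything confirmed in the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wf="http://sometest/wf"
    exclude-result-prefixes="wf">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <docRoot>
      <xsl:apply-templates/>
    </docRoot>
  </xsl:template>

  <!-- Matches elements whose wf:class is not listed in the ignore file.
       Assumption: ignore.xml is resolved relative to this stylesheet and
       has the ignoreList/classes/class structure shown in the question. -->
  <xsl:template match="*[@wf:class]
                        [not(@wf:class = document('ignore.xml')/ignoreList/classes/class)]">
    <Class>
      <classname><xsl:value-of select="@wf:class"/></classname>
      <xsl:apply-templates select="@wf:*[local-name() != 'class']"/>
      <xsl:apply-templates select="*"/>
    </Class>
  </xsl:template>

  <xsl:template match="@wf:*">
    <attribute aName="{local-name()}"><xsl:value-of select="."/></attribute>
  </xsl:template>

  <!-- Ignored elements fall through to the built-in element rule, which
       still visits their children; this template keeps their text
       (e.g. "Some text") out of the result. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

With the second source document and the ignore list from the question, the term.insert and term.delete spans should produce no element of their own, while the term.append span nested inside term.delete still becomes a Class inside the cookie.equals.EntryURL one. An XSLT 1.0 match pattern cannot reference a variable, which is why document() is called directly inside the predicate.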
 
