Extracting values from CDATA

Dana B

I am trying to get the values in FIELD2 and FIELD3 in the XML file
below using XSLT. I can get the value of CLOB_DATA. It comes back as
a string. How can I extract the values of FIELD2 and FIELD3?

Thanks in advance.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<XML>
<FIELD1>field1</FIELD1>
<CLOB_DATA>
<![CDATA[<?xml version="1.0"?>
<FIELD2>field2</FIELD2>
<FIELD3>field3</FIELD3>
]]>
</CLOB_DATA>
</XML>
 
Martin Honnen

Dana said:
I am trying to get the values in FIELD2 and FIELD3 in the XML file
below using XSLT. I can get the value of CLOB_DATA. It comes back as
a string. How can I extract the values of FIELD2 and FIELD3?

<CLOB_DATA>
<![CDATA[<?xml version="1.0"?>
<FIELD2>field2</FIELD2>
<FIELD3>field3</FIELD3>
]]>
</CLOB_DATA>

Your XSLT stylesheet sees the content of the CDATA section only as a
string, not as fields or elements. To get at FIELD2 and FIELD3 you need
to parse that string as XML, and that requires an extension function;
XSLT 1.0 can't do it directly.
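
To see what "only a string" means in practice, here is a minimal sketch
using Python's standard xml.etree.ElementTree rather than an XSLT
processor (that substitution is an assumption made for illustration; the
document is the one from the question). Any conforming XML parser should
report the same thing: CLOB_DATA has text but no child elements.

```python
import xml.etree.ElementTree as ET

# The document from the question, kept as bytes so the parser
# handles the encoding declaration itself.
OUTER = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<XML>
<FIELD1>field1</FIELD1>
<CLOB_DATA>
<![CDATA[<?xml version="1.0"?>
<FIELD2>field2</FIELD2>
<FIELD3>field3</FIELD3>
]]>
</CLOB_DATA>
</XML>
"""

root = ET.fromstring(OUTER)
clob = root.find("CLOB_DATA")

# Number of child elements under CLOB_DATA: the CDATA content does
# not contribute any, because it is character data.
print(len(clob))

# Searching the whole tree for a FIELD2 element finds nothing.
print(root.find(".//FIELD2"))

# The markup inside the CDATA section arrives as one plain string.
print(repr(clob.text))
```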
 
Dana B

Can you show me an example of how I can parse the XML string with an
extension function?

Thanks
Dana
 
Martin Honnen

Dana said:
Can you show me an example of how I can parse the XML string with an
extension function?

That depends heavily on which XSLT processor you use and whether it
supports that kind of extension function.
I could show you an example for MSXML with the extension function
written in JScript, but that would not help you at all if you are using
a different XSLT processor.
 
Andy Dingley

Dana said:
I am trying to get the values in FIELD2 and FIELD3 in the XML file
below using XSLT.

That's not XML - That's XML, wrapped up as a string, wrapped up in more
XML. Three layers of data! It's not only hard to parse, it actually
_means_ something quite different. Of course it might be intended to
mean the same thing after all, so don't take that difference too
seriously.

Now XSLT likes to see XML, and it will (grudgingly) let you work with
strings inside XML. However, it has no way to work with XML inside those
strings (the 3rd layer down).

If you want to work with this then you need to tree walk the outer XML
to get the strings, then pass this content to an XML parser! It's not
hard to do in a script language with a DOM (Java / JavaScript / Perl /
VB), although I'm not keen on the efficiency aspects.
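
As a rough sketch of that second parsing step, assuming Python's
standard library stands in for the script language: the function below
takes the string found inside CLOB_DATA and parses it again. The name
parse_inner_fragment and the synthetic wrapper element are hypothetical,
introduced only for this example. The wrapper is there because the inner
text has two top-level elements, and the inner XML declaration is
dropped because it would no longer sit at the start of the input.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration, allowing leading whitespace before it.
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def parse_inner_fragment(text):
    """Parse a string holding an XML fragment; return a wrapper element.

    Hypothetical helper: 'wrapper' is a made-up root that gives the
    fragment the single root element a parser requires.
    """
    body = _XML_DECL.sub("", text)
    return ET.fromstring("<wrapper>" + body + "</wrapper>")


# The string value of CLOB_DATA from the question.
inner_text = (
    '\n<?xml version="1.0"?>\n'
    "<FIELD2>field2</FIELD2>\n"
    "<FIELD3>field3</FIELD3>\n\n"
)

wrapper = parse_inner_fragment(inner_text)
for child in wrapper:
    # Each former "string" field is now a real element.
    print(child.tag, child.text)
```

The efficiency concern is visible here: every record costs one extra
parser run on top of parsing the outer document.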

As to doing it _within_ XSLT then I've never tried it, but it ought to
work fairly much the same, so long as you have an extension language
like JavaScript with access to a suitable DOM.

The real solution though is to not wrap it up in the first place. Is
this "inner" XML really well-formed XML? Is it a well-formed fragment,
or is it also a well-formed XML document? (Yours isn't: it has two root
elements.) If it isn't, then any attempt to parse it is doomed. If it
is (even if it's a fragment) then why was it put in a CDATA section to
begin with?
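
One way to answer the "is it a well-formed document?" question before
committing to an approach is simply to try parsing the inner string on
its own. This is a small sketch using Python's standard library; the
function name is_well_formed_document is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_document(text):
    """Return True if text parses as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The inner content from the question: two top-level elements,
# so it is a fragment at best, not a document.
inner = (
    '<?xml version="1.0"?>\n'
    "<FIELD2>field2</FIELD2>\n"
    "<FIELD3>field3</FIELD3>\n"
)

print(is_well_formed_document(inner))
print(is_well_formed_document("<FIELD2>field2</FIELD2>"))
```

If the check fails only because of multiple root elements, the content
may still be usable as a fragment; if it fails for other reasons (stray
"&" or "<" characters, unclosed tags), that may be exactly why the
sender chose CDATA.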

If this "inner" content _is_ XML, then there's no reason at all to wrap
it in a CDATA section. This is what namespacing is for, after all.
 
Joe Kesselman

Why are you using the CDATA section in the first place? If the data's
structure is important, make that part of the structure of your
document, not a string value.
 
Dana B

I didn't create the XML but I have to parse it. I'm not sure why they
used CDATA. There are probably characters that they didn't want to
encode. I am trying to parse the fields and convert them to CSV using
XSLT if possible and practical. If not, I will use Java.
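
For the CSV end of that job, here is a hedged sketch in Python (not the
Java or XSLT mentioned above; the language is swapped in purely for
illustration). It assumes the field values have already been extracted
into dictionaries, and the helper name rows_to_csv is hypothetical.
Using a CSV writer instead of joining strings by hand matters if the
values contain the awkward characters that may have motivated the CDATA
section.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Render a list of dicts as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# One record, using the field names and values from the question.
record = {"FIELD1": "field1", "FIELD2": "field2", "FIELD3": "field3"}
print(rows_to_csv([record], ["FIELD1", "FIELD2", "FIELD3"]))

# Values containing commas or quotes are quoted by the writer.
print(rows_to_csv([{"FIELD2": 'a,b "c"'}], ["FIELD2"]))
```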
 
Joe Kesselman

If you're really stuck with it... I agree with others who said the
cleanest solution is to extract the contents of the CDATA section and
run them through a second parsing pass.

Your customer handed you lemons. Make lemonade.
 
Naim KANJ

I have the same problem. I have an XML document whose CDATA section I
can't change. It contains data in XML format that I need to display as
HTML using XSLT 1.0.

The result from the parser is text written in XML format.

I need your help,

Thanks in advance,

Naim KANJ
 
Peter Flynn

Naim said:
I have the same problem. I have an XML document whose CDATA section I
can't change. It contains data in XML format that I need to display as
HTML using XSLT 1.0.

The result from the parser is text written in XML format.

Tell the originators to remove the CDATA markup before they send you
the document.

Or filter it out in a script to another file before you process it.
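
A filter of that kind could look roughly like the sketch below, written
in Python as an assumption (any scripting language would do). The
function name unwrap_cdata is hypothetical. It removes the CDATA markers
and any XML declaration found inside them, so that a later XSLT 1.0 pass
sees real elements. It is only safe when the wrapped content is itself a
well-formed fragment; otherwise the output will not parse.

```python
import re

# A CDATA section and its content (non-greedy, spanning newlines).
_CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)
# An XML declaration, which is not allowed in the middle of a document.
_XML_DECL = re.compile(r"<\?xml[^>]*\?>")


def unwrap_cdata(raw):
    """Replace each CDATA section with its content, minus XML declarations."""
    def _unwrap(match):
        return _XML_DECL.sub("", match.group(1))
    return _CDATA.sub(_unwrap, raw)


raw = (
    "<XML>\n"
    "<CLOB_DATA>\n"
    '<![CDATA[<?xml version="1.0"?>\n'
    "<FIELD2>field2</FIELD2>\n"
    "]]>\n"
    "</CLOB_DATA>\n"
    "</XML>\n"
)

# In a real script the text would be read from one file and the
# result written to another before running the stylesheet.
print(unwrap_cdata(raw))
```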

///Peter
 
