Extracting values from CDATA

Dana B · Jul 27, 2006

I am trying to get the values in FIELD2 and FIELD3 in the XML file
below using XSLT. I can get the value of CLOB_DATA. It comes back as
a string. How can I extract the values of FIELD2 and FIELD3?

Thanks in advance.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<XML>
<FIELD1>field1</FIELD1>
<CLOB_DATA>
<![CDATA[<?xml version="1.0"?>
<FIELD2>field2</FIELD2>
<FIELD3>field3</FIELD3>
]]>
</CLOB_DATA>
</XML>

Martin Honnen · Jul 27, 2006

Dana said:
I am trying to get the values in FIELD2 and FIELD3 in the XML file
below using XSLT. I can get the value of CLOB_DATA. It comes back as
a string. How can I extract the values of FIELD2 and FIELD3?

<CLOB_DATA>
<![CDATA[<?xml version="1.0"?>
<FIELD2>field2</FIELD2>
<FIELD3>field3</FIELD3>
]]>
</CLOB_DATA>

For the content of the CDATA section, your XSLT stylesheet sees only a
string and no fields or elements. So you need a way to parse that
string as XML, which requires an extension function; XSLT 1.0 can't do
that directly.

Dana B · Jul 27, 2006

Can you show me an example of how I can parse the XML string with an
extension function?

Thanks
Dana

Martin Honnen · Jul 27, 2006

Dana said:
Can you show me an example of how I can parse the XML string with an
extension function?

That depends highly on which XSLT processor you use and whether it
supports that kind of extension function.
I could show you an example for MSXML with the extension function
written in JScript, but that does not help you at all if you are using a
different XSLT processor.

Andy Dingley · Jul 27, 2006

Dana said:
I am trying to get the values in FIELD2 and FIELD3 in the XML file
below using XSLT.

That's not XML - That's XML, wrapped up as a string, wrapped up in more
XML. Three layers of data! It's not only hard to parse, it actually
_means_ something quite different. Of course it might be intended to
mean the same thing after all, so don't take that difference too
seriously.

Now XSLT likes to see XML and it will (grudgingly) let you work with
strings inside XML. However, it has no way to work with XML inside those
strings (the third layer down).

If you want to work with this then you need to tree walk the outer XML
to get the strings, then pass this content to an XML parser! It's not
hard to do in a script language with a DOM (Java / JavaScript / Perl /
VB), although I'm not keen on the efficiency aspects.
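
The two-pass approach described above can be sketched in Python (chosen only for brevity; the same steps apply in any of the languages named). This is a minimal sketch, assuming the sample document from the first post. The helper name parse_inner and the synthetic <wrapper> element are illustrative and not part of any library. Because the inner string carries its own XML declaration and has two top-level elements, the sketch drops the declaration and wraps the rest in a single root before the second parse.

```python
import re
import xml.etree.ElementTree as ET

# Sample document from the first post, as bytes so the encoding
# declaration is honoured by the parser.
OUTER = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<XML>
<FIELD1>field1</FIELD1>
<CLOB_DATA>
<![CDATA[<?xml version="1.0"?>
<FIELD2>field2</FIELD2>
<FIELD3>field3</FIELD3>
]]>
</CLOB_DATA>
</XML>
"""


def parse_inner(text):
    """Second parsing pass over the string taken from the CDATA section."""
    # An XML declaration is only legal at the very start of a document,
    # so remove it before wrapping.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", text)
    # The fragment has two top-level elements; give it one synthetic root.
    return ET.fromstring("<wrapper>" + body + "</wrapper>")


# First pass: the outer parser sees CLOB_DATA as plain text.
outer = ET.fromstring(OUTER)
clob_text = outer.findtext("CLOB_DATA")

# Second pass: parse that text as XML and read the fields.
inner = parse_inner(clob_text)
field2 = inner.findtext("FIELD2")
field3 = inner.findtext("FIELD3")
print(field2, field3)
```

The second parse is where the efficiency cost mentioned above comes from: every CLOB_DATA value is parsed twice, once as text inside the outer document and once as XML.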

As to doing it _within_ XSLT then I've never tried it, but it ought to
work fairly much the same, so long as you have an extension language
like JavaScript with access to a suitable DOM.

The real solution, though, is to not wrap it up in the first place. Is
this "inner" XML really well-formed XML? Is it a well-formed fragment,
or is it also a well-formed XML document? (Yours isn't: it has two root
elements.) If it isn't well-formed at all, then any attempt to parse it
is doomed. If it is (even if it's only a fragment), then why was it put
in a CDATA section to begin with?

If this "inner" content _is_ XML, then there's no reason at all to wrap
it in a CDATA section. This is what namespacing is for, after all.

Joe Kesselman · Jul 28, 2006

Why are you using the CDATA section in the first place? If the data's
structure is important, make that part of the structure of your
document, not a string value.

Dana B · Jul 28, 2006

I didn't create the XML, but I have to parse it. I'm not sure why they
used CDATA. There are probably characters that they didn't want to
encode. I am trying to parse the fields and convert them to CSV using
XSLT, if that is possible and practical. If not, I will use Java.

Joe Kesselman · Jul 28, 2006

If you're really stuck with it... I agree with others who said the
cleanest solution is to extract the contents of the CDATA section and
run them through a second parsing pass.
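
For the CSV goal stated earlier in the thread, a second parsing pass might look like the following Python sketch. It assumes the sample document from the first post and a single output row; the function name to_csv, the <wrapper> element, and the column order (FIELD1 first, then every top-level inner element) are assumptions made for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<XML>
<FIELD1>field1</FIELD1>
<CLOB_DATA>
<![CDATA[<?xml version="1.0"?>
<FIELD2>field2</FIELD2>
<FIELD3>field3</FIELD3>
]]>
</CLOB_DATA>
</XML>
"""


def to_csv(outer_xml):
    """Return a header line and one data line for a single outer document."""
    outer = ET.fromstring(outer_xml)

    # After the first pass, CLOB_DATA is only a string.
    clob = outer.findtext("CLOB_DATA", default="")

    # Second pass: drop the inner XML declaration and wrap the fragment
    # in a synthetic root so it parses as one document.
    fragment = re.sub(r"^\s*<\?xml[^>]*\?>", "", clob)
    inner = ET.fromstring("<wrapper>" + fragment + "</wrapper>")

    # One column per element: FIELD1 from the outer document, then each
    # top-level element found in the inner fragment.
    columns = [("FIELD1", outer.findtext("FIELD1", default=""))]
    columns += [(el.tag, el.text or "") for el in inner]

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([name for name, _ in columns])
    writer.writerow([value for _, value in columns])
    return buf.getvalue()


csv_text = to_csv(SAMPLE)
print(csv_text, end="")
```

Using the csv module rather than joining strings by hand means values containing commas or quotes are escaped correctly.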

Your customer handed you lemons. Make lemonade.

Naim KANJ · Sep 4, 2006

I have the same problem. I have an XML document whose CDATA section I
can't change. It contains data in XML format that I need to display as
HTML using XSLT 1.0.

The result from the parser is text written in XML format.

I need your help,

Thanks in advance,

Naim KANJ

Peter Flynn · Sep 4, 2006

Naim said:
I have the same problem. I have an XML document whose CDATA section I
can't change. It contains data in XML format that I need to display as
HTML using XSLT 1.0.

The result from the parser is text written in XML format.

Tell the originators to remove the CDATA markup before they send you
the document.

Or filter it out in a script to another file before you process it.
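
A filter of that kind could be sketched in Python as below. It is a hedged illustration, not a general-purpose tool: it assumes the CDATA content is itself a well-formed XML fragment, as in the sample from the first post, and it also removes the inner XML declaration, which would otherwise be illegal in the middle of the document. The function name unwrap_cdata is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<XML>
<FIELD1>field1</FIELD1>
<CLOB_DATA>
<![CDATA[<?xml version="1.0"?>
<FIELD2>field2</FIELD2>
<FIELD3>field3</FIELD3>
]]>
</CLOB_DATA>
</XML>
"""

# Matches one CDATA section and captures its content (non-greedy,
# across line breaks).
CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)
# Matches an XML declaration at the start of the captured content.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def unwrap_cdata(document):
    """Replace each CDATA section with its content, minus any inner
    XML declaration."""
    return CDATA.sub(lambda m: XML_DECL.sub("", m.group(1)), document)


filtered = unwrap_cdata(SAMPLE)

# Check that the filtered text is now ordinary XML that a stylesheet
# could address with normal paths such as CLOB_DATA/FIELD2.
root = ET.fromstring(filtered.encode("utf-8"))
field2 = root.findtext("CLOB_DATA/FIELD2")
field3 = root.findtext("CLOB_DATA/FIELD3")
print(field2, field3)
```

If the CDATA section was used to protect characters such as a bare & or <, the filtered text will no longer be well-formed and this approach fails; in that case the content has to stay a string.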

///Peter
