embedding xml in xml as non-xml :)

Mark Van Orman · Sep 14, 2004

Hi all,

I have an application that logs in xml.

Assume <xmlLog></xmlLog>. In this element the app logs anything it gets
from foreign hosts. Now if the host sends xml data, the structure of the
document changes, i.e. <xmlLog><somTag></somTag></xmlLog>. This will
cause problems with my log reader, because it assumes that <xmlLog/>
contains non-xml data.

My question is: is there a way to treat the data in the <xmlLog/>
element as non-xml data? Is there something I can do that would treat
anything this element contains as a literal?

Any help or suggestions would be greatly appreciated.

Regards,

Mark

William Park · Sep 14, 2004

Mark Van Orman said:
Hi all,

I have an application that logs in xml.

Assume <xmlLog></xmlLog>. In this element the app logs
anything it gets from foreign hosts. Now if the host sends xml
data, the structure of the document changes, i.e.
<xmlLog><somTag></somTag></xmlLog>. This will cause problems
with my log reader, because it assumes that <xmlLog/> contains
non-xml data.

My question is: is there a way to treat the data in the
<xmlLog/> element as non-xml data? Is there something I can do
that would treat anything this element contains as a literal?

Any help or suggestions would be greatly appreciated.

Modify your "log reader". If the remote host can send any ASCII, then
why does the log reader assume a particular format? '<somTag></somTag>'
is an ASCII string to me.

Andy Dingley · Sep 14, 2004

In this element the app logs anything it gets from foreign hosts.

Your problem is to map "input" to well-formed character data according
to the rules of
http://www.w3.org/TR/2004/REC-xml11-20040204/#syntax

This is a task as old as computer programming with input files. There
are several techniques to solve it, broadly by "escaping" or by
"wrapping".

Your example of

<xmlLog><somTag></somTag></xmlLog>

is quite easy, and could indeed be stored and read back, then treated
as ASCII.

However a foreign host that sends "<notATag<><>>" will break things,
because
<xmlLog><notATag<><>></xmlLog>
isn't well-formed XML and so parsers will choke on it.
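
The difference between the two inputs can be checked with any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree module (the helper name is_well_formed is made up for this illustration):

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False if it chokes."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Foreign data that happens to be well-formed XML parses fine...
print(is_well_formed("<xmlLog><somTag></somTag></xmlLog>"))  # True
# ...but arbitrary angle brackets pasted in raw do not.
print(is_well_formed("<xmlLog><notATag<><>></xmlLog>"))      # False
```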

The main problem is to handle the mapping of arbitrary characters into
"character data" (this is a term carefully defined in the XML spec).

The "escaping" way to do this is quite simple, and can be done with a
handful of character substitutions (from the XML spec):

:>The ampersand character (&) and the left angle bracket (<) MUST NOT
:> appear in their literal form, [...] they MUST be escaped using
:> either numeric character references or the strings "&amp;" and "&lt;"
:> respectively. The right angle bracket (>) MAY be represented using
:> the string "&gt;", and MUST, for compatibility, be escaped using
:> either "&gt;" or a character reference when it appears in the string
:> "]]>" in content,

So your example of
<xmlLog><somTag></somTag></xmlLog>
becomes
<xmlLog>&lt;somTag&gt;&lt;/somTag&gt;</xmlLog>
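
As a rough illustration of the escaping approach, here is a sketch in Python (the language is only assumed; the thread does not say what the logging app is written in). It uses xml.sax.saxutils.escape from the standard library, which replaces &, < and > with &amp;, &lt; and &gt;. The helper name log_entry is hypothetical. Note that escaping alone does not help with characters XML forbids outright, such as most control characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def log_entry(raw: str) -> str:
    """Wrap foreign data in <xmlLog>, escaping &, < and > on the way in."""
    return "<xmlLog>" + escape(raw) + "</xmlLog>"

entry = log_entry("<notATag<><>> & more")
print(entry)
# <xmlLog>&lt;notATag&lt;&gt;&lt;&gt;&gt; &amp; more</xmlLog>

# A parser hands the original string back as plain character data.
print(ET.fromstring(entry).text)
# <notATag<><>> & more
```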

You could also use a "CDATA section", which would be the "wrapping"
approach. This takes the dubious input content and places it between
two markers that say "Between these points is CDATA, not XML markup"

The markers are <![CDATA[ and ]]>

Your example of
<xmlLog><somTag></somTag></xmlLog>
becomes
<xmlLog><![CDATA[<somTag></somTag>]]></xmlLog>

be warned that you'll still need escaping in case the input contains a
copy of the end marker! (read the XML spec, or ask again)

The second problem is to define "input". This is important because in
today's world we're really having to face up to internationalization,
character sets and encodings. It's likely that you can redefine input
from "anything" to "anything that is in UTF-8", which will make your
life easier, but be aware you _have_ made a deliberate choice here.

It's OK to write code that breaks in Japanese - just be aware that
you've done so, and know what would need changing if you needed to
remedy this.
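
One way to make that choice explicit is to decode the incoming bytes in exactly one place. A sketch, assuming the app receives raw bytes and has settled on UTF-8 (decode_input is a hypothetical name, and replacing bad bytes rather than rejecting them is just one possible policy):

```python
def decode_input(payload: bytes) -> str:
    """Interpret foreign bytes as UTF-8, by deliberate choice.

    Bytes that are not valid UTF-8 become U+FFFD (the replacement
    character) instead of raising an exception.
    """
    return payload.decode("utf-8", errors="replace")

# Properly encoded Japanese text survives the round trip.
print(decode_input("日本語".encode("utf-8")) == "日本語")  # True

# Latin-1 bytes are not valid UTF-8, so the e-acute is lost here.
print(decode_input(b"caf\xe9") == "caf\ufffd")  # True
```

If a different encoding ever has to be supported, this function is the one place that needs changing.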

You'll find that RSS has this same problem when embedding HTML content
within it. Some RSS versions handle this better than others, and
there's an excellent overview here
http://diveintomark.org/archives/2004/02/04/incompatible-rss

Kenneth Stephen · Sep 14, 2004

Andy Dingley wrote:

It's OK to write code that breaks in Japanese - just be aware that
you've done so, and know what would need changing if you needed to
remedy this.

Andy,

Why would code break only in Japanese and why is that ok?

Regards,
Kenneth

Andy Dingley · Sep 14, 2004

Why would code break only in Japanese and why is that ok?

That's just an example. Most European-written XML code fails in
CJKV countries (China, Japan, Korea, Vietnam). Most American-written
XML fails in France. Just look at how many RSS feeds choke when they meet
é, or more usually &eacute; without the entity having been defined.
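
The entity failure is easy to reproduce. A small sketch with Python's standard parser (parses_ok is a made-up helper name): XML predefines only five named entities, so the HTML name &eacute; is rejected unless a DTD declares it, while a numeric character reference or the literal character is accepted.

```python
import xml.etree.ElementTree as ET

def parses_ok(document: str) -> bool:
    """Return True if the document parses, False on a ParseError."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

print(parses_ok("<title>caf&eacute;</title>"))  # False: undefined entity
print(parses_ok("<title>caf&#233;</title>"))    # True: numeric reference
print(parses_ok("<title>café</title>"))         # True: literal character
```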

XML _itself_ (and the major tools) are very good at supporting a wide
range of character sets and encodings, but there are rules you have to
follow. For most _applications_, coders don't bother to do this. If
you _know_ your app will never receive something outside ASCII, then
that's all you need - but you should still be aware of what you've
built.

Patrick TJ McPhee · Sep 15, 2004

[...]

% The markers are <![CDATA[ and ]]>
%
% Your example of
% <xmlLog><somTag></somTag></xmlLog>
% becomes
% <xmlLog><![CDATA[<somTag></somTag>]]></xmlLog>
%
% be warned that you'll still need escaping in case the input contains a
% copy of the end marker! (read the XML spec, or ask again)

You don't need escaping so much as you need to end and restart the
CDATA section:

<xmlLog><![CDATA[<somTag><![CDATA[with a CDATA section]]>]]&gt;<![CDATA[</somTag>]]></xmlLog>

The first ]]> ends the first CDATA section. The second sits outside any
CDATA section, so it is ordinary data and its > has to be written as &gt;.
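
A common variant of this end-and-restart idea splits the end marker itself across two CDATA sections: the "]]" goes at the end of one section and the ">" at the start of the next, so no entity is needed at all. A minimal Python sketch, with cdata_wrap and log_entry as hypothetical helper names:

```python
import xml.etree.ElementTree as ET

def cdata_wrap(raw: str) -> str:
    """Wrap raw text in CDATA, splitting any embedded end marker.

    Each "]]>" in the input is rewritten so that "]]" closes the
    current section and ">" begins a fresh one.
    """
    return "<![CDATA[" + raw.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def log_entry(raw: str) -> str:
    return "<xmlLog>" + cdata_wrap(raw) + "</xmlLog>"

raw = "<somTag><![CDATA[with a CDATA section]]></somTag>"
entry = log_entry(raw)

# The parser joins the adjacent sections back into the original string.
print(ET.fromstring(entry).text == raw)  # True
```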
