Xerces and .NET System.Xml

  • Thread starter Ernesto Bascón Pantoja

Ernesto Bascón Pantoja

Hi everybody:

I do not know if this is the correct list for a very specific
implementation problem, but if you can help me, it would be great! :)

I have an application that builds an XML document containing some
strange characters:

std::string str = "Code = ";
str += '\x04'; // strange character, ASCII 4 (displayed as ♦)

I serialize the XML using Xerces, and Xerces writes something like this
(no matter which encoding I use; I tried iso-8859-1, utf-8, utf-16,
etc.):

<XmlTest>
Code = ♦
</XmlTest>


But when I try to load this XML using the Microsoft .NET
System.Xml.XmlDocument, I get an "Invalid character found" exception
and the XML cannot be loaded.

What is wrong here? If I serialize the same string using the Microsoft
implementation, I get:

<XmlTest>
Code = &#x4;
</XmlTest>
 

Martin Honnen

Ernesto said:
I have an application that builds an XML document containing some
strange characters:

std::string str = "Code = ";
str += '\x04'; // strange character, ASCII 4 (displayed as ♦)

ASCII 4 is not allowed in XML 1.0, and in XML 1.1 it is only allowed as
a numeric character reference, I think.
Why do you want to put such characters into your XML documents?
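
For reference, the rule behind this can be written down as a small check. This is only a sketch of the Char production from the XML 1.0 specification; the function name isXml10Char is made up for illustration and is not part of Xerces or System.Xml.

```cpp
#include <cstdint>

// Hypothetical helper: true if the Unicode code point matches the
// XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
constexpr bool isXml10Char(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// U+0004 falls outside every range, while U+00DF (the 'ß' in "Straße") is inside.
static_assert(!isXml10Char(0x04), "U+0004 is not an XML 1.0 character");
static_assert(isXml10Char(0xDF), "U+00DF is an XML 1.0 character");
```

So a character like 'ß' is fine as long as the bytes written match the declared encoding; only the control character is the problem.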

What is wrong here? If I serialize the same string using the Microsoft
implementation, I get:

<XmlTest>
Code = &#x4;
</XmlTest>

.NET does not necessarily comply with the XML 1.0 specification; it
allows you to serialize such characters as numeric character references.
You can turn that off by using an XmlWriter with XmlWriterSettings where
CheckCharacters is set to true.
 

Ernesto Bascón Pantoja

ASCII 4 is not allowed in XML 1.0, and in XML 1.1 it is only allowed as
a numeric character reference, I think.
Why do you want to put such characters into your XML documents?

I am getting clear text from a database and serializing it into XML so
that a .NET client can receive the information. The problem occurs when
the "clear text" contains those characters or international characters.
Xerces performs the serialization but does not transform the '♦' or the
'ß' in 'Straße'; it serializes them as they come.

I do not know whether writing those characters directly with utf-8
encoding is valid.
 

Joseph J. Kesselman

ASCII 4 is not a legal XML 1.0 character, no matter what your encoding
is or how you try to escape it. I'd suggest introducing something like
<mychar codepoint="4"/> and having your application code convert this
appropriately. Or do a base-64 encoding on your block of binary code and
have the application convert that appropriately.
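
A minimal sketch of the first suggestion, working on plain strings. The helper name escapeForXml10 is hypothetical, the input is assumed to be UTF-8 (or plain ASCII), and only C0 control bytes are handled; the few other excluded code points such as U+FFFE are not checked here.

```cpp
#include <string>

// Hypothetical helper: turns raw text into an XML 1.0 content fragment.
// Bytes below 0x20 other than tab, LF and CR become <mychar codepoint="N"/>;
// the markup characters &, < and > become the predefined entities.
// Assumes UTF-8 input, so bytes >= 0x80 (e.g. the two bytes of 'ß')
// are passed through untouched.
std::string escapeForXml10(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
            out += "<mychar codepoint=\"";
            out += std::to_string(static_cast<int>(c));
            out += "\"/>";
        } else if (c == '&') {
            out += "&amp;";
        } else if (c == '<') {
            out += "&lt;";
        } else if (c == '>') {
            out += "&gt;";
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

Note that the result contains markup, so it is meant to be written into the document as-is. If it were set as the value of a DOM text node, the serializer would escape the '<' of <mychar> again; with a DOM you would instead split the text and create a real mychar element at each position. The receiving application then has to turn each mychar element back into the original character.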

XML 1.1 relaxes restrictions on characters somewhat. I'm not sure
offhand whether it would let you get away with this one or not. But
support for 1.1 is, alas, still extremely rare; you may have to beat up
your XML library suppliers to get it, and having gotten it you may have
trouble interchanging those files with other applications or users that
haven't yet upgraded.
 

Ernesto Bascón Pantoja

ASCII 4 is not a legal XML 1.0 character, no matter what your encoding
is or how you try to escape it. I'd suggest introducing something like
<mychar codepoint="4"/> and having your application code convert this
appropriately. Or do a base-64 encoding on your block of binary code and
have the application convert that appropriately.

So how can I tell Xerces: "given this string, transcode the special
characters to their numeric character references (e.g. &#4;)"?
 

Joseph J. Kesselman

Ernesto said:
So how can I tell Xerces: "given this string, transcode the special
characters to their numeric character references (e.g. &#4;)"?

Xerces deals with XML. The 0x04 character is not XML (or at least not
XML 1.0), so it isn't Xerces' responsibility to deal with it.

If you must represent this character in data that's expressed as XML,
it's your application's responsibility to use some alternate escaping
solution (such as the element I suggested, or base-64 encoding, or
whatever).
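
For the base-64 route, a sketch of an encoder using the standard alphabet with '=' padding is shown here. The function name base64Encode is hypothetical; a real application might prefer an existing library routine, and the receiving side needs the matching decoder.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes arbitrary bytes with the standard base-64
// alphabet so the result contains only characters that are legal in XML 1.0.
std::string base64Encode(const std::string& data) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    auto byteAt = [&data](std::size_t i) {
        return static_cast<unsigned>(static_cast<unsigned char>(data[i]));
    };

    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Full 3-byte groups map to 4 output characters.
    for (; i + 2 < data.size(); i += 3) {
        unsigned n = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }

    // One or two leftover bytes are padded with '='.
    std::size_t rest = data.size() - i;
    if (rest == 1) {
        unsigned n = byteAt(i) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        unsigned n = (byteAt(i) << 16) | (byteAt(i + 1) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

The encoded string can be stored as ordinary element content, at the cost of the text no longer being human-readable in the file.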

If you really want this character to appear as itself in the file...
that isn't an XML file and you can't expect XML tools to either accept
it or generate it.



Take a long step back from this detail and look at the actual
problem you're trying to solve. You haven't told us that, so we can't
say more than that the specific solution you've proposed here doesn't work.
 
