Xerces and .NET System.Xml

Discussion in 'XML' started by Ernesto Bascón Pantoja, Apr 22, 2008.

  1. Hi everybody:

    I do not know if this is the correct list for a very specific
    implementation problem, but if you can help me, it would be great! :)

    I have one application that builds an XML document that contains some
    strange characters:

    std::string str = "Code = ";
    str += '\x04'; // control character, ASCII 4 (displayed as ♦)

    I serialize the XML using Xerces, and Xerces writes something like
    this (no matter which encoding I use; I tried iso-8859-1, utf-8,
    utf-16, etc.):

    <XmlTest>
    Code = ♦
    </XmlTest>


    But when I try to load this XML using the Microsoft .NET
    System.Xml.XmlDocument, I get an "Invalid character found" exception
    and the XML cannot be loaded.

    What is wrong here? If I serialize the same string using the MS
    implementation, I get:

    <XmlTest>
    Code = &#x4;
    </XmlTest>
     
    Ernesto Bascón Pantoja, Apr 22, 2008
    #1

  2. Ernesto Bascón Pantoja wrote:

    > I have one application that builds an XML document that contains some
    > strange characters:
    >
    > std::string str = "Code = ";
    > str += '\x04'; // control character, ASCII 4 (displayed as ♦)


    ASCII 4 is not allowed in XML 1.0 and, I think, is allowed only as a
    numeric character reference in XML 1.1.
    Why do you want to put such characters into your XML documents?


    > What is wrong here? If I serialize the same string using the MS
    > implementation, I get:
    >
    > <XmlTest>
    > Code = &#x4;
    > </XmlTest>


    .NET does not necessarily comply with the XML 1.0 specification; it
    allows you to serialize such characters as numeric character references.
    You can turn that off by using an XmlWriter with XmlWriterSettings where
    CheckCharacters is set to true.


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Apr 22, 2008
    #2
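
For reference, the rule behind this answer is the Char production of the XML 1.0 specification, which admits only tab, line feed and carriage return below U+0020. A minimal C++ sketch of that check follows, assuming the text has already been decoded to Unicode code points; the names `isXml10Char` and `firstInvalidXml10Char` are made up for illustration and are not part of Xerces or any other library.

```cpp
#include <string>

// True if the code point matches the Char production of XML 1.0:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isXml10Char(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Index of the first code point that XML 1.0 forbids,
// or std::u32string::npos if the whole text is acceptable.
std::u32string::size_type firstInvalidXml10Char(const std::u32string& text) {
    for (std::u32string::size_type i = 0; i < text.size(); ++i) {
        if (!isXml10Char(text[i])) {
            return i;
        }
    }
    return std::u32string::npos;
}
```

For the string from the first post (`U"Code = "` followed by U+0004), `firstInvalidXml10Char` returns 7, the position of the control character. Running such a check before handing database text to the serializer would catch the problem on the producing side rather than in the .NET client.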

  3. On Apr 22, 1:29 pm, Martin Honnen wrote:
    > Ernesto Bascón Pantoja wrote:
    > > I have one application that builds an XML document that contains some
    > > strange characters:
    > >
    > > std::string str = "Code = ";
    > > str += '\x04'; // control character, ASCII 4 (displayed as ♦)
    >
    > ASCII 4 is not allowed in XML 1.0 and only allowed as numeric character
    > reference in XML 1.1 I think.
    > Why do you want to put such characters into your XML documents?


    I am getting clear text from a database and serializing it into XML
    so that a .NET client can receive the information. The problem occurs
    when the "clear text" contains those characters or international
    characters. Xerces performs the serialization but does not transform
    the '♦' or the 'ß' in 'Straße'; it serializes them as they come.

    I do not know whether writing those characters directly with utf-8
    encoding is valid.
     
    Ernesto Bascón Pantoja, Apr 22, 2008
    #3
  4. ASCII 4 is not a legal XML 1.0 character, no matter what your encoding
    is or how you try to escape it. I'd suggest introducing something like
    <mychar codepoint="4"/> and having your application code convert this
    appropriately. Or do a base-64 encoding on your block of binary code and
    have the application convert that appropriately.

    XML 1.1 relaxes restrictions on characters somewhat. I'm not sure
    offhand whether it would let you get away with this one or not. But
    support for 1.1 is, alas, still extremely rare; you may have to beat up
    your XML library suppliers to get it, and having gotten it you may have
    trouble interchanging those files with other applications or users that
    haven't yet upgraded.
     
    Joseph J. Kesselman, Apr 22, 2008
    #4
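
The first suggestion in this post, replacing each forbidden character with a marker element, could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `toXmlContent` is hypothetical, the element name `mychar` is the one proposed in the post, the input is assumed to be UTF-8 or another ASCII-compatible encoding (so every byte below 0x20 is a complete character), and the result is assumed to be written out as is. If the document is built as a DOM with Xerces, the application would create the marker elements as nodes instead of concatenating markup.

```cpp
#include <string>

// Hypothetical helper: turns raw text into XML 1.0 element content.
// Control bytes that XML 1.0 forbids become <mychar codepoint="N"/>
// elements; the markup characters &, < and > are escaped as usual.
std::string toXmlContent(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (unsigned char c : raw) {
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) {
            // Forbidden control character: emit the marker element.
            out += "<mychar codepoint=\"";
            out += std::to_string(static_cast<int>(c));
            out += "\"/>";
        } else if (c == '&') {
            out += "&amp;";
        } else if (c == '<') {
            out += "&lt;";
        } else if (c == '>') {
            out += "&gt;";
        } else {
            // Everything else, including UTF-8 multi-byte sequences,
            // is copied through unchanged.
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

Applied to the string from the first post, this yields `Code = <mychar codepoint="4"/>`, which any XML 1.0 parser accepts. The receiving .NET client then has to turn the marker element back into the character. The sketch does not handle the other forbidden code points (U+FFFE, U+FFFF and lone surrogates).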
  5. On Apr 22, 1:54 pm, "Joseph J. Kesselman" wrote:
    > ASCII 4 is not a legal XML 1.0 character, no matter what your encoding
    > is or how you try to escape it. I'd suggest introducing something like
    > <mychar codepoint="4"/> and having your application code convert this
    > appropriately. Or do a base-64 encoding on your block of binary code and
    > have the application convert that appropriately.


    So, how can I tell Xerces: "given this string, transcode the special
    characters to their Unicode escape sequences" (e.g. &#4;)?


    > XML 1.1 relaxes restrictions on characters somewhat. I'm not sure
    > offhand whether it would let you get away with this one or not. But
    > support for 1.1 is, alas, still extremely rare; you may have to beat up
    > your XML library suppliers to get it, and having gotten it you may have
    > trouble interchanging those files with other applications or users that
    > haven't yet upgraded.
     
    Ernesto Bascón Pantoja, Apr 22, 2008
    #5
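
A note on the escape asked about here: in XML 1.0 a reference such as `&#4;` is as ill-formed as the raw character, so no serializer setting can make it legal. Numeric character references do help with the other half of the problem, legal non-ASCII characters such as 'ß', because they keep the output pure ASCII whatever encoding is declared. The sketch below assumes the text is already decoded to code points and that its result is written out directly; `toAsciiXmlText` and `isXml10Char` are illustrative names, not Xerces functions.

```cpp
#include <stdexcept>
#include <string>

// Char production of XML 1.0.
bool isXml10Char(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper: escapes markup characters and writes every
// non-ASCII code point as a hexadecimal character reference.
// Throws if a code point cannot appear in XML 1.0 in any form.
std::string toAsciiXmlText(const std::u32string& text) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t cp : text) {
        if (!isXml10Char(cp)) {
            throw std::invalid_argument("code point is not an XML 1.0 character");
        }
        if (cp == U'&') {
            out += "&amp;";
        } else if (cp == U'<') {
            out += "&lt;";
        } else if (cp == U'>') {
            out += "&gt;";
        } else if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            // Build the hexadecimal digits from least to most significant.
            std::string digits;
            for (char32_t v = cp; v != 0; v >>= 4) {
                digits.insert(digits.begin(), hex[v & 0xF]);
            }
            out += "&#x" + digits + ";";
        }
    }
    return out;
}
```

With this, `U"Stra\u00DFe"` becomes `Stra&#xDF;e`, while a string containing U+0004 raises an exception instead of producing a document that the .NET client would reject.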
  6. Ernesto Bascón Pantoja wrote:
    > So, how can I tell Xerces: "given this string, transcode the special
    > characters to their Unicode escape sequences" (e.g. &#4;)?


    Xerces deals with XML. The 0x04 character is not XML (or at least not
    XML 1.0), so it isn't Xerces' responsibility to deal with it.

    If you must represent this character in data that's expressed as XML,
    it's your application's responsibility to use some alternate escaping
    solution (such as the element I suggested, or base-64 encoding, or
    whatever).

    If you really want this character to appear as itself in the file...
    that isn't an XML file and you can't expect XML tools to either accept
    it or generate it.



    Take a long step back from this detail and look at the actual
    problem you're trying to solve. You haven't told us that, so we can't
    say more than that the specific solution you've proposed here doesn't work.
     
    Joseph J. Kesselman, Apr 22, 2008
    #6
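
The base-64 alternative mentioned in this post can be sketched in a few lines of C++. This is a minimal encoder using the standard alphabet with `=` padding; the function name `base64Encode` is hypothetical, and which element carries the encoded text is left to the application.

```cpp
#include <cstddef>
#include <string>

// Encodes arbitrary bytes with the standard base-64 alphabet.
// The output contains only characters that are legal in XML 1.0.
std::string base64Encode(const std::string& data) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);

    // Reads one input byte as an unsigned value.
    auto byteAt = [&data](std::size_t i) -> unsigned int {
        return static_cast<unsigned char>(data[i]);
    };

    std::size_t i = 0;
    // Full groups: 3 input bytes become 4 output characters.
    while (i + 2 < data.size()) {
        unsigned int n = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
        i += 3;
    }

    // Trailing 1 or 2 bytes are padded with '='.
    std::size_t rest = data.size() - i;
    if (rest == 1) {
        unsigned int n = byteAt(i) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        unsigned int n = (byteAt(i) << 16) | (byteAt(i + 1) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

The trade-off against the marker-element approach is readability: base-64 makes the whole value opaque in the XML file, but it round-trips any byte sequence, and the .NET side can decode it with its standard base-64 routines.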
