invalid XML character

Discussion in 'XML' started by Marco Montel, Dec 7, 2004.

  1. Marco Montel

    Marco Montel Guest

    I have two applications that should communicate through an XML file. This
    XML file will contain a CDATA section with a digital signature.

    The problem is that the digital signature is composed of special
    characters that are not correctly recognized by the XML parser.

    When you try to open the following file with an XML editor, like jEdit,
    you will see that the CDATA block is marked with the following error:


    "An invalid XML character (Unicode: 0x6) was found in the CDATA section."

    This is the xml file:

    <?xml version="1.0" encoding="utf-8"?>
    <Document>
    <DataBlock>
    <![CDATA[0‚V_ *†H†÷
     ‚VP0‚VL1 0+0‚FO *†H†÷
     ‚F@‚F<Content-Type: multipart/mixed;
    boundary="----=_NextPart_717_3066_3832508151693287"
    MIME-Version: 1.0
    X-Mailer: EldoS MIMEBlackbox Library, version: 2004.04.16
    Date: Sat, 4 Dec 2004 19:33:50 +0100
    Message-ID: <200412041933500630@69768441>

    This is a multi-part message in MIME format.

    ------=_NextPart_717_3066_3832508151693287
    Content-Type: text/html;
    charset="utf-8"
    Content-Transfer-Encoding: base64
    ]]>
    </DataBlock>
    </Document>


    How can I handle this special content in an XML file? Should I use
    another encoding?


    bye marco
    Marco Montel, Dec 7, 2004
    #1

  2. Marco Montel wrote:


    > The problem is that the digital signature is composed of special
    > characters that are not correctly recognized by the XML parser.
    >
    > When you try to open the following file with an XML editor, like jEdit,
    > you will see that the CDATA block is marked with the following error:
    >
    >
    > "An invalid XML character (Unicode: 0x6) was found in the CDATA section."


    In XML 1.0 0x6 is indeed not an allowed character so you need to encode
    it, using Base64 encoding for instance.
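
    A minimal sketch of that approach in Python, assuming both applications
    can agree that the DataBlock element carries Base64 text rather than raw
    bytes. The signature bytes below are a made-up stand-in, not the real
    signature from the file above.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the binary signature; it contains 0x06,
# which XML 1.0 does not allow as a literal character.
signature = b"0\x82V_\x06\x09*\x86H\x86\xf7"

# Sender: Base64 maps arbitrary bytes to ASCII letters, digits, '+', '/' and '='.
root = ET.Element("Document")
block = ET.SubElement(root, "DataBlock")
block.text = base64.b64encode(signature).decode("ascii")
xml_bytes = ET.tostring(root, encoding="utf-8")

# Receiver: parse the file and decode the text back into the original bytes.
parsed = ET.fromstring(xml_bytes)
recovered = base64.b64decode(parsed.find("DataBlock").text)
```

    Since Base64 output contains no markup characters, the CDATA section is
    no longer needed for this element.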

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Dec 7, 2004
    #2

  3. Marco Montel

    Marco Montel Guest

    Martin Honnen wrote:
    >
    >
    > Marco Montel wrote:
    >
    >
    >> The problem is that the digital signature is composed of special
    >> characters that are not correctly recognized by the XML parser.
    >>
    >> When you try to open the following file with an XML editor, like jEdit,
    >> you will see that the CDATA block is marked with the following error:
    >>
    >>
    >> "An invalid XML character (Unicode: 0x6) was found in the CDATA section."

    >
    >
    > In XML 1.0 0x6 is indeed not an allowed character so you need to encode
    > it, using Base64 encoding for instance.
    >


    This is a great idea! I didn't know that the XML 1.0 specification does
    not allow some control characters. I see that XML 1.1 allows that, but my
    parser supports only the XML 1.0 specification.
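
    For reference, the set of characters XML 1.0 permits is given by the Char
    production in the specification. The sketch below checks text against
    those ranges before it is written into a file; the function names are
    illustrative only and not part of any library.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_chars(text: str) -> list[tuple[int, str]]:
    """List (offset, hex code point) for every character XML 1.0 forbids."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if not is_xml10_char(ch)]


# Hypothetical sample: a tab is fine, the 0x06 control character is not.
sample = "0\x06\t*ok"
invalid = find_invalid_chars(sample)
```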

    thanks a lot

    bye marco
    Marco Montel, Dec 7, 2004
    #3
  4. Marco Montel writes:

    > Martin Honnen wrote:
    > >
    > >
    > > Marco Montel wrote:
    > >
    > >
    > >> The problem is that the digital signature is composed of special
    > >> characters that are not correctly recognized by the XML parser.
    > >>
    > >> When you try to open the following file with an XML editor, like
    > >> jEdit, you will see that the CDATA block is marked with the
    > >> following error:
    > >>
    > >>
    > >> "An invalid XML character (Unicode: 0x6) was found in the CDATA section."

    > >
    > >
    > > In XML 1.0 0x6 is indeed not an allowed character so you need to encode
    > > it, using Base64 encoding for instance.
    > >

    >
    > This is a great idea! I didn't know that the XML 1.0 specification does
    > not allow some control characters. I see that XML 1.1 allows that, but my
    > parser supports only the XML 1.0 specification.
    >
    > thanks a lot
    >
    > bye marco


    Even XML 1.1 wouldn't allow the character in a CDATA section.

    In XML 1.1 you can encode the character as a character reference, but
    you can't have character references in a CDATA section (as & has no
    special meaning in a CDATA section).

    David
    David Carlisle, Dec 7, 2004
    #4
  5. Marco Montel

    Guest

    I'm a total beginner, and I hope you won't mind a stupid question. I
    don't understand your point about the ampersand not having a special
    meaning in the CDATA section. Can you elaborate on that? Would it have
    a special meaning elsewhere? Why would it need a special meaning in the
    CDATA section? I thought CDATA sections were there when you wanted the
    parser to ignore everything in the section and just pass the text
    through.
    , Dec 8, 2004
    #5
  6. On 7 Dec 2004 20:59:14 -0800, wrote:

    >...I don't understand your point about the ampersand not
    >having a special meaning in the CDATA section ... Would it have
    >a special meaning elsewhere?


    Yes; in PCDATA (i.e. inside "normal" marked-up content) the ampersand
    does have a special meaning: it signals to the parser that "here is a
    possible start of an entity reference or a numeric character
    reference".

    >Why would it need a special meaning in the CDATA section?


    It does not "need" to have a special meaning and, per definition of
    CDATA, it _can_not_ have a special meaning in CDATA.

    >I thought CDATA sections were there when you wanted the parser to
    >ignore everything in the section and just pass the text through.


    Yep, that's about it.
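
    The difference can be seen with any XML 1.0 parser. The following is a
    small sketch using Python's standard xml.etree.ElementTree module; the
    element name and sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# In ordinary content, & starts a reference that the parser resolves.
normal = ET.fromstring("<a>&lt;b&gt; &#65;</a>")

# In a CDATA section the same characters are passed through untouched.
cdata = ET.fromstring("<a><![CDATA[&lt;b&gt; &#65;]]></a>")

# A bare & that does not form a complete reference is an error in
# ordinary content...
try:
    ET.fromstring("<a>AT&T</a>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False

# ...but is perfectly fine inside CDATA.
bare_in_cdata = ET.fromstring("<a><![CDATA[AT&T]]></a>")
```

    Here normal.text holds the resolved characters, while cdata.text still
    holds the literal ampersand sequences.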

    --
    Rex
    Roland Eriksson, Dec 8, 2004
    #6
  7. Roland's pretty much answered this already, but just to be explicit:

    The point I was making was that you can't have character 1 in XML 1.0, as
    it's a banned control character. In XML 1.1 (if your application
    supports it, which most don't yet) you still cannot have the literal
    control character, but you can have " & # 1 ; " (without the spaces) to
    refer to that character, which will then appear as the character with
    Unicode number 1 if you look at the document using an API: DOM, SAX,
    XPath etc.

    However, the error message in the original posting referred to a CDATA
    section, and you cannot refer to character 1 in a CDATA section even in
    XML 1.1, as in:

    "< ! [ C D A T A [ & # 1 ; ] ] > "

    The & does not start a character reference as it is no longer special;
    it just stands for itself, so the above (again without the spaces) is
    equivalent to

    "& a m p ; # 1 ; "


    David


    (Spaces added as some (bad) newsreaders/mailreaders are confused by
    character references and try to interpret them even in a plain text
    message)
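
    The equivalence described in this post can be checked with a short
    sketch, assuming Python's standard xml.etree.ElementTree module, whose
    bundled parser handles XML 1.0 only. The element name x is arbitrary.

```python
import xml.etree.ElementTree as ET

# Under XML 1.0, a character reference to character 1 is rejected
# outside CDATA.
try:
    ET.fromstring("<x>&#1;</x>")
    reference_accepted = True
except ET.ParseError:
    reference_accepted = False

# Inside CDATA the same four characters are plain text...
from_cdata = ET.fromstring("<x><![CDATA[&#1;]]></x>")
# ...exactly as if the ampersand had been escaped in ordinary content.
from_escaped = ET.fromstring("<x>&amp;#1;</x>")

# Writing the CDATA version back out escapes the ampersand.
serialized = ET.tostring(from_cdata, encoding="unicode")
```

    Both parsed elements carry the same text, so an application reading the
    document through an API cannot tell the two spellings apart.
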
    David Carlisle, Dec 8, 2004
    #7
