Losing carriage returns in CDATA section - how do I prevent this?

Discussion in 'XML' started by CarlosRivera, Jan 8, 2005.

  1. CarlosRivera

    CarlosRivera Guest

    I am using Apache Xerces-J 2.5.0. I have \r\n combinations in CDATA
    sections that get converted to \n (or rather, the \r gets lost). I am
    using SAX parsing. In the buffer passed to my handler I can see that
    wherever there is a \n, the character one position back is the \r, but
    the start offset is on the \n. The source is an XML string, so the \r
    did not get lost while reading a file. In any case, it seems that the
    parser should not be removing the \r from the CDATA section during my
    SAX events. I am running this on Windows, so the conversion of \r\n to
    \n might be platform-related. If it is, the code would not be portable
    between Unix and Windows. The parser should give me the text as is.
    Isn't this one of the purposes of CDATA? I know that one can put
    character entities in the XML and that works, but it is really ugly. We
    just want to take some text from a source location and put it into the
    XML without having to replace \r with &#13;.
     
    CarlosRivera, Jan 8, 2005
    #1

  2. In article <xaZDd.499$>,
    CarlosRivera <> wrote:

    >I am using apache xerces J 2.5.0. I have \r\n feed combinations in the
    >CDATA sections that get converted to \n (or rather \r gets lost).


    XML parsers convert CR-LF and CR to LF, so that you don't have to worry
    about what platform you're using.

    If you really want to preserve CRs, you have to use a character
    reference, but think carefully before doing this: XML is a text
    format, and dependence on platform-specific line-end sequences
    is not usually a good idea.
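
To make the difference concrete, here is a minimal sketch using the SAX parser that ships with the JDK (an assumption: the original poster used standalone Xerces-J 2.5.0, but line-end normalization is required of every conforming parser). The class name `LineEndDemo` and the helper `textOf` are made up for illustration. Note that a character reference such as `&#13;` is only recognized outside a CDATA section, so the second document uses ordinary element content.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LineEndDemo {
    // Hypothetical helper: parses an XML string and returns all character data
    // reported through SAX characters() events.
    static String textOf(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks, so accumulate.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Literal CR LF inside CDATA: normalized to LF before parsing.
        String fromCdata = textOf("<a><![CDATA[x\r\ny]]></a>");
        // CR written as a character reference: it is not a literal line break
        // in the input, so normalization leaves it alone.
        String fromCharRef = textOf("<a>x&#13;\ny</a>");

        System.out.println(fromCdata.equals("x\ny"));
        System.out.println(fromCharRef.equals("x\r\ny"));
    }
}
```

Both comparisons should print `true`: the CDATA text arrives with a bare \n, while the character reference survives as \r.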

    -- Richard
     
    Richard Tobin, Jan 8, 2005
    #2

  3. Richard Tobin wrote:

    > In article <xaZDd.499$>,
    > CarlosRivera <> wrote:
    >
    >
    >>I am using apache xerces J 2.5.0. I have \r\n feed combinations in the
    >>CDATA sections that get converted to \n (or rather \r gets lost).

    >
    >
    > XML parsers convert CR-LF and CR to LF, so that you don't have to worry
    > about what platform you're using.


    To be more specific, here is an excerpt from the XML 1.0 spec:

    ====

    2.11 End-of-Line Handling

    XML parsed entities are often stored in computer files which, for
    editing convenience, are organized into lines. These lines are typically
    separated by some combination of the characters CARRIAGE RETURN (#xD)
    and LINE FEED (#xA).

    To simplify the tasks of applications, the XML processor MUST behave as
    if it normalized all line breaks in external parsed entities (including
    the document entity) on input, before parsing, by translating both the
    two-character sequence #xD #xA and any #xD that is not followed by #xA
    to a single #xA character.

    ====
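
The rule quoted above can be sketched as a plain string transformation. This is only an illustration of what the parser must behave as if it did, not how Xerces implements it; the class and method names are made up. The second method shows one way an application could restore CR LF pairs after parsing if a consumer of the text really needs them, assuming every line break in the original was CR LF.

```java
public class Xml10LineEnds {
    // XML 1.0 section 2.11: translate the pair CR LF first, then any
    // remaining lone CR, to a single LF.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Application-side workaround (an assumption about what the caller wants):
    // turn every LF in already-normalized text back into CR LF.
    static String toCrLf(String normalized) {
        return normalized.replace("\n", "\r\n");
    }

    public static void main(String[] args) {
        String raw = "one\r\ntwo\rthree\nfour";
        String normalized = normalize(raw);
        System.out.println(normalized.equals("one\ntwo\nthree\nfour"));
        System.out.println(toCrLf(normalized).equals("one\r\ntwo\r\nthree\r\nfour"));
    }
}
```

Restoring line ends at output time keeps the XML itself platform-neutral, which is the point of the normalization rule.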

    XML 1.1 generalizes that requirement a bit: it also normalizes NEL
    (#x85), the sequence #xD #x85, and LINE SEPARATOR (#x2028) to #xA.


    John Bollinger
     
    John C. Bollinger, Jan 10, 2005
    #3
