Literal &#10; (not newline)

Discussion in 'XML' started by will@thestranathans.com, Sep 5, 2006.

  1. Guest

    I have an XML input that includes things like:

    <foo>line of text&#10;another line of text&#10;yet another</foo>

    And I want the &#10; entities PRESERVED (not translated) in the result,
    so:

    <bar>line of text&#10;another line of text&#10;yet another</bar>

    I've tried <xsl:copy-of select="foo/text()" />, I've tried
    <xsl:value-of select="foo" disable-output-escaping="yes" />, I've tried
    <xsl:text disable-output-escaping="yes"><xsl:copy-of
    select="foo/text()" /></xsl:text>, and it seems nothing works.

    Strangely, will (with certain incantations of the above) be
    preserved properly, but it seems that perhaps the PARSER is translating
    the entities, not copying them. i.e., no matter what I try, the &#10;
    references from the input become newlines in the output.

    I'm using Xerces J (a couple of different versions, with the same result).

    Thanks smart people.
     
    , Sep 5, 2006
    #1

  2. Per the XML spec, newlines are normalized as they are read in, and you
    can't distinguish one representation from another. You may be able to
    tell your serializer that you want all newlines output as &#10;... but
    it won't be able to tell those from other line breaks in your source file.
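
The claim that a parser cannot tell the two representations apart is easy to check. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` rather than the Xerces-J setup discussed in this thread; the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical inputs: one uses the character reference, one a literal newline.
with_reference = ET.fromstring("<foo>line of text&#10;another line of text</foo>")
with_newline = ET.fromstring("<foo>line of text\nanother line of text</foo>")

# After parsing, both hold the same U+000A character; the original spelling is gone.
same_text = with_reference.text == with_newline.text
print(same_text)
print(repr(with_reference.text))
```

Because the reference is expanded before the application ever sees the text, anything working on the parsed tree (a stylesheet included) has no way to know which form the source used.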

    I'd recommend using semantic markup, such as an <lf/> element, to
    represent this case, and postprocessing it to yield the desired
    character. Or fixing whatever downstream tool is forcing you to worry
    about the exact representation of line-break.

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
     
    Joseph Kesselman, Sep 5, 2006
    #2

  3. chiaman Guest

    Joseph Kesselman wrote:
    > Per the XML spec, newlines are normalized as they are read in, and you
    > can't distinguish one representation from another.

    I was kinda' afraid of that.

    > I'd recommend using semantic markup, such as an <lf/> element, to
    > represent this case, and postprocessing it to yield the desired
    > character. Or fixing whatever downstream tool is forcing you to worry
    > about the exact representation of line-break.

    Another option I'd love to be able to deal with - however, I'm neither
    in control of the input format (a vendor tool that saves reports in XML
    format) nor the desired output format (M$ Excel).

    Thanks for the help.
     
    chiaman, Sep 5, 2006
    #3
  4. chiaman wrote:
    >Another option I'd love to be able to deal with - however, I'm neither
    >in control of the input format (a vendor tool that saves reports in XML
    >format) nor the desired output format (M$ Excel).


    You haven't told us *why* you need the newlines as character
    references (incidentally, they're not entities). When you say the
    output format is Excel, what do you mean? An XML document that Excel
    can process? If so, it shouldn't care about whether you use literal
    newlines or a reference.

    -- Richard
     
    Richard Tobin, Sep 5, 2006
    #4
  5. chiaman Guest

    Because the data that includes the embedded references is formatted.
    So I want the references included in the excel so that the newlines
    appear in the cell data when displayed in excel. (I know, pick a
    better tool than excel).

    For example, given the following:

    <poem>
    <lines>A unix salesperson, Lenore&#10;Loved her job, but loved the
    beach more.&#10;She devised such a way&#10;to combine work and
    play:&#10;She sells C-shells by the seashore</lines>
    <author>Unknown</author>
    </poem>

    Translated into Excel:
    <Cell><Data ss:Type="String">A unix salesperson, Lenore
    Loved her job, but loved the beach more.
    She devised such a way
    to combine work and play:
    She sells C-shells by the seashore</Data><Cell>

    when actually opened in Excel renders as

    A unix salesperson, Lenore Loved her job, but loved the beach more. She
    devised such a way to combine work and play: She sells C-shells by the
    seashore

    but if the newlines in the Excel XML include actual references:

    <Cell><Data ss:Type="String">A unix salesperson, Lenore&#10;
    Loved her job, but loved the beach more.&#10;
    She devised such a way&#10;
    to combine work and play:&#10;
    She sells C-shells by the seashore</Data><Cell>

    Will render properly in the Excel as

    A unix salesperson, Lenore
    Loved her job, but loved the beach more.
    She devised such a way
    to combine work and play:
    She sells C-shells by the seashore

    So the references are in the source because they're actually important.
    I want them retained when I translate it to excel because they remain
    important.

    Richard Tobin wrote:
    > chiaman wrote:
    > >Another option I'd love to be able to deal with - however, I'm neither
    > >in control of the input format (a vendor tool that saves reports in XML
    > >format) nor the desired output format (M$ Excel).

    >
    > You haven't told us *why* you need the newlines as character
    > references (incidentally, they're not entities). When you say the
    > output format is Excel, what do you mean? An XML document that Excel
    > can process? If so, it shouldn't care about whether you use literal
    > newlines or a reference.
    >
    > -- Richard
     
    chiaman, Sep 5, 2006
    #5
  6. chiaman wrote:
    > but if the newlines in the Excel XML include actual references:
    >
    > <Cell><Data ss:Type="String">A unix salesperson, Lenore&#10;
    > Loved her job, but loved the beach more.&#10;
    > She devised such a way&#10;
    > to combine work and play:&#10;
    > She sells C-shells by the seashore</Data><Cell>
    >
    > Will render properly in the Excel as
    >
    > A unix salesperson, Lenore
    > Loved her job, but loved the beach more.
    > She devised such a way
    > to combine work and play:
    > She sells C-shells by the seashore


    What does Excel render if it is

    <Cell><Data ss:Type="String">A unix salesperson, Lenore&#10;Loved her
    job, but loved the beach more.&#10;She devised such a way&#10;to combine
    work and play:&#10;She sells C-shells by the seashore</Data><Cell>

    instead?
    --
    Johannes Koch
    Spem in alium nunquam habui praeter in te, Deus Israel.
    (Thomas Tallis, 40-part motet)
     
    Johannes Koch, Sep 5, 2006
    #6
  7. chiaman Guest

    As I said earlier - if the actual references are included, when viewing
    the file in Excel, the line breaks show in the correct places - this
    is, of course, assuming the last <Cell> is actually </Cell> ;) When
    you view this in Excel, you would see:

    A unix salesperson, Lenore
    Loved her job, but loved the beach more.
    She devised such a way
    to combine work and play:
    She sells C-shells by the seashore

    For actual line breaks to appear in Excel, they have to be included in
    the XML as references, otherwise, they're just parsed as whitespace and
    render as a single space within Excel.
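
If the cell markup is produced by a program rather than by the stylesheet itself, one way to get the references into the output is to escape newlines at serialization time. This is a minimal sketch, assuming Python's `xml.sax.saxutils.escape`; the helper name `cell_xml` is hypothetical, and the output is a fragment without the `ss` namespace declaration. Whether Excel then shows the line breaks is the behaviour reported in this thread, not something the code verifies.

```python
from xml.sax.saxutils import escape


def cell_xml(text: str) -> str:
    """Build a SpreadsheetML-style cell, writing each newline in the text as &#10;."""
    # escape() handles &, < and > first, then applies the extra mapping,
    # so the ampersand of the inserted reference is not escaped again.
    escaped = escape(text, {"\n": "&#10;"})
    return f'<Cell><Data ss:Type="String">{escaped}</Data></Cell>'


cell = cell_xml("She devised such a way\nto combine work and play:")
print(cell)
```

Note that this writes every newline in the text as a reference, including any that were only line wraps in the source, which is the limitation Joseph pointed out earlier.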

    Johannes Koch wrote:
    > What does Excel render if it is
    >
    > <Cell><Data ss:Type="String">A unix salesperson, Lenore&#10;Loved her
    > job, but loved the beach more.&#10;She devised such a way&#10;to combine
    > work and play:&#10;She sells C-shells by the seashore</Data><Cell>
    >
    > instead?
     
    chiaman, Sep 5, 2006
    #7
  8. chiaman wrote:
    > As I said earlier


    No. You provided two examples:

    1. Newline characters, no character references
    2. Newline characters followed by character references

    for which you added the renderings in Excel.

    I asked for a third:
    No newline characters, but character references

    Maybe, in the end it's an issue of various line break character(s) on
    different systems (u000A/u000D vs. u000A vs. u000D).
    --
    Johannes Koch
    Spem in alium nunquam habui praeter in te, Deus Israel.
    (Thomas Tallis, 40-part motet)
     
    Johannes Koch, Sep 5, 2006
    #8
  9. chiaman wrote:

    >For actual line breaks to appear in Excel, they have to be included in
    >the XML as references, otherwise, they're just parsed as whitespace and
    >render as a single space within Excel.


    I'm afraid that all I can suggest is that you complain to Microsoft,
    because XML applications should not treat &#10; in text any
    differently from a newline character (a conforming XML parser will
    return the same character in both cases).

    -- Richard
     
    Richard Tobin, Sep 5, 2006
    #9
  10. Peter Flynn Guest

    chiaman wrote:
    [...]
    > So the references are in the source because they're actually important.
    > I want them retained when I translate it to excel because they remain
    > important.


    OK. Yes, picking a better system than Excel would be nice, but...

    If you're not in control of the input format, then just run the
    file through a filter and turn the numeric references into some
    dummy empty element which you can transform back to &#10; after.
    <lb/>, much as Joseph suggested, would be conventional, e.g.

    $ sed -e "s+&#10;+<lb/>+g" original.file >new.file
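
The whole round trip (filter before parsing, transform, filter after serializing) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard library instead of sed and an XSLT processor, the function names `protect_line_breaks` and `restore_line_breaks` are hypothetical, and renaming the root element stands in for the real transformation.

```python
import re
import xml.etree.ElementTree as ET

LF_REF = "&#10;"


def protect_line_breaks(raw_xml: str) -> str:
    """Replace each literal &#10; reference with an <lb/> placeholder element."""
    return raw_xml.replace(LF_REF, "<lb/>")


def restore_line_breaks(serialized_xml: str) -> str:
    """Turn <lb/> placeholders (with or without a space before the slash) back into &#10;."""
    return re.sub(r"<lb\s*/>", LF_REF, serialized_xml)


source = "<foo>line of text&#10;another line of text&#10;yet another</foo>"

root = ET.fromstring(protect_line_breaks(source))
root.tag = "bar"  # stand-in for the real transformation step
result = restore_line_breaks(ET.tostring(root, encoding="unicode"))
print(result)
```

Like the sed command, the plain text replacement assumes the reference only occurs in element content; a &#10; inside an attribute value would turn into markup that no longer parses.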

    ///Peter
     
    Peter Flynn, Sep 6, 2006
    #10