Handling &amp;quot; entity in attribute value

Discussion in 'XML' started by Mateusz Loskot, Oct 21, 2005.

  1. Hi,

    I'd like to ask how XML parsers should handle attributes whose value
    contains the &amp;quot; entity. I know XML allows both single and double
    quotes as attribute value delimiters. That's clear.
    But how should a parser react in this situation:

    I have a COORDSYS element with a string attribute whose value contains
    many &amp;quot; entities:

    <COORDSYS
    string="GEOGCS[&quot;GCS_WGS_1984&quot;,DATUM[&quot;WGS84&quot;,SPHEROID[&quot;WGS84&quot;,6378137,298.257223563]],PRIMEM[&quot;Greenwich&quot;,0],UNIT[&quot;Degree&quot;,0.0174532925199433]]"/>

    So, when I read it into a DOM and, after some operations, try to save it
    to a file, the parser replaces the double-quote value delimiters with
    single quotes, as follows:

    <COORDSYS
    string='GEOGCS[&quot;GCS_WGS_1984&quot;,DATUM[&quot;WGS84&quot;,SPHEROID[&quot;WGS84&quot;,6378137,298.257223563]],PRIMEM[&quot;Greenwich&quot;,0],UNIT[&quot;Degree&quot;,0.0174532925199433]]'/>

    Please explain how a parser is expected to handle this element in
    a save operation.

    Best regards

    --
    Mateusz Loskot
    http://mateusz.loskot.net
    Mateusz Loskot, Oct 21, 2005
    #1

  2. "Mateusz Loskot" <> wrote:

    > I'd like to ask how XML parsers should handle attributes whose value
    > contains the &amp;quot; entity.


    As data that contains the ASCII quotation mark.

    > I have a COORDSYS element with a string attribute whose value contains
    > many &amp;quot; entities:


    OK.

    > So, when I read it into a DOM and, after some operations, try to save it
    > to a file, the parser replaces the double-quote value delimiters with
    > single quotes, as follows:


    That's external to XML parsing. You are not processing XML any more but
    data constructed by parsing an XML document and representing it as a tree.
    What happens then depends on the tools you use. Most probably the internal
    representation contains neither the enclosing quotation marks nor the entity
    references, only the parsed attribute values as strings. When you later output
    the data in some format, perhaps linearizing it as XML, the results depend
    on how you do that.

    If all occurrences of ASCII quote and ASCII apostrophe in the attribute
    values are "escaped" using entity or character references, it does not
    matter whether you use quotes or apostrophes as delimiters when converting
    the data back to XML format. (Naturally you need to use matching
    delimiters, i.e. the same character as opening and as closing delimiter.)
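
A minimal sketch of that point, written in Python rather than against TinyXML. The helper name serialize_attr and the shortened attribute value are assumptions made for illustration; only the standard library is used. It writes the same value once with quotes and once with apostrophes as delimiters, escaping both characters, and checks that parsing gives back the same data either way.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def serialize_attr(name, value, delimiter):
    """Hypothetical helper: write name=value with the given delimiter.

    Both quote characters are escaped, so either delimiter is safe.
    """
    escaped = escape(value, {'"': "&quot;", "'": "&apos;"})
    return f"{name}={delimiter}{escaped}{delimiter}"


# Shortened version of the COORDSYS value, as it looks after parsing.
value = 'GEOGCS["GCS_WGS_1984",DATUM["WGS84"]]'

with_quotes = "<COORDSYS " + serialize_attr("string", value, '"') + "/>"
with_apostrophes = "<COORDSYS " + serialize_attr("string", value, "'") + "/>"

# Parsing either document recovers exactly the same attribute value.
parsed_a = ET.fromstring(with_quotes).get("string")
parsed_b = ET.fromstring(with_apostrophes).get("string")
print(parsed_a == parsed_b == value)
```

The two serialized documents differ only in their delimiters, which is why a tool is free to pick either one when writing.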

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Jukka K. Korpela, Oct 21, 2005
    #2

  3. Jukka K. Korpela wrote:
    > "Mateusz Loskot" <> wrote:
    >
    > > So, when I read it into a DOM and, after some operations, try to save it
    > > to a file, the parser replaces the double-quote value delimiters with
    > > single quotes, as follows:

    >
    > That's external to XML parsing. You are not processing XML any more but
    > data constructed by parsing an XML document and representing it as a tree.


    Yes, I know

    > What happens then depends on the tools you use.


    Yes, I use TinyXML DOM parser.

    > Most probably the internal
    > representation contains neither the enclosing quotation marks nor the entity
    > references, only the parsed attribute values as strings. When you later output
    > the data in some format, perhaps linearizing it as XML, the results depend
    > on how you do that.


    I did some investigation and now I know the internals of TinyXML. During
    a Save operation TinyXML checks whether the attribute value contains a
    double-quote character ("); if it does, TinyXML encloses the attribute
    value in single quotes ('). Certainly, that's correct from the XML spec
    point of view.
    The check is done simply with (let's say) a find('\"') call on the
    attribute value.
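
The check described above can be sketched roughly as follows. This is Python, not TinyXML's actual C++ source, and the function name write_attribute is hypothetical. It only models the behaviour as described in this post: search the value for a double quote and pick the delimiter accordingly.

```python
from xml.sax.saxutils import escape


def write_attribute(name, value):
    """Hypothetical writer: choose the delimiter by searching for a double quote."""
    text = escape(value)  # escapes only &, < and >
    if text.find('"') == -1:
        # No double quote in the value: the usual double-quote delimiters are safe.
        return f'{name}="{text}"'
    # The value contains a double quote: switch to apostrophes as delimiters.
    # Limitation of this sketch: a value that also contains an apostrophe
    # would need one of the two characters escaped, which is not handled here.
    return f"{name}='{text}'"


print(write_attribute("unit", "Degree"))
print(write_attribute("string", 'GEOGCS["WGS84"]'))
```

With such a check, the delimiter that comes out depends entirely on whether the stored value holds a literal double quote or something else in its place.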

    TinyXML can be compiled either in, let's say, C style, in which case it
    uses its own string class, or with STL support, in which case it uses
    std::string.
    When TinyXML is compiled in C style, all &amp;quot; entities are
    "visible" to the parser as double quotes, so if you printf the value of my
    'string' attribute as it is held by TinyXML, you will get
    double quotes instead of &amp;quot; entities. But when TinyXML is compiled
    with STL support, everything works fine: TinyXML holds the 'string'
    attribute with &amp;quot; entities and does not convert them to double quotes
    internally.

    Here is longer story with some source code:
    http://sourceforge.net/forum/forum.php?thread_id=1370207&forum_id=172103

    I'm not sure if this approach is correct. I'm also not sure if this is
    a TinyXML bug. That's why I asked this question.
    I'm going to discuss it further with the TinyXML development team.

    Thanks a lot

    --
    Mateusz Loskot
    http://mateusz.loskot.net
    Mateusz Loskot, Oct 22, 2005
    #3
  4. "Mateusz Loskot" <> wrote:

    > During
    > a Save operation TinyXML checks whether the attribute value contains a
    > double-quote character ("); if it does, TinyXML encloses the attribute
    > value in single quotes ('). Certainly, that's correct from the XML spec
    > point of view.


    It is, but if the attribute value contains _both_ an ASCII quotation
    mark " _and_ an ASCII apostrophe ' (which is admittedly rare), then
    at least one of them _must_ be "escaped": whichever one is also used
    as the delimiter.
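
As an illustration of this rule, Python's standard library function xml.sax.saxutils.quoteattr follows it: it switches to apostrophes when only a quotation mark is present, and escapes the quotation marks when both characters occur. The sample values and the element name e below are made up; this shows one library's behaviour, not a statement about TinyXML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

only_quote = 'say "hi"'   # contains a quotation mark only
both = 'say "it\'s"'      # contains a quotation mark and an apostrophe

# Only a quotation mark: apostrophes become the delimiters, nothing is escaped.
print(quoteattr(only_quote))

# Both characters: quotes are the delimiters, inner quotation marks are escaped.
print(quoteattr(both))

# Either way the parsed value is the original string again.
round_trip = ET.fromstring("<e a=" + quoteattr(both) + "/>").get("a")
print(round_trip == both)
```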

    > I'm not sure if this approach is correct.


    I still don't know what the problem or question is about. You are saying
    that the output format is correct. The internal format is not really an XML
    issue and mostly a practical question: you need to know the internal format
    in order to play with it.

    What we _can_ say is that in processing XML data, &quot; and " (assuming a
    context where " may appear) must be treated as identical. The distinction
    should normally be lost in parsing, but if it is preserved in the internal
    format, it should not affect processing of the data as XML. (The
    distinction could be retained e.g. in order to be able to print out the
    original XML source verbatim for some purpose.)
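
A small check of that equivalence, assuming Python's standard xml.etree.ElementTree parser as the tool (the element name e and attribute name a are made up). The entity reference, the two forms of character reference, and a literal quotation mark inside apostrophe delimiters all parse to the same value.

```python
import xml.etree.ElementTree as ET

documents = [
    '<e a="x&quot;y"/>',   # predefined entity reference
    '<e a="x&#34;y"/>',    # decimal character reference
    '<e a="x&#x22;y"/>',   # hexadecimal character reference
    "<e a='x\"y'/>",       # literal quotation mark, apostrophes as delimiters
]

# After parsing, the spelling used in the source is no longer visible.
values = [ET.fromstring(doc).get("a") for doc in documents]
print(values)
```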

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Jukka K. Korpela, Oct 23, 2005
    #4
  5. In article <Xns96F8BD50B302Cjkorpelacstutfi@193.229.0.31>,
    Jukka K. Korpela <> wrote:

    >It is, but if the attribute value contains _both_ an ASCII quotation
    >mark " _and_ an ASCII apostrophe ' (which is admittedly rare)


    Not that rare: in an XSLT stylesheet an XPath may well contain a
    string containing a quote. If you want an XPath string containing
    both you're stuck!
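
An XPath 1.0 string literal has no escape mechanism of its own, so a single literal cannot hold both characters. One commonly used workaround is to assemble the string with the XPath concat() function. The sketch below only builds the expression text; the helper name xpath_string_literal is hypothetical, and the result would still need normal XML escaping before being placed in a stylesheet attribute.

```python
def xpath_string_literal(text):
    """Hypothetical helper: return an XPath 1.0 expression evaluating to text."""
    if '"' not in text:
        return '"' + text + '"'
    if "'" not in text:
        return "'" + text + "'"
    # Both characters occur: wrap each run between double quotes in "...",
    # and emit every double quote as the separate literal '"'.
    pieces = []
    for index, run in enumerate(text.split('"')):
        if index > 0:
            pieces.append("'\"'")
        if run:
            pieces.append('"' + run + '"')
    return "concat(" + ", ".join(pieces) + ")"


print(xpath_string_literal("plain"))
print(xpath_string_literal('say "hi"'))
print(xpath_string_literal('say "it\'s"'))
```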

    -- Richard
    Richard Tobin, Oct 23, 2005
    #5
  6. Jukka K. Korpela wrote:
    > "Mateusz Loskot" <> wrote:
    >
    > > I'm not sure if this approach is correct.

    >
    > I still don't know what the problem or question is about. You are saying
    > that the output format is correct. The internal format is not really an XML
    > issue and mostly a practical question: you need to know the internal format
    > in order to play with it.
    >
    > What we _can_ say is that in processing XML data, &quot; and " (assuming a
    > context where " may appear) must be treated as identical.


    Yes, I understand it. The problem seems to be more technical and
    implementation related:

    http://sourceforge.net/forum/forum.php?thread_id=1370207&forum_id=172103

    You can see that the TinyXML parser behaves differently depending on
    whether it is built with its C-style string class or with the C++ STL.

    We can be sure that, with any XML parser, if I search an XML element
    for ", both &amp;quot; and " (double quotes) are expected to match.

    Cheers

    --
    Mateusz Loskot
    http://mateusz.loskot.net
    Mateusz Loskot, Oct 23, 2005
    #6