Maintaining a Greater-than Character in an Attribute Value

Discussion in 'XML' started by gooooglegroups@yahoo.co.uk, Aug 15, 2006.

  1. Guest

    I want to transform the following xml file


    ------------------------------------------------------------------------
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <a>
    <b attrib="if 3 > 2">
    </b>

    <b attrib="3 > 1">
    </b>
    </a>

    ------------------------------------------------------------------------

    into this xml file


    ------------------------------------------------------------------------
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <a>
    <b attrib="3 > 2">
    </b>

    <b attrib="3 > 1">
    </b>
    </a>

    ------------------------------------------------------------------------

    i.e. I want to remove the "if " prefix from the start of the value of
    the attribute "attrib".

    I am using the style sheet...


    ------------------------------------------------------------------------
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <!-- Identity template: copy every node and attribute unchanged. -->
      <xsl:template match="node() | @*">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Override for attrib values that start with 'if ': re-create the
           attribute with that prefix removed. -->
      <xsl:template match="@attrib[starts-with(., 'if ')]">
        <xsl:attribute name="attrib">
          <xsl:value-of select="substring-after(., 'if ')"/>
        </xsl:attribute>
      </xsl:template>

    </xsl:stylesheet>

    ------------------------------------------------------------------------

    with "Xalan Version Xalan Java 2.4.1", but I get the following output


    ------------------------------------------------------------------------
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <a>
    <b attrib="3 &gt; 2">
    </b>

    <b attrib="3 &gt; 1">
    </b>
    </a>

    ------------------------------------------------------------------------

    where the greater-than character is changed to &gt; .

    I need to have the single > character in the output also.

    Is there any way in XSLT of keeping the greater-than and less-than
    characters > and < in attribute values when you transform instead of
    having &gt; and &lt; ?

    If there is no way to achieve this in XSLT, what would be the
    recommended method for achieving this?

    Any help\pointers greatly appreciated,

    Regards,

    Metric
     
    , Aug 15, 2006
    #1

  2. wrote:


    > with "Xalan Version Xalan Java 2.4.1", but I get the following output


    > <b attrib="3 &gt; 2">


    > where the greater-than character is changed to &gt; .
    >
    > I need to have the single > character in the output also.


    Why? Any XML parser or tool should properly unescape the &gt; entity
    reference to the '>' character.

    If you want '>' then I guess you need to write your own serializer to
    serialize the result tree of the XSLT transformation.
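
The point about unescaping can be checked with any conforming parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module (Python is an assumption here; the thread itself uses Xalan Java). It parses the same element once with a literal `>` and once with `&gt;` and compares the attribute values the application sees.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same element: one with a literal '>',
# one with the '&gt;' entity reference.
literal = '<b attrib="3 > 2"></b>'
escaped = '<b attrib="3 &gt; 2"></b>'

# After parsing, the application sees only the attribute value,
# not the way it was written in the file.
value_literal = ET.fromstring(literal).get("attrib")
value_escaped = ET.fromstring(escaped).get("attrib")

print(repr(value_literal))
print(repr(value_escaped))
print(value_literal == value_escaped)
```

Both parses yield the string `3 > 2`, which is why a downstream XML tool should not be able to tell the two files apart.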


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Aug 15, 2006
    #2

  3. Andy Dingley Guest

    wrote:

    > where the greater-than character is changed to &gt; .
    > I need to have the single > character in the output also.


    Then stop needing that.
    http://www.w3.org/TR/2004/REC-xml-20040204/#syntax
    The character ">" MAY be encoded as &gt; where you're encountering it.
    So it's an error if your XML-consuming application doesn't recognise
    that. You should concentrate on fixing that, not working around its
    errors. Otherwise these errors build up and you build a non-robust
    system.
     
    Andy Dingley, Aug 15, 2006
    #3
  4. > Is there any way in XSLT of keeping the greater-than and less-than
    > characters > and < in attribute values when you transform instead of
    > having &gt; and &lt; ?


    Not in XSLT by itself, no. Write your own serializer, or (simpler) write
    a text-processing-based postprocessor.

    Better answer: Don't fix what ain't broke. If your problem is that some
    downstream tool cares about this difference, fix that tool.
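
A text-processing postprocessor of the kind suggested above could look roughly like the sketch below. It is written in Python as an assumption (any scripting language would do), and the function name `unescape_gt` is hypothetical. It only touches `&gt;`, because a literal `>` is allowed in attribute values and text, whereas `&lt;` must stay escaped for the file to remain well-formed. It also leaves `]]&gt;` alone, since a literal `]]>` is not allowed in character data. It does not look at context, so a `&gt;` that appears literally inside a comment or CDATA section would be changed too.

```python
import re

# Match '&gt;' unless it is directly preceded by ']]', because a literal
# ']]>' is not allowed in XML character data.
_GT_REF = re.compile(r"(?<!\]\])&gt;")


def unescape_gt(xml_text: str) -> str:
    """Replace '&gt;' entity references with a literal '>' character."""
    return _GT_REF.sub(">", xml_text)


# Stand-in for the text written by the XSLT processor.
transformed = '<a>\n<b attrib="3 &gt; 2">\n</b>\n</a>'
print(unescape_gt(transformed))
```

The result is still well-formed XML, so this step is cosmetic: it changes the bytes in the file but not the content a parser reports.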

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Aug 16, 2006
    #4
  5. Guest

    Thanks for the replies.

    I need to keep the greater-than character, <, in the attribute value of
    the original XML file. So it looks like I will have to use another
    approach other than, or in addition to, XSLT.
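
For comparison, the prefix-stripping step itself can be done outside XSLT. The sketch below uses Python's standard `xml.etree.ElementTree` (an assumption; the original poster uses Xalan), and the helper name `strip_if_prefix` is hypothetical. It mirrors the stylesheet's `starts-with` / `substring-after` logic on the sample input from the first post.

```python
import xml.etree.ElementTree as ET

# The sample input from the first post (XML declaration omitted).
SOURCE = """<a>
<b attrib="if 3 > 2">
</b>

<b attrib="3 > 1">
</b>
</a>"""


def strip_if_prefix(root, attr="attrib", prefix="if "):
    """Remove `prefix` from the start of `attr` on every element that has it."""
    for elem in root.iter():
        value = elem.get(attr)
        if value is not None and value.startswith(prefix):
            elem.set(attr, value[len(prefix):])


root = ET.fromstring(SOURCE)
strip_if_prefix(root)
print([b.get("attrib") for b in root.iter("b")])
```

This only changes the in-memory tree. Writing it back with `ET.tostring` would still emit `&gt;`, so the serialization question discussed in the thread remains.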

    Thanks.

     
    , Aug 16, 2006
    #5
  6. wrote:
    > I need to keep the greater-than character, <, in the attribute value of
    > the original XML file.


    I still say that the right fix is to undo that requirement. Your mileage
    may vary.

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
     
    Joseph Kesselman, Aug 16, 2006
    #6
  7. In article <>,
    <> wrote:

    >I need to keep the greater-than character, <, in the attribute value of
    >the original XML file.


    (You mean > presumably.)

    You need to explain *why* you have to keep it. People aren't going to
    spend much time helping you to do something they think is pointless.

    -- Richard
     
    Richard Tobin, Aug 16, 2006
    #7
  8. Andy Dingley Guest

    wrote:

    > I need to keep the greater-than character, <, in the attribute value of
    > the original XML file.


    Less than or greater than? The permitted use of each character is
    different in XML. A greater than ">" _MAY_ be replaced by the entity
    reference, is acceptable as a character, and must always be parseable
    by "downstream" tools whether it appears as a character or an entity
    reference.

    A less than character "<" is right out. That's just not well-formed if
    used there. Verboten.
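
The difference between the two characters is easy to demonstrate with a parser. This is a small sketch using Python's standard `xml.etree.ElementTree` (Python is an assumption, and `is_well_formed` is a hypothetical helper name). It reports whether a string parses at all.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<b attrib="3 > 2"></b>'))     # literal '>' in an attribute
print(is_well_formed('<b attrib="3 &lt; 2"></b>'))  # escaped '<' in an attribute
print(is_well_formed('<b attrib="3 < 2"></b>'))     # literal '<' in an attribute
```

The first two inputs are accepted and the third is rejected, which matches the rule described above.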


    You can still keep XSLT if you produce the output as a DOM and
    serialize it to a file yourself. So long as you can guarantee you'll
    avoid encoding issues, namespaces and <![CDATA[ sections then it's not
    hard to DIY it.
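
A do-it-yourself serializer along these lines might look like the sketch below. It uses Python's `xml.dom.minidom` as a stand-in for whatever DOM the XSLT processor produces (an assumption; with Xalan the tree would be a Java DOM). The function names are hypothetical. As the post warns, it deliberately ignores namespaces, encodings and the XML declaration, and it raises an error for CDATA sections, comments and any other node type rather than guessing.

```python
from xml.dom import minidom


def escape_attr(value: str) -> str:
    """Escape an attribute value, leaving '>' as a literal character."""
    return (value.replace("&", "&amp;")   # '&' first, to avoid double escaping
                 .replace("<", "&lt;")    # '<' is never allowed literally
                 .replace('"', "&quot;")  # values are written in double quotes
                 .replace("\t", "&#9;")
                 .replace("\n", "&#10;")
                 .replace("\r", "&#13;"))


def escape_text(value: str) -> str:
    """Escape character data; ']]>' must not appear literally in content."""
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace("]]>", "]]&gt;"))


def serialize(node) -> str:
    """Serialize an element or text node; other node types are unsupported."""
    if node.nodeType == node.TEXT_NODE:
        return escape_text(node.data)
    if node.nodeType == node.ELEMENT_NODE:
        attrs = "".join(
            f' {name}="{escape_attr(value)}"'
            for name, value in node.attributes.items()
        )
        children = "".join(serialize(child) for child in node.childNodes)
        return f"<{node.tagName}{attrs}>{children}</{node.tagName}>"
    raise NotImplementedError(f"unsupported node type: {node.nodeType}")


# Stand-in for the result tree of the transformation.
doc = minidom.parseString('<a><b attrib="3 &gt; 2"/></a>')
print(serialize(doc.documentElement))
```

The output keeps `>` literal in the attribute value and is still well-formed, so any XML parser reads it back to the same content.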
     
    Andy Dingley, Aug 16, 2006
    #8
  9. Guest

    Andy Dingley wrote:

    > You can still keep XSLT if you produce the output as a DOM and
    > serialize it to a file yourself. So long as you can guarantee you'll
    > avoid encoding issues, namespaces and <![CDATA[ sections then it's not
    > hard to DIY it.


    I have both less-than and greater-than characters in the attribute
    value of the input file. I need these less-than and greater-than
    characters in the output file also.

    Will this work even with a less-than character?
     
    , Aug 16, 2006
    #9
  10. Andy Dingley Guest

    wrote:

    > Will this work even with a less-than character?


    How should I know? That's simply not XML. You do that, you're out on
    your own.
     
    Andy Dingley, Aug 16, 2006
    #10
  11. Guest

    Andy Dingley wrote:
    > wrote:
    >
    > > Will this work even with a less-than character?

    >
    > How should I know? That's simply not XML. You do that, you're out on
    > your own.


    Ok thanks, I thought you were implying in your previous post that it is
    possible to use XSLT and keep the greater-than and less-than characters
    in the input and output files.

    So if I want to keep the greater-than and less-than characters in the
    input and output files then this cannot be achieved solely through the
    use of XSLT (as the input file is not valid xml).
     
    , Aug 16, 2006
    #11
  12. wrote:
    > So if I want to keep the greater-than and less-than characters in the
    > input and output files then this cannot be achieved solely through the
    > use of XSLT (as the input file is not valid xml).


    It's not even well-formed, and so not XML at all. So you can't use XML
    tools.
    --
    Johannes Koch
    Spem in alium nunquam habui praeter in te, Deus Israel.
    (Thomas Tallis, 40-part motet)
     
    Johannes Koch, Aug 16, 2006
    #12
  13. writes:

    > Andy Dingley wrote:
    >
    >> You can still keep XSLT if you produce the output as a DOM and
    >> serialize it to a file yourself. So long as you can guarantee you'll
    >> avoid encoding issues, namespaces and <![CDATA[ sections then it's not
    >> hard to DIY it.

    >
    > I have both less-than and greater-than characters in the attribute
    > value of the input file. I need these less-than and greater-than
    > characters in the output file also.


    There may be a form of category error here. When you speak of having
    characters "in the attribute value of the input file", you seem to be
    saying you have XML input. When you speak of a literal '<' as being
    one of those characters, you are clearly saying you have non-XML
    input. I conclude that I haven't got the faintest idea what you are
    talking about. And frankly, I'm not certain about you, either.

    At the infoset level, both input and output can contain < and > in any
    attribute value. At the XML serialization level, < must and > may be
    escaped. If your plan is to use XSLT to produce XML output, then your
    downstream apps presumably can consume XML, and will have no trouble
    with the representation of < as &lt; in the output. (In which case
    your only problem is that you think you have a problem.) If your plan
    is to use XSLT to produce non-XML output, then it's not clear to me
    that you have a problem, since when writing out < and > in text mode,
    XSLT 1.0 processors won't escape them.
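
The distinction between the infoset level and the serialization level can be illustrated with a short sketch. It uses Python's standard `xml.etree.ElementTree` as an assumption; its serializer happens to escape both characters in attribute values, while other serializers may leave `>` literal.

```python
import xml.etree.ElementTree as ET

# At the infoset level the attribute value is just a string, and it may
# contain both '<' and '>'.
elem = ET.Element("b")
elem.set("attrib", "1 < 2 > 0")

# At the serialization level the writer escapes what needs escaping.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parsing the serialized form recovers the original value unchanged.
reparsed = ET.fromstring(serialized)
print(reparsed.get("attrib") == "1 < 2 > 0")
```

The escapes exist only in the file; the value an application reads back is the original string.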

    > Will this work even with a less-than character?


    What is the antecedent of 'this'?


    No, wait. Don't answer. Before you post another message to this
    thread, I recommend that you read Eric Raymond's essay "How to ask
    questions the smart way". You can find it on the Web at
    http://www.catb.org/~esr/faqs/smart-questions.html

    best,

    C. M. Sperberg-McQueen
     
    C. M. Sperberg-McQueen, Aug 17, 2006
    #13
  14. Guest

    > There may be a form of category error here.

    Yes, I agree, my original post stated that the file I wanted to
    transform was an XML file, whereas it is not valid XML as it contains
    the greater-than character, >, in an attribute value.

    > When you speak of having
    > characters "in the attribute value of the input file", you seem to be
    > saying you have XML input.


    I recommend that you read my previous post, where I state "the input
    file is not valid xml". It is quite clear that I've already said, and
    recognised the fact, that the input is "non-XML". So no, I am not
    saying I have XML input.

    This fact, that the file is not valid XML, was already pointed out by a
    previous poster, Andy Dingley, when he referred me to
    http://www.w3.org/TR/2004/REC-xml-20040204/#syntax

    But thanks for reiterating the point.
     
    , Aug 17, 2006
    #14
  15. Andy Dingley Guest

    wrote:

    > Yes, I agree, my original post stated that the file I wanted to
    > transform was an XML file, whereas it is not valid XML as it contains
    > the greater-than character, >, in an attribute value.


    First of all, the problem is over "well-formed" XML, not "valid" XML.
    "Valid" has a special meaning and we aren't even close to that yet.

    Secondly this _doesn't_ mean that the file isn't well-formed XML.

    "<" and ">" are a problem for XML, so it's possible to replace them
    with &lt; and &gt;. It's common good practice to do this everywhere
    (except obviously when they're actually delimiting tags).

    However it's also possible to use the ">" greater-than character
    directly in attribute values and text content. This doesn't make the
    file ill-formed, and to a smart-enough parser there's no ambiguity.
    It's not friendly to humans though, so we tend not to do it.

    This is different from "<". Using "<" would cause parsing problems,
    even to a good parser, so that's more strongly forbidden than ">".

    It's fundamental that any "correctly working" XML tool _must_ always
    accept either of these entity references instead of the character. The
    behaviour when parsing XML is that either of these forms can be used
    and they both mean the same thing -- the parser must recognise both.
    Your downstream tool doesn't do this, so the tool has a bug in it that
    ought to be fixed.

    So in the case of ">", then both forms (character and entity reference)
    are acceptable. You can use either, a parser must understand both, and
    a serialiser can generate whichever it prefers, so long as it's
    well-formed (as all parsers understand both, this doesn't matter). XML
    allows several character-by-character differences in files that still
    represent the same content.

    If your file only contains ">" and not "<", then it's probably
    well-formed. But this doesn't mean that another serialiser would
    generate that exact same file to represent the same content. Most would
    put &gt; in there instead, and they're quite correct to do so.
     
    Andy Dingley, Aug 17, 2006
    #15
