Maintaining a Greater-than Character in an Attribute Value


gooooglegroups

I want to transform the following XML file:


------------------------------------------------------------------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<a>
<b attrib="if 3 > 2">
</b>

<b attrib="3 > 1">
</b>
</a>

------------------------------------------------------------------------

into this XML file:


------------------------------------------------------------------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<a>
<b attrib="3 > 2">
</b>

<b attrib="3 > 1">
</b>
</a>

------------------------------------------------------------------------

i.e. I want to remove the "if " prefix from the start of the value of
the attribute "attrib".

I am using the stylesheet:


------------------------------------------------------------------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for attrib values that begin with "if ": drop the prefix. -->
  <xsl:template match="@attrib[starts-with(., 'if ')]">
    <xsl:attribute name="attrib">
      <xsl:value-of select="substring-after(., 'if ')"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>

------------------------------------------------------------------------

with "Xalan Version Xalan Java 2.4.1", but I get the following output


------------------------------------------------------------------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<a>
<b attrib="3 &gt; 2">
</b>

<b attrib="3 &gt; 1">
</b>
</a>

------------------------------------------------------------------------

where the greater-than character is changed to &gt; .

I need to have the single > character in the output also.

Is there any way in XSLT of keeping the greater-than and less-than
characters > and < in attribute values when you transform instead of
having &gt; and &lt; ?

If there is no way to achieve this in XSLT, what would be the
recommended method for achieving this?

Any help/pointers greatly appreciated,

Regards,

Metric
 

Martin Honnen

with "Xalan Version Xalan Java 2.4.1", but I get the following output
<b attrib="3 &gt; 2">
where the greater-than character is changed to &gt; .

I need to have the single > character in the output also.

Why? Any XML parser or tool should properly resolve the &gt; entity
reference to the '>' character.

If you want '>' then I guess you need to write your own serializer to
serialize the result tree of the XSLT transformation.
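As a rough illustration of the hand-written serializer idea, here is a minimal sketch in Python (used instead of Xalan's Java API purely for brevity). It writes an ElementTree tree out itself, escaping &, < and " in attribute values but leaving > literal. The function names `escape_text`, `escape_attr` and `serialize` are hypothetical; the sketch assumes no namespaces, comments, processing instructions or CDATA sections, does not escape tabs or newlines in attribute values (a parser would normalize those to spaces), and leaves the output encoding to whoever writes the string to a file.

```python
import xml.etree.ElementTree as ET


def escape_text(s: str) -> str:
    # Character data: "&" and "<" must be escaped. ">" is escaped too,
    # so the forbidden sequence "]]>" can never appear in text content.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_attr(s: str) -> str:
    # Double-quoted attribute values: "&", "<" and '"' must be escaped.
    # ">" is deliberately written as a literal character.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


def serialize(elem: ET.Element) -> str:
    # Hypothetical hand-written serializer: elements, attributes and text only.
    # Assumes no namespaces, comments, processing instructions or CDATA.
    attrs = "".join(f' {name}="{escape_attr(value)}"'
                    for name, value in elem.attrib.items())
    parts = [f"<{elem.tag}{attrs}>"]
    if elem.text:
        parts.append(escape_text(elem.text))
    for child in elem:
        parts.append(serialize(child))
        if child.tail:
            parts.append(escape_text(child.tail))
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


root = ET.fromstring('<a><b attrib="3 &gt; 2"></b><b attrib="3 > 1"></b></a>')
print(serialize(root))
# prints: <a><b attrib="3 > 2"></b><b attrib="3 > 1"></b></a>
```

This only changes how > is spelled in the file; a parser reading either this output or Xalan's gets the same attribute value.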
 

Andy Dingley

where the greater-than character is changed to &gt; .
I need to have the single > character in the output also.

Then stop needing that.
http://www.w3.org/TR/2004/REC-xml-20040204/#syntax
The character ">" MAY be encoded as &gt; where you're encountering it.
So it's an error if your XML-consuming application doesn't recognise
that. You should concentrate on fixing that, not working around its
errors. Otherwise these errors build up and you build a non-robust
system.
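Andy's point is easy to check with any conforming parser: feed it both spellings and compare the attribute values. This sketch uses Python's standard `xml.etree.ElementTree` (an arbitrary choice; any XML parser should behave the same way).

```python
import xml.etree.ElementTree as ET

# The same element serialized two ways: ">" written literally,
# and ">" written as the predefined entity reference &gt;.
literal = '<b attrib="3 > 2"/>'
escaped = '<b attrib="3 &gt; 2"/>'

# A conforming parser reports the same attribute value for both forms.
v1 = ET.fromstring(literal).get("attrib")
v2 = ET.fromstring(escaped).get("attrib")
print(repr(v1), repr(v2), v1 == v2)
# prints: '3 > 2' '3 > 2' True
```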
 

Joe Kesselman

Is there any way in XSLT of keeping the greater-than and less-than
characters > and < in attribute values when you transform instead of
having &gt; and &lt; ?

Not in XSLT by itself, no. Write your own serializer, or (simpler) write
a text-processing-based postprocessor.
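A text-processing postprocessor of the kind Joe describes might look roughly like the Python sketch below. It assumes the XSLT output is already a string produced by a serializer that escapes > as &gt; (as Xalan did above), so no raw > occurs inside a tag, and it rewrites &gt; only inside start tags, leaving text content alone. The regular expression and the name `unescape_gt_in_tags` are illustrative, not part of any library, and tag-like text inside a comment or CDATA section would also be rewritten.

```python
import re

# A start tag or empty-element tag: "<" followed by an ASCII name-start
# character, then anything except "<" or ">" up to the closing ">".
# End tags ("</"), comments and CDATA ("<!") and the XML declaration or
# PIs ("<?") never begin a match, but tag-like text *inside* a comment or
# CDATA section would still match. Assumes every ">" inside attribute
# values was escaped by the serializer.
START_TAG = re.compile(r"<[A-Za-z_:][^<>]*>")


def unescape_gt_in_tags(xml_text: str) -> str:
    """Replace &gt; with a literal > inside start tags only.

    Character data is untouched, so an escaped "]]&gt;" in text content
    cannot turn into the forbidden "]]>".
    """
    return START_TAG.sub(lambda m: m.group(0).replace("&gt;", ">"), xml_text)


serialized = '<a>\n<b attrib="3 &gt; 2">\n</b>\n<c>x ]]&gt; y</c>\n</a>'
print(unescape_gt_in_tags(serialized))
```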

Better answer: Don't fix what ain't broke. If your problem is that some
downstream tool cares about this difference, fix that tool.
 

gooooglegroups

Thanks for the replies.

I need to keep the greater-than character, <, in the attribute value of
the original XML file. So it looks like I will have to use an
approach other than, or in addition to, XSLT.

Thanks.
 

Joseph Kesselman

I need to keep the greater-than character, <, in the attribute value of
the original XML file.

I still say that the right fix is to undo that requirement. Your mileage
may vary.
 

Richard Tobin

I need to keep the greater-than character, <, in the attribute value of
the original XML file.

(You mean > presumably.)

You need to explain *why* you have to keep it. People aren't going to
spend much time helping you to do something they think is pointless.

-- Richard
 

Andy Dingley

I need to keep the greater-than character, <, in the attribute value of
the original XML file.

Less-than or greater-than? The permitted use of each character is
different in XML. A greater-than ">" _MAY_ be replaced by the entity
reference, is acceptable as a character, and must always be parseable
by "downstream" tools whether it appears as a character or an entity
reference.

A less-than character "<" is right out. That's just not well-formed if
used there. Verboten.
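The difference between the two characters shows up immediately with any conforming parser. In this Python sketch (standard library only; `is_well_formed` is a hypothetical helper name), a literal > in an attribute value parses, &lt; parses, but a literal < is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<b attrib="3 > 2"/>'))     # True: a literal ">" is allowed
print(is_well_formed('<b attrib="2 &lt; 3"/>'))  # True: "<" escaped as &lt;
print(is_well_formed('<b attrib="2 < 3"/>'))     # False: a literal "<" is not
```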


You can still keep XSLT if you produce the output as a DOM and
serialize it to a file yourself. So long as you can guarantee you'll
avoid encoding issues, namespaces and <![CDATA[ sections then it's not
hard to DIY it.
 

gooooglegroups

Andy said:
You can still keep XSLT if you produce the output as a DOM and
serialize it to a file yourself. So long as you can guarantee you'll
avoid encoding issues, namespaces and <![CDATA[ sections then it's not
hard to DIY it.

I have both less-than and greater-than characters in the attribute
value of the input file. I need these less-than and greater-than
characters in the output file also.

Will this work even with a less-than character?
 

gooooglegroups

Andy said:
How should I know? That's simply not XML. You do that, you're out on
your own.

OK, thanks. I thought you were implying in your previous post that it is
possible to use XSLT and keep the greater-than and less-than characters
in the input and output files.

So if I want to keep the greater-than and less-than characters in the
input and output files then this cannot be achieved solely through the
use of XSLT (as the input file is not valid xml).
 

Johannes Koch

So if I want to keep the greater-than and less-than characters in the
input and output files then this cannot be achieved solely through the
use of XSLT (as the input file is not valid xml).

It's not even well-formed, and so not XML at all. So you can't use XML
tools.
 

C. M. Sperberg-McQueen

Andy said:
You can still keep XSLT if you produce the output as a DOM and
serialize it to a file yourself. So long as you can guarantee you'll
avoid encoding issues, namespaces and <![CDATA[ sections then it's not
hard to DIY it.

I have both less-than and greater-than characters in the attribute
value of the input file. I need these less-than and greater-than
characters in the output file also.

There may be a form of category error here. When you speak of having
characters "in the attribute value of the input file", you seem to be
saying you have XML input. When you speak of a literal '<' as being
one of those characters, you are clearly saying you have non-XML
input. I conclude that I haven't got the faintest idea what you are
talking about. And frankly, I'm not certain about you, either.

At the infoset level, both input and output can contain < and > in any
attribute value. At the XML serialization level, < must and > may be
escaped. If your plan is to use XSLT to produce XML output, then your
downstream apps presumably can consume XML, and will have no trouble
with the representation of < as &lt; in the output. (In which case
your only problem is that you think you have a problem.) If your plan
is to use XSLT to produce non-XML output, then it's not clear to me [...]

Will this work even with a less-than character?

What is the antecedent of 'this'?


No, wait. Don't answer. Before you post another message to this
thread, I recommend that you read Eric Raymond's essay "How to ask
questions the smart way". You can find it on the Web at
http://www.catb.org/~esr/faqs/smart-questions.html

best,

C. M. Sperberg-McQueen
 

gooooglegroups

There may be a form of category error here.

Yes, I agree, my original post stated that the file I wanted to
transform was an XML file, whereas it is not valid XML as it contains
the greater-than character, >, in an attribute value.
When you speak of having
characters "in the attribute value of the input file", you seem to be
saying you have XML input.

I recommend that you read my previous post, where I state "the input
file is not valid xml". It is quite clear that I've already said, and
recognised the fact, that the input is "non-XML". So no, I am not
saying I have XML input.

This fact, that the file is not valid XML, was already pointed out by a
previous poster, Andy Dingley, when he referred me to
http://www.w3.org/TR/2004/REC-xml-20040204/#syntax

But thanks for reiterating the point.
 

Andy Dingley

Yes, I agree, my original post stated that the file I wanted to
transform was an XML file, whereas it is not valid XML as it contains
the greater-than character, >, in an attribute value.

First of all, the problem is over "well-formed" XML, not "valid" XML.
"Valid" has a special meaning and we aren't even close to that yet.

Secondly, this _doesn't_ mean that the file isn't well-formed XML.

"<" and ">" are a problem for XML, so it's possible to replace them
with &lt; and &gt;. It's common good practice to do this everywhere
(except obviously when they're actually delimiting tags).

However it's also possible to use the ">" greater-than character
directly in markup. This doesn't make the file non-well-formed, and to a
smart-enough parser there's no ambiguity. It's not friendly to humans
though, so we tend not to do it.

This is different from "<". Using "<" would cause parsing problems,
even to a good parser, so that's more strongly forbidden than ">".

It's fundamental that any "correctly working" XML tool _must_ always
accept either of these entity references instead of the character. The
behaviour when parsing XML is that either of these forms can be used
and they both mean the same thing -- the parser must recognise both.
Your downstream tool doesn't do this, so the tool has a bug in it that
ought to be fixed.

So in the case of ">", then both forms (character and entity reference)
are acceptable. You can use either, a parser must understand both, and
a serialiser can generate whichever it prefers, so long as it's
well-formed (as all parsers understand both, this doesn't matter). XML
allows several character-by-character differences in files that still
represent the same content.
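The idea that different files can carry identical content can be illustrated with a small Python round trip (standard library only). The attribute value is held in memory as plain characters; ElementTree's serializer happens to escape both < and >, while the hand-written variants below use a literal > or numeric character references, and all of them parse back to the same value.

```python
import xml.etree.ElementTree as ET

value = "3 < 2 > 1"

# One serializer's choice: ElementTree escapes both "<" and ">" in attributes.
generated = ET.tostring(ET.Element("b", {"attrib": value}), encoding="unicode")

# Other spellings of the same content that a different serializer might emit.
variants = [
    generated,
    '<b attrib="3 &lt; 2 > 1"/>',       # ">" left literal
    '<b attrib="3 &#60; 2 &#62; 1"/>',  # numeric character references
]

for text in variants:
    parsed = ET.fromstring(text).get("attrib")
    print(f"{text!r:40} -> {parsed!r}")
    assert parsed == value  # every variant carries the same attribute value
```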

If your file only contains ">" and not "<", then it's probably
well-formed. But this doesn't mean that another serialiser would
generate that exact same file to represent the same content. Most would
put &gt; in there instead, and they're quite correct to do so.