What kinds of things can be verified in XML files?

Cambridge Ray

The question is so abstract, I guess I have to illustrate. One of my
XML files contains a set of rectangular coordinates:

<reference>
<line x1="416" y1="6436" x2="416" y2="3924" />
<line x1="420" y1="6436" x2="420" y2="3924" />
<line x1="1500" y1="5388" x2="1500" y2="4452" />
<line x1="1504" y1="4436" x2="1504" y2="3924" />
<line x1="2884" y1="5388" x2="2884" y2="4456" />
<line x1="412" y1="4436" x2="412" y2="3932" />
</reference>

I would like to make sure that every X2 is greater than or equal to
its X1 companion. Same for Y2 and Y1. Is this something that can be
easily checked at the XML level, or should I perform such a check after
the XML file is read and parsed?

I use Xerces-C++.

TIA,

-Ramon
 
Cambridge Ray

Here's another example. What I would like to check is that the
successive coordinates are in ascending order, and that the "skip"
element contains only 0 and 1 values. Can this be (relatively) easily
verified at the XML level, or should I do it after the XML file is
read and parsed?

TIA,

-Ramon

-----------

<rows>
<coord>3449</coord>
<coord>3600</coord>
<coord>3893</coord>
<coord>4196</coord>
<coord>4340</coord>
<coord>4644</coord>
<coord>4941</coord>
<coord>5242</coord>
<coord>5541</coord>
</rows>

<columns>
<coord>278</coord>
<coord>876</coord>
<coord>1174</coord>
<coord>1783</coord>
<coord>2555</coord>
<coord>3154</coord>
<coord>4068</coord>
<coord>4825</coord>
</columns>

<skip>
<coord>0</coord>
<coord>1</coord>
<coord>1</coord>
<coord>0</coord>
<coord>1</coord>
</skip>
 
Joe Kesselman

The standard XML DTD and Schema languages can't express that kind of
interaction; you'd need to implement it at a higher level of your
application. Basically, if something is application semantics, the
application has to deal with it; if it's closer to syntax (type and
range limits, and many but not all kinds of document structure
constraints), a schema can check it.
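
As a rough illustration of checking this in the application after
parsing, here is a minimal sketch. It uses Python's standard library
only to stay self-contained; the same logic would apply to a DOM built
with Xerces-C++. The function name check_lines and the sample data are
hypothetical, and the sketch assumes the four attributes are present
and hold integers (something a schema could guarantee beforehand).

```python
import xml.etree.ElementTree as ET


def check_lines(xml_text):
    """Return messages for <line> elements where x2 < x1 or y2 < y1."""
    root = ET.fromstring(xml_text)
    problems = []
    for index, line in enumerate(root.iter("line"), start=1):
        # Assumes all four attributes exist and contain integers.
        x1, y1, x2, y2 = (int(line.get(name)) for name in ("x1", "y1", "x2", "y2"))
        if x2 < x1:
            problems.append(f"line {index}: x2={x2} is less than x1={x1}")
        if y2 < y1:
            problems.append(f"line {index}: y2={y2} is less than y1={y1}")
    return problems


# Hypothetical sample: the second line violates the x2 >= x1 rule.
SAMPLE = """<reference>
<line x1="416" y1="3924" x2="416" y2="6436" />
<line x1="1500" y1="4452" x2="1400" y2="5388" />
</reference>"""

for message in check_lines(SAMPLE):
    print(message)  # prints: line 2: x2=1400 is less than x1=1500
```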

There have been alternatives to the W3C's XML Schema language which can
implement more complicated constraints. The problem is that they aren't
as well standardized or as widely supported, so you really can't count
on anyone else using them. They may still be useful within some
controlled contexts, as an alternative to hand-coding.
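
For the second example (ascending coordinates and 0/1 skip values), a
hand-coded check might look like the sketch below. Several things here
are assumptions: the <grid> wrapper element is hypothetical (the posted
fragments have no common root), "ascending" is taken to mean strictly
ascending, and check_grid is an illustrative name. The 0/1 restriction
is the kind of range limit a schema could express; it is checked here
too only to keep everything in one place.

```python
import xml.etree.ElementTree as ET


def coords(parent):
    """Return the integer values of the <coord> children of an element."""
    return [int(coord.text) for coord in parent.findall("coord")]


def check_grid(xml_text):
    """Return messages for out-of-order coordinates and bad skip values."""
    root = ET.fromstring(xml_text)
    problems = []
    for name in ("rows", "columns"):
        values = coords(root.find(name))
        # Compare each value with its predecessor (strictly ascending assumed).
        for previous, current in zip(values, values[1:]):
            if current <= previous:
                problems.append(f"{name}: {current} does not exceed the preceding {previous}")
    for value in coords(root.find("skip")):
        if value not in (0, 1):
            problems.append(f"skip: {value} is neither 0 nor 1")
    return problems


# Hypothetical sample with one ordering error and one bad skip value.
SAMPLE = """<grid>
<rows><coord>3449</coord><coord>3600</coord><coord>3500</coord></rows>
<columns><coord>278</coord><coord>876</coord></columns>
<skip><coord>0</coord><coord>2</coord></skip>
</grid>"""

for message in check_grid(SAMPLE):
    print(message)
```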

--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Tim Arnold

Hi Joe,
Just curious if schematron with its 'let' and 'value-of' abilities could
be of help for the OP?
thanks,
--Tim
 
Joe Kesselman

I believe Schematron can express this kind of constraint... if you are
in an environment where you can guarantee that Schematron will be
available on the machine in question. In other words, it might be
reasonable to apply this on the server end where you own all the code,
but unless you can also guarantee that nobody but you will be writing
clients you may not be able to do much with it on that end -- and if you
ARE writing all the clients, you can usually ensure the data is correct
in the first place rather than spending cycles checking it.



 
Martin Honnen

Joe said:
The standard XML DTD and Schema languages can't express that kind of
interaction;

It might be worth noting that version 1.1 of the schema language is
at the "Candidate Recommendation" stage, and with that you are able to
define assertions http://www.w3.org/TR/xmlschema11-1/#cAssertions e.g.
<xs:assert test="@x2 ge @x1"/>
I think there is a version of Xerces Java that already implements
that. Saxon's commercial schema processor also supports it.
 
Joe Kesselman

Good point. I'd hesitate to _rely_ on Schema 1.1 until it graduates to
Recommendation -- and even then, not all parsers will support it
promptly -- but it's certainly reasonable to start prototyping against
it if you have it available.

 
Peter Flynn

I think it's also important to establish what the objective is. The
typical sequence of events when an XML instance is processed can be
expressed as

1. syntactic verification (is the document well-formed)
2. formal validation (well-formed document tested against schema/DTD)
3. processing with whatever language/engine is specified, which may
   involve further error-reporting, but at this stage the document
   itself is presumed valid to its schema/DTD

The expectation is that if steps 1 or 2 fail, no further action takes
place, although a processor can report an error and even try to fix it,
which may involve digging further into the document to see what is going
on; but it cannot continue as if nothing had happened.
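
The three stages could be sketched roughly as follows. This is only an
illustration in Python's standard library, which has no schema
validator, so step 2 is replaced by a hypothetical hand-written
structure check; a real setup would call a validating parser there.
The function name process and the returned messages are made up for
the example.

```python
import xml.etree.ElementTree as ET


def process(xml_text):
    """Run the three stages in order, stopping at the first failure."""
    # Step 1: well-formedness. A parse error ends processing here.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return f"rejected at step 1 (not well-formed): {error}"

    # Step 2: stand-in for schema/DTD validation (hypothetical check:
    # the root must be <reference> and contain only <line> children).
    if root.tag != "reference" or any(child.tag != "line" for child in root):
        return "rejected at step 2 (invalid structure)"

    # Step 3: application processing, with the document presumed valid.
    return f"processed {len(root)} line element(s)"


print(process("<reference><line/>"))  # unclosed root element, stops at step 1
print(process("<other/>"))            # well-formed but wrong structure
print(process('<reference><line x1="1" x2="2"/></reference>'))
```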

If you specify a constraint at the level of the Schema or DTD then
presumably you do so because you want to prevent the instance being
processed if it fails a well-formedness or validation test.

In effect, an assertion such as Martin mentions (that one attribute has
to be bigger than another) becomes a breaking-point. So we need to
consider how big a deal this is. The document is well-formed, because
validation will only take place if the document has passed (1) above. Is
the fact that <foo bar="42" blort="43"/> going to kill someone, or cause
the stock market to crash, or create a batch of dud chips, or just order
43 paperclips instead of 42? This level of analysis should indicate
whether such a test should cause the entire factory to come to a stop
and evacuate, or simply email a warning to the appropriate person.

I think what I am saying is, the fact that you *can* specify ever
tighter constraints doesn't necessarily mean that it is the right
business decision to do so, because the effects of premature validation
failure can be just as serious as those of an error remaining
undetected until later.
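
One way to keep that decision out of the schema is to let the
application choose how severe a violation is. The sketch below is
hypothetical throughout (the names find_violations and handle, the
fatal flag, and the sample data are all invented): the same check can
either stop processing or merely report a warning and carry on.

```python
import xml.etree.ElementTree as ET


def find_violations(xml_text):
    """Return (index, x1, x2) for every <line> whose x2 is less than x1."""
    root = ET.fromstring(xml_text)
    violations = []
    for index, line in enumerate(root.iter("line"), start=1):
        x1, x2 = int(line.get("x1")), int(line.get("x2"))
        if x2 < x1:
            violations.append((index, x1, x2))
    return violations


def handle(xml_text, fatal):
    """Stop on violations when fatal is true; otherwise warn and continue."""
    violations = find_violations(xml_text)
    if not violations:
        return "ok"
    if fatal:
        raise ValueError(f"{len(violations)} line(s) have x2 < x1; processing stopped")
    for index, x1, x2 in violations:
        print(f"warning: line {index} has x2={x2} < x1={x1}; continuing")
    return "ok with warnings"


# Hypothetical sample in which the only line violates the rule.
SAMPLE = '<reference><line x1="43" x2="42" /></reference>'
print(handle(SAMPLE, fatal=False))
```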

///Peter
 
