XML::Xerces questions


Arvin Portlock

I'm using the XML::Xerces module to validate batches of
XML documents against a schema. The module is still under
development, so there is little documentation that I can
find, but I'm still finding it incredibly useful. I have
four questions whose answers would greatly enhance my Xerces experience.

1. The way to get validation errors seems incredibly odd
to me:

eval {$parser->parse ($file)};
print $@;

Is this the only way to get at error messages? Via $@?
Does this wrapper provide a more direct method? Does this
seem odd to anybody else in the perl community or is
it just me?

2. Is there any way to use local copies of the schemas
rather than have Xerces fetch them from the web? In my
XML documents the referenced schemas have the form:

xsi:schemaLocation="http://www.loc.gov/standards/mets/mets.xsd"

I.e., they are all URLs. I think this is why Xerces is so
slow. As I'd like to use this module to validate batches
of thousands of documents, it would be nice if Xerces didn't
have to go out and fetch the schemas for every single document.
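A minimal plain-Perl sketch of the lookup this question has in mind: a table that maps each remote schema URL to a local copy and is consulted before anything is fetched. The table, the `schemas/mets.xsd` path, and the `resolve_schema` helper are hypothetical. Wiring such a lookup into XML::Xerces itself (for example through an entity resolver hook) is not shown and depends on what the wrapper exposes.

```perl
use strict;
use warnings;

# Hypothetical table: remote schema URL => local copy on disk.
my %local_schema = (
    'http://www.loc.gov/standards/mets/mets.xsd' => 'schemas/mets.xsd',
);

# Return the local path if one is registered, otherwise the URL
# unchanged (so unregistered schemas would still be fetched remotely).
sub resolve_schema {
    my ($url) = @_;
    return exists $local_schema{$url} ? $local_schema{$url} : $url;
}

print resolve_schema('http://www.loc.gov/standards/mets/mets.xsd'), "\n";
```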

3. Xerces stops validating after the first error it encounters.
Is there any way to get it to report all the errors in a
document? I understand what the standard says about parsers
and errors, but every other validator I know of has an option
to continue validation after an error. Is there a similar option
for Xerces?

4. Lastly, $@ reports errors in this form:

ERROR:
FILE: D:\sgml\mets\tei2mets/test.mets.xml
LINE: 34
COLUMN: 27
MESSAGE: Unknown element 'mods:namePPart'
at validate.pl line 13

So I need to parse out the various pieces with regular expressions
to compose messages in the form I want. I guess this repeats
my first question: is there a way to get direct access to the
pieces of the error message?
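Until a more direct interface turns up, the regular-expression approach can at least live in one place. This sketch assumes the message layout shown above (one `KEY: value` pair per line) and returns a hash of the fields; `parse_xerces_error` is a hypothetical helper, not part of XML::Xerces.

```perl
use strict;
use warnings;

# Split an error string in the ERROR:/FILE:/LINE:/COLUMN:/MESSAGE:
# layout into a hash of its fields (keys lowercased).
sub parse_xerces_error {
    my ($text) = @_;
    my %field;
    for my $key (qw(FILE LINE COLUMN MESSAGE)) {
        # Each field sits on its own line as "KEY: value".
        $field{lc $key} = $1 if $text =~ /^${key}:\s*(.*?)\s*$/m;
    }
    return \%field;
}

my $err = <<'END';
ERROR:
FILE: D:\sgml\mets\tei2mets/test.mets.xml
LINE: 34
COLUMN: 27
MESSAGE: Unknown element 'mods:namePPart'
 at validate.pl line 13
END

my $f = parse_xerces_error($err);
printf "%s:%d:%d: %s\n", @{$f}{qw(file line column message)};
```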

I'd prefer to not change or extend Xerces.pm itself.

Thanks!
 

Tad McClellan

Arvin Portlock said:
1. The way to get validation errors seems incredibly odd
to me:

eval {$parser->parse ($file)};
print $@;

Is this the only way to get at error messages? Via $@?
Does this wrapper provide a more direct method? Does this
seem odd to anybody else in the perl community or is
it just me?


perldoc -f eval

...
It is also Perl's exception trapping mechanism


"eval BLOCK" and "if $@" is Perl's "try" and "catch" mechanism.
 
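To make the try/catch analogy concrete, here is a minimal plain-Perl sketch. `validate` is a hypothetical stand-in for a call such as `$parser->parse($file)` that dies on bad input. The `eval { ...; 1 }` form detects success from the block's return value rather than by testing `$@` alone.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $parser->parse($file): dies on bad input.
sub validate {
    my ($doc) = @_;
    die "MESSAGE: empty document\n" unless length $doc;
    return 1;
}

my @results;
for my $doc ('<mets/>', '') {
    # "try": run the risky call inside eval BLOCK; the trailing 1
    # makes the block return true only if nothing died.
    if (eval { validate($doc); 1 }) {
        push @results, "valid\n";
    } else {
        # "catch": $@ holds whatever die() was given.
        my $err = $@ || "unknown error\n";
        push @results, "invalid: $err";
    }
}
print @results;
```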

pkent

Arvin Portlock said:
1. The way to get validation errors seems incredibly odd
to me:

eval {$parser->parse ($file)};
print $@;

It looks like parse() throws a fatal error, i.e. calls die(), when it hits
an error. A die() will exit the program unless you catch the exception
in an eval block, and whatever the eval block caught is stored in the
special variable $@.

Now, sometimes the sensible thing to do when you encounter an
unrecoverable error is to throw a fatal exception... sometimes it's
more sensible to return undef and let the caller interrogate the
object with a method such as lastError()... or maybe to take some
other approach.
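For comparison, the return-undef style might look like this sketch. `Hypothetical::Validator`, its toy `parse` check, and `last_error` are all invented for illustration; this is not how XML::Xerces reports errors.

```perl
use strict;
use warnings;

# Hypothetical class showing the "return undef + lastError" style.
package Hypothetical::Validator;

sub new { my ($class) = @_; return bless { last_error => undef }, $class }

# True on success; on failure returns undef (an empty list in list
# context) and records the reason for last_error().
sub parse {
    my ($self, $doc) = @_;
    $self->{last_error} = undef;
    if ($doc !~ /^\s*</) {
        $self->{last_error} = 'document does not start with markup';
        return;
    }
    return 1;
}

sub last_error { my ($self) = @_; return $self->{last_error} }

package main;

my $v = Hypothetical::Validator->new;
for my $doc ('<mets/>', 'plain text') {
    if ($v->parse($doc)) {
        print "ok\n";
    } else {
        print 'failed: ', $v->last_error, "\n";
    }
}
```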

Sometimes the user and the module-writer have different ideas, and you
end up thinking "this is a stupid way to detect an error in an XML
document".

One underused (IME) feature of Perl 5.005 and later is exception objects.
This is where you call die() with an object, not a string. The object then
ends up in $@ and you can call methods on it to examine the error. While
this lacks Java's stricter exception model, it can still help in cases
like yours, where you currently get just an error _string_ and want to
parse that string or extract other information from it.
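A minimal sketch of die() with an object. `My::ValidationError` and its accessors are hypothetical; the point is that the caller reads fields through methods instead of matching a string. `blessed` from Scalar::Util (a core module since Perl 5.8) guards the method calls.

```perl
use strict;
use warnings;

# Hypothetical exception class (not part of XML::Xerces).
package My::ValidationError;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub file    { $_[0]{file} }
sub line    { $_[0]{line} }
sub message { $_[0]{message} }

package main;

use Scalar::Util qw(blessed);

# Throw an object rather than a string.
sub check_document {
    die My::ValidationError->new(
        file    => 'test.mets.xml',
        line    => 34,
        message => "Unknown element 'mods:namePPart'",
    );
}

my $caught;
eval { check_document(); 1 } or $caught = $@;

if (blessed($caught) && $caught->isa('My::ValidationError')) {
    # Fields come straight from accessor methods; no regexes needed.
    printf "%s line %d: %s\n", $caught->file, $caught->line, $caught->message;
} elsif ($caught) {
    die $caught;    # not one of ours: rethrow unchanged
}
```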

Some discussion at
http://www.perl.com/pub/a/2002/11/14/exception.html

P
 

Arvin Portlock

Oh, I know what eval {} and $@ are all about. I'm just used
to seeing them as a way to catch runtime errors, not as a built-in
interface within a module for recording messages. In fact
Xerces itself will experience runtime errors under certain
conditions, so it's still important to trap them in an eval.
But reporting validation errors doesn't seem the best use
for this, especially in a program that has as one of its
main functions the ability to validate a document. I was
hoping for something more along the lines of:

my $status = $parser->parse($file);
if ($status->errors) {
    until ($status->errors->EOF) {
        print $status->errors->error;
        $status->errors->move_next();
    }
}

or something along those lines.
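Something close to that interface can be approximated with an error collector: a callback that records each problem instead of dying, plus a list the caller walks afterwards. This is a sketch under assumptions: `ErrorCollector` is hypothetical, the "validator" loop is a toy stand-in, and whether XML::Xerces lets a handler keep parsing after an error is not settled in this thread.

```perl
use strict;
use warnings;

# Hypothetical collector: records each validation problem instead of
# dying, so more than the first error can be reported.
package ErrorCollector;

sub new    { my ($class) = @_; return bless { errors => [] }, $class }
sub add    { my ($self, %e) = @_; push @{ $self->{errors} }, \%e; return }
sub errors { my ($self) = @_; return @{ $self->{errors} } }
sub count  { my ($self) = @_; return scalar @{ $self->{errors} } }

package main;

my $collector = ErrorCollector->new;

# Toy stand-in for a validator: flag every line with an unknown element.
my @lines = ('<mets>', '<mods:namePPart/>', '<mods:bogus/>', '</mets>');
for my $n (0 .. $#lines) {
    next unless $lines[$n] =~ /namePPart|bogus/;
    $collector->add(line => $n + 1, message => "Unknown element in '$lines[$n]'");
}

if ($collector->count) {
    for my $e ($collector->errors) {
        print "line $e->{line}: $e->{message}\n";
    }
}
```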
 

Steven N. Hirsch

pkent said:
One underused (IME) feature of Perl 5.005 and later is exception objects.
This is where you call die() with an object, not a string. The object then
ends up in $@ and you can call methods on it to examine the error. While
this lacks Java's stricter exception model, it can still help in cases
like yours, where you currently get just an error _string_ and want to
parse that string or extract other information from it.

XML::Xerces uses exception objects.
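Since the thread doesn't name the exception class or its methods, a defensive handler can report `ref($@)` and call a method only after `can()` confirms it exists. The `message` method name below is an assumption for illustration; the real accessors would have to come from the XML::Xerces documentation.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Describe whatever eval caught: an exception object or a plain string.
sub describe_error {
    my ($err) = @_;
    if (blessed($err)) {
        # 'message' is an assumed method name; fall back to stringifying.
        my $text = $err->can('message') ? $err->message : "$err";
        return ref($err) . ": $text";
    }
    return (defined $err && length $err) ? "string: $err" : 'no error';
}

my $ok = eval { die "MESSAGE: plain string error\n"; 1 };
print describe_error($@) unless $ok;
```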
 

Arvin Portlock

and you can call methods on it to examine the error.

Thanks! This makes things conceptually clearer for me.
 
