XML::Xerces questions

Discussion in 'Perl Misc' started by Arvin Portlock, Apr 20, 2004.

  1. I'm using the XML::Xerces module to validate batches of
    XML documents against a schema. The module is still under
    development so there is little documentation that I can
    find, but I'm still finding it incredibly useful. I have
    4 questions that would enhance my Xerces experience greatly.

    1. The way to get validation errors seems incredibly odd
    to me:

    eval {$parser->parse ($file)};
    print $@;

    Is this the only way to get at error messages? Via $@?
    Does this wrapper provide a more direct method? Does this
    seem odd to anybody else in the perl community or is
    it just me?

    2. Is there any way to use local copies of the schemas
    rather than have Xerces fetch them from the web? In my
    XML documents the referenced schemas have the form:

    xsi:schemaLocation="http://www.loc.gov/standards/mets/mets.xsd"

    I.e., they are all URLs. I think this is why Xerces is so
    slow. As I'd like to use this module to validate batches
    of thousands of documents, it would be nice if Xerces didn't
    have to go out and fetch the schemas for every single document.

    3. Xerces stops validating after the first error encountered.
    Is there any way to get it to report all the errors in the
    documents? I understand what the standard says about parsers
    and errors, but every other validator I know about has an option
    to continue validation after an error. Is there a similar option
    for Xerces?

    4. Lastly, $@ reports errors in this form:

    ERROR:
    FILE: D:\sgml\mets\tei2mets/test.mets.xml
    LINE: 34
    COLUMN: 27
    MESSAGE: Unknown element 'mods:namePPart'
    at validate.pl line 13

    So I need to parse out the various pieces using regular expressions
    to compose messages in the form I want. So I guess this is a repeat
    of the first question: is there a way to get direct access to the
    pieces of the error message?
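
As a stopgap while no direct accessor is known, a report in the form shown above can be split into named fields with a regular expression. This is only a minimal sketch: the function name `parse_error_report` is hypothetical, and it assumes that `$@` holds a plain string with one `LABEL: value` pair per line, exactly as in the sample message.

```perl
use strict;
use warnings;

# Split a report of the form shown above into a hash of named fields.
# Assumes one "LABEL: value" pair per line, as in the sample message.
sub parse_error_report {
    my ($text) = @_;
    my %field;
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*(FILE|LINE|COLUMN|MESSAGE):\s*(.*?)\s*$/) {
            $field{lc $1} = $2;
        }
    }
    return \%field;
}

# The sample report from the post, rebuilt as a string for illustration.
my $report = join "\n",
    'ERROR:',
    'FILE: D:\sgml\mets\tei2mets/test.mets.xml',
    'LINE: 34',
    'COLUMN: 27',
    "MESSAGE: Unknown element 'mods:namePPart'",
    'at validate.pl line 13';

my $err = parse_error_report($report);
printf "%s(%d,%d): %s\n", @{$err}{qw(file line column message)};
```

Lines that do not match a known label, such as the trailing `at validate.pl line 13`, are ignored.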

    I'd prefer to not change or extend Xerces.pm itself.

    Thanks!
     
    Arvin Portlock, Apr 20, 2004
    #1

  2. Arvin Portlock <> wrote:

    > 1. The way to get validation errors seems incredibly odd
    > to me:
    >
    > eval {$parser->parse ($file)};
    > print $@;
    >
    > Is this the only way to get at error messages? Via $@?
    > Does this wrapper provide a more direct method? Does this
    > seem odd to anybody else in the perl community or is
    > it just me?



    perldoc -f eval

    ...
    It is also Perl's exception trapping mechanism


    "eval BLOCK" and "if $@" is Perl's "try" and "catch" mechanism.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 20, 2004
    #2

  3. In article <c63m7q$kt9$>,
    Arvin Portlock <> wrote:

    > 1. The way to get validation errors seems incredibly odd
    > to me:
    >
    > eval {$parser->parse ($file)};
    > print $@;


    It looks like parse() throws a fatal error, i.e. a die(), when it hits
    an error. A die() will basically exit the program unless you catch the
    exception in an eval block. And the thing that was caught in the eval
    block is held in the special variable $@.
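
The die()/eval relationship described above can be seen in a small sketch. It does not use XML::Xerces at all: `risky_parse` is a hypothetical stand-in for any call that dies on error, and `try_parse` is an illustrative wrapper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parse() call that dies on the first error.
sub risky_parse {
    my ($file) = @_;
    die "cannot parse $file\n" if $file =~ /bad/;
    return 1;
}

# Returns the error text, or an empty string if nothing was thrown.
sub try_parse {
    my ($file) = @_;
    # The trailing 1 makes eval return true only if no exception occurred.
    my $ok = eval { risky_parse($file); 1 };
    return $ok ? '' : $@;
}

print "good.xml: ", (try_parse('good.xml') || "ok\n");
print "bad.xml: ", try_parse('bad.xml');
```

Testing the return value of eval rather than `$@` alone is a common precaution, since `$@` can be reset by code that runs during cleanup.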

    Now, sometimes the sensible thing to do when you encounter an
    unrecoverable error is to throw a fatal exception... sometimes it's
    sensible to return 'undef' and allow the caller to interrogate the
    object using a method such as lastError() or something... or maybe some
    other approach.

    Sometimes the user and the module-writer have different ideas, and you
    end up thinking "this is a stupid way to detect an error in an XML
    document".

    One underused (IME) feature of perl >5.005 is exception objects. This
    is where you call die() with an object, not a string. The object then
    ends up in $@ and you can call methods on it to examine the error. While
    this doesn't have Java's stricter exception model, it can still help
    out in cases like yours where currently you're just getting an error
    _string_ and you want to parse that string in some way or get other
    information.

    Some discussion at
    http://www.perl.com/pub/a/2002/11/14/exception.html
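
A minimal sketch of the exception-object idea, in plain Perl. The class `My::ParseError`, its accessors, and the helper names are all hypothetical and are not part of XML::Xerces; the point is only that `$@` can hold an object whose fields are read through methods rather than regular expressions.

```perl
use strict;
use warnings;

# Hypothetical exception class; the name and accessors are illustrative only.
package My::ParseError;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub line    { $_[0]{line} }
sub column  { $_[0]{column} }
sub message { $_[0]{message} }

package main;

use Scalar::Util 'blessed';

# Stand-in for a parser that throws an object rather than a string.
sub parse_or_throw {
    die My::ParseError->new(
        line    => 34,
        column  => 27,
        message => "Unknown element 'mods:namePPart'",
    );
}

# Format the caught value, whether it is our object or a plain string.
sub describe_error {
    my ($err) = @_;
    if (blessed($err) && $err->isa('My::ParseError')) {
        return sprintf "%d:%d %s", $err->line, $err->column, $err->message;
    }
    return "unexpected error: $err";
}

eval { parse_or_throw(); 1 } or print describe_error($@), "\n";
```

Checking `blessed($err)` first keeps the handler safe when some other code dies with an ordinary string.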

    P

    --
    pkent 77 at yahoo dot, er... what's the last bit, oh yes, com
    Remove the tea to reply
     
    pkent, Apr 20, 2004
    #3
  4. Oh I know what eval {} and $@ are all about. I'm just used
    to seeing it as a way to catch runtime errors, not as a built-in
    interface within a module to record messages. In fact
    Xerces itself will experience runtime errors for certain
    conditions. So it's still important to trap them in an eval.
    But reporting validation errors doesn't seem the best use
    for this, especially in a program which has as one of its
    main functions the ability to validate a document. I was
    hoping for something more along the lines of:

    my $status = $parser->parse ($file);
    if ($status->errors) {
        until ($status->errors->EOF) {
            print $status->errors->error;
            $status->errors->move_next();
        }
    }

    or something along those lines.
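
The interface wished for above amounts to collecting errors instead of dying on the first one. The following is a generic sketch of that pattern in plain Perl, not XML::Xerces code: `validate_lines` is a hypothetical validator that reports each problem through a callback, and `collect_errors` gathers the reports into a list.

```perl
use strict;
use warnings;

# Hypothetical validator: reports each problem through a callback
# instead of calling die() on the first one.
sub validate_lines {
    my ($lines, $on_error) = @_;
    my $n = 0;
    for my $line (@$lines) {
        $n++;
        $on_error->({ line => $n, message => "unexpected '<<'" })
            if $line =~ /<</;
    }
    return;
}

# Collect every reported error into an array and return it.
sub collect_errors {
    my ($lines) = @_;
    my @errors;
    validate_lines($lines, sub { push @errors, $_[0] });
    return @errors;
}

my @errors = collect_errors(['<a>', '<<b>', '</a>', '<<c/>']);
print "line $_->{line}: $_->{message}\n" for @errors;
```

Whether a given parser can keep going after an error depends on the parser itself; the callback only decides what happens to each report.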

    Tad McClellan wrote:

    > Arvin Portlock wrote:
    >
    >
    > >1. The way to get validation errors seems incredibly odd
    > >to me:
    > >
    > >eval {$parser->parse ($file)};
    > >print $@;
    > >
    > >Is this the only way to get at error messages? Via $@?
    > >Does this wrapper provide a more direct method? Does this
    > >seem odd to anybody else in the perl community or is
    > >it just me?

    >
    >
    >
    > perldoc -f eval
    >
    > ...
    > It is also Perl's exception trapping mechanism
    >
    >
    > "eval BLOCK" and "if $@" is Perl's "try" and "catch" mechanism.
    >
    >
     
    Arvin Portlock, Apr 20, 2004
    #4
  5. pkent wrote:

    > One underused (IME) feature of perl >5.005 is exception objects. This
    > is where you call die() with an object, not a string. The object then
    > ends up in $@ and you can call methods on it to examine the error. While
    > this doesn't have Java's stricter exception model, it can still help
    > out in cases like yours where currently you're just getting an error
    > _string_ and you want to parse that string in some way or get other
    > information.


    XML::Xerces uses exception objects.
     
    Steven N. Hirsch, Apr 22, 2004
    #5
  6. > and you can call methods on it to examine the error.

    Thanks! This makes things conceptually clearer for me.



    Steven N. Hirsch wrote:

    > pkent wrote:
    >
    > > One underused (IME) feature of perl >5.005 is exception objects. This
    > > is where you call die() with an object, not a string. The object then
    > > ends up in $@ and you can call methods on it to examine the error.
    > > While this doesn't have Java's stricter exception model, it can still
    > > help out in cases like yours where currently you're just getting an
    > > error _string_ and you want to parse that string in some way or get
    > > other information.

    >
    >
    > XML::Xerces uses exception objects.
     
    Arvin Portlock, Apr 22, 2004
    #6