validate XML file content

Sara

Hi all,

I have just started using the XML::* modules to validate XML files, and
I am trying to work out which kind of module (tree-based or
stream-based) fits my requirement. I need to read data (a specific node
list) from a simple external XML file, selected by user input, and use
it to validate a source XML file, specifically the content of its
elements.

For instance, the content of the element 'figure' should start with a
string matching the regex /^Fig\. \d+\b/. The external file would hold
the expected format of each element's content as a regex. I plan to use
XML::XPath to read the external XML file, but I am still undecided
about what to use for validating the source XML file, because of the
following points.

Is there a better way of writing the following piece of code, in terms
of ease of maintenance and, secondarily, code size?

sub start_element {
    my ($self, $element) = @_;

    if ($element->{Name} eq 'body') {
        ....
    }
    elsif ($element->{Name} eq 'head') {
        ....
    }
}

If the element 'head' is removed or renamed, the code has to be
changed. If the handler were independent of the element name, that
change would not be needed.
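
One common way to make the handler independent of element names is a dispatch table: a hash that maps each element name to a code reference, so renaming or removing an element means editing one hash entry rather than an if/elsif chain. The following is only a minimal sketch. The package name MyHandler, the %on_start table, and the {seen} bookkeeping are hypothetical, and start_element is called by hand here with the same kind of hash ({ Name => ... }) that a SAX parser would pass; a real handler would normally inherit from XML::SAX::Base.

```perl
use strict;
use warnings;

package MyHandler;    # hypothetical name; a real handler would subclass XML::SAX::Base

# Dispatch table: element name => code reference (entries are illustrative).
my %on_start = (
    body => sub { my ($self, $el) = @_; push @{ $self->{seen} }, "body handled" },
    head => sub { my ($self, $el) = @_; push @{ $self->{seen} }, "head handled" },
);

sub new {
    my ($class) = @_;
    return bless { seen => [] }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    # Look the handler up by name; elements without an entry are ignored.
    my $code = $on_start{ $element->{Name} } or return;
    $code->($self, $element);
}

package main;

my $handler = MyHandler->new;
# Simulate the events a SAX parser would deliver.
$handler->start_element({ Name => $_ }) for qw(head body title);
print "$_\n" for @{ $handler->{seen} };
```

With this layout, the element names live only in the table, and the table itself could be built at run time from the external rules file.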

Is XML::Checker the only module on CPAN that can
1. check whether the ID of an element was defined
2. get the number of times the ID was referenced?
I would prefer not to write less-optimized blocks of code if someone
has already done this in a far better manner.

Finally, can someone please help me, or point me to some help, with
using namespaces in SAX?
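
On the namespace question: in the Perl SAX 2 binding, a namespace-aware parser passes each element as a hash that carries LocalName, NamespaceURI, and Prefix alongside Name, so a handler can compare the URI and local name and ignore whatever prefix the document uses. Below is a small sketch under that assumption. The package name NsHandler and the namespace URI are made up for illustration, and the events are fed in by hand rather than by a real parser.

```perl
use strict;
use warnings;

package NsHandler;    # hypothetical; normally a subclass of XML::SAX::Base

my $FIG_NS = 'http://example.com/ns/figures';    # made-up namespace URI

sub new {
    my ($class) = @_;
    return bless { figures => 0 }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    # Match on URI + local name; the prefix is only a document-local alias.
    my $uri = defined $element->{NamespaceURI} ? $element->{NamespaceURI} : '';
    if ($uri eq $FIG_NS && $element->{LocalName} eq 'figure') {
        $self->{figures}++;
    }
}

package main;

my $ns_handler = NsHandler->new;
# Two prefixes bound to the same URI, plus one element in no namespace.
my @ns_events = (
    { Name => 'f:figure',   Prefix => 'f',   LocalName => 'figure', NamespaceURI => $FIG_NS },
    { Name => 'fig:figure', Prefix => 'fig', LocalName => 'figure', NamespaceURI => $FIG_NS },
    { Name => 'figure',     Prefix => '',    LocalName => 'figure', NamespaceURI => '' },
);
$ns_handler->start_element($_) for @ns_events;
print "figures in namespace: $ns_handler->{figures}\n";
```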

Thanks,
Sara
 
Peroli

Hi Sara,
Since you are just starting out with XML: several of the XML::* modules
do much of their work in Perl (XML::XPath, for instance, builds its
tree in Perl on top of XML::Parser). So if you need performance, use
the XML::LibXML module. It is a binding to the C library libxml2 and is
more robust.
Consider the following XML document (I think this is what you expect):
<root>
  <image>
    <name>IMG_5000.gif</name>
    <size>5000</size>
  </image>
</root>

use strict;
use warnings;
use XML::LibXML ();

my $xmlfile = "somefile.xml";

# parse_file dies on malformed XML, so trap the error with eval
my $xmlDom = eval { XML::LibXML->new()->parse_file($xmlfile) };
die "can't parse $xmlfile\nError: $@\n" if $@;

# visit every <image> element and test the text of its <name> child
foreach my $image ($xmlDom->documentElement->findnodes('/root/image')) {
    if ($image->findvalue('name') =~ /^IMG_/) {
        # do something
    }
}

Doing this in SAX would require a different strategy. If you are a
newbie, I think you should start with DOM, because it makes the whole
problem a lot easier to visualize.

Peroli Sivaprakasam
 
sa_ravenone

Hi,
Thanks for the reply, Peroli, and sorry for not making things clear.

Peroli wrote:
Since you are a starter with XML, XML::* modules are pure perl

I am not a beginner in XML, nor in Perl, but I am certainly a newbie at
using modules to process XML.

    foreach ($xmlDom->documentElement->findnodes('/root/image')) {
        if($_->findvalue('name') =~ /^IMG_/) {
            #dosomething
        }
    }

Thanks again for the clear example, which almost matches what I had in
mind. But don't we have to repeat the same loop for every element that
needs to be validated?
Is there a shorter way to do this, by associating each XML element with
its corresponding validation subroutine?
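
One possible shape for that association, sketched with XML::LibXML: keep a hash from element name to validation subroutine and walk the document once, looking each element up in the hash. Treat this as an illustration only. The %validate table, the sample document, and the error message format are assumptions; the two regexes are the ones mentioned earlier in the thread.

```perl
use strict;
use warnings;
use XML::LibXML ();

# Element name => validation subroutine; returns true when the text is acceptable.
# The table contents are illustrative and could be built from an external rules file.
my %validate = (
    figure => sub { $_[0] =~ /^Fig\. \d+\b/ },
    name   => sub { $_[0] =~ /^IMG_/ },
);

# Sample document, made up for this sketch.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<root>
  <figure>Fig. 1 Overview</figure>
  <figure>Figure two</figure>
  <image><name>IMG_5000.gif</name></image>
</root>
XML

my @errors;
# One loop over all elements; elements without a rule are skipped.
for my $node ($doc->findnodes('//*')) {
    my $check = $validate{ $node->nodeName } or next;
    my $text  = $node->textContent;
    push @errors, $node->nodeName . ": '$text' failed validation"
        unless $check->($text);
}
print "$_\n" for @errors;
```

Adding a new element to validate then means adding one entry to the table, with no new loop.
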

Doing this in SAX would require a different strategy. If you are a
newbie, I think you should start with DOM, because it makes the whole
problem a lot easier to visualize.

I also need to check IDs, IDREFs, and IDREF content. For that I need to
process the IDs before I check the IDREFs. Because they do not
necessarily appear in that order in the document, will this work with
SAX?
Thanks for all the clarifications.
Sara.
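
On the ID/IDREF ordering question above, one approach that fits a single SAX pass is to record every ID and count every reference as the events arrive, and to compare the two only once the document has ended, so forward references stop being a problem. The sketch below makes several assumptions: the package name IdChecker is hypothetical, ID attributes are taken to be named 'id' and references 'ref' (a real document would get this from its DTD or schema), and the events are simulated by hand in the hash layout used by Perl SAX 2.

```perl
use strict;
use warnings;

package IdChecker;    # hypothetical; normally a subclass of XML::SAX::Base

sub new {
    my ($class) = @_;
    return bless { ids => {}, refs => {} }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    for my $attr (values %{ $element->{Attributes} || {} }) {
        # Assumed attribute names: 'id' defines an ID, 'ref' refers to one.
        $self->{ids}{ $attr->{Value} } = 1 if $attr->{Name} eq 'id';
        $self->{refs}{ $attr->{Value} }++  if $attr->{Name} eq 'ref';
    }
}

sub end_document {
    my ($self) = @_;
    # All IDs are known by now, so document order no longer matters.
    return [ grep { !$self->{ids}{$_} } sort keys %{ $self->{refs} } ];
}

package main;

my $checker = IdChecker->new;
my @events = (
    { Name => 'xref',   Attributes => { '{}ref' => { Name => 'ref', Value => 'fig2' } } },    # forward reference
    { Name => 'figure', Attributes => { '{}id'  => { Name => 'id',  Value => 'fig1' } } },
    { Name => 'figure', Attributes => { '{}id'  => { Name => 'id',  Value => 'fig2' } } },
    { Name => 'xref',   Attributes => { '{}ref' => { Name => 'ref', Value => 'fig9' } } },    # never defined
    { Name => 'xref',   Attributes => { '{}ref' => { Name => 'ref', Value => 'fig2' } } },
);
$checker->start_element($_) for @events;
my $dangling = $checker->end_document;

print "undefined IDREF: $_\n" for @$dangling;
print "fig2 referenced $checker->{refs}{fig2} time(s)\n";
```

The same two hashes also answer the earlier question about how many times each ID was referenced.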
 
