optimize XML parsing


SynapseTesting

Hi

I am looking to make parallel requests and process XML data returned
from multiple servers. I am using the following modules:

require LWP::UserAgent;
require LWP::parallel::UserAgent;
require HTTP::Headers;

The XML returned from different servers is to be parsed and merged. I
am using XML::Simple to parse the data returned from different
servers.

I am not sure of the best way to pull relevant data from the XML. As
of now I have used Data::Dumper to place the XML into a hash/array
tree. This however takes about 18+ sec. We need to reduce this time.

You can look at the script below.

Please suggest.


####
1. #### Header objects created for request to be sent to each
supplier
####


#!/usr/bin/perl
use DBI;
use CGI;
...........................................
use Data::Dumper;
use POSIX;

use XML::Simple;            # not shown above, but required before XML::Simple->new
$xml = XML::Simple->new;    # direct method call instead of indirect-object "new XML::Simple"

require LWP::UserAgent;
require LWP::parallel::UserAgent;
require HTTP::Headers;
use HTTP::Request;
use Date::Calc qw(Month_to_Text Add_Delta_Days);
require "function_p.pl";

$h1 = HTTP::Headers->new;
$h1->header('Content-Type' => 'text/xml'); # Header
$h = HTTP::Headers->new;
$h->header('Content-Type' => 'text/html'); # set contents for header


####
2. #### Other info required to make parallel request [url, parameters,
etc.]
####

my $pua = LWP::parallel::UserAgent->new();
$pua->in_order (1); # handle requests in order of registration
$pua->duplicates(0); # ignore duplicates
$pua->timeout (500); # in seconds
$pua->redirect (1); # follow redirects

my $reqs = [
HTTP::Request->new( $method, $kuoniURL, $h1, $kuoni_XMLString),
HTTP::Request->new( $method, $etnURL, $h, $etnXMLRequest),
..............
..............
];

foreach my $req (@$reqs) {
    if ( my $res = $pua->register($req) ) {
        print STDERR $res->error_as_HTML;
    }
}


####
3. #### Fire requests for each supplier
####

Response time : ~ 25 sec
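The loop below reads from %$entries, which is never assigned in the
excerpt; in the LWP::Parallel::UserAgent synopsis that hashref comes
from the wait call, so presumably the elided code does something like
this (an assumption, since the actual line is not shown):

```perl
# Assumed (elided above): fire all registered requests and block until
# they complete or time out; wait() returns a hashref of entry objects.
my $entries = $pua->wait();
```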

our $travcoXMLResponse;
our $etnXMLResponse;

foreach ( keys %$entries ) {
    $res       = $entries->{$_}->response;
    $resultSet = $res->content;
    if ( index( lc($resultSet), "findproducts2response" ) > 0 ) {
        $eXMLResponse = $resultSet;
        require "e_result.pl";
    }
    .....................................
}



e_result.pl


####
4. #### Data dumper used to import XML into Array/Hash for parsing,
filtering through results
####


$dumpTime1=time();
print "<br>Start Dump time:".$dumpTime1;
$data = $xml->XMLin($travcoXMLResponse);
$dump = Dumper($data);
$dumpTime2=time();
print "<br>End Dump time:".$dumpTime2;
print "<br>Difference: ".($dumpTime2-$dumpTime1);
$trav1=time();
print "<br>Travco Start parsing Time: " .$trav1;


####
5. #### Data imported is traversed and filtering process is
implemented - stored in array/hash
####



for ( $i = 0; $i < $count; $i++ ) {
    if ( $data->{DATA}->{HOTEL_DATA}[$i]->{STATUS} =~ /^Available$/ ) {
        $roomCount         = 0;
        $totalAmount       = 0;
        $TravcoFinalAmount = 0;
        $displayIndex      = 0;

        $sourceCurrency = $data->{DATA}->{HOTEL_DATA}[$i]->{CURRENCY_CODE};

        if ( uc($TargetCurrency) ne uc($sourceCurrency) ) {
            print "<br>sourceCurrency: $sourceCurrency";

            ........................................

        }
    }



.....................................



$arrHotelDetails[$hotelIndex]{'HotelCode'}=$productID;
$arrHotelDetails[$hotelIndex]{'HotelImage'}=$hotelImg;
$arrHotelDetails[$hotelIndex]{'BreakFast'}=$breakfast;
.....................................
$arrHotelDetails[$hotelIndex]{'HotelMap'}='';
$arrHotelDetails[$hotelIndex]{'HotelPhone'}='';
$arrHotelDetails[$hotelIndex]{'SpecialOffer'}=0;
}
} # end of room matching
} # end if available
} # end for total hotels


####
6. #### Sorting and removing duplicates
####



......................................................
 

xhoster

Hi

I am looking to make parallel requests and process XML data returned
from multiple servers. I am using the following modules:

require LWP::UserAgent;
require LWP::parallel::UserAgent;
require HTTP::Headers;

The XML returned from different servers is to be parsed and merged. I
am using XML::Simple to parse the data returned from different
servers.

I am not sure of the best way to pull relevant data from the XML. As
of now I have used Data::Dumper to place the XML into a hash/array
tree.

That doesn't make any sense. XML::Simple already places the data into
a hash/array tree. Data::Dumper takes it *out* of that tree and turns
it back into a string.


This however takes about 18+ sec. We need to reduce this time.

Profile your code and see where it is spending its time; see Devel::DProf.
Let us know what the results are. Not only will this likely tell you where
it is spending its time, it will likely also be an easy way to figure out
which parser XML::Simple is using behind the scenes.
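A sketch of what such a profiling run might look like (the script name
is a placeholder; Devel::DProf writes its raw data to tmon.out, which
dprofpp then summarizes):

```shell
perl -d:DProf myscript.pl   # run under the profiler; raw data goes to tmon.out
dprofpp tmon.out            # report subroutines sorted by time spent
```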


####
4. #### Data dumper used to import XML into Array/Hash for parsing,
filtering through results
####

$dumpTime1=time();
print "<br>Start Dump time:".$dumpTime1;
$data = $xml->XMLin($travcoXMLResponse);
$dump = Dumper($data);
$dumpTime2=time();

How much time is spent parsing, versus (the apparently useless, as $dump
is never used again) dumping?
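One way to answer that question, sketched with the core Time::HiRes
module (the variable name $travcoXMLResponse follows the original post
and is assumed to hold a supplier response already):

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use XML::Simple;
use Data::Dumper;

our $travcoXMLResponse;   # assumed already filled in by the fetch loop

my $t0   = [gettimeofday];
my $data = XMLin($travcoXMLResponse);   # the parse
my $t1   = [gettimeofday];
my $dump = Dumper($data);               # the (apparently unnecessary) dump
my $t2   = [gettimeofday];

printf "parse: %.3fs  dump: %.3fs\n",
    tv_interval( $t0, $t1 ), tv_interval( $t1, $t2 );
```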


Xho
 

J. Gleixner

Hi

I am looking to make parallel requests and process XML data returned
from multiple servers. I am using the following modules:

require LWP::UserAgent;
require LWP::parallel::UserAgent;
require HTTP::Headers;

use strict;
use warnings;

use LWP::UserAgent;
use LWP::parallel::UserAgent;
use HTTP::Headers;

The XML returned from different servers is to be parsed and merged. I
am using XML::Simple to parse the data returned from different
servers.

I am not sure of the best way to pull relevant data from the XML. As
of now I have used Data::Dumper to place the XML into a hash/array
tree. This however takes about 18+ sec. We need to reduce this time.

No need to use Data::Dumper, XML::Simple does it for you.

There are many modules to parse XML. Check the FAQ mentioned in the
XML::Simple documentation:

http://perl-xml.sourceforge.net/faq/
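As one example of the alternatives that FAQ covers (the element names
below follow the HOTEL_DATA/STATUS structure from the original post,
and are otherwise an assumption), XML::LibXML can select just the
relevant nodes with XPath instead of converting the entire document:

```perl
use strict;
use warnings;
use XML::LibXML;   # libxml2-based parser, generally much faster than pure Perl

our $resultSet;    # the supplier response fetched above (assumption)

my $doc = XML::LibXML->load_xml( string => $resultSet );

# Visit only the available hotels rather than walking the whole tree.
for my $hotel ( $doc->findnodes('//HOTEL_DATA[STATUS="Available"]') ) {
    my $currency = $hotel->findvalue('CURRENCY_CODE');
    print "currency: $currency\n";
}
```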

Post a short and complete example. Showing little pieces of code
doesn't make it clear what problem you're having or what could
be the problem.

In general, create a subroutine that gets the XML, parses it,
and stores whatever you need from it in some global data
structure. Once that works, then add in parallel processes,
and when it's finished, you should have a data structure
containing your data, which you can use to create whatever
is needed. Possibly, you could store the data in a database,
if that's easier.
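The shape of that suggestion might look like this (the sub, hash, and
element names are illustrative, not from the original code):

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

our %hotels;   # merged results from all suppliers

# Parse one supplier response and store what we need; call it once per
# response, serially at first, then from the parallel loop once it works.
sub store_supplier_xml {
    my ($xml_string) = @_;
    my $data = XMLin( $xml_string, ForceArray => ['HOTEL_DATA'], KeyAttr => [] );
    for my $hotel ( @{ $data->{DATA}{HOTEL_DATA} || [] } ) {
        next unless $hotel->{STATUS} eq 'Available';
        $hotels{ $hotel->{HOTEL_CODE} } = $hotel;   # keying also de-duplicates
    }
}
```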

See also: perldoc perldsc
 
