REXML Optimization

Bucco

So, I'm reading this document on using XML as a database. Of course,
the author uses this cryptic Perl script to parse the XML file:

#!/usr/bin/perl
use XML::LibXML;
my $parser = new XML::LibXML;
my $doc = $parser->parse_file( shift @ARGV );
my $balance = $doc->findvalue( '/checkbook/@balance-start' );
foreach my $record ( $doc->findnodes( '//debit' )) {
    $balance -= $record->findvalue( 'amount' );
}
foreach my $record ( $doc->findnodes( '//deposit' )) {
    $balance += $record->findvalue( 'amount' );
}
print "Current balance: $balance\n";

So, since I was trying to figure out how to use XML as a database and
how to use REXML, I gave the script a whack and tried to write a Ruby
script to do the same thing. Below are a sample of the XML file and the
Ruby script:

<?xml version="1.0"?>
<checkbook balanceStart="2460.62">
<title>expenses: january 2002</title>

<debit category="clothes">
<amount>31.19</amount>
<date><year>2002</year><month>1</month><day>3</day></date>
<payto>Walking Store</payto>
<description>shoes</description>
</debit>

<deposit category="salary">
<amount>1549.58</amount>
<date><year>2002</year><month>1</month><day>7</day></date>
<payor>Bob's Bolts</payor>
</deposit>
</checkbook>


#!/usr/bin/ruby -w
require 'rexml/document'

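# Read in the XML doc; the block form of File.open closes the file afterwards
doc = File.open('cb.xml') { |f| REXML::Document.new(f) }
# A future version should take the file name from the command line

# Find the starting balance and assign it to the float variable 'balance'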
balance = doc.root.attributes['balanceStart'].to_f

# Calculate debits and balance
doc.elements.each("//debit/amount") {|o| balance -= o.text.to_f}
# Calculate deposits and balance
doc.elements.each("//deposit/amount") {|i| balance += i.text.to_f}

#Display final balance:
puts balance

Of course, I was able to complete the same task as the Perl script in
Ruby with less code (not to mention easier-to-read code).

Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?

Thanks:)

SA
 
James Britt

Bucco said:
Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?


The DOM is not a database, and it shows.

XPath queries can get real slow as the document size grows.

Suggestion: Read and parse the XML once, and store it internally in a
format better suited for queries. XML is great for all sorts of things,
particularly for inter-app data exchange, but once the data is inside
your system that value drops. So, if the code is mainly concerned with
executing queries and such, slurp in the XML and stash it in some
optimized internal structure. Maybe use Madeleine for in-memory storage
and queries.

If need be, add code to serialize the data back to XML for persistence
when the app is shut down.

Try to compute the start-up cost of the parsing and restructuring and
indexing the data right up front, versus the cost of running XPath calls
over and over. See if it gains you anything.
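A minimal sketch of the "parse once, then query plain Ruby objects" idea might look like the following. The Entry struct, the load_entries helper, and the inline sample document are all made up for illustration; they are not part of REXML or of the original script, and Madeleine is left out entirely.

```ruby
require 'rexml/document'

# Inline sample so the sketch is self-contained; a real script would read cb.xml.
xml_source = <<~XML
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
    <debit category="clothes"><amount>31.19</amount></debit>
    <deposit category="salary"><amount>1549.58</amount></deposit>
  </checkbook>
XML

# Hypothetical internal record: one per debit or deposit.
Entry = Struct.new(:kind, :category, :amount)

# Parse the XML once and return the starting balance plus an array of entries.
def load_entries(xml)
  doc = REXML::Document.new(xml)
  start = doc.root.attributes['balanceStart'].to_f
  entries = []
  doc.root.each_element do |el|
    next unless %w[debit deposit].include?(el.name)
    entries << Entry.new(el.name.to_sym,
                         el.attributes['category'],
                         el.elements['amount'].text.to_f)
  end
  [start, entries]
end

start, entries = load_entries(xml_source)

# From here on, queries are ordinary Ruby collection calls, no XPath involved.
by_kind  = entries.group_by(&:kind)
debits   = by_kind.fetch(:debit, []).sum(&:amount)
deposits = by_kind.fetch(:deposit, []).sum(&:amount)
balance  = start - debits + deposits

puts format('Current balance: %.2f', balance)
```

Whether this pays off depends on how many queries run after the initial parse; wrapping both variants in Benchmark.realtime from the standard library would be one way to compare them.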



James

--

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
 
Robert Klemme

Bucco said:
Just to help me complete the learning process, I wish to pose the
question to the group: Is there a better way to do this, and is there
more optimization I can do to my code?

You could get rid of one traversal by iterating over all "amount"
elements and doing the calculation based on the parent element's type
(debit or deposit).
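Something along these lines should work as a rough sketch of the single-traversal version. The inline sample document is only there to make the snippet self-contained; the original script reads cb.xml instead.

```ruby
require 'rexml/document'

# Inline sample standing in for cb.xml.
checkbook_xml = <<~XML
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
    <debit category="clothes"><amount>31.19</amount></debit>
    <deposit category="salary"><amount>1549.58</amount></deposit>
  </checkbook>
XML

doc = REXML::Document.new(checkbook_xml)
total = doc.root.attributes['balanceStart'].to_f

# One pass over every <amount>; the parent's name decides the sign.
doc.elements.each('//amount') do |amount|
  value = amount.text.to_f
  case amount.parent.name
  when 'debit'   then total -= value
  when 'deposit' then total += value
  end
end

puts format('Current balance: %.2f', total)
```

An <amount> under any other parent is silently ignored here, which may or may not be what you want.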

Kind regards

robert
 
Robert Klemme

Robert said:
You could get rid of one traversal by iterating all "amounts" and do
the calculation based on the parent element's type.

If you want to speed things up even more, you can do stream processing
with REXML's SAX-like API:
http://www.germane-software.com/software/rexml/docs/tutorial.html#id2248482
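
As a rough sketch of what that could look like with REXML's stream listener interface: the BalanceListener class and the inline sample document below are made up for illustration, and only REXML::StreamListener and REXML::Document.parse_stream come from REXML itself.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that keeps a running balance while the parser streams events.
class BalanceListener
  include REXML::StreamListener

  attr_reader :balance

  def initialize
    @balance = 0.0
    @stack   = []   # names of the currently open elements
    @buffer  = nil  # collects text only while inside <amount>
  end

  def tag_start(name, attrs)
    @balance = attrs['balanceStart'].to_f if name == 'checkbook'
    @buffer  = String.new if name == 'amount'
    @stack.push(name)
  end

  def text(data)
    @buffer << data if @buffer
  end

  def tag_end(name)
    @stack.pop
    return unless name == 'amount'

    value   = @buffer.to_f
    @buffer = nil
    # After the pop, the top of the stack is the parent of <amount>.
    case @stack.last
    when 'debit'   then @balance -= value
    when 'deposit' then @balance += value
    end
  end
end

# Inline sample standing in for cb.xml.
stream_xml = <<~XML
  <?xml version="1.0"?>
  <checkbook balanceStart="2460.62">
    <debit category="clothes"><amount>31.19</amount></debit>
    <deposit category="salary"><amount>1549.58</amount></deposit>
  </checkbook>
XML

listener = BalanceListener.new
REXML::Document.parse_stream(stream_xml, listener)

puts format('Current balance: %.2f', listener.balance)
```

No document tree is built, so memory use stays roughly flat as the file grows; the price is that you have to track context (the element stack here) by hand.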

Kind regards

robert
 
