XML parser

Cédric H.

Hi guys,

I'm looking for some information about the XML libraries available in
Ruby.

I've read a few blog posts about the pros and cons of REXML and
libxml, but I still have some questions:

- As I understand it, REXML is part of the Ruby standard library and so is
included in the Ruby distribution?

- libxml is a wrapper for GNOME libxml and must be installed and
compiled with gem?

- Is libxml really a fully validating and compliant parser?

- How do you use XSLT in Ruby? Do you use http://raa.ruby-lang.org/project/ruby-xslt/
or http://rubyforge.org/projects/libxsl/ (if I'm right, the second one
is part of libxml)?

As you can see, I'm lost, and I would really appreciate your help or a
comprehensive post about XML processing in Ruby.

Thanks !

Cedric
 
Dejan Dimic

Parsing and manipulating XML is a wide subject. There is more than
one bookshelf full of books about it, and doing it with Ruby is no
exception.

Besides the two libraries you mentioned, there is also Hpricot
(http://code.whytheluckystiff.net/hpricot/), and you should try it too.

When dealing with XML, consider the following questions:
- Who will run the code, and on what OS?
- How big is the XML document?
- Is speed a decisive factor?
- How much manipulation is required?

The answers to these questions could help you pick the best library, but
you should be familiar with all of them.

Do some research, play a little, and pick the one that appeals to you most.
 
Phlip

Cédric H. said:
I'm looking for some information about the XML libraries available in
Ruby.

I've read a few blog posts about the pros and cons of REXML and
libxml, but I still have some questions:

- As I understand it, REXML is part of the Ruby standard library and so is
included in the Ruby distribution?

Yes. It's also widely acknowledged as very slow. REXML (the name stands for
Ruby Electric XML) builds its parser on regular expressions, which are only fast
when used carefully. Basing an entire parser on them tends to abuse them.
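For reference, here is a minimal sketch of what REXML's DOM-style API looks like. The sample document is invented for illustration. On Ruby 3.0 and later, REXML ships as a bundled gem rather than a default library, so it may need to be installed separately in some setups.

```ruby
require 'rexml/document'

# Hypothetical sample document, made up for this illustration.
xml = <<~XML
  <catalog>
    <book id="1"><title>Programming Ruby</title></book>
    <book id="2"><title>The Ruby Way</title></book>
  </catalog>
XML

# Build the whole document tree in memory (the DOM approach).
doc = REXML::Document.new(xml)

# Walk every <book> under the root and collect the text of its <title>.
titles = []
doc.elements.each('catalog/book') do |book|
  titles << book.elements['title'].text
end

# The same kind of query through the XPath helper, reading an attribute.
ids = REXML::XPath.match(doc, '//book').map { |book| book.attributes['id'] }

puts titles.inspect
puts ids.inspect
```

Building the full tree is convenient for small documents, but it is also where most of the memory and time goes on large ones.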

This blog post shows how to spot-check compliance issues in the three leading Ruby XML
parsers:

http://www.oreillynet.com/onlamp/blog/2007/08/assert_hpricot_1.html

Cédric H. said:
- libxml is a wrapper for GNOME libxml and must be installed and
compiled with gem?

Ordinarily, that process would be mostly harmless. You may already have
libxml2-dev if you are on a GNU platform such as Ubuntu or Cygwin.

However, the current libxml-ruby has a nasty bug. First, it sprays lots of

No definition for ruby_xml_parser_context_options_get

into the console. Then it refuses to install the libxml_so.so file that it just
created. I don't know this bug's status, but because my assert_xpath works best
with libxml, I must overcome it whenever we build a new workstation at work!
Sometimes I must manually copy its executables into Ruby's paths...

(Our production code does not use libxml - only the test code.)

I just tried to install it while writing this post, and 0.8.1 might have worked on
Ubuntu.

Cédric H. said:
- Is libxml really a fully validating and compliant parser?

I suspect it's the reference implementation for XML. It certainly takes every
DOCTYPE and schema very seriously!

Better, it actually forgives some errors and keeps working, unlike REXML.
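To illustrate the REXML side of that comparison, here is a small sketch (the broken input is invented) showing that REXML stops with an exception on malformed markup instead of recovering:

```ruby
require 'rexml/document'

# Deliberately malformed input: <b> is never closed.
broken = '<a><b>text</a>'

begin
  REXML::Document.new(broken)
  result = :parsed
rescue REXML::ParseException => e
  # REXML gives up here; e.message describes where parsing failed.
  result = :rejected
  message = e.message
end

puts result
puts message
```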

Cédric H. said:
- How do you use XSLT in Ruby? Do you use http://raa.ruby-lang.org/project/ruby-xslt/
or http://rubyforge.org/projects/libxsl/ (if I'm right, the second one
is part of libxml)?

As you see I'm lost and I would really appreciate your help or some
comprehensive post about xml processing in ruby .

Sorry! I was knocking 'em down, and you lost me at XSLT.

In a pinch, I would pipe text thru xsltproc, and not worry about deep language
integration. XSLT is nothing but a big filter, so I thought you could use it
without making an object out of it.
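A rough sketch of that piping approach, assuming the xsltproc command is installed and on the PATH. The helper name and the stylesheet path are hypothetical, made up for this example:

```ruby
require 'open3'

# Pipe an XML string through the external xsltproc command and return the
# transformed text. `stylesheet_path` is a hypothetical path to an .xsl file.
def transform_with_xsltproc(xml, stylesheet_path)
  # The trailing "-" asks xsltproc to read the source document from stdin.
  output, error, status =
    Open3.capture3('xsltproc', stylesheet_path, '-', stdin_data: xml)
  raise "xsltproc failed: #{error}" unless status.success?

  output
end
```

It would be called as `transform_with_xsltproc(File.read('input.xml'), 'style.xsl')`, where both file names are placeholders. Each call starts a new process, so this suits occasional transforms better than tight loops.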
 
Phillip Oertel

hi,

you may enjoy reading this!
http://www.rubyinside.com/ruby-xml-crisis-over-libxml-0-8-0-released-955.html
(posted two days ago)

kind regards,
phillip

---

On 20.07.2008 at 02:34, Phlip wrote:
Yes. It's also widely acknowledged as very slow. REXML (the name stands for
Ruby Electric XML) builds its parser on regular expressions, which are only fast
when used carefully. Basing an entire parser on them tends to abuse them.

This blog post shows how to spot-check compliance issues in the three
leading Ruby XML parsers:

http://www.oreillynet.com/onlamp/blog/2007/08/assert_hpricot_1.html

Ordinarily, that process would be mostly harmless. You may already
have libxml2-dev if you are on a GNU platform such as Ubuntu or Cygwin.

However, the current libxml-ruby has a nasty bug. First, it sprays
lots of

No definition for ruby_xml_parser_context_options_get

into the console. Then it refuses to install the libxml_so.so file
that it just created. I don't know this bug's status, but because my
assert_xpath works best with libxml, I must overcome it whenever we
 
Phlip

Tx - that's why my install today worked, right?
FYI, there is still some final fine-tuning going on, so don't expect everything
to be all roses quite yet. But we are close, and might actually
get to a 1.0.0 release soon.

And to use it with assert_xpath you just gotta put invoke_libxml in your setup...
 
Douglas A. Seifert

Phlip said:
But I don't know the Ruby SAX solution!

REXML supports a "SAX-like" stream listening interface as well as DOM.
See the REXML tutorial at
http://www.germane-software.com/software/rexml/docs/tutorial.html,
scroll down until you see the section headed with "Stream Parsing". The
upshot is you write a class that has callback methods (see
http://www.germane-software.com/software/rexml/doc/classes/REXML/StreamListener.html
for a complete list of callbacks) and pass an instance of the class to
REXML's parse_stream method. REXML also supports a SAX2 API, but I have
never used it. Look for the heading "SAX2 Stream Parsing" in the
tutorial link above.

I recently converted a poor DOM-based parsing solution to a stream
listener based solution (not SAX2) and saw an order of magnitude
improvement in performance.
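A minimal sketch of the stream listener approach described above. The listener class and the sample document are invented for illustration; only the callbacks the example needs are overridden:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of every <title> element without building a DOM tree.
# TitleListener is a hypothetical name used only for this example.
class TitleListener
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Called when an opening tag is read.
  def tag_start(name, _attributes)
    @in_title = true if name == 'title'
  end

  # Called for character data between tags.
  def text(content)
    @titles << content if @in_title
  end

  # Called when a closing tag is read.
  def tag_end(name)
    @in_title = false if name == 'title'
  end
end

xml = '<catalog><book><title>Programming Ruby</title></book>' \
      '<book><title>The Ruby Way</title></book></catalog>'

listener = TitleListener.new
REXML::Document.parse_stream(xml, listener)
puts listener.titles.inspect
```

Because the parser only fires callbacks and never keeps the tree, memory use stays flat regardless of document size, which is likely where the speedup comes from.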

Saludos,

-Doug
 
