Introducing Xaggly, a C-based XML Parser for Ruby

Tony Perrie

I have written a C-based XML parser as a Ruby plugin. I managed to
benchmark it against REXML and Hpricot, and it appears to run quite
speedily (http://involution.com/images/xmlshootout.png).

The source code is here:
http://involution.com/xaggly.tar.gz

Unfortunately, my XPath support is very primitive right now. The
package only supports fully qualified queries, and attribute searches
aren't working yet. So, you can search /html/body/* to get all of the
tags directly under body, but things like //p and //p[@class='foo'] do
not work yet. Attributes are parsed and can be accessed from Ruby,
though.
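The difference between the two kinds of query can be sketched in plain Ruby: a fully qualified path only ever steps from each matched node to its direct children, while //p has to walk every descendant. The Node class and the select_path and descendants_named helpers below are hypothetical illustrations on a toy tree, not Xaggly's actual Ruby API.

```ruby
# Hypothetical element node; Xaggly's real Ruby-side objects may differ.
class Node
  attr_reader :name, :attributes, :children

  def initialize(name, attributes = {}, children = [])
    @name = name
    @attributes = attributes
    @children = children
  end
end

# Evaluate an absolute, fully qualified path such as "/html/body/*".
# Each step looks only at the direct children of the nodes matched so far.
def select_path(root, path)
  raise ArgumentError, 'only absolute paths are supported' unless path.start_with?('/')
  raise ArgumentError, 'the descendant axis (//) needs a tree walk' if path.include?('//')

  first, *rest = path.split('/').reject(&:empty?)
  matches = (first == '*' || first == root.name) ? [root] : []
  rest.each do |step|
    matches = matches.flat_map(&:children).select { |c| step == '*' || c.name == step }
  end
  matches
end

# What "//p" requires instead: a recursive walk over every descendant,
# returning matches in document order.
def descendants_named(node, name)
  node.children.flat_map do |child|
    (child.name == name ? [child] : []) + descendants_named(child, name)
  end
end

doc = Node.new('html', {}, [
  Node.new('head', {}, [Node.new('title')]),
  Node.new('body', {}, [
    Node.new('p', { 'class' => 'foo' }),
    Node.new('div', {}, [Node.new('p')])
  ])
])

select_path(doc, '/html/body/*').map(&:name)   # => ["p", "div"]
descendants_named(doc, 'p').size               # => 2
# //p[@class='foo'] is the same walk plus an attribute filter:
descendants_named(doc, 'p').count { |n| n.attributes['class'] == 'foo' }  # => 1
```

The nested p inside div is exactly what a child-only path cannot reach without spelling out /html/body/div/p, which is why descendant queries need a separate traversal.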

I was trying to jigger this plugin into a gem, but it appears that mkmf
doesn't detect the presence of Flex and Bison files automatically. Is
there a supported or standard way of doing such a thing?

Regards,

Tony Perrie
http://involution.com
 
pat eyler

Hi Tony,

Sounds ambitious. Any reason you didn't go with libxml? Have you
tried benchmarking it against libxml?
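One way to run such a comparison is to time every parser on the same input with Ruby's standard Benchmark module and keep the best of several runs. This is a minimal sketch: the shootout helper is hypothetical, and each parser is assumed to be wrapped in a lambda that takes the XML string.

```ruby
require 'benchmark'

# Hypothetical helper: time each parser on the same input and keep the
# fastest of `runs` wall-clock measurements, to reduce noise from GC and
# other processes. `parsers` maps a label to any callable taking a string.
def shootout(input, runs: 5, **parsers)
  parsers.transform_values do |parse|
    Array.new(runs) { Benchmark.realtime { parse.call(input) } }.min
  end
end

# Example usage with REXML, which is bundled with Ruby; lambdas for Hpricot,
# libxml or Xaggly would be added the same way, using each library's own
# entry point:
#
#   require 'rexml/document'
#   feed = '<rss>' + '<item><title>t</title></item>' * 10_000 + '</rss>'
#   shootout(feed, rexml: ->(xml) { REXML::Document.new(xml) }).each do |name, secs|
#     printf("%-8s %.3fs\n", name, secs)
#   end
```

Using the same input string for every parser and reporting the minimum keeps the comparison about parsing speed rather than about file I/O or a single unlucky run.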

 
Vincent Fourmond

Tony said:
I was trying to jigger this plugin into a gem, but it appears that mkmf
doesn't detect the presence of Flex and Bison files automatically. Is
there a supported or standard way of doing such a thing?

From my own limited experience, it is better to ship both the flex and
bison source files and the C files they produce. My belief is that most
of the time, 'it will just work'. But I have limited experience with
bison, flex and cross-platform stuff (apart from the fact that every time
I had a friend try one of my things on a different platform, I had to give
him the C files because his flex didn't understand the same set of options...).
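Shipping the generated C files alongside the .l and .y sources can be automated on the maintainer's side. A minimal Rakefile sketch, assuming hypothetical file names xaggly.l, xaggly.y, xaggly_lexer.c and xaggly_parser.c (the real tarball may use different ones): each C file is rebuilt only when its source is newer, and `rake generate` would be run before packaging so users never need flex or bison.

```ruby
require 'rake'

# Hypothetical source/output names for Xaggly's lexer and grammar.
LEXER_SRC   = 'xaggly.l'
LEXER_C     = 'xaggly_lexer.c'
GRAMMAR_SRC = 'xaggly.y'
GRAMMAR_C   = 'xaggly_parser.c'

# Rebuild the lexer only when the .l file is newer than the C output.
# "-oFILE" (no space) is used because some older flex releases only accept
# the attached form of -o.
file LEXER_C => LEXER_SRC do |t|
  sh 'flex', "-o#{t.name}", t.source
end

# bison -d also writes a matching header (xaggly_parser.h) next to the C file.
file GRAMMAR_C => GRAMMAR_SRC do |t|
  sh 'bison', '-d', '-o', t.name, t.source
end

desc 'Generate the C lexer/parser so the package builds without flex or bison'
task generate: [LEXER_C, GRAMMAR_C]
```

The generated .c and .h files still have to be listed in the gemspec's files array, next to the .l and .y sources, for them to end up in the gem.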

Vince
 
Tony Perrie

I tried using libxml, but a lot of RSS feeds aren't really conformant.
So, I had trouble reading in various feeds from different sites using
libxml. While REXML and Hpricot worked, I found them to be fairly
slow to parse large files. That was the impetus for me writing this
library.

Tony
http://involution.com

 
Tony Perrie

Hmmm, I still wonder if it's possible to coerce mkmf to handle .l and
.y files in a general way so I can turn this into a gem.

Tony
 
Vincent Fourmond

Tony said:
Hmmm, I still wonder if it's possible to coerce mkmf to handle .l and
.y files in a general way so I can turn this into a gem.

In my own experience, mkmf is not really flexible. I tried to make a
more flexible version (see mkmf2.rubyforge.org, which is quite outdated
compared to the CVS repository), but that isn't really satisfying yet.

And you'll be bitten by options varying from one computer to
another... Also, don't expect everyone to have flex and bison
installed.
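Along those lines, an extconf.rb could regenerate the C files only when the tool is actually available, and otherwise fall back to the pre-generated copies shipped with the package. A minimal sketch, assuming hypothetical file names; the which and ensure_generated helpers are illustrations, not part of mkmf (inside extconf.rb, mkmf's own find_executable could replace which).

```ruby
# Hypothetical helper (not part of mkmf): find an executable on PATH.
# This version does not try Windows extensions such as ".exe".
def which(cmd)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical helper: make sure the generated C file `out` is usable.
#   :fresh     - out exists and is at least as new as src; nothing to do
#   :generated - the tool is installed, so out is rebuilt from src
#   :shipped   - the tool is missing, so the pre-generated out is used as-is
def ensure_generated(src, out, tool, args)
  if File.exist?(out) && (!File.exist?(src) || File.mtime(out) >= File.mtime(src))
    return :fresh
  end

  if which(tool)
    system(tool, *args) or raise "#{tool} failed while generating #{out}"
    :generated
  elsif File.exist?(out)
    warn "#{tool} not found; using the shipped #{out}"
    :shipped
  else
    raise "#{tool} is required to build #{out} from #{src}"
  end
end

# Sketch of extconf.rb usage (file names are hypothetical):
#
#   require 'mkmf'
#   ensure_generated('xaggly.y', 'xaggly_parser.c', 'bison', %w[-d -o xaggly_parser.c xaggly.y])
#   ensure_generated('xaggly.l', 'xaggly_lexer.c', 'flex', %w[-oxaggly_lexer.c xaggly.l])
#   create_makefile('xaggly')
```

Because the shipped C files are normally newer than their sources, most users hit the :fresh path and never invoke flex or bison at all, which sidesteps the option differences between installed versions.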
 
