Performance comparison between screen scrapers

Conrad Chu

Does anyone know how the following screen scrapers perform against one
another?

* ScrAPI
* RubyfulSoup
* HTree
* Hpricot

I'm writing a tool where a person enters a URL and an AJAX call scrapes
that page for its title, description, etc., so speed is really important.
I suppose regular expressions would be the fastest, but I need something
that is tree-based and supports HTML tidying.

Thanks
Conrad
 
Jan Svitok

There was a comparison done on this list some time ago. Search the archives for the library names.
 
Timothy Goddard

I haven't used them all, but Hpricot is fast (the parser is written in C
with Ragel), error-tolerant, and perfect for this task. Take a look at
its website for a guide on how to use it.
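
If you want numbers for your own pages rather than relying on an old comparison, a small harness built on Ruby's standard Benchmark module is one way to get them. What follows is only a rough sketch: the names SAMPLE_HTML, REGEX_EXTRACTOR, EXTRACTORS and compare_extractors are made up for illustration, only the regex baseline is actually wired in, and the commented-out Hpricot entry assumes the hpricot gem is installed.

```ruby
require "benchmark"

# Sample page used as input; a real test would use HTML fetched from the target URL.
SAMPLE_HTML = <<~HTML
  <html>
    <head>
      <title>Example page</title>
      <meta name="description" content="A short description.">
    </head>
    <body><p>Hello</p></body>
  </html>
HTML

# Regex baseline: quick to run, but not tree-based and not tolerant of unusual markup.
REGEX_EXTRACTOR = lambda do |html|
  title = html[%r{<title[^>]*>(.*?)</title>}mi, 1]
  description = html[/<meta\s+name=["']description["']\s+content=["'](.*?)["']/mi, 1]
  { title: title && title.strip, description: description }
end

# Hypothetical registry: name => callable that takes an HTML string.
EXTRACTORS = {
  "regex" => REGEX_EXTRACTOR
  # With the hpricot gem installed, a tree-based entry could look like:
  # "hpricot" => ->(html) { { title: Hpricot(html).at("title").inner_text } }
}

# Run every extractor over the same input and return elapsed seconds per name.
def compare_extractors(extractors, html, iterations = 1_000)
  extractors.each_with_object({}) do |(name, extractor), timings|
    timings[name] = Benchmark.realtime { iterations.times { extractor.call(html) } }
  end
end

if __FILE__ == $PROGRAM_NAME
  compare_extractors(EXTRACTORS, SAMPLE_HTML).each do |name, seconds|
    puts format("%-10s %.4f s", name, seconds)
  end
end
```

Adding one entry per library to the hash and feeding it a few real pages (including some badly formed ones) should give a fairer picture than a single tidy sample.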
 
