scRUBYt! 0.3.1 released

Peter Szinek

Hello all,

scRUBYt! version 0.3.1 has been released with plenty of new features
and bug fixes based on your feedback. Enjoy!

============
What's this?
============

scRUBYt! is an easy-to-learn, easy-to-use, yet powerful Web scraping
framework based on Hpricot and mechanize. Its purpose is to free you
from the drudgery of web page crawling, looking up HTML tags,
attributes, XPaths, form names and other typical low-level web scraping
woes by figuring these out from examples you copy and paste from the
Web page.

===========
What's new?
===========

[NEW] complete rewrite of the output system, creating
      a solid foundation for more robust output functions
      (credit: Neelance)
[NEW] logging - no annoying puts messages anymore!
      (credit: Tim Fletcher)
[NEW] can index an example, e.g.
          link 'more[5]'
      semantics: give me the 6th element with the text 'more'
      (the index is zero-based)
[NEW] can use an XPath that checks an attribute value, like
          "//div[@id='content']"
[NEW] default values for missing elements (a first version was done in
      0.2.8 but it did not work for all cases)
[NEW] possibility to click a button by its text (instead of its index)
      (credit: Nick Merwin)
[NEW] clicking radio buttons
[NEW] can click on image buttons (by specifying the name of the button)
[NEW] possibility to extract a URL in one step, like so:
          link 'The Difference/@href'
      i.e. give me the href attribute of the element matched by the
      example 'The Difference'
[NEW] new way to match an element of the page:
          div 'div[The Difference]'
      means 'return the div which contains the string "The Difference"'.
      This is useful if the XPath of the element is not constant across
      the same site (e.g. sometimes a banner or ad is added, sometimes
      not)
[NEW] clicking image maps; at the moment this is achieved by specifying
      an index, like
          click_image_map 3
      which means click the 4th link in the image map
[FIX] \240 (&nbsp;, the non-breaking space) is now replaced with a
      space automatically in the preprocessing phase
[FIX] images are now downloaded correctly if the src attribute has a
      leading space, as in
          <img src=' /files/downloads/images/image.jpg'/>
[FIX] other misc fixes - a ton of them!
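
To make the example notations in the list above a bit more concrete, here is a small plain-Ruby sketch that decodes them into their parts. It is only an illustration of the semantics described in these notes: parse_example and the Example struct are hypothetical names made up for this sketch, and nothing here calls or reproduces scRUBYt!'s own code.

```ruby
# Illustrative only: a hypothetical decoder for the example notations
# listed above. It does not use or reproduce scRUBYt!'s own parser.
Example = Struct.new(:kind, :text, :extra)

def parse_example(example)
  case example
  when %r{\A/}                          # "//div[@id='content']"
    Example.new(:xpath, example, nil)
  when %r{\A(.+)/@([\w:-]+)\z}          # 'The Difference/@href'
    Example.new(:attribute, $1, $2)     # extra = attribute name
  when /\A(.+)\[(\d+)\]\z/              # 'more[5]'
    Example.new(:indexed, $1, $2.to_i)  # extra = zero-based index
  when /\A([a-z][a-z0-9]*)\[(.+)\]\z/i  # 'div[The Difference]'
    Example.new(:containing, $2, $1)    # extra = enclosing tag name
  else                                  # plain text example
    Example.new(:text, example, nil)
  end
end

# Print the decoded form of each notation mentioned in the notes.
["more[5]", "The Difference/@href", "div[The Difference]",
 "//div[@id='content']"].each do |example|
  p parse_example(example)
end
```

Note that the numeric check runs before the tag[text] check, so in this sketch 'div[5]' would be read as the 6th element with the text 'div', not as a div containing "5". Whether scRUBYt! resolves that case the same way is not stated in these notes.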

========
Comments
========

The win32 version is just being built as I am writing this, so it will
be available soon.

Please keep the feedback coming - bug reports, questions, suggestions
are warmly welcome at the scRUBYt! forum - http://agora.scrubyt.org.

Cheers,
The scRUBYt! team - http://scrubyt.org
 
