[ANN] scRUBYt! 0.3.4

Peter Szinek

Hey all,

I am pleased to announce that the long-awaited new release of scRUBYt!,
0.3.4, is available for download. A lot of bugs have been fixed and some
cool features scrubbed in, so be sure to check it out!

==========
scrubWHAT?
==========

scRUBYt! is a very easy to learn and use, yet powerful Web scraping
framework based on Hpricot and mechanize (and from the next version, on
FireWatir!). Its purpose is to free you from the drudgery of web page
crawling, looking up HTML tags, attributes, XPaths, form names and other
typical low-level web scraping woes by figuring these out from your
examples copy'n'pasted from the Web page.

=========
Changelog
=========

- [NEW] Script pattern; possibility to evaluate a custom function on the
input of the pattern
- [NEW] Constant pattern; Can add constant patterns with the syntax:
pattern 'Hello world', :type => :constant
- [NEW] Text pattern; structure agnostic scraping based on labels and
other textual clues
- [NEW] new output method: to_flat_xml for creating feed-like flat XMLs
instead of hierarchical ones
- [NEW] to_flat_xml with spec delimiters splits up the concatenated hash
results
- [MOD] Change in the semantics of the "div[stuff]" style examples
* divs which contain "stuff" (rather than their whole text is
"stuff") are matched
* generalization is false by default
- [NEW] Possibility to define an arbitrary delimiter for to_hash (used when
the result contains commas)
- [NEW/MOD] Changes in the logging module: (Credit: Tim Fletcher)
* Extract the logging into a class to allow for filtering
* Allow the logger to be set to nil (to disable logging), and have
this as the default.
Logging now has to be explicitly enabled, as follows:

Scrubyt.logger = Scrubyt::Logger.new
* Allow loggers to point to streams other than STDERR.
- [NEW/MOD] Changes in the download pattern:
* possibility to specify an array of files that should be ignored
during the downloading
(e.g. 'nopicture.gif')
* Handling timeout during downloads instead of crashing
* Fixed downloading in case the filename contains no '.'
* Fixed downloading for more URL types that were not working before
- [NEW] New option: example_type. Possibility to force example type
(instead of leaving it to scRUBYt! to guess)
- [NEW] Entirely new test suite using rcov; Tests are added continuously;
The goal is to achieve full coverage
- [FIX] Fixed the infamous regexp bug which caused the pricegrabber
scenario (among other things) to fail
- [FIX] Do not evaluate the detail pattern twice
- [FIX] Fixed dependencies (namely parse_tree_reloaded) and correct versions
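
To make the change in the "div[stuff]" semantics from the changelog more concrete, here is a minimal plain-Ruby sketch of the difference between "the whole text is stuff" and "contains stuff". It is only an illustration under assumptions: it does not use scRUBYt!'s API or its real matching code, and the names DIVS, match_whole_text and match_containing are hypothetical.

```ruby
# Hypothetical illustration only; this is not scRUBYt!'s code or API.
# Each hash stands in for a <div> element and its text content.
DIVS = [
  { id: 'a', text: 'stuff' },
  { id: 'b', text: 'some stuff and more' },
  { id: 'c', text: 'unrelated' }
].freeze

# Old semantics: the div's whole text must equal the example.
def match_whole_text(divs, example)
  divs.select { |div| div[:text] == example }
end

# New semantics described for 0.3.4: the text only has to contain the example.
def match_containing(divs, example)
  divs.select { |div| div[:text].include?(example) }
end

old_ids = match_whole_text(DIVS, 'stuff').map { |div| div[:id] }
new_ids = match_containing(DIVS, 'stuff').map { |div| div[:id] }

puts "whole text: #{old_ids.inspect}"
puts "containing: #{new_ids.inspect}"
```

With the sample data, only div 'a' matches under the old rule, while 'a' and 'b' both match under the new one, so examples written for earlier versions may now match more elements than before.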

=========
Read more
=========

Some additional explanation about the new release can be found here:
http://scrubyt.org/a-hot-new-release-034-is-out-whats-new

============
In the works
============

Paul Nikitochkin created jscRUBYt!, which should solve the win32
problems by using the J-versions of the dependencies. I have been very
swamped recently, so I haven't had much time to look into his code,
but I am sure this will be very helpful to a lot of you, so it's on the
short-term TODO list.

Glenn Gillen has almost finished firescRUBYt! - scRUBYt! on FireWatir,
which is using FireWatir as the agent (rather than mechanize) to
navigate and extract data from the web page. I think this is the coolest
addition in scRUBYt!'s history, since it enables scraping of pages
containing AJAX/JavaScript and other tricks that could not be worked
around with mechanize, and it easily parses pages that caused Hpricot
to choke and gag...

=========================
Would like to contribute?
=========================

* If you are a coder and would like to be part of the development
team, contact us at scrubyt['maps-on'.reverse]@scrubyt.org
* If you'd like to contribute to the documentation/how-tos/tutorials,
check out the wiki at http://wiki.scrubyt.org.
* If you found a bug, have suggestions or feature requests, please use
scRUBYt!'s lighthouse tracker at http://scrubyt.lighthouseapp.com
* If you'd like to discuss or propose features, get some help or would
like to check out and learn from the problems of others, visit the forum
at http://agora.scrubyt.org
* If none of the above, but you still would like to tell us
something, bring us champagne/chocolates, poke Glenn to finish
FireWatir faster or whatever else, contact us at
scrubyt['maps-on'.reverse]@scrubyt.org

H4ppy scrubbing,
Peter
__
http://www.rubyrailways.com
http://scrubyt.org
 
