Need advice for a script to scrape my Verizon account

Steve

I am new to Ruby. I would like to write a script that will log in to
my Verizon account and grab my minute details and calculate my
remaining unused minutes for the month. This would involve opening an
HTTPS connection, then scraping and calculating the results. I have
found one Perl and one Java/Firefox solution that does the same, but I
thought it would be a good project to learn more about Ruby. I would
rather not use external shells (to wget, for example) and rely on the
internal Ruby libraries instead. I'm looking for advice on what
libraries to use to get me started down the right path. Thank you.
 
Justin Bailey

I haven't used it myself, but you might want to look at
WWW::Mechanize, which handles things like cookies for you. I found a
good-looking blog post about it here:

http://neurogami.com/cafe-fetcher/

Another excellent library is open-uri, which comes standard with Ruby.
If Verizon does not have complex cookie requirements, you might be
able to handle the details yourself and use open-uri.
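
For reference, a minimal sketch of the do-it-yourself route using only the standard library. It uses Net::HTTP rather than open-uri, because open-uri only issues GET requests and a login form normally needs a POST. The login path, usage path, and form field names below are placeholders, not Verizon's real ones, and a real site may need extra steps (hidden form fields, redirects).

```ruby
require 'net/http'
require 'uri'

# Turn the Set-Cookie values from a response into one Cookie request header.
# Only the name=value pair is kept; attributes such as Path are dropped.
def cookie_header(set_cookie_values)
  Array(set_cookie_values).map { |value| value.split(';').first.strip }.join('; ')
end

# Log in, then fetch a page using the session cookies from the login response.
# login_path, page_path, and the form field names are hypothetical.
def fetch_after_login(base_url, user, password,
                      login_path: '/login', page_path: '/usage')
  base = URI.parse(base_url)
  Net::HTTP.start(base.host, base.port, use_ssl: base.scheme == 'https') do |http|
    login = Net::HTTP::Post.new(login_path)
    login.set_form_data('userid' => user, 'password' => password)
    response = http.request(login)

    page = Net::HTTP::Get.new(page_path)
    page['Cookie'] = cookie_header(response.get_fields('set-cookie'))
    http.request(page).body
  end
end
```

Calling fetch_after_login('https://example.invalid', ARGV[0], ARGV[1]) would return the body of the second page as a string, ready to be handed to a parser.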

Good luck!

Justin
 
Philip Hallstrom

These will probably help...

for fetching the page...

http://www.ruby-doc.org/core/classes/OpenURI.html
http://curb.rubyforge.org/

for parsing it:

http://code.whytheluckystiff.net/hpricot/
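
To make the parsing and calculation step concrete, here is a small sketch. The HTML fragment, its class names, and the row labels are invented for illustration, since the real Verizon markup is not shown in this thread. A regular expression stands in for a real HTML parser only to keep the sketch free of gems; Hpricot has since been retired, and Nokogiri is the usual replacement for anything beyond trivial markup.

```ruby
# Hypothetical fragment of a usage page; the real markup will differ.
SAMPLE_HTML = <<~HTML
  <table id="usage">
    <tr><td class="label">Plan minutes</td><td class="value">450</td></tr>
    <tr><td class="label">Minutes used</td><td class="value">312</td></tr>
  </table>
HTML

# Collect each label/value row into a hash such as {"Plan minutes" => 450}.
def usage_rows(html)
  pattern = %r{<td class="label">(.*?)</td>\s*<td class="value">(\d+)</td>}m
  html.scan(pattern).map { |label, value| [label.strip, value.to_i] }.to_h
end

# Remaining minutes = plan allowance minus minutes used so far.
def remaining_minutes(html)
  rows = usage_rows(html)
  rows.fetch('Plan minutes') - rows.fetch('Minutes used')
end

puts remaining_minutes(SAMPLE_HTML)
```

Using fetch rather than [] makes the script fail loudly with a KeyError if the site changes its labels, instead of quietly computing with nil.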
 
Bil Kleb

Steve said:
I am new to Ruby. I would like to write a script that will login to
my Verizon account and grab my minute details and calculate my
remaining unused minutes for the month.

I just did something similar for our electronic
time sheet -- I had to negotiate a frameset, viz,

URL = 'https://site.somewhere.gov/'
require 'rubygems'; require 'mechanize'
agent = WWW::Mechanize.new
form = agent.get("#{URL}").forms.name('loginForm').first
form.userid = ARGV[0] # 1st commandline argument
form.password = ARGV[1] # 2nd " "
frame = agent.submit(form).frames.name('Main').href
form = agent.get("#{URL}/#{frame}").forms.name('timeCard').first
form.ENTRY_24378_1_3 = '2.25' # set hours for 1st week, 3rd day
agent.submit(form)

The next step was to use Hpricot to scrape the relations
of table inputs to time codes, but I haven't gotten there
yet...

Regards,
 
Marcin Raczkowski

Try http://www.scrubyt.org, a great Ruby toolkit for web scraping,
and WWW::Mechanize.
 
brenton.leanhardt

It's not fancy, but I did pretty much the same thing you are talking
about with curb (a wrapper for libcurl) and Hpricot. You can see a
quick solution I came up with here: http://exawkuser.blogspot.com/2007/02/stickittodamanitis.html.
Verizon has probably changed their site in a way that now breaks my
script, but you should be able to see the basics of web scraping in
Ruby.

-Brenton
 
Peter Szinek

Marcin Raczkowski said:
try http://www.scrubyt.org - great ruby toolkit for webscraping

Yeah, scRUBYt! is really great and all, but currently it has one major
shortcoming from the viewpoint of Verizon scraping: it is built on
WWW::Mechanize, which cannot handle JavaScript/AJAX, and it seems the
Verizon page has plenty of both.

FireWatir is just being integrated into scRUBYt!, and scraping such a
site won't be a problem once that is finished. Until then, however, I
would suggest using either Watir or FireWatir to handle the AJAXy stuff.

Cheers,
Peter
__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby
 
Marcin Raczkowski

I had a problem with a site that relied on JavaScript, and I must say most
of it is easily fixable - using FireWatir is overkill, in my opinion.
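
The post does not say how. One common approach, assumed here rather than taken from this thread, is to find the URL that the page's JavaScript requests in the background and fetch that directly, skipping the browser altogether. If that endpoint returned JSON, the calculation could be sketched as follows; the payload and its key names are hypothetical.

```ruby
require 'json'

# Hypothetical body returned by the background request the page's script makes.
SAMPLE_BODY = '{"planMinutes": 450, "usedMinutes": 312}'

# Parse the payload and subtract used minutes from the plan allowance.
def remaining_from_json(body)
  data = JSON.parse(body)
  data.fetch('planMinutes') - data.fetch('usedMinutes')
end

puts remaining_from_json(SAMPLE_BODY)
```

This only works when the data arrives in a separate request; if the page computes values in the browser, a driver such as Watir is still the safer route.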
 
