Logging to a page and scrapping values

V

Vikash Kumar

I am running a test case, in which I have to first login to a web page
then I have to go to some particular page in the same web site, then
extract some data from that page. The data is in the table.

Such as the script first call http://localhost/login.asp, then we enter
user name and password, then we click on login button. By this we enter
to the web page, then we go to http://localhost/achievements.asp, from
this page we want to extract the data residing in html table. What
should be the approach to do this.

I can use the below code to extract the data if I have not to login to
the web site.

require 'net/http'

# read the page data

http = Net::HTTP.new('kvcrpf.org, 80)
resp, page = http.get('/achievements.htm', nil )

# BEGIN processing HTML

def parse_html(data,tag)
return data.scan(%r{<#{tag}\s*.*?>(.*?)</#{tag}>}im).flatten
end

output = []
table_data = parse_html(page,"table")
table_data.each do |table|
out_row = []
row_data = parse_html(table,"tr")
row_data.each do |row|
cell_data = parse_html(row,"td")
cell_data.each do |cell|
cell.gsub!(%r{<.*?>},"")
end
out_row << cell_data
end
output << out_row
end

# END processing HTML

# examine the result

def parse_nested_array(array,tab = 0)
n = 0
array.each do |item|
if(item.size > 0)
puts "#{"\t" * tab}[#{n}] {"
if(item.class == Array)
parse_nested_array(item,tab+1)
else
puts "#{"\t" * (tab+1)}#{item}"
end
puts "#{"\t" * tab}}"
end
n += 1
end
end

parse_nested_array(output[2][4])

aa, ab, ac, ad = output[2][4]

puts"hello"
puts aa + "\t" + ab + "\t" + ac + "\t" + ad
 
P

Peter Szinek

Vikash said:
I am running a test case, in which I have to first login to a web page
then I have to go to some particular page in the same web site, then
extract some data from that page. The data is in the table.

Such as the script first call http://localhost/login.asp, then we enter
user name and password, then we click on login button. By this we enter
to the web page, then we go to http://localhost/achievements.asp, from
this page we want to extract the data residing in html table. What
should be the approach to do this.

I can use the below code to extract the data if I have not to login to
the web site.

In 2 days I am going to release a web extraction toolkit which will do
exactly what you want (and more of course, but this is a basic use
case)... It's based on Mechanize (which is used for login like stuff)
and HPricot for extracting the relevant stuff. The scenario you
described is an absolutely typical one, so you could try it with my stuff...

I will post here an announcement after the release.

Cheers,
Peter

__
http://www.rubyrailways.com
 
V

Vikash Kumar

require 'net/http'
# read the page data

http = Net::HTTP.new('kvcrpf.org, 80)
resp, page = http.get('/achievements.htm', nil )

# BEGIN processing HTML

The code given above can be used to extract values from a web page, I we
don't have to login to a web page, we know in advance which URL to look
for to get data from it, but the problem is to first login to a page,
then go to some desired location to scrap values from it.

Please help me out in doing this.
Thanks in advance
Vikash
 
L

lrlebron

If you are running on a windows platform that you should look at watir.
It will let you control Internet Explorer and log in to a site.

Luis
 
R

Rodrigo Bermejo

Vikash said:
The code given above can be used to extract values from a web page, I we
don't have to login to a web page, we know in advance which URL to look
for to get data from it, but the problem is to first login to a page,
then go to some desired location to scrap values from it.

Please help me out in doing this.
Thanks in advance
Vikash

There are a few ways of doing this <I am on hurry now to elaborate>, if
your are on windows watir[1] can help you out doing the login stuff, may
the tricky part is how to get the data, but I am sure there is a method
which allows you to extract the hole HTML


http://wtr.rubyforge.org/

$rm rm
rb
 
V

Vikash Kumar

your are on windows watir[1] can help you out doing the login stuff, may
the tricky part is how to get the data, but I am sure there is a method
which allows you to extract the hole HTML


http://wtr.rubyforge.org/

$rm rm
.rb

I am working on windows platform, I tried a lot to first log in to a web
page then go to some desired page to get some data from it, but unable
to do it.

Anyone's help will be appreciated.
Thanks
Vikash
 
C

Charles Lowe

Vikash said:
There are a few ways of doing this <I am on hurry now to elaborate>, if
your are on windows watir[1] can help you out doing the login stuff, may
the tricky part is how to get the data, but I am sure there is a method
which allows you to extract the hole HTML


http://wtr.rubyforge.org/

$rm rm
.rb

I am working on windows platform, I tried a lot to first log in to a web
page then go to some desired page to get some data from it, but unable
to do it.

Anyone's help will be appreciated.
Thanks
Vikash

Try a combination of WWW::Mechanize (gem install mechanize), and Hpricot
(gem install hpricot).
 
V

Vikash Kumar

Try a combination of WWW::Mechanize (gem install mechanize), and Hpricot
(gem install hpricot).

I am new to Mechanize and hpricot, though I have installed it, but I am
still facing the problem in scrapping values by first log in to the web
site then going to some other page to extract data from it.

Please help me.
Vikash
 
A

alex_f_il

You can also try SWExplorerAutomation SWEA from http://webiussoft.com.
SWEA is .Net API, but can be used from Ruby using RubyCLR

example:

require 'rubyclr'
RubyClr::reference 'System'
RubyClr::reference 'SWExplorerAutomationClient'
include SWExplorerAutomation::Client
include SWExplorerAutomation::Client::Controls
include SWExplorerAutomation::Client::DialogControls
explorerManager = ExplorerManager.new
explorerManager.Connect(-1)
explorerManager.LoadProject('google.htp')
explorerManager.Navigate('http://www.google.com/')
scene = explorerManager['Scene_0']
scene.WaitForActive(30000)
scene["q"].Value = 'c#'
scene['btnG'].Click()
scene = explorerManager['Scene_1']
scene.WaitForActive(30000)
explorerManager.DisconnectAndClose()


Vikash said:
There are a few ways of doing this <I am on hurry now to elaborate>, if
your are on windows watir[1] can help you out doing the login stuff, may
the tricky part is how to get the data, but I am sure there is a method
which allows you to extract the hole HTML


http://wtr.rubyforge.org/

$rm rm
.rb

I am working on windows platform, I tried a lot to first log in to a web
page then go to some desired page to get some data from it, but unable
to do it.

Anyone's help will be appreciated.
Thanks
Vikash
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,011
Latest member
AjaUqq1950

Latest Threads

Top