How to use the links() method

PP

In Watir there is a method named links(). It returns a Links object.
I want to collect certain links from a web page into an array and then
visit the pages behind those links. My code is below; the result
suggests that a2[j] stores something, but not links. Can anyone help me
find the errors? Best regards
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
require 'watir'
ie = Watir::IE.new
ie.goto("www.baidu.com")
n = ie.links.length
puts n
$i = 1
$j = 1
$k = 1
a1 = Array.new # a1 stores all the links in the page
a2 = Array.new # a2 stores the links that contain the string 'baidu'
while $i <= n
  a1[$i] = ie.links[$i].to_s
  if /(www.baidu.com)/.matches(a1[$i])
    a2[$j] = ie.links[$i]
    $j = $j + 1
  end
  $i = $i + 1
end
while a2[$k]
  ie.goto(a2[$k])
  ie.back
  $k = $k + 1
end
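One error worth flagging right away: Ruby's Regexp has no `matches` method, so that `if` line raises a NoMethodError. Matching is done with `=~` (returns the match position or nil) or `Regexp#match` (returns MatchData or nil). A minimal plain-Ruby sketch of the intended filtering, with made-up sample URLs:

```ruby
# =~ returns nil on no match, so select keeps only matching strings.
urls = ['http://www.baidu.com/s', 'http://example.com/']
hits = urls.select { |u| /www\.baidu\.com/ =~ u }
p hits  # only the baidu URL remains
```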
 
ChrisH

links returns a Links object which mixes in Enumerable, so you should
be able to get an array by using to_a:

require 'watir'
ie = Watir::IE.new
ie.goto("www.baidu.com")
linksArray = ie.links.to_a
baiduArray = linksArray.select { |x| /(www.baidu.com)/ =~ x.to_s }
baiduArray.each { |link|
  ie.goto(link)
  ie.back
}
 
PP

Your code is terser than mine, but after trying it, it still didn't
work. If I print linksArray to the screen, I can see that each entry
contains not only a URL but also the id, name, value, innertext and
type. I think all of this makes the parameter unusable for ie.goto().
Thanks for your replies, and I look forward to advice about the problem.
Best wishes
 
Bret Pettichord

links itself is a collection of evanescent link/COM objects on the
current page. These references become stale as soon as a new page is
loaded.

require 'watir'
ie = Watir::IE.new
ie.goto("www.baidu.com")

hrefs = Array.new
ie.links.each do |link|
  hrefs << link.href if /(www.baidu.com)/ =~ link.href
end

hrefs.each do |href|
  ie.goto(href)
end

However, I think WWW::Mechanize may be a better tool (faster, at least)
if you are only interested in link checking.

Bret
 
PP

Actually, what I want is to save the web pages whose URLs contain a
certain string. Whether the page is displayed is not important. Thanks
for your advice. Best regards.
 
ChrisH

Hi PP, I saw your other post about saving files via IE and the Win32 API.

If the point is really just to download the pages, then doing it via
Watir/Win32 is a bit like using a lever and pulleys to move a sheet of
paper.

It can be done much more easily, simply and quickly via one of the HTTP
libraries (e.g. Mechanize, mentioned above) or even using the standard
library's Net::HTTP, URI and open-uri.

cheers
 
PP

Hi ChrisH, thank you for giving me so much wonderful advice. My
purpose is just to download some pages whose URLs contain a certain
string. As I only started with Ruby and Watir three weeks ago, the
methods I have found so far all make the program act just like a human
would.

Can you show me some information about the Net library and example
methods suited to my purpose?
Thanks
 
ChrisH

You're welcome, PP.

Here is a quick example. It pulls the links off www.baidu.com and
prints them to standard output.
Note that it downloads a GIF file that is linked, so if you only want
HTML files you will need to add some filtering.

Also note that the last link it tries to process (for me, anyway) is
http://www.baidu.com
and it gets an error:
d:/ruby/lib/ruby/1.8/net/http.rb:1556:in `read_status_line': wrong
status line: "<!DOCTYPE HTML PUBLIC \...."
Not sure why.

Cheers
Chris

require 'net/http'
require 'uri'

h = Net::HTTP.new('www.baidu.com')
resp = h.get('/', nil)
if resp.message == "OK"
  URI.extract(resp.body, ['http']) { |lnk|
    if /www\.baidu\.com/ =~ lnk
      p "LINK: #{lnk}"
      urilnk = URI.parse(lnk)
      p "PATH: #{urilnk.path}"
      r = Net::HTTP.new(urilnk.host).get(urilnk.path || '/')
      if r.message == "OK"
        p "START", r.body, "END"
      else
        p "BAD LINK #{lnk.to_s}"
      end
    end
  }
end
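The HTML-only filtering mentioned above can be done by checking the response's Content-Type header. A small hedged sketch; the helper name `html_response?` is my own:

```ruby
require 'net/http'

# True when a Net::HTTPResponse declares an HTML body.
def html_response?(resp)
  resp['content-type'].to_s.include?('text/html')
end
```

In the loop above, the body would then only be printed when `html_response?(r)` is true, which skips the linked GIF.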
 
PP

Hi Chris

I have tried your code and got the same result as you. In my opinion
it's almost the last step of my job. I have got the URLs with your
help. Now a module called webfetcher has shown me a way to save the
pages. Code like the following can easily save a page to "E:/inText":
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
require 'webfetcher'
book = WebFetcher::Page.url('http://wtr.rubyforge.org/rdoc/index.html')
book.recurse.save("E:/inText")
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
But a question still exists: the URL can only use the http protocol,
while the URL of the page I want to save is "https://*****". What can
I do about this problem? Can any function change the protocol from
"https" to "http"? I have tried using "http" to visit these pages and
save them, and the results are acceptable. What do you think?
Best regards, and I look forward to your response.
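Rewriting the scheme itself is a one-liner; a minimal sketch (the helper name `to_http` is made up, and this only helps when the server actually serves the same content over plain http):

```ruby
# Replace a leading https:// with http://; other URLs pass through unchanged.
def to_http(url)
  url.sub(%r{\Ahttps://}, 'http://')
end

p to_http('https://wtr.rubyforge.org/')  # => "http://wtr.rubyforge.org/"
```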
 
ChrisH

HTTPS applies encryption to the traffic between the client and the
web server.
If HTTP also works, then the only question is how important security
is for this info.
Since you are just copying the files, I'd guess you are not submitting
any sensitive info (like a userid & password), so HTTP should be fine.

webfetcher looks nice, better than writing our own, eh?

Cheers

PS There is a net/https library, but it seems to be totally undocumented...
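For completeness, fetching over HTTPS with the standard library looks roughly like this; a hedged sketch, and on 1.8-era Rubies you also need `require 'net/https'` before setting `use_ssl`. The helper names are my own:

```ruby
require 'net/http'
require 'uri'

# Whether a URL needs TLS.
def use_ssl?(url)
  URI.parse(url).scheme == 'https'
end

# Fetch a page, switching TLS on for https URLs (hits the network when called).
def fetch(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = use_ssl?(url)
  path = uri.path.empty? ? '/' : uri.path
  http.get(path).body
end
```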
 
