How to use the function "links()"

Discussion in 'Ruby' started by PP, May 16, 2006.

  1. PP

    PP Guest

    In Watir there is a function named links(). It returns a Links object.
    I want to put certain links from one web page into an array and visit
    the web pages behind these links. My code is as follows; the result
    suggests that "a2[j]" stores something, but not links. Can anyone help
    me find the errors? Best regards
    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    require'Watir'
    ie=Watir::IE.new
    ie.goto("www.baidu.com")
    n=ie.links.length
    puts n
    $i=1
    $j=1
    $k=1
    a1=Array.new # a1 stores all the links in the page
    a2=Array.new # a2 stores the links that contain the string 'baidu'
    while $i<=n
      a1[$i]=ie.links[$i].to_s
      if /(www.baidu.com)/.matches(a1[$i])
        a2[$j]=ie.links[$i]
        $j=$j+1
      end
      $i=$i+1
    end
    while a2[$k]
      ie.goto(a2[$k])
      ie.back
      $k=$k+1
    end
     
    PP, May 16, 2006
    #1

  2. ChrisH

    ChrisH Guest

    links returns a Links object which mixes in Enumerable, so you should
    be able to get the array by using to_a:

    require'Watir'
    ie=Watir::IE.new
    ie.goto("www.baidu.com")
    linksArray = ie.links.to_a
    baiduArray = linksArray.select {|x| /(www.baidu.com)/.matches(x.to_s)}
    baiduArray.each{|link|
      ie.goto(link)
      ie.back
    }
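
The point about Enumerable can be tried without a browser. The following is a minimal sketch, not Watir's own code: FakeLink and FakeLinks are hypothetical stand-ins, and the URLs are made up. It only shows that a class which defines each and mixes in Enumerable gets to_a and select for free.

```ruby
# Hypothetical stand-ins for Watir's link and links objects (not Watir's real classes).
FakeLink = Struct.new(:href, :text)

class FakeLinks
  include Enumerable # supplies to_a, select, map, ... on top of each

  def initialize(links)
    @links = links
  end

  # Enumerable only requires that each yields every element in turn.
  def each(&block)
    @links.each(&block)
  end
end

links = FakeLinks.new([
  FakeLink.new('http://www.baidu.com/more', 'More'),
  FakeLink.new('http://example.org/', 'Elsewhere')
])

links_array = links.to_a
baidu_links = links.select { |link| link.href =~ /www\.baidu\.com/ }

puts "all links: #{links_array.length}"
puts "baidu links: #{baidu_links.length}"
```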
     
    ChrisH, May 16, 2006
    #2

  3. PP

    PP Guest

    Your code is terser than mine, but when I tried it, it still didn't
    work. If I print "linksArray" to the screen, I can see that each
    element contains not only a URL but also the id, name, value,
    innertext and type. I think all of this makes the parameter unusable
    for the function "ie.goto()".
    Thanks for your reply; I look forward to any advice about the problem.
    Best wishes
     
    PP, May 17, 2006
    #3
  4. PP

    PP Guest

    Just change "ie.goto(link)" to "ie.goto(link.href)" and all of this
    works.
     
    PP, May 17, 2006
    #4
  5. Bret Pettichord

    links itself is a collection of evanescent link/COM objects on the
    current page. These references become stale as soon as a new page is
    loaded.

    require'Watir'
    ie=Watir::IE.new
    ie.goto("www.baidu.com")

    hrefs = Array.new
    ie.links.each do |link|
      hrefs << link.href if /(www.baidu.com)/ =~ link.href
    end

    hrefs.each do |href|
      ie.goto(href)
    end
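
Why copying the href strings first matters can be illustrated with a toy model. This is only a sketch under stated assumptions: FakeBrowser and its Link struct are hypothetical and merely imitate the "stale after navigation" behaviour described above; they are not how Watir is implemented.

```ruby
# Hypothetical model: link objects are only valid for the page that produced them.
class FakeBrowser
  Link = Struct.new(:browser, :page_id, :url) do
    def href
      # A link from an earlier page can no longer be read.
      raise 'stale link reference' unless browser.page_id == page_id
      url
    end
  end

  attr_reader :page_id

  def initialize(urls)
    @urls = urls
    @page_id = 0
  end

  # Returns fresh link objects tied to the current page.
  def links
    @urls.map { |url| Link.new(self, @page_id, url) }
  end

  # Navigating loads a new page, which invalidates earlier link objects.
  def goto(_url)
    @page_id += 1
  end
end

browser = FakeBrowser.new(['http://www.baidu.com/a', 'http://www.baidu.com/b'])
link_objects = browser.links
hrefs = link_objects.map { |link| link.href } # copy plain strings while the page is current
browser.goto(hrefs.first)

stale = begin
  link_objects.first.href
  false
rescue RuntimeError
  true
end

puts "old link objects stale: #{stale}"
puts "saved hrefs still usable: #{hrefs.inspect}"
```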

    However, I think WWW::Mechanize may be a better tool (faster at least)
    if you are only interested in link checking.

    Bret
     
    Bret Pettichord, May 17, 2006
    #5
  6. PP

    PP Guest

    Actually, what I want is to save the web pages whose URLs contain a
    certain string. Whether the web page is displayed is not important.
    Thanks for your advice. Best regards.
     
    PP, May 18, 2006
    #6
  7. ChrisH

    ChrisH Guest

    Hi PP, saw your other post re saving files via IE and the Win32 API.

    If the point really is just to download the pages, then doing it via
    Watir/Win32 is a bit like using a lever and pulleys to move a sheet of
    paper.

    It can be done much more easily, simply and quickly via one of the
    HTML libraries (e.g. Mechanize, mentioned above) or even using the
    standard libraries Net::HTTP, URI and OpenURI.
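
As a rough illustration of the standard-library route, here is a minimal sketch. The helper names matching_urls and download_all are made up for this example, and the URL list is invented. The fetcher is injectable so that the example runs without network access; the default fetcher uses Net::HTTP.get.

```ruby
require 'net/http'
require 'uri'

# Keep only the URLs that contain the given substring.
def matching_urls(urls, needle)
  urls.select { |url| url.include?(needle) }
end

# Download each URL and return a hash of url => body.
# The default fetcher performs a real HTTP GET via the standard library.
def download_all(urls, fetcher = ->(url) { Net::HTTP.get(URI.parse(url)) })
  urls.each_with_object({}) { |url, pages| pages[url] = fetcher.call(url) }
end

urls = ['http://www.baidu.com/', 'http://example.org/', 'http://www.baidu.com/more']
wanted = matching_urls(urls, 'www.baidu.com')

# A stub fetcher stands in for the network here; drop the second argument
# to download_all to fetch the real pages.
stub_fetcher = ->(url) { "<html>stub body for #{url}</html>" }
pages = download_all(wanted, stub_fetcher)

pages.each { |url, body| puts "#{url}: #{body.length} bytes" }
```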

    cheers
     
    ChrisH, May 18, 2006
    #7
  8. PP

    PP Guest

    Hi ChrisH, thank you for giving me so much wonderful advice. My
    purpose is just to download some pages whose URLs contain a certain
    string. As I got in touch with Ruby and Watir just 3 weeks ago, the
    methods I have found so far all make the program act just like a human
    does.

    Can you show me some information about the library "Net" and the
    methods that would serve my purpose?
    Thanks
     
    PP, May 18, 2006
    #8
  9. ChrisH

    ChrisH Guest

    You're welcome, PP.

    Here is a quick example that pulls the links off www.baidu.com and
    prints them to standard output.
    Note that it downloads a GIF file that is linked, so if you only want
    HTML files you will need to add some filtering.

    Also note the last link it tries to process (for me anyway) is
    http://www.baidu.com
    and it gets an error:
    d:/ruby/lib/ruby/1.8/net/http.rb:1556:in `read_status_line': wrong
    status line: "<!DOCTYPE HTML PUBLIC \...."
    Not sure why

    Cheers
    Chris

    require 'net/http'
    require 'uri'

    h = Net::HTTP.new('www.baidu.com')
    resp = h.get('/', nil)
    if resp.message == "OK"
      URI.extract(resp.body,['http']){|lnk|
        if /www\.baidu\.com/ =~ lnk
          p "LINK: #{lnk}"
          urilnk = URI.parse(lnk)
          p "PATH: #{urilnk.path}"
          r = Net::HTTP.new(urilnk.host).get(urilnk.path||'/')
          if r.message == "OK"
            p "START",r.body,"END"
          else
            p "BAD LINK #{lnk.to_s}"
          end
        end
      }
    end
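
One possible cause of the "wrong status line" error on http://www.baidu.com (an assumption, not something confirmed in this thread): URI.parse returns an empty string, not nil, as the path of a URL with no trailing slash, so urilnk.path||'/' yields "" and the request is sent with an empty path. The sketch below uses a hypothetical helper, request_path, to show a fallback that also covers the empty string.

```ruby
require 'uri'

# Return a path suitable for an HTTP GET: "/" when the URL has no path,
# with the query string appended when there is one.
def request_path(url)
  uri = URI.parse(url)
  path = uri.path.to_s.empty? ? '/' : uri.path
  uri.query ? "#{path}?#{uri.query}" : path
end

bare = URI.parse('http://www.baidu.com')
puts bare.path.inspect          # the path is an empty string, which is truthy in Ruby
puts (bare.path || '/').inspect # so the || fallback never applies

puts request_path('http://www.baidu.com').inspect
puts request_path('http://www.baidu.com/more?x=1').inspect
```

URI::HTTP#request_uri in the standard library performs a similar normalisation and could be used in place of the helper.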
     
    ChrisH, May 18, 2006
    #9
  10. PP

    PP Guest

    Hi Chris

    I have tried your code and got the same result as you. In my opinion
    this is almost the last step of my job. I have got the URLs with your
    help. Now a module called "webfetcher" has shown me a way to save the
    pages. The following code can easily save a page to "E:\inText":
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    require'webfetcher'
    book = WebFetcher::page.url('http://wtr.rubyforge.org/rdoc/index.html')
    book.recurse.save("E:/inText")
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    But a question still remains. The URL can only use the http protocol,
    while the URL of the page I want to save is "https://*****". What can
    I do about this problem? Can any function change the protocol from
    "https" to "http"? I have tried to use "http" to visit these pages and
    save them, and the results are acceptable. What do you think of this?
    Best regards; I look forward to your response.
     
    PP, May 19, 2006
    #10
  11. ChrisH

    ChrisH Guest

    HTTPS applies encryption to the traffic between the client and the
    web server.
    If HTTP also works, then the only question is how important security
    is for the info.
    Since you are just copying the files, I'd guess you are not submitting
    any sensitive info (like a userid & password), so HTTP should be fine.

    'webfetcher' looks nice, better than writing our own, eh?

    Cheers

    PS there is a Net::HTTPS but it seems to be totally undocumented...
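
For pages that are only reachable over HTTPS, the standard library's Net::HTTP can be switched to SSL with use_ssl. The following is a minimal sketch assuming a current Ruby, where require 'net/http' is enough (older versions may need require 'net/https'); http_client_for is a hypothetical helper and example.com is a placeholder host. No connection is opened until a request is made.

```ruby
require 'net/http'
require 'uri'

# Build a Net::HTTP client for the URL, enabling SSL when the scheme is https.
def http_client_for(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port) # uri.port defaults to 443 for https
  http.use_ssl = (uri.scheme == 'https')
  http
end

client = http_client_for('https://example.com/')
puts "use_ssl: #{client.use_ssl?}"
puts "port: #{client.port}"

# To actually download a page (needs network access):
#   body = client.get('/').body
```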
     
    ChrisH, May 19, 2006
    #11
  12. PP

    PP Guest

    Thanks for all your help. My goal has now been achieved.
    Best regards, ChrisH.
     
    PP, May 22, 2006
    #12
