Extracting the URL from an Hpricot element

Discussion in 'Ruby' started by Nikita Ratlos, Dec 10, 2008.

  1. I want to get a list of URLs from a webpage as follows:

    First I create the Hpricot document and select the links:

    require 'hpricot'
    require 'open-uri'

    doc = Hpricot(open(searchurl))
    links = doc/"//html//body//div[6]//div[2]//a[@id='p-1']"

    Next I want to append the URLs to an array as such:

    results << links.map.each{|link| puts link.attributes['href'] }

    That line prints the URLs exactly as I need them, but it then
    puts the whole HTML link elements into the results array.

    Any ideas how to get just the URLs (without the HTML) into my results array?
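
One possible approach, sketched under assumptions: `map` collects whatever the block returns, and `puts` returns nil, so the block should return the href itself rather than print it. The `FakeLink` struct and the example.com URLs below are hypothetical stand-ins so the snippet runs without Hpricot or a network connection; real elements from `doc/"..."` are assumed to answer `attributes['href']` the same way they do in the question.

```ruby
# Hypothetical stand-in for Hpricot elements: each one only needs to
# respond to #attributes the way the elements in the question do.
FakeLink = Struct.new(:attributes)

links = [
  FakeLink.new({ 'href' => 'http://example.com/a' }),
  FakeLink.new({ 'href' => 'http://example.com/b' })
]

results = []

# map builds a new array from the block's return values, so the block
# returns the href string instead of calling puts (which returns nil).
urls = links.map { |link| link.attributes['href'] }

# concat appends each URL individually; << would push the whole array
# into results as a single nested element.
results.concat(urls)

# Printing is kept as a separate step from collecting.
puts results
```

With real Hpricot elements the same two lines (`map` with a block that returns the href, then `concat`) should apply unchanged; only the stand-in setup would go away.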
    --
    Posted via http://www.ruby-forum.com/.
     