How can one get the Hpricot DOM document from Mechanize?

Discussion in 'Ruby' started by Just Another Victim of the Ambient Morality, Sep 13, 2008.

  1. I was wondering if there were some way of getting the Hpricot DOM (for
    lack of a better term) from a Mechanize page. For example:


    agent = WWW::Mechanize.new
    page = agent.get('http://www.website.com')

    # I am currently doing this
    doc = Hpricot(page.body)

    # I would like to do this
    doc = page.get_hpricot_dom


    The idea is that since Mechanize apparently uses Hpricot and it's surely
    using it to parse the HTML begotten from the agent.get method, it would be
    nice if I didn't have to repeat that work.
    Is there a way to get this Hpricot document? ...or am I just totally
    wrong about how Mechanize uses Hpricot?
    Thank you...
     
    Just Another Victim of the Ambient Morality, Sep 13, 2008
    #1

  2. Lex Williams Guest

    Perhaps it's only me, but would you please detail what it is you want
    to accomplish? Maybe with an example?
    --
    Posted via http://www.ruby-forum.com/.
     
    Lex Williams, Sep 13, 2008
    #2

  3. Just Another Victim wrote:
    > # I would like to do this
    > doc = page.get_hpricot_dom


    Try page.parser or page.root (they're equivalent).

    Regards,
    Matthias
    --
    Posted via http://www.ruby-forum.com/.
     
    Matthias Reitinger, Sep 13, 2008
    #3
  4. On Sun, Sep 14, 2008 at 04:03:04AM +0900, Just Another Victim of the Ambient Morality wrote:
    > I was wondering if there were some way of getting the Hpricot DOM (for
    > lack of a better term) from a Mechanize page. For example:
    >
    >
    > agent = WWW::Mechanize.new
    > page = agent.get('http://www.website.com')
    >
    > # I am currently doing this
    > doc = Hpricot(page.body)
    >
    > # I would like to do this
    > doc = page.get_hpricot_dom
    >
    >
    > The idea is that since Mechanize apparently uses Hpricot and it's surely
    > using it to parse the HTML begotten from the agent.get method, it would be
    > nice if I didn't have to repeat that work.
    > Is there a way to get this Hpricot document? ...or am I just totally
    > wrong about how Mechanize uses Hpricot?


    You can get at the Hpricot document by using the "parser" accessor on
    WWW::Mechanize::Page. Page also responds to "search", "/", and "at",
    which just delegate to the Hpricot document.

    So you can just do:

    (agent.get('http://tenderlovemaking.com')/'tr').each do |tr|
    ...
    end
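
The idea behind "parser" and the delegating methods can be sketched without Mechanize or Hpricot installed. The snippet below is only an illustration under stated assumptions: FakeDoc and FakePage are hypothetical stand-ins, not the real WWW::Mechanize::Page or Hpricot classes, and the regex-based "search" is a toy that only understands bare tag names. It shows a page that parses its body once, keeps the result, and forwards "search", "/", and "at" to that parsed document.

```ruby
# Hypothetical stand-in for a parsed document (NOT Hpricot itself).
class FakeDoc
  def initialize(html)
    @html = html
  end

  # Toy search: return the inner text of every <tag>...</tag> pair.
  def search(tag)
    @html.scan(%r{<#{tag}>(.*?)</#{tag}>}m).flatten
  end

  # "/" is just another spelling of search.
  def /(tag)
    search(tag)
  end

  # First match, or nil when nothing matches.
  def at(tag)
    search(tag).first
  end
end

# Hypothetical stand-in for a page object (NOT WWW::Mechanize::Page).
class FakePage
  attr_reader :body, :parse_count

  def initialize(body)
    @body = body
    @parse_count = 0
  end

  # Parse on first use and remember the result, so callers never
  # need to re-parse the body themselves.
  def parser
    @parser ||= begin
      @parse_count += 1
      FakeDoc.new(@body)
    end
  end
  alias root parser

  # Forward the query methods to the parsed document.
  def search(query)
    parser.search(query)
  end

  def /(query)
    parser.search(query)
  end

  def at(query)
    parser.at(query)
  end
end

page = FakePage.new('<table><tr>one</tr><tr>two</tr></table>')
rows = (page / 'tr')
first_row = page.at('tr')
```

With a page object shaped like this, calling page.parser, page.root, or any of the forwarded methods reuses the same parsed document, which is the reuse the original question was after.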

    --
    Aaron Patterson
    http://tenderlovemaking.com/
     
    Aaron Patterson, Sep 18, 2008
    #4
