Mechanize and XPath

Ruby Newbie · Oct 15, 2008

Is there a way to select links in a scraped Mechanize page using XPath
selectors?

For example, all links in the second TABLE on the page.

I know it is possible with Hpricot, but I need the links to be usable by
Mechanize.

Peter Szinek · Oct 15, 2008

From the Mechanize guide (http://mechanize.rubyforge.org/mechanize/files/GUIDE_txt.html):

Mechanize uses hpricot to parse html. What does this mean for you? You
can treat a mechanize page like an hpricot object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using hpricot methods:

agent.get('http://someurl.com/').search("//p[@class='posted']")

HTH,
Peter

Patrick L. · Feb 18, 2009

Wait a minute, the Mechanize page says the total opposite. But that
definitely explains why it's not being friendly with Nokogiri...

http://mechanize.rubyforge.org/mechanize/

Mechanize uses nokogiri to parse html. What does this mean for you? You
can treat a mechanize page like a nokogiri object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using nokogiri methods:

agent.get('http://someurl.com/').search(".//p[@class='posted']")
