Mechanize and XPath


Ruby Newbie

Is there a way to select links in a scraped Mechanize page using XPath
selectors?

For example, all links in the second TABLE on the page.

I know it is possible with Hpricot, but I need the links to be used by
Mechanize.
 

Peter Szinek

From the Mechanize guide (http://mechanize.rubyforge.org/mechanize/files/GUIDE_txt.html):

Mechanize uses hpricot to parse html. What does this mean for you? You
can treat a mechanize page like an hpricot object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using hpricot methods:

agent.get('http://someurl.com/').search("//p[@class='posted']")

HTH,
Peter
 

Patrick L.


Wait a minute, it says the total opposite on the Mechanize page. But it
definitely explains why it's not being friendly with Nokogiri...

http://mechanize.rubyforge.org/mechanize/

Mechanize uses nokogiri to parse html. What does this mean for you? You
can treat a mechanize page like a nokogiri object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using nokogiri methods:

agent.get('http://someurl.com/').search(".//p[@class='posted']")
 
