Hpricot scraping returns nil

Sergei Maertens

Good evening

First, I'll mention that I have used the search function and found some
useful topics, but I still can't find a solution because of my lack of
Ruby and Hpricot/XPath knowledge.

The problem is the following: from
http://users.telenet.be/weerstation.drongen/index.htm/Current_Vantage_Pro.htm
I need to scrape the temperature and Today's Rain values (I need those
for an engineering project). With XPather and Firebug I looked up the
XPath to the temperature value:
/html/body/table/tbody/tr[3]/td[2]/font/strong/small/font (according to
XPather).

But when I try to print the value in Ruby, I get nil.

Here is my code:

---------------------------------------------------------------------------
#!/usr/bin/ruby
require 'rubygems'
require 'open-uri'
require 'hpricot'

@url = "http://users.telenet.be/weerstation.drongen/index.htm/Current_Vantage_Pro.htm"
# XPath as reported by XPather/Firebug
xpath = "/html/body/table/tbody/tr[3]/td[2]/font/strong/small/font"
@response = ""

begin
  # Fetch the page and keep its body
  open(@url) { |file|
    puts "Fetched Document: #{file.base_uri}"
    @response = file.read
  }

  # Parse the page and print whatever the XPath matches
  doc = Hpricot(@response)
  puts((doc/xpath).inner_html)
rescue Exception => e
  puts e
end

---------------------------------------------------------------------------

Since this returned nil, I tried to find out at which step of the path
the result becomes nil. Apparently /html/body/table/tbody is one step
too far: /html/body/table still returns output, but adding tbody
returns nil.

I've read that I should try to rebuild the path now, but I really can't
figure out how to do this. This is only my second serious Ruby script
(only the beginning, actually) and the first time I have used Hpricot.

I'm looking forward to replies, and I'm sorry to bother you with yet
another Hpricot-nil topic, but I'm kinda hopeless because of my
deadline...

Kind regards,
Sergei
 
Jn Jacob

It should work if you take the tbody out of the XPath. I have read
somewhere that tbody does not work with Hpricot; I don't know why.
Good luck.
xpath = "/html/body/table//tr[3]/td[2]/font/strong/small/font"
 
Peter Szinek

Jn said:
It should work if you take the tbody out of the XPath. I have read
somewhere that tbody does not work with Hpricot; I don't know why.
Good luck.
xpath = "/html/body/table//tr[3]/td[2]/font/strong/small/font"

There is more to it than "tbody does not work for hpricot".

When an HTML parser (Firefox and Hpricot in this case) parses an HTML
page, it has to build a tree from it (a.k.a. the DOM).
The problem is that a lot (most?) of the HTML out there is badly
formed, so the process of DOM building is very ambiguous (what if
tags are not nested properly? what about tags that are never closed?
and there are plenty of other problems). Every parser therefore
approaches it a bit differently (that's one reason why you have the
'works in IE but not in FF' kind of problems), and Firefox, for
example, even makes some effort to make the parsed HTML standards
compliant, such as inserting a tbody tag after a table tag if it's
missing.
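
The effect described above can be reproduced in a small sketch. Note
the assumptions: it uses REXML (shipped with standard Ruby
installations) instead of Hpricot, and SAMPLE_MARKUP and first_match
are hypothetical stand-ins, not the real page or any library API. The
markup has no tbody, so a path that contains the tbody step Firefox
would have added finds nothing.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the page source. Like the raw HTML in the
# thread, it contains no <tbody>; only Firefox's DOM would have one.
SAMPLE_MARKUP = <<~MARKUP
  <html><body><table>
    <tr><td>Station</td><td>Drongen</td></tr>
    <tr><td>Humidity</td><td>80</td></tr>
    <tr><td>Temperature</td><td><font>12.3</font></td></tr>
  </table></body></html>
MARKUP

# Return the first element matching xpath, or nil when nothing matches.
def first_match(markup, xpath)
  REXML::XPath.first(REXML::Document.new(markup), xpath)
end

with_tbody    = first_match(SAMPLE_MARKUP, '/html/body/table/tbody/tr[3]/td[2]/font')
without_tbody = first_match(SAMPLE_MARKUP, '/html/body/table/tr[3]/td[2]/font')

puts with_tbody.inspect   # nil: the parsed tree has no tbody element
puts without_tbody.text   # 12.3
```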

However, this is only one small difference between how Hpricot and
Firefox parse the HTML and build the DOM tree (on which XPaths are
evaluated). Hpricot tries to be as close to FF as possible, but this
doesn't always happen (though _why said he considers these cases bugs).

Bottom line: you can't expect that an XPath yanked from Firebug will
work with Hpricot/Mechanize (though it mostly does, and removing the
tbody step that Firefox added increases your chances even further).
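
As a rough sketch of that clean-up step, a Firebug path can be
normalized before it is handed to the parser. strip_tbody is a
hypothetical helper, not part of Hpricot or Mechanize, and it is only
appropriate when the page source itself contains no tbody tags.

```ruby
# Hypothetical helper (not a library call): drop every tbody step from
# an XPath, with or without a positional predicate such as tbody[1].
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, '')
end

firebug_xpath = '/html/body/table/tbody/tr[3]/td[2]/font/strong/small/font'
puts strip_tbody(firebug_xpath)
# prints /html/body/table/tr[3]/td[2]/font/strong/small/font
```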

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
 
Sergei Maertens

Jn said:
It should work if you take the tbody out of the XPath. I have read
somewhere that tbody does not work with Hpricot; I don't know why.
Good luck.
xpath = "/html/body/table//tr[3]/td[2]/font/strong/small/font"

I'll try it in a minute, thank you for the answer.

@Peter, thank you for the very complete explanation.
 
Sergei Maertens

Sergei said:
I'll try it in a minute, thank you for the answer.


And it does work! Thank you very much, Jn Jacob.
Now I only have to solve the '�' that appears instead of '°'.
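
The thread does not say what caused the '�'. One common cause, assumed
here and not confirmed for this page, is that the fetched bytes are
ISO-8859-1 while the terminal expects UTF-8. Under that assumption, a
minimal sketch for Ruby 1.9 or later re-labels the bytes and
transcodes them; latin1_to_utf8 is a hypothetical helper name.

```ruby
# Assumption: the page is ISO-8859-1, where the degree sign is the
# single byte 0xB0. Tag the raw bytes as such, then transcode to UTF-8.
def latin1_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "12.3 \xB0C".b        # bytes as they might arrive from the server
puts latin1_to_utf8(raw)    # prints 12.3 °C on a UTF-8 terminal
```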