Ruby (and programming) beginner's question regarding 'NoMethodError' while using Hpricot

Sandeep Guria · Feb 15, 2011

Hi!
I am trying to build a web scraper that fetches fundamental data for
listed companies from finance websites.
Let me show an example.

"<tbody>
<tr><td>PE ratio</td><td class="numericalColumn"> 16.83</td><td>14/02/11</td></tr>
<tr><td>EPS (Rs)</td><td class="numericalColumn"> 10.59</td><td>Mar, 10</td></tr>
<tr><td>Sales (Rs crore)</td><td class="numericalColumn"> 13,963.81</td><td>Dec, 10</td></tr>
<tr><td>Face Value (Rs)</td><td class="numericalColumn">10</td><td> </td></tr>
<tr><td>Net profit margin (%)</td><td class="numericalColumn"> 17.72</td><td>Mar, 10</td></tr>
<tr><td>Last dividend (%)</td><td class="numericalColumn">30</td><td>18/01/11</td></tr>
<tr><td>Return on average equity</td><td class="numericalColumn">13.69</td><td>Mar, 10</td></tr>
</tbody>
"
I want to extract the value '16.83' from the above HTML, so this is what I do:
I parse the HTML file and save it into doc.
I search doc for the inner text 'PE ratio'.
Then I choose the next element using next_sibling.
But I am getting an error:
'C:\Users\Administrator\Documents>ruby scraper.rb
scraper.rb:9:in `<main>': undefined method `next_sibling' for
#<Hpricot::Elements[{elem <td> "PE ratio" </td>}]> (NoMethodError)'

I'll be grateful for any suggestions.
Sorry about the formatting of the HTML text!

Attachments:
http://www.ruby-forum.com/attachment/5911/scraper.rb

Estanislau Trepat · Feb 15, 2011

Hi Sandeep.

The #search method returns an Hpricot::Elements object, which is somewhat
similar to an array. You should call #next_sibling on one of the elements
inside that collection, which are, in fact, Hpricot::Elem objects. For
instance:

# perform search

elements = doc.search('td[text()="PE ratio"]')

=> #<Hpricot::Elements[{elem <td> "PE ratio" </td>}]>

# get the targeted cell

cell = elements*.first.*next_sibling

=> {elem <td class="numericalColumn"> " 16.83" </td>}

# printout raw value

puts cell.to_plain_text

16.83
=> nil

Regards.

--
Estanislau Trepat


Sandeep Guria · Feb 17, 2011

Thank you very much, Estanislau!

If I am not bothering you too much, why wasn't 'next_sibling' working
in my code? And what are those '*' for in here:

cell = elements*.first.*next_sibling

They were giving the error 'syntax error, unexpected '.''.
I removed them and now it's working fine.

One more thing I need to ask, if I may use this thread.
I have this web page: 'http://money.rediff.com/companies/all/1-200'.
At the bottom there is a link ('Next') to the next page of the list.
This link is driven by JavaScript.
After I finish scraping this page, I want to go to the next one
through the 'Next' link. Is there any way to do it?

Note: A cruder method would be to visit every page of the list by its
URL and scrape from that page (the total number of pages will be 17).

Any suggestions are welcome!
Thank you!
Sandeep Guria

Estanislau Trepat · Feb 17, 2011

Hi Sandeep.

The #next_sibling method was not working because you were calling it on the
whole collection of elements (in fact, an Hpricot::Elements object) and not on
one of the elements inside. That's why we had to use elements.first to get the
first node that met our search criteria and then call #next_sibling on that
node. The #next_sibling method is defined only on each of those nodes, not on
the collection itself.
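
To illustrate the difference between the collection and the nodes inside it, here is a minimal sketch in plain Ruby. It does not use Hpricot: the Cell struct and the matches array are hypothetical stand-ins for an Hpricot::Elem and the result of doc.search, chosen only so the example runs without any gem installed.

```ruby
# Stand-in for a parsed table cell; this is NOT Hpricot's real class.
Cell = Struct.new(:text, :next_sibling)

value_cell = Cell.new("16.83", nil)
label_cell = Cell.new("PE ratio", value_cell)

# A search hands back a collection of matches, even when only one cell matched.
matches = [label_cell]

# The collection has no next_sibling method; each node inside it does.
collection_ok = matches.respond_to?(:next_sibling)
element_ok    = matches.first.respond_to?(:next_sibling)

# Pick one node out of the collection first, then step to its neighbour.
pe_ratio = matches.first.next_sibling.text
puts pe_ratio
```

The same shape applies to the real objects: take one node with first (or iterate with each) before asking for its sibling.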

I apologize for the * characters; I think I was trying to put that part in
bold and got bad formatting out of my email client.

For the problem you describe, maybe you could try using Watir <http://watir.com/>.
It drives a real web browser, and can thus handle JavaScript links.

If you allow me a suggestion: looking at the page you're trying to
scrape and the structure of its URL, I'd suggest extracting the
total number of results from the bottom part, which reads "Showing 1 - 200 of
3529". If you extract that last number (the total number of results), then
you could point your scraping script to
http://money.rediff.com/companies/all/1-3529 without needing to follow
JavaScript links.
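
A rough sketch of that idea in plain Ruby follows. The helper names total_results and all_results_url are made up for illustration, and the footer string is hard-coded here; in a real script it would come from the parsed page. It also assumes the site accepts an arbitrary range in the URL, which is worth checking by hand first.

```ruby
# Hypothetical helper: pulls the total out of text shaped like
# "Showing 1 - 200 of 3529". Returns nil when the text does not match.
def total_results(footer_text)
  match = footer_text.match(/Showing\s+\d+\s*-\s*\d+\s+of\s+([\d,]+)/)
  return nil unless match

  match[1].delete(",").to_i
end

# Hypothetical helper: builds a URL asking for every result in one page,
# following the 1-200 pattern seen in the original address.
def all_results_url(total)
  "http://money.rediff.com/companies/all/1-#{total}"
end

footer = "Showing 1 - 200 of 3529" # assumed to be read from the page
total = total_results(footer)
puts all_results_url(total) if total
```

Returning nil for an unexpected footer lets the script fall back to another approach instead of building a broken URL.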

Hope it helps.

Regards.

Sandeep Guria · Feb 19, 2011

Hi!
Thanks for the link, Estanislau.
It certainly made my work a lot easier.

I uploaded my 'almost' final program. It searches for
some data for each company on the BSE and writes it to an Excel sheet.

First I collect all the links and save them in an array 'x'.
Then I collect the data that I need and save it to a spreadsheet I
defined earlier in the program.
Lastly I write the spreadsheet to an Excel file.
I can control how many companies I process by changing the number of
iterations (in this case 8).
The program runs fine if the number of iterations is less than 6;
otherwise, I get an error:
'links.rb:34:in `block in <main>': undefined method `next_sibling' for
nil:NilClass (NoMethodError)
from links.rb:28:in `times'
from links.rb:28:in `<main>''

I'm puzzled (like always!).
All suggestions are welcome!
Thank you
Sandeep Guria

Attachments:
http://www.ruby-forum.com/attachment/5928/links.rb

Sandeep Guria · Feb 25, 2011

Hi!
I tried to find the class of the object on which I am calling
next_sibling, using the code below (iterating 25 times):

sheet1[num,1]=doc.search('td[text()="PE ratio"]').first
puts num # num is |num|
puts sheet1[num,1].class

It turns out it gives NilClass for the 13th, 15th, 16th and 24th
iterations.
So 'next_sibling' gives a NoMethodError.

Please help me with this problem.

Attachments:
http://www.ruby-forum.com/attachment/5964/scraper.rb
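
The NilClass result above suggests the search matched nothing on those pages, so first returned nil. A minimal sketch of guarding against that is shown below, in plain Ruby without Hpricot. FakeCell and value_after are hypothetical names; FakeCell only stands in for a parsed cell, and with the real library the text would be read with a method such as to_plain_text.

```ruby
# Stand-in for a parsed table cell; this is NOT Hpricot's real class.
FakeCell = Struct.new(:text, :next_sibling)

# Returns the stripped text of the cell after the label,
# or nil when the label (or its neighbour) is missing.
def value_after(label_cell)
  return nil if label_cell.nil?

  sibling = label_cell.next_sibling
  sibling.nil? ? nil : sibling.text.strip
end

found   = FakeCell.new("PE ratio", FakeCell.new("\n16.83", nil))
missing = nil # what first gives back when the search matched nothing

# Pages without the label yield nil instead of raising NoMethodError.
results = [found, missing].map { |cell| value_after(cell) }
p results
```

With a guard like this the loop can write an empty cell for companies whose page lacks a 'PE ratio' row and carry on with the rest.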
