Ruby (and programming) beginner's question regarding 'NoMethodError' while using Hpricot

Discussion in 'Ruby' started by Sandeep Guria, Feb 15, 2011.

  1. Hi!
    I am trying to build a web scraper that fetches fundamental data for
    listed companies from finance websites.
    Let me show an example.


    "<tbody>
    <tr><td>PE ratio</td><td class="numericalColumn">
    16.83</td><td>14/02/11</td></tr>

    <tr><td>EPS (Rs)</td><td class="numericalColumn">
    10.59</td><td>Mar, 10</td></tr>
    <tr><td>Sales (Rs crore)</td><td class="numericalColumn">
    13,963.81</td><td>Dec, 10</td></tr>
    <tr><td>Face Value (Rs)</td><td
    class="numericalColumn">10</td><td>&nbsp;</td></tr>
    <tr><td>Net profit margin (%)</td><td class="numericalColumn">
    17.72</td><td>Mar, 10</td></tr>

    <tr><td>Last dividend (%)</td><td
    class="numericalColumn">30</td><td>18/01/11</td></tr>
    <tr><td>Return on average equity</td><td
    class="numericalColumn">13.69</td><td>Mar, 10</td></tr>
    </tbody>
    "
    I want to extract the value '16.83' from the above HTML, so what I do is:
    I parse the HTML file and save it into doc.
    I search doc for the inner text 'PE ratio'.
    Then I choose the next element using next_sibling.
    But I am getting an error:
    C:\Users\Administrator\Documents>ruby scraper.rb
    scraper.rb:9:in `<main>': undefined method `next_sibling' for
    #<Hpricot::Elements[{elem <td> "PE ratio" </td>}]> (NoMethodError)

    I'll be grateful for any suggestions.
    Sorry about the formatting of the HTML text!

    Attachments:
    http://www.ruby-forum.com/attachment/5911/scraper.rb
     
    Sandeep Guria, Feb 15, 2011
    #1

  2. [Note: parts of this message were removed to make it a legal post.]

    Hi Sandeep.

    The #search method returns an Hpricot::Elements object, which is somewhat
    similar to an array. You should call #next_sibling on one of the elements
    inside that collection, each of which is an Hpricot::Elem object. For
    instance:

    # perform search
    => #<Hpricot::Elements[{elem <td> "PE ratio" </td>}]>

    # get the targeted cell
    => {elem <td class="numericalColumn"> " 16.83" </td>}

    # printout raw value
    16.83
    => nil
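
    The commands that produced the output above did not survive in the post.
    A minimal sketch of the same idea follows. It is not Hpricot code: it
    assumes REXML (shipped with standard Ruby installs) as a stand-in, uses
    next_element where Hpricot uses next_sibling, and works on a trimmed,
    well-formed copy of the table from the first post.

```ruby
require "rexml/document"

# Trimmed, well-formed copy of the table (an assumption for this sketch).
html = <<~HTML
  <tbody>
    <tr><td>PE ratio</td><td class="numericalColumn">16.83</td><td>14/02/11</td></tr>
    <tr><td>EPS (Rs)</td><td class="numericalColumn">10.59</td><td>Mar, 10</td></tr>
  </tbody>
HTML

doc = REXML::Document.new(html)

# The search returns a collection (a plain Array here), not a single node.
matches = REXML::XPath.match(doc, "//td").select { |td| td.text == "PE ratio" }

# Calling a node method on the collection fails, like the error in the first post.
collection_error = nil
begin
  matches.next_element
rescue NoMethodError => e
  collection_error = e.class
end

# Take one node out of the collection first, then ask for its sibling.
label_cell = matches.first
value_cell = label_cell.next_element
pe_ratio = value_cell.text.strip

puts pe_ratio # prints 16.83
```

    The key step is matches.first: it turns the collection into a single
    node, and only the node knows about its siblings.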

    Regards.

    --
    Estanislau Trepat


     
    Estanislau Trepat, Feb 15, 2011
    #2

  3. Thank you very much, Estanislau!

    If I am not bothering you too much: why wasn't it ('next_sibling') working
    in my code? And what are those '*' characters for in your example?
    They were giving an error: "syntax error, unexpected '.'".
    I removed them and now it's working fine.

    One more thing I'd like to ask, if I may use this thread.
    I have this web page: 'http://money.rediff.com/companies/all/1-200'.
    At the bottom there is a link ('Next') to the next page of the list,
    but this link is driven by JavaScript.
    After I finish scraping this page, I want to go to the next one
    through the 'Next' link. Is there any way to do that?

    Note: A cruder method would be to visit every page of the list by its
    URL and scrape each one (the total number of pages will be 17).

    Any suggestions are welcome!
    Thank you!
    Sandeep Guria
     
    Sandeep Guria, Feb 17, 2011
    #3
  4. [Note: parts of this message were removed to make it a legal post.]

    Hi Sandeep.

    The #next_sibling method was not working because you were calling it on the
    whole elements collection (in fact, an Hpricot::Elements object) and not on
    one of the elements inside. That's why we had to use elements.first to get
    the first node that met our search criteria and then call #next_sibling on
    that node. The #next_sibling method is only defined on each of those nodes,
    not on the collection itself.

    I apologize for the * characters; I think I was trying to put that part in
    bold and got bad formatting out of my email client.

    For the problem you describe, maybe you could try Watir (http://watir.com/).
    It drives a real web browser, and can thus handle JavaScript links.

    If you allow me a suggestion: looking at the page you're trying to scrape
    and the structure of its URL, I'd suggest extracting the total number of
    results from the bottom part, which reads "Showing 1 - 200 of 3529". Once
    you have that last number (the total number of results), you could point
    your scraping script to:
    http://money.rediff.com/companies/all/1-3529 without needing to follow
    JavaScript links.
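
    A minimal sketch of that idea, in plain Ruby with no scraping library.
    The footer string is hard-coded here as an assumption; in a real script
    it would come from the parsed page. The chunked URLs at the end are also
    an assumption: they presume that ranges other than 1-200 follow the same
    /first-last pattern.

```ruby
# Footer text as quoted above (hard-coded for this sketch).
footer = "Showing 1 - 200 of 3529"

# Capture the number after "of"; String#match returns nil when nothing matches.
match = footer.match(/of\s+(\d[\d,]*)/)
raise "could not find the total in: #{footer}" if match.nil?

total = match[1].delete(",").to_i

# One request for everything, following the URL pattern from the thread.
base_url = "http://money.rediff.com/companies/all"
single_page_url = "#{base_url}/1-#{total}"

# Fallback (assumed pattern): ranges of 200, in case one huge request is refused.
page_size = 200
page_urls = (1..total).step(page_size).map do |first|
  last = [first + page_size - 1, total].min
  "#{base_url}/#{first}-#{last}"
end

puts single_page_url
puts page_urls.first
puts page_urls.last
```

    Either way the script only builds plain URLs, so no JavaScript link has
    to be followed.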

    Hope it helps.

    Regards.
     
    Estanislau Trepat, Feb 17, 2011
    #4
  5. Hi!
    Thanks for the link, Estanislau!
    It certainly made my work a lot easier.

    I uploaded my 'almost' final program. It searches for some data for each
    company on the BSE and writes it to an Excel sheet.

    First I collect all the links and save them in an array 'x'.
    Then I collect the data that I need and save it to a spreadsheet I
    defined earlier in the program.
    Lastly I write the spreadsheet to an Excel file.
    I can control how many companies I want by changing the number of
    iterations (in this case 8).
    The program runs fine if the number of iterations is less than 6;
    otherwise, I get an error:
    links.rb:34:in `block in <main>': undefined method `next_sibling' for
    nil:NilClass (NoMethodError)
    from links.rb:28:in `times'
    from links.rb:28:in `<main>'

    I'm puzzled (like always!).
    All suggestions are welcome!
    Thank you
    Sandeep Guria

    Attachments:
    http://www.ruby-forum.com/attachment/5928/links.rb
     
    Sandeep Guria, Feb 19, 2011
    #5
  6. Hi!
    I tried to find the class of the object on which I am calling the method
    next_sibling, by printing it in each of 25 iterations (using the attached
    code).
    It turns out to be NilClass on the 13th, 15th, 16th and 24th
    iterations.
    So 'next_sibling' raises a NoMethodError there.

    Please help me with this problem.

    Attachments:
    http://www.ruby-forum.com/attachment/5964/scraper.rb
     
    Sandeep Guria, Feb 25, 2011
    #6
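
    One likely cause, assuming some company pages simply lack the row being
    searched for: the search comes back empty, taking its first element gives
    nil, and next_sibling is then called on nil. A minimal sketch of a guard
    follows. It is not Hpricot code: it assumes REXML as a stand-in (with
    next_element in place of next_sibling), and the page snippets and the
    value_after helper are made up for illustration.

```ruby
require "rexml/document"

# Hypothetical pages: the second one has no "PE ratio" row at all.
pages = {
  "Company A" => "<table><tr><td>PE ratio</td><td>16.83</td></tr></table>",
  "Company B" => "<table><tr><td>EPS (Rs)</td><td>10.59</td></tr></table>"
}

# Hypothetical helper: returns the text of the cell that follows the cell
# labelled `label`, or nil when the label (or the following cell) is missing.
def value_after(doc, label)
  cells = REXML::XPath.match(doc, "//td")
  label_cell = cells.find { |td| td.text.to_s.strip == label }
  return nil if label_cell.nil? # nothing matched: do not touch siblings

  value_cell = label_cell.next_element
  value_cell && value_cell.text.to_s.strip
end

results = pages.map do |name, html|
  [name, value_after(REXML::Document.new(html), "PE ratio")]
end

# Missing values are reported instead of crashing the whole run.
results.each do |name, value|
  puts "#{name}: #{value || 'n/a'}"
end
```

    With a guard like this, a page without the expected row yields an empty
    cell in the output rather than stopping the loop part-way through.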
