URL parameter sts - mechanize & nokogiri differences

Discussion in 'Ruby' started by Don Norcott, Oct 9, 2010.

  1. Don Norcott


    I have written Ruby code (with Mechanize and Nokogiri) to do the
    following:

    1) Retrieve the search webpage
    2) Enter search criteria into the form
    3) Submit the form and retrieve the first webpage which is a list of
    book titles embedded in the page
    4) For each title in the retrieved web page extract 5 fields
    5) Retrieve the next webpage of titles
    6) Repeat 4 & 5 until all titles retrieved
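    Steps 4 to 6 form a loop: extract the records from one page, then follow the "next page" URL until there is none. The sketch below keeps that loop independent of Mechanize so it can be tried offline. `scrape_all`, `fetch`, `extract` and `next_url` are hypothetical names, and the two "pages" are made-up hashes; with Mechanize, `fetch` might be something like `->(url) { agent.get(url) }`.

```ruby
# Steps 4-6 as a generic loop: extract records from a page, then follow
# the "next page" URL until there isn't one.
# fetch, extract and next_url are caller-supplied callables (hypothetical
# names); max_pages guards against a site that never stops paginating.
def scrape_all(start_url, fetch:, extract:, next_url:, max_pages: 100)
  records = []
  url = start_url
  max_pages.times do
    break if url.nil?
    page = fetch.call(url)              # step 5: retrieve a page of titles
    records.concat(extract.call(page))  # step 4: pull fields for each title
    url = next_url.call(page)           # nil means the last page was reached
  end
  records
end

# Offline usage: two fake result pages stand in for real HTML pages.
pages = {
  "page1" => { titles: ["Title A", "Title B"], next: "page2" },
  "page2" => { titles: ["Title C"], next: nil }
}
all = scrape_all("page1",
                 fetch:    ->(url)  { pages.fetch(url) },
                 extract:  ->(page) { page[:titles] },
                 next_url: ->(page) { page[:next] })
puts all.inspect # => ["Title A", "Title B", "Title C"]
```

    Passing the fetching and parsing in as callables keeps the pagination logic testable without a network connection.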

    The Mechanize code below works up to the point of submitting the form,
    but the first web page returned is missing at least 2 of the fields for
    each title.

    Now, if I take the URL generated by mech.submit and use it in Firefox,
    it displays all the titles and information normally, BUT the URL has
    been changed slightly before the titles are displayed.

    THIS IS THE URL RETURNED BY MECHANIZE.SUBMIT
    #<URI::HTTP:0x17706d8
    URL:http://www.xyz.com/servlet/SearchResults?an=Asimov&bi=0&bx=off&ds=30&kn=science+fiction&recentlyadded=all&sortby=17&sts=t>}
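    The text above is what `inspect` prints for a `URI::HTTP` object: the `#<URI::HTTP:0x... URL:` prefix and the trailing `>` are not part of the address, and the `}` presumably comes from whatever enclosing object was being printed. In Mechanize, `title_pg.uri.to_s` should give the bare URL string instead. The offline sketch below, which assumes only the URL quoted in the post, parses it and reads its query parameters with the standard library.

```ruby
require 'uri'

# With Mechanize this string would come from title_pg.uri.to_s
# (title_pg being the page returned by search_form.submit).
url = "http://www.xyz.com/servlet/SearchResults?an=Asimov&bi=0&bx=off" \
      "&ds=30&kn=science+fiction&recentlyadded=all&sortby=17&sts=t"

uri = URI.parse(url)
puts uri.to_s # the bare URL, without any #<URI::HTTP ...> wrapper

# decode_www_form turns "a=1&b=2" into [["a", "1"], ["b", "2"]] and
# decodes "+" as a space.
params = URI.decode_www_form(uri.query).to_h
puts params["kn"]  # => science fiction
puts params["sts"] # => t
```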


    Now, if I take the URL from the submit and use it in the Nokogiri code
    below, it fails to open with BAD URI.
    Also, if I take the URL from Firefox and use it in the Nokogiri code
    below, it also fails to open with BAD URI.
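    The "bad URI" error is consistent with both copied strings still containing characters from the inspect output. Ruby 1.8/1.9, current when this was posted, parsed URLs by RFC 2396, which allows neither `>` nor `}` in a query, and open-uri's `open` parses the string before fetching it. Firefox escaped `>` as `%3E` but, going by the URL in the post, left `}` alone, so that copy fails too. Recent Rubies use an RFC 3986 parser for `URI.parse` that may be more lenient about query strings, so this sketch calls the RFC 2396 parser explicitly; the shortened URLs are assumptions built from the fragments quoted in the post.

```ruby
require 'uri'

base = "http://www.xyz.com/servlet/SearchResults?an=Asimov&sortby=17"

candidates = {
  mechanize_inspect: base + "&sts=t>}",         # copied from the inspect output
  firefox_copy:      base + "&sts=t%3E}",       # copied from Firefox afterwards
  manual_search:     base + "&sts=t&x=84&y=10"  # after a manual search
}

# Strict RFC 2396 parsing, as in the Ruby 1.8/1.9 era: neither ">" nor "}"
# is allowed, so such strings raise URI::InvalidURIError ("bad URI ...").
def parses_strictly?(str)
  URI::RFC2396_Parser.new.parse(str)
  true
rescue URI::InvalidURIError
  false
end

candidates.each do |name, str|
  puts "#{name}: #{parses_strictly?(str)}"
end
```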

    Now, if I start off in Firefox at the search page, enter the same data
    into the form and submit it manually, I wind up with the same screen
    that was displayed when I cut and pasted in the URL from the
    mechanize.submit code.

    If I now copy the URL from Firefox and use it in the Nokogiri code below,
    it works fine, and the "puts node.text" output shows that all 5 of the
    fields I require are there (plus others not present in the Mechanize
    object).

    Now the URLs from the 3 steps above differ in only one way: the last
    variable (sts) on the URL line.
    1) &sortby=17&sts=t>} from mechanize.submit
    2) &sortby=17&sts=t%3E} copied from Firefox after the submit URL was
    used and the web page displayed (changed URL)
    3) &sortby=17&sts=t&x=84&y=10 after entering the search manually; this
    is the URL upon display of the first page
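    Decoding the three query tails makes the difference concrete: the first two carry `sts` = `t>}`, that is, the real value `t` with the inspect output's closing characters glued on, while the manual search sends `sts=t` plus `x` and `y`. Those two extra fields look like the click coordinates a browser adds for an unnamed image submit button, though that is an assumption since the form markup isn't shown. A minimal stdlib sketch:

```ruby
require 'uri'

# Query-string tails of the three URLs quoted above.
tails = {
  mechanize: "sortby=17&sts=t>}",
  firefox:   "sortby=17&sts=t%3E}",
  manual:    "sortby=17&sts=t&x=84&y=10"
}

# decode_www_form splits on "&" and percent-decodes each name and value,
# so "%3E" becomes ">".
decoded = tails.transform_values { |q| URI.decode_www_form(q).to_h }

decoded.each do |source, params|
  extra = params.keys - %w[sortby sts]
  puts "#{source}: sts=#{params['sts'].inspect}, extra=#{extra.inspect}"
end
```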

    The attached file shows what the source (from the web page) for the last
    title looks like and what the Mechanize content for that same title
    looks like.

    The contents of both <td class="itemNumbr" valign="top">
    and <div class="result-price"> are missing in the Mechanize object.

    Can anyone shed light on what is happening? It would be greatly
    appreciated.
    Thanks, Don

    #MECHANIZE CODE
    require 'rubygems'
    require 'open-uri'
    require 'nokogiri'
    require 'mechanize'
    url = "...." # url of search form
    a = Mechanize.new { |agent|
      agent.user_agent_alias = 'Mac Safari'
    }
    search_page = a.get(url)
    search_form = search_page.form_with(:name => 'form-advancedSearch')
    search_form.an = 'Asimov'
    search_form.kn = 'science fiction'
    title_pg = search_form.submit # capture submitted url and title_pg contents
    # print the text of every link on the results page
    title_pg.links.each do |link|
      puts link.text # not all the data is there
    end

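    #NOKOGIRI CODE
    require 'open-uri'
    require 'nokogiri'
    url = "http://www.xyz.com/....."

    # URI.open is open-uri's entry point on current Rubies; plain open(url)
    # worked in 2010 but no longer opens URLs from Ruby 3.0 on
    doc = Nokogiri::HTML(URI.open(url))
    # print the text of every table row
    doc.xpath('//tr').each do |node|
      puts node.text
    end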

    Attachments:
    http://www.ruby-forum.com/attachment/5154/WepPage-Docs.txt

    --
    Posted via http://www.ruby-forum.com/.
    Don Norcott, Oct 9, 2010
    #1

  2. Don Norcott


    I still have not resolved (or do not understand) my problem, but the
    following is a workaround that allows me to continue with development:

    title_pg = search_form.submit # get first title page - last line of orig code

    # initialize a Nokogiri::HTML object with title_pg.body, the returned web page
    doc = Nokogiri::HTML(title_pg.body)

    # can now use Nokogiri to process the title page HTML
    doc.xpath('//tr').each do |node|
      puts node.text
    end

    This prints out the fields that are missing in the Mechanize object.
    I am not sure if this really is a problem, or if I simply do not
    understand the Mechanize object properly and the data is there but
    requires a different selector?
    Don Norcott, Oct 10, 2010
    #2

Similar Threads
  1. Arcane sts on the MS VM
     Andrew Thompson, Feb 27, 2004, in forum: Java
     Replies: 2, Views: 454, last post: Andrew Thompson, Feb 28, 2004
  2. Edouard Dantes
     Replies: 1, Views: 145, last post: Luis Parravicini, Jan 29, 2009
  3. Moving Mechanize to Nokogiri
     Patrick L., Feb 19, 2009, in forum: Ruby
     Replies: 3, Views: 110, last post: Ryan Davis, Feb 19, 2009
  4. Mechanize/Nokogiri from file
     Rowan Udell, Sep 16, 2009, in forum: Ruby
     Replies: 0, Views: 122, last post: Rowan Udell, Sep 16, 2009
  5. Squawk Boxed
     Replies: 2, Views: 259, last post: Squawk Boxed, Mar 11, 2011