[noob] Parsing problems using https and redirects

Discussion in 'Ruby' started by Ramiro Diaz Trepat, Dec 14, 2007.

  1. [Note: parts of this message were removed to make it a legal post.]

    Hello list,
    I have to develop a simple script to parse some parts of a web site and I
    thought it could be a good opportunity to start trying Ruby.
    I found that there are two network libraries that I could supposedly use
    to retrieve the contents of the web site: open-uri and net-http.

    *First problem*
    This web site is accessed only over HTTPS and has a self-signed
    certificate. So far this has made it impossible for me to access the
    contents of the web site.
    Simple examples from the Hpricot HTML parsing library like this one:

    require 'hpricot'
    require 'open-uri'
    doc = Hpricot(open("https://xxxxxx"))

    do not work, because the call to open fails on the HTTPS
    connection.
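    A minimal sketch of one way to reach such a site with Net::HTTP, assuming
    the certificate really is self-signed and that skipping verification is
    acceptable for this script. The helper names insecure_https_client and
    fetch_body are hypothetical, not part of any library. On Ruby 1.8 the
    require would be 'net/https' instead of 'net/http'.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Build a Net::HTTP client for the given URL without opening a connection.
# VERIFY_NONE turns off certificate checking entirely, so only use this
# against a host you already trust (assumption: a self-signed test site).
def insecure_https_client(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  http
end

# Open the connection, GET the path from the URL, and return the body.
def fetch_body(url)
  uri = URI.parse(url)
  insecure_https_client(url).start do |http|
    http.get(uri.request_uri).body
  end
end
```

    The returned string can then be handed to the parser, for example
    Hpricot(fetch_body("https://xxxxxx")).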

    *Second problem*
    I also need to know how to handle redirection and cookies. But to be
    fair, I can still do some further reading myself on these issues.
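    For reference, a rough sketch of following redirects by hand with
    Net::HTTP while passing cookies along. It is deliberately naive: cookie
    attributes such as Path and Expires are ignored, and the names
    cookie_header and fetch_following_redirects are hypothetical. A library
    such as Mechanize does this bookkeeping for you.

```ruby
require 'net/http'
require 'uri'

# Reduce Set-Cookie header values to one Cookie request header value,
# keeping only the leading name=value pair of each cookie.
def cookie_header(set_cookie_values)
  set_cookie_values.map { |c| c.split(';').first.strip }.join('; ')
end

# GET url, following up to `limit` redirects and resending any cookies
# collected along the way. Returns the final Net::HTTPResponse.
def fetch_following_redirects(url, limit = 5, cookies = [])
  raise ArgumentError, 'too many redirects' if limit.zero?

  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')

  request = Net::HTTP::Get.new(uri.request_uri)
  request['Cookie'] = cookie_header(cookies) unless cookies.empty?
  response = http.request(request)

  # Remember cookies set by this response for the next hop.
  cookies += response.get_fields('Set-Cookie') || []

  if response.is_a?(Net::HTTPRedirection)
    # Location may be relative, so resolve it against the current URL.
    next_url = URI.join(url, response['location']).to_s
    fetch_following_redirects(next_url, limit - 1, cookies)
  else
    response
  end
end
```

    The redirect limit is there so that a redirect loop raises an error
    instead of recursing forever.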

    Thank you very much.
    Ramiro Diaz Trepat, Dec 14, 2007
    #1

  2. Thank you very much Konrad, it seems that I am on my way now.
    The only weird thing now is that with Mechanize it all works
    perfectly on my Linux machine but it doesn't on my Mac (Leopard).
    Both have Ruby 1.8.6.

    On the mac I get the following error while trying to execute the first
    Mechanize example:

    /mechanize.rb:4: uninitialized constant WWW (NameError)
    from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:27:in `gem_original_require'
    from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:27:in `require'
    from goog.rb:2


    and the code is the first Mechanize example:

    require 'rubygems'
    require 'mechanize'

    agent = WWW::Mechanize.new
    agent.user_agent_alias = 'Mac Safari'
    page = agent.get("http://www.google.com/")
    search_form = page.forms.with.name("f").first
    search_form.q = "Hello"
    search_results = agent.submit(search_form)
    puts search_results.body


    I really don't know why this constant is uninitialized or how I could
    initialize it. Besides, it worries me that on Linux, after installing the
    mechanize gem, everything worked out of the box.
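    One possible cause of a NameError like this, offered as a guess rather
    than a diagnosis of this particular setup: a file named mechanize.rb on
    the load path (on Ruby 1.8 that includes the current directory) is found
    before RubyGems gets a chance to activate the gem, so the gem's own
    mechanize.rb is never loaded. The helper below, require_candidates (a
    hypothetical name), lists the plain files that could satisfy a require.

```ruby
# List every "<name>.rb" file found directly on the given load path.
# More than zero hits outside the gem directory suggests that a local
# file is shadowing the gem of the same name.
def require_candidates(name, load_path = $LOAD_PATH)
  load_path.map { |dir| File.join(dir.to_s, "#{name}.rb") }
           .select { |path| File.file?(path) }
end
```

    Running puts require_candidates('mechanize') from the directory that
    holds the script shows which files, if any, would be picked up first.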

    Thanks again




    On Dec 14, 2007 10:14 PM, Konrad Meyer <> wrote:

    > Quoth Ramiro Diaz Trepat:
    > > Hello list,
    > > I have to develop a simple script to parse some parts of a web site and I
    > > thought it could be a good opportunity to start trying Ruby.
    > > I found that there are two network libraries that I could supposedly use
    > > to retrieve the contents of the web site: open-uri and net-http.
    > >
    > > *First problem*
    > > This web site is accessed only with https and has a self issued
    > > certificate. This has made it impossible so far for me to access the
    > > contents of the web site.
    > > Simple examples from the Hpricot html parsing library like this one:
    > >
    > > require 'hpricot'
    > > require 'open-uri'
    > > doc = Hpricot(open("https://xxxxxx"))
    > >
    > > will not work because the open will fail because of problems due to
    > > https.
    > >
    > > *Second problem*
    > > I need to know also how to handle redirection and cookies. But to be
    > > fair, I still can do some further reading myself on these issues.
    > >
    > > Thank you very much.

    >
    > 2) Look at mechanize.
    >
    > 1) Look at http-access2 (or whatever it's been renamed to).
    >
    > Regards,
    > --
    > Konrad Meyer <> http://konrad.sobertillnoon.com/
    >
    Ramiro Diaz Trepat, Dec 15, 2007
    #2