Hpricot & mechanize fail to parse page after redirect

Discussion in 'Ruby' started by Ehud Rosenberg, Nov 14, 2007.

  1. Hi everyone,
    My quest with mechanize/Hpricot continues :)
    Something extremely strange happened today - some simple working code
    broke down, and I can't figure out why.

    I am trying to access a thepiratebay.org search page, which redirects
    to a relative URL like this:
    original link:
    http://thepiratebay.org/s/?page=0&orderby=3&q=football manager 2008&searchTitle=on

    redirects to:
    /search/football manager 2008/0/3/0

    Now, this all worked dandily up until yesterday. The page was redirected,
    and mechanize even handled the cookie that was sent back from the site.
    But today I am getting this strange error:
    "URI::InvalidURIError: bad URI(is not URI?): /search/football manager 2008/0/3/0"
    from Hpricot. Mechanize gives a different one, but I'm sure it stems
    from Hpricot's problem with fetching the page.
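
The error message itself can be reproduced with Ruby's standard `uri` library alone, which suggests the unescaped spaces in the redirect target are the trigger rather than anything specific to Hpricot. A minimal sketch, assuming the Location value is exactly the string quoted above (the variable names are illustrative only):

```ruby
require 'uri'

# The redirect target quoted above, with its unescaped spaces.
location = '/search/football manager 2008/0/3/0'

begin
  URI.parse(location)
  outcome = 'parsed'
rescue URI::InvalidURIError => e
  # URI.parse rejects strings containing literal spaces.
  outcome = "rejected: #{e.class}"
end

puts outcome
```

If this is what happens inside the redirect handling, the site most likely started sending the space-containing path only recently, which would explain why identical code worked the day before.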

    I have tested this on 2 different machines, and they both break down.
    Can someone please give it a go and see if they can figure it out?
    I would be very very thankful :)

    Thanks,
    Ehud

    PS - I am using Hpricot 0.6, and the redirected page is parsed correctly
    when accessed directly.
    --
    Posted via http://www.ruby-forum.com/.
    Ehud Rosenberg, Nov 14, 2007
    #1

  2. On Nov 14, 2007, at 2:17 PM, Ehud Rosenberg wrote:

    > I am trying to access a piratebay.org search page, which does a
    > redirect to a relative url like this:
    > original link:
    > http://thepiratebay.org/s/?page=0&orderby=3&q=football manager 2008&searchTitle=on
    >
    > redirects to:
    > /search/football manager 2008/0/3/0
    >
    > But today, i am getting this strange error:
    > "URI::InvalidURIError: bad URI(is not URI?): /search/football manager 2008/0/3/0"

    If the redirect is via a 302 with a Location: header that is just:
    "/search/football manager 2008/0/3/0"

    it's probably similar to the issue I had using HTTPClient. The
    relevant bit of code from HTTPClient is:
    def default_redirect_uri_callback(uri, res)
      newuri = URI.parse(res.header['location'][0])
      unless newuri.is_a?(URI::HTTP)
        newuri = URI.join(uri, newuri)
        STDERR.puts(
          "could be a relative URI in location header which is not recommended")
        STDERR.puts(
          "'The field value consists of a single absolute URI' in HTTP spec")
      end
      puts "Redirect to: #{newuri}" if $DEBUG
      newuri
    end

    Note the line: URI.join(uri, newuri) which takes the (presumed)
    relative newuri and interprets it with respect to the original uri.
    (Note also that I've recently sent the author of httpclient a patch
    that fixed this line.)
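
As a rough illustration of resolving such a redirect by hand (this is not the HTTPClient patch mentioned above, whose contents are not shown in this thread), one option is to escape the spaces before joining the relative path onto the original URL. `resolve_redirect` is a hypothetical helper, and the base URL is simplified by dropping the query string:

```ruby
require 'uri'

# Hypothetical helper (not part of HTTPClient or Mechanize): resolve a
# Location header value against the URL that produced the redirect.
def resolve_redirect(base, location)
  # Assumption: literal spaces are the only characters that need escaping,
  # as in the example from this thread.
  escaped = location.gsub(' ', '%20')
  URI.join(base, escaped)
end

base = 'http://thepiratebay.org/s/'
target = resolve_redirect(base, '/search/football manager 2008/0/3/0')
puts target
```

Escaping only spaces keeps the sketch short; a real client would probably need to handle other characters that are not allowed in a URI as well.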

    -Rob

    Rob Biedenharn http://agileconsultingllc.com
    Rob Biedenharn, Nov 14, 2007
    #2

  3. That is probably the case when using Hpricot - but mechanize handles
    this and has a method that takes a relative url redirect and creates a
    fully qualified one.
    Also it worked for me yesterday with the exact same code (I know that
    sounds crazy! :)

    Thanks for the quick and thorough reply, Rob!
    Ehud Rosenberg, Nov 14, 2007
    #3
