Nokogiri not getting html body sometimes

Discussion in 'Ruby' started by Jarmo Pertman, May 20, 2009.

  1. I'm using Mechanize to fetch an IMDb page and then Nokogiri's
    Node#search method to extract some info from it, but I've stumbled
    onto one special case where #search doesn't work properly. All the
    other pages I've tried so far work as expected.

    It seems that some special characters are causing trouble for
    Nokogiri, because when I tried to print the document itself, it output
    only half of the <head> tag and no body tags at all!

    Anyway, here is the code snippet, which I'd expect to output "false" 4
    times. Instead, it outputs false, false, true, false. Try it with some
    other IMDb URL and it's OK.

    require 'mechanize'

    mech = {|agent| agent.user_agent_alias = 'Windows
    mech.get("") do |page|
    puts page.body.empty?

    What could be the problem?

    I'm using ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
    Jarmo Pertman, May 20, 2009
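
    One way to test the "special characters" suspicion above is to check
    whether the raw response body is valid in the encoding the parser is
    assumed to use. The sketch below is only a diagnostic under stated
    assumptions: it needs Ruby 2.0 or later (the 1.8.6 interpreter in the
    post has no String encodings), first_invalid_byte_offset is a
    hypothetical helper rather than part of Mechanize or Nokogiri, and the
    sample string stands in for page.body. It does not confirm that
    encoding is the actual cause here.

```ruby
# Diagnostic sketch (Ruby 2.0+): locate the first byte of a raw HTML body
# that is invalid in the encoding the page is assumed to use.
# `first_invalid_byte_offset` is a hypothetical helper, not a Mechanize
# or Nokogiri method.
def first_invalid_byte_offset(raw, encoding = 'UTF-8')
  # Reinterpret a copy of the bytes; the caller's string is left untouched.
  text = raw.dup.force_encoding(encoding)
  return nil if text.valid_encoding?

  offset = 0
  text.each_char do |char|
    # Invalid bytes are yielded as one-byte chunks that fail this check.
    return offset unless char.valid_encoding?
    offset += char.bytesize
  end
  nil
end

# Stand-in for page.body (assumption): a lone 0xE9 byte, which is "é" in
# Latin-1, is not valid UTF-8.
sample = "<html><head><title>Caf\xE9</title></head><body>ok</body></html>".b

puts first_invalid_byte_offset(sample).inspect                # as UTF-8
puts first_invalid_byte_offset(sample, 'ISO-8859-1').inspect  # as Latin-1
```

    If the helper reports an offset for one encoding and nil for another,
    the page is probably being read with the wrong encoding, and it may be
    worth passing the encoding to the parser explicitly.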

  2. Jarmo Pertman

    Lui Core Guest

    Lui Core, May 21, 2009

  3. Thank you! It did the trick.

    Best regards,
    Jarmo Pertman, May 21, 2009