Nokogiri not getting html body sometimes

Discussion in 'Ruby' started by Jarmo Pertman, May 20, 2009.

  1. I'm using Mechanize to fetch an IMDb page and then Nokogiri's Node#search
    method to extract some info from it, but I've stumbled onto one
    special case where #search doesn't work properly. All the other pages
    I've tried so far work as expected.

    It seems that some special characters are causing trouble for
    Nokogiri, because when I tried to print the document itself, it output
    only half of the <head> tag and no <body> tag at all!

    Anyway, here is the code snippet, which I'd expect to output "false" 4
    times. Instead, it outputs false, false, true, false. Try it with some
    other IMDb URL and it's OK.

    require 'mechanize'

    mech = WWW::Mechanize.new {|agent| agent.user_agent_alias = 'Windows …' }
    mech.get("") do |page|
      puts page.body.empty?
      # …
    end

    What could be the problem?

    I'm using Ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
    Jarmo Pertman, May 20, 2009
  2. Jarmo Pertman

    Lui Core, May 21, 2009
  3. Thank you! It did the trick.

    Best regards,
    Jarmo Pertman, May 21, 2009
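
The symptom described in the first post, a parsed document that stops partway through <head>, can be investigated by checking the raw body for unusual bytes before it reaches the parser. The following is only a minimal sketch and is not taken from the thread: the helper name `suspicious_bytes` and the sample string are hypothetical, it assumes that NUL bytes, other control characters, or invalid UTF-8 are the kind of "special characters" worth looking for, and it needs Ruby 1.9 or later (String#force_encoding does not exist in 1.8.6).

```ruby
# Hypothetical helper (not from the thread): report bytes in a raw HTML
# body that may confuse an HTML parser. Requires Ruby 1.9 or later.
def suspicious_bytes(body)
  # Work on a binary copy so every byte can be inspected individually.
  raw = body.dup.force_encoding(Encoding::BINARY)
  report = { :nul_offsets => [], :control_offsets => [] }

  raw.each_byte.with_index do |byte, offset|
    if byte == 0
      report[:nul_offsets] << offset
    elsif byte < 0x20 && ![0x09, 0x0A, 0x0D].include?(byte)
      # Control characters other than tab, line feed and carriage return.
      report[:control_offsets] << offset
    end
  end

  # Check whether the body would be valid if interpreted as UTF-8.
  report[:valid_utf8] = body.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  report
end

# Made-up sample: a NUL byte inside <title> and a Latin-1 byte in <body>.
sample = "<html><head><title>A\x00B</title></head><body>caf\xE9</body></html>"
p suspicious_bytes(sample)
```

With Mechanize, the same helper could be called on `page.body` inside the `mech.get` block. A non-empty `:nul_offsets` list or a false `:valid_utf8` would support the suspicion that the page content, rather than the #search call, is what breaks parsing.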
