yet another private method `gsub' called for nil:NilClass error

Mr. Bill

I've got a set of scripts that collect URLs from certain web pages and
I'm trying to extract some content from each of those pages (translation
stage). I keep seeing the error below.
Can someone help me understand what's happening here? I'm certainly not
expecting a fix, I just want to get some insights into the nature of
this issue. We've been seeing several similar issues on this project
we're working on. Thanks in advance.

-----------------------------------------------------
(projectx) Running translation stage
DEBUG [2010-12-28 12:57:01 EST] (PageContentExtractor#559021) Executing
plugin input_files=48 (0477475be2aa9f8b79013eaf8e410f8d, etc)
ERROR [2010-12-28 12:57:01 EST] (projectx) Unexepected fatal error
while processing page_1: private method `gsub' called for nil:NilClass
/usr/lib/ruby/1.8/uri/common.rb:289:in `escape'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:37:in
`execute'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:36:in
`each'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:36:in
`execute'
~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:191:in `call'
~/sandbox/projectx/lib/filesystem_lock_provider.rb:66:in `lock'
~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:190:in `call'
~/sandbox/projectx/lib/feed.rb:216:in `run'
~/sandbox/projectx/lib/feed.rb:212:in `each'
~/sandbox/projectx/lib/feed.rb:212:in `run'
~/sandbox/projectx/lib/feed.rb:207:in `each'
~/sandbox/projectx/lib/feed.rb:207:in `run'
bin/_run_feeds:77
bin/_run_feeds:74:in `each'
bin/_run_feeds:74
-----------------------------------------------------

Going through each step with rdebug, we can get a view of what is
happening when it trips up:

(rdb:1) step
projectx/core_plugins/plugins/PageContentExtractor.rb:37
host = @host_cache[filename] = URI(URI.escape(@state[filename]['link'])).host.downcase
(rdb:1) step
/usr/lib/ruby/1.8/uri/common.rb:285 unless unsafe.kind_of?(Regexp)
(rdb:1) step
/usr/lib/ruby/1.8/uri/common.rb:289 str.gsub(unsafe) do |us|
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:197
@logger.context=prev_context
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:198 @basedir = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:199 @lockdir = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:200 @state = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:201 @input_files = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:202 @permstate = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:203 @context_counters = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:204 @on_error = nil
(rdb:1) step
projectx/lib/feed.rb:221
msg = "Unexepected fatal error while running translation:
#{@name}: #{e.message}"
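
The trace reads as though the exception raised inside str.gsub unwinds straight out of the plugin, runs some cleanup (the instance-variable resets at BasePlugin.rb:197-204), and is then caught and logged in feed.rb. That is a guess from the step order, not something confirmed against the BasePlugin source. A minimal sketch of that control flow, where run_plugin and the log array are hypothetical stand-ins:

```ruby
log = []

# Hypothetical stand-in for the plugin call (not the real BasePlugin code).
def run_plugin(log)
  log << :execute
  nil.gsub(/x/, '')     # raises NoMethodError, like str.gsub when str is nil
  log << :never_reached # skipped, because the line above raised
ensure
  log << :cleanup       # comparable to the "@state = nil" style resets
end

message = nil
begin
  run_plugin(log)
rescue NoMethodError => e
  # Comparable to feed.rb building its "fatal error" message from e.message.
  log << :rescued
  message = e.message
end

p log
puts message
```

The exact wording of the message depends on the Ruby version. Ruby 1.8 says "private method `gsub' called" because Kernel#gsub exists there as a private method on every object, while later versions report an undefined method.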


...code snippet from PageContentExtractor.rb:

# Load all files and cache them for further processing.
input_filenames.each do |filename|
  host = @host_cache[filename] = URI(URI.escape(@state[filename]['link'])).host.downcase # line 37
  (@document_cache[host] ||= {})[filename] = Nokogiri::HTML(file_contents(filename))
end
 
Jesús Gabriel y Galán

Without looking too much into it, I would say that
@state[filename]['link'] is nil. You are passing that nil to
URI.escape, which raises an error. Can you print
@state[filename]['link'] before calling URI.escape?

Jesus.
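
If the link does turn out to be nil for some files, one option is to check for it before building the URI and skip those entries. A rough sketch of that idea follows. The host_for helper and the sample state hash are made up for illustration, and the real @state may be shaped differently. URI.escape is left out because it was deprecated and later removed in Ruby 3.0, so links with unsafe characters would need separate handling.

```ruby
require 'uri'

# Hypothetical helper: returns the lowercased host for a state entry,
# or nil when the entry or its 'link' is missing, blank, or unparsable.
def host_for(state, filename)
  entry = state[filename]
  link = entry && entry['link']
  if link.nil? || link.strip.empty?
    warn "no 'link' recorded for #{filename.inspect}"
    return nil
  end
  host = URI.parse(link.strip).host
  host && host.downcase
rescue URI::InvalidURIError => e
  warn "bad link for #{filename.inspect}: #{e.message}"
  nil
end

# Made-up sample data standing in for @state.
state = {
  'a.html' => { 'link' => 'http://Example.COM/page' },
  'b.html' => { 'link' => nil },
  'c.html' => {}
}

# Keep only the files whose host could be determined.
hosts = {}
state.each_key do |filename|
  host = host_for(state, filename)
  hosts[filename] = host if host
end

p hosts
```

The warnings name the files that have no link, which is the same information the print statement suggested above would give you.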
 
Mr. Bill

Update: we worked around the problem by dropping PageContentExtractor
and using another Ruby plugin instead.
Thank you for your time and attention to this.
 
