open-uri fetches outdated content vs. curl

Daniel Choi

Try running the following program:

================
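require 'open-uri'

feed_url = "http://www.slate.com/rss"

# Fetch the feed with open-uri. On Ruby 3.0 and later, Kernel#open no
# longer accepts URLs, so call URI.open explicitly.
result1 = URI.open(feed_url).read
puts "Saving result1.xml:"
File.open("result1.xml", "w") {|f| f.write(result1)}

# Fetch the same feed with curl, following redirects (-L).
result2 = `curl -L #{feed_url}`
puts "Saving result2.xml:"
File.open("result2.xml", "w") {|f| f.write(result2)}

# diff prints any differences; system returns true if the files match.
command = "diff result1.xml result2.xml"
puts system(command)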
================

result1 should be identical to result2, but it turns out that the feed
that open-uri fetches is outdated content (by over a month), while the
feed that curl fetches is up-to-date. Can anyone please explain what
is going on?

Thanks!
 
Robert Klemme

Reasons I can think of:

i) Both approaches use different paths to the server, namely a different
(or no) proxy.

ii) There is something in the request that makes the server send
different data.

Can you try to obtain the HTTP headers from both approaches? That might
clear up a few things. Also, on Unix-like systems, check for environment
variables and ~/.xyzrc files which might affect proxy settings.

Another good idea might be to try a different tool, e.g. a web browser,
to see what that turns up.
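
The header and proxy checks suggested above could be sketched roughly as follows. This is only an illustration, not something from the original program: the helper names (proxy_settings, open_uri_headers, curl_headers) are made up for this example, the list of environment variables is an assumption about what is commonly used, and the curl call assumes a Unix-like system with /dev/null.

```ruby
require 'open-uri'

# Proxy-related environment variables that HTTP clients commonly consult.
# (Assumed list for illustration; your setup may use others.)
PROXY_VARS = %w[http_proxy HTTP_PROXY https_proxy HTTPS_PROXY no_proxy NO_PROXY].freeze

# Return only the proxy variables that are actually set.
# env defaults to the process environment; a plain Hash also works.
def proxy_settings(env = ENV)
  PROXY_VARS.each_with_object({}) do |name, found|
    found[name] = env[name] if env[name]
  end
end

# Fetch the URL with open-uri and return the response headers as a Hash.
def open_uri_headers(url)
  URI.open(url) { |response| response.meta }
end

# Fetch the URL with curl, following redirects, and return the raw
# response headers (-D - dumps them to stdout, -o /dev/null drops the body).
def curl_headers(url)
  IO.popen(["curl", "-s", "-L", "-D", "-", "-o", "/dev/null", url], &:read)
end
```

Printing proxy_settings, open_uri_headers(feed_url) and curl_headers(feed_url) side by side should show whether the two requests pass through different proxies or receive different cache-related headers.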

Kind regards

robert
 
Daniel Choi

Thanks for these suggestions. The problem actually just cleared itself
up, after several days where the open-uri fetch was getting outdated
content. I think it was a problem with upstream proxies. I'll try to
look at the headers out of curiosity.
 
Daniel Choi

I used net/http to do the same thing, but this time I printed out the
redirect locations. The result is very interesting. If I don't set
the "User-Agent" header, I get redirected to one proxy, the one
with outdated content. If I set the "User-Agent" header to
"Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/XX (KHTML, like
Gecko) Safari/YY" (faking Apple Safari), I get redirected to another
proxy, with the up-to-date content.

I didn't know that servers redirected requests to bad or good proxies
depending on what the User Agent header is. But this seems to be the
case here.
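
For anyone who wants to repeat the experiment, here is a rough sketch of the kind of net/http script described above. It is a reconstruction under assumptions, not the script that was actually run: the method names (trace_redirects, next_location) and the redirect limit are invented for this example.

```ruby
require 'net/http'
require 'uri'

# Resolve a Location header, which may be relative, against the current URL.
def next_location(current_uri, location)
  URI.join(current_uri.to_s, location)
end

# Follow redirects by hand and return the list of URLs requested.
# With user_agent: nil, Net::HTTP sends its own default User-Agent.
# At most `limit` requests are made.
def trace_redirects(url, user_agent: nil, limit: 5)
  visited = []
  uri = URI.parse(url)
  limit.times do
    visited << uri.to_s
    request = Net::HTTP::Get.new(uri)
    request['User-Agent'] = user_agent if user_agent
    response = Net::HTTP.start(uri.host, uri.port,
                               use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end
    # Stop at the first response that is not a redirect with a target.
    break unless response.is_a?(Net::HTTPRedirection) && response['location']
    uri = next_location(uri, response['location'])
  end
  visited
end
```

Calling trace_redirects(feed_url) and then trace_redirects(feed_url, user_agent: "...") with the browser string above should print two different redirect chains if the server is choosing a proxy by User-Agent. open-uri accepts the same header as an option, e.g. URI.open(feed_url, "User-Agent" => "..."), which may be enough to get the fresh feed without dropping down to net/http.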
 
Robert Klemme

Daniel, thanks for the update! This is interesting stuff. The
distinction is probably not so much between "bad" and "good" proxies as
between proxies tailored for a particular browser version. Maybe it's a
bug and you should show this to your IT department. It could be that they
changed firewall rules in the past and the "bad" proxy never gets
updated because it lacks connectivity. :)

Cheers

robert
 