URI::merge bug?

A

Alex Young

Hi all,

I'm not sure whether this is a bug:

irb(main):013:0> a = URI.parse("http://www.example.com/foo/bar?a=b")
=> #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
irb(main):014:0> b = URI.parse("?a=c")
=> #<URI::Generic:0xfdbb6b770 URL:?a=c>
irb(main):015:0> puts a.merge(b).to_s
http://www.example.com/foo/?a=c

I'd expect the last line to be "http://www.example.com/foo/bar?a=c" - it
seems to lose the last element of the path. Firefox seems to think do
what I'm expecting, anyway... Can someone more au fait with URI RFC's
comment?
 
J

Jano Svitok

Hi all,

I'm not sure whether this is a bug:

irb(main):013:0> a = URI.parse("http://www.example.com/foo/bar?a=b")
=> #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
irb(main):014:0> b = URI.parse("?a=c")
=> #<URI::Generic:0xfdbb6b770 URL:?a=c>
irb(main):015:0> puts a.merge(b).to_s
http://www.example.com/foo/?a=c

I'd expect the last line to be "http://www.example.com/foo/bar?a=c" - it
seems to lose the last element of the path. Firefox seems to think do
what I'm expecting, anyway... Can someone more au fait with URI RFC's
comment?

Hi,

have alook at what "?a=c" parses to, i.e. if it is interpreted as
"/?a=c" that would explain the behaviour.

J.
 
D

Daniel Martin

Alex Young said:
I'm not sure whether this is a bug:

irb(main):013:0> a = URI.parse("http://www.example.com/foo/bar?a=b")
=> #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
irb(main):014:0> b = URI.parse("?a=c")
=> #<URI::Generic:0xfdbb6b770 URL:?a=c>
irb(main):015:0> puts a.merge(b).to_s
http://www.example.com/foo/?a=c

I'll note that although firefox agrees with your expectations, lynx
agrees with the behavior of the uri module.

To understand what the uri module is doing, look at this:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> b = URI.parse("?a=c")
=> #<URI::Generic:0xfdbccb1d0 URL:?a=c>
irb(main):003:0> b.scheme
=> nil
irb(main):004:0> b.userinfo || b.host || b.port
=> nil
irb(main):005:0> b.path
=> ""
irb(main):006:0> b.query
=> "a=c"
irb(main):007:0> b.fragment
=> nil

That is, the scheme and authority portions of the uri are *nil*, but
the path is present, as the empty string. When merging an empty path
with the path "/foo/bar" , the uri module comes up with "/foo/". Not
a totally unreasonable choice.

In fact, this is a bug, but not the one you think. "?a=b" is a
malformed relative URI. You should get a parse error trying to create
that.

According to RFC2396, a relative URI consists of (section 5, near the
bottom of pg. 17):

relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]

rel_path = rel_segment [ abs_path ]

rel_segment = 1*( unreserved | escaped |
";" | "@" | "&" | "=" | "+" | "$" | "," )

See the 1* part? That means that a relative uri path segment must
consist of at least one character. An empty path segment is illegal.
(Note that uri references that begin with '#' are covered in section 4
of the RFC, and match the rule "URI-reference" rather than the rule
"relativeURI")

Now, given that the URI module does indeed accept relative URIs like
this, perhaps we should redefine URI merging for these pathological
cases so that the URI module behaves as some particular well-known
browser does:

module URI
class Generic
def merge_like(browser, other)
if !other.absolute? and other.path and other.path.empty? and
not (other.userinfo || other.host || other.port) then
case browser
when :firefox, :netscape
other = other.dup
other.path = self.path
when :ie, :microsoft, :links
other = other.dup
if other.query || other.fragment
other.path = self.path
else
other.path = '.'
end
when :lynx
# we're good already, so we don't *need* to do
# this, but let's pass the real merge function
# valid relative uris anyway, okay?
other = other.dup
if other.query
other.path = '.'
else
other.path = self.path
end
else
# Could someone test how opera handles the three links on
# http://snowplow.org/martin/relative_uri_test.html ?
raise "Unhandled browser type #{browser}"
end
end
return merge(other)
end
end
end
 
D

Daniel Martin

Daniel Martin said:
In fact, this is a bug, but not the one you think. "?a=b" is a
malformed relative URI. You should get a parse error trying to create
that.

According to RFC2396, a relative URI consists of (section 5, near the
bottom of pg. 17):

You know what? I'm using an old RFC. Everything I said applies to
RFC 2396, but that's not the current URI RFC. The current one is
RFC 3986, and by that one firefox is doing exactly the right thing.
(So, by the way, is Opera)

So it is a bug in URI, and the bug is "written to an old RFC".

I don't know how to explain Internet Explorer's odd behavior with
regard to a relative uri of "", but that's Microsoft for you.
 
A

Alex Young

Daniel said:
You know what? I'm using an old RFC. Everything I said applies to
RFC 2396, but that's not the current URI RFC. The current one is
RFC 3986, and by that one firefox is doing exactly the right thing.
(So, by the way, is Opera)

So it is a bug in URI, and the bug is "written to an old RFC".

I don't know how to explain Internet Explorer's odd behavior with
regard to a relative uri of "", but that's Microsoft for you.
Thanks - that pretty much confirms what I thought. Would a patch to
bring URI up to RFC 3986 compliance be a huge undertaking?
 
N

Nobuyoshi Nakada

Hi,

At Sat, 19 May 2007 00:52:37 +0900,
Daniel Martin wrote in [ruby-talk:252116]:
You know what? I'm using an old RFC. Everything I said applies to
RFC 2396, but that's not the current URI RFC. The current one is
RFC 3986, and by that one firefox is doing exactly the right thing.
(So, by the way, is Opera)

So it is a bug in URI, and the bug is "written to an old RFC".

Then, the test is wrong too?


Index: lib/uri/generic.rb
===================================================================
--- lib/uri/generic.rb (revision 12297)
+++ lib/uri/generic.rb (working copy)
@@ -632,9 +632,4 @@ module URI
base_path.slice!(i - 1, 2)
end
- if base_path.empty?
- base_path = [''] # keep '/' for root directory
- else
- base_path.pop
- end

# RFC2396, Section 5.2, 6), c)
@@ -654,5 +649,10 @@ module URI
end

- add_trailer_slash = true
+ add_trailer_slash = !tmp.empty?
+ if base_path.empty?
+ base_path = [''] # keep '/' for root directory
+ elsif add_trailer_slash
+ base_path.pop
+ end
while x = tmp.shift
if x == '..' && base_path.size > 1
Index: test/uri/test_generic.rb
===================================================================
--- test/uri/test_generic.rb (revision 12297)
+++ test/uri/test_generic.rb (working copy)
@@ -297,9 +297,9 @@ class TestGeneric < Test::Unit::TestCase

# http://a/b/c/d;p?q
-# ?y = http://a/b/c/?y
+# ?y = http://a/b/c/d;p?y
url = @base_url.merge('?y')
assert_kind_of(URI::HTTP, url)
- assert_equal('http://a/b/c/?y', url.to_s)
- url = @base_url.route_to('http://a/b/c/?y')
+ assert_equal('http://a/b/c/d;p?y', url.to_s)
+ url = @base_url.route_to('http://a/b/c/d;p?y')
assert_kind_of(URI::Generic, url)
assert_equal('?y', url.to_s)
 
D

Daniel Martin

Nobuyoshi Nakada said:
Hi,

At Sat, 19 May 2007 00:52:37 +0900,
Daniel Martin wrote in [ruby-talk:252116]:
You know what? I'm using an old RFC. Everything I said applies to
RFC 2396, but that's not the current URI RFC. The current one is
RFC 3986, and by that one firefox is doing exactly the right thing.
(So, by the way, is Opera)

So it is a bug in URI, and the bug is "written to an old RFC".

Then, the test is wrong too?

Yes, it is. That test case is straight out of RFC 2396, and it is
clearly incorrect given the pseudocode in section 5.2.2 (pages 31-32)
of RFC 3986.

RFC 3986 updates the test cases, including them in section 5.4 (Not
in an appendix as RFC 2396 did) That test case is mentioned, and the
behavior is indeed what it looks like from the pseudocode.

We should probably check that ruby's uri module passes all the tests
now in RFC 3986. If I have a chance I'll go do that.
 
D

Daniel Martin

Daniel Martin said:
RFC 3986 updates the test cases, including them in section 5.4 (Not
in an appendix as RFC 2396 did) That test case is mentioned, and the
behavior is indeed what it looks like from the pseudocode.

We should probably check that ruby's uri module passes all the tests
now in RFC 3986. If I have a chance I'll go do that.

I've converted all the test examples in that section of RFC 3986 into
test code; see below.

And even with Nakada-san's patch, there's still a problem or two with
our resolver relative to the RFC 3986 test cases. However, with that
patch it passes everything labeled in the RFC as "Normal Cases"
(section 5.4.1). It still has issues with the abnormal cases of
section 5.4.2

#! /usr/bin/env ruby
# This is test only of the merge method based on the cases in the text
# of RFC 3986, section 5.4. The test cases were cut-and-pasted
# directly to reduce the chance of transcription error.

require 'test/unit'
require 'uri'
require 'enumerator'

module URI
class TestGenericMerge < Test::Unit::TestCase
def setup
@url = 'http://a/b/c/d;p?q'
@base_url = URI.parse(@url)
end
def test_merge
testcases = %w[
"g:h" = "g:h"
"g" = "http://a/b/c/g"
"./g" = "http://a/b/c/g"
"g/" = "http://a/b/c/g/"
"/g" = "http://a/g"
"//g" = "http://g"
"?y" = "http://a/b/c/d;p?y"
"g?y" = "http://a/b/c/g?y"
"#s" = "http://a/b/c/d;p?q#s"
"g#s" = "http://a/b/c/g#s"
"g?y#s" = "http://a/b/c/g?y#s"
";x" = "http://a/b/c/;x"
"g;x" = "http://a/b/c/g;x"
"g;x?y#s" = "http://a/b/c/g;x?y#s"
"" = "http://a/b/c/d;p?q"
"." = "http://a/b/c/"
"./" = "http://a/b/c/"
".." = "http://a/b/"
"../" = "http://a/b/"
"../g" = "http://a/b/g"
"../.." = "http://a/"
"../../" = "http://a/"
"../../g" = "http://a/g"

"../../../g" = "http://a/g"
"../../../../g" = "http://a/g"

"/./g" = "http://a/g"
"/../g" = "http://a/g"
"g." = "http://a/b/c/g."
".g" = "http://a/b/c/.g"
"g.." = "http://a/b/c/g.."
"..g" = "http://a/b/c/..g"

"./../g" = "http://a/b/g"
"./g/." = "http://a/b/c/g/"
"g/./h" = "http://a/b/c/g/h"
"g/../h" = "http://a/b/c/h"
"g;x=1/./y" = "http://a/b/c/g;x=1/y"
"g;x=1/../y" = "http://a/b/c/y"

"g?y/./x" = "http://a/b/c/g?y/./x"
"g?y/../x" = "http://a/b/c/g?y/../x"
"g#s/./x" = "http://a/b/c/g#s/./x"
"g#s/../x" = "http://a/b/c/g#s/../x"

"http:g" = "http:g"
]
testcases.each_slice(3) { |rel, eq, expected|
rel.gsub!(/"/,'')
expected.gsub!(/"/,'')
assert_equal(expected, @base_url.merge(rel).to_s, rel)
}
end
end
end

__END__
 
N

Nobuyoshi Nakada

Hi,

At Sat, 19 May 2007 04:03:13 +0900,
Daniel Martin wrote in [ruby-talk:252151]:
I've converted all the test examples in that section of RFC 3986 into
test code; see below.

Thank you and sorry to be late.
And even with Nakada-san's patch, there's still a problem or two with
our resolver relative to the RFC 3986 test cases. However, with that
patch it passes everything labeled in the RFC as "Normal Cases"
(section 5.4.1). It still has issues with the abnormal cases of
section 5.4.2

URI.parse("ftp://example.com/pub").path returns "/pub" in 1.8
while "pub" in 1.9. Is 1.9 correct?


Index: lib/uri/generic.rb
===================================================================
--- lib/uri/generic.rb (revision 12858)
+++ lib/uri/generic.rb (working copy)
@@ -617,10 +617,6 @@ module URI

def merge_path(base, rel)
- # RFC2396, Section 5.2, 5)
- if rel[0] == ?/ #/
- # RFC2396, Section 5.2, 5)
- return rel

- else
+ # RFC2396, Section 5.2, 5)
# RFC2396, Section 5.2, 6)
base_path = split_path(base)
@@ -632,8 +628,8 @@ module URI
base_path.slice!(i - 1, 2)
end
- if base_path.empty?
- base_path = [''] # keep '/' for root directory
- else
- base_path.pop
+
+ if (first = rel_path.first) and first.empty?
+ base_path.clear
+ rel_path.shift
end

@@ -654,10 +650,15 @@ module URI
end

- add_trailer_slash = true
+ add_trailer_slash = !tmp.empty?
+ if base_path.empty?
+ base_path = [''] # keep '/' for root directory
+ elsif add_trailer_slash
+ base_path.pop
+ end
while x = tmp.shift
- if x == '..' && base_path.size > 1
+ if x == '..'
# RFC2396, Section 4
# a .. or . in an absolute path has no special meaning
- base_path.pop
+ base_path.pop if base_path.size > 1
else
# if x == '..'
@@ -676,5 +677,4 @@ module URI
return base_path.join('/')
end
- end
private :merge_path

Index: test/uri/test_generic.rb
===================================================================
--- test/uri/test_generic.rb (revision 12858)
+++ test/uri/test_generic.rb (working copy)
@@ -1,9 +1,7 @@
require 'test/unit'
require 'uri'
+require 'enumerator'

-module URI
-
-
-class TestGeneric < Test::Unit::TestCase
+class URI::TestGeneric < Test::Unit::TestCase
def setup
@url = 'http://a/b/c/d;p?q'
@@ -297,9 +295,9 @@ class TestGeneric < Test::Unit::TestCase

# http://a/b/c/d;p?q
-# ?y = http://a/b/c/?y
+# ?y = http://a/b/c/d;p?y
url = @base_url.merge('?y')
assert_kind_of(URI::HTTP, url)
- assert_equal('http://a/b/c/?y', url.to_s)
- url = @base_url.route_to('http://a/b/c/?y')
+ assert_equal('http://a/b/c/d;p?y', url.to_s)
+ url = @base_url.route_to('http://a/b/c/d;p?y')
assert_kind_of(URI::Generic, url)
assert_equal('?y', url.to_s)
@@ -453,8 +451,8 @@ class TestGeneric < Test::Unit::TestCase

# http://a/b/c/d;p?q
-# /./g = http://a/./g
+# /./g = http://a/g
url = @base_url.merge('/./g')
assert_kind_of(URI::HTTP, url)
- assert_equal('http://a/./g', url.to_s)
+ assert_equal('http://a/g', url.to_s)
url = @base_url.route_to('http://a/./g')
assert_kind_of(URI::Generic, url)
@@ -465,5 +463,5 @@ class TestGeneric < Test::Unit::TestCase
url = @base_url.merge('/../g')
assert_kind_of(URI::HTTP, url)
- assert_equal('http://a/../g', url.to_s)
+ assert_equal('http://a/g', url.to_s)
url = @base_url.route_to('http://a/../g')
assert_kind_of(URI::Generic, url)
@@ -507,8 +505,8 @@ class TestGeneric < Test::Unit::TestCase

# http://a/b/c/d;p?q
-# ../../../g = http://a/../g
+# ../../../g = http://a/g
url = @base_url.merge('../../../g')
assert_kind_of(URI::HTTP, url)
- assert_equal('http://a/../g', url.to_s)
+ assert_equal('http://a/g', url.to_s)
url = @base_url.route_to('http://a/../g')
assert_kind_of(URI::Generic, url)
@@ -520,5 +518,5 @@ class TestGeneric < Test::Unit::TestCase
url = @base_url.merge('../../../../g')
assert_kind_of(URI::HTTP, url)
- assert_equal('http://a/../../g', url.to_s)
+ assert_equal('http://a/g', url.to_s)
url = @base_url.route_to('http://a/../../g')
assert_kind_of(URI::Generic, url)
@@ -693,6 +691,56 @@ class TestGeneric < Test::Unit::TestCase
assert_raises(URI::InvalidURIError) { uri.query = 'bar' }
end
+
+ def m(s)
+ @base_url.merge(s).to_s
end

+ def test_rfc3986_examples
+ assert_equal("g:h", m("g:h"))
+ assert_equal("http://a/b/c/g", m("g"))
+ assert_equal("http://a/b/c/g", m("./g"))
+ assert_equal("http://a/b/c/g/", m("g/"))
+ assert_equal("http://a/g", m("/g"))
+ assert_equal("http://g", m("//g"))
+ assert_equal("http://a/b/c/d;p?y", m("?y"))
+ assert_equal("http://a/b/c/g?y", m("g?y"))
+ assert_equal("http://a/b/c/d;p?q#s", m("#s"))
+ assert_equal("http://a/b/c/g#s", m("g#s"))
+ assert_equal("http://a/b/c/g?y#s", m("g?y#s"))
+ assert_equal("http://a/b/c/;x", m(";x"))
+ assert_equal("http://a/b/c/g;x", m("g;x"))
+ assert_equal("http://a/b/c/g;x?y#s", m("g;x?y#s"))
+ assert_equal("http://a/b/c/d;p?q", m(""))
+ assert_equal("http://a/b/c/", m("."))
+ assert_equal("http://a/b/c/", m("./"))
+ assert_equal("http://a/b/", m(".."))
+ assert_equal("http://a/b/", m("../"))
+ assert_equal("http://a/b/g", m("../g"))
+ assert_equal("http://a/", m("../.."))
+ assert_equal("http://a/", m("../../"))
+ assert_equal("http://a/g", m("../../g"))
+ assert_equal("http://a/g", m("../../../g"))
+ assert_equal("http://a/g", m("../../../../g"))
+
+ assert_equal("http://a/g", m("/./g"))
+ assert_equal("http://a/g", m("/../g"))
+ assert_equal("http://a/b/c/g.", m("g."))
+ assert_equal("http://a/b/c/.g", m(".g"))
+ assert_equal("http://a/b/c/g..", m("g.."))
+ assert_equal("http://a/b/c/..g", m("..g"))
+
+ assert_equal("http://a/b/g", m("./../g"))
+ assert_equal("http://a/b/c/g/", m("./g/."))
+ assert_equal("http://a/b/c/g/h", m("g/./h"))
+ assert_equal("http://a/b/c/h", m("g/../h"))
+ assert_equal("http://a/b/c/g;x=1/y", m("g;x=1/./y"))
+ assert_equal("http://a/b/c/y", m("g;x=1/../y"))
+
+ assert_equal("http://a/b/c/g?y/./x", m("g?y/./x"))
+ assert_equal("http://a/b/c/g?y/../x", m("g?y/../x"))
+ assert_equal("http://a/b/c/g#s/./x", m("g#s/./x"))
+ assert_equal("http://a/b/c/g#s/../x", m("g#s/../x"))

+ assert_equal("http:g", m("http:g"))
+ end
end
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,535
Members
45,007
Latest member
OrderFitnessKetoCapsules

Latest Threads

Top