URI::merge bug?

Discussion in 'Ruby' started by Alex Young, May 18, 2007.

  1. Alex Young

    Alex Young Guest

    Hi all,

    I'm not sure whether this is a bug:

    irb(main):013:0> a = URI.parse("http://www.example.com/foo/bar?a=b")
    => #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
    irb(main):014:0> b = URI.parse("?a=c")
    => #<URI::Generic:0xfdbb6b770 URL:?a=c>
    irb(main):015:0> puts a.merge(b).to_s
    http://www.example.com/foo/?a=c

    I'd expect the last line to be "http://www.example.com/foo/bar?a=c" - it
    seems to lose the last element of the path. Firefox seems to think do
    what I'm expecting, anyway... Can someone more au fait with URI RFC's
    comment?

    --
    Alex
     
    Alex Young, May 18, 2007
    #1
    1. Advertising

  2. Alex Young

    Jano Svitok Guest

    On 5/18/07, Alex Young <> wrote:
    > Hi all,
    >
    > I'm not sure whether this is a bug:
    >
    > irb(main):013:0> a = URI.parse("http://www.example.com/foo/bar?a=b")
    > => #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
    > irb(main):014:0> b = URI.parse("?a=c")
    > => #<URI::Generic:0xfdbb6b770 URL:?a=c>
    > irb(main):015:0> puts a.merge(b).to_s
    > http://www.example.com/foo/?a=c
    >
    > I'd expect the last line to be "http://www.example.com/foo/bar?a=c" - it
    > seems to lose the last element of the path. Firefox seems to think do
    > what I'm expecting, anyway... Can someone more au fait with URI RFC's
    > comment?


    Hi,

    have alook at what "?a=c" parses to, i.e. if it is interpreted as
    "/?a=c" that would explain the behaviour.

    J.
     
    Jano Svitok, May 18, 2007
    #2
    1. Advertising

  3. Alex Young <> writes:

    > I'm not sure whether this is a bug:
    >
    > irb(main):013:0> a = URI.parse("http://www.example.com/foo/bar?a=b")
    > => #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
    > irb(main):014:0> b = URI.parse("?a=c")
    > => #<URI::Generic:0xfdbb6b770 URL:?a=c>
    > irb(main):015:0> puts a.merge(b).to_s
    > http://www.example.com/foo/?a=c


    I'll note that although firefox agrees with your expectations, lynx
    agrees with the behavior of the uri module.

    To understand what the uri module is doing, look at this:

    irb(main):001:0> require 'uri'
    => true
    irb(main):002:0> b = URI.parse("?a=c")
    => #<URI::Generic:0xfdbccb1d0 URL:?a=c>
    irb(main):003:0> b.scheme
    => nil
    irb(main):004:0> b.userinfo || b.host || b.port
    => nil
    irb(main):005:0> b.path
    => ""
    irb(main):006:0> b.query
    => "a=c"
    irb(main):007:0> b.fragment
    => nil

    That is, the scheme and authority portions of the uri are *nil*, but
    the path is present, as the empty string. When merging an empty path
    with the path "/foo/bar" , the uri module comes up with "/foo/". Not
    a totally unreasonable choice.

    In fact, this is a bug, but not the one you think. "?a=b" is a
    malformed relative URI. You should get a parse error trying to create
    that.

    According to RFC2396, a relative URI consists of (section 5, near the
    bottom of pg. 17):

    relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]

    rel_path = rel_segment [ abs_path ]

    rel_segment = 1*( unreserved | escaped |
    ";" | "@" | "&" | "=" | "+" | "$" | "," )

    See the 1* part? That means that a relative uri path segment must
    consist of at least one character. An empty path segment is illegal.
    (Note that uri references that begin with '#' are covered in section 4
    of the RFC, and match the rule "URI-reference" rather than the rule
    "relativeURI")

    Now, given that the URI module does indeed accept relative URIs like
    this, perhaps we should redefine URI merging for these pathological
    cases so that the URI module behaves as some particular well-known
    browser does:

    module URI
    class Generic
    def merge_like(browser, other)
    if !other.absolute? and other.path and other.path.empty? and
    not (other.userinfo || other.host || other.port) then
    case browser
    when :firefox, :netscape
    other = other.dup
    other.path = self.path
    when :ie, :microsoft, :links
    other = other.dup
    if other.query || other.fragment
    other.path = self.path
    else
    other.path = '.'
    end
    when :lynx
    # we're good already, so we don't *need* to do
    # this, but let's pass the real merge function
    # valid relative uris anyway, okay?
    other = other.dup
    if other.query
    other.path = '.'
    else
    other.path = self.path
    end
    else
    # Could someone test how opera handles the three links on
    # http://snowplow.org/martin/relative_uri_test.html ?
    raise "Unhandled browser type #{browser}"
    end
    end
    return merge(other)
    end
    end
    end

    --
    s=%q( Daniel Martin --
    puts "s=%q(#{s})",s.to_a.last )
    puts "s=%q(#{s})",s.to_a.last
     
    Daniel Martin, May 18, 2007
    #3
  4. Daniel Martin <> writes:

    > In fact, this is a bug, but not the one you think. "?a=b" is a
    > malformed relative URI. You should get a parse error trying to create
    > that.
    >
    > According to RFC2396, a relative URI consists of (section 5, near the
    > bottom of pg. 17):


    You know what? I'm using an old RFC. Everything I said applies to
    RFC 2396, but that's not the current URI RFC. The current one is
    RFC 3986, and by that one firefox is doing exactly the right thing.
    (So, by the way, is Opera)

    So it is a bug in URI, and the bug is "written to an old RFC".

    I don't know how to explain Internet Explorer's odd behavior with
    regard to a relative uri of "", but that's Microsoft for you.

    --
    s=%q( Daniel Martin --
    puts "s=%q(#{s})",s.to_a.last )
    puts "s=%q(#{s})",s.to_a.last
     
    Daniel Martin, May 18, 2007
    #4
  5. Alex Young

    Alex Young Guest

    Daniel Martin wrote:
    > Daniel Martin <> writes:
    >
    >> In fact, this is a bug, but not the one you think. "?a=b" is a
    >> malformed relative URI. You should get a parse error trying to create
    >> that.
    >>
    >> According to RFC2396, a relative URI consists of (section 5, near the
    >> bottom of pg. 17):

    >
    > You know what? I'm using an old RFC. Everything I said applies to
    > RFC 2396, but that's not the current URI RFC. The current one is
    > RFC 3986, and by that one firefox is doing exactly the right thing.
    > (So, by the way, is Opera)
    >
    > So it is a bug in URI, and the bug is "written to an old RFC".
    >
    > I don't know how to explain Internet Explorer's odd behavior with
    > regard to a relative uri of "", but that's Microsoft for you.
    >

    Thanks - that pretty much confirms what I thought. Would a patch to
    bring URI up to RFC 3986 compliance be a huge undertaking?

    --
    Alex
     
    Alex Young, May 18, 2007
    #5
  6. Hi,

    At Sat, 19 May 2007 00:52:37 +0900,
    Daniel Martin wrote in [ruby-talk:252116]:
    > You know what? I'm using an old RFC. Everything I said applies to
    > RFC 2396, but that's not the current URI RFC. The current one is
    > RFC 3986, and by that one firefox is doing exactly the right thing.
    > (So, by the way, is Opera)
    >
    > So it is a bug in URI, and the bug is "written to an old RFC".


    Then, the test is wrong too?


    Index: lib/uri/generic.rb
    ===================================================================
    --- lib/uri/generic.rb (revision 12297)
    +++ lib/uri/generic.rb (working copy)
    @@ -632,9 +632,4 @@ module URI
    base_path.slice!(i - 1, 2)
    end
    - if base_path.empty?
    - base_path = [''] # keep '/' for root directory
    - else
    - base_path.pop
    - end

    # RFC2396, Section 5.2, 6), c)
    @@ -654,5 +649,10 @@ module URI
    end

    - add_trailer_slash = true
    + add_trailer_slash = !tmp.empty?
    + if base_path.empty?
    + base_path = [''] # keep '/' for root directory
    + elsif add_trailer_slash
    + base_path.pop
    + end
    while x = tmp.shift
    if x == '..' && base_path.size > 1
    Index: test/uri/test_generic.rb
    ===================================================================
    --- test/uri/test_generic.rb (revision 12297)
    +++ test/uri/test_generic.rb (working copy)
    @@ -297,9 +297,9 @@ class TestGeneric < Test::Unit::TestCase

    # http://a/b/c/d;p?q
    -# ?y = http://a/b/c/?y
    +# ?y = http://a/b/c/d;p?y
    url = @base_url.merge('?y')
    assert_kind_of(URI::HTTP, url)
    - assert_equal('http://a/b/c/?y', url.to_s)
    - url = @base_url.route_to('http://a/b/c/?y')
    + assert_equal('http://a/b/c/d;p?y', url.to_s)
    + url = @base_url.route_to('http://a/b/c/d;p?y')
    assert_kind_of(URI::Generic, url)
    assert_equal('?y', url.to_s)


    --
    Nobu Nakada
     
    Nobuyoshi Nakada, May 18, 2007
    #6
  7. Nobuyoshi Nakada <> writes:

    > Hi,
    >
    > At Sat, 19 May 2007 00:52:37 +0900,
    > Daniel Martin wrote in [ruby-talk:252116]:
    >> You know what? I'm using an old RFC. Everything I said applies to
    >> RFC 2396, but that's not the current URI RFC. The current one is
    >> RFC 3986, and by that one firefox is doing exactly the right thing.
    >> (So, by the way, is Opera)
    >>
    >> So it is a bug in URI, and the bug is "written to an old RFC".

    >
    > Then, the test is wrong too?


    Yes, it is. That test case is straight out of RFC 2396, and it is
    clearly incorrect given the pseudocode in section 5.2.2 (pages 31-32)
    of RFC 3986.

    RFC 3986 updates the test cases, including them in section 5.4 (Not
    in an appendix as RFC 2396 did) That test case is mentioned, and the
    behavior is indeed what it looks like from the pseudocode.

    We should probably check that ruby's uri module passes all the tests
    now in RFC 3986. If I have a chance I'll go do that.

    --
    s=%q( Daniel Martin --
    puts "s=%q(#{s})",s.to_a.last )
    puts "s=%q(#{s})",s.to_a.last
     
    Daniel Martin, May 18, 2007
    #7
  8. Daniel Martin <> writes:

    > RFC 3986 updates the test cases, including them in section 5.4 (Not
    > in an appendix as RFC 2396 did) That test case is mentioned, and the
    > behavior is indeed what it looks like from the pseudocode.
    >
    > We should probably check that ruby's uri module passes all the tests
    > now in RFC 3986. If I have a chance I'll go do that.


    I've converted all the test examples in that section of RFC 3986 into
    test code; see below.

    And even with Nakada-san's patch, there's still a problem or two with
    our resolver relative to the RFC 3986 test cases. However, with that
    patch it passes everything labeled in the RFC as "Normal Cases"
    (section 5.4.1). It still has issues with the abnormal cases of
    section 5.4.2

    #! /usr/bin/env ruby
    # This is test only of the merge method based on the cases in the text
    # of RFC 3986, section 5.4. The test cases were cut-and-pasted
    # directly to reduce the chance of transcription error.

    require 'test/unit'
    require 'uri'
    require 'enumerator'

    module URI
    class TestGenericMerge < Test::Unit::TestCase
    def setup
    @url = 'http://a/b/c/d;p?q'
    @base_url = URI.parse(@url)
    end
    def test_merge
    testcases = %w[
    "g:h" = "g:h"
    "g" = "http://a/b/c/g"
    "./g" = "http://a/b/c/g"
    "g/" = "http://a/b/c/g/"
    "/g" = "http://a/g"
    "//g" = "http://g"
    "?y" = "http://a/b/c/d;p?y"
    "g?y" = "http://a/b/c/g?y"
    "#s" = "http://a/b/c/d;p?q#s"
    "g#s" = "http://a/b/c/g#s"
    "g?y#s" = "http://a/b/c/g?y#s"
    ";x" = "http://a/b/c/;x"
    "g;x" = "http://a/b/c/g;x"
    "g;x?y#s" = "http://a/b/c/g;x?y#s"
    "" = "http://a/b/c/d;p?q"
    "." = "http://a/b/c/"
    "./" = "http://a/b/c/"
    ".." = "http://a/b/"
    "../" = "http://a/b/"
    "../g" = "http://a/b/g"
    "../.." = "http://a/"
    "../../" = "http://a/"
    "../../g" = "http://a/g"

    "../../../g" = "http://a/g"
    "../../../../g" = "http://a/g"

    "/./g" = "http://a/g"
    "/../g" = "http://a/g"
    "g." = "http://a/b/c/g."
    ".g" = "http://a/b/c/.g"
    "g.." = "http://a/b/c/g.."
    "..g" = "http://a/b/c/..g"

    "./../g" = "http://a/b/g"
    "./g/." = "http://a/b/c/g/"
    "g/./h" = "http://a/b/c/g/h"
    "g/../h" = "http://a/b/c/h"
    "g;x=1/./y" = "http://a/b/c/g;x=1/y"
    "g;x=1/../y" = "http://a/b/c/y"

    "g?y/./x" = "http://a/b/c/g?y/./x"
    "g?y/../x" = "http://a/b/c/g?y/../x"
    "g#s/./x" = "http://a/b/c/g#s/./x"
    "g#s/../x" = "http://a/b/c/g#s/../x"

    "http:g" = "http:g"
    ]
    testcases.each_slice(3) { |rel, eq, expected|
    rel.gsub!(/"/,'')
    expected.gsub!(/"/,'')
    assert_equal(expected, @base_url.merge(rel).to_s, rel)
    }
    end
    end
    end

    __END__


    --
    s=%q( Daniel Martin --
    puts "s=%q(#{s})",s.to_a.last )
    puts "s=%q(#{s})",s.to_a.last
     
    Daniel Martin, May 18, 2007
    #8
  9. Hi,

    At Sat, 19 May 2007 04:03:13 +0900,
    Daniel Martin wrote in [ruby-talk:252151]:
    > I've converted all the test examples in that section of RFC 3986 into
    > test code; see below.


    Thank you and sorry to be late.

    > And even with Nakada-san's patch, there's still a problem or two with
    > our resolver relative to the RFC 3986 test cases. However, with that
    > patch it passes everything labeled in the RFC as "Normal Cases"
    > (section 5.4.1). It still has issues with the abnormal cases of
    > section 5.4.2


    URI.parse("ftp://example.com/pub").path returns "/pub" in 1.8
    while "pub" in 1.9. Is 1.9 correct?


    Index: lib/uri/generic.rb
    ===================================================================
    --- lib/uri/generic.rb (revision 12858)
    +++ lib/uri/generic.rb (working copy)
    @@ -617,10 +617,6 @@ module URI

    def merge_path(base, rel)
    - # RFC2396, Section 5.2, 5)
    - if rel[0] == ?/ #/
    - # RFC2396, Section 5.2, 5)
    - return rel

    - else
    + # RFC2396, Section 5.2, 5)
    # RFC2396, Section 5.2, 6)
    base_path = split_path(base)
    @@ -632,8 +628,8 @@ module URI
    base_path.slice!(i - 1, 2)
    end
    - if base_path.empty?
    - base_path = [''] # keep '/' for root directory
    - else
    - base_path.pop
    +
    + if (first = rel_path.first) and first.empty?
    + base_path.clear
    + rel_path.shift
    end

    @@ -654,10 +650,15 @@ module URI
    end

    - add_trailer_slash = true
    + add_trailer_slash = !tmp.empty?
    + if base_path.empty?
    + base_path = [''] # keep '/' for root directory
    + elsif add_trailer_slash
    + base_path.pop
    + end
    while x = tmp.shift
    - if x == '..' && base_path.size > 1
    + if x == '..'
    # RFC2396, Section 4
    # a .. or . in an absolute path has no special meaning
    - base_path.pop
    + base_path.pop if base_path.size > 1
    else
    # if x == '..'
    @@ -676,5 +677,4 @@ module URI
    return base_path.join('/')
    end
    - end
    private :merge_path

    Index: test/uri/test_generic.rb
    ===================================================================
    --- test/uri/test_generic.rb (revision 12858)
    +++ test/uri/test_generic.rb (working copy)
    @@ -1,9 +1,7 @@
    require 'test/unit'
    require 'uri'
    +require 'enumerator'

    -module URI
    -
    -
    -class TestGeneric < Test::Unit::TestCase
    +class URI::TestGeneric < Test::Unit::TestCase
    def setup
    @url = 'http://a/b/c/d;p?q'
    @@ -297,9 +295,9 @@ class TestGeneric < Test::Unit::TestCase

    # http://a/b/c/d;p?q
    -# ?y = http://a/b/c/?y
    +# ?y = http://a/b/c/d;p?y
    url = @base_url.merge('?y')
    assert_kind_of(URI::HTTP, url)
    - assert_equal('http://a/b/c/?y', url.to_s)
    - url = @base_url.route_to('http://a/b/c/?y')
    + assert_equal('http://a/b/c/d;p?y', url.to_s)
    + url = @base_url.route_to('http://a/b/c/d;p?y')
    assert_kind_of(URI::Generic, url)
    assert_equal('?y', url.to_s)
    @@ -453,8 +451,8 @@ class TestGeneric < Test::Unit::TestCase

    # http://a/b/c/d;p?q
    -# /./g = http://a/./g
    +# /./g = http://a/g
    url = @base_url.merge('/./g')
    assert_kind_of(URI::HTTP, url)
    - assert_equal('http://a/./g', url.to_s)
    + assert_equal('http://a/g', url.to_s)
    url = @base_url.route_to('http://a/./g')
    assert_kind_of(URI::Generic, url)
    @@ -465,5 +463,5 @@ class TestGeneric < Test::Unit::TestCase
    url = @base_url.merge('/../g')
    assert_kind_of(URI::HTTP, url)
    - assert_equal('http://a/../g', url.to_s)
    + assert_equal('http://a/g', url.to_s)
    url = @base_url.route_to('http://a/../g')
    assert_kind_of(URI::Generic, url)
    @@ -507,8 +505,8 @@ class TestGeneric < Test::Unit::TestCase

    # http://a/b/c/d;p?q
    -# ../../../g = http://a/../g
    +# ../../../g = http://a/g
    url = @base_url.merge('../../../g')
    assert_kind_of(URI::HTTP, url)
    - assert_equal('http://a/../g', url.to_s)
    + assert_equal('http://a/g', url.to_s)
    url = @base_url.route_to('http://a/../g')
    assert_kind_of(URI::Generic, url)
    @@ -520,5 +518,5 @@ class TestGeneric < Test::Unit::TestCase
    url = @base_url.merge('../../../../g')
    assert_kind_of(URI::HTTP, url)
    - assert_equal('http://a/../../g', url.to_s)
    + assert_equal('http://a/g', url.to_s)
    url = @base_url.route_to('http://a/../../g')
    assert_kind_of(URI::Generic, url)
    @@ -693,6 +691,56 @@ class TestGeneric < Test::Unit::TestCase
    assert_raises(URI::InvalidURIError) { uri.query = 'bar' }
    end
    +
    + def m(s)
    + @base_url.merge(s).to_s
    end

    + def test_rfc3986_examples
    + assert_equal("g:h", m("g:h"))
    + assert_equal("http://a/b/c/g", m("g"))
    + assert_equal("http://a/b/c/g", m("./g"))
    + assert_equal("http://a/b/c/g/", m("g/"))
    + assert_equal("http://a/g", m("/g"))
    + assert_equal("http://g", m("//g"))
    + assert_equal("http://a/b/c/d;p?y", m("?y"))
    + assert_equal("http://a/b/c/g?y", m("g?y"))
    + assert_equal("http://a/b/c/d;p?q#s", m("#s"))
    + assert_equal("http://a/b/c/g#s", m("g#s"))
    + assert_equal("http://a/b/c/g?y#s", m("g?y#s"))
    + assert_equal("http://a/b/c/;x", m(";x"))
    + assert_equal("http://a/b/c/g;x", m("g;x"))
    + assert_equal("http://a/b/c/g;x?y#s", m("g;x?y#s"))
    + assert_equal("http://a/b/c/d;p?q", m(""))
    + assert_equal("http://a/b/c/", m("."))
    + assert_equal("http://a/b/c/", m("./"))
    + assert_equal("http://a/b/", m(".."))
    + assert_equal("http://a/b/", m("../"))
    + assert_equal("http://a/b/g", m("../g"))
    + assert_equal("http://a/", m("../.."))
    + assert_equal("http://a/", m("../../"))
    + assert_equal("http://a/g", m("../../g"))
    + assert_equal("http://a/g", m("../../../g"))
    + assert_equal("http://a/g", m("../../../../g"))
    +
    + assert_equal("http://a/g", m("/./g"))
    + assert_equal("http://a/g", m("/../g"))
    + assert_equal("http://a/b/c/g.", m("g."))
    + assert_equal("http://a/b/c/.g", m(".g"))
    + assert_equal("http://a/b/c/g..", m("g.."))
    + assert_equal("http://a/b/c/..g", m("..g"))
    +
    + assert_equal("http://a/b/g", m("./../g"))
    + assert_equal("http://a/b/c/g/", m("./g/."))
    + assert_equal("http://a/b/c/g/h", m("g/./h"))
    + assert_equal("http://a/b/c/h", m("g/../h"))
    + assert_equal("http://a/b/c/g;x=1/y", m("g;x=1/./y"))
    + assert_equal("http://a/b/c/y", m("g;x=1/../y"))
    +
    + assert_equal("http://a/b/c/g?y/./x", m("g?y/./x"))
    + assert_equal("http://a/b/c/g?y/../x", m("g?y/../x"))
    + assert_equal("http://a/b/c/g#s/./x", m("g#s/./x"))
    + assert_equal("http://a/b/c/g#s/../x", m("g#s/../x"))

    + assert_equal("http:g", m("http:g"))
    + end
    end


    --
    Nobu Nakada
     
    Nobuyoshi Nakada, Jul 30, 2007
    #9
  10. Hi,

    At Tue, 31 Jul 2007 12:21:33 +0900,
    nan wu wrote in [ruby-talk:262574]:
    > Now my question is how I can get the file you patched?
    > I have visited the http://svn.ruby-lang.org/repos/ruby/trunk/lib/uri/
    > But I found the file is not patched.


    You can patch with "patch" command, or:
    <http://www.rubyist.net/~nobu/ruby/uri/generic.rb>

    --
    Nobu Nakada
     
    Nobuyoshi Nakada, Jul 31, 2007
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Simon Harris
    Replies:
    0
    Views:
    6,469
    Simon Harris
    May 10, 2005
  2. Stanimir Stamenkov
    Replies:
    1
    Views:
    2,513
    Stanimir Stamenkov
    Aug 17, 2005
  3. Pavel
    Replies:
    2
    Views:
    1,701
    Peter Flynn
    Aug 4, 2004
  4. etheriau
    Replies:
    1
    Views:
    688
    Pavel
    Aug 23, 2004
  5. Joe Curry

    Invalid URI: The format of the URI could not be determined.

    Joe Curry, Oct 8, 2003, in forum: ASP .Net Web Services
    Replies:
    0
    Views:
    372
    Joe Curry
    Oct 8, 2003
Loading...

Share This Page