java.net.URI.equals()

Peter Davis

I'm trying to figure something out -- java.net.URI.equals() makes no
sense the way it is specified:

"When testing the user-information, path, query, fragment, authority,
or scheme-specific parts of two URIs for equality, the raw forms rather
than the encoded forms of these components are compared and the
hexadecimal digits of escaped octets are compared without regard to
case."

First of all, is that a typo? I think it's trying to say "the raw
forms rather than the /decoded/ forms...". Aside from that, the
problem is this: it would make sense if two URIs with encoded and
unencoded versions of the same characters were equal, but they're
not. I wrote a little test class:

import java.net.*;

class Test {
    public static void main(String[] args) throws Throwable {
        URI u1 = new URI("foo%7Ebar");
        URI u2 = new URI("foo%7ebar");
        URI u3 = new URI("foo~bar");

        System.out.println(u1 + " == " + u2 + " => " + u1.equals(u2));
        System.out.println(u1 + " == " + u3 + " => " + u1.equals(u3));
        System.out.println(u2 + " == " + u3 + " => " + u2.equals(u3));
    }
}

Outputs:

foo%7Ebar == foo%7ebar => true
foo%7Ebar == foo~bar => false
foo%7ebar == foo~bar => false

Why in the world would it compare the raw rather than decoded forms of
the URI? Anybody have any clues?
 
Daniel Bonniot

Peter said:
I'm trying to figure something out -- java.net.URI.equals() makes no
sense the way it is specified:

"When testing the user-information, path, query, fragment,
authority, or scheme-specific parts of two URIs for equality, the raw
forms rather than the encoded forms of these components are compared and
the hexadecimal digits of escaped octets are compared without regard
to case."

First of all, is that a typo? I think it's trying to say "the raw forms
rather than the /decoded/ forms...".

Unless by raw they mean "decoded". But the implementation does not seem to
match that.
Aside from that, the problem is this: it would make sense if two URIs
with encoded and unencoded versions of the same characters were equal,
but they're not. I wrote a little test class:

import java.net.*;

class Test {
    public static void main(String[] args) throws Throwable {
        URI u1 = new URI("foo%7Ebar");
        URI u2 = new URI("foo%7ebar");
        URI u3 = new URI("foo~bar");

        System.out.println(u1 + " == " + u2 + " => " + u1.equals(u2));
        System.out.println(u1 + " == " + u3 + " => " + u1.equals(u3));
        System.out.println(u2 + " == " + u3 + " => " + u2.equals(u3));
    }
}

Outputs:

foo%7Ebar == foo%7ebar => true
foo%7Ebar == foo~bar => false
foo%7ebar == foo~bar => false

Why in the world would it compare the raw rather than decoded forms of
the URI? Anybody have any clues?

Not a direct answer, but what should happen when non-ASCII characters occur
(directly or in encoded form)? For instance, is %E1 equal to á (as it is
in some latin-? encodings)?

Daniel
 
Peter Davis

Unless by raw they mean "decoded". But the implementation does not seem
to match that.

The rest of the class is consistent with "raw" meaning the
originally-parsed URI string, which will likely contain %XX escapes.

Perhaps that's the whole problem. Some spec author at Sun probably
meant raw == decoded, and some other implementor probably interpreted it
as raw == encoded-but-not-normalized, as it is throughout the rest of
the class. Otherwise it doesn't make any sense that two URIs can be
semantically equal via character escapes yet not equal according to equals().
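For what it's worth, the accessors behave that way too, at least on the
JDK I tried. A quick check of my own (nothing official, just contrasting
getRawPath() with getPath()):

import java.net.*;

class RawVsDecoded {
    public static void main(String[] args) throws Throwable {
        URI escaped = new URI("foo%7Ebar");
        URI literal = new URI("foo~bar");

        // The "raw" accessors return the component as parsed, escapes intact.
        System.out.println(escaped.getRawPath()); // foo%7Ebar
        System.out.println(literal.getRawPath()); // foo~bar

        // The plain accessors decode %XX escapes, and on these the two agree.
        System.out.println(escaped.getPath());    // foo~bar
        System.out.println(literal.getPath());    // foo~bar

        // equals() works from the raw forms, so the URIs still compare unequal.
        System.out.println(escaped.equals(literal)); // false
    }
}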
Aside from that, the problem is this: it would make sense if two URIs
with encoded and unencoded versions of the same characters were equal,
but they're not. I wrote a little test class:

[...snip...]

Outputs:

foo%7Ebar == foo%7ebar => true
foo%7Ebar == foo~bar => false
foo%7ebar == foo~bar => false

Why in the world would it compare the raw rather than decoded forms of
the URI? Anybody have any clues?

Not a direct answer, but what should happen when non-ASCII characters
occur (directly or in encoded form)? For instance, is %E1 equal to á
(as it is in some latin-? encodings)?

It's specified that %XX escapes are decoded as if they were UTF-8 bytes,
so %E1 wouldn't be equal to á, but %E2%82%AC, for example, is equal to
the Euro character.
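
That matches what the decoded accessors return when I try it (my own
check, not something quoted from the docs):

import java.net.*;

class Utf8Check {
    public static void main(String[] args) throws Throwable {
        // %E2%82%AC is the UTF-8 encoding of U+20AC, the Euro sign,
        // so the decoded path comes back as "foo\u20ACbar".
        URI euro = new URI("foo%E2%82%ACbar");
        System.out.println("foo\u20ACbar".equals(euro.getPath()));   // true

        // A lone %E1 isn't a valid UTF-8 sequence, so it does not decode
        // to the Latin-1 á; the decoded path is not "foo\u00E1bar".
        URI latin = new URI("foo%E1bar");
        System.out.println("foo\u00E1bar".equals(latin.getPath()));  // false
    }
}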

So anyway, I understand that the spec is the way it is, and it's
perfectly unambiguous and reproducible in this manner, but it just
seems useless to me.

For example, if you have a URI like "foo/../bar" and you invoke
normalize() on it, then that URI will be equal, according to equals(),
to "bar". So there is a way to normalize paths, but there is no way to
normalize escape sequences so that equals() behaves properly. This is a
problem because %7E (~) is a notoriously confused character, with some
applications escaping it and some not.
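
The best workaround I've come up with is to sidestep equals() and compare
the decoded components myself. A rough sketch (decodedEquals is just a
name I made up, and it deliberately ignores subtleties such as an escaped
"/" inside a path segment, where decoding loses information, and the
case-insensitive scheme/host comparison that equals() does):

import java.net.*;

class DecodedCompare {
    // Treat two URIs as equal if their decoded components match. This is
    // looser than strict RFC equivalence: e.g. %2F in a path decodes to "/"
    // and the distinction is lost.
    static boolean decodedEquals(URI a, URI b) {
        return eq(a.getScheme(), b.getScheme())
            && eq(a.getSchemeSpecificPart(), b.getSchemeSpecificPart())
            && eq(a.getFragment(), b.getFragment());
    }

    private static boolean eq(Object x, Object y) {
        return x == null ? y == null : x.equals(y);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(decodedEquals(new URI("foo%7Ebar"), new URI("foo~bar")));   // true
        System.out.println(decodedEquals(new URI("foo%7Ebar"), new URI("foo%7ebar"))); // true
        System.out.println(decodedEquals(new URI("foo/bar"),   new URI("foo/baz")));   // false
    }
}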
 
