Good point. Firefox & Safari seem to recognize a few URI schemes like
file and ftp. Firefox can even do gopher.
I think some people are getting pretty confused here. file and ftp are URL
schemes. With an 'L'. ftp://sun.com/pub is a URL. It's right there in RFC
1738. But URLs - all of them - are also URIs. URIs are a superset of URLs.
It's not about whether they go in documents or are sent over HTTP.
There's another kind of URI that is not a URL, and that's a URN - a
Uniform Resource Name. URNs are funny little things that don't get seen
very often; they look like urn:scheme:path, with a conspicuous absence of
slashes, at least at the start. The big idea is that a URN is a name
rather than a location - it uniquely identifies something, but it doesn't
tell you how to find it. URNs are split up into different namespaces,
which are different ways of identifying elements of different sets of
things. For example, one of the URN namespaces is ISBN, for books, so
urn:isbn:978-1594743344 identifies a book - but doesn't immediately help
you find it. You'd have to go to some kind of resolver service to map it
to a URL which you could actually use - and that's the idea, since it
provides a layer of indirection which decouples identity of an object,
which is eternal, from the means to access it, which is transient.
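As a rough sketch of that indirection, here is a toy resolver in Python. The lookup table and the books.example.org URL in it are invented for illustration; a real resolver service would be something far bigger and would live elsewhere. split_urn and resolve are hypothetical helper names, not part of any library.

```python
# Toy resolver table - the URL in it is made up purely for illustration.
RESOLVER = {
    ("isbn", "978-1594743344"): "http://books.example.org/978-1594743344",
}

def split_urn(urn):
    """Split 'urn:<namespace>:<name>' into (namespace, name)."""
    scheme, namespace, name = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: " + urn)
    return namespace.lower(), name

def resolve(urn):
    """Map a URN to a URL via the toy table, or None if it is unknown."""
    return RESOLVER.get(split_urn(urn))

print(split_urn("urn:isbn:978-1594743344"))  # ('isbn', '978-1594743344')
print(resolve("urn:isbn:978-1594743344"))
```

The name stays the same however often the table changes, which is the whole point of keeping identity and location apart.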
The raison d'etre of the concept of a URI is merely to unify URLs and
URNs.
At least, that's how it started out. See RFC 3305 for meditations on
meaning and taxonomy.
The rules for URIs, URLs and URNs are in agreement on escaping: it's done
with percent signs. Encoding spaces as pluses is not part of those
specifications.
Rather, the plus for space thing is part of the specification of the
application/x-www-form-urlencoded content type. This is a content type
which encodes a list of key-value pairs (where keys and values, or at
least values, can be arbitrary byte strings) as text comprising characters
from a limited subset of ASCII. It's like a cross between Java's
properties file format and base-64. Anyway, it says to encode spaces as
pluses. The purpose of x-www-form-urlencoded is to encode the values in an
HTML form in such a way that they can be transmitted as part of a URL (or
as an entity body in a POST, but that's less of a driver): an
x-www-form-urlencoded string is safe to use as the query part of a URL.
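For what it's worth, Python's standard urllib.parse keeps the two kinds of encoding apart, so it makes a handy illustration. This is only a sketch; the field names and values are made up.

```python
from urllib.parse import quote, urlencode, parse_qsl

# Plain URI escaping: a space becomes %20, never a plus.
escaped = quote("my query")
print(escaped)  # my%20query

# x-www-form-urlencoded: a list of key-value pairs goes in, spaces come
# out as pluses, and literal pluses and ampersands are percent-escaped.
fields = [("hl", "en"), ("q", "my query"), ("note", "a+b & c")]
encoded = urlencode(fields)
print(encoded)  # hl=en&q=my+query&note=a%2Bb+%26+c

# Decoding with the matching form decoder gets the original pairs back.
decoded = parse_qsl(encoded)
print(decoded == fields)  # True
```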
To clarify, when you see a URL like:
http://www.google.co.uk/search?hl=en&safe=off&q=my+query&btnG=Search
There are *two* *different* layers of syntax here. First is the URI/URL
syntax, which breaks the string down to:
Scheme: http
Authority: www.google.co.uk
Path: /search
Query: hl=en&safe=off&q=my+query&btnG=Search
Second is x-www-form-urlencoding of the query part, which breaks it down
to:
hl: en
safe: off
q: my query
btnG: Search
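The same two-step breakdown can be sketched with Python's urllib.parse, which happens to have one function per layer. Note that urlsplit reports the path with its leading slash.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.google.co.uk/search?hl=en&safe=off&q=my+query&btnG=Search"

# Layer 1: generic URI/URL syntax. The query is still one opaque string.
parts = urlsplit(url)
print(parts.scheme)  # http
print(parts.netloc)  # www.google.co.uk
print(parts.path)    # /search
print(parts.query)   # hl=en&safe=off&q=my+query&btnG=Search

# Layer 2: x-www-form-urlencoded decoding of the query part only.
pairs = parse_qsl(parts.query)
print(pairs)
# [('hl', 'en'), ('safe', 'off'), ('q', 'my query'), ('btnG', 'Search')]
```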
Note that it is permitted to have raw + signs in the query part: they're
reserved characters in URI syntax, but in the lesser 'subcomponent
delimiter' set, rather than the greater 'generic delimiter' set, and that
means that they can be used unescaped in a part, provided that the syntax
for that part permits it. I can't find anything in a specification of the
http URL scheme that forbids + from the query part, and thus, applying
ancient Anglo-Saxon legal principles, it's permitted. If you don't like
it, you can always escape them:
http://www.google.co.uk/search?hl=en&safe=off&q=my%2Bquery&btnG=Search
I sincerely believe that that URL is exactly equivalent to the one above.
Although i note that Google doesn't think so. Hmm.
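A possible explanation, sketched in Python: whatever the URI layer thinks, a form decoder treats a raw plus and a percent-escaped plus differently. The first decodes to a space, the second to a literal plus. Whether that is what Google actually does is a guess on my part; the snippet only shows how a standard form decoder behaves.

```python
from urllib.parse import parse_qsl

# A raw '+' in form-encoded data means a space...
raw_plus = parse_qsl("q=my+query")
print(raw_plus)      # [('q', 'my query')]

# ...whereas %2B means an actual plus sign.
escaped_plus = parse_qsl("q=my%2Bquery")
print(escaped_plus)  # [('q', 'my+query')]
```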
Finally, note that x-www-form-urlencoded only applies to the query part,
and only to queries which specifically use it as an encoding (it's the
default in HTML, which is why you see it so often). That means that this:
http://example.org/plus+path
is *not* x-www-form-urlencoded, and is *not* equivalent to:
http://example.org/plus%20path
Both of those are legal (i think) and different to each other.
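To make the path case concrete, here is a small Python sketch. unquote does plain percent-decoding and is the right tool for a path; unquote_plus applies the form rule and is shown only to illustrate the mistake.

```python
from urllib.parse import urlsplit, unquote, unquote_plus

plus_path = urlsplit("http://example.org/plus+path").path
space_path = urlsplit("http://example.org/plus%20path").path

# Plain percent-decoding: the plus is just a plus.
print(unquote(plus_path))   # /plus+path
print(unquote(space_path))  # /plus path

# Applying the form rule to a path turns the plus into a space,
# which silently changes which resource is being named.
print(unquote_plus(plus_path))  # /plus path
```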
tom