My *codeURI* Gotcha :/

vbgunz

forgive me for my bumbling confusion. I am learning JavaScript and got
caught up in a gotcha. this may be due to Mozilla's SpiderMonkey or,
most likely, to my n00b mindset on how encodeURI* and decodeURI* work.

Can we all agree that this does *not* work:
decodeURI("http://domain.com%");

Can we all agree that this *does* work:
decodeURI("http://domain.com");

here is where the confusion settles in. decodeURI has no problems with
simply ignoring the second example (which is not even encoded) and
producing exactly what it was given but considers the first example a
malformed URI and throws a URIError. maybe I am the dumb one for
passing that URI in *but* this is where the confusion takes a turn for
the worse.

Why does this work?
var uri = encodeURI("http://domain.com%");
decodeURI(uri);

what's the trick usage here? I mean, why does a malformed URI passed
directly to decode choke but if I simply encode it, decode won't choke
and return exactly the URI I passed in uncoded? I expect, whatever may
be passed in to decode will simply return decoded *if* possible e.g.,
example 2.

I am certain, this is most likely on purpose but am failing to
understand how, why, when, etc :/

Can someone help clear this up for me, thanks a million!
 
Thomas 'PointedEars' Lahn

vbgunz said:
Can we all agree that this does *not* work:
decodeURI("http://domain.com%");

Can we all agree that this *does* work:
decodeURI("http://domain.com");

Agreed.

here is where the confusion settles in. decodeURI has no problems with
simply ignoring the second example (which is not even encoded) and
producing exactly what it was given

Exactly. No percent-encoded character, so nothing to decode. Works as
designed.

but considers the first example a malformed URI and throws a URIError.

Works as designed, too. That *is* a malformed URI. "%" is a special
character in URIs, used for percent-encoding. It has to be followed
by two hexadecimal digits.

maybe I am the dumb one for passing that URI in *but* this is where
the confusion takes a turn for the worse.

Why does this work?
var uri = encodeURI("http://domain.com%");
decodeURI(uri);

what's the trick usage here? I mean, why does a malformed URI passed
directly to decode choke but if I simply encode it, decode won't choke
and return exactly the URI I passed in uncoded?

Because encoding a URI-like string so that it *becomes* a valid URI is what
encodeURI() is supposed to do. And decodeURI() is supposed to decode valid
URIs. If you inspect the values either method returns (I recommend Firebug
anyway), you see why it works: the "%" is percent-encoded as "%25", and then
decoded back to "%".
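
A minimal sketch of that inspection, which should run in any JavaScript console (Firebug or otherwise). The variable names are only illustrative:

```javascript
// Inspect what each function returns for the string from the question.
var raw = "http://domain.com%";

var encoded = encodeURI(raw);     // the lone "%" becomes "%25"
var decoded = decodeURI(encoded); // "%25" is decoded back to "%"

console.log(encoded); // http://domain.com%25
console.log(decoded); // http://domain.com%
```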

See also http://rfc-editor.org/rfc/rfc3986.txt or
http://www.rfc-editor.org/rfc/std/std66.txt


HTH

PointedEars
 
vbgunz

Thomas said:
Because encoding a URI-like string so that it *becomes* a valid URI is what
encodeURI() is supposed to do. And decodeURI() is supposed to decode valid
URIs. If you inspect the values either method returns (I recommend Firebug
anyway), you see why it works: the "%" is percent-encoded as "%25", and then
decoded back to "%".

this sort of makes my point. decodeURI will not decode a malformed URI
but to get it to do so or to get away with doing so, I simply have to
encode it first. to me the only difference is, in raw form, I get an
error whereas, if it is encoded, I don't. I'll try to show another
example of what I mean.

var a = encodeURI("http://domain.com%");
var b = decodeURI(a); // -> http://domain.com%
var c = decodeURI("http://domain.com%"); // -> URIError

*b* decodes correctly here because %25 decoded translates to %. *c*
springs a URIError because % by itself, well it's invalid. what I
expected is one of these two. b -> URIError (technically malformed) OR
c -> http://domain.com% (because I shouldn't have to encode it to
simply get it back, e.g., b).
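
For what it's worth, the second expectation can be sketched with a small wrapper. `decodeURIOrOriginal` is a hypothetical helper name, not a built-in, and this is only one way to get that behaviour:

```javascript
// Hypothetical helper (not a built-in): decode if possible, otherwise
// hand back the input unchanged instead of throwing.
function decodeURIOrOriginal(text) {
  try {
    return decodeURI(text);
  } catch (e) {
    if (e instanceof URIError) {
      return text; // malformed percent-encoding: return the input as is
    }
    throw e; // anything else is unexpected, so rethrow
  }
}

console.log(decodeURIOrOriginal("http://domain.com%"));   // http://domain.com%
console.log(decodeURIOrOriginal("http://domain.com%25")); // http://domain.com%
```

Both calls print the same string, so the caller can no longer tell whether the input was properly encoded or not.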

print(decodeURI("abc")); // nothing to decode so no error!

in the above example, there is absolutely nothing to decode
(technically it is malformed, because there is nothing to decode) so,
return what I passed in, all is good, everybody have a pizza OR throw
a URIError because there is nothing to decode and so warm up the Death
star. Awesome, it's pizza night on the Death Star :)

I feel like an idiot because I feel certain this is all by design and
the only flaw is my thinking. truth is, I've completed 12 chapters on
core JavaScript and before starting on client-side, I decided to go
through the reference too while trying my hand at a few scripts. in
the core reference though, I found *codeURI* and don't remember them
explained in enough detail to understand this.

holy cow batman are you frigging kidding me? those docs are too long
so I initiated a search and was told 150 minutes (building the index).
I did find % and %25 and went over a few hits. I got some knowledge
from it and further decided to cement them by using Google for better
explanations. I found the Mozilla core JavaScript guide and they too
helped shed some light on the matter.

anyhow, I am not going to go nuts over this. I have a general idea
about them and hope I'll learn the idiomatic and intended uses for
them. only need to practice ;)
 
Thomas 'PointedEars' Lahn

vbgunz said:
this sort of makes my point.

Quite the contrary.

decodeURI will not decode a malformed URI

Correct.

but to get it to do so or to get away with doing so, I simply have to
encode it first.

No, you don't.

to me the only difference is, in raw form, I get an
error whereas, if it is encoded, I don't. I'll try to show another
example of what I mean.

var a = encodeURI("http://domain.com%");
var b = decodeURI(a); // -> http://domain.com%
var c = decodeURI("http://domain.com%"); // -> URIError

*b* decodes correctly here because %25 decoded translates to %. *c*
springs a URIError because % by itself, well it's invalid. what I
expected is one of these two. b -> URIError (technically malformed) OR
c -> http://domain.com% (because I shouldn't have to encode it to
simply get it back, e.g., b).

You miss one important step in your thinking.

1. "http://domain.com%" is not a valid URI, because the "%" is not followed
by two hexadecimal digits (i.e. that is not a proper percent-encoding).
2. encodeURI() exists to make (1) valid.
3. encodeURI("http://domain.com%") yields a == "http://domain.com%25",
because that character is designated unsafe if not part of a
percent-encoding character sequence (because it is used for
percent-encoding of characters itself) and 0x25 is the ASCII code for
the "%" character.
4. a == "http://domain.com%25" is a valid URI, because the "%" is followed
by two hexadecimal digits (see above).
5. decodeURI() exists to decode it.
6. decodeURI("http://domain.com%25") yields b == "http://domain.com%",
because 0x25 is the ASCII code for the "%" character.

But:

7. "http://domain.com%" is not a valid URI, for the reasons given above.
8. However, decodeURI() exists to decode only *valid* URIs.
9. decodeURI("http://domain.com%") yields nothing but throws a URIError
exception.
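
The nine steps above can be sketched as follows. This is only an illustration using the variable names from the quoted example; `raw` and `error` are added here for the sake of the sketch:

```javascript
var raw = "http://domain.com%";

// Steps 1-3: encodeURI() percent-encodes the stray "%" as "%25".
var a = encodeURI(raw);
console.log(a); // http://domain.com%25

// Steps 4-6: "%25" is a proper percent-encoding, so decodeURI() accepts it.
var b = decodeURI(a);
console.log(b); // http://domain.com%

// Steps 7-9: the raw string has a "%" without two hex digits after it,
// so decodeURI() throws instead of returning a value.
var c;
var error = null;
try {
  c = decodeURI(raw);
} catch (e) {
  error = e;
}
console.log(error instanceof URIError); // true
console.log(c);                         // undefined
```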

That said, I find it seldom necessary to use encode/decodeURI().
encode/decodeURIComponent() is much more often used, to pass data in the
query part of an HTTP GET request or in the message body of a POST request.
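
A minimal sketch of that use. The host, path, and parameter name are made up for illustration:

```javascript
// Hypothetical values; the host and parameter name are made up.
var base = "http://domain.example/search";
var term = "100% pure & simple?";

// encodeURIComponent() escapes characters that have a meaning in a
// query string, such as "&", "?" and "%".
var url = base + "?q=" + encodeURIComponent(term);
console.log(url); // http://domain.example/search?q=100%25%20pure%20%26%20simple%3F

// encodeURI() leaves "&" and "?" alone, so the value would be cut short
// at the "&" when the query string is parsed.
console.log(encodeURI(term)); // 100%25%20pure%20&%20simple?

// Decoding the component gives back the original value.
var roundTrip = decodeURIComponent(encodeURIComponent(term));
console.log(roundTrip === term); // true
```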

holy cow batman are you frigging kidding me? those docs are too long
so I initiated a search and was told 150 minutes (building the index)

ISTM that your computer, CPU, operating system, or Web user agent is
either too slow for the Web or too busy doing other things.

With Firefox 2.0.0.6, on this Pentium M 740 powered notebook (that is not
even sold anymore) on Windows XP SP2, it took me only about a second after
pressing Ctrl+F to find the relevant section. And there was no indexing needed.

I did find % and %25 and went over a few hits.

You could have searched for "percent".


PointedEars
 
Bart Van der Donck

Thomas said:
"http://domain.com%25" is a valid URI, because the "%" is followed
by two hexadecimal digits (see above).

No, that is not a valid URI. From http://www.ietf.org/rfc/rfc3986.txt:

| URI producing applications must not use percent-encoding in
| host unless it is used to represent a UTF-8 character sequence.
| URI producers should provide these registered names in the IDNA
| encoding, rather than a percent-encoding, if they wish to
| maximize interoperability with legacy URI resolvers.
 
