Illegal characters in variables passed via a fat url?


Fernie

Hello again,

Is there a list of characters that may cause problems in a fat URL? I can
think of characters such as (? = & and blank spaces) that may cause problems
with a URL.

The reason for this is that I am considering using fat urls to pass
protected variables (encrypted on the server using the BlowFish algorithm).

Thanks very much,

Fernie
 

Jukka K. Korpela

Fernie said:
Is there a list of characters that may cause problems in a fat url?

Yes. See RFC 2396, which applies to URLs irrespective of their color,
sex, and fatness.

Fernie said:
I can think of characters such as (? = & and blank spaces) that may
cause problems with a url.

The right question is "which characters are safe?".

Fernie said:
The reason for this is that I am considering using fat urls to pass
protected variables (encrypted on the server using the BlowFish
algorithm).

I'm pretty sure you are solving the wrong problem, since you asked such
an elementary question and yet describe a fairly complex setting.
 

Richard

Fernie said:
Hello again,
Is there a list of characters that may cause problems in a fat url? I can
think of characters such as (? = & and blank spaces) that may cause
problems with a url.
The reason for this is that I am considering using fat urls to pass
protected variables (encrypted on the server using the BlowFish
algorithm).
Thanks very much,


In a url in coding, the proper form is to use the "special" character set.
That is, for & use &amp.
&nbsp is for a blank space.
 

Fernie

Richard said:
In a url in coding, the proper form is to use the "special" character set.
That is, for & use &amp.
&nbsp is for a blank space.

Richard & Dylan,

Thanks for your suggestions.

Regards,

Fernie
 

Fernie

Jukka K. Korpela said:
Yes. See RFC 2396, which applies to URLs irrespectively of their color,
sex, and fatness.


Jukka, I really appreciate the info. I found a copy of the specification
you mentioned. It is posted at the following site:
http://asg.web.cmu.edu/rfc/

Jukka K. Korpela said:
The right question is "which characters are safe?".

Not these according to the document:

";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","


If I understood correctly, the above reserved characters must be encoded if
passed in a URL.

Regards,

Fernie
 

Steve Pugh

Dylan said:
Richard wrote:
[url encoding]
&nbsp is for a blank space.
No it isn't. %20 is for space in URLs.

That's for the output.
Within a tag inside the html, the correct format is &nbsp.

No. For starters a space and a non-breaking space are different
characters. Positions 32 and 160 respectively in ISO-8859-1 and many
other character encodings.

In a URL, wherever it is written, a space is %20.
A non-breaking space would be &nbsp; in HTML or %A0 in URLs.

Steve
 

Jukka K. Korpela

Fernie said:
Not these according to the document:

";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

It depends. RFC 2396 is tough reading. I have read it about three times
and converted it to hypertext, and still don't understand all the
pieces (partly because the terminology is horrendously confused and not
compatible with character code standard terminology).

Fernie said:
If I understood correctly, the above reserved characters must be
encoded if passed in a URL.

It's much more complex than that. Some of them _must not_ be encoded
when used in a _specific meaning_.
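
To illustrate the distinction, here is a minimal C++ sketch. It is not taken from RFC 2396, and the names percent_encode and build_query are made up for this example. It assumes the data is a list of name/value pairs destined for a query string. Each name and value is encoded byte by byte, leaving only letters, digits, and - _ . ~ untouched, while the "&" and "=" that act as delimiters are written literally.

```cpp
#include <string>
#include <utility>
#include <vector>

// Percent-encode one query component (a name or a value).
// Conservative assumption: only ASCII letters, digits and - _ . ~
// are left as-is; every other byte becomes %XX.
std::string percent_encode(const std::string& component) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(component.size() * 3);
    for (unsigned char c : component) {
        const bool safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '_' || c == '.' || c == '~';
        if (safe) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Join name/value pairs into a query string. The '&' and '=' written
// here are delimiters, so they must stay literal; the same characters
// inside a name or value are encoded by percent_encode.
std::string build_query(
    const std::vector<std::pair<std::string, std::string> >& fields) {
    std::string query;
    for (const auto& field : fields) {
        if (!query.empty()) query += '&';
        query += percent_encode(field.first);
        query += '=';
        query += percent_encode(field.second);
    }
    return query;
}
```

With this sketch, a value such as "1&2" comes out as "1%262", and a space comes out as %20. When the resulting URL is written into an HTML attribute, the delimiter "&" additionally has to be written as &amp; in the markup; that is an HTML matter, separate from URL encoding.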

I repeat: you are probably solving the wrong problem. Why create
"fat URLs" in the first place? You won't need them if you use the POST
method.
 

Fernie

Jukka K. Korpela said:
I repeat: you are probably solving the wrong problem. Why create
"fat URLs" in the first place? You won't need them if you use the POST
method.

Hyperlinks are my stumbling block.

I am trying to achieve redundant session management and tamper prevention.
At this time, I am successfully using:

* Cookies. These fail if the client rejects cookies, so I cannot
rely solely on them.

* Hidden fields. The session data is posted to new pages whenever a post is
performed. Sessions are maintained. The drawback is that I have hyperlinks
to other pages and if cookies are off, the session is lost when the
hyperlinks are clicked.

The cookies and hidden fields are to be encrypted, so tampering attempts
can be detected immediately and ignored by returning a blank page if the
format is in any way invalid. Processing stops there instead of going on
to, say, a database lookup routine.
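
One way to sidestep the illegal-character question for encrypted values is to write the ciphertext bytes as hexadecimal, since hex digits are safe in URLs, cookies, and hidden fields alike. The following is only a sketch under that assumption; hex_encode, hex_value, and hex_decode are hypothetical helper names, and the encryption step itself is left out. The decoder rejects anything that is not well-formed hex, which is the "stop early on an invalid format" step. Note that this is purely a format check: it catches malformed input, not a well-formed value that has been altered.

```cpp
#include <cstddef>
#include <string>

// Encode raw bytes (for example ciphertext) as lowercase hex digits.
std::string hex_encode(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Return the value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode hex text back into bytes. Returns false (and leaves bytes
// empty) if the length is odd or any character is not a hex digit,
// so the caller can stop processing immediately.
bool hex_decode(const std::string& text, std::string& bytes) {
    bytes.clear();
    if (text.size() % 2 != 0) return false;
    std::string decoded;
    decoded.reserve(text.size() / 2);
    for (std::size_t i = 0; i < text.size(); i += 2) {
        const int hi = hex_value(text[i]);
        const int lo = hex_value(text[i + 1]);
        if (hi < 0 || lo < 0) return false;
        decoded += static_cast<char>((hi << 4) | lo);
    }
    bytes = decoded;
    return true;
}
```

The cost is that the value doubles in length, which matters if the URL is already long.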

I've heard that Perl, PHP, and other interpreters already have built-in
functionality to handle this, but I am writing my application in C++ since I
don't know how to use any of the mentioned tools specific to web
development. I have found two different urlencode/urldecode functions
written for C++ that I will try today. Here is a documentation snippet:

URLEncode is a String class function that converts a US-ASCII string to its
representation in the URL Encoding scheme. URLEncode is based on the URL
character encoding rules as described in the Internet Standards document RFC
1738 - Uniform Resource Locators (URL)
(http://www.rfc-editor.org/rfc/rfc1738.txt).

Regards,

Fernie
 

Jukka K. Korpela

Fernie said:
At this time, I am successfully using: - -
* Hidden fields. The session data is posted to new pages whenever
a post is performed. Sessions are maintained.

That's what I've been suggesting, more or less.

Fernie said:
The drawback is
that I have hyperlinks to other pages and if cookies are off, the
session is lost when the hyperlinks are clicked.

So don't do that. This is a place for using buttons instead of links.
Note that you could use CSS to make a button (in a form containing just
the button and a collection of hidden fields) much like a link. Not
quite, but quite a lot. But maybe a button would be more suitable and
"honest", since it is a submission of a kind rather than a normal link.
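
Since the application is written in C++, the markup for such a button could be generated along these lines. This is a rough sketch only: html_escape and link_button are names invented here, the hidden field is assumed to be called "session", and the CSS that would make the button resemble a link is not shown.

```cpp
#include <string>

// Escape the characters that are special inside HTML text and
// double-quoted attribute values.
std::string html_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return out;
}

// Build a form that contains just one hidden field and one submit
// button. Submitting it POSTs the session value to `action`, so the
// session data never has to appear in the URL.
std::string link_button(const std::string& action,
                        const std::string& label,
                        const std::string& session) {
    return "<form method=\"post\" action=\"" + html_escape(action) + "\">"
           "<input type=\"hidden\" name=\"session\" value=\"" +
           html_escape(session) + "\">"
           "<input type=\"submit\" value=\"" + html_escape(label) + "\">"
           "</form>";
}
```

Each "link" on the page would get its own small form, for example link_button("/next", "Next page", sessionData).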

Fernie said:
I've heard that Perl, PHP, and other interpreters already have
built-in functionality to handle this but I am writing my
application in C++ since I don't know how to use any of the
mentioned tools specific to web development.

Sounds like the hard way of doing things. Perl takes more than an
eternity to learn properly, but the basics aren't really rocket
science. PHP is much easier.

Fernie said:
I have found two
different urlencode/urldecode functions written for C++ that I will
try today. Here is a documentation snippet:

URLEncode is a String class function that converts a US-ASCII
string to its representation in the URL Encoding scheme. URLEncode
is based on the URL character encoding rules as described in the
Internet Standards document RFC 1738 - Uniform Resource Locators
(URL) (http://www.rfc-editor.org/rfc/rfc1738.txt).

Sounds pretty old. As far as generic URL syntax is concerned, RFC 1738
was superseded _in 1998_. That's more than six years ago. Moreover, the
reference is incorrect; RFC 1738 wasn't an Internet Standard (which is
a status specifically given to fairly few RFCs).

In practice the changes aren't that big, but it's still a very old
specification, and the new one should be used instead. However, it's a
major effort to implement it properly in all details.
 
