CGI.pm and lost carriage returns


Alan J. Flavell

Sanity checks? Checking arbitrary text entered in a textbox is
'difficult' in a very real sense.

What is needed is not checking, but encoding.

Yes, but what is primarily needed here is a sound analysis of what's
intended to be achieved!

If the user is intended to be supplying *plain text* then indeed the
correct thing to do is to encode the whole thing. It would be the
same principle (albeit a different encoding) if you were intending
to use the user input for looking-up a database entry, for example
(consider SQLencode).
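
As a minimal sketch of the database case: this swaps in parameter binding instead of a hand-rolled encoding step, and it is written in Python with the standard sqlite3 module purely for illustration. The table, column and function names are hypothetical. The point is only that the user's text travels as data and is never parsed as SQL.

```python
import sqlite3

def find_user(conn, name):
    # The "?" placeholder binds `name` as a value; the driver never
    # interprets it as part of the SQL statement itself.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

# Hypothetical in-memory table, just to have something to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

rows_ok = find_user(conn, "alice")
# A classic injection attempt is simply a name that matches nobody.
rows_hostile = find_user(conn, "alice' OR '1'='1")
```

Perl's DBI offers the same kind of placeholders, so the principle carries over unchanged.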

So, in the present context, if the user supplies any HTML tags, then
the result should be that the HTML tags can be seen in the final
result, without causing any side effects.

Ergo, if the user input is supplied as plain text, and the server-side
process is going to insert it into some HTML, then at the very least
it has to HTML-encode the characters "<" and "&" (for example as &lt;
and &amp;), in addition to dealing with the newlines issue. That's
what you had in mind by "encoding", isn't it?
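
To make that concrete, here is a small sketch of the plain-text case, in Python for illustration (the function name is my own invention). It assumes the input arrives with CRLF, CR or LF line ends, encodes the whole text first, and only then turns the newlines into <br> tags, so that the only markup in the result is the markup the server itself added.

```python
import html

def text_to_html(user_text):
    # Normalise line ends first: textarea input commonly arrives as CRLF.
    normalised = user_text.replace("\r\n", "\n").replace("\r", "\n")
    # Encode &, <, > and both quote characters, so that any tags the
    # user typed are displayed rather than interpreted.
    escaped = html.escape(normalised, quote=True)
    # Only now add markup of our own for the line breaks.
    return escaped.replace("\n", "<br>\n")

sample = text_to_html("a <b> & c\r\nnext")
```

The order matters: inserting the <br> tags before encoding would cause them to be encoded along with everything else.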

If, on the other hand, it's intended that the user should be allowed
to insert a safe subset of HTML markup, then the server-side process
needs to thoroughly analyze the input and allow *only* those permitted
markups through. It's amazing how many ways have been found to bypass
naive checks because the analysis wasn't thorough enough. As the
relevant FAQ warns us, regexps are an inappropriate tool for analyzing
HTML markup, bearing in mind that the programmer has no control over
what the user may be entering: linebreaks inside HTML tags,
deliberately defective comment markups, etc. etc.
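
For the safe-subset case, a rough sketch of the allow-only-what-is-permitted approach, using a real parser rather than regexps. It is in Python with the standard html.parser module for illustration; the set of permitted tags is a made-up policy, and the class and function names are hypothetical. It is not a vetted sanitizer. All attributes are dropped, anything not on the list is discarded, and text is re-encoded on the way out.

```python
import html
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br"}  # hypothetical policy
VOID_TAGS = {"br"}  # permitted tags that take no closing tag

class AllowlistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes are never copied; tags not on the list are dropped.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")
            if tag not in VOID_TAGS:
                self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Emit a closing tag only if it matches the innermost open tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text content is always re-encoded before output.
        self.out.append(html.escape(data))

    # Comments, declarations and processing instructions fall through to
    # the base-class handlers, which do nothing, so they are discarded.

def filter_html(user_html):
    f = AllowlistFilter()
    f.feed(user_html)
    f.close()
    # Close whatever the user left open, innermost first.
    for tag in reversed(f.open_tags):
        f.out.append(f"</{tag}>")
    return "".join(f.out)
```

With this policy, filter_html('<b onclick="x()">hi</b>') gives '<b>hi</b>', and a link such as '<a href="javascript:x">link</a>' comes out as just 'link'. In Perl one would reach for a proper HTML parsing module in the same way.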

The above-quoted reference to "sanity checks" is a dangerous swerve.
The only secure way to proceed is to allow *only* what you've decided
to allow, and reject all else. The converse idea, that you can detect
all malicious cases, and allow anything that hasn't been explicitly
rejected, has proven over and over again to be a dangerous
misconception, and is rightly condemned in any security analysis.

I'll try to make this my last word on the topic, as I realise that
much of it is O.T. for this group, but I was disappointed by the number
of responses which leaped immediately into the nuts and bolts of
implementing a "solution" - without apparently having analyzed what
the real problem might be. I would have to insist on this point that I
made before:

[
Have we even understood what it is that the O.P. is intending to
achieve? Whatever it is, I'm highly sceptical of the server-side
processing merely sprinkling the input with <br> tags instead of
newlines, and nothing more: it does not seem to be a solution to any
variant of this problem that I can think of.
]

regards
 
