how to code to avoid SQL insertion attacks

Lee Fesperman · Mar 12, 2005

Earlier, I intimated that PostgreSQL was brain-dead because it supported ((foo = NULL)
IS NULL). I still think (foo = NULL) is useless (because it always produces
NULL/UNKNOWN) and redundant (because the appropriate construct is (foo IS NULL)).
However, due to your cites of PostgreSQL docs and my own research (see below), I will
admit there is some 'logic' to their support of ((foo = bar) IS NULL).

Whereas both PostgreSQL and MySQL (*shudder*) evaluate predicates to TRUE,
FALSE, or NULL and have no truth value referred to as UNKNOWN. If we
suppose that NULL IS UNKNOWN is true and that UNKNOWN IS NULL is true
(this seems to be the sticking point) then the SQL definition and both of
those DBMS servers are compatible.

Interesting quotes:

PostgreSQL:
"SQL uses a three-valued Boolean logic where the null value represents
"unknown"."
<http://www.postgresql.org/docs/8.0/interactive/functions.html>

MySQL:
"The NULL value means ``no data.''"
<http://dev.mysql.com/doc/mysql/en/null-values.html>

There are some better quotes on MySQL but no matter. By chance, I downloaded MySQL
4.1.10 a few weeks ago for an entirely unrelated reason (before this sub-thread and my
current 'battle' with MySQL denizens on comp.databases). I can verify that MySQL
supports ((foo = NULL) IS NULL) with the same semantics as PostgreSQL and doesn't
support IS UNKNOWN, though at one point, it refers to the truth value as "NULL
(UNKNOWN)".

I'll have a look at Oracle and possibly MSSQL at some point. This is
turning out to be more interesting than expected.

I'd be interested. I have "Oracle Unleashed", but it doesn't touch on the subject as far
as I can tell.

I'm a little afraid that others may not be interested in this discussion, in fact bored
by it! But, who cares, they can just skip the whole thing.

Personally, I find PostgreSQL's definition of NULL far more useful,
conceptually; a NULL in a tuple simply means that the corresponding piece
of information is missing or unknown (the value is not NULL, the value is
*not there*), and a NULL result from a comparison means (a) that at least
one value was NULL and (b) that the result of the comparison is likewise
unknown. Perhaps UNKNOWN is a better label for the concept, but NULL is
succinct and very widely understood at this point.

Very widely understood, as regards to the two major open-source DBMSs. I'm pretty sure
the lesser ones which are more SQL Standard compliant have a different view. Who knows
about the major commercial ones though I'd bet that DB2 follows the standard in this
area since the original standard was based on DB2. Our product does follow the standard,
though we've fixed the EXISTS problem.

Certainly I can assert that the standard and RM itself (according to Codd ... Date would
demur since he now is opposed to 3VL and to NULLs in general) do distinguish between
regular expressions (arithmetic, string, ...) and logical expressions (predicates).
Regular expressions can be IS NULL or IS NOT NULL. Logical expressions can be IS TRUE,
IS FALSE or IS UNKNOWN.

I was totally unaware of this dichotomy of opinions, thus my somewhat brash statements
earlier in the thread. However, this is purely a syntax issue; the semantics are the
same (I think).

On that issue, I'll ask you to test the semantics of the logical operators AND and OR
for PostgreSQL. Here's some tests:

+ ((1 = 2) AND (1 = NULL))

+ ((1 = 1) OR (1 = NULL))

Under 3VL, the first should be FALSE not NULL/UNKNOWN and the second should be TRUE. You
might reverse the operands as an additional check since the operators are supposed to be
commutative. I used literals, but you might wish to try column values just to make sure.
Also, since NULL/UNKNOWN are treated as FALSE in a WHERE clause, you might add IS NULL
to the first to really verify. Note: I'd be pretty sure NOT has the right semantics ...
(NOT (1 = NULL)) is NULL/UKNOWN.

The MySQL documentation states that it provides the above semantics, though it didn't in
earlier versions.

Finally, you might be interested in the several white papers about NULLs and 3VL on
www.firstsql.com. I won't list them to save bandwidth. I've had many debates with Date
and his cohorts (Fabian Pascal, David McGoveran, Hugh Darwen) on the subject. They're
completely opposed to NULLs and 3VL; I consider them an essential part of RM (like Codd
did). Even so, I'm still on reasonable terms with them. Some of the papers came out of
my debates with them. You can see their views on www.dbdebunk.com.

Owen Jacobson · Mar 14, 2005

Earlier, I intimated that PostgreSQL was brain-dead because it supported
((foo = NULL) IS NULL). I still think (foo = NULL) is useless (because
it always produces NULL/UNKNOWN) and redundant (because the appropriate
construct is (foo IS NULL)). However, due to your cites of PostgreSQL
docs and my own research (see below), I will admit there is some 'logic'
to their support of ((foo = bar) IS NULL).

Indeed. The support of (foo = NULL) is an artefact of their support for
the comparison of arbitrary expressions; NULL is simply a primitive
expression evaluating to NULL in that context and is not treated
specially. I absolutely agree that there is no reason to use such a
comparison, on much the same grounds that I'd object to (someBoolean IS
TRUE) rather than simply (someBoolean).

On that issue, I'll ask you to test the semantics of the logical operators
AND and OR for PostgreSQL. Here's some tests:

+ ((1 = 2) AND (1 = NULL))

Evaluated directly (as a pseudo-column within a SELECT statement):

template1=# select ((1 = 2) AND (1 = NULL));
?column?
----------
f

template1=# select ((1 = NULL) AND (1 = 2));
?column?
----------
f

I see no particular need to test this within a WHERE clause, as to my
knowledge PostgreSQL uses the same expression evaluator for both tasks.

+ ((1 = 1) OR (1 = NULL))

template1=# select ((1 = 1) OR (1 = NULL));
?column?
----------
t

template1=# select ((1 = NULL) OR (1 = 1));
?column?
----------
t

Under 3VL, the first should be FALSE not NULL/UNKNOWN and the second
should be TRUE.

Just to ensure that I understand the decision table here:

a AND TRUE -> a
a AND FALSE -> FALSE
a AND NULL -> UNKNOWN/NULL

and

a OR TRUE -> TRUE
a OR FALSE -> a
a OR NULL -> UNKNOWN/NULL

Correct?

Considering this further, and breaking away from ANSI SQL slightly, one
could see the statement 'a IS NULL' as composed of two expressions and an
operator 'IS', rather than as one expression and an operator 'IS NULL';
'IS', in this case, is a conceptual comparator rather than a relational
comparator (for lack of a better term). Does this make sense, and is this
backed up in the literature anywhere?

Owen

Lee Fesperman · Mar 15, 2005

Owen said:
Indeed. The support of (foo = NULL) is an artefact of their support for
the comparison of arbitrary expressions; NULL is simply a primitive
expression evaluating to NULL in that context and is not treated
specially. I absolutely agree that there is no reason to use such a
comparison, on much the same grounds that I'd object to (someBoolean IS
TRUE) rather than simply (someBoolean).

No, that is quite a different case. (someBoolean) can be TRUE, FALSE, NULL/UNKNOWN while
(someBoolean is TRUE) filters it into TRUE or FALSE. This can be quite useful in some
cases ... related to the 3VL operation of logical operators (AND/OR/NOT) discussed
below.

Evaluated directly (as a pseudo-column within a SELECT statement):

template1=# select ((1 = 2) AND (1 = NULL));
?column?
----------
f

template1=# select ((1 = NULL) AND (1 = 2));
?column?

That is correct 3VL. Thanks for testing it.

I see no particular need to test this within a WHERE clause, as to my
knowledge PostgreSQL uses the same expression evaluator for both tasks.

No, that is perfectly fine. It's still a predicate wherever it occurs. SQL92 DML only
allows predicates in WHERE, HAVING, CASE and the ON clause of joins. In fact,
distinguishing TRUE, FALSE and NULL/UNKNOWN can be quite difficult using a WHERE clause,
as I found out in testing the addition of the SQL92 IS TRUE/FALSE/UNKNOWN construct to
our ORDBMS. I wanted a set of queries that created a nice truth table like below for
testing. I did finally bring it off.

template1=# select ((1 = 1) OR (1 = NULL));
?column?
----------
t

template1=# select ((1 = NULL) OR (1 = 1));
?column?

Again, that is proper 3VL. The upshot is that they are handling 3VL in a consistent
manner that is also compatible with everyone else. Take that Chris Date!

Just to ensure that I understand the decision table here:

a AND TRUE -> a
a AND FALSE -> FALSE
a AND NULL -> UNKNOWN/NULL

The first two are correct but not the last. The 3rd result depends on the logical value
of 'a', see the tests above.

and

a OR TRUE -> TRUE
a OR FALSE -> a
a OR NULL -> UNKNOWN/NULL

Similarly, the third depends on the logical value of 'a'. Also, you can check the truth
tables in our SQL Tutorial WHERE clause reference I provided earlier.

Correct?

Not really, as I indicated above. However, that brings up an interesting point about
PostgreSQL/MySQL treating NULL and UNKNOWN the same. In non-logic expressions
(arithmetic, string, ...), the presence of NULL in any operand causes the entire
expression to be NULL. In logical expressions (even in PostgreSQL and MySQL), the rules
for AND and OR are quite different, as above. In my view, that adds weight to RM and
standard SQL's distinction between NULL (used in arithmetic/string/... expressions) and
UNKNOWN, only used in logical expressions.

Considering this further, and breaking away from ANSI SQL slightly, one
could see the statement 'a IS NULL' as composed of two expressions and an
operator 'IS', rather than as one expression and an operator 'IS NULL';
'IS', in this case, is a conceptual comparator rather than a relational
comparator (for lack of a better term). Does this make sense, and is this
backed up in the literature anywhere?

I disagree. That is quite a different animal. Consider (as above), that the presence of
NULL as an operand or 'value' of an operand in regular expressions forces the result to
NULL, while in IS NULL it never results in NULL (always TRUE or FALSE). IOW, I don't see
the NULL as regular operand in IS NULL; it is a special construct. Unfortunately, I
don't know of any relevant literature, except that (foo IS NULL) is normally considered
a less confusing alternative to using (foo = NULL), which nobody treats as equivalent to
IS NULL. Well, nobody we've found so far.

Anyway, thanks again for your excellent participation in this discussion. Lately, it
seems I've been involved in some rather over-the-top discussions here and other forums.
It's so nice having heads remain in place ;^)

Owen Jacobson · Mar 16, 2005

No, that is quite a different case. (someBoolean) can be TRUE, FALSE,
NULL/UNKNOWN while (someBoolean is TRUE) filters it into TRUE or FALSE.
This can be quite useful in some cases ... related to the 3VL operation of
logical operators (AND/OR/NOT) discussed below.

A subtle point I'd quite missed. Thank you. For what it's worth,
PostgreSQL once again correctly follows this:

template1=# select (NULL IS FALSE);
?column?
----------
f

template1=# select (NULL IS TRUE);
?column?
----------
f

template1=# select (NULL = FALSE) IS NULL;
?column?
----------
t

The first two are correct but not the last. The 3rd result depends on the
logical value of 'a', see the tests above.

Fair point, and one I considered. In my haste to shorten the table I
introduced a fairly signifigant logic error.

Unfortunately, I don't know of any relevant literature, except that (foo
IS NULL) is normally considered a less confusing alternative to using
(foo = NULL), which nobody treats as equivalent to IS NULL.

MySQL has in the past treated 'foo = NULL' as equivalent to 'foo IS NULL',
though I believe that behaviour was beaten out of them some time ago.

Lee Fesperman · Mar 16, 2005

Owen said:
MySQL has in the past treated 'foo = NULL' as equivalent to 'foo IS NULL',
though I believe that behaviour was beaten out of them some time ago.

Yep, I included MySQL because I checked that behavior with MySQL 4.1.10, as I said
earlier.

Procedure Invalid Identifier in SQL / PLSQL	1	Jul 26, 2022
How to install and use PhpSanitization	0	Feb 7, 2021
Scripts	3	Feb 28, 2022
How to store data from a sign up form on a website into an sql databse	1	Sep 9, 2022
How to host data visualization beginner friendly?	1	Aug 10, 2023
Problem with code	2	Apr 18, 2023
Duplicate ID issue when trying to copy table row	1	Jul 30, 2022
I'm about to get in trouble with the HTML <body></body> tags	10	Aug 12, 2023

how to code to avoid SQL insertion attacks

Lee Fesperman

Owen Jacobson

Lee Fesperman

Owen Jacobson

Lee Fesperman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads