Sqlite3. Substitution of names in query.


L

Lacrima

Hello!

I use sqlite3 module for my sqlite database. I am trying to substitute
table name in sql query.
That's ok
That's ok too
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
sqlite3.OperationalError: near "?": syntax error

So what's wrong now?
Is it impossible to substitute table names, using DB API?
If so, what is a way to make table name substitutions? Are string
operations like
'select * from %s where...' % tablename
ok in this case?

Thanks in advance!
 
Ad

Advertisements

D

Diez B. Roggisch

Lacrima said:
Hello!

I use sqlite3 module for my sqlite database. I am trying to substitute
table name in sql query.

That's ok

That's ok too

Traceback (most recent call last):
File "<string>", line 1, in <fragment>
sqlite3.OperationalError: near "?": syntax error

So what's wrong now?
Is it impossible to substitute table names, using DB API?

Yes, it is. How should the api discern that you meant the tabe-name
(which gets passed without quotes, or if so with double-quotes) instead
of a string-literal? There *could* be means to do so, but in the end all
this boils down to crossing a border that better is left alone - because
there are many problems of sql injection lurking if you are basing your
tablenames/columns on *input*.
If so, what is a way to make table name substitutions? Are string
operations like
'select * from %s where...' % tablename
ok in this case?

By substituting it with simple string-interpolation. Or even better, by
not doing it at all - because usually, your datamodel is tied to your
program, so the need for this kind of dynamicity shouldn't arise in the
first place.

Die
 
M

Matteo

Hello!

I use sqlite3 module for my sqlite database. I am trying to substitute
table name in sql query.


That's ok


That's ok too


Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
sqlite3.OperationalError: near "?": syntax error

So what's wrong now?
Is it impossible to substitute table names, using DB API?
If so, what is a way to make table name substitutions? Are string
operations like
'select * from %s where...' % tablename
ok in this case?

Thanks in advance!

I've found myself wanting this ability too, but alas, it is not
possible. SQLite statements are compiled into an intermediate bytecode
so that they can execute very quickly. This bytecode allows for
placeholders to be used for values, so that the same compiled bytecode
can be run for a multitude of values (handy for large INSERTS, of
course) without recompilation.

As I understand it, the bytecode is specific to the table(s) and
columns used in the statement. I don't know the specific mechanism,
but I would suspect that a column name gets converted to an offset
into a row, or to a pointer to a table's column array, or somesuch. In
particular, the code generated is probably drastically different
depending on whether or not a column in a table is indexed or not.
Thus, if a placeholder was used for a column, then the whole statement
would have to be recompiled each time it was run, which would do very
nasty things to efficiency.

So, if you really need that ability, you must use normal python string
interpolation.
 
M

Matteo

Or even better, by
not doing it at all - because usually, your datamodel is tied to your
program, so the need for this kind of dynamicity shouldn't arise in the
first place.

Die

Perhaps that is true in the majority of cases, but there are
exceptions. I can think of a couple of instances where one might need
to do it:
1) A general database exploration or visualization application, or

2) Where one needs to perform a similar operation on several different
tables. In a current case of mine, I'm converting from several
externally provided tab-delimited tables to an in-memory sqlite
database. Most of my app is tied closely to the model, and needs no
such dynamicity. However, I want to automate the conversion, so I
don't have to write 20 or so similar functions.
 
D

Dennis Lee Bieber

1) A general database exploration or visualization application, or
Which should interrogate the database itself to find the
table/column names, and present them to the user in some form of menu --
such that the user never enters the table/column name as text, but
instead as a selection from those displayed. Internally, one converts
the selection back into a table/column name.

This way regular string interpolation operations (or whatever Python
3.x has replaced it with) are safe to construct the SQL, leaving only
user supplied (or program generated) data values to be passed via the
DB-API parameter system -- so that they are properly escaped and
rendered safe.
2) Where one needs to perform a similar operation on several different
tables. In a current case of mine, I'm converting from several
externally provided tab-delimited tables to an in-memory sqlite
database. Most of my app is tied closely to the model, and needs no
such dynamicity. However, I want to automate the conversion, so I
don't have to write 20 or so similar functions.

Which again is an application in which the "user" does not have
direct input of table/column names.
 
L

Lawrence D'Oliveiro

Dennis Lee said:
This way regular string interpolation operations (or whatever Python
3.x has replaced it with) are safe to construct the SQL, leaving only
user supplied (or program generated) data values to be passed via the
DB-API parameter system -- so that they are properly escaped and
rendered safe.

Mixing the two is another recipe for confusion and mistakes.
 
Ad

Advertisements

C

Carsten Haese

Lawrence said:
Mixing the two is another recipe for confusion and mistakes.

Mixing the two is necessary. According to the SQL standard, parameters
can only take the place of literal values. Parameters can't take the
place of identifiers or keywords that make up the structure of the query.

So, you use string manipulation to build the structure of the query, and
then you use parameter binding to fill values into the query. They are
two different tools for two fundamentally different jobs. As long as you
understand what you're doing, there should be no confusion. (And if you
don't understand what you're doing, you shouldn't be doing it!)
 
L

Lawrence D'Oliveiro

Carsten said:
Mixing the two is necessary.
...
As long as you understand what you're doing, there should be no confusion.
(And if you don't understand what you're doing, you shouldn't be doing
it!)

But if you understand what you're doing, you don't need to mix the two.
 
C

Carsten Haese

Lawrence said:
But if you understand what you're doing, you don't need to mix the two.

Are we talking about the same thing here? I thought we're talking about
string interpolation and parameter binding, and I explained that mixing
those two is necessary if you have a query in which the "movable" bits
are identifiers or other syntax elements.

On what grounds are you asserting that it's not necessary to mix the
two? Please elaborate your point.
 
L

Lawrence D'Oliveiro

Carsten said:
On what grounds are you asserting that it's not necessary to mix the
two? Please elaborate your point.

On the grounds that Python has more general and powerful string parameter-
substitution mechanisms than anything built into any database API.
 
D

Dennis Lee Bieber

On the grounds that Python has more general and powerful string parameter-
substitution mechanisms than anything built into any database API.

Really? In the case of MySQLdb, the DB-API /uses/ Pythons string
interpolation -- but first does all needed escaping of special
characters followed by putting quotes around the data. Of course, MySQL
didn't have parameterized/prepared queries until recently, so the API
has to submit complete SQL statements for every operation (this may
change in the future, but then means one can not run against a v4.x
MySQL engine)

Since SQLite uses prepared/parameterized queries, the parameters are
passed in a totally different means, and the engine itself ensures the
data can't be jiggered to look like a set of statements.

In both cases, the only thing that changes in the SQL is placeholder
-- for SQLite, the API uses "?", and for MySQL the API uses "%s"... But
the spec for the API also states that one can retrieve the parameter
marker dynamically. So one could moderately easily write code that is
engine agnostic... Something not that easy to do for data items if one
is going to ignore the DB-API parameter passing scheme totally.
 
Ad

Advertisements

C

Carsten Haese

Lawrence said:
On the grounds that Python has more general and powerful string parameter-
substitution mechanisms than anything built into any database API.

That statement is fundamentally flawed. You are assuming that the
preferred way of getting a value into a query is by substituting a
literal into the query string. That is, in general, not true, because
that would be horribly inefficient. This is also why I despise the term
"parameter substitution", since it implies incorrectly that passing
parameters to a query is merely a string formatting exercise. The
correct term is "parameter binding."

Most databases actually provide an API for supplying parameters
separately from the query string. This is more efficient, because it
eliminates the need to render the parameter value into a literal form on
the client side and to parse the literal form on the server side. Also,
it allows the engine to perform the same query multiple times with
different values without having to re-parse the query.

Finally, you're assuming that every value that can be supplied to a
query actually HAS a literal form. That is not true. For example, in
Informix databases, there are no literals for BYTE-type values. (You'd
probably call them blobs.) So, if vomiting literals into the query
string were your only way of conveying values to the database, you'd
never be able to populate a BYTE column on an Informix database. The
only way to pass a BYTE value to an Informix database is by parameter
binding.

Since parameter binding is in general much more than string
substitution, it is indeed necessary to mix the two.
 
L

Lawrence D'Oliveiro

Carsten said:
That statement is fundamentally flawed. You are assuming that the
preferred way of getting a value into a query is by substituting a
literal into the query string. That is, in general, not true, because
that would be horribly inefficient.

Says someone who hasn't realized where the real inefficiencies are. Remember
what Tony Hoare told us: "premature optimization is the root of all evil".
These are databases we're talking about. Real-world databases are large, and
reside on disk, which is several orders of magnitude slower than RAM. And
RAM is where string parameter substitutions take place. So a few hundred
extra RAM accesses isn't going to make any significant difference to the
speed of database queries.
Finally, you're assuming that every value that can be supplied to a
query actually HAS a literal form. That is not true. For example, in
Informix databases, there are no literals for BYTE-type values.

Probably why I don't use Informix. What use is a binary data type if you
can't insert and retrieve binary data values?
 
C

Carsten Haese

Lawrence said:
Says someone who hasn't realized where the real inefficiencies are. Remember
what Tony Hoare told us: "premature optimization is the root of all evil".
These are databases we're talking about. Real-world databases are large, and
reside on disk, which is several orders of magnitude slower than RAM. And
RAM is where string parameter substitutions take place. So a few hundred
extra RAM accesses isn't going to make any significant difference to the
speed of database queries.

You're just not getting it. The cost is not in performing the parameter
substitutions themselves. The cost is in parsing what's essentially the
same query one million times over when it could have been parsed only
once. You might find an increase of seven orders of magnitude
insignificant, but I don't.
Probably why I don't use Informix. What use is a binary data type if you
can't insert and retrieve binary data values?

You CAN insert and retrieve binary data values. You just have to use the
right tool for the job, and that is parameter binding.
 
L

Lawrence D'Oliveiro

Carsten said:
You're just not getting it. The cost is not in performing the parameter
substitutions themselves. The cost is in parsing what's essentially the
same query one million times over when it could have been parsed only
once. You might find an increase of seven orders of magnitude
insignificant, but I don't.

There is no such parsing overhead. I speak from experience.

Look at the BulkInserter class here
<http://www.codecodex.com/wiki/Useful_MySQL_Routines>. I have successfully
used that to insert tens of thousands of records in just a few seconds. Yet
it makes no use of the parameter-substitution provided by MySQLdb; it is all
just straight Python, including the SQLString routine (also on that page),
which goes through every single character of each string value to decide
what needs escaping. Python can do all that, and do it fast.

You don't get to figure out what's efficient and what's not by mere hand-
waving; you have to do actual real-world tests.
 
L

Lawrence D'Oliveiro

Really? In the case of MySQLdb, the DB-API /uses/ Pythons string
interpolation ...

Only a limited subset thereof. For instance, I'm not aware of any database
API that lets me do this:

sql.cursor.execute \
(
"update numbers set flags = flags | %(setflags)u where projectid = %(projectid)s"
"%(match_listid)s and number = %(number)s"
%
{
"projectid" : SQLString(ProjectID),
"match_listid" :
("", " and listid = %s" % SQLString(ListID))[ListID != None],
"number" : SQLString(number),
"setflags" : flags,
}
)
 
Ad

Advertisements

R

Robert Kern

Really? In the case of MySQLdb, the DB-API /uses/ Pythons string
interpolation ...

Only a limited subset thereof. For instance, I'm not aware of any database
API that lets me do this:

sql.cursor.execute \
(
"update numbers set flags = flags | %(setflags)u where projectid = %(projectid)s"
"%(match_listid)s and number = %(number)s"
%
{
"projectid" : SQLString(ProjectID),
"match_listid" :
("", " and listid = %s" % SQLString(ListID))[ListID != None],
"number" : SQLString(number),
"setflags" : flags,
}
)

When programmatically generating SQL, I like to use SQLAlchemy. This use case is
handled with .where() on Update expression objects. Personally, I find
manipulating the SQLAlchemy expression objects clearer, safer, and more portable
than building the raw SQL through string interpolation.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
L

Lawrence D'Oliveiro

Robert said:
For instance, I'm not aware of any database API that lets me do this:

sql.cursor.execute \
(
"update numbers set flags = flags | %(setflags)u where
projectid = %(projectid)s" "%(match_listid)s and number =
%(number)s"
%
{
"projectid" : SQLString(ProjectID),
"match_listid" :
("", " and listid = %s" % SQLString(ListID))[ListID
!= None],
"number" : SQLString(number),
"setflags" : flags,
}
)

When programmatically generating SQL, I like to use SQLAlchemy. This use
case is handled with .where() on Update expression objects. Personally, I
find manipulating the SQLAlchemy expression objects clearer, safer, and
more portable than building the raw SQL through string interpolation.

Doesn't seem to support bulk insertions à la this
<http://www.codecodex.com/wiki/Useful_MySQL_Routines#Bulk_Insertion>. Or
even literal lists as per the SQLStringList routine on the same page.
 
C

Carsten Haese

Lawrence said:
There is no such parsing overhead. I speak from experience.

Look at the BulkInserter class here
<http://www.codecodex.com/wiki/Useful_MySQL_Routines>. I have successfully
used that to insert tens of thousands of records in just a few seconds. Yet
it makes no use of the parameter-substitution provided by MySQLdb; it is all
just straight Python, including the SQLString routine (also on that page),
which goes through every single character of each string value to decide
what needs escaping. Python can do all that, and do it fast.

With all due respect, but if your experience is exclusive to
MySQL/MySQLdb, your experience means very little for database
programming practices in general. Throughout most of its history, MySQL
did not support prepared statements and parameter binding, and MySQLdb
doesn't use any parameter binding API that might be available, so you're
comparing your own implementation of string interpolation to MySQLdb's
implementation of string interpolation. Your experience says nothing
about databases that have an *actual* parameter binding API.
You don't get to figure out what's efficient and what's not by mere hand-
waving;

I'm not handwaving.
you have to do actual real-world tests.

I have.

See for example the timing test in
http://informixdb.blogspot.com/2007/07/filling-in-blanks.html . If you'd
like to try it for yourself, here is a version of the test for SQLite:

=================================================================
# querytest.py
class Tester(object):
def __init__(self):
import sqlite3
conn = sqlite3.connect(":memory:")
self.cur = conn.cursor()
self.cur.execute("create temp table t1(a int, b int)")
self.counter = 0

def with_params(self):
self.counter += 1
self.cur.execute("insert into t1 values(?,?)",
(self.counter,self.counter*2) )

def without_params(self):
self.counter += 1
self.cur.execute("insert into t1 values(%s,%s)" %
(self.counter,self.counter*2) )
=================================================================

And here are the corresponding results on my laptop:
$ python -mtimeit -s "from querytest import Tester; t=Tester()"
't.with_params()'
10000 loops, best of 3: 20.9 usec per loop
$ python -mtimeit -s "from querytest import Tester; t=Tester()"
't.without_params()'
10000 loops, best of 3: 36.2 usec per loop

So, you can say whatever you want, but you will never convince me that
string interpolation is better than parameter binding for getting
variable values into a query. Even if you don't accept my proof that it
is more efficient, you have not proved that parameter binding is less
efficient.

In addition to the efficiency factor, parameter binding is inherently
secure, whereas string interpolation is too easy to use insecurely.
Finally, parameter binding is the standard method, as defined by the SQL
standard, of getting variable values into a query.

You may call it "premature optimization", but I call it "choosing the
right tool for the job."

I assume that none of this will convince you, but that's fine. We'll
just agree to disagree on this.
 
Ad

Advertisements

D

Dennis Lee Bieber

There is no such parsing overhead. I speak from experience.
said:
Look at the BulkInserter class here
<http://www.codecodex.com/wiki/Useful_MySQL_Routines>. I have successfully
used that to insert tens of thousands of records in just a few seconds. Yet
it makes no use of the parameter-substitution provided by MySQLdb; it is all

You picked the wrong database to use for your argument.

Prior to MySQL version 5, MySQL ITSELF did not support prepared
(that is, preparsed and parameterized) SQL with parameters passed
separately.

MySQLdb, being compatible with MySQL 3.x and 4.x (along with 5.x),
therefore, ITSELF creates full SQL statements using Python string
interpolation operations (the main reason it uses %s as the
placeholder).

So your main claim turns into:

"My Python string interpolation is just as fast as MySQLdb's Python
string interpolation"

Don't believe me? Just take a look at the MySQLdb Python code -- the
execute() method summarizes to (not the real names):

resultSQL = SQL_with_placeholders % escape_arguments(argument_tuple)
submit_to_MySQL(resultSQL)

where escape_arguments() generates a string representation of each
argument, escapes quotes and other dangerous characters, and then wraps
the stringified arguments with MySQL quotes (which is why you can't use
%f to specify a placeholder for floating numerics -- the numeric was
converted to a quoted string before getting fed to the %f!)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top