Lie Hetland book: Beginning Python..

V

Vittorio

I am reading "Beginning Python from Novice to Professional" and the book
is really awesome. Nonetheless on ch 13 "Database Support" I found this
code to import data (in a txt file) into a SQLite Database:

#this was corrected because original "import sqlite" does not work
from pysqlite2 import dbapi2 as sqlite

#this function strips the txt file from special chars
def convert(value):
if value.startswith('~'):
return value.strip('~')
if not value:
value = '0'
return float(value)

conn = sqlite.connect('food.db')
curs = conn.cursor()

curs.execute('''
CREATE TABLE food (
id TEXT PRIMARY KEY,
desc TEXT,
water FLOAT,
kcal FLOAT,
protein FLOAT,
fat FLOAT,
ash FLOAT,
carbs FLOAT,
fiber FLOAT,
sugar FLOAT
)
''')

field_count = 10

#following is the line I suspect mistyped
markers = ', '.join(['%s']*field_count)

query = 'INSERT INTO food VALUES (%s)' % markers


for line in open('ABBREV.txt'):
fields = line.split('^')
vals = [convert(f) for f in fields[:field_count]]
#the following line raises error
curs.execute(query,vals)

conn.commit()
conn.close


The error was "Traceback (most recent call last):
File "C:\Python24\food.py", line 39, in ?
curs.execute(query,vals)
pysqlite2.dbapi2.OperationalError: near "%": syntax error"

After two hours of trying (did I say I am a beginner?) and after some
documentation about PySqlite I suspect the error is in:
markers = ', '.join(['%s']*field_count)

I think Magnus intended:
markers = ', '.join(['?']*field_count)


Did I found an errata or my Python is still too green?
 
S

Steve Holden

Vittorio said:
I am reading "Beginning Python from Novice to Professional" and the book
is really awesome. Nonetheless on ch 13 "Database Support" I found this
code to import data (in a txt file) into a SQLite Database:

#this was corrected because original "import sqlite" does not work
from pysqlite2 import dbapi2 as sqlite

#this function strips the txt file from special chars
def convert(value):
if value.startswith('~'):
return value.strip('~')
if not value:
value = '0'
return float(value)

conn = sqlite.connect('food.db')
curs = conn.cursor()

curs.execute('''
CREATE TABLE food (
id TEXT PRIMARY KEY,
desc TEXT,
water FLOAT,
kcal FLOAT,
protein FLOAT,
fat FLOAT,
ash FLOAT,
carbs FLOAT,
fiber FLOAT,
sugar FLOAT
)
''')

field_count = 10

#following is the line I suspect mistyped
markers = ', '.join(['%s']*field_count)

query = 'INSERT INTO food VALUES (%s)' % markers


for line in open('ABBREV.txt'):
fields = line.split('^')
vals = [convert(f) for f in fields[:field_count]]
#the following line raises error
curs.execute(query,vals)

conn.commit()
conn.close


The error was "Traceback (most recent call last):
File "C:\Python24\food.py", line 39, in ?
curs.execute(query,vals)
pysqlite2.dbapi2.OperationalError: near "%": syntax error"

After two hours of trying (did I say I am a beginner?) and after some
documentation about PySqlite I suspect the error is in:
markers = ', '.join(['%s']*field_count)

I think Magnus intended:
markers = ', '.join(['?']*field_count)


Did I found an errata or my Python is still too green?
No, you actually did quite a creditable piece of debugging. The DB-API
specifications allow database modules to substitute parameters into SQL
commands in a number of different ways, and they are supposed to
indicate the technique they use by setting a module variable
"paramstyle" to one of five possible values.

Magnus' original code was written to use a different (but valid)
paramstyle, so I'm guessing that his sqlite module and your sqlite2
simply use different paramstyles. Whether that's because a change was
made in developing the pysqlite code or because pysqlite and pysqlite2
come from different developers I couldn't say, but you have nailed the
problem. Well done!

regards
Steve
 
V

Vittorio

No, you actually did quite a creditable piece of debugging. The DB-API
specifications allow database modules to substitute parameters into
SQL commands in a number of different ways, and they are supposed to
indicate the technique they use by setting a module variable
"paramstyle" to one of five possible values.

Magnus' original code was written to use a different (but valid)
paramstyle, so I'm guessing that his sqlite module and your sqlite2
simply use different paramstyles. Whether that's because a change was
made in developing the pysqlite code or because pysqlite and pysqlite2
come from different developers I couldn't say, but you have nailed the
problem. Well done!

Thanks Steve for your encouragement.

Actually, subsequently to my posting, I had realised that the point was
in the different version of the Pysqlite interface: in fact I found many
other pieces of code which were affected the same way, and each of them
made use of "import pysqlite".

Nonetheless, I was unable to find any documentation about such a
different behaviour between Pysqlite and Pysqlite2; from my beginner
point of view the Pysqlite (Magnus' version) paramstyle looks a better
and more pythonic choice and I don't grasp the Pysqlite2 developers'
intentions deviating from that way.

I would be very grateful if someone would cast a light over
Pysqlite/Pysqlite2 discrepancies.
 
M

Magnus Lycka

Vittorio said:
Nonetheless, I was unable to find any documentation about such a
different behaviour between Pysqlite and Pysqlite2; from my beginner
point of view the Pysqlite (Magnus' version) paramstyle looks a better
and more pythonic choice and I don't grasp the Pysqlite2 developers'
intentions deviating from that way.

Please note that the DB-APIs let you use a foreign language, SQL,
in Python strings. Having SQL look Pythonic is hardly a virtue.
SQL should look SQLic! The SQL standards clearly state that '?'
is the correct symbol for dynamic SQL placeholders. For embedded
SQL (which is really moot for Python) it's ':NAME', but '%s' has
nothing to do with SQL. Pysqlite supports both '?' and ':NAME',
but no longer '%s', which is a blessing in my book.

Please note that while there is a rough correspondence between
the placeholders in SQL and %s and friends in Python strings,
they are far from the same. With SQL placeholders and separately
passed parameters, proper implementations of database servers will
prevent SQL injection attacks and provde a much better performance
than if you build an SQL string with Python's %-operator and %s etc
in the SQL string. Proper SQL parameter passing will also mean that
parameter quoting is handled for you. On the other hand, you can only
use placeholders in certain positions in SQL, so you might need %s
as well in SQL strings too, if you for instance need to determine
the table to search from in runtime.

Using the same symbol for both string substitutions and SQL placeholder
such as pysqlite 1 and the MySQL interface does, is not really a bright
idea in my opinion. Who thinks this is pretty?

sql = "SELECT %s FROM %s WHERE %s = %%s"
cur.execute(sql % (col,table,search_col), (param,))

I think it's less confusing with:

sql = "SELECT %s FROM %s WHERE %s = ?"
cur.execute(sql % (col,table,search_col), (param,))

With %s as placeholder, it's easy to do either...

sql = "SELECT %s FROM %s WHERE %s = %s"
cur.execute(sql % (col,table,search_col,param))

If you do this, you won't have any help with quoting,
you are suceptible to SQL injection attacks, and your
performance won't improve if the same query is performed
repeatedly with different values for param, since the
database server will make a new query execution plan
every time. :(

or...

sql = "SELECT %s FROM %s WHERE %s = %s"
cur.execute(sql, (col,table,search_col,param))

If this works with your DB driver, it's likely to be really
broken and just work as the previous example. In other
words you don't have the benefits in performance, convenience
or security that parameter passing provides in dynamic SQL.

Of course, the "proper" way, with %s-substitution for e.g. table
names and ? for parameters is also open for SQL injection attacks
if the values in the strings col, table and search_col above are
user input, but since they are plain SQL identifiers, they are much
easier to check than arbitrary search values. You'd probably have
a set of allowed values, and check that the input was in that
set. They are also less likely to come from an untrusted source.

The DB-API spec is available at http://python.org/peps/pep-0249.html
It's a good read. You could also look at:
http://www.thinkware.se/epc2004db/
I would be very grateful if someone would cast a light over
Pysqlite/Pysqlite2 discrepancies.

I'm afraid I haven't seen that anywhere. Some of the more subtle
changes probably results from the difference between SQLite 2 and
SQLite 3, since these are the versions those Python libraries wrap.

As you can see in
http://initd.org/tracker/pysqlite/wiki/PysqliteVersions
you can use pysqlite 1.1 if you want to use the old pysqlite 1
API.

Pysqlite2 is documented here:
http://initd.org/pub/software/pysqlite/doc/usage-guide.html

It contains a fair amount of examples, but unfortunately no
direct comparision with pysqlite 1.
 
V

Vittorio

Thank you for your message I found really illuminating.
Using the same symbol for both string substitutions and SQL placeholder
such as pysqlite 1 and the MySQL interface does, is not really a bright
idea in my opinion. Who thinks this is pretty?

sql = "SELECT %s FROM %s WHERE %s = %%s"
cur.execute(sql % (col,table,search_col), (param,))

I think it's less confusing with:

sql = "SELECT %s FROM %s WHERE %s = ?"
cur.execute(sql % (col,table,search_col), (param,))

I had never thought about it, I really agree.

Of course, the "proper" way, with %s-substitution for e.g. table
names and ? for parameters is also open for SQL injection attacks
if the values in the strings col, table and search_col above are
user input, but since they are plain SQL identifiers, they are much
easier to check than arbitrary search values. You'd probably have
a set of allowed values, and check that the input was in that
set. They are also less likely to come from an untrusted source.
undoubtedly.

As you can see in
http://initd.org/tracker/pysqlite/wiki/PysqliteVersions
you can use pysqlite 1.1 if you want to use the old pysqlite 1
API.

Pysqlite2 is documented here:
http://initd.org/pub/software/pysqlite/doc/usage-guide.html

yes I had already found the docs and noticed there was no explanation
about such a remarkable difference from pysqlite1 and pysqlite2.

After your message, I find even more strange that Magnus' book reported
pysqlite1 examples as Sqlite 3 was a great step forward in my opinion.
But this can not prevent me from saying that it is the best Python book
around in the beginner to intermediate range.
 
?

=?ISO-8859-1?Q?Gerhard_H=E4ring?=

Vittorio said:
[...]
Nonetheless, I was unable to find any documentation about such a
different behaviour between Pysqlite and Pysqlite2; from my beginner
point of view the Pysqlite (Magnus' version) paramstyle looks a better
and more pythonic choice and I don't grasp the Pysqlite2 developers'
intentions deviating from that way.

The reason why pysqlite 0.x/1.x used paramstyle "pyformat", based on
Python string substitution for SQL parameters is that at the time
pysqlite was started, SQLite 2.x did not have any support for parameter
binding. So we had to "fake" it in Python, just like the MySQL interface
does (for the same reasons).

Later SQLite 2.x versions and of course SQLite 3.x supported real bound
parameters and pysqlite2 was developed from scratch to benefit from
them. SQLite 3.x supports both qmark and named paramstyles, so you can
use question marks *or* named parameters:

I've also once written a wrapper using pysqlite 2.x's hooks that allows
you to use the "format" paramstyle with pysqlite 2.x, so you can reuse
more code that was originally written against pysqlite 0.x/1.x:

from pysqlite2 import dbapi2 as sqlite

class PyFormatConnection(sqlite.Connection):
def cursor(self):
return sqlite.Connection.cursor(self, PyFormatCursor)

class PyFormatCursor(sqlite.Cursor):
def execute(self, sql, args=None):
if args:
qmarks = ["?"] * len(args)
sql = sql % tuple(qmarks)
sqlite.Cursor.execute(self, sql, args)
else:
sqlite.Cursor.execute(self, sql)

con = sqlite.connect(":memory:", factory=PyFormatConnection)
cur = con.cursor()
cur.execute("create table test(a, b, c)")
cur.execute("insert into test(a, b, c) values (%s, %s, %s)", ('asdf', 4,
5.2))
cur.execute("select a, b, c from test where c <> %s", (4.27,))
print cur.fetchone()
cur.close()
con.close()
I would be very grateful if someone would cast a light over
Pysqlite/Pysqlite2 discrepancies.

I think about the only place I wrote a bit about the differences was in
the pysqlite 2.0 final announcement:

http://lists.initd.org/pipermail/pysqlite/2005-May/000043.html

-- Gerhard
 
S

Steve Holden

Gerhard said:
Vittorio said:
[...]
Nonetheless, I was unable to find any documentation about such a
different behaviour between Pysqlite and Pysqlite2; from my beginner
point of view the Pysqlite (Magnus' version) paramstyle looks a better
and more pythonic choice and I don't grasp the Pysqlite2 developers'
intentions deviating from that way.


The reason why pysqlite 0.x/1.x used paramstyle "pyformat", based on
Python string substitution for SQL parameters is that at the time
pysqlite was started, SQLite 2.x did not have any support for parameter
binding. So we had to "fake" it in Python, just like the MySQL interface
does (for the same reasons).

Later SQLite 2.x versions and of course SQLite 3.x supported real bound
parameters and pysqlite2 was developed from scratch to benefit from
them. SQLite 3.x supports both qmark and named paramstyles, so you can
use question marks *or* named parameters:

I've also once written a wrapper using pysqlite 2.x's hooks that allows [...]
I would be very grateful if someone would cast a light over
Pysqlite/Pysqlite2 discrepancies.


I think about the only place I wrote a bit about the differences was in
the pysqlite 2.0 final announcement:

http://lists.initd.org/pipermail/pysqlite/2005-May/000043.html
Unfortunately this appears to mean that pysqlite2 isn't fully DB
API-conformant.
Traceback (most recent call last):

Of course, given the module's flexibility it's difficult to know what
you *would* put in paramstyle. I take it mixing different paramstyles in
the same query will fail?

regards
Steve
 
C

carlo

Gerhard Häring ha scritto:
The reason why pysqlite 0.x/1.x used paramstyle "pyformat", based on
Python string substitution for SQL parameters is that at the time
pysqlite was started, SQLite 2.x did not have any support for parameter
binding. So we had to "fake" it in Python, just like the MySQL interface
does (for the same reasons).

Thanks Gerhard for your valuable help.
 
S

Scott David Daniels

Magnus said:
Vittorio wrote:
Using the same symbol for both string substitutions and SQL placeholder
such as pysqlite 1 and the MySQL interface does, is not really a bright
idea in my opinion. Who thinks this is pretty?

sql = "SELECT %s FROM %s WHERE %s = %%s"
cur.execute(sql % (col,table,search_col), (param,))

I think it's less confusing with:

sql = "SELECT %s FROM %s WHERE %s = ?"
cur.execute(sql % (col,table,search_col), (param,))
or you could use:

sql = "SELECT %s FROM %s WHERE %s = %s"
cur.execute(sql % (col,table,search_col, '%s'), (param,))

which I like better, because you don't have to read
extremely carefully for the double-percents.

--Scott David Daniels
(e-mail address removed)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,767
Messages
2,569,570
Members
45,045
Latest member
DRCM

Latest Threads

Top