DB API 2.0 and transactions

  • Thread starter Christopher J. Bottaro

Christopher J. Bottaro

Hi,
Why is there no support for explicit transactions in the DB API? I mean
like transaction() to start the trans and commit() and rollback() would end
the trans or something.

The reason why I ask is because I wrote a daemon that interacts with a
Postgres DB. The value of CURRENT_TIMESTAMP according to Postgres is NOT
the actual walltime, but the walltime when the current transaction started.

This gets weird when using the Python DB API to interact with Postgres
because a transaction gets started in 3 places: connection, commit,
rollback.

So consider the daemon:

[code]
# pseudo code
connect                                  # begin trans at 12:00
sleep waiting                            # let's say 15 mins
wake up
put entry in db using CURRENT_TIMESTAMP  # oops
[/code]

Oops, the CURRENT_TIMESTAMP evaluates to 12:00, not 12:15.

Now I know there are ways around this...
1) In Postgres, you can get the walltime and cast it to a timestamp.
2) In Python, you can just do an empty commit in order to "manually" start
a new transaction.
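Workaround 2 can be sketched as a toy model (this is a simulation for illustration, not the real driver; `ToyConnection` is an invented name): if the driver freezes CURRENT_TIMESTAMP at transaction start and immediately opens a new transaction after every commit, then an empty commit right before the INSERT re-freezes the timestamp at the current walltime.

```python
import time

class ToyConnection:
    """Toy model of a driver that opens a transaction eagerly (illustration only)."""

    def __init__(self):
        self._begin()                 # transaction opened at connect time

    def _begin(self):
        self.txn_start = time.time()  # CURRENT_TIMESTAMP freezes here

    def commit(self):
        self._begin()                 # COMMIT is immediately followed by a new BEGIN

    def current_timestamp(self):
        return self.txn_start         # frozen at transaction start

conn = ToyConnection()
stale = conn.current_timestamp()
time.sleep(0.2)                            # the daemon sleeps...
assert conn.current_timestamp() == stale   # still the connect-time value
conn.commit()                              # empty commit: fresh transaction
assert conn.current_timestamp() > stale    # CURRENT_TIMESTAMP is "now" again
```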

I just think it's a bit weird, because this bit me in the butt for quite a
while, and it didn't happen when doing the same thing in other languages.

Anyone have any opinions on this?
 

vincent wehren

| Hi,
| Why is there no support for explicit transactions in the DB API? I mean
| like transaction() to start the trans and commit() and rollback() would
| end the trans or something.
|
| [...]
|
| Anyone have any opinions on this?

The described behavior seems to be totally in sync with expectations
(both in accordance with the PostgreSQL docs and the DB-API).

These "other languages" *must* be doing something wrong! ;)

(Auto-commit set to "on", perhaps?)

Regards,

--
Vincent Wehren
 

flyingfred0

I've found it's occasionally necessary to go one level deeper and use
the _pg module directly so you are in control of your transactions
(instead of letting pgdb automatically put you in one).
 

Magnus Lycka

You might have spotted a fairly nasty bug there!
> Hi,
> Why is there no support for explicit transactions in the DB API? I mean
> like transaction() to start the trans and commit() and rollback() would end
> the trans or something.

To quote from Date & Darwen "A Guide to the SQL Standard, 4th ed.":

"An SQL-transaction is initiated when the relevant SQL-agent executes
a 'transaction-initiating' SQL statement (...) and the SQL-agent does
not already have an SQL transaction in progress. Note, therefore, that
(...) SQL-transactions can't be nested. Note too that transaction
initiation is always implicit--there is no explicit 'BEGIN TRANSACTION'
statement."

The Python DB-API standard matches the SQL standard, and that seems
reasonable.
> This gets weird when using the Python DB API to interact with Postgres
> because a transaction gets started in 3 places: connection, commit,
> rollback.

That's not how it's supposed to work! Are you sure that you don't
implicitly start transactions by SELECTs etc?

PostgreSQL violates the SQL standards by running in autocommit mode
unless you explicitly perform its non-standard BEGIN command. If you
are right about the behaviour you describe, the PostgreSQL binding
for Python that you use may have taken the easy route, and performs
a "BEGIN" on connect and after every commit or rollback.

If so, this is a serious bug, and should be reported as one. The correct
thing to do is to insert the BEGIN just before the first SQL statement
that is affecting transactions. Of course, this means that the binding
needs to keep track of transaction state, and this makes it a little
bit more complicated. You'd need something like this in the binding:

class connection:
    def __init__(...):
        ...
        self.inTxn = False

    def commit(...):
        ...
        self.inTxn = False

    def rollback(...):
        ...
        self.inTxn = False

    def execute(...):
        ...
        if not self.inTxn:
            perform the BEGIN command against the backend
            self.inTxn = True
        ...

Actually, this isn't perfect either, because not all SQL commands
(should) initiate transactions, but it's a lot closer to what we want.
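The sketch above can be made runnable against a stand-in backend that just records the commands it receives (`FakeBackend` and `LazyConnection` are invented names for this sketch; a real binding would talk to the server instead):

```python
class FakeBackend:
    """Stand-in for the real database link; it just records commands."""

    def __init__(self):
        self.log = []

    def execute(self, sql):
        self.log.append(sql)

class LazyConnection:
    """Issue BEGIN only when the first real statement arrives."""

    def __init__(self, backend):
        self.backend = backend
        self.in_txn = False          # no transaction yet; nothing sent at connect

    def execute(self, sql):
        if not self.in_txn:
            self.backend.execute("BEGIN")
            self.in_txn = True
        self.backend.execute(sql)

    def commit(self):
        self.backend.execute("COMMIT")
        self.in_txn = False          # the next execute() will BEGIN again

    def rollback(self):
        self.backend.execute("ROLLBACK")
        self.in_txn = False

backend = FakeBackend()
conn = LazyConnection(backend)       # note: no BEGIN at connect time
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()                        # and no BEGIN after COMMIT either
assert backend.log == ["BEGIN", "INSERT INTO t VALUES (1)", "COMMIT"]
```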

This bug has implications far beyond timestamps. Imagine two transactions
running with the isolation level set to e.g. serializable. Transaction A
updates the AMOUNT column in various rows of table X, and transaction
B calculates the sum of all AMOUNTs.

Let's say they run over time like this, with | marking begin and >
marking commit (N.B. ASCII art, you need a fixed-width font):

....|--A-->.......|--A-->........
............|-B->.........|-B->..

This works as expected...but imagine transactions implicitly
begin too early:

|-----A-->|---------A-->|-------
|------------B->|----------B->|-

This will cause the aggregations in B to show "delayed" results.
Not at all what one might expect...


For more about isolation levels, see e.g. here:
http://pgsqld.active-venture.com/transaction-iso.html
 

Christopher J. Bottaro

<posted & mailed>

Magnus said:
> You might have spotted a fairly nasty bug there!
> PostgreSQL violates the SQL standards by running in autocommit mode
> unless you explicitly perform its non-standard BEGIN command. If you
> are right about the behaviour you describe, the PostgreSQL binding
> for Python that you use may have taken the easy route, and performs
> a "BEGIN" on connect and after every commit or rollback.

Check this out...

<code>
import pgdb
import time

print time.ctime()
db = pgdb.connect(user='test', host='localhost', database='test')
time.sleep(5)
db.cursor().execute('''insert into time_test (datetime)
                       values (CURRENT_TIMESTAMP)''')
db.commit()
curs = db.cursor()
curs.execute('select datetime from time_test order by datetime desc limit 1')
row = curs.fetchone()
print row[0]
</code>

<output>
Fri Jun 10 17:27:21 2005
'2005-06-10 17:27:21.654897-05'
</output>

Notice the times are exactly the same instead of differing by 5 seconds.

What do you make of that? Some other replies to this thread seemed to
indicate that this is expected and proper behavior.

-- C
 

Magnus Lycka

I'm CC:ing this to D'Arcy J.M. Cain. (See comp.lang.python for prequel
D'Arcy.)
> Check this out...
>
> [code and output snipped -- see previous message]
>
> Notice the times are exactly the same instead of 5 sec difference.
>
> What do you make of that? Some other replies to this thread seemed to
> indicate that this is expected and proper behavior.

This is wrong. It should not behave like that if it is to follow
the SQL standard, which is what *I* would expect and consider proper.

I don't think the SQL standard mandates that all evaluations of
CURRENT_TIMESTAMP within a transaction should be the same. It does
mandate that CURRENT_TIMESTAMP is only evaluated once in each SQL
statement, so "CURRENT_TIMESTAMP=CURRENT_TIMESTAMP" should always be
true in a WHERE clause. I don't think it's a bug if all timestamps
in a transaction are the same though. It's really a bonus if we can
view all of a transaction as taking place at the same time. (A bit
like Piper Halliwell's time-freezing spell in "Charmed".)

The problem is that transactions should never start until the first
transaction-initiating SQL statement takes place. (In SQL-92, all
standard SQL statements are transaction-initiating except CONNECT,
DISCONNECT, COMMIT, ROLLBACK, GET DIAGNOSTICS and most SET commands
(SET DESCRIPTOR is the exception here).) Issuing BEGIN directly after
CONNECT, ROLLBACK and COMMIT is in violation of the SQL standards.

A workaround for you could be to explicitly start a new transaction
before the insert as PostgreSQL (but not the SQL standard) wants you
to do. I suppose you can easily do that using e.g. db.rollback(). If
you like, I guess you could do db.begin=db.rollback in the beginning
of your code and then use db.begin().
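That aliasing trick can be sketched with a stub connection (the stub class is invented here; it only mimics the documented pgdb behavior of issuing a fresh BEGIN after every ROLLBACK):

```python
class StubPgdbConnection:
    """Stub mimicking pgdb: ROLLBACK is always followed by a new BEGIN."""

    def __init__(self):
        self.log = ["BEGIN"]               # pgdb begins at connect time

    def rollback(self):
        self.log += ["ROLLBACK", "BEGIN"]  # discard work, start a fresh txn

db = StubPgdbConnection()
db.begin = db.rollback   # the alias: "begin" now starts a fresh transaction
db.begin()               # right before the INSERT: re-freezes CURRENT_TIMESTAMP
assert db.log == ["BEGIN", "ROLLBACK", "BEGIN"]
```

Since the ROLLBACK discards any pending work, this alias is only safe when the connection has no uncommitted changes, which is exactly the sleeping-daemon situation described above.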

Another option would be to investigate if any of the other PostgreSQL
drivers have a more correct behaviour. The non-standard behaviour that
you describe is obvious from the pgdb source. See:
http://www.pygresql.org/cvsweb.cgi/pygresql/module/pgdb.py?rev=1.27
(Comments added by me.)
class pgdbCnx:

    def __init__(self, cnx):
        self.__cnx = cnx
        self.__cache = pgdbTypeCache(cnx)
        try:
            src = self.__cnx.source()
            src.execute("BEGIN")  # Ouch!
        except:
            raise OperationalError, "invalid connection."

    ...

    def commit(self):
        try:
            src = self.__cnx.source()
            src.execute("COMMIT")
            src.execute("BEGIN")  # Ouch!
        except:
            raise OperationalError, "can't commit."

    def rollback(self):
        try:
            src = self.__cnx.source()
            src.execute("ROLLBACK")
            src.execute("BEGIN")  # Ouch!
        except:
            raise OperationalError, "can't rollback."

    ...

This should be changed to something like this (untested):

class pgdbCnx:

    def __init__(self, cnx):
        self.__cnx = cnx
        self.__cache = pgdbTypeCache(cnx)
        self.inTxn = False  # New
        try:
            src = self.__cnx.source()  # No BEGIN here
        except:
            raise OperationalError, "invalid connection."

    ...

    def commit(self):
        try:
            src = self.__cnx.source()
            src.execute("COMMIT")
            self.inTxn = False  # Changed
        except:
            raise OperationalError, "can't commit."

    def rollback(self):
        try:
            src = self.__cnx.source()
            src.execute("ROLLBACK")
            self.inTxn = False  # Changed
        except:
            raise OperationalError, "can't rollback."

    ...

    def cursor(self):
        try:
            src = self.__cnx.source()
            return pgdbCursor(src, self.__cache, self)  # Added self
        except:
            raise pgOperationalError, "invalid connection."

    ...

class pgdbCursor:

    def __init__(self, src, cache, conn):  # Added conn
        self.__cache = cache
        self.__source = src
        self.__conn = conn  # New
        self.description = None
        self.rowcount = -1
        self.arraysize = 1
        self.lastrowid = None

    ...
    (execute calls executemany)
    ...

    def executemany(self, operation, param_seq):
        self.description = None
        self.rowcount = -1

        # first try to execute all queries
        totrows = 0
        sql = "INIT"
        try:
            for params in param_seq:
                if params != None:
                    sql = _quoteparams(operation, params)
                else:
                    sql = operation
                if not self.__conn.inTxn:  # Added test
                    self.__source.execute('BEGIN')
                    self.__conn.inTxn = True
                rows = self.__source.execute(sql)
                if rows != None:  # true if __source is NOT a DQL
                    totrows = totrows + rows
                else:
                    self.rowcount = -1

I guess it would be even better if the executemany method checked
that it was really a transaction-initiating SQL statement, but that
makes things a bit slower and more complicated, especially as I
suspect that the driver permits several SQL statements separated
by semicolons in execute and executemany. We really don't want to
add an SQL parser to pgdb. Making all statements transaction-initiating
is at least much closer to standard behaviour than *always* starting
transactions prematurely. I guess it will remove problems like
the one I mentioned earlier (repeated below) in more than 99% of the
cases.

This bug has implications far beyond timestamps. Imagine two
transactions running with the isolation level set to e.g. serializable.
Transaction A updates the AMOUNT column in various rows of table
X, and transaction B calculates the sum of all AMOUNTs in X.

Let's say they run over time like this, with | marking transaction
start and > marking commit (N.B. ASCII art follows, you need a
fixed-width font to view this):

....|--A-->.......|--A-->........
............|-B->.........|-B->..

This works as expected... The first B-transaction sums up AMOUNTs
after the first A-transaction is done etc, but imagine what happens
if transactions implicitly begin too early as with the current pgdb:

|-----A-->|---------A-->|-------
|------------B->|----------B->|-

This will cause B1 to sum up AMOUNTs before A1, and B2 will sum up
AMOUNTs after A1, not after A2.
 

Stuart Bishop


Magnus said:
> Another option would be to investigate if any of the other PostgreSQL
> drivers have a more correct behaviour. The non-standard behaviour that
> you describe is obvious from the pgdb source. See:
> http://www.pygresql.org/cvsweb.cgi/pygresql/module/pgdb.py?rev=1.27

FWIW, psycopg 1.1.18 has the correct behavior (as usual).

I can add this test to the DB-API 2.0 Anal Compliance Test Suite if we want,
although it sounds like it is only an issue with PostgreSQL drivers.

--
Stuart Bishop <[email protected]>
http://www.stuartbishop.net/
 
