psycopg, transactions and multiple cursors

A

Alban Hertroys

Oh no! It's me and transactions again :)

I'm not really sure whether this is a limitation of psycopg or
postgresql. When I use multiple cursors in a transaction, the records
inserted at the start of the transaction aren't visible to those later
on in that transaction (using a different cursor).

Attached is a simplified example (the except's are a bit blunt, I know)
of what I'm trying to do. In reality, the different cursors are
sometimes run in differeent threads, or they're nested, which is why I
can't use the same cursor for all queries.

Am I doing something silly again or is this a bug (in what?)? Does
anybody here have some ideas about a solution for this problem? What is
your credit card number (No, don't answer that!)? What will I learn this
time?

Regards,

Alban Hertroys,
MAG Productions.
 
A

Alban Hertroys

Alban said:
Attached is a simplified example (the except's are a bit blunt, I know)
of what I'm trying to do. In reality, the different cursors are
sometimes run in differeent threads, or they're nested, which is why I
can't use the same cursor for all queries.

I suppose the actual error message would be useful too (Doh!):
> python transaction.py
{'id': 1L}
Traceback (most recent call last):
File "transaction.py", line 40, in ?
cursor2.execute("INSERT INTO y (x_id) VALUES (%s)" % row["id"])
psycopg.IntegrityError: ERROR: insert or update on table "y" violates
foreign key constraint "$1"
DETAIL: Key (x_id)=(1) is not present in table "x".

INSERT INTO y (x_id) VALUES (1)
 
D

Diez B. Roggisch

psycopg.IntegrityError: ERROR: insert or update on table "y" violates
foreign key constraint "$1"
DETAIL: Key (x_id)=(1) is not present in table "x".

INSERT INTO y (x_id) VALUES (1)

As the error is about integrity violation, its safe to assume that the
problem is postgresql related.

And changes made in one transaction can't be seen in another until they are
commited _and_ the other transaction is opened after the first one is
commited - that makes sense, as otherwise the second transaction could
create references that then are invalidated by rolling back the first
transaction, creating undefined state.

Now looking at your code, one sees that BEGIN is called on the cursor1 - so
the transaction boundaries are around cursor1, thus they are not visible to
the transaction going on in cursor2. From the psycopg site:

"""
Every time a new cursor is created, a new connection does not need to be
opened;
"""

So the cursor2 seems to be using its own connection, thus its own
transaction. This is also in the README for psycopg.

As I don't have psycopg running here, I can't experiment myself, but I think
you should use cursor1 for all of your sql statements that belong to one
transaction.
 
A

Alban Hertroys

Diez said:
As the error is about integrity violation, its safe to assume that the
problem is postgresql related.

Well, actually it merely shows that the queries aren't in the same
transaction, or that there was never a transaction to begin with. That's
what I hoped to illustrate with this example; Apparently, I could have
been more clear...
And changes made in one transaction can't be seen in another until they are
commited _and_ the other transaction is opened after the first one is
commited - that makes sense, as otherwise the second transaction could
create references that then are invalidated by rolling back the first
transaction, creating undefined state.

Of course, that's on of the reasons to have transactions; to make sure
that 'other' users don't see data that cannot be guaranteed to be valid yet.
Now looking at your code, one sees that BEGIN is called on the cursor1 - so
the transaction boundaries are around cursor1, thus they are not visible to

AFAIK, you can't open a transaction w/o using a cursor; You need a query
that says "BEGIN;".
You can commit a connection object, however. It would be nice to be able
to start a transaction on a connection object, but then you still could
only have one transaction per connection... :(

This would make it impossible to have nested loops of queries inside a
transaction (the cursor gets repositioned in the inner fetch of the loop
and the outer fetch looses track of which record it was positioned at).

Maybe the transaction shouldn't be closed until the cursor is closed,
even if other cursors are opened (in the same transaction) before that
happens.
Aside from that, it turns out that the first query gets committed, even
though the transaction should have failed. This probably just indicates
that the transaction indeed is closed too soon, however, I didn't tell
it to commit. Shouldn't it issue a rollback instead? [This would
probably confuse people, but is it correct to do otherwise? Shouldn't it
rollback if you don't commit?]

I know PostgreSQL can do transactions on nested cursors, I have used
that now and then in stored procedures (pl/pgsql).
> So the cursor2 seems to be using its own connection, thus its own
transaction. This is also in the README for psycopg.

So that's why I couldn't find the documentation... (Though the DBAPI 2.0
document linked from the Python site is rather helpful).
It would probably be nice to have those documents on the psycopg site
somewhere... (Or if they are, put clear pointers to them from the main
site - I've never been able to find them).
As I don't have psycopg running here, I can't experiment myself, but I think
you should use cursor1 for all of your sql statements that belong to one
transaction.

I cross my fingers that such may not be necessary.

Thanks for helping.
 
M

Michel Claveau - abstraction méta-galactique non t

Bonjour !
Hi !

Très intéressante observation.
Very interesting observation.

Mais est-ce bien un problème Python ?
But is this well a Python problem?

@-salutations
 
A

Alban Hertroys

Michel Claveau - abstraction méta-galactique non triviale en fuite
perpétuelle. said:
Mais est-ce bien un problème Python ?
But is this well a Python problem?

It is probably a problem with psycopg (or with PostgreSQL, for which it
is a Python module). So, I suppose it is, but I lack the information to
be certain.
 
D

Diez B. Roggisch

AFAIK, you can't open a transaction w/o using a cursor; You need a query
that says "BEGIN;".
You can commit a connection object, however. It would be nice to be able
to start a transaction on a connection object, but then you still could
only have one transaction per connection... :(

thats actually the case for all DBs I know including e.g. oracle with jdbc -
so the abstraction layers usually use connection pooling to speed up
opening a connection thus the app doesn't suffer so much.

Aside from that, it turns out that the first query gets committed, even
though the transaction should have failed. This probably just indicates
that the transaction indeed is closed too soon, however, I didn't tell
it to commit. Shouldn't it issue a rollback instead? [This would
probably confuse people, but is it correct to do otherwise? Shouldn't it
rollback if you don't commit?]

No idea why thats happening - on jdbc, one can set an "autocommit"
connection property that will do exactly that: enforce a commit if a
statement was successful. Maybe psycopg has that too?
I know PostgreSQL can do transactions on nested cursors, I have used
that now and then in stored procedures (pl/pgsql).

I didn't find much on nested cursors on google, so I don't know how they
work - but I assume if they are part of psycopg, they somehow have to be
created using an existing cursor, as otherwise how should psycopg know that
what you want is a nested and not a new cursor.

So is there something on cursor objecst to get a new cursor, or at least the
connection so you can get a cursor on the very same connection?
 
A

Alban Hertroys

Diez said:
thats actually the case for all DBs I know including e.g. oracle with jdbc -
so the abstraction layers usually use connection pooling to speed up
opening a connection thus the app doesn't suffer so much.

Ok, that means I won't get away with a single connection object (unless
psycopg puts a connection pool in a single connection object).
No idea why thats happening - on jdbc, one can set an "autocommit"
connection property that will do exactly that: enforce a commit if a
statement was successful. Maybe psycopg has that too?

Yes, it does have autoCommit, and thankfully it can be turned off (which
I did, of course).
I didn't find much on nested cursors on google, so I don't know how they
work - but I assume if they are part of psycopg, they somehow have to be
created using an existing cursor, as otherwise how should psycopg know that
what you want is a nested and not a new cursor.

Actually, nesting of cursors is something that PL/PgSQL can do. And so
can PL/SQL in Oracle.
It's something that's possible on a low database API level, and (to my
understanding) the DBAPI 2.0 uses them for queries. It's one of Pythons'
advantages over eg. PHP, and one of the reasons I chose to use Python
for this project.
So is there something on cursor objecst to get a new cursor, or at least the
connection so you can get a cursor on the very same connection?

A cursor is comparable to an iterator over a result set (where the
cursor fetches one record from the database at a time).

You use one for every query, and it can often be reused for the next
query as well.

However, if you loop through a result set (with a cursor) and you need
to do a query based on the record that the cursor is currently
positioned at, you can't use the same cursor again for that query, but
need to open a new cursor. That's what I meant by "nesting cursors".

That shouldn't matter for the state of a transaction, though...

Maybe there's a difference between database level cursors and DB API 2.0
level cursors in Python?

The number of questions is increasing...
 
D

Diez B. Roggisch

I just found the commit.py example in psycopg's docs (on debian its
installed automatically, but it should be in the source distribution ) - it
shows that what you want is possible. From what I see, it lacks the BEGIN
statement on a cursor, and commits on the connection - so maybe you should
ommit that statement and see what happens.
 
S

Steve Holden

Yes, most database connections will generate an implicit transaction the
first time a change is made to the database (in the absence of autocommit).
Ok, that means I won't get away with a single connection object (unless
psycopg puts a connection pool in a single connection object).



Yes, it does have autoCommit, and thankfully it can be turned off (which
I did, of course).
It certainly isn't too useful if you occasionally need to roll things
back. Structural changes to the database will frequently cause an
automatic commit anyway, though - you aren't modifying the database
structure at all, I take it?
Actually, nesting of cursors is something that PL/PgSQL can do. And so
can PL/SQL in Oracle.
It's something that's possible on a low database API level, and (to my
understanding) the DBAPI 2.0 uses them for queries. It's one of Pythons'
advantages over eg. PHP, and one of the reasons I chose to use Python
for this project.
Nested cursors aren;t nested transcations, though, right?
A cursor is comparable to an iterator over a result set (where the
cursor fetches one record from the database at a time).
B e careful that you don;t confuse the DB API curosrs with the cursors
you get with DECLARE CURSOR in PL/SQL, for example. The two aren;t
necessarily the same thing (and I've always felt that "cursor" was, for
that reason, not the best possible terminological choice for the DB API).
You use one for every query, and it can often be reused for the next
query as well.

However, if you loop through a result set (with a cursor) and you need
to do a query based on the record that the cursor is currently
positioned at, you can't use the same cursor again for that query, but
need to open a new cursor. That's what I meant by "nesting cursors".
Bear in mind, though, that it will often be *much* more efficient to do
a fetchall() from the cursor and iterate over that result. This
typically avoids many round-trips by fetching all the data at once,
though it's less practical if data sets become huge.

There's sometimes a middle ground to be found with repeated calls to
fetchmany().

In that way the curosr can be reused with impunity once the data has
been fetched.
That shouldn't matter for the state of a transaction, though...

Maybe there's a difference between database level cursors and DB API 2.0
level cursors in Python?
As I mentioned above, there often is.

I think we've already agreed that the psycopg cursors aren't DB API
compliant anyway, precisely because of the way they handle transactions.
An API-compliant library shares transaction state across all cursors
generated from the same connection, which (IMHO) gives the flexibility
one needs to to handle complex database interactions.
The number of questions is increasing...

Well, the number of answers is, too, but it seems to me you *are* kind
of wanting to have your cake and eat it. In previous threads you've
suggested that the psycopg cursor behavior is what you want, but now
that very behavior might (?) be biting you.

Anyway, you couldn't be talking to a better bunch of guys to try and
solve this problem. c.l.py is sometimes persistent beyond all reasonable
limits. Good luck!

regards
Steve
 
A

Alban Hertroys

Steve said:
Yes, most database connections will generate an implicit transaction the
first time a change is made to the database (in the absence of autocommit).

I quoted you on this in the class-file that is (supposedly) going to
solve my problems.
It certainly isn't too useful if you occasionally need to roll things
back. Structural changes to the database will frequently cause an
automatic commit anyway, though - you aren't modifying the database
structure at all, I take it?

Oh no, there's rarely a need for that and it tends to get you into trouble.
For example, I know a CMS (not Python related, sorry) that creates
database tables for classes it uses for various site objects, but it
apparently doesn't check for the class name being a reserved SQL
keyword. Oops...
Nested cursors aren;t nested transcations, though, right?

Indeed. My intention is actually to keep them in the same transaction.
B e careful that you don;t confuse the DB API curosrs with the cursors
you get with DECLARE CURSOR in PL/SQL, for example. The two aren;t
necessarily the same thing (and I've always felt that "cursor" was, for
that reason, not the best possible terminological choice for the DB API).

PHP uses 'resource identifier' or something similar. I can't say it's
much clearer, but you won't confuse it with cursors. OTOH, a connection
in PHP is also a 'resource identifier'.
Bear in mind, though, that it will often be *much* more efficient to do
a fetchall() from the cursor and iterate over that result. This
typically avoids many round-trips by fetching all the data at once,
though it's less practical if data sets become huge.

There's sometimes a middle ground to be found with repeated calls to
fetchmany().

In that way the curosr can be reused with impunity once the data has
been fetched.

True enough, and I don't use nested cursors that way in my Python code.
It is something I do frequently in PL/SQL, though, so I suppose I used
that as an example.
In Python the problem is different, I used a bad example.

The reason I use multiple cursors in Python is not so much that I nest
them, but that they are often in a local scope (in a method, a class, etc.).
As the cursor has to execute a different query in each scope anyway, I
don't think it really matters whether I go through loops to reuse the
previous cursor object (risking entering a nested loop unknowingly) or
create a new one.
The examples I encountered do the same thing, so I suppose it's alright.
I think we've already agreed that the psycopg cursors aren't DB API
compliant anyway, precisely because of the way they handle transactions.
An API-compliant library shares transaction state across all cursors
generated from the same connection, which (IMHO) gives the flexibility
one needs to to handle complex database interactions.
Indeed.


Well, the number of answers is, too, but it seems to me you *are* kind
of wanting to have your cake and eat it. In previous threads you've

I'm not familiar with that expression, but I suppose I would change that
to: "I baked my own cake, and though it doesn't taste as well as I
intended, I'm trying to eat it anyway".
It means a lot of work if some basic assumptions in a large project turn
out to be wrong... I was hoping to get it working the way it is, but it
seems that the "one connection" idea is not possible. Too bad...
suggested that the psycopg cursor behavior is what you want, but now
that very behavior might (?) be biting you.

I am indeed starting to have my doubts about the usefulness of being
able to commit a cursor. If it doesn't do anything to a transaction, or
if you can't start a new transaction using a cursor after that commit
(I'm not yet certain which of the two happens, if it isn't a third
option), then it seems kind of pointless and confusing.

The possibility suggested (to me at least) that it would be possible to
do multiple (maybe even parallel) transactions using only one database
connection, by making the cursors handle the transactions. Alas, it's
not so.
Anyway, you couldn't be talking to a better bunch of guys to try and
solve this problem. c.l.py is sometimes persistent beyond all reasonable
limits. Good luck!

You guys are great indeed! I'm learning things that I thought I knew
already. And you're quick too, all of my complicated questions so far
have been answered/solved within a day or so.

But I have to admit, the complicated questions are usually the most fun ;)

Thanks for all the help, it's much appreciated.

Alban.
 
S

Stuart Bishop

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alban Hertroys wrote:
| Oh no! It's me and transactions again :)
|
| I'm not really sure whether this is a limitation of psycopg or
| postgresql. When I use multiple cursors in a transaction, the records
| inserted at the start of the transaction aren't visible to those later
| on in that transaction (using a different cursor).

If you didn't get the answers you want in this thread, there is a
psycopg specific mailing list available at
http://lists.initd.org/mailman/listinfo/psycopg where you can talk
directly with the developers and other users.

- --
Stuart Bishop <[email protected]>
http://www.stuartbishop.net/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBmFxvAfqZj7rGN0oRAqQAAJ9QMWjVc2B/ZtjAQpxjI24J+AKvEQCdHiqb
CXSeHZYRg7TwNnALmyiywDs=
=EdkG
-----END PGP SIGNATURE-----
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,905
Latest member
Kristy_Poole

Latest Threads

Top