Psycopg: How to detect row locking?


Alban Hertroys

Good day,

I have a number of threads doing inserts in a table, after which I want
to do a select. This means that it will occur that the row that I want
to select is locked (by the DB). In these cases, I believe I receive an
OperationalError (or is it an InterfaceError?).

Is it possible (and if so - how?) to verify that the exception occurred
because of row locking, so that I can wait and try again?

Or are there better ways to achieve this? I'm not too charmed about
polling loops that may never end.

I'm using python 2.3 with psycopg 1.1.13 and PostgreSQL 7.4.2.

Regards, Alban Hertroys.
 

Jorge Godoy

Alban Hertroys said:
Good day,

I have a number of threads doing inserts in a table, after which I want
to do a select. This means that it will occur that the row that I want
to select is locked (by the DB). In these cases, I believe I receive an
OperationalError (or is it an InterfaceError?).

Is it possible (and if so - how?) to verify that the exception occurred
because of row locking, so that I can wait and try again?

Or are there better ways to achieve this? I'm not too charmed about
polling loops that may never end.

I'm using python 2.3 with psycopg 1.1.13 and PostgreSQL 7.4.2.

Are you manually locking those rows? If so, you can maintain some
structure to keep track of locked rows.

If you are not locking, PostgreSQL uses MVCC where it locks as little as
possible and you are able to select the new data inside the same
transaction and old data outside of it (until it is committed).
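That visibility rule can be sketched in a few lines. The example below is a stand-in using the stdlib sqlite3 module rather than psycopg (SQLite uses locking rather than MVCC, but the DB-API-level behavior described above, where other connections don't see uncommitted rows, comes out the same):

```python
# Visibility of uncommitted data, sketched with the stdlib sqlite3 module
# as a stand-in for psycopg/PostgreSQL: another connection sees the old
# state of the table until the writer commits.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE t (id INTEGER)")
writer.commit()

reader = sqlite3.connect(path)

# INSERT opens an implicit transaction on `writer`; no COMMIT yet.
writer.execute("INSERT INTO t VALUES (1)")

# Another connection still sees the old (empty) state of the table.
before = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

writer.commit()

# Only after COMMIT does the new row become visible elsewhere.
after = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

print(before, after)  # -> 0 1
```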
 

Alban Hertroys

Jorge said:
Are you manually locking those rows? If so, you can maintain some
structure to keep track of locked rows.

No, I don't. That may be a solution, though. Then at least I *know* when
a record is locked... But it's going to complicate things, if the
transaction is split among multiple threads... I'm not looking forward
to that.

I think the rows are locked because the inserts haven't finished
inserting yet. The select takes place in the same session AFAIK, but not
in the same thread of my Python application. I'm probably looking at a
race condition here... (Ain't multithreading fun...)

I'm also not sure whether I'm actually looking at the same transaction.
Is there a way to verify such?

I do know for certain that all transactions use the same database
connection (I pass it along in a context object, together with config
settings and debugging methods). And I'm also quite sure that it doesn't
commit in between.
If you are not locking, PostgreSQL uses MVCC where it locks as little as
possible and you are able to select the new data inside the same
transaction and old data outside of it (until it is committed).

I suppose there must be a short while where the row is locked during the
insert, where I may already be trying to select it. If this is indeed
the case, I would expect to receive a "row is locked" type of error.

Alternatively, the select may be waiting (w/in psql) until the insert
finished (which should be pretty soon in all cases[*]), but that depends
on implementations beyond my reach. Not that that matters, I shouldn't
have this problem in that case.


[*] Unless you break your database with triggers that lock up or
something similar. That could be a reason for the PostgreSQL team to not
let select wait until an insert on the same row finished, but to
return an error instead.
 

Jorge Godoy

Alban Hertroys said:
No, I don't. That may be a solution, though. Then at least I *know* when
a record is locked... But it's going to complicate things, if the
transaction is split among multiple threads... I'm not looking forward
to that.

:) I don't think that locking is a solution... The MVCC approach
sounds much better to me.
I think the rows are locked because the inserts haven't finished
inserting yet. The select takes place in the same session AFAIK, but not
in the same thread of my Python application. I'm probably looking at a
race condition here... (Ain't multithreading fun...)

If there's no data then a SELECT would return nothing at all, without
any error; after all, you're querying the database for some information
that doesn't exist.
I'm also not sure whether I'm actually looking at the same transaction.
Is there a way to verify such?

You have to look at psycopg's docs. I use pyPgSQL here.
I do know for certain that all transactions use the same database
connection (I pass it along in a context object, together with config
settings and debugging methods). And I'm also quite sure that it doesn't
commit in between.

You would benefit a lot from transactions if you are inserting a lot of
data or if there's some relationship between the data (and constraints and
triggers and ... at the database). The INSERT isn't committed until you
issue a COMMIT. Any SELECT before that will return nothing. If you're
not using transactions, then you are hitting the disk for each and every
command. I've made some tests here and the difference goes from a few
seconds (with transactions) to several minutes (without transactions)
for a 65k-row insert on an old project we did.
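The batching pattern described above looks like this in DB-API terms. Again sqlite3 stands in for psycopg so the sketch runs without a server; the point is that all the inserts share one transaction and there is exactly one COMMIT:

```python
# Batching inserts in a single transaction, sketched with the stdlib
# sqlite3 module (the same DB-API pattern applies to psycopg). Committing
# once per batch avoids paying a durable disk write per statement, which
# is where the seconds-vs-minutes difference comes from.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (n INTEGER)")

rows = [(i,) for i in range(65000)]

# All 65k inserts run inside one implicit transaction...
conn.executemany("INSERT INTO items VALUES (?)", rows)
# ...and are made durable only once, here.
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # -> 65000
```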
I suppose there must be a short while where the row is locked during the
insert, where I may already be trying to select it. If this is indeed
the case, I would expect to receive a "row is locked" type of error.

There's no need for such an error. There's no data there, so the SELECT
returns nothing. If there's data and you're updating it, then until you
commit the transaction you get the old values.
Alternatively, the select may be waiting (w/in psql) until the insert
finished (which should be pretty soon in all cases[*]), but that depends
on implementations beyond my reach. Not that that matters, I shouldn't
have this problem in that case.


[*] Unless you break your database with triggers that lock up or
something similar. That could be a reason for the PostgreSQL team to not
let select wait until an insert on the same row finished, but to
return an error instead.

They use MVCC: Multi-Version Concurrency Control. You might want to
read about it:

http://www.linuxgazette.com/issue68/mitchell.html
http://www.developer.com/open/article.php/877181


Be seeing you,
 

Steve Holden

Alban said:
No, I don't. That may be a solution, though. Then at least I *know* when
a record is locked... But it's going to complicate things, if the
transaction is split among multiple threads... I'm not looking forward
to that.

I think the rows are locked because the inserts haven't finished
inserting yet. The select takes place in the same session AFAIK, but not
in the same thread of my Python application. I'm probably looking at a
race condition here... (Ain't multithreading fun...)

I'm also not sure whether I'm actually looking at the same transaction.
Is there a way to verify such?
If all threads are using the same database connection then you need to
make sure that psycopg is thread-safe, which some database modules are
and some aren't.
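One way to check this is the module-level `threadsafety` flag that DBAPI 2.0 requires every driver to expose (0 = threads may not share the module, 1 = may share the module but not connections, 2 = may share connections, 3 = may share cursors). Shown below with the stdlib sqlite3 module, since psycopg may not be installed; psycopg exposes the same attribute:

```python
# DB-API 2.0 `threadsafety` check, demonstrated with the stdlib sqlite3
# module as a stand-in; the same attribute exists on any compliant
# driver, including psycopg.
import sqlite3

print(sqlite3.threadsafety)

# A defensive check before handing one connection to many threads:
if sqlite3.threadsafety < 2:
    print("connections must not be shared between threads")
```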
I do know for certain that all transactions use the same database
connection (I pass it along in a context object, together with config
settings and debugging methods). And I'm also quite sure that it doesn't
commit in between.
If all threads are using the same database connection, even if you
create multiple cursors, then you shouldn't have any locking issues
because all threads are part of the same transaction.
I suppose there must be a short while where the row is locked during the
insert, where I may already be trying to select it. If this is indeed
the case, I would expect to receive a "row is locked" type of error.
This will perhaps depend on the isolation level you've selected. I'm
sorry, I'm not a PostgreSQL user so I don't know the detail of what's
available, but in many databases you can determine whether transactional
changes are visible from other connections.
Alternatively, the select may be waiting (w/in psql) until the insert
finished (which should be pretty soon in all cases[*]), but that depends
on implementations beyond my reach. Not that that matters, I shouldn't
have this problem in that case.


[*] Unless you break your database with triggers that lock up or
something similar. That could be a reason for the PostgreSQL team to not
let select wait until an insert on the same row finished, but to return
an error instead.

regards
Steve
 

Jorge Godoy

Steve Holden said:
If all threads are using the same database connection, even if you
create multiple cursors, then you shouldn't have any locking issues
because all threads are part of the same transaction.

I think that there is one transaction per cursor and not per
connection. Is it really like that, or is there one transaction per
connection?


Be seeing you,
 

Istvan Albert

Alban said:
I'm also not sure whether I'm actually looking at the same transaction.
Is there a way to verify such?

Are you using the same connection across threads or the
same cursor?

Transactions happen per cursor. They get committed when you call
commit() on the cursor (and you probably should not share the cursor
across threads).
I suppose there must be a short while where the row is locked during the
insert, where I may already be trying to select it. If this is indeed
the case, I would expect to receive a "row is locked" type of error.

It makes no sense to return a "row is locked" type of error. A database is
supposed to take care of business. It will either return the new view
or the old view of the data depending on whether the new data
has been committed or not. Assuring the data was committed in
time (before selecting it) is your responsibility.

Not so long ago there was a similar post. In both cases, the posters were
using a multithreaded database system (PostgreSQL) within a multithreaded
program, yet at the same time desperately trying to fully control the
database threads from the program threads, both hoping that transactions
would save the day. But that is not what transactions are about.
Transactions are simply a way to ensure that a series of database
instructions (within a thread) either all execute or none of them do. One
can always try to coerce them into doing something fancier, but then the
quality of the solution shows it.
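The all-or-nothing behavior described above is easy to demonstrate. A sketch with the stdlib sqlite3 module (the same commit()/rollback() calls apply to psycopg); the table and values are made up for illustration:

```python
# Transactions as all-or-nothing: if any statement in the batch fails,
# rollback() discards everything done since the last commit. sqlite3
# stands in for psycopg here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100)")
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'a'")
    # Duplicate primary key -> IntegrityError; the UPDATE above must not
    # survive on its own.
    conn.execute("INSERT INTO accounts VALUES ('a', 0)")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'a'").fetchone()[0]
print(balance)  # -> 100
```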

Istvan.
 

Alban Hertroys

Jorge said:
If there's no data then a SELECT would return nothing at all, without
any error; after all, you're querying the database for some information
that doesn't exist.

You're right. I thought I had finally found the problem, but apparently
I'll have to look further :(
You have to look at psycopg's docs. I use pyPgSQL here.

I'm afraid that documentation on psycopg is rather scarce. The most
enlightening is the documentation of DBAPI 2.0, but that's not specific
to the psycopg implementation of course.
You would benefit a lot from transactions if you are inserting a lot of
data or if there's some relationship between the data (and constraints and
triggers and ... at the database). The INSERT isn't committed until you
issue a COMMIT. Any SELECT before that will return nothing. If you're
not using transactions, then you are hitting the disk for each and every
command. I've made some tests here and the difference goes from a few
seconds (with transactions) to several minutes (without transactions)
for a 65k-row insert on an old project we did.

Not to mention the possibility to roll back if a query in the transaction
fails. I'm so glad I didn't choose something crude like MySQL ;)
They use MVCC: Multi-Version Concurrency Control. You might want to
read about it:

http://www.linuxgazette.com/issue68/mitchell.html
http://www.developer.com/open/article.php/877181

Actually I did know that, I just forgot to remember ;). I used to work
with Oracle, and it works sort of the same way.

Thanks for the help, I'll have to find a better culprit.

Alban.
 

Alban Hertroys

Alban said:
Actually I did know that, I just forgot to remember ;). I used to work
with Oracle, and it works sort of the same way.

Not _at_ Oracle, mind you. Before I confuse someone.
 

Alban Hertroys

Istvan said:
Are you using the same connection across threads or the
same cursor?

Transactions happen per cursor. They get committed when you call
the commit() on the cursor (and you probably should not share the cursor
across threads).

Ok, that means I'm looking at different transactions. That's an
important distinction...
Not so long ago there was a similar post. In both cases, the posters were
using a multithreaded database system (PostgreSQL) within a multithreaded
program, yet at the same time desperately trying to fully control the
database threads from the program threads, both hoping that transactions
would save the day. But that is not what transactions are about.

One of those posters was probably me... I'm still kind of stuck on the
problem, but now knowing that I'm looking at different transactions I am
pretty sure that I should try a different approach.
Transactions are
simply a way to ensure that a series of database instructions
(within a thread) either all execute or none of them do. One can
always try to coerce them to do something fancier or more different
but then quality of the solution shows this.

Which would have been the case if I were looking at a single
transaction. But apparently that's not the case. Thank you very much for
your help, now at least I know where I should look.

Alban.
 

Steve Holden

Alban said:
See below.

Ok, that means I'm looking at different transactions. That's an
important distinction...
In that case, psycopg must be non-compliant with DBAPI, because commit()
is a connection method, not a cursor method.

Hence my advice that all cursors were part of the same transaction.
One of those posters was probably me... I'm still kind of stuck on the
problem, but now knowing that I'm looking at different transactions I am
pretty sure that I should try a different approach.
Please make sure that you correctly understand the way psycopg is acting
before you assume that multiple cursors on the same connection are
isolated from each other. You may be right, but if so then psycopg is
definitely not DBAPI compliant.
Which would have been the case if I were looking at a single
transaction. But apparently that's not the case. Thank you very much for
your help, now at least I know where I should look.

Alban.

regards
Steve
 

Jim Hefferon

Steve Holden said:
Please make sure that you correctly understand the way psycopg is acting
before you assume that multiple cursors on the same connection are
isolated from each other. You may be right, but if so then psycopg is
definitely not DBAPI compliant.
This is from the README for the (reasonably fresh) version of psycopg that
I happen to have (I couldn't find a link, so I'm pasting):

Extensions to the Python DBAPI-2.0
----------------------------------

psycopg offers some little extensions on the Python DBAPI-2.0. Note that the
extensions do not make psycopg incompatible and you can still use it without
ever knowing the extensions are here.

The DBAPI-2.0 mandates that cursors derived from the same connection are not
isolated, i.e., changes done to the database by one of them should be
immediately visible to all the others. This is done by serializing the queries
on the same physical connection to the database (PGconn struct in C.)
Serializing queries when network latency is high (and network speed is
low) dramatically lowers performance, so it is possible to put a connection
into non-serialized mode, by calling the .serialize() method giving it a
0-value argument or by creating a connection using the following code:

conn = psycopg.connect("dbname=...", serialize=0)

After that every cursor will get its own physical connection to the database
and multiple threads will go at full speed. Note that this feature makes the
new cursors non-compliant with respect to the DBAPI-2.0.

The main extension is that we support (on non-serialized cursors) per-cursor
commits. If you do a commit() on the connection all the changes on all the
cursors derived from that connection are committed to the database (in random
order, so take care.) But you can also call commit() on a single cursor
to commit just the operations done on that cursor. Pretty nice.

Note that you *do have* to call .commit() on the cursors or on the connection
if you want to change your database. Note also that you *do have* to call
commit() on a cursor even before a SELECT if you want to see the changes
made by other threads to the database.

So, although you can turn it off, by default (and with some justification) the
behavior is non-compliant.

Jim
 

Cliff Wells

Good day,

I have a number of threads doing inserts in a table, after which I want
to do a select. This means that it will occur that the row that I want
to select is locked (by the DB). In these cases, I believe I receive an
OperationalError (or is it an InterfaceError?).

Is it possible (and if so - how?) to verify that the exception occurred
because of row locking, so that I can wait and try again?

I've seen this exception and frankly, it has always occurred due to row
locking. Simply retry the query until it completes.
Or are there better ways to achieve this? I'm not too charmed about
polling loops that may never end.

I've written multithreaded apps with hundreds of threads that used the
retry technique and never had a problem. In fact, I believe Zope uses
this method.
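The retry technique described above can be sketched as follows. psycopg's real OperationalError would go where FakeOperationalError is; the fake exception class and the flaky_query function are stand-ins so the sketch runs without a database:

```python
# A retry loop for queries that fail transiently. FakeOperationalError
# and flaky_query are stand-ins for psycopg's OperationalError and a
# real query; swap them in for actual use.
import time

class FakeOperationalError(Exception):
    """Stand-in for psycopg's OperationalError."""

attempts = {"n": 0}

def flaky_query():
    # Fails twice (as if the row were locked), then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeOperationalError("row is locked")
    return [("some", "row")]

def retry(query, retries=10, delay=0.01):
    """Retry `query` until it completes or `retries` is exhausted."""
    for attempt in range(retries):
        try:
            return query()
        except FakeOperationalError:
            if attempt == retries - 1:
                raise  # give up rather than poll forever
            time.sleep(delay)  # back off before trying again

rows = retry(flaky_query)
print(rows)           # -> [('some', 'row')]
print(attempts["n"])  # -> 3
```

Bounding the number of retries (rather than looping forever) addresses the original poster's worry about polling loops that never end.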
 
