Psycopg and threads problem


Alban Hertroys

Hello,

I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

Is there a way to tell psycopg or python to wait until the insert took
place?

Alban.
 

Cliff Wells

Alban Hertroys said:
Hello,

I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

Is there a way to tell psycopg or python to wait until the insert took
place?

I believe turning autocommit on in the connection would achieve this (at
the expense of performance and transaction support). You could also add
a "commit" SQL statement to each thread.

Why not have the threads notify the collecting thread that it's okay to
start working? If you have to wait until the data is completely
inserted anyway, have your final thread perform the "commit" and then
set a threading.Event or some such.
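A minimal sketch of the Event approach. The actual psycopg INSERT and commit calls are replaced by placeholder comments here; the thread layout is an assumption, not taken from the original code:

```python
import threading

inserts_committed = threading.Event()
rows = []

def insert_worker(batch):
    # ... cursor.execute("INSERT ...") for each record in batch ...
    rows.extend(batch)

def final_worker(insert_threads):
    for t in insert_threads:
        t.join()                  # wait for every inserting thread
    # conn.commit() would go here, once all inserts are in
    inserts_committed.set()       # signal the collecting thread

insert_threads = [threading.Thread(target=insert_worker, args=([n],))
                  for n in range(4)]
for t in insert_threads:
    t.start()
threading.Thread(target=final_worker, args=(insert_threads,)).start()

# The collecting thread blocks here until the commit has happened.
inserts_committed.wait()
print(sorted(rows))               # → [0, 1, 2, 3]
```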

Regards,
Cliff
 

Michael Fuhr

Alban Hertroys said:
I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

I just wrote a small test program and was unable to duplicate your
problem -- the inserting threads ran, then the collecting thread
saw the data.

Without seeing your code we can only guess at what's wrong. Please
post the simplest program that reproduces the problem.

What versions of Python, psycopg, and PostgreSQL are you using?
What operating system and version are you running on?
Is there a way to tell psycopg or python to wait until the insert took
place?

In my test program, the inserting threads called conn.commit() and
then exited. I then joined all of the inserting threads before
starting the collecting thread. You could also use a condition
variable to signal when the inserts have been committed.
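The commit-then-join pattern described above can be sketched like this; the per-thread connection and commit are placeholder comments, not real psycopg calls:

```python
import threading

inserted = []

def insert_worker(batch):
    # Each thread would open its own connection here, run its INSERTs,
    # and call conn.commit() before returning (DB calls omitted).
    inserted.extend(batch)

threads = [threading.Thread(target=insert_worker, args=([n],))
           for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # every insert thread has committed by the time join() returns

# Only now is the collecting thread started; it sees the committed rows.
print(sorted(inserted))   # → [0, 1, 2]
```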
 

Istvan Albert

Alban said:
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no

Make sure you commit the inserts. Otherwise you might
simply end up selecting on the old view.

There is a commit when you close the db connection so the data is
there when you check it later. When migrating from dbs
without transaction support this can be very confusing.

Istvan.
 

Alban Hertroys

This has become a bit of a mixed reply, but I think it's better for
understanding the problem I'm trying to solve.

Istvan said:
Make sure you commit the inserts. Otherwise you might
simply end up selecting on the old view.

There is a commit when you close the db connection so the data is
there when you check it later. When migrating from dbs
without transaction support this can be very confusing.

But PostgreSQL does have transaction support... I rely on that. It is
one of the reasons I chose Python (+psycopg).

The script I'm working on does bulk inserts from multiple related XML
files (parsed using the SAX parser) and takes turns inserting small
batches of XML records from those files. The collection thread combines
these into one XML record, which is why it's so important that the inserts
are done in time.

I can't commit until all the data has been inserted and combined. The
commit shouldn't happen until the end of the main thread is reached.

If the server goes down, or there's another reason it can't continue
parsing, the whole transaction should rollback. Committing in between
would be 'problematic' (where in the XML files were we interrupted? Hard
to tell).

Also, I don't think I can join the threads (as someone else suggested),
as they are still working in an Application instance (part of the SAX
parser). The threads are waiting until they're allowed to continue, coordinated by a
linked list of Events (so that I can remove events for threads that
finished - which shouldn't happen, but it may). Unless I misunderstand
thread joining, of course.

I have to admit that I have very little experience with thread
programming, and this is (part of) my first Python program. It is
definitely a steep learning curve; I hope I don't fall back down too often.

So far, Python has been nice to me :)

Alban.
 

Alban Hertroys

Michael said:
I just wrote a small test program and was unable to duplicate your
problem -- the inserting threads ran, then the collecting thread
saw the data.

Without seeing your code we can only guess at what's wrong. Please
post the simplest program that reproduces the problem.

That's going to be difficult, as it is a rather complex system of
interwoven threads with XML parsers and such. I tried making a test case
myself, but after an hour of programming I wasn't near the actual situation.
It probably doesn't help that I don't have any previous experience with
either Python or thread programming and that I _need_ a vacation...
What versions of Python, psycopg, and PostgreSQL are you using?
What operating system and version are you running on?

Python 2.3
Psycopg 1.0
PostgreSQL 7.4.2

Keep in mind that these are Debian Linux versions, and with their
tendency to "touch" things to break it better... ;)
In my test program, the inserting threads called conn.commit() and
then exited. I then joined all of the inserting threads before
starting the collecting thread. You could also use a condition
variable to signal when the inserts have been committed.

At the moment, I still use commit (it has to go eventually, I need to be
able to rollback), but the threads don't end at that point. Instead,
they wait until they can insert the next record.

This may be the cause of my problem. It's all rather complex; I may be
missing something in my own code (of which the base was done by someone
else, who can't be reached for a while, to make matters worse).

Alban.
 

Istvan Albert

Alban said:
I can't commit until all the data has been inserted and combined. The
commit shouldn't happen until the end of the main thread is reached.

If you don't commit the inserts you cannot combine them (because they are
not visible) in a different database connection. I think you should
have a commit at the end of your insert threads. That way
when all the inserts are finished the data will be available
for the combine thread.

I might be wrong here but I think all this confusion arises because
psycopg pools database connections (which is a good thing, it speeds
up access). But then even when you seemingly open a new connection
you might be just reusing a db connection from the pool that was
started (and kept alive) before the inserts took place.

Istvan.
 

Ken Godee

The script I'm working on does bulk inserts from multiple related XML
files (parsed using the SAX parser) and takes turns inserting small
batches of XML records from those files. The collection thread combines
these into one XML record, which is why it's so important that the inserts
are done in time.

I can't commit until all the data has been inserted and combined. The
commit shouldn't happen until the end of the main thread is reached.

If the server goes down, or there's another reason it can't continue
parsing, the whole transaction should rollback. Committing in between
would be 'problematic' (where in the XML files were we interrupted? Hard
to tell).

Also, I don't think I can join the threads (as someone else suggested),
as they are still working in an Application instance (part of the SAX
parser). The threads are waiting until they're allowed to continue, coordinated by a
linked list of Events (so that I can remove events for threads that
finished - which shouldn't happen, but it may). Unless I misunderstand
thread joining, of course.

I would probably pass a queue instance to the worker threads
and in the main thread create a timer method/function that periodically
checks the queue.

When the worker thread is done doing its thing, do a commit and
queue.put('done')

When the main thread sees queue.get(0) == 'done',
have it do its thing.

This is the beauty of queues.
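A rough sketch of the queue idea, using Python 3's queue module (the original discussion predates it and would have used the old Queue module); the inserts and commit are placeholder comments:

```python
import queue
import threading

q = queue.Queue()

def insert_worker(name):
    # ... run this thread's INSERTs, then conn.commit() ...
    q.put('done')             # report completion to the main thread

workers = [threading.Thread(target=insert_worker, args=(n,))
           for n in range(3)]
for w in workers:
    w.start()

finished = 0
while finished < len(workers):
    if q.get() == 'done':     # blocks until a worker reports in
        finished += 1

print(finished)               # → 3
```

queue.Queue does its own locking, so no extra synchronization is needed between the workers and the main thread.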
 

Istvan Albert

Istvan said:
have a commit at the end of your insert threads. That way
when all the inserts are finished the data will be available
for the combine thread.

I forgot this, if the commit after insert is unfeasible
you could store the db connections in a data structure shared across threads
and if/when all the insert threads complete without errors you can call commit
on each of these before the combine thread.
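This shared-structure idea can be sketched as follows; FakeConnection is a hypothetical stand-in for the real psycopg connection objects, so the threading logic can be shown without a database:

```python
import threading

class FakeConnection:
    """Stand-in for a psycopg connection object (hypothetical)."""
    def __init__(self):
        self.committed = False
    def commit(self):
        self.committed = True

connections = []              # shared across threads
lock = threading.Lock()

def insert_worker():
    conn = FakeConnection()   # psycopg.connect(...) in real code
    # ... run this thread's INSERTs on conn ...
    with lock:
        connections.append(conn)   # store the connection, do NOT commit yet

threads = [threading.Thread(target=insert_worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All insert threads finished without errors: commit every connection,
# then start the combine thread.
for conn in connections:
    conn.commit()

print(all(c.committed for c in connections))   # → True
```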

Sounds a bit hackish though, just an idea.

Istvan.
 

Adrien Di Mascio

Hi,
Cliff Wells said:
I believe turning autocommit on in the connection would achieve this (at
the expense of performance and transaction support). You could also add
a "commit" SQL statement to each thread.

I'm just doing a quick reply here, since I've only skimmed through all
the posts in this thread, but I think, as you said, that turning autocommit on
will solve the problem. Also note that autocommit() is (or will be)
deprecated, so you should consider using the
"set_isolation_level()" method instead.

Doing something like this should do the trick:

cnx = psycopg.connect('dbname=your_db user=your_user')
cnx.set_isolation_level(1)

This way, you should be able to access written data from other threads or
processes.

You may find more information here :

http://lists.initd.org/pipermail/psycopg/2004-February/002577.html

Hope this helps,
Cheers,
Adrien.
 

Alban Hertroys

Istvan said:
I forgot this, if the commit after insert is unfeasible
you could store the db connections in a data structure shared across threads
and if/when all the insert threads complete without errors you can call commit
on each of these before the combine thread.

That's what I already do, actually. It would be kind of difficult to use
the same database session/transaction amongst different connections
(that, as mentioned, could incidentally be the same due to pooling).

I pass an application context to every object in the application, and
that includes a database connection. I spawn cursors from that single
connection.
Sounds a bit hackish though, just an idea.

Not really, if you want to use the transaction handling in your DBMS.
 

Alban Hertroys

Istvan said:
If you don't commit the inserts you cannot combine them (because they are
not visible) in a different database connection. I think you should

That's why I use only one DB connection. ;)
have a commit at the end of your insert threads. That way
when all the inserts are finished the data will be available
for the combine thread.

I might be wrong here but I think all this confusion arises because
psycopg pools database connections (which is a good thing, it speeds
up access). But then even when you seemingly open a new connection
you might be just reusing a db connection from the pool that was
started (and kept alive) before the inserts took place.

Well, apparently you are :p
Maybe I'm still not clear enough in explaining this 'little' problem of
mine. The answers I got so far aren't entirely 'on track'.

But thanks anyway.

Alban.
 

Alban Hertroys

Ken said:
I would probably pass a queue instance to the worker threads
and in the main thread create a timer method/function that periodically
checks the queue.

When the worker thread is done doing its thing, do a commit and
queue.put('done')

When the main thread sees queue.get(0) == 'done',
have it do its thing.

This is the beauty of queues.

Apparently I missed the point of queues. I will have a look at them again.

Thread programming makes twisted minds, I suspect. ;)
 
 
