Database Query Contains Old Data

giraffeboy

Hi there,

I'm having a problem with the Python DB API, using MySQL.

I've written a program with a GUI using wxPython. The GUI is contained
in main.py, which imports another module, reports.py. There are
several reports that can be selected from the GUI, and each report is
a class in reports.py. These classes contain a method which is passed
data as arguments and produces a report, writing HTML to a file and
using matplotlib to create a graph file.

The report class methods are called from the GUI using the following
code:
agentlist = self.getselected()
detail = self.cb1.GetValue()
if self.getdates() != False:
    fromdate, todate = self.getdates()
    app.mainframe.SetStatusText("Generating Report...")
    if self.reportchoice.GetSelection() == 0:
        thereport = reports.VehicleRunningCost()
        thereport.producereport(agentlist, fromdate=fromdate,
                                todate=todate, detail=detail)
    app.mainframe.SetStatusText("Report Complete")
    viewerframe = ViewerFrame(None, -1,
                              "Mileage and Fuel Report Viewer")
    viewerframe.Show(True)

The first time you run a report, everything works as expected but if
you run it a second time, after modifying data, it seems that the data
from before the modification is selected on the second report run.

It would be much appreciated if anyone could give me any hints as to
why the old data is selected. I haven't posted the full code because
it's quite long but will post more if anyone wants to see specific
parts.

Thanks!
Andrew
 
Jerry Hill

The first time you run a report, everything works as expected but if
you run it a second time, after modifying data, it seems that the data
from before the modification is selected on the second report run.

Did you remember to commit your changes before re-running the report?
Python's DB API requires that any auto-commit feature of the
underlying database be turned off by default, so you are required to
commit changes yourself. If you're used to having auto-commit turned
on, this can be confusing.
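In other words, any modification needs an explicit commit on the same
connection before another connection (or your report) will see it. A
minimal sketch, assuming the MySQLdb driver (the connection parameters
are just placeholders):

import MySQLdb  # MySQL driver implementing Python's DB API

conn = MySQLdb.connect(db="fleet", user="andrew", passwd="secret")
curs = conn.cursor()

curs.execute("UPDATE mileage SET mileage = %s WHERE rowid = %s",
             (950.0, 1))

# Without this, the change sits in an open transaction and is never
# made visible to other connections.
conn.commit()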

See http://www.python.org/dev/peps/pep-0249/ for more details of
Python's DB API.
 
giraffeboy

Did you remember to commit your changes before re-running the report?
Python's DB API requires that any auto-commit feature of the
underlying database be turned off by default, so you are required to
commit changes yourself. If you're used to having auto-commit turned
on, this can be confusing.

I did and I confirmed this by modifying the data, selecting it from
the mysql command line client to verify the changes, then running the
report again. If I exit the application and then start it again,
everything works as expected until the second instance of the report
is run. I have a feeling it has something to do with there already
being an instance of the report class, but I can't work out how to
destroy it, and I'm not really sure how to remove the references to it
either. I added a print statement to the code and it displays a row
that doesn't exist any more. The code, the output of the print
statement and the result from the mysql client are shown below.

Andrew


Code:
# Get all the mileages and add them together for the month:
curs.execute('SELECT date, mileage, name FROM mileage, agents'
             ' WHERE mileage.agent = agents.rowid'
             ' AND date >= CAST(%s AS DATE) AND date < CAST(%s AS DATE)'
             ' AND mileage.agent = %s ORDER BY date',
             (fromdate, todate, agentid))
for row in curs:
    month = str(row[0])[:7]
    print row[1], month

Output:
100.0 2008-03

MySQL Output just prior to running the report:
mysql> select * from mileage;
+-------+------------+---------+---------+-------+
| rowid | date       | mileage | vehicle | agent |
+-------+------------+---------+---------+-------+
|     1 | 2008-04-28 |  875.63 |       3 |     3 |
|     2 | 2008-04-28 | 1188.13 |       6 |     6 |
|     3 | 2008-04-28 |  676.88 |       4 |     4 |
|     4 | 2008-04-21 | 1111.25 |       6 |     6 |
|     5 | 2008-04-21 | 1126.88 |       3 |     3 |
|     6 | 2008-04-28 | 1029.38 |       7 |     8 |
|     7 | 2008-04-21 |  953.13 |       7 |     8 |
|     8 | 2008-04-21 |  675.63 |       4 |     4 |
|     9 | 2008-04-14 |  891.88 |       3 |     3 |
+-------+------------+---------+---------+-------+
9 rows in set (0.00 sec)

As you can see, there's no row with a mileage of 100 in the relevant
month.
 
Paul Boddie

I did and I confirmed this by modifying the data, selecting it from
the mysql command line client to verify the changes, then running the
report again. If I exit the application and then start it again,
everything works as expected until the second instance of the report
is run.

Note that if you have a connection open in your program, especially if
that connection has already been used to select data, it may be the
case that you then have to perform a rollback or commit before
attempting to access newly added data. The reason for this behaviour
is that the DB-API modules will have begun a transaction on your
behalf, and while that transaction is open, changes committed in other
transactions may be unavailable to your own transaction, depending on
the transaction isolation level.
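In practice that just means something like this before each report run
(a sketch; the conn object and query stand in for the ones in your
report code):

# Discard the transaction the DB-API module opened implicitly on the
# first SELECT; the next query then starts a fresh transaction which
# can see rows committed by other connections in the meantime.
conn.rollback()

curs = conn.cursor()
curs.execute("SELECT date, mileage FROM mileage WHERE agent = %s",
             (agentid,))
rows = curs.fetchall()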

MySQL appears to use "repeatable read" by default [1] as its
transaction isolation level, whereas PostgreSQL (for example) uses
"read committed" by default [2]. I would guess that if you were using
PostgreSQL, this particular problem would not have occurred, but there
are other reasons to be aware of the effects of long duration
transactions in PostgreSQL, and the practice of periodically
performing a rollback would still be worth considering with that
database system.
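If you'd rather change the isolation level than sprinkle rollbacks
around, MySQL lets you set it per session. A sketch (the SET statement
is standard MySQL; the connection object is assumed):

curs = conn.cursor()
# Make transactions on this connection see data as soon as other
# transactions commit it, instead of MySQL's "repeatable read" default.
curs.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")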

Paul

[1] http://dev.mysql.com/doc/refman/5.1/en/innodb-transaction-isolation.html
[2] http://www.postgresql.org/docs/8.1/interactive/transaction-iso.html
 
giraffeboy

Note that if you have a connection open in your program, especially if
that connection has already been used to select data, it may be the
case that you then have to perform a rollback or commit before
attempting to access newly added data. The reason for this behaviour
is that the DB-API modules will have begun a transaction on your
behalf, and while that transaction is open, changes committed in other
transactions may be unavailable to your own transaction, depending on
the transaction isolation level.

Thanks for that Paul; it seems to have solved the problem perfectly. I
had always just assumed that querying a database would give you the
most current data. I guess it just goes to show that things are never
as simple as they first appear!
 
John Nagle

Paul said:
Note that if you have a connection open in your program, especially if
that connection has already been used to select data, it may be the
case that you then have to perform a rollback or commit before
attempting to access newly added data.

Exactly. Although it seems counterintuitive, it's not enough
to do a COMMIT after UPDATE and INSERT operations. You also have to
do a COMMIT after a SELECT if you're going to reuse the database handle
and do another SELECT. Otherwise, you reread the same data forever.

COMMIT, by the way, is per database handle, so if you have
multiple database handles, each needs its own COMMIT.
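A small sketch of the two-handle case (MySQLdb assumed; the table is
the one from earlier in the thread):

import MySQLdb

# Two separate handles, each with its own transaction state.
conn_a = MySQLdb.connect(db="fleet")
conn_b = MySQLdb.connect(db="fleet")

# conn_b implicitly starts a transaction with its first SELECT.
curs_b = conn_b.cursor()
curs_b.execute("SELECT mileage FROM mileage WHERE rowid = 1")
before = curs_b.fetchone()

# conn_a changes the row and commits -- its own transaction only.
curs_a = conn_a.cursor()
curs_a.execute("UPDATE mileage SET mileage = 100 WHERE rowid = 1")
conn_a.commit()

# Under repeatable read, conn_b still sees the old value until it
# ends its own transaction with its own COMMIT (or ROLLBACK).
conn_b.commit()
curs_b.execute("SELECT mileage FROM mileage WHERE rowid = 1")
after = curs_b.fetchone()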

John Nagle
 
Paul Boddie

Exactly. Although it seems counterintuitive, it's not enough
to do a COMMIT after UPDATE and INSERT operations. You also have to
do a COMMIT after a SELECT if you're going to reuse the database handle
and do another SELECT. Otherwise, you reread the same data forever.

You can also do a rollback, as I noted, since the aim is merely to
obtain a new transaction by discarding the current one. Upon
performing a new select using the DB-API, a new transaction will then
be opened, since the rollback or commit will have ensured that no
transaction is currently open.

If the DB-API exposed the MySQL/PostgreSQL semantics with explicit
transactions (see [1] for pertinent material), then the technique
discussed above would either be superfluous (you might not be using
transactions at all) or more obvious (you would have issued an
explicit "start transaction" or "begin" command), but there are
obviously good arguments for exposing the standard semantics through
the API instead.

COMMIT, by the way, is per database handle, so if you have
multiple database handles, each needs its own COMMIT.

That's worth remembering, yes.

Paul

[1] http://www.postgresql.org/docs/8.1/static/sql-start-transaction.html
 
Paul Boddie

On Wed, 21 May 2008 07:23:04 -0700 (PDT), Paul Boddie wrote:
MySQL appears to use "repeatable read" by default [1] as its
transaction isolation level, whereas PostgreSQL (for example) uses
"read committed" by default [2]. I would guess that if you were using
PostgreSQL, this particular problem would not have occurred, but there
are other reasons to be aware of the effects of long duration
transactions in PostgreSQL, and the practice of periodically
performing a rollback would still be worth considering with that
database system.

If one has transactions open for a long time, or transactions that
involve a great deal of data, this will result in poor performance or
poor scalability.

I think you need to explain this to me. If there's a long-running
transaction happening in the background and my own transactions get
created and rolled back periodically, how would the long-running
transaction be affected? If, on the other hand, my own transaction is
long-running, I can see that rolling it back would incur a cost, but
what choice do I have other than to perform a rollback more often (or
to disable transactions, which might incur other costs)? I don't want
to perform a commit instead merely for performance reasons, especially
if it impacts correctness.

I was actually thinking of lock acquisition in PostgreSQL when I made
the remark. With lots of tables in a database, it's possible to
acquire a large number of locks, and retaining locks can also prevent
other operations from being carried out.

But one may have such large transactions without
being aware of it. Is there any way to make transaction size salient
to the developer? Any way to make sure one is committing as early and
often as possible?

I'm not aware of anything which will tell you how big your transaction
is, but there may be some kind of table or function which provides
some details about such things. However, it is possible to see how
many locks your transaction has, and on which tables.
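For PostgreSQL, the pg_locks system view will show what the current
session is holding. A sketch (any DB-API PostgreSQL driver will do;
the conn object is assumed):

curs = conn.cursor()
# List the locks held by this backend, with the table each one is on.
curs.execute("SELECT relation::regclass, mode, granted"
             " FROM pg_locks WHERE pid = pg_backend_pid()")
for relation, mode, granted in curs.fetchall():
    print relation, mode, granted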

Paul
 
M.-A. Lemburg

MySQL appears to use "repeatable read" by default [1] as its
transaction isolation level, whereas PostgreSQL (for example) uses
"read committed" by default [2]. I would guess that if you were using
PostgreSQL, this particular problem would not have occurred, but there
are other reasons to be aware of the effects of long duration
transactions in PostgreSQL, and the practice of periodically
performing a rollback would still be worth considering with that
database system.

If one has transactions open for a long time, or transactions that
involve a great deal of data, this will result in poor performance or
poor scalability.

Poor performance is usually not an issue since databases are
optimized to work with transactions.

What's more important is that an open transaction will cause locks
on the tables you are writing to. Depending on the database
backend these locks may lock the entire table or just a few rows.

In any case, such locks prevent accessing the tables or rows
in question from other connections and that will quickly turn
into a major problem if you have more than just one connection
to the database.

But one may have such large transactions without
being aware of it. Is there any way to make transaction size salient
to the developer? Any way to make sure one is committing as early and
often as possible?

This depends on the database in question. By accessing system tables
directly you can usually find out a lot about the database and your
current transaction.

That said, it's not a good idea to commit a logical transaction (i.e.
a user entering data) in multiple chunks. You'd lose the most
important feature of transactions: being able to roll back to the
start of the transaction.

As others have mentioned, in systems that have long running logical
transactions, it's usually best to collect the data until the very
end and then apply all changes in one go (and one database
transaction).
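The pattern looks something like this (a sketch; the statements and
the pending list are illustrative):

# Collect the user's edits in plain Python structures while they work;
# no database activity yet, so no transaction or locks are held open.
pending = []
pending.append(("UPDATE mileage SET mileage = %s WHERE rowid = %s",
                (950.0, 1)))
pending.append(("INSERT INTO mileage (date, mileage, agent)"
                " VALUES (%s, %s, %s)",
                ("2008-05-05", 880.25, 3)))

# Apply everything in one short database transaction at the end.
curs = conn.cursor()
try:
    for sql, params in pending:
        curs.execute(sql, params)
    conn.commit()
except Exception:
    conn.rollback()  # all or nothing
    raise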

Another problem with long running transactions is that the data
in the tables may change after the start of the transaction. This
can result in invalid data being committed (e.g. one part of a calculation
uses the data at time t1 and another at time t2). This can be
avoided by using snapshots, versioning and timestamps in the tables,
so that all queries use the same data.
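One common way to implement the versioning idea is an optimistic
check at update time. A sketch (the version column is hypothetical;
it would have to be added to the table):

# Read the row together with its current version number.
curs.execute("SELECT mileage, version FROM mileage WHERE rowid = %s",
             (1,))
mileage, version = curs.fetchone()

new_mileage = mileage + 100.0  # ...long-running work happens here...

# Write back only if nobody else changed the row in the meantime.
curs.execute("UPDATE mileage SET mileage = %s, version = version + 1"
             " WHERE rowid = %s AND version = %s",
             (new_mileage, 1, version))
if curs.rowcount == 0:
    # Someone committed a newer version first: re-read and merge.
    conn.rollback()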

And if this were not enough, you often run into conflicts during
the commit phase due to changes made by others to the same tables.

These can usually only be avoided by implementing merge strategies
in your application, unless you want to lock out all other users
during the transaction ... which would bring you back to the
original problem.

In summary: long running transactions are not easy to get right :)

 
M.-A. Lemburg

I understand you to mean that one should arrange matters so that what
is a lengthy transaction from the point of view of the user is a short
transaction from the point of view of the database.

Yes, mainly to avoid database locks.

In a multi-user environment, you will need to add some merge logic
in your application, to prevent conflict errors in the database.

 
