Database Query Contains Old Data

giraffeboy

Hi there,

I'm having a problem with the Python DB API, using MySQL.

I've written a program with a GUI using wxPython. The GUI is contained
in main.py, which imports another module, reports.py. There are
several reports that can be selected from the GUI, and each report is
a class in reports.py. These classes contain a method which is passed
data as arguments and produces a report, writing HTML to a file and
using matplotlib to create a graph file.

The report class methods are called from the GUI using the following
code:
agentlist = self.getselected()
detail = self.cb1.GetValue()
if self.getdates() != False:
    fromdate, todate = self.getdates()
    app.mainframe.SetStatusText("Generating Report...")
    if self.reportchoice.GetSelection() == 0:
        thereport = reports.VehicleRunningCost()
        thereport.producereport(agentlist, fromdate=fromdate,
                                todate=todate, detail=detail)
    app.mainframe.SetStatusText("Report Complete")
    viewerframe = ViewerFrame(None, -1,
                              "Mileage and Fuel Report Viewer")
    viewerframe.Show(True)

The first time you run a report, everything works as expected but if
you run it a second time, after modifying data, it seems that the data
from before the modification is selected on the second report run.

It would be much appreciated if anyone could give me any hints as to
why the old data is selected. I haven't posted the full code because
it's quite long but will post more if anyone wants to see specific
parts.

Thanks!
Andrew
 
Jerry Hill

The first time you run a report, everything works as expected but if
you run it a second time, after modifying data, it seems that the data
from before the modification is selected on the second report run.

Did you remember to commit your changes before re-running the report?
Python's DB API requires that any auto-commit feature of the
underlying database be turned off by default, so you are required to
commit changes yourself. If you're used to having auto-commit turned
on, this can be confusing.
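In other words, any modification needs an explicit commit on the same
connection before another connection (or your report) will see it. A
minimal sketch, assuming the MySQLdb driver (the connection parameters
are just placeholders):

import MySQLdb  # MySQL driver implementing Python's DB API

conn = MySQLdb.connect(db="fleet", user="andrew", passwd="secret")
curs = conn.cursor()

curs.execute("UPDATE mileage SET mileage = %s WHERE rowid = %s",
             (950.0, 1))

# Without this, the change sits in an open transaction and is never
# made visible to other connections.
conn.commit()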

See http://www.python.org/dev/peps/pep-0249/ for more details of
Python's DB API.
 
giraffeboy

Did you remember to commit your changes before re-running the report?
Python's DB API requires that any auto-commit feature of the
underlying database be turned off by default, so you are required to
commit changes yourself. If you're used to having auto-commit turned
on, this can be confusing.

I did and I confirmed this by modifying the data, selecting it from
the mysql command line client to verify the changes, then running the
report again. If I exit the application and then start it again,
everything works as expected until the second instance of the report
is run. I have a feeling it has something to do with there already
being an instance of the report class, but I can't work out how to
destroy it, and I'm not really sure how to remove the references to it
either. I added a print statement to the code and it displays a row
that doesn't exist any more. The code, the output of the print
statement and the result from the mysql client are shown below.

Andrew


Code:
# Get all the mileages and add them together for the month:
curs.execute('SELECT date, mileage, name FROM mileage, agents'
             ' WHERE mileage.agent = agents.rowid'
             ' AND date >= CAST(%s AS DATE) AND date < CAST(%s AS DATE)'
             ' AND mileage.agent = %s ORDER BY date',
             (fromdate, todate, agentid))
for row in curs:
    month = str(row[0])[:7]
    print row[1], month

Output:
100.0 2008-03

MySQL Output just prior to running the report:
mysql> select * from mileage;
+-------+------------+---------+---------+-------+
| rowid | date       | mileage | vehicle | agent |
+-------+------------+---------+---------+-------+
|     1 | 2008-04-28 |  875.63 |       3 |     3 |
|     2 | 2008-04-28 | 1188.13 |       6 |     6 |
|     3 | 2008-04-28 |  676.88 |       4 |     4 |
|     4 | 2008-04-21 | 1111.25 |       6 |     6 |
|     5 | 2008-04-21 | 1126.88 |       3 |     3 |
|     6 | 2008-04-28 | 1029.38 |       7 |     8 |
|     7 | 2008-04-21 |  953.13 |       7 |     8 |
|     8 | 2008-04-21 |  675.63 |       4 |     4 |
|     9 | 2008-04-14 |  891.88 |       3 |     3 |
+-------+------------+---------+---------+-------+
9 rows in set (0.00 sec)

As you can see, there's no row with a mileage of 100 in the relevant
month.
 
Paul Boddie

I did and I confirmed this by modifying the data, selecting it from
the mysql command line client to verify the changes, then running the
report again. If I exit the application and then start it again,
everything works as expected until the second instance of the report
is run.

Note that if you have a connection open in your program, especially if
that connection has already been used to select data, it may be the
case that you then have to perform a rollback or commit before
attempting to access newly added data. The reason for this behaviour
is that the DB-API modules will have begun a transaction on your
behalf, and while that transaction is open, changes committed in other
transactions may be unavailable to your own transaction, depending on
the transaction isolation level.
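In practice that just means something like this before each report run
(a sketch; the conn object and query stand in for the ones in your
report code):

# Discard the transaction the DB-API module opened implicitly on the
# first SELECT; the next query then starts a fresh transaction which
# can see rows committed by other connections in the meantime.
conn.rollback()

curs = conn.cursor()
curs.execute("SELECT date, mileage FROM mileage WHERE agent = %s",
             (agentid,))
rows = curs.fetchall()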

MySQL appears to use "repeatable read" by default [1] as its
transaction isolation level, whereas PostgreSQL (for example) uses
"read committed" by default [2]. I would guess that if you were using
PostgreSQL, this particular problem would not have occurred, but there
are other reasons to be aware of the effects of long duration
transactions in PostgreSQL, and the practice of periodically
performing a rollback would still be worth considering with that
database system.
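If you'd rather change the isolation level than sprinkle rollbacks
around, MySQL lets you set it per session. A sketch (the SET statement
is standard MySQL; the connection object is assumed):

curs = conn.cursor()
# Make transactions on this connection see data as soon as other
# transactions commit it, instead of MySQL's "repeatable read" default.
curs.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")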

Paul

[1] http://dev.mysql.com/doc/refman/5.1/en/innodb-transaction-isolation.html
[2] http://www.postgresql.org/docs/8.1/interactive/transaction-iso.html
 
giraffeboy

Note that if you have a connection open in your program, especially if
that connection has already been used to select data, it may be the
case that you then have to perform a rollback or commit before
attempting to access newly added data. The reason for this behaviour
is that the DB-API modules will have begun a transaction on your
behalf, and while that transaction is open, changes committed in other
transactions may be unavailable to your own transaction, depending on
the transaction isolation level.

Thanks for that Paul; it seems to have solved the problem perfectly. I
had always just assumed that querying a database would give you the
most current data. I guess it just goes to show that things are never
as simple as they first appear!
 
John Nagle

Paul said:
Note that if you have a connection open in your program, especially if
that connection has already been used to select data, it may be the
case that you then have to perform a rollback or commit before
attempting to access newly added data.

Exactly. Although it seems counterintuitive, it's not enough
to do a COMMIT after UPDATE and INSERT operations. You also have to
do a COMMIT after a SELECT if you're going to reuse the database handle
and do another SELECT. Otherwise, you reread the same data forever.

COMMIT, by the way, is per database handle, so if you have
multiple database handles, each needs its own COMMIT.
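A small sketch of the two-handle case (MySQLdb assumed; the table is
the one from earlier in the thread):

import MySQLdb

# Two separate handles, each with its own transaction state.
conn_a = MySQLdb.connect(db="fleet")
conn_b = MySQLdb.connect(db="fleet")

# conn_b implicitly starts a transaction with its first SELECT.
curs_b = conn_b.cursor()
curs_b.execute("SELECT mileage FROM mileage WHERE rowid = 1")
before = curs_b.fetchone()

# conn_a changes the row and commits -- its own transaction only.
curs_a = conn_a.cursor()
curs_a.execute("UPDATE mileage SET mileage = 100 WHERE rowid = 1")
conn_a.commit()

# Under repeatable read, conn_b still sees the old value until it
# ends its own transaction with its own COMMIT (or ROLLBACK).
conn_b.commit()
curs_b.execute("SELECT mileage FROM mileage WHERE rowid = 1")
after = curs_b.fetchone()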

John Nagle
 
Paul Boddie

Exactly. Although it seems counterintuitive, it's not enough
to do a COMMIT after UPDATE and INSERT operations. You also have to
do a COMMIT after a SELECT if you're going to reuse the database handle
and do another SELECT. Otherwise, you reread the same data forever.

You can also do a rollback, as I noted, since the aim is merely to
obtain a new transaction by discarding the current one. Upon
performing a new select using the DB-API, a new transaction will then
be opened, since the rollback or commit will have ensured that no
transaction is currently open.

If the DB-API exposed the MySQL/PostgreSQL semantics with explicit
transactions (see [1] for pertinent material), then the technique
discussed above would either be superfluous (you might not be using
transactions at all) or more obvious (you would have issued an
explicit "start transaction" or "begin" command), but there are
obviously good arguments for exposing the standard semantics through
the API instead.

COMMIT, by the way, is per database handle, so if you have
multiple database handles, each needs its own COMMIT.

That's worth remembering, yes.

Paul

[1] http://www.postgresql.org/docs/8.1/static/sql-start-transaction.html
 
Paul Boddie

On Wed, 21 May 2008 07:23:04 -0700 (PDT), Paul Boddie wrote:
MySQL appears to use "repeatable read" by default [1] as its
transaction isolation level, whereas PostgreSQL (for example) uses
"read committed" by default [2]. I would guess that if you were using
PostgreSQL, this particular problem would not have occurred, but there
are other reasons to be aware of the effects of long duration
transactions in PostgreSQL, and the practice of periodically
performing a rollback would still be worth considering with that
database system.

If one has transactions open for a long time, or transactions that
involve a great deal of data, this will result in poor performance or
poor scalability.

I think you need to explain this to me. If there's a long-running
transaction happening in the background and my own transactions get
created and rolled back periodically, how would the long-running
transaction be affected? If, on the other hand, my own transaction is
long-running, I can see that rolling it back would incur a cost, but
what choice do I have other than to perform a rollback more often (or
to disable transactions, which might incur other costs)? I don't want
to perform a commit instead merely for performance reasons, especially
if it impacts correctness.

I was actually thinking of lock acquisition in PostgreSQL when I made
the remark. With lots of tables in a database, it's possible to
acquire a large number of locks, and retaining locks can also prevent
other operations from being carried out.

But one may have such large transactions without
being aware of it. Is there any way to make transaction size salient
to the developer? Any way to make sure one is committing as early and
often as possible?

I'm not aware of anything which will tell you how big your transaction
is, but there may be some kind of table or function which provides
some details about such things. However, it is possible to see how
many locks your transaction has, and on which tables.
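For PostgreSQL, the pg_locks system view will show what the current
session is holding. A sketch (any DB-API PostgreSQL driver will do;
the conn object is assumed):

curs = conn.cursor()
# List the locks held by this backend, with the table each one is on.
curs.execute("SELECT relation::regclass, mode, granted"
             " FROM pg_locks WHERE pid = pg_backend_pid()")
for relation, mode, granted in curs.fetchall():
    print relation, mode, granted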

Paul
 
M.-A. Lemburg

MySQL appears to use "repeatable read" by default [1] as its
transaction isolation level, whereas PostgreSQL (for example) uses
"read committed" by default [2]. I would guess that if you were using
PostgreSQL, this particular problem would not have occurred, but there
are other reasons to be aware of the effects of long duration
transactions in PostgreSQL, and the practice of periodically
performing a rollback would still be worth considering with that
database system.

If one has transactions open for a long time, or transactions that
involve a great deal of data, this will result in poor performance or
poor scalability.

Poor performance is usually not an issue since databases are
optimized to work with transactions.

What's more important is that an open transaction will cause locks
on the tables you are writing to. Depending on the database
backend these locks may lock the entire table or just a few rows.

In any case, such locks prevent accessing the tables or rows
in question from other connections and that will quickly turn
into a major problem if you have more than just one connection
to the database.

But one may have such large transactions without
being aware of it. Is there any way to make transaction size salient
to the developer? Any way to make sure one is committing as early and
often as possible?

This depends on the database in question. By accessing system tables
directly you can usually find out a lot about the database and your
current transaction.

That said, it's not a good idea to commit a logical transaction (i.e.
a user entering data) in multiple chunks. You'd lose the most
important feature of transactions: being able to roll back to the
start of the transaction.

As others have mentioned, in systems that have long running logical
transactions, it's usually best to collect the data until the very
end and then apply all changes in one go (and one database
transaction).
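The pattern looks something like this (a sketch; the statements and
the pending list are illustrative):

# Collect the user's edits in plain Python structures while they work;
# no database activity yet, so no transaction or locks are held open.
pending = []
pending.append(("UPDATE mileage SET mileage = %s WHERE rowid = %s",
                (950.0, 1)))
pending.append(("INSERT INTO mileage (date, mileage, agent)"
                " VALUES (%s, %s, %s)",
                ("2008-05-05", 880.25, 3)))

# Apply everything in one short database transaction at the end.
curs = conn.cursor()
try:
    for sql, params in pending:
        curs.execute(sql, params)
    conn.commit()
except Exception:
    conn.rollback()  # all or nothing
    raise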

Another problem with long running transactions is that the data
in the tables may change after the start of the transaction. This
can result in invalid data being committed (e.g. one part of a calculation
uses the data at time t1 and another at time t2). This can be
avoided by using snapshots, versioning and timestamps in the tables,
so that all queries use the same data.
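One common way to implement the versioning idea is an optimistic
check at update time. A sketch (the version column is hypothetical;
it would have to be added to the table):

# Read the row together with its current version number.
curs.execute("SELECT mileage, version FROM mileage WHERE rowid = %s",
             (1,))
mileage, version = curs.fetchone()

new_mileage = mileage + 100.0  # ...long-running work happens here...

# Write back only if nobody else changed the row in the meantime.
curs.execute("UPDATE mileage SET mileage = %s, version = version + 1"
             " WHERE rowid = %s AND version = %s",
             (new_mileage, 1, version))
if curs.rowcount == 0:
    # Someone committed a newer version first: re-read and merge.
    conn.rollback()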

And if this were not enough, you often run into conflicts during
the commit phase due to changes made by others to the same tables.

These can usually only be avoided by implementing merge strategies
in your application, unless you want to lock out all other users
during the transaction ... which would bring you back to the
original problem.

In summary: long running transactions are not easy to get right :)

 
M.-A. Lemburg

I understand you to mean that one should arrange matters so that what
is a lengthy transaction from the point of view of the user is a short
transaction from the point of view of the database.

Yes, mainly to avoid database locks.

In a multi-user environment, you will need to add some merge logic
in your application, to prevent conflict errors in the database.

 
