Python 2.6 and Sqlite3 - Slow

bruceg113355 · Aug 26, 2012

My program uses Python 2.6 and Sqlite3 and connects to a network database 100 miles away.
My program reads approx 60 records (4000 bytes) from a Sqlite database in less than a second.
Each time the user requests data, my program can continuously read 60 records in less than a second.
However, if I access the network drive (e.g. DOS command DIR /S) while my program is running, my program takes 20 seconds to read the same 60 records. If I restart my program, my program once again takes less than a second to read 60 records.

Any ideas?

Thanks,
Bruce

Ulrich Eckhardt · Aug 27, 2012

Am 27.08.2012 03:23, schrieb (e-mail address removed):

My program uses Python 2.6 and Sqlite3 and connects to a network
database 100 miles away.

Wait, isn't SQLite completely file-based? In that case, SQLite accesses
a file, which in turn is stored on a remote filesystem. This means that
there are other components involved here, namely your OS, the network
(bandwidth & latency), the network filesystem and the filesystem on the
remote machine. It would help if you told us what you have there.

My program reads approx 60 records (4000 bytes) from a Sqlite
database in less than a second. Each time the user requests data, my
program can continuously read 60 records in less than a second.
However, if I access the network drive (e.g. DOS command DIR /S)
while my program is running, my program takes 20 seconds to read the
same 60 records. If I restart my program, my program once again takes
less than a second to read 60 records.

Questions here:
1. Is each record 4kB or are all 60 records together 4kB?
2. Does the time for reading double when you double the number of
records? Typically you have B + C * N, but it would be interesting to
know the bias B and the actual time (and size) of each record.
3. How does the timing change when running dir/s?
4. What if you run two instances of your program?
5. Is the duration is only reset by restarting the program or does it
also decrease when the dir/s call has finished? What if you close and
reopen the database without terminating the program?

My guess is that the concurrent access by another program causes the
accesses to become synchronized, while before most of the data is
cached. That would cause a complete roundtrip between the two machines
for every access, which can easily blow up the timing via the latency.

In any case, I would try Python 2.7 in case this is a bug that was
already fixed.

Good luck!

Uli

ahsanraza211 · Aug 27, 2012

Am 27.08.2012 03:23, schrieb (e-mail address removed):

Wait, isn't SQLite completely file-based? In that case, SQLite accesses

a file, which in turn is stored on a remote filesystem. This means that

there are other components involved here, namely your OS, the network

(bandwidth & latency), the network filesystem and the filesystem on the

remote machine. It would help if you told us what you have there.

Questions here:

1. Is each record 4kB or are all 60 records together 4kB?

2. Does the time for reading double when you double the number of

records? Typically you have B + C * N, but it would be interesting to

know the bias B and the actual time (and size) of each record.

3. How does the timing change when running dir/s?

4. What if you run two instances of your program?

5. Is the duration is only reset by restarting the program or does it

also decrease when the dir/s call has finished? What if you close and

reopen the database without terminating the program?

My guess is that the concurrent access by another program causes the

accesses to become synchronized, while before most of the data is

cached. That would cause a complete roundtrip between the two machines

for every access, which can easily blow up the timing via the latency.

In any case, I would try Python 2.7 in case this is a bug that was

already fixed.

Good luck!

Uli

bruceg113355 · Aug 27, 2012

Uli,

Answers to your questions:
1) There are approx 65 records and each record is 68 bytes in length.
2) Not applicable because number of records is fixed.
3) Takes less than a second to read all 65 records when all is well.
Takes 17 seconds to read all 65 records when all is NOT WELL
4) Performance is also sluggish, at least 12 seconds.
5) Most likely, I misspoken. Restarting my program does not always help with performance.

When using the database on my C Drive, Sqlite performance is great! (<1S)
When using the database on a network, Sqlite performance is terrible! (17S)

I like your idea of trying Python 2.7

Finally, the way my program is written is:
loop for all database records:
read a database record
process data
display data (via wxPython)

Perhaps, this is a better approach:
read all database records
loop for all records:
process data
display data (via wxPython)

Thanks,
Bruce

Demian Brecht · Aug 27, 2012

Is there a reason that you're using SQLite in a network environment rather than a database server?

bruceg113355 · Aug 27, 2012

Demian,

I am not a database expert!
I selected sqlite for the following reasons:

1) Ships with Python.
2) Familiar with Python.
3) The Sqlite description at http://www.sqlite.org/whentouse.html appears to meet my requirements:
Very low volume and concurrency, small datasets, simple to use.

Bruce

Bryan · Aug 27, 2012

bruceg113 said:
I selected sqlite for the following reasons:

1) Ships with Python.
2) Familiar with Python.
3) The Sqlite description athttp://www.sqlite.org/whentouse.htmlappears to meet my requirements:
Very low volume and concurrency, small datasets, simple to use.

All good reasons, but a database file on a network drive is
contraindication for SQLite. A Google site-specific search
for "network" on www.sqlite.org, finds such warnings as:

"We have received reports of implementations of both Windows network
filesystems and NFS in which locking was subtly broken. We can not
verify these reports, but as locking is difficult to get right on a
network filesystem we have no reason to doubt them. You are advised to
avoid using SQLite on a network filesystem in the first place, since
performance will be slow."

That said, I don't know where your 17 seconds is going.

-Bryan

bruceg113355 · Aug 27, 2012

All good reasons, but a database file on a network drive is

contraindication for SQLite. A Google site-specific search

for "network" on www.sqlite.org, finds such warnings as:

"We have received reports of implementations of both Windows network

filesystems and NFS in which locking was subtly broken. We can not

verify these reports, but as locking is difficult to get right on a

network filesystem we have no reason to doubt them. You are advised to

avoid using SQLite on a network filesystem in the first place, since

performance will be slow."

That said, I don't know where your 17 seconds is going.

-Bryan

Bryan,

Thank you for your reply.
Are you saying having a sqlite database file on a shared LOCAL network drive is problematic?

Bruce

Bryan · Aug 28, 2012

bruceg113 said:
Thank you for your reply.
Are you saying having a sqlite database file on a
shared LOCAL network drive is problematic?

Yes, mostly, I think I am saying that.

A "LOCAL network drive" is network drive, and is not a
local drive, local as the network may be. We read and
write such a drive over a network protocol, in this
case a Microsoft protocol and implementation in the
SMB/CIFS family.

Where are your 17 seconds going? Hard to tell. Is
your experience of astonishing filesystem slothfulness
rare? Not so much.

We could probably diagnose the problem in a few weeks.
We'd use some open-source tools, WireShark among them,
plus some Microsoft tools for which we might have to
pay, plus the SQLite3 project's C library. With that
investment I'd bet we could diagnose, but not cure.

-Bryan

Dennis Lee Bieber · Aug 28, 2012

Are you saying having a sqlite database file on a shared LOCAL network drive is problematic?

I'd avoid it...

SQLite3 (and M$ Access/JET) are "file server" databases. The client
program runs the code of the engine as part of its processes,
maintaining open file handles on the database file. That means an awful
lot of network traffic to handle a query (it's not a case of a client
getting just a few records back from the database in response to a
select/where query... instead every record in the affected tables get
transmitted across the network so the client side library can determine
if it qualifies -- and then maybe writing the result set /back/ to the
file in a temporary region before sending them back to the client as the
result; on a true client/server only the query goes across, the database
engine runs on the machine with the files, and only the smaller result
set gets returned).

At my level of knowledge, your DIR command shell might be sending
"keep-alive" packets across the network and those of interfering with
caching on the remote end.

Cameron Simpson · Aug 28, 2012

| When using the database on my C Drive, Sqlite performance is great! (<1S)
| When using the database on a network, Sqlite performance is terrible! (17S)

Let me first echo everyone saying not to use SQLite on a network file.

| I like your idea of trying Python 2.7

I doubt it will change anything.

| Finally, the way my program is written is:
| loop for all database records:
| read a database record
| process data
| display data (via wxPython)
|
| Perhaps, this is a better approach:
| read all database records
| loop for all records:
| process data
| display data (via wxPython)

Yes, provided the "read all database records" is a single select
statement. In general, with any kind of remote resource you want to
minimise the number of transactions - the to and fro part, because each
such item tends to have latency while something is sent to and again
receiving from. So if you can say "gimme all the records" you get one
"unit" of latency at the start and end, versus latency around each
record fetch.

Having said all that, because SQLite works directly against the file, if
you say to it "giev me all the records" and the file is remote, SQLite
will probably _still_ fetch each record individually internally, gaining
you little.

This is why people are suggesting a database "server": then you can say
"get me all the records" over the net, and the server does
local-to-the-server file access to obtain the data. So all the "per
record" latency is at its end, and very small. Not to mention any
cacheing it may do.

Of course, if your requirements are very simple you might be better off
with a flat text file, possibly in CSV format, and avoid SQLite
altogether.

Cheers,
--
Cameron Simpson <[email protected]>

I do not trust thee, Cage from Hell, / The reason why I cannot tell, /
But this I know, and know full well: / I do not trust thee, Cage from Hell.
- Leigh Ann Hussey, (e-mail address removed), DoD#5913

bruceg113355 · Aug 28, 2012

| When using the database on my C Drive, Sqlite performance is great! (<1S)

| When using the database on a network, Sqlite performance is terrible! (17S)

Let me first echo everyone saying not to use SQLite on a network file.

| I like your idea of trying Python 2.7

I doubt it will change anything.

| Finally, the way my program is written is:

| loop for all database records:

| read a database record

| process data

| display data (via wxPython)

|

| Perhaps, this is a better approach:

| read all database records

| loop for all records:

| process data

| display data (via wxPython)

Yes, provided the "read all database records" is a single select

statement. In general, with any kind of remote resource you want to

minimise the number of transactions - the to and fro part, because each

such item tends to have latency while something is sent to and again

receiving from. So if you can say "gimme all the records" you get one

"unit" of latency at the start and end, versus latency around each

record fetch.

Having said all that, because SQLite works directly against the file, if

you say to it "giev me all the records" and the file is remote, SQLite

will probably _still_ fetch each record individually internally, gaining

you little.

This is why people are suggesting a database "server": then you can say

"get me all the records" over the net, and the server does

local-to-the-server file access to obtain the data. So all the "per

record" latency is at its end, and very small. Not to mention any

cacheing it may do.

Of course, if your requirements are very simple you might be better off

with a flat text file, possibly in CSV format, and avoid SQLite

altogether.

Cheers,

--

Cameron Simpson <[email protected]>

I do not trust thee, Cage from Hell, / The reason why I cannot tell, /

But this I know, and know full well: / I do not trust thee, Cage from Hell.

- Leigh Ann Hussey, (e-mail address removed), DoD#5913

Cameron,

I did some testing and approach #1 is significantly faster than approach #2:
Approach #1:
read all database records
loop for all records:
process data
display data (via wxPython)

Approach #2:
loop for all database records:
read a database record
process data
display data (via wxPython)

Various test results to read 50 records from a network drive.
#1 0:00:00.078000
#2 0:00:04.219000

#1 0:00:00.875000
#2 0:00:08.031000

#1 0:00:00.063000
#2 0:00:06.109000

#1 0:00:00.078000
#2 0:00:05.110000

#1 0:00:00.156000
#2 0:00:02.625000

This explains some of my slowness issues.

Note: When the network drive is behaving (not slow), approach #2 is close to approach #1.

From the site: http://www.sqlite.org/different.html
------------------------------------------------------------------------------
Most SQL database engines are implemented as a separate server process.Programs that want to access the database communicate with the server using some kind of interprocess communication (typically TCP/IP) to send requests to the server and to receive back results. SQLite does not work this way.. With SQLite, the process that wants to access the database reads and writes directly from the database files on disk. There is no intermediary server process.

There are advantages and disadvantages to being serverless. The main advantage is that there is no separate server process to install, setup, configure, initialize, manage, and troubleshoot. This is one reason why SQLite is a "zero-configuration" database engine. Programs that use SQLite requireno administrative support for setting up the database engine before they are run. Any program that is able to access the disk is able to use an SQLite database.

On the other hand, a database engine that uses a server can provide better protection from bugs in the client application - stray pointers in a client cannot corrupt memory on the server. And because a server is a single persistent process, it is able control database access with more precision,allowing for finer grain locking and better concurrency.

Most SQL database engines are client/server based. Of those that are serverless, SQLite is the only one that this author knows of that allows multiple applications to access the same database at the same time.
------------------------------------------------------------------------------

Doesn't the last paragraph imply that SQLite can operate on a network drive..

Thanks,
Bruce

bruceg113355 · Aug 28, 2012

| When using the database on my C Drive, Sqlite performance is great! (<1S)

| When using the database on a network, Sqlite performance is terrible! (17S)

Let me first echo everyone saying not to use SQLite on a network file.

| I like your idea of trying Python 2.7

I doubt it will change anything.

| Finally, the way my program is written is:

| loop for all database records:

| read a database record

| process data

| display data (via wxPython)

|

| Perhaps, this is a better approach:

| read all database records

| loop for all records:

| process data

| display data (via wxPython)

Yes, provided the "read all database records" is a single select

statement. In general, with any kind of remote resource you want to

minimise the number of transactions - the to and fro part, because each

such item tends to have latency while something is sent to and again

receiving from. So if you can say "gimme all the records" you get one

"unit" of latency at the start and end, versus latency around each

record fetch.

Having said all that, because SQLite works directly against the file, if

you say to it "giev me all the records" and the file is remote, SQLite

will probably _still_ fetch each record individually internally, gaining

you little.

This is why people are suggesting a database "server": then you can say

"get me all the records" over the net, and the server does

local-to-the-server file access to obtain the data. So all the "per

record" latency is at its end, and very small. Not to mention any

cacheing it may do.

Of course, if your requirements are very simple you might be better off

with a flat text file, possibly in CSV format, and avoid SQLite

altogether.

Cheers,

--

Cameron Simpson <[email protected]>

I do not trust thee, Cage from Hell, / The reason why I cannot tell, /

But this I know, and know full well: / I do not trust thee, Cage from Hell.

- Leigh Ann Hussey, (e-mail address removed), DoD#5913

Cameron,

I did some testing and approach #1 is significantly faster than approach #2:
Approach #1:
read all database records
loop for all records:
process data
display data (via wxPython)

Approach #2:
loop for all database records:
read a database record
process data
display data (via wxPython)

Various test results to read 50 records from a network drive.
#1 0:00:00.078000
#2 0:00:04.219000

#1 0:00:00.875000
#2 0:00:08.031000

#1 0:00:00.063000
#2 0:00:06.109000

#1 0:00:00.078000
#2 0:00:05.110000

#1 0:00:00.156000
#2 0:00:02.625000

This explains some of my slowness issues.

Note: When the network drive is behaving (not slow), approach #2 is close to approach #1.

From the site: http://www.sqlite.org/different.html

------------------------------------------------------------------------------
Most SQL database engines are implemented as a separate server process.Programs that want to access the database communicate with the server using some kind of interprocess communication (typically TCP/IP) to send requests to the server and to receive back results. SQLite does not work this way.. With SQLite, the process that wants to access the database reads and writes directly from the database files on disk. There is no intermediary server process.

There are advantages and disadvantages to being serverless. The main advantage is that there is no separate server process to install, setup, configure, initialize, manage, and troubleshoot. This is one reason why SQLite is a "zero-configuration" database engine. Programs that use SQLite requireno administrative support for setting up the database engine before they are run. Any program that is able to access the disk is able to use an SQLite database.

On the other hand, a database engine that uses a server can provide better protection from bugs in the client application - stray pointers in a client cannot corrupt memory on the server. And because a server is a single persistent process, it is able control database access with more precision,allowing for finer grain locking and better concurrency.

Most SQL database engines are client/server based. Of those that are serverless, SQLite is the only one that this author knows of that allows multiple applications to access the same database at the same time.
------------------------------------------------------------------------------

Doesn't the last paragraph imply that SQLite can operate on a network drive..

Thanks,
Bruce

Dennis Lee Bieber · Aug 28, 2012

Doesn't the last paragraph imply that SQLite can operate on a network drive.

Most anything "can operate" on a network drive... But should it?

The main thing the documentation is explaining is that one
application accessing the database FILE does NOT LOCK OTHERS from
accessing the file. Nothing about how the file is accessed. A
low-activity web service would allow lots of people to concurrently
access it -- but the processes that are doing said access are all local
to the database file.

Technically, M$ Access/JET (which is also file server database) also
permits multiple clients -- but the locking becomes a pain.

Pedro Larroy · Aug 28, 2012

Try incrementing the variable cursor.arraysize a lot.

Pedro.

sqlite3 db update extremely slow	5	Jul 16, 2007
sqlite3 performance problems only in python	0	Jul 23, 2009
Python 2.6 Deprecation Warnings with __new__ — Can someone explain why?	7	Oct 22, 2009
ChatBot	4	Jan 19, 2021
Is it just me, or is Sqlite3 goofy?	74	Sep 4, 2006
Advice Criticism on Python App	4	Mar 23, 2010
ANN: eGenix mxODBC Connect - Python Database Interface 2.0.2	0	Dec 14, 2012
ANN: eGenix mxODBC Connect 2.1.0 - Python ODBC Database Interface	0	May 28, 2014

Python 2.6 and Sqlite3 - Slow

bruceg113355

Ulrich Eckhardt

ahsanraza211

bruceg113355

Demian Brecht

bruceg113355

Bryan

bruceg113355

Bryan

Dennis Lee Bieber

Cameron Simpson

bruceg113355

bruceg113355

Dennis Lee Bieber

Pedro Larroy

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads