Slow loading of large in-memory tables

Philipp K. Janert, Ph.D.

Dear All!

I am trying to load a relatively large data set (about 1 million
rows) into an SQLite table, which is kept in memory. The
load process is very slow - on the order of 15 minutes or
so.

I am accessing SQLite from Python, using the pysqlite driver.
I load all records first using cx.execute( "insert ..." );
only once I have run cx.execute() for all records do I commit
all the preceding inserts with conn.commit() (sketched below).

I have tried using cx.executemany(), but if anything, this
makes the process slower.

I have not tried mucking manually with transactions.
I have sufficiently convinced myself that the slow part
is in fact the cx.execute() calls - not reading the data from
the file or anything else.
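
For reference, my loading pattern is roughly the following. (The table
layout and file name are simplified for illustration, and the '%s'
placeholder style is what old pysqlite uses, as far as I know.)

import sqlite   # pysqlite 0.5.x exposes the old "sqlite" module

# An in-memory database lives only as long as this connection.
conn = sqlite.connect(":memory:")
cx = conn.cursor()
cx.execute("create table data (id integer, value varchar(100))")

# One cx.execute() per record; conn.commit() only once at the very end.
for line in open("data.txt"):
    key, value = line.rstrip("\n").split("\t")
    cx.execute("insert into data (id, value) values (%s, %s)", (key, value))

conn.commit()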

My system specs and versions:
SuSE 9.1
Python 2.3.3
SQLite 2.8.12
pysqlite 0.5.1
1 GB memory (I am not swapping; this is not the problem).

Are there ways to make this process faster?

Also, I want to keep the DB in memory, since I later use it
to run a very DB-intensive simulation against it. However,
this implies that I need to load the DB from the same Python
program that will later run the simulation - I think.

Any hints appreciated!

(Please cc me when replying to the list regarding this
message!)

Best regards,

Ph.
 
Thorsten Kampe

* Philipp K. Janert, Ph.D. (2004-09-07 07:14 +0200)
> I am trying to load a relatively large data set (about 1 million
> rows) into an SQLite table, which is kept in memory. The
> load process is very slow - on the order of 15 minutes or
> so.
>
> I am accessing SQLite from Python, using the pysqlite driver.
> I load all records first using cx.execute( "insert ..." );
> only once I have run cx.execute() for all records do I commit
> all the preceding inserts with conn.commit().
>
> I have tried using cx.executemany(), but if anything, this
> makes the process slower.
>
> I have not tried mucking manually with transactions.
> I have sufficiently convinced myself that the slow part
> is in fact the cx.execute() calls - not reading the data from
> the file or anything else.
>
> Are there ways to make this process faster?

According to [1]:

pragma temp_store = memory;
pragma cache_size = 4000;     -- or any bigger value (2000 is the default)
pragma count_changes = off;
pragma synchronous = off;
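
These pragmas can be issued through an ordinary cursor right after
connecting; a minimal sketch, assuming the old "sqlite" module of
pysqlite 0.5.x:

import sqlite

conn = sqlite.connect(":memory:")
cx = conn.cursor()

# tuning pragmas from [1]; values as quoted above
for pragma in ("pragma temp_store = memory",
               "pragma cache_size = 4000",     # default is 2000 pages
               "pragma count_changes = off",
               "pragma synchronous = off"):
    cx.execute(pragma)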

Also, SQLite commits after every SQL statement (not only after those
that alter the database) [2]. Therefore you have to start a transaction
manually before the first SQL statement and commit manually after the
last one. To be able to do this, you have to turn off pysqlite's
integrated transaction handling:

connection = sqlite.connect(database, autocommit=1)
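
Put together, the bulk load might then look roughly like this. (A
sketch only: the autocommit keyword is as described above, while the
'%s' placeholder style, the table layout, and the file name are
assumptions.)

import sqlite

# autocommit=1: pysqlite no longer wraps statements in its own
# transactions, so the explicit begin/commit below take effect.
conn = sqlite.connect(":memory:", autocommit=1)
cx = conn.cursor()
cx.execute("create table data (id integer, value varchar(100))")

cx.execute("begin transaction")   # one transaction around all the inserts
for line in open("data.txt"):
    key, value = line.rstrip("\n").split("\t")
    cx.execute("insert into data (id, value) values (%s, %s)", (key, value))
cx.execute("commit")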
> Also, I want to keep the DB in memory, since I later use it
> to run a very DB-intensive simulation against it. However,
> this implies that I need to load the DB from the same Python
> program that will later run the simulation - I think.

Yes.
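
An in-memory database exists only for the lifetime of the connection
that created it, so the load phase and the simulation have to share one
connection (and therefore one process). A minimal sketch of that
structure - load_table() and run_simulation() are hypothetical
placeholders:

import sqlite

def load_table(cx):
    # bulk-load the records, as discussed above
    pass

def run_simulation(cx):
    # the DB-intensive simulation runs against the same connection
    pass

conn = sqlite.connect(":memory:", autocommit=1)
cx = conn.cursor()
load_table(cx)        # phase 1: populate the in-memory table
run_simulation(cx)    # phase 2: simulate against it
conn.close()          # the in-memory database disappears here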

Thorsten

[1] http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
[2] http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html#transactions
 
