Slow loading of large in-memory tables

Philipp K. Janert, Ph.D.

Dear All!

I am trying to load a relatively large data set (about 1 million
rows) into an SQLite table, which is kept in memory. The
load process is very slow - on the order of 15 minutes or
so.

I am accessing SQLite from Python, using the pysqlite driver.
I load all records first using cx.execute( "insert ..." );
only once I have run cx.execute() for all records do I commit
all the preceding inserts with conn.commit() (sketched below).

I have tried using cx.executemany(), but if anything, this
makes the process slower.

I have not tried mucking manually with transactions.
I have sufficiently convinced myself that the slow part
is in fact the cx.execute() calls - not reading the data from
the file or anything else.
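
For reference, my loading pattern is roughly the following. (The table
layout and file name are simplified for illustration, and the '%s'
placeholder style is what old pysqlite uses, as far as I know.)

import sqlite   # pysqlite 0.5.x exposes the old "sqlite" module

# An in-memory database lives only as long as this connection.
conn = sqlite.connect(":memory:")
cx = conn.cursor()
cx.execute("create table data (id integer, value varchar(100))")

# One cx.execute() per record; conn.commit() only once at the very end.
for line in open("data.txt"):
    key, value = line.rstrip("\n").split("\t")
    cx.execute("insert into data (id, value) values (%s, %s)", (key, value))

conn.commit()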

My system specs and versions:
SuSE 9.1
Python 2.3.3
SQLite 2.8.12
pysqlite 0.5.1
1 GB memory (I am not swapping; this is not the problem).

Are there ways to make this process faster?

Also, I want to keep the DB in memory, since I later use it
to run a very DB-intensive simulation against it. However,
this implies that I need to load the DB from the same Python
program that will later run the simulation - I think.

Any hints appreciated!

(Please cc me when replying to the list regarding this
message!)

Best regards,

Ph.
 
Thorsten Kampe

* Philipp K. Janert, Ph.D. (2004-09-07 07:14 +0200)
> I am trying to load a relatively large data set (about 1 million
> rows) into an SQLite table, which is kept in memory. The
> load process is very slow - on the order of 15 minutes or
> so.
>
> I am accessing SQLite from Python, using the pysqlite driver.
> I load all records first using cx.execute( "insert ..." );
> only once I have run cx.execute() for all records do I commit
> all the preceding inserts with conn.commit().
>
> I have tried using cx.executemany(), but if anything, this
> makes the process slower.
>
> I have not tried mucking manually with transactions.
> I have sufficiently convinced myself that the slow part
> is in fact the cx.execute() calls - not reading the data from
> the file or anything else.
>
> Are there ways to make this process faster?

According to [1]:

pragma temp_store = memory;
pragma cache_size = 4000;     -- or any bigger value (2000 is the default)
pragma count_changes = off;
pragma synchronous = off;
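
These pragmas can be issued through an ordinary cursor right after
connecting; a minimal sketch, assuming the old "sqlite" module of
pysqlite 0.5.x:

import sqlite

conn = sqlite.connect(":memory:")
cx = conn.cursor()

# tuning pragmas from [1]; values as quoted above
for pragma in ("pragma temp_store = memory",
               "pragma cache_size = 4000",     # default is 2000 pages
               "pragma count_changes = off",
               "pragma synchronous = off"):
    cx.execute(pragma)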

Also, SQLite commits after every SQL statement (not only after those
that alter the database) [2]. Therefore you have to start a transaction
manually before the first SQL statement and commit manually after the
last one. To be able to do this, you have to turn off pysqlite's
integrated transaction handling:

connection = sqlite.connect(database, autocommit=1)
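
Put together, the bulk load might then look roughly like this. (A
sketch only: the autocommit keyword is as described above, while the
'%s' placeholder style, the table layout, and the file name are
assumptions.)

import sqlite

# autocommit=1: pysqlite no longer wraps statements in its own
# transactions, so the explicit begin/commit below take effect.
conn = sqlite.connect(":memory:", autocommit=1)
cx = conn.cursor()
cx.execute("create table data (id integer, value varchar(100))")

cx.execute("begin transaction")   # one transaction around all the inserts
for line in open("data.txt"):
    key, value = line.rstrip("\n").split("\t")
    cx.execute("insert into data (id, value) values (%s, %s)", (key, value))
cx.execute("commit")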
> Also, I want to keep the DB in memory, since I later use it
> to run a very DB-intensive simulation against it. However,
> this implies that I need to load the DB from the same Python
> program that will later run the simulation - I think.

Yes.
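
An in-memory database exists only for the lifetime of the connection
that created it, so the load phase and the simulation have to share one
connection (and therefore one process). A minimal sketch of that
structure - load_table() and run_simulation() are hypothetical
placeholders:

import sqlite

def load_table(cx):
    # bulk-load the records, as discussed above
    pass

def run_simulation(cx):
    # the DB-intensive simulation runs against the same connection
    pass

conn = sqlite.connect(":memory:", autocommit=1)
cx = conn.cursor()
load_table(cx)        # phase 1: populate the in-memory table
run_simulation(cx)    # phase 2: simulate against it
conn.close()          # the in-memory database disappears here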

Thorsten

[1] http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
[2] http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html#transactions
 
