sqlite3 import performance

Ben Lee

hi folks --

a quick python and sqlite3 performance question. i find that
inserting a million rows of in-memory data into an in-memory database
via a single executemany() is about 30% slower than using the sqlite3
CLI and its .import command (even though the CLI is reading the same
data from a disk file). i find this surprising: executemany(), i
assume, uses a prepared statement, and that is exactly what the
.import command does (based on my quick perusal of its source).

is this discrepancy to be expected? where is the overhead coming
from?
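for concreteness, a small sketch of the pattern i mean: one compiled
INSERT re-bound for every row, inside a single transaction (modern
python; the thousand-row table is just illustrative):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('create table test (k int primary key, v int not null)')

# one compiled INSERT, re-bound for every row; the same pattern the
# CLI's .import loop uses via sqlite3_prepare/sqlite3_step in C
rows = [(i, 10) for i in range(1000)]
with con:  # wrap all the inserts in a single transaction
    con.executemany('insert into test (k, v) values (?, ?)', rows)

count = con.execute('select count(*) from test').fetchone()[0]
print(count)  # → 1000
```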

for full disclosure: the python code is at the end; run it first to
generate the data file. to test the CLI, i couldn't find a better way
than to create an init file "sqlcmds" containing

create table test (k int primary key, v int not null);
.import data test

and then run

time sqlite3 -init sqlcmds ':memory:' '.quit'
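for an apples-to-apples check, the same pipe-separated format can also
be read back from python and fed to executemany(). this standalone
sketch writes a tiny three-row sample file instead of reusing the
generated 'data' file:

```python
import os
import sqlite3
import tempfile

# write a tiny sample in the same pipe-separated format the CLI imports
path = os.path.join(tempfile.mkdtemp(), 'data')
with open(path, 'w') as f:
    f.write('1|10\n2|10\n3|10\n')

con = sqlite3.connect(':memory:')
con.execute('create table test (k int primary key, v int not null)')
# split each "k|v" line into bind parameters, just like .import does;
# SQLite's integer column affinity coerces the strings to ints
with open(path) as f, con:
    con.executemany('insert into test (k, v) values (?, ?)',
                    (line.rstrip('\n').split('|') for line in f))

total = con.execute('select sum(v) from test').fetchone()[0]
print(total)  # → 30
```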

the python code is

#!/usr/bin/env python

import sqlite3, random, timeit

con = None

def prepare():
    global con, cur

    con = sqlite3.connect(':memory:')
    con.isolation_level = "EXCLUSIVE"
    cur = con.cursor()

def ins():
    global con, data, cur

    try:
        cur.execute('drop table test')
    except sqlite3.OperationalError:
        pass  # first run: the table does not exist yet
    cur.execute('create table test (key int primary key, val int not null);')
    con.commit()

    cur.executemany("INSERT into test (key, val) values (?, 10)", data)
    con.commit()

print 'generating data...'
size = 1000000
data = [[a] for a in random.sample(xrange(10000000), size)]
print 'done!'

# save the data for the sqlite3 CLI
f = file('data', 'w')
f.writelines([str(a[0]) + "|10\n" for a in data])
f.close()

print 'testing ' + str(size) + ' inserts...'
# time the one million inserts, three repetitions of one run each
t = timeit.Timer(stmt='ins()',
                 setup="import sqlite3\nfrom __main__ import prepare, ins\nprepare()")
print t.repeat(3, 1)

thanks, ben
 

Gerhard Häring

Ben said:
> hi folks --
>
> a quick python and sqlite3 performance question. i find that
> inserting a million rows of in-memory data into an in-memory database
> via a single executemany() is about 30% slower than using the sqlite3
> CLI and the .import command (reading the same data from a disk file,
> even.) i find this surprising, executemany() i assume is using a
> prepared statement and this is exactly what the .import command does
> (based on my quick perusal of the source.)
>
> is this discrepancy to be expected? where is the overhead coming
> from? [...]

Ok, I'll bite.

Well, first, a 30% slowdown for a Python DB-API wrapper compared to
the database's native command-line tool is still quite reasonable, in
my opinion.

About a year ago I compared the performance of pysqlite vs. the other
SQLite wrapper, APSW. At the time, APSW was a bit faster: not
excessively so, but measurably. In meaningless benchmarks like yours ;-)

So I changed pysqlite here and there to get the same performance as
APSW. Only minor tweaks, nothing spectacular. And a few hardcore tricks
as well, like special-casing *not-subclassed* classes. The result was
pysqlite 2.3.5:
http://oss.itsystementwicklung.de/trac/pysqlite/wiki/2.3.5_Changelog

These optimizations are not yet in Python 2.5.x, but they can be found in
the sqlite3 module of the Python 2.6/3.0 betas.
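To illustrate the subclassing point: the stock sqlite3 classes get the
fast path, while handing in your own subclass through the standard
factory hooks opts you out of it. A minimal sketch (the CountingCursor
class is my own illustration, not part of pysqlite):

```python
import sqlite3

class CountingCursor(sqlite3.Cursor):
    """A subclassed cursor; a wrapper that special-cases the stock
    sqlite3.Cursor must fall back to a generic path for this."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.rows_seen = 0

    def fetchone(self):
        row = super().fetchone()
        if row is not None:
            self.rows_seen += 1
        return row

con = sqlite3.connect(':memory:')
cur = con.cursor(factory=CountingCursor)  # subclass via the factory hook
cur.execute('select 1 union all select 2')
while cur.fetchone() is not None:
    pass
print(cur.rows_seen)  # → 2
```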

Well, here are my results of your benchmark :)

-- Gerhard

# with Python 2.5.2's sqlite3 module
gerhard@lara:~/tmp$ python t.py
generating data...
done!
testing 1000000 inserts...
[42.795290946960449, 44.337385892868042, 46.35642409324646]


# with pysqlite 2.5.0, which I released earlier today
gerhard@lara:~/tmp$ python t.py
generating data...
done!
testing 1000000 inserts...
[33.027599096298218, 32.73675012588501, 32.823790073394775]

# SQLite commandline

gerhard@lara:~/tmp$ time sqlite3 -init sqlcmds ':memory:' '.quit'
real 0m32.514s
gerhard@lara:~/tmp$ time sqlite3 -init sqlcmds ':memory:' '.quit'
real 0m32.576s
gerhard@lara:~/tmp$ time sqlite3 -init sqlcmds ':memory:' '.quit'
real 0m32.604s
 
