Newbie Question: python mysqldb performance question

cjl

Group:

I'm new to python and new to mysql.

I have a csv file that is about 200,000 rows that I want to add to a
mysql database. Yes, I know that I can do this directly from the
mysql command line, but I am doing it through a python script so that
I can munge the data before adding it.

I have the following example code:

import sys
import csv
import MySQLdb

conn = MySQLdb.connect(db="database", host="localhost", user="root",
                       passwd="password")
c = conn.cursor()

reader = csv.reader(open(sys.argv[1]))
for row in reader:
    data1, data2, data3, data4 = row
    data = (data1, data2, data3, data4)
    c.execute("""insert into datatable values (%s, %s, %s, %s)""",
              data)
    conn.commit()

This takes a really long time to execute, on the order of minutes.
Directly importing the csv file into mysql using 'load infile' takes
seconds.

What am I doing wrong? What can I do to speed up the operation?

Thanks in advance,
cjl
 
Dennis Lee Bieber

reader = csv.reader(open(sys.argv[1]))
for row in reader:
    data1, data2, data3, data4 = row
    data = (data1, data2, data3, data4)

That set is redundant in this example -- it's the very same as saying

    data = row
    c.execute("""insert into datatable values (%s, %s, %s, %s)""",
              data)
    conn.commit()

This takes a really long time to execute, on the order of minutes.
Directly importing the csv file into mysql using 'load infile' takes
seconds.

"load infile", for files that are local to the server (if so configured),
means all processing is done directly by the server. Using Python, you are
going through a TCP/IP connection, with send/response traffic for each
command.


Step one: Don't commit after every single row... Feed all the rows
and then commit (among other things, that means an error in the middle
of reading the CSV rolls back all database activity, allowing you to try
again without having duplicate records in the database).

Step two: Don't process the rows one-at-a-time. Build up batches,
and use .executemany() on the batch.
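
A minimal sketch of both steps, reusing the conn and c objects from the
original post (the batch size of 1000 is an arbitrary choice, and the
column layout is assumed to match the original insert):

batch = []
reader = csv.reader(open(sys.argv[1]))
for row in reader:
    batch.append(tuple(row))       # munge the row here before appending
    if len(batch) >= 1000:         # arbitrary batch size
        c.executemany("insert into datatable values (%s, %s, %s, %s)",
                      batch)
        batch = []
if batch:                          # flush the final, partial batch
    c.executemany("insert into datatable values (%s, %s, %s, %s)", batch)
conn.commit()                      # one commit, after all rows are loaded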
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
rurpy

conn = MySQLdb.connect(db="database", host="localhost", user="root",
                       passwd="password")
c = conn.cursor()

reader = csv.reader(open(sys.argv[1]))
for row in reader:
    data1, data2, data3, data4 = row
    data = (data1, data2, data3, data4)
    c.execute("""insert into datatable values (%s, %s, %s, %s)""",
              data)
    conn.commit()

This takes a really long time to execute, on the order of minutes.
Directly importing the csv file into mysql using 'load infile' takes
seconds.

What am I doing wrong? What can I do to speed up the operation?

In addition to the previous poster's suggestions,
if you have indexes, foreign keys, or other constraints
on the table, and you are sure that you are loading
"good" data, you may want to drop them before doing
the inserts, and recreate them after. Updating indexes
and checking constraints can be time-consuming.
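
As a rough sketch (the index name idx_data1 and column data1 are made up
for illustration; adapt them to your actual schema):

# drop the (hypothetical) index before the bulk insert...
c.execute("ALTER TABLE datatable DROP INDEX idx_data1")

# ...run the batched inserts here...

# ...then rebuild the index once, after all the data is in
c.execute("ALTER TABLE datatable ADD INDEX idx_data1 (data1)")
conn.commit()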

But you will probably never get your code to run
as fast as the mysql "load" command, which (I'm
guessing here, don't use mysql much) skips all
the sql machinery and writes directly to the table.
 
John Nagle

cjl said:
Group:

I'm new to python and new to mysql.

I have a csv file that is about 200,000 rows that I want to add to a
mysql database. Yes, I know that I can do this directly from the
mysql command line, but I am doing it through a python script so that
I can munge the data before adding it.

I have the following example code:

conn = MySQLdb.connect(db="database", host="localhost", user="root",
                       passwd="password")
c = conn.cursor()

reader = csv.reader(open(sys.argv[1]))
for row in reader:
    data1, data2, data3, data4 = row
    data = (data1, data2, data3, data4)
    c.execute("""insert into datatable values (%s, %s, %s, %s)""",
              data)
    conn.commit()

This takes a really long time to execute, on the order of minutes.
Directly importing the csv file into mysql using 'load infile' takes
seconds.

What am I doing wrong? What can I do to speed up the operation?

"LOAD INFILE" is generally faster than doing
many inserts. There are ways to speed things up, especially if
you can reconfigure MySQL to use more memory. You may want to
load the data without indexing it, then build the indices. Ask
in a MySQL group, or read the manual.

If you can use LOAD INFILE, do so. Preferably from an empty
database. Then it can sort the records and insert them all at once.
You can create a file for LOAD INFILE from Python, then issue the
LOAD INFILE command.
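
A minimal sketch of that approach, assuming the munged rows are written to
a temporary file named munged.csv and that LOCAL INFILE is enabled on both
client and server (the local_infile flag and the single-table layout are
assumptions, not something from the original post):

import csv
import sys
import MySQLdb

conn = MySQLdb.connect(db="database", host="localhost", user="root",
                       passwd="password", local_infile=1)
c = conn.cursor()

# Munge the rows in Python, writing a clean temporary CSV.
out = open("munged.csv", "w")
writer = csv.writer(out)
for row in csv.reader(open(sys.argv[1])):
    writer.writerow(row)   # apply any cleanup/transformation here
out.close()

# Let the server do the bulk load in one pass.
c.execute("""LOAD DATA LOCAL INFILE 'munged.csv'
             INTO TABLE datatable
             FIELDS TERMINATED BY ','""")
conn.commit()

That keeps the munging in Python while leaving the actual insert work to
the server.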

A few minutes isn't a "really long time" for that. I had to do
15,000,000 INSERT operations a few months back, and it took three days.

John Nagle
 
