Newbie Question: python mysqldb performance question

Discussion in 'Python' started by cjl, May 21, 2007.

  1. cjl

    cjl Guest

    Group:

    I'm new to python and new to mysql.

    I have a csv file that is about 200,000 rows that I want to add to a
    mysql database. Yes, I know that I can do this directly from the
    mysql command line, but I am doing it through a python script so that
    I can munge the data before adding it.

    I have the following example code:

    import csv
    import sys

    import MySQLdb

    conn = MySQLdb.connect(db="database", host="localhost", user="root",
                           passwd="password")
    c = conn.cursor()

    reader = csv.reader(open(sys.argv[1]))
    for row in reader:
        data1, data2, data3, data4 = row
        data = (data1, data2, data3, data4)
        c.execute("""insert into datatable values (%s, %s, %s, %s)""",
                  data)
        conn.commit()

    This takes a really long time to execute, on the order of minutes.
    Directly importing the csv file into mysql using 'load infile' takes
    seconds.

    What am I doing wrong? What can I do to speed up the operation?

    Thanks in advance,
    cjl
     
    cjl, May 21, 2007
    #1

  2. On 20 May 2007 16:55:07 -0700, cjl <> declaimed the
    following in comp.lang.python:

    > reader = csv.reader(open(sys.argv[1]))
    > for row in reader:
    >     data1, data2, data3, data4 = row
    >     data = (data1,data2,data3,data4)


    That unpack-and-repack is redundant in this example -- it's the
    very same as saying

    data = row

    >     c.execute("""insert into datatable values (%s, %s, %s, %s)""",
    >               data)
    >     conn.commit()
    >
    > This takes a really long time to execute, on the order of minutes.
    > Directly importing the csv file into mysql using 'load infile' takes
    > seconds.


    "load infile", for local to server files (if so configured) means
    all processing is done directly by the server. Using Python, you are
    going through a TCP/IP connection with send/response traffic for each
    command.


    Step one: Don't commit after every single row... Feed all the rows
    and then commit (among other things, that means an error in the middle
    of reading the CSV rolls back all database activity, allowing you to try
    again without leaving duplicate records in the database).

    Step two: Don't process the rows one at a time. Build up batches,
    and use .executemany() on each batch, as in the sketch below.
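
    A minimal sketch combining both steps (untested; it assumes the
    table and connection details from the original post, and the batch
    size of 1000 is an arbitrary choice):

    import csv
    import sys

    import MySQLdb

    conn = MySQLdb.connect(db="database", host="localhost", user="root",
                           passwd="password")
    c = conn.cursor()

    insert = "insert into datatable values (%s, %s, %s, %s)"
    batch = []
    for row in csv.reader(open(sys.argv[1])):
        batch.append(row)
        if len(batch) >= 1000:
            # one round trip per 1000 rows instead of one per row
            c.executemany(insert, batch)
            batch = []
    if batch:
        c.executemany(insert, batch)

    # commit once at the end; an error above rolls back everything
    conn.commit()
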
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, May 21, 2007
    #2

  3. Guest

    On May 20, 5:55 pm, cjl <> wrote:
    ....snip...
    > conn = MySQLdb.connect(db="database", host="localhost", user="root",
    >                        passwd="password")
    > c = conn.cursor()
    >
    > reader = csv.reader(open(sys.argv[1]))
    > for row in reader:
    >     data1, data2, data3, data4 = row
    >     data = (data1,data2,data3,data4)
    >     c.execute("""insert into datatable values (%s, %s, %s, %s)""",
    >               data)
    >     conn.commit()
    >
    > This takes a really long time to execute, on the order of minutes.
    > Directly importing the csv file into mysql using 'load infile' takes
    > seconds.
    >
    > What am I doing wrong? What can I do to speed up the operation?


    In addition to the previous poster's suggestions,
    if you have indexes, foreign keys, or other constraints
    on the table, and you are sure that you are loading
    "good" data, you may want to drop them before doing
    the inserts and recreate them after (see the sketch
    below). Updating indexes and checking constraints can
    be time consuming.
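
    A rough sketch of that idea (untested; "idx_data1" and the column
    it covers are hypothetical names, not from the original post):

    # drop the index so the inserts don't have to maintain it
    c.execute("drop index idx_data1 on datatable")

    # ... do the batched inserts here ...

    # rebuild the index in one pass over the loaded data
    c.execute("create index idx_data1 on datatable (data1)")
    conn.commit()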

    But you will probably never get your code to run
    as fast as the mysql "load" command, which (I'm
    guessing here, don't use mysql much) skips all
    the sql machinery and writes directly to the table.
     
    , May 21, 2007
    #3
  4. John Nagle

    John Nagle Guest

    cjl wrote:
    > Group:
    >
    > I'm new to python and new to mysql.
    >
    > I have a csv file that is about 200,000 rows that I want to add to a
    > mysql database. Yes, I know that I can do this directly from the
    > mysql command line, but I am doing it through a python script so that
    > I can munge the data before adding it.
    >
    > I have the following example code:
    >
    > conn = MySQLdb.connect(db="database", host="localhost", user="root",
    >                        passwd="password")
    > c = conn.cursor()
    >
    > reader = csv.reader(open(sys.argv[1]))
    > for row in reader:
    >     data1, data2, data3, data4 = row
    >     data = (data1,data2,data3,data4)
    >     c.execute("""insert into datatable values (%s, %s, %s, %s)""",
    >               data)
    >     conn.commit()
    >
    > This takes a really long time to execute, on the order of minutes.
    > Directly importing the csv file into mysql using 'load infile' takes
    > seconds.
    >
    > What am I doing wrong? What can I do to speed up the operation?


    "LOAD INFILE" is generally faster than doing
    many inserts. There are ways to speed things up, especially if
    you can reconfigure MySQL to use more memory. You may want to
    load the data without indexing it, then build the indices. Ask
    in a MySQL group, or read the manual.

    If you can use LOAD INFILE, do so, preferably into an empty
    table; then it can sort the records and insert them all at once.
    You can create a file for LOAD INFILE from Python, then issue the
    LOAD INFILE command, as sketched below.
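
    An untested sketch of that approach: munge the rows in Python,
    write them to a temporary file, then hand that file to the server.
    It assumes the table from the original post and that your MySQL
    server and client library are configured to allow LOCAL INFILE.

    import csv
    import sys
    import tempfile

    import MySQLdb

    # write the munged rows to a temporary csv file
    tmp = tempfile.NamedTemporaryFile()
    writer = csv.writer(tmp)
    for row in csv.reader(open(sys.argv[1])):
        # ... munge row here ...
        writer.writerow(row)
    tmp.flush()

    conn = MySQLdb.connect(db="database", host="localhost", user="root",
                           passwd="password", local_infile=1)
    c = conn.cursor()
    # MySQLdb quotes the parameter, yielding ... local infile 'name' ...
    c.execute("load data local infile %s into table datatable "
              "fields terminated by ','", (tmp.name,))
    conn.commit()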

    A few minutes isn't a "really long time" for that. I had to do
    15,000,000 INSERT operations a few months back, and it took three days.

    John Nagle
     
    John Nagle, May 21, 2007
    #4
