perl and mysql: slow inserts with innodb


rickyars

i'm trying to insert 500k rows from a text file into an innodb table and
it's taking a little over 4 hours.

would someone tell me what is wrong with this code? would someone show
me how to do a multi-value insert in perl (e.g. 10k rows with a single
insert statement)?

thanks,

-ricky

my $select_handle = $dbh->prepare(
    'SELECT agent_id, agent_fullname FROM agents
     WHERE agent_shortname = ? AND RDEOUTPUT_rde_id = ?'
);
my $insert_handle = $dbh->prepare(
    'INSERT INTO actual VALUES (DEFAULT,?,?,?,?,?,?)'
);

while (<$GTRUTH>) {
    chomp;
    my @gtruth = split;

    # get FK: AGENTS_agent_id
    $select_handle->execute($gtruth[3], $rde_id);
    my ($AGENTS_agent_id, $agent_fullname) = $select_handle->fetchrow_array;

    $insert_handle->execute(
        $AGENTS_agent_id,   # FK from agents
        $rde_id,            # FK from rdeoutput
        $gtruth[0],         # ground truth time
        0,                  # target is DEAD
        $gtruth[7],         # move status
        $gtruth[8],         # action
    );
}
 

Tad McClellan

> i'm trying to insert 500k rows from a text file into an innodb table and
> it's taking a little over 4 hours.
>
> my $select_handle = $dbh->prepare(
>     'SELECT agent_id, agent_fullname FROM agents
>      WHERE agent_shortname = ? AND RDEOUTPUT_rde_id = ?'
> );


Does your DB have an index on the agent_shortname and RDEOUTPUT_rde_id columns?
 

xhoster

> i'm trying to insert 500k rows from a text file into an innodb table and
> it's taking a little over 4 hours.

How is the CPU usage divided between mysql and perl during that time?
Where/how are you committing?
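
Per-row commits are a common culprit on InnoDB: with AutoCommit on, every
single-row INSERT is its own transaction and gets flushed at commit. Below is
a minimal sketch of commit batching with DBI, reusing $dbh, $rde_id, and the
two prepared handles from the original post; the 10k batch size is an
arbitrary choice, not something from this thread.

    $dbh->{AutoCommit} = 0;    # one transaction per batch, not per row

    my $rows = 0;
    while (<$GTRUTH>) {
        chomp;
        my @gtruth = split;

        $select_handle->execute($gtruth[3], $rde_id);
        my ($agent_id) = $select_handle->fetchrow_array;

        $insert_handle->execute($agent_id, $rde_id,
                                $gtruth[0], 0, $gtruth[7], $gtruth[8]);

        $dbh->commit if ++$rows % 10_000 == 0;    # flush every 10k rows
    }
    $dbh->commit;    # commit the final partial batch
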
> would someone tell me what is wrong with this code?

There isn't necessarily anything wrong with your code. Some things
are just slow.
> would someone show me how to do a multi-value insert in perl (e.g. 10k
> rows with a single insert statement)?

Do you know the mysql syntax for doing that? What part of it are you
having trouble with? This is probably one place where I would not use
placeholders, but rather use $dbh->quote($data). If you want to do it in
one command, you could use (reading from right to left) a map to escape and
quote the data, a join to put commas in, a map to wrap each row in
parentheses, and a join to put commas between the rows. And with a

    while (my @chunk = splice @data, 0, 10_000) {

around all of it to break the work into chunks of 10k.
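
Spelled out, that pipeline might look something like this. It is a sketch:
@data is assumed to hold array references of column values, with undef for
the auto-increment id column, which $dbh->quote renders as an unquoted NULL.

    # Build one multi-row INSERT per 10k-row chunk, quoting with the
    # driver instead of using placeholders.
    while ( my @chunk = splice @data, 0, 10_000 ) {
        my $values =
            join ',',                                # commas between rows
            map { "($_)" }                           # parenthesize each row
            map { join ',',                          # commas within a row
                  map { $dbh->quote($_) } @$_ }      # escape and quote fields
            @chunk;
        $dbh->do("INSERT INTO actual VALUES $values");
    }
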


> my $select_handle = $dbh->prepare(
>     'SELECT agent_id, agent_fullname FROM agents
>      WHERE agent_shortname = ? AND RDEOUTPUT_rde_id = ?'
> );
> my $insert_handle = $dbh->prepare(
>     'INSERT INTO actual VALUES (DEFAULT,?,?,?,?,?,?)'
> );
>
> while (<$GTRUTH>) {
>     chomp;
>     my @gtruth = split;
>
>     # get FK: AGENTS_agent_id
>     $select_handle->execute($gtruth[3], $rde_id);
>     my ($AGENTS_agent_id, $agent_fullname) = $select_handle->fetchrow_array;
>
>     $insert_handle->execute(
>         $AGENTS_agent_id,   # FK from agents
>         $rde_id,            # FK from rdeoutput
>         $gtruth[0],         # ground truth time
>         0,                  # target is DEAD
>         $gtruth[7],         # move status
>         $gtruth[8],         # action
>     );
> }

How long does it take if you comment out the $insert_handle->execute and
only do the $select_handle part? How about just printing the formatted
data into a text file (rather than inserting it) and then using mysql's
LOAD DATA command?
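
For instance, something along these lines. A sketch only: the file name is a
placeholder, and LOAD DATA LOCAL requires local_infile to be enabled on both
the client and the server.

    # Write the would-be rows as tab-separated text instead of inserting.
    open my $out, '>', 'actual.tsv' or die "open: $!";
    while (<$GTRUTH>) {
        chomp;
        my @gtruth = split;
        $select_handle->execute($gtruth[3], $rde_id);
        my ($agent_id) = $select_handle->fetchrow_array;
        print {$out} join("\t", $agent_id, $rde_id,
                          $gtruth[0], 0, $gtruth[7], $gtruth[8]), "\n";
    }
    close $out or die "close: $!";

    # Bulk-load the file; LOAD DATA defaults to tab-separated fields.
    # A real version would name the target columns so the auto-increment
    # id can be omitted from the file.
    $dbh->do(q{LOAD DATA LOCAL INFILE 'actual.tsv' INTO TABLE actual});
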

Xho
 

rickyars

> How is the CPU usage divided between mysql and perl during that time?
> Where/how are you committing?

almost no CPU usage and a small memory footprint (6MB). the database is
on a computer in another city, but we have T1 lines.
> There isn't necessarily anything wrong with your code. Some things
> are just slow.

i'm starting to figure this out.
> How long does it take if you comment out the $insert_handle->execute and
> only do the $select_handle part?

running only the select handle roughly halves the run time.

> How about just printing the formatted data into a text file (rather than
> inserting it) and then using mysql's LOAD DATA command?

this is finally what i resorted to. i can parse the files to CSV text
documents in < 1 min. using LOAD DATA i can fill the table in 40
seconds. i would really like to know why my code is ~400 times slower
than LOAD DATA.
 

Brian Wakem

> almost no CPU usage and a small memory footprint (6MB). the database is
> on a computer in another city, but we have T1 lines.


There's your bottleneck. I'd say 500k rows in 4 hours (about 34.7 rows/sec)
is acceptable under the circumstances: each row costs two statement round
trips over the WAN, so network latency, not the database, dominates. Even on
a T1 it will be vastly slower than running on a local network or on the same
machine.

If you can run the script on the same machine as the db I expect you will
cut the 4hrs to nearer 4 minutes.
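
As a rough sanity check on those numbers, assuming one SELECT and one INSERT
round trip per row:

    500,000 rows / (4 h × 3,600 s/h) ≈ 34.7 rows/s, i.e. about 29 ms per row
    29 ms / 2 round trips ≈ 14 ms per statement, a plausible WAN latency
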
 

rickyars

Everyone, thank you for your comments and suggestions. I ended up
storing the foreign keys in hashes and then writing my data out to a CSV
file. Afterward LOAD DATA filled my tables in < 1 minute. Needless to
say we're happy (although I'm not too sure about the aesthetics of the
solution).

-ricky
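
In outline, that approach might look something like this. A sketch: the
lookup query is adapted from the SELECT earlier in the thread, the file
handling mirrors the LOAD DATA sketch above, and the CSV quoting is
deliberately naive.

    # Cache all agent FKs for this rde_id in one query, instead of one
    # SELECT per input line.
    my %agent_id;
    my $sth = $dbh->prepare(
        'SELECT agent_shortname, agent_id FROM agents
         WHERE RDEOUTPUT_rde_id = ?'
    );
    $sth->execute($rde_id);
    while ( my ($shortname, $id) = $sth->fetchrow_array ) {
        $agent_id{$shortname} = $id;
    }

    # Emit one CSV line per input line; the FK is now a hash lookup.
    open my $csv, '>', 'actual.csv' or die "open: $!";
    while (<$GTRUTH>) {
        chomp;
        my @gtruth = split;
        print {$csv} join(',', $agent_id{ $gtruth[3] }, $rde_id,
                               $gtruth[0], 0, $gtruth[7], $gtruth[8]), "\n";
    }
    close $csv or die "close: $!";

    # Then load it all in one statement (comma-separated this time).
    $dbh->do(q{LOAD DATA LOCAL INFILE 'actual.csv' INTO TABLE actual
               FIELDS TERMINATED BY ','});
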
 
