perl and mysql: slow inserts with innodb


rickyars

i'm trying to insert 500k rows from a text file into an innodb table and
it's taking a little over 4 hours.

would someone tell me what is wrong with this code? would someone show
me how to do a multi-value insert in perl (e.g. 10k rows with a single
insert statement)?

thanks,

-ricky

my $select_handle = $dbh->prepare(
    'SELECT agent_id, agent_fullname FROM agents
     WHERE agent_shortname = ? AND RDEOUTPUT_rde_id = ?'
);
my $insert_handle = $dbh->prepare(
    'INSERT INTO actual VALUES (DEFAULT,?,?,?,?,?,?)'
);

while (<$GTRUTH>) {
    chomp;
    my @gtruth = split;

    # get FK: AGENTS_agent_id
    $select_handle->execute($gtruth[3], $rde_id);
    my ($AGENTS_agent_id, $agent_fullname) = $select_handle->fetchrow_array;

    $insert_handle->execute(
        $AGENTS_agent_id,   # FK from agents
        $rde_id,            # FK from rdeoutput
        $gtruth[0],         # ground truth time
        0,                  # target is DEAD
        $gtruth[7],         # move status
        $gtruth[8],         # action
    );
}
 

Tad McClellan

> i'm trying to insert 500k rows from a text file into an innodb table and
> it's taking a little over 4 hours.
>
> my $select_handle = $dbh->prepare(
>     'SELECT agent_id, agent_fullname FROM agents
>      WHERE agent_shortname = ? AND RDEOUTPUT_rde_id = ?'
> );


Does your DB have an index on the agent_shortname and RDEOUTPUT_rde_id columns?
 

xhoster

> i'm trying to insert 500k rows from a text file into an innodb table and
> it's taking a little over 4 hours.

How is the CPU usage divided between mysql and perl during that time?
Where/how are you committing?
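
Per-row commits are a common culprit on InnoDB: with AutoCommit on, every
single-row INSERT is its own transaction and gets flushed at commit. Below is
a minimal sketch of commit batching with DBI, reusing $dbh, $rde_id, and the
two prepared handles from the original post; the 10k batch size is an
arbitrary choice, not something from this thread.

    $dbh->{AutoCommit} = 0;    # one transaction per batch, not per row

    my $rows = 0;
    while (<$GTRUTH>) {
        chomp;
        my @gtruth = split;

        $select_handle->execute($gtruth[3], $rde_id);
        my ($agent_id) = $select_handle->fetchrow_array;

        $insert_handle->execute($agent_id, $rde_id,
                                $gtruth[0], 0, $gtruth[7], $gtruth[8]);

        $dbh->commit if ++$rows % 10_000 == 0;    # flush every 10k rows
    }
    $dbh->commit;    # commit the final partial batch
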
> would someone tell me what is wrong with this code?

There isn't necessarily anything wrong with your code. Some things
are just slow.
> would someone show me how to do a multi-value insert in perl (e.g. 10k
> rows with a single insert statement)?

Do you know the mysql syntax for doing that? What part of it are you
having trouble with? This is probably one place where I would not use
placeholders, but rather use $dbh->quote($data). If you want to do it in
one command, you could use (reading from right to left) a map to escape and
quote the data, a join to put commas in, a map to wrap each row in
parentheses, and a join to put commas between the rows. And with a

    while (my @chunk = splice @data, 0, 10_000) {

around all of it to break the work into chunks of 10k.
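
Spelled out, that pipeline might look something like this. It is a sketch:
@data is assumed to hold array references of column values, with undef for
the auto-increment id column, which $dbh->quote renders as an unquoted NULL.

    # Build one multi-row INSERT per 10k-row chunk, quoting with the
    # driver instead of using placeholders.
    while ( my @chunk = splice @data, 0, 10_000 ) {
        my $values =
            join ',',                                # commas between rows
            map { "($_)" }                           # parenthesize each row
            map { join ',',                          # commas within a row
                  map { $dbh->quote($_) } @$_ }      # escape and quote fields
            @chunk;
        $dbh->do("INSERT INTO actual VALUES $values");
    }
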


> my $select_handle = $dbh->prepare(
>     'SELECT agent_id, agent_fullname FROM agents
>      WHERE agent_shortname = ? AND RDEOUTPUT_rde_id = ?'
> );
> my $insert_handle = $dbh->prepare(
>     'INSERT INTO actual VALUES (DEFAULT,?,?,?,?,?,?)'
> );
>
> while (<$GTRUTH>) {
>     chomp;
>     my @gtruth = split;
>
>     # get FK: AGENTS_agent_id
>     $select_handle->execute($gtruth[3], $rde_id);
>     my ($AGENTS_agent_id, $agent_fullname) = $select_handle->fetchrow_array;
>
>     $insert_handle->execute(
>         $AGENTS_agent_id,   # FK from agents
>         $rde_id,            # FK from rdeoutput
>         $gtruth[0],         # ground truth time
>         0,                  # target is DEAD
>         $gtruth[7],         # move status
>         $gtruth[8],         # action
>     );
> }

How long does it take if you comment out the $insert_handle->execute and
only do the $select_handle part? How about just printing the formatted
data into a text file (rather than inserting it) and then using mysql's
LOAD DATA command?
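
For instance, something along these lines. A sketch only: the file name is a
placeholder, and LOAD DATA LOCAL requires local_infile to be enabled on both
the client and the server.

    # Write the would-be rows as tab-separated text instead of inserting.
    open my $out, '>', 'actual.tsv' or die "open: $!";
    while (<$GTRUTH>) {
        chomp;
        my @gtruth = split;
        $select_handle->execute($gtruth[3], $rde_id);
        my ($agent_id) = $select_handle->fetchrow_array;
        print {$out} join("\t", $agent_id, $rde_id,
                          $gtruth[0], 0, $gtruth[7], $gtruth[8]), "\n";
    }
    close $out or die "close: $!";

    # Bulk-load the file; LOAD DATA defaults to tab-separated fields.
    # A real version would name the target columns so the auto-increment
    # id can be omitted from the file.
    $dbh->do(q{LOAD DATA LOCAL INFILE 'actual.tsv' INTO TABLE actual});
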

Xho
 

rickyars

> How is the CPU usage divided between mysql and perl during that time?
> Where/how are you committing?

almost no CPU usage and a small memory footprint (6MB). the database is
on a computer in another city, but we have T1 lines.
> There isn't necessarily anything wrong with your code. Some things
> are just slow.

i'm starting to figure this out.
> How long does it take if you comment out the $insert_handle->execute and
> only do the $select_handle part?

running only the select handle roughly halves the run time.

> How about just printing the formatted data into a text file (rather than
> inserting it) and then using mysql's LOAD DATA command?

this is finally what i resorted to. i can parse the files to CSV text
documents in < 1 min. using LOAD DATA i can fill the table in 40
seconds. i would really like to know why my code is ~400 times slower
than LOAD DATA.
 

Brian Wakem

> almost no CPU usage and a small memory footprint (6MB). the database is
> on a computer in another city, but we have T1 lines.


There's your bottleneck. I'd say 500k rows in 4 hours (about 34.7 rows/sec)
is acceptable under the circumstances: each row costs two statement round
trips over the WAN, so network latency, not the database, dominates. Even on
a T1 it will be vastly slower than running on a local network or on the same
machine.

If you can run the script on the same machine as the db I expect you will
cut the 4hrs to nearer 4 minutes.
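
As a rough sanity check on those numbers, assuming one SELECT and one INSERT
round trip per row:

    500,000 rows / (4 h × 3,600 s/h) ≈ 34.7 rows/s, i.e. about 29 ms per row
    29 ms / 2 round trips ≈ 14 ms per statement, a plausible WAN latency
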
 

rickyars

Everyone, thank you for your comments and suggestions. I ended up
storing the foreign keys in hashes and then writing my data out to a CSV
file. Afterward LOAD DATA filled my tables in < 1 minute. Needless to
say we're happy (although I'm not too sure about the aesthetics of the
solution).

-ricky
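
In outline, that approach might look something like this. A sketch: the
lookup query is adapted from the SELECT earlier in the thread, the file
handling mirrors the LOAD DATA sketch above, and the CSV quoting is
deliberately naive.

    # Cache all agent FKs for this rde_id in one query, instead of one
    # SELECT per input line.
    my %agent_id;
    my $sth = $dbh->prepare(
        'SELECT agent_shortname, agent_id FROM agents
         WHERE RDEOUTPUT_rde_id = ?'
    );
    $sth->execute($rde_id);
    while ( my ($shortname, $id) = $sth->fetchrow_array ) {
        $agent_id{$shortname} = $id;
    }

    # Emit one CSV line per input line; the FK is now a hash lookup.
    open my $csv, '>', 'actual.csv' or die "open: $!";
    while (<$GTRUTH>) {
        chomp;
        my @gtruth = split;
        print {$csv} join(',', $agent_id{ $gtruth[3] }, $rde_id,
                               $gtruth[0], 0, $gtruth[7], $gtruth[8]), "\n";
    }
    close $csv or die "close: $!";

    # Then load it all in one statement (comma-separated this time).
    $dbh->do(q{LOAD DATA LOCAL INFILE 'actual.csv' INTO TABLE actual
               FIELDS TERMINATED BY ','});
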
 
