Better performance ...

Discussion in 'Perl Misc' started by cms-team@ivi.de, Aug 28, 2006.

  1. Guest

    Hi!

    I've got the following code and would like to ask if there is any
    possibility for better performance. The script inserts about 240
    records per second into a MySQL database.

    The file (input.txt) contains about 365,000 lines/records.

    ====================================================
    # Initialize basic settings
    require "/home/vip/my.properties";
    $IMPDAT="$HOME_HOST/input.txt";

    if (!open (FP1,"<$IMPDAT"))
    {
    print ("$IMPDAT nicht vorhanden!\n");
    exit(3);
    }
    # Connect to the database
    $rc=&connect_to_db();
    $linecount = 0;

    # Drop the old table contents first
    $statment="DROP TABLE IF EXISTS `testtabelle`";
    $rv=$dbh->do($statment);
    $statment="CREATE TABLE `testtabelle` (`apl` varchar(2) NOT NULL
    default '',`vsnr` varchar(11) NOT NULL default '',`pvsnr` varchar(8)
    NOT NULL default '',`art` char(1) NOT NULL default '',`aktiv` char(1)
    NOT NULL default '',`vmnr` varchar(4) NOT NULL default '',`vbeginn`
    varchar(8) NOT NULL default '',`vende` varchar(8) default NULL,`zw`
    int(1) NOT NULL default '0',`ikm` char(1) NOT NULL default '',`ktoidx`
    char(1) default NULL,`nettobeitrag` decimal(9,2) NOT NULL default
    '0.00',`bruttoratenbeitrag` decimal(9,2) NOT NULL default
    '0.00',`produkttext` varchar(150) NOT NULL default '',`rohbau`
    varchar(6) default NULL,PRIMARY KEY (`vsnr`),KEY `pvsnr` (`pvsnr`),KEY
    `vm` (`vmnr`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;";
    $rv=$dbh->do($statment);

    # Record the application start time
    $startzeit = time;

    # Read the file line by line
    while (<FP1>)
    {
    $line = $_;
    chop ($line);
    $linecount += 1;

    $line=~s/\"//g;
    ($apl,$vsnr1,$vsnr2,$pvsnr,$art,$aktiv,$vmnr,$vbeginn,$vende,$zw,$ikm,$ktoidx,$nettobeitrag,$bruttoratenbeitrag,$produkttext,$rohbau)
    = split(/;/,$line);

    $vende=~s/ *$//g;
    $vende=~s/,/\./g;

    $ktoidx=~s/ *$//g;
    $ktoidx=~s/,/\./g;

    $nettobeitrag=~s/ *$//g;
    $nettobeitrag=~s/,/\./g;

    $bruttoratenbeitrag=~s/ *$//g;
    $bruttoratenbeitrag=~s/,/\./g;

    $produkttext=~s/ *$//g;
    $produkttext=~s/,/\./g;

    $rohbau=~s/ *$//g;
    $rohbau=~s/,/\./g;

    $statment="INSERT into testtabelle
    (apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ikm,ktoidx,nettobeitrag,bruttoratenbeitrag,produkttext,rohbau)
    value
    ('$apl','$vsnr1$vsnr2','$pvsnr','$art','$aktiv','$vmnr','$vbeginn',".($vende
    eq "" ? "NULL" : "'".$vende."'").",$zw,'$ikm',".($ktoidx eq "" ? "NULL"
    :
    "'".$ktoidx."'").",$nettobeitrag,$bruttoratenbeitrag,'$produkttext',".($rohbau
    eq "" ? "NULL" : "'".$rohbau."'").")";

    $rv=$dbh->do($statment);
    }
    $rc = $dbh->disconnect;
    close(FP1);

    # Print elapsed time
    &Dauer;
    $est = $est == 0 ? 1 : $est;
    $dsek = int($linecount/$est);
    print "Das Script 'laden_all_daten.pl' hat fuer
    ".punktuiere($linecount)." Datensaetze insgesamt ".$dauer." Minuten
    gebraucht (".$dsek." Datensaetze pro Sekunde)\n";
    unlink($TEMPDAT);
    exit(0);
    ====================================================

    Where can I make this thing run a little faster?

    Thx in advance!

    Greetings
    Marcus Dau
     
    Aug 28, 2006
    #1

  2. Marc Espie Guest

    wrote:
    >Hi!
    >
    >I've got the following code and would like to ask if there is any
    >possibility for better performance. The script inserts about 240
    >records per second into a MySQL database.
    >
    >The file (input.txt) contains about 365,000 lines/records.
    >


    >Where can I make this thing run a little faster?
    >

    Turn autocommit off, and commit just once every n transactions.

    Not sure how good it will be with mysql, but sqlite loves it quite
    a bit.

    Otherwise, use some other database engine. mysql is simple to set up,
    but definitely not the fastest.
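
    A minimal sketch of how that could look with the OP's existing DBI handle
    (the 1000-row batch size is an arbitrary choice, not something from the
    thread, and $statment is built exactly as before):

    # Sketch only: disable AutoCommit and commit every $batch rows.
    $dbh->{AutoCommit} = 0;
    my $batch = 1000;

    while (<FP1>) {
    # ... build $statment for the current line as before ...
    $rv = $dbh->do($statment);
    $dbh->commit unless $. % $batch;   # commit every $batch lines
    }
    $dbh->commit;                      # commit whatever is left over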
     
    Marc Espie, Aug 28, 2006
    #2

  3. Guest

    Thx a lot. This does the trick.

    Runtime for the 30 scripts (tables) went down from 50 minutes to
    10 minutes!!!

    Greetings
    Marcus
     
    Aug 28, 2006
    #3
  4. goho Guest

    wrote:
    > Thx a lot. This does the trick.
    >
    > Runtime for the 30 scripts (tables) went down from 50 minutes to
    > 10 minutes!!!
    >
    > Greetings
    > Marcus


    Your problem doesn't have a lot to do with Perl; I suggest you look
    towards the MySQL group. That said, though:

    I've not used MySQL, but from experience with other DBMSs you may get
    a significant improvement by creating the pvsnr and vm keys (indexes)
    AFTER loading the data.
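
    A rough sketch of that idea against the table above, assuming the
    CREATE TABLE is changed to leave out the two secondary KEY clauses and
    that $dbh is still connected after the load:

    # Sketch only: add the secondary indexes after the bulk load, so they
    # are not maintained for every single INSERT.
    $dbh->do("ALTER TABLE `testtabelle` ADD KEY `pvsnr` (`pvsnr`)");
    $dbh->do("ALTER TABLE `testtabelle` ADD KEY `vm` (`vmnr`)");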

    In some databases it helps to pre-sort the file offline in order of the
    primary key - try it. Without knowing the internal structure of InnoDB
    I can't say whether it will improve things or be a disaster, but you
    won't know until you try.

    Be aware of ROLLBACK issues if you don't commit regularly; that's not a
    subject for discussion here, so I'll leave it for you to follow up. If
    you are looking for a book, Michael Kofler's 'The Definitive Guide to
    MySQL' looks pretty reasonable.

    Good luck.
     
    goho, Aug 28, 2006
    #4
  5. Paul Lalli Guest

    wrote:
    > I've got the following code and would like to ask if there is any
    > possibility for better performance. The script inserts about 240
    > records per second into a MySQL database.



    <snip>

    > while (<FP1>)
    > {
    > $line = $_;


    Why make two assignments?

    while (my $line = <FP1>) {

    > chop ($line);


    You want chomp(), not chop(). There is a difference. Look at perldoc
    -f chomp and perldoc -f chop
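
    A quick illustration of the difference:

    my $line = "data\n";
    chomp $line;   # removes a trailing newline if present -> "data"

    my $word = "data";
    chop $word;    # always removes the last character     -> "dat"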

    > $linecount += 1;


    No need for this variable. Perl maintains the linecount in the $.
    variable for you.

    >
    > $line=~s/\"//g;
    > ($apl,$vsnr1,$vsnr2,$pvsnr,$art,$aktiv,$vmnr,$vbeginn,$vende,$zw,$ikm,$ktoidx,$nettobeitrag,$bruttoratenbeitrag,$produkttext,$rohbau)
    > = split(/;/,$line);
    >
    > $vende=~s/ *$//g;
    > $vende=~s/,/\./g;
    >
    > $ktoidx=~s/ *$//g;
    > $ktoidx=~s/,/\./g;
    >
    > $nettobeitrag=~s/ *$//g;
    > $nettobeitrag=~s/,/\./g;
    >
    > $bruttoratenbeitrag=~s/ *$//g;
    > $bruttoratenbeitrag=~s/,/\./g;
    >
    > $produkttext=~s/ *$//g;
    > $produkttext=~s/,/\./g;
    >
    > $rohbau=~s/ *$//g;
    > $rohbau=~s/,/\./g;


    For readability's sake, make this one for loop that iterates over your
    six variables. For performance's sake, change the * to a + and change
    the second s///g to a tr///:

    for ($vende, $ktoidx, $nettobeitrag, $bruttoratenbeitrag, $produkttext,
         $rohbau) {
        s/ +$//g;
        tr/,/./;
    }

    >
    > $statment="INSERT into testtabelle
    > (apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ikm,ktoidx,nettobeitrag,bruttoratenbeitrag,produkttext,rohbau)
    > value
    > ('$apl','$vsnr1$vsnr2','$pvsnr','$art','$aktiv','$vmnr','$vbeginn',".($vende
    > eq "" ? "NULL" : "'".$vende."'").",$zw,'$ikm',".($ktoidx eq "" ? "NULL"
    > :
    > "'".$ktoidx."'").",$nettobeitrag,$bruttoratenbeitrag,'$produkttext',".($rohbau
    > eq "" ? "NULL" : "'".$rohbau."'").")";
    >
    > $rv=$dbh->do($statment);


    Here's a large performance hit. Instead of compiling this SQL each
    time through, prepare this SQL *once*, outside the loop. Then execute
    it many times in the loop.

    #outside the loop:
    my $statement = <<"EO_SQL";
    INSERT INTO testtabelle
    (apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ikm,ktoidx,nettobeitrag,bruttoratenbeitrag,produkttext,rohbau)
    VALUES
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    EO_SQL
    my $sth = $dbh->prepare($statement);
    #...

    #later, inside the loop:
    $sth->execute($apl, $vsnr1 . $vsnr2, $pvsnr, $art, $aktiv, $vmnr,
    $vbeginn, ($vende eq q{} ? undef : $vende), $zw, $ikm,
    ($ktoidx eq q{} ? undef : $ktoidx), $nettobeitrag, $bruttoratenbeitrag,
    $produkttext, ($rohbau eq q{} ? undef : $rohbau));

    You'll note that this also saves you the annoyance of having to figure
    out all those quotes in your original SQL.


    The previous recommendations given to you are good - committing only
    after all the inserts will definitely save you time - but this should
    help as well.

    Paul Lalli
     
    Paul Lalli, Aug 28, 2006
    #5
  6. Dr.Ruud Guest

    wrote:

    > $vende=~s/ *$//g;


    I don't think the g-modifier is needed. Also change the "*" to a "+",
    but Paul already showed you that.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Aug 28, 2006
    #6
  7. J. Gleixner Guest

    wrote:
    > Hi!
    >
    > I've got the following code and would like to ask if there is any
    > possibility for better performance. The script inserts about 240
    > records per second into a MySQL database.
    >
    > The file (input.txt) contains about 365,000 lines/records.


    Also, see MySQL-specific techniques, especially "LOAD DATA INFILE",
    which they say "is usually 20 times faster than using INSERT statements."

    http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
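
    A rough sketch of what that could look like from the existing script,
    assuming LOCAL infiles are enabled on client and server and that the raw
    columns can be mapped straight onto the table (the per-field cleanup and
    the vsnr1/vsnr2 concatenation would have to happen beforehand or in SQL):

    # Sketch only: bulk-load the semicolon-separated file in one statement
    # instead of issuing one INSERT per line.
    $dbh->do(qq{
    LOAD DATA LOCAL INFILE '$IMPDAT'
    INTO TABLE testtabelle
    FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    });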
     
    J. Gleixner, Aug 28, 2006
    #7
