Better performance ...

Discussion in 'Perl Misc' started by cms-team@ivi.de, Aug 28, 2006.

  1. Guest

    Hi!

    I've got the following code and would like to ask if there is any
    possibility for better performance. The script inserts about 240
    records per second into a MySQL database.

    The file (input.txt) contains about 365,000 lines/records.

    ====================================================
    # Initialize basic settings
    require "/home/vip/my.properties";
    $IMPDAT="$HOME_HOST/input.txt";

    if (!open (FP1,"<$IMPDAT"))
    {
    print ("$IMPDAT nicht vorhanden!\n");
    exit(3);
    }
    # Connect to the database
    $rc=&connect_to_db();
    $linecount = 0;

    # Drop the old table contents first
    $statment="DROP TABLE IF EXISTS `testtabelle`";
    $rv=$dbh->do($statment);
    $statment="CREATE TABLE `testtabelle` (`apl` varchar(2) NOT NULL
    default '',`vsnr` varchar(11) NOT NULL default '',`pvsnr` varchar(8)
    NOT NULL default '',`art` char(1) NOT NULL default '',`aktiv` char(1)
    NOT NULL default '',`vmnr` varchar(4) NOT NULL default '',`vbeginn`
    varchar(8) NOT NULL default '',`vende` varchar(8) default NULL,`zw`
    int(1) NOT NULL default '0',`ikm` char(1) NOT NULL default '',`ktoidx`
    char(1) default NULL,`nettobeitrag` decimal(9,2) NOT NULL default
    '0.00',`bruttoratenbeitrag` decimal(9,2) NOT NULL default
    '0.00',`produkttext` varchar(150) NOT NULL default '',`rohbau`
    varchar(6) default NULL,PRIMARY KEY (`vsnr`),KEY `pvsnr` (`pvsnr`),KEY
    `vm` (`vmnr`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;";
    $rv=$dbh->do($statment);

    # Record the application start time
    $startzeit = time;

    # Read the file line by line
    while (<FP1>)
    {
    $line = $_;
    chop ($line);
    $linecount += 1;

    $line=~s/\"//g;
    ($apl,$vsnr1,$vsnr2,$pvsnr,$art,$aktiv,$vmnr,$vbeginn,$vende,$zw,$ikm,$ktoidx,$nettobeitrag,$bruttoratenbeitrag,$produkttext,$rohbau)
    = split(/;/,$line);

    $vende=~s/ *$//g;
    $vende=~s/,/\./g;

    $ktoidx=~s/ *$//g;
    $ktoidx=~s/,/\./g;

    $nettobeitrag=~s/ *$//g;
    $nettobeitrag=~s/,/\./g;

    $bruttoratenbeitrag=~s/ *$//g;
    $bruttoratenbeitrag=~s/,/\./g;

    $produkttext=~s/ *$//g;
    $produkttext=~s/,/\./g;

    $rohbau=~s/ *$//g;
    $rohbau=~s/,/\./g;

    $statment="INSERT into testtabelle
    (apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ikm,ktoidx,nettobeitrag,bruttoratenbeitrag,produkttext,rohbau)
    value
    ('$apl','$vsnr1$vsnr2','$pvsnr','$art','$aktiv','$vmnr','$vbeginn',".($vende
    eq "" ? "NULL" : "'".$vende."'").",$zw,'$ikm',".($ktoidx eq "" ? "NULL"
    :
    "'".$ktoidx."'").",$nettobeitrag,$bruttoratenbeitrag,'$produkttext',".($rohbau
    eq "" ? "NULL" : "'".$rohbau."'").")";

    $rv=$dbh->do($statment);
    }
    $rc = $dbh->disconnect;
    close(FP1);

    # Print elapsed time
    &Dauer;
    $est = $est == 0 ? 1 : $est;
    $dsek = int($linecount/$est);
    print "Das Script 'laden_all_daten.pl' hat fuer
    ".punktuiere($linecount)." Datensaetze insgesamt ".$dauer." Minuten
    gebraucht (".$dsek." Datensaetze pro Sekunde)\n";
    unlink($TEMPDAT);
    exit(0);
    ====================================================

    Where can I make this thing run a little faster?

    Thx in advance!

    Greetings
    Marcus Dau
     
    Aug 28, 2006
    #1

  2. Marc Espie Guest

    wrote:
    >Hi!
    >
    >I've got the following code and would like to ask if there is any
    >possibility for better performance. The script inserts about 240
    >records per second into a MySQL database.
    >
    >The file (input.txt) contains about 365,000 lines/records.
    >


    >Where can I make this thing run a little faster?
    >

    Turn autocommit off, and commit just once every n transactions.

    Not sure how good it will be with mysql, but sqlite loves it quite
    a bit.

    Otherwise, use some other database engine. mysql is simple to set up,
    but definitely not the fastest.
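
    A minimal sketch of how that could look with the OP's existing DBI handle
    (the 1000-row batch size is an arbitrary choice, not something from the
    thread, and $statment is built exactly as before):

    # Sketch only: disable AutoCommit and commit every $batch rows.
    $dbh->{AutoCommit} = 0;
    my $batch = 1000;

    while (<FP1>) {
    # ... build $statment for the current line as before ...
    $rv = $dbh->do($statment);
    $dbh->commit unless $. % $batch;   # commit every $batch lines
    }
    $dbh->commit;                      # commit whatever is left over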
     
    Marc Espie, Aug 28, 2006
    #2

  3. Guest

    Thx a lot. This does the trick.

    Runtime for the 30 scripts (tables) went down from 50 minutes to
    10 minutes!!!

    Greetings
    Marcus
     
    Aug 28, 2006
    #3
  4. goho Guest

    wrote:
    > Thx a lot. This does the trick.
    >
    > Runtime for the 30 scripts (tables) went down from 50 minutes to
    > 10 minutes!!!
    >
    > Greetings
    > Marcus


    Your problem doesn't have a lot to do with Perl; I suggest you look
    towards the MySQL group. That said, though:

    I've not used MySQL, but from experience with other DBMSs you may get
    a significant improvement by creating the pvsnr and vm keys (indexes)
    AFTER loading the data.
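
    A rough sketch of that idea against the table above, assuming the
    CREATE TABLE is changed to leave out the two secondary KEY clauses and
    that $dbh is still connected after the load:

    # Sketch only: add the secondary indexes after the bulk load, so they
    # are not maintained for every single INSERT.
    $dbh->do("ALTER TABLE `testtabelle` ADD KEY `pvsnr` (`pvsnr`)");
    $dbh->do("ALTER TABLE `testtabelle` ADD KEY `vm` (`vmnr`)");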

    In some databases it helps to pre-sort the file offline in order of the
    primary key - try it. Without knowing the internal structure of InnoDB
    I can't say whether it will improve things or be a disaster, but you
    won't know until you try.

    Be aware of ROLLBACK issues if you don't commit regularly; that's not a
    subject for discussion here, so I'll leave it for you to follow up. If
    you are looking for a book, Michael Kofler's 'The Definitive Guide to
    MySQL' looks pretty reasonable.

    Good luck.
     
    goho, Aug 28, 2006
    #4
  5. Paul Lalli Guest

    wrote:
    > I've got the following code and would like to ask if there is any
    > possibility for better performance. The script inserts about 240
    > records per second into a MySQL database.



    <snip>

    > while (<FP1>)
    > {
    > $line = $_;


    Why make two assignments?

    while (my $line = <FP1>) {

    > chop ($line);


    You want chomp(), not chop(). There is a difference. Look at perldoc
    -f chomp and perldoc -f chop
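
    A quick illustration of the difference:

    my $line = "data\n";
    chomp $line;   # removes a trailing newline if present -> "data"

    my $word = "data";
    chop $word;    # always removes the last character     -> "dat"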

    > $linecount += 1;


    No need for this variable. Perl maintains the linecount in the $.
    variable for you.

    >
    > $line=~s/\"//g;
    > ($apl,$vsnr1,$vsnr2,$pvsnr,$art,$aktiv,$vmnr,$vbeginn,$vende,$zw,$ikm,$ktoidx,$nettobeitrag,$bruttoratenbeitrag,$produkttext,$rohbau)
    > = split(/;/,$line);
    >
    > $vende=~s/ *$//g;
    > $vende=~s/,/\./g;
    >
    > $ktoidx=~s/ *$//g;
    > $ktoidx=~s/,/\./g;
    >
    > $nettobeitrag=~s/ *$//g;
    > $nettobeitrag=~s/,/\./g;
    >
    > $bruttoratenbeitrag=~s/ *$//g;
    > $bruttoratenbeitrag=~s/,/\./g;
    >
    > $produkttext=~s/ *$//g;
    > $produkttext=~s/,/\./g;
    >
    > $rohbau=~s/ *$//g;
    > $rohbau=~s/,/\./g;


    For readability's sake, make this one for loop that iterates over your
    six variables. For performance's sake, change the * to a + and change
    the second s///g to a tr///:

    for ($vende, $ktoidx, $nettobeitrag, $bruttoratenbeitrag, $produkttext,
         $rohbau) {
        s/ +$//g;
        tr/,/./;
    }

    >
    > $statment="INSERT into testtabelle
    > (apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ikm,ktoidx,nettobeitrag,bruttoratenbeitrag,produkttext,rohbau)
    > value
    > ('$apl','$vsnr1$vsnr2','$pvsnr','$art','$aktiv','$vmnr','$vbeginn',".($vende
    > eq "" ? "NULL" : "'".$vende."'").",$zw,'$ikm',".($ktoidx eq "" ? "NULL"
    > :
    > "'".$ktoidx."'").",$nettobeitrag,$bruttoratenbeitrag,'$produkttext',".($rohbau
    > eq "" ? "NULL" : "'".$rohbau."'").")";
    >
    > $rv=$dbh->do($statment);


    Here's a large performance hit. Instead of compiling this SQL each
    time through, prepare this SQL *once*, outside the loop. Then execute
    it many times in the loop.

    #outside the loop:
    my $statement = <<"EO_SQL";
    INSERT INTO testtabelle
    (apl,vsnr,pvsnr,art,aktiv,vmnr,vbeginn,vende,zw,ikm,ktoidx,nettobeitrag,bruttoratenbeitrag,produkttext,rohbau)
    VALUES
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    EO_SQL
    my $sth = $dbh->prepare($statement);
    #...

    #later, inside the loop:
    $sth->execute($apl, $vsnr1 . $vsnr2, $pvsnr, $art, $aktiv, $vmnr,
    $vbeginn, ($vende eq q{} ? undef : $vende), $zw, $ikm,
    ($ktoidx eq q{} ? undef : $ktoidx), $nettobeitrag, $bruttoratenbeitrag,
    $produkttext, ($rohbau eq q{} ? undef : $rohbau));

    You'll note that this also saves you the annoyance of having to figure
    out all those quotes in your original SQL.


    The previous recommendations given to you are good - committing only
    after all the inserts will definitely save you time - but this should
    help as well.

    Paul Lalli
     
    Paul Lalli, Aug 28, 2006
    #5
  6. Dr.Ruud Guest

    wrote:

    > $vende=~s/ *$//g;


    I don't think the g-modifier is needed. Also change the "*" to a "+",
    but Paul already showed you that.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Aug 28, 2006
    #6
  7. J. Gleixner Guest

    wrote:
    > Hi!
    >
    > I've got the following code and would like to ask if there is any
    > possibility for better performance. The script inserts about 240
    > records per second into a MySQL database.
    >
    > The file (input.txt) contains about 365,000 lines/records.


    Also, see MySQL-specific techniques, especially "LOAD DATA INFILE",
    which they say "is usually 20 times faster than using INSERT statements."

    http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
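
    A rough sketch of what that could look like from the existing script,
    assuming LOCAL infiles are enabled on client and server and that the raw
    columns can be mapped straight onto the table (the per-field cleanup and
    the vsnr1/vsnr2 concatenation would have to happen beforehand or in SQL):

    # Sketch only: bulk-load the semicolon-separated file in one statement
    # instead of issuing one INSERT per line.
    $dbh->do(qq{
    LOAD DATA LOCAL INFILE '$IMPDAT'
    INTO TABLE testtabelle
    FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    });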
     
    J. Gleixner, Aug 28, 2006
    #7
