DBI Performance Issues

Discussion in 'Perl Misc' started by Chris H., Aug 25, 2006.

  1. Chris H.

    Chris H. Guest

    I'm currently working on an issue for work that involves processing a
    pipe-delimited text file of 356,400 lines of data with 19 fields in
    each row.

    The issue I'm having is not the performance of reading and splitting
    this data, but of writing to the database. Using DBI/DBD::mysql, with
    'localhost' as the server the script connects to, I get the following
    results:

    > time ./ushbh-dev.pl hbh.txt

    Reading input file - 08/25/06 10:19:09
    Updating us_hourly table - 08/25/06 10:19:09
    1000 records processed.

    real 0m1.096s
    user 0m0.424s
    sys 0m0.008s

    ------------------------

    Using 'servername.tld' from a remote machine, I get the following
    performance:


    > time ./ushbh-dev.pl hbh.txt

    Reading input file - 08/25/06 10:17:49
    Updating us_hourly table - 08/25/06 10:17:49
    1000 records processed.

    real 1m11.606s
    user 0m0.250s
    sys 0m0.034s

    ------------------------

    The issue seems to be with remote connections, either through the DBI
    module or the DBD::mysql driver. The same thing shows up with the Sybase
    DBI driver going from a Unix machine to an MSSQL machine.

    Are there any remote-connection variables or performance-enhancing calls
    that I'm missing?

    The SQL query that's being run is:

    update foo set var1 = 'foo', var2 = 'foo2' where var3 = 'bar' and
    var4 = 'bar2';

    The table is indexed on var3 and var4, which did show a huge increase
    in performance once it was indexed properly.

    This is data that needs to be updated every 15 minutes, so dropping the
    data and re-inserting is not an option.
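
    For reference, the update part of the dev script looks roughly like this
    (connection details, column names and field positions below are
    placeholders, not the real ones):

    use DBI;

    # placeholder DSN/credentials
    my $dbh = DBI->connect('DBI:mysql:database=us;host=localhost',
                           'user', 'password',
                           { RaiseError => 1, AutoCommit => 1 });

    # prepare once, execute once per row with bind values
    my $sth = $dbh->prepare(
        'UPDATE us_hourly SET var1 = ?, var2 = ? WHERE var3 = ? AND var4 = ?'
    );

    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split /\|/, $line;          # 19 pipe-delimited fields
        $sth->execute(@f[0, 1], @f[2, 3]);  # field positions are made up
    }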

    Thanks for any help or insight you can provide on this issue.

    --
    Chris H.
    http://www.nasland.nu
     
    Chris H., Aug 25, 2006
    #1

  2. Chris H.

    Guest

    "Chris H." <> wrote:
    > I'm currently working on an issue for work that involves processing a
    > pipe-delimited text file of 356,400 lines of data with 19 fields in
    > each row.
    >
    > The issue I'm having is not the performance of reading and splitting
    > this data, but of writing to the database. Using DBI/DBD::mysql, with
    > 'localhost' as the server the script connects to, I get the following
    > results:
    >
    > > time ./ushbh-dev.pl hbh.txt

    > Reading input file - 08/25/06 10:19:09
    > Updating us_hourly table - 08/25/06 10:19:09
    > 1000 records processed.
    >
    > real 0m1.096s
    > user 0m0.424s
    > sys 0m0.008s
    >
    > ------------------------
    >
    > Using 'servername.tld' from a remote machine, I get the following
    > performance:
    >
    > > time ./ushbh-dev.pl hbh.txt

    > Reading input file - 08/25/06 10:17:49
    > Updating us_hourly table - 08/25/06 10:17:49
    > 1000 records processed.
    >
    > real 1m11.606s
    > user 0m0.250s
    > sys 0m0.034s
    >
    > ------------------------


    How fast is your network connection? I see barely any change in
    performance at all between local and remote.


    > the issue seems to be with remote connections either through the dbi
    > module, or the dbd::mysql driver.


    I would suggest it is at a lower level, at the network or at the compiled
    binaries that DBD::mysql links against. Have you implemented the same
    thing in another language (C, Java, Python) and seen better performance?
    If not, I doubt your problem has anything to do with Perl or perl modules
    per se.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Aug 25, 2006
    #2

  3. Chris H.

    Chris H. Guest

    On Fri, 25 Aug 2006 17:56:42 +0000, xhoster wrote:
    >
    > How fast is your network connection? I see barely any change in
    > performance at all between local and remote.
    >


    The real machine will be 100Mbit to 100Mbit via LAN. The testing machines
    that I used are from my home cable connection (8Mb down/2Mb up) to a
    100Mbit colo. It's easy to blame the cable connection, however the traffic
    going across the line during the updates is a whopping 7kbit/s, which a
    14.4k modem could handle. That's why I'm at a loss for ideas. It's as if
    it's refusing to use the CPU or the network hardware during remote tests.
    As I don't have iptraf on the actual machine that will run this, I can't
    say for certain how much bandwidth it's using on the LAN...but I'd imagine
    not much, if any, more than my home-to-colo tests.


    > I would suggest it is at a lower level, at the network or at the compiled
    > binaries that DBD::mysql links against. Have you implemented the same
    > thing in another language (C, Java, Python) and seen better performance?
    > If not, I doubt your problem has anything to do with Perl or perl modules
    > per se.
    >
    > Xho


    I have not implemented this in another language; however, there are a few
    differences. First, the live script is run from a Sun Solaris machine,
    using DBI's Sybase ODBC driver to connect to an MSSQL database on a
    Windows 2003 Server machine. The dev script I've modified to suit my
    environment, which consists of MySQL. The tables I have set up
    identically, and the data file was scp'ed from the live machine and run
    against both databases to check for performance differences.

    The only difference apart from this that I can find is that on my colo
    box running MySQL locally, running the script locally with 'localhost' as
    the server to connect to will use 50% CPU for the Perl script and 50% CPU
    for the mysql process.

    Running it from my home machine to the MySQL box using 'domain.tld' as
    the server to connect to, it uses less than 0.5% CPU, as if it's not even
    working or attempting to work.

    This effect is seen in the live version of the script too: with the
    remote server in the connect string, the script on the Solaris box will
    use less than 1% CPU, and the MSSQL server machine under Task Manager
    reports a CPU usage of 8%-14% (it doesn't run this database exclusively).

    I realize there are quite a few variables at play here, but I'm hoping
    it's something really simple I'm missing that's blocking it from just
    running at its true speed.

    I can't say how the DBI or DBD drivers on the Solaris machine are
    compiled, as I didn't set them up. The DBI and DBD drivers on both my
    home machine and the colo box are standard Gentoo-compiled versions. As
    far as the ebuilds are concerned, they set no additional options beyond
    the defaults.

    Thanks.

    --
    Chris H.
    http://www.nasland.nu
     
    Chris H., Aug 28, 2006
    #3
  4. Chris H.

    Guest

    "Chris H." <> wrote:
    > On Fri, 25 Aug 2006 17:56:42 +0000, xhoster wrote:
    > >
    > > How fast is your network connection? I see barely any change in
    > > performance at all between local and remote.
    > >

    >
    > The real machine will be 100Mbit to 100Mbit via LAN. The testing machines
    > that I used are from my home cable connection (8Mb down/2Mb up) to a
    > 100Mbit colo. It's easy to blame the cable connection, however the
    > traffic going across the line during the updates is a whopping 7kbit/s,
    > which a 14.4k modem could handle. That's why I'm at a loss for ideas.


    You are not making the important distinction between throughput and latency.
    Your Perl script sends the sql command, then waits to get a response.
    While waiting, it does nothing--it doesn't use CPU and/or bandwidth by
    "sending ahead on prospect" more update statements. So you are limited by
    latency, not by throughput. With an insert, you can circumvent this by
    using the MySQL-specific multiple row insert statement to add many rows per
    network round-trip. Maybe you can use the MySQL-specific "INSERT...ON
    DUPLICATE KEY UPDATE" syntax to accomplish the same thing--just a
    suggestion, I've never looked into it in detail myself. However, if your
    real situation will have the two machines on the same LAN, then latency
    will probably not be a problem. You simply can't do benchmarking on such
    disparate setups and expect the results to be meaningful.
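
    As a sketch of what I mean (untested; the column names are just borrowed
    from your example, and it requires a UNIQUE or PRIMARY key on
    (var3, var4) so the "duplicate key" branch actually fires):

    # untested sketch: send many rows per round-trip instead of one
    # UPDATE per row; assumes a UNIQUE key on (var3, var4)
    my (@placeholders, @values);
    for my $row (@batch) {                      # e.g. 500 parsed rows
        push @placeholders, '(?, ?, ?, ?)';
        push @values, @{$row}{qw(var3 var4 var1 var2)};
    }
    my $sql = 'INSERT INTO us_hourly (var3, var4, var1, var2) VALUES '
            . join(', ', @placeholders)
            . ' ON DUPLICATE KEY UPDATE var1 = VALUES(var1), var2 = VALUES(var2)';
    $dbh->do($sql, undef, @values);

    That way a batch of 500 rows costs one network round-trip instead of 500.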

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Aug 28, 2006
    #4
  5. Chris H.

    Chris H. Guest

    On Mon, 28 Aug 2006 16:27:31 +0000, xhoster wrote:

    > You are not making the important distinction between throughput and latency.
    > Your Perl script sends the sql command, then waits to get a response.
    > While waiting, it does nothing--it doesn't use CPU and/or bandwidth by
    > "sending ahead on prospect" more update statements. So you are limited by
    > latency, not by throughput. With an insert, you can circumvent this by
    > using the MySQL-specific multiple row insert statement to add many rows per
    > network round-trip. Maybe you can use the MySQL-specific "INSERT...ON
    > DUPLICATE KEY UPDATE" syntax to accomplish the same thing--just a
    > suggestion, I've never looked into it in detail myself. However, if your
    > real situation will have the two machines on the same LAN, then latency
    > will probably not be a problem. You simply can't do benchmarking on such
    > disparate setups and expect the results to be meaningful.
    >
    > Xho


    I understand the differences in the setup, but I don't see latency playing
    that much of a role in this issue. Ping response from Solaris to Windows
    on the LAN (live setup) is 2.73ms. Ping response from my home machine to
    my colo box is 17.5ms. My issue is that the two setups perform the same in
    regards to how slow they are getting the information to the server. I can
    understand it being slower, that's a given, but if it takes just as long
    to update 10 records remotely as 1,000 locally, then something is going
    wrong somewhere. I'd suspect it should be faster than that.

    Thanks again for all the information so far, it is giving me different
    ideas to try. I'll definitely look at the 'insert...on duplicate key
    update' documentation to see if it'll work.

    --
    Chris H.
    http://www.nasland.nu
     
    Chris H., Aug 28, 2006
    #5
  6. Chris H.

    Guest

    "Chris H." <> wrote:
    > On Mon, 28 Aug 2006 16:27:31 +0000, xhoster wrote:
    >
    > > You are not making the important distinction between throughput and
    > > latency. Your Perl script sends the sql command, then waits to get a
    > > response. While waiting, it does nothing--it doesn't use CPU and/or
    > > bandwidth by "sending ahead on prospect" more update statements. So
    > > you are limited by latency, not by throughput. With an insert, you can
    > > circumvent this by using the MySQL-specific multiple row insert
    > > statement to add many rows per network round-trip. Maybe you can use
    > > the MySQL-specific "INSERT...ON DUPLICATE KEY UPDATE" syntax to
    > > accomplish the same thing--just a suggestion, I've never looked into it
    > > in detail myself. However, if your real situation will have the two
    > > machines on the same LAN, then latency will probably not be a problem.
    > > You simply can't do benchmarking on such disparate setups and expect
    > > the results to be meaningful.
    > >
    > > Xho

    >
    > I understand the differences in the setup, but I don't see latency playing
    > that much of a role in this issue. Ping response from Solaris to Windows
    > on the LAN (live setup) is 2.73ms. Ping response from my home machine to
    > my colo box is 17.5ms.


    Is that under load or under no load? Anyway, you reported 71 seconds for
    1000 records, which is 71ms per record. 71ms is only fourfold more than
    17.5ms, so I would say that latency is definitely in the ballpark to be a
    problem. (I would expect MySQL to have more overhead than ping--4 times
    more would not greatly surprise me.)

    > My issue is that the two setups perform the same
    > in regards to how slow they are getting the information to the server. I
    > can understand it being slower, that's a given, but if it takes just as
    > long to update 10 records remotely as 1,000 locally, then something is
    > going wrong somewhere. I'd suspect it should be faster than that.


    I'm afraid you lost me. You have a localhost mysql connection doing 1000
    records per second, and an over-cable-modem mysql connection doing 14
    records per second. Do you have a *mysql* connection on a LAN? If so,
    what speed did that provide?

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Aug 28, 2006
    #6
  7. On 2006-08-28 19:47, Chris H. <> wrote:
    > On Mon, 28 Aug 2006 16:27:31 +0000, xhoster wrote:
    >
    >> You are not making the important distinction between throughput and latency.
    >> Your Perl script sends the sql command, then waits to get a response.
    >> While waiting, it does nothing--it doesn't use CPU and/or bandwidth by
    >> "sending ahead on prospect" more update statements. So you are limited by
    >> latency, not by throughput. With an insert, you can circumvent this by
    >> using the MySQL-specific multiple row insert statement to add many rows per
    >> network round-trip.


    DBI also has an execute_array method, but I don't know whether it is
    implemented by DBD::mysql. If it is, it can improve performance quite
    dramatically - I've measured a factor of 50 in an (admittedly very simple)
    benchmark with Oracle.
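
    Column-wise, it would look something like this (untested with DBD::mysql,
    and with made-up array names holding one bind value per row; note that
    DBI's fallback implementation just loops over execute, so if the driver
    doesn't support it natively you won't save any round trips):

    # sketch: one execute_array call instead of one execute per row
    my $sth = $dbh->prepare(
        'UPDATE us_hourly SET var1 = ?, var2 = ? WHERE var3 = ? AND var4 = ?'
    );
    my @tuple_status;
    $sth->execute_array(
        { ArrayTupleStatus => \@tuple_status },  # per-row success/failure
        \@var1, \@var2, \@var3, \@var4,          # one array ref per placeholder
    );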

    > I understand the differences in the setup, but I don't see latency playing
    > that much of a role in this issue. Ping response from Solaris to Windows
    > on the LAN (live setup) is 2.73ms.


    This seems very slow for a 100Mbit network. 0.27ms is more what I would
    expect.

    > Ping response from my home machine to my colo box is 17.5ms. My issue
    > is that the two setups perform the same in regards to how slow they
    > are getting the information to the server. I can understand it being
    > slower, that's a given, but if it takes just as long to update 10
    > records remotely as 1,000 locally...


    ping localhost reports an rtt of 0.05 ms on my (rather old) laptop. I
    assume your server is at least as fast. So the latency of your broadband
    connection is at least 350 times longer than that of the loopback
    interface, and I don't find it very surprising that a mostly
    latency-bound job takes 71 times as long. Assuming there are four round
    trips per update (I don't know why there should be so many) and an actual
    processing time of 0.8 ms, we arrive almost exactly at your measurements:

    broadband: 4 * 17.5ms + 0.8ms = 70.8ms
    loopback:  4 * 0.05ms + 0.8ms =  1.0ms

    If this theory is correct, you should expect
    4 * 2.73ms + 0.8ms = 11.72ms
    per update (or about 85 updates/second) over the LAN.
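
    You could check the per-statement round-trip cost from the script itself
    with something like this (a quick sketch, nothing more):

    use Time::HiRes qw(gettimeofday tv_interval);

    # time a batch of trivial statements to estimate per-round-trip cost
    my $n  = 1000;
    my $t0 = [gettimeofday];
    $dbh->do('SELECT 1') for 1 .. $n;
    printf "%.2f ms per round trip\n", 1000 * tv_interval($t0) / $n;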


    hp

    --
       _  | Peter J. Holzer    | > Why should one invent something that
    |_|_) | Sysadmin WSR       | > doesn't exist?
    | |   |                    | What else would be the point of inventing?
    __/   | http://www.hjp.at/ | -- P. Einstein u. V. Gringmuth in desd
     
    Peter J. Holzer, Aug 29, 2006
    #7
