speeding up cgi perl

Discussion in 'Perl Misc' started by skeldoy, Dec 17, 2008.

  1. skeldoy

    skeldoy Guest

    Hey! I am working on an application that features a huge database
    (mysql) and some cgi (perl) for listing, sorting, searching,
    dupechecking and more. I see that the configuration for the mysql is
    pretty much spot on - most of the data is cached and so the mysqld-
    process isn't really doing much in terms of causing bottlenecks. But I
    believe that the perl-code may be the bottleneck here. I have turned
    off buffering completely and I render pretty much just the things that
    are needed. Still it can take up to a minute to print (in html) a
    query that returns in a second on the mysql-console.

    The output from the cgi is around 15MB for every operation the user
    does, so I see the potential for slowness right there, in the sheer
    amount of data that has to be produced and transmitted over the net,
    but still I don't really understand what I have done to make the cgi
    so slow.

    The cgi is mostly taking a parameter like $query and doing a "select
    from db where value like "%$query%"" and returning that in pretty <td>
    $_</td>-form. That seems to work reasonably fast. But when I do a
    "select * from db" things tend to get really slow when dealing with
    15000-entries++ (even though mysql has it all cached and spits it out
    in a split second). The cgi-process sits there, spitting out html to
    the client, using up 95% of the cpu-time of one of the cores and using
    50MBs of memory or so. I have no idea what it does. I have replaced
    most of the " with '. And I can't really see that I am doing something
    that needn't be done. Is there an issue with creating multiple
    database-connections (DBD::mysql) that I should be aware of?
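
    For illustration, roughly the shape of the cgi (the table name,
    column name and connection details here are made up):

    use strict;
    use DBI;
    use CGI qw(param);

    # one of the handles opened at startup
    my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'pass',
                           { RaiseError => 1 });

    my $query = param('query') || '';
    # $query is interpolated straight into the SQL string
    my $sth = $dbh->prepare("select * from items where value like '%$query%'");
    $sth->execute;

    print "Content-Type: text/html\n\n<table>\n";
    while (my @row = $sth->fetchrow_array) {
        print '<tr>', (map { "<td>$_</td>" } @row), "</tr>\n";
    }
    print "</table>\n";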

    If somebody has experience in doing huge db's with perl, can you
    please give me some pointers? Is this a code issue or is it a network-
    issue or is it a browser issue? Does anyone have any tips for doing
    huge databases with perl?
     
    skeldoy, Dec 17, 2008
    #1

  2. smallpond

    smallpond Guest

    On Dec 17, 11:04 am, skeldoy <> wrote:

    "select * from db"

    If you're sending back everything, why not just render the page once
    and return static html? You can schedule updates when the db changes.
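
    An untested sketch (path, table and connection details made up) of a
    script you could run from cron to rebuild the page:

    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'pass',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare('select * from items');
    $sth->execute;

    # write the whole table out as static html
    open my $out, '>', '/var/www/html/items.html' or die "open: $!";
    print {$out} "<html><body><table>\n";
    while (my @row = $sth->fetchrow_array) {
        print {$out} '<tr>', (map { "<td>$_</td>" } @row), "</tr>\n";
    }
    print {$out} "</table></body></html>\n";
    close $out or die "close: $!";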
     
    smallpond, Dec 17, 2008
    #2

  3. On 2008-12-17 16:04, skeldoy <> wrote:
    > Hey! I am working on an application that features a huge database
    > (mysql) and some cgi (perl) for listing, sorting, searching,
    > dupechecking and more. I see that the configuration for the mysql is
    > pretty much spot on - most of the data is cached and so the mysqld-
    > process isn't really doing much in terms of causing bottlenecks. But I
    > believe that the perl-code may be the bottleneck here. I have turned
    > off buffering completely and I render pretty much just the things that
    > are needed. Still it can take up to a minute to print (in html) a
    > query that returns in a second on the mysql-console.
    >
    > The output from the cgi is around 15MB for every operation the user
    > does, so I see the potential for slowness right there, in the sheer
    > amount of data that has to be produced and transmitted over the net,
    > but still I don't really understand what I have done to make the cgi
    > so slow.
    >
    > The cgi is mostly taking a parameter like $query and doing a "select
    > from db where value like "%$query%""


    You should use placeholders instead for security, but that won't affect
    speed much.

    > and returning that in pretty <td> $_</td>-form. That seems to work
    > reasonably fast. But when I do a "select * from db" things tend to get
    > really slow when dealing with 15000-entries++ (even though mysql has
    > it all cached and spits it out in a split second). The cgi-process
    > sits there, spitting out html to the client, using up 95% of the
    > cpu-time of one of the cores and using 50MBs of memory or so. I have
    > no idea what it does.


    To find out where it spends the time, use Devel::NYTProf.

    The simplest method to use it is usually to just invoke it from the
    commandline:

    export REQUEST_METHOD=GET
    export QUERY_STRING='whatever'
    perl -d:NYTProf yourscript.cgi

    This creates a file nytprof.out in the current directory which you can
    convert to a nice HTML report with the nytprofhtml command.

    There are also other ways of invoking it; see the docs.


    CGI does have considerable overhead per row. However, for just 15000
    rows I'd still expect that to be finished in a few seconds at most.
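
    For example, if each row goes through CGI.pm's HTML helpers, plain
    string building is usually much cheaper (a sketch):

    # per-row calls to CGI.pm helpers carry function-call overhead:
    #   print $q->Tr( $q->td( \@row ) );
    # building the string directly avoids it:
    print '<tr>', (map { "<td>$_</td>" } @row), "</tr>\n";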

    > If somebody has experience in doing huge db's with perl, can you
    > please give me some pointers?


    Is 700 GB and 6 billion rows in the largest table huge enough? I don't
    use mysql for that one, though.

    > Is this a code issue or is it a network-
    > issue or is it a browser issue?


    Since you wrote that your CGI script uses 95% CPU time during the time
    it runs it is unlikely to be a browser problem. But browsers in general
    are rather slow at rendering tables. For 15000 rows, the browser may
    easily take a few dozen seconds to render the table (depending on the
    speed of your workstation, how much memory you have, etc.) Try accessing
    the CGI with wget or a similar tool - is it much faster than with the
    browser?
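
    For example (the URL is made up):

    time wget -O /dev/null 'http://yourserver/cgi-bin/yourscript.cgi?query=whatever'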

    hp
     
    Peter J. Holzer, Dec 17, 2008
    #3
  4. Xho

    Xho Guest

    skeldoy <> wrote:
    > Hey! I am working on an application that features a huge database
    > (mysql) and some cgi (perl) for listing, sorting, searching,
    > dupechecking and more.


    Isn't dupechecking more a back-office thing best done from the command
    line, not CGI?

    > I see that the configuration for the mysql is
    > pretty much spot on - most of the data is cached and so the mysqld-
    > process isn't really doing much in terms of causing bottlenecks. But I
    > believe that the perl-code may be the bottleneck here. I have turned
    > off buffering completely


    What does that mean? $|=1? mysql_use_result?

    I wouldn't know how to identify *all* sources of buffering present in
    a complex system, let alone know how to disable them completely.

    > and I render pretty much just the things that
    > are needed. Still it can take up to a minute to print (in html) a
    > query that returns in a second on the mysql-console.


    It takes less than a second to stream the entire <big_num> of data
    to your screen on the mysql-console? Or do you somehow tell it to
    run the query but suppress the actual results?

    > The output from the cgi is around 15MB for every operation the user
    > does, so I see the potential for slowness right there, in the sheer
    > amount of data that has to be produced and transmitted over the net,
    > but still I don't really understand what I have done to make the cgi
    > so slow.


    Does the CGI have to be invoked by POST, or will GET work?

    Use wget to fetch the page. See how long it takes when the results just
    have to be fetched and stored and not rendered.

    And once you do that and have the results stored as a static file, you
    might as well put that in a location that is served up via http and open it
    with the client browser. See how long it takes to pull the data over
    the net and render it in a browser when it is coming from a static file
    rather than a CGI.
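
    For example (URL and paths made up):

    # save the CGI output once
    wget -O /tmp/out.html 'http://yourserver/cgi-bin/yourscript.cgi?query=whatever'
    # then serve that copy statically and time the browser against it
    cp /tmp/out.html /var/www/html/static-test.html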

    >
    > The cgi is mostly taking a parameter like $query and doing a "select
    > from db where value like "%$query%"" and returning that in pretty <td>
    > $_</td>-form. That seems to work reasonably fast. But when I do a
    > "select * from db" things tend to get really slow when dealing with
    > 15000-entries++ (even though mysql has it all cached and spits it out
    > in a split second). The cgi-process sits there, spitting out html to
    > the client, using up 95% of the cpu-time of one of the cores


    You can try adding -d:DProf or -d:SmallProf (or your favorite profiler) to
    the CGI shebang line.  Or come up with an adapter to run the CGI in a
    command-line environment, and profile it from there.
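
    For example (the interpreter path may differ on your system):

    #!/usr/bin/perl -d:DProf
    # after a run, DProf leaves tmon.out in the current directory;
    # summarize it with: dprofpp tmon.out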

    > and using
    > 50MBs of memory or so.


    On most modern computers, 50MB is a rounding error. Is that the case for
    your computer?


    > I have no idea what it does. I have replaced
    > most of the " with '.


    ???

    > And I can't really see that I am doing something
    > that needn't be done. Is there an issue with creating multiple
    > database-connections (DBD::mysql) that I should be aware of?


    *Anything* can be done incorrectly, in which case there are things to be
    aware of. Are you using multiple MySQL connections in your CGI? If so,
    how are you doing it?


    > If somebody has experience in doing huge db's with perl, can you
    > please give me some pointers?


    Profile, profile, profile. If that doesn't work, comment out chunks
    of code and see if that affects speed (for example, do all the work except
    for the actual printing of the html table.)
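
    For example, fetch everything but skip the output, and see whether
    the runtime changes (a sketch):

    while (my @row = $sth->fetchrow_array) {
        my $html = join '', '<tr>', (map { "<td>$_</td>" } @row), "</tr>\n";
        # print $html;    # temporarily disabled for timing
    }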

    > Is this a code issue or is it a network-
    > issue or is it a browser issue?


    If the CGI script itself is using 95% of a CPU, then it probably isn't
    a browser or network issue. Unless your CPU utilization reporting tool
    accounts for I/O waits as CPU usage (I've seen some that do).

    > Does anyone have any tips for doing
    > huge databases with perl?


    15,000 records is not huge. It is just barely even medium. Unless you
    have giant blob or text fields. If you do have a large database with
    DBD::mysql, it is important to know the benefits and drawbacks of
    mysql_use_result.
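
    A sketch (table name made up) - mysql_use_result makes the driver
    stream rows as you fetch them instead of slurping the whole result
    set into client memory at execute time, at the cost of tying up the
    server until the last row is fetched:

    my $sth = $dbh->prepare('select * from items',
                            { mysql_use_result => 1 });
    $sth->execute;
    while (my @row = $sth->fetchrow_array) {
        print '<tr>', (map { "<td>$_</td>" } @row), "</tr>\n";
    }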

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    , Dec 17, 2008
    #4
  5. skeldoy

    skeldoy Guest

    > ... why not just render the page once and return static html?
    > You can schedule updates when the db changes.


    I think I am somewhat doing that already by configuring mysql to
    cache almost everything it can. The server basically is just one big
    mysql-cache plus some memory for the cgi-process to run. But I see
    your point. I can do it to test whether the browser is the issue.
     
    skeldoy, Dec 17, 2008
    #5
  6. skeldoy

    skeldoy Guest

    > You should use placeholders instead for security, but that won't affect
    > speed much.


    I know. But it is intended to be used by inhouse people, and if they
    mess it up, I will just restore it. :)

    > To find out where it spends the time, use Devel::NYTProf.
    > The simplest method to use it is usually to just invoke it from the
    > commandline:
    > export REQUEST_METHOD=GET
    > export QUERY_STRING='whatever'
    > perl -d:NYTProf yourscript.cgi
    > This creates a file nytprof.out in the current directory which you can
    > convert to a nice HTML report with the nytprofhtml command.
    > There are also other ways for invoking it, see the docs.


    Good tip! I will try it out and see what it says. Thanks man, this is
    really good stuff right there.

    >
    > CGI does have considerable overhead per row. However, for just 15000
    > rows I'd still expect that to be finished in a few seconds at most.


    I know! When I do a "mysql -u root \n select * from db" it returns
    pretty much at the speed my gfx-adapter can print it out, and within
    a couple of secs if I pipe it to a file.

    > Is 700 GB and 6 billion rows in the largest table huge enough? I don't
    > use mysql for that one, though.


    Is it a mysql issue that makes you use another db?

    > Since you wrote that your CGI script uses 95% CPU time during the time
    > it runs it is unlikely to be a browser problem. But browsers in general
    > are rather slow at rendering tables. For 15000 rows, the browser may
    > easily take a few dozen seconds to render the table (depending on the
    > speed of your workstation, how much memory you have, etc.) Try accessing
    > the CGI with wget or a similar tool - is it much faster than with the
    > browser?


    A really good question. I never really thought about testing whether
    the data-transmission is the issue in that way. Just another thing
    for me to do.

    Really appreciate your time/help on this. Good pointers there. I got
    pretty one-lane, just looking at the code and tuning the mysql. It
    never occurred to me that there are basic ways of testing the
    data-transmission (i.e. curl/wget). And the Devel::NYTProf package is
    a definite plus in finding the culprit here. Thanks again! :)
     
    skeldoy, Dec 17, 2008
    #6
  7. skeldoy

    skeldoy Guest

    > Isn't dupechecking more a back-office thing best done from the command
    > line, not CGI?


    I know. I have some very particular non-technical customers who want
    to oversee the entire process to pass the time. I think the
    bureaucrats have too little work and too much time on their hands.
    But hey! I can't really tell them what THEY want to do, right? :)

    > What does that mean?  $|=1?  mysql_use_result?


    $|=1 on everything, everywhere. The mysql doesn't really buffer output
    anymore, but the whole thing is in memory - so it is, as I said,
    probably not the culprit.

    > I wouldn't know how to identify *all* sources of buffering present in
    > a complex system, let alone know how to disable them completely.


    I know, but I thought there might be something trivial that I didn't
    know about. Good to hear though.

    > It takes less than a second to stream the entire <big_num> of data
    > to your screen on the mysql-console?  Or do you somehow tell it to
    > run the query but suppress the actual results?


    If I render it on screen from the mysql console I get bottlenecked by
    the graphics-adapter/Xorg. When I pipe it to a file it gets done in a
    couple of seconds, and I figure that is how long it really takes to
    do the query.

    > Does the CGI have to be invoked by POST, or will GET work?


    I use GET.

    > Use wget to fetch the page.  See how long it takes when the results just
    > have to be fetched and stored and not rendered.


    I will.

    > And once you do that and have the results stored as a static file, you
    > might as well put that in a location that is served up via http and open it
    > with the client browser.  See how long it takes to pull the data over
    > the net and render it in a browser when it is coming from a static file
    > rather than a CGI.


    The next step. I agree. I plainly just have to be more thorough. I
    see your pointers there.

    > You can try adding -d:DProf or -d:SmallProf (or your favorite profiler) to
    > the CGI shebang line.  Or come up with an adapter to run the CGI in a
    > command-line environment, and profile it from there.


    Yeah, Mister Peter mentioned Devel::NYTProf, which I intend to run.
    But I will definitely try out DProf and SmallProf like you say.
    Glad some of you actually know about these things (like profiling and
    other such stuff that I am totally blind to).

    > On most modern computers, 50MB is a rounding error.  Is that the case for
    > your computer?


    Huh? Is this a new feature of the von Neumann machine that slipped
    my attention?

    > *Anything* can be done incorrectly, in which case there are things to be
    > aware of.  Are you using multiple MySQL connections in your CGI?  If so,
    > how are you doing it?


    Doing multiple "my $db1 = the connection stuff; my $db2 = the
    connection stuff; my $db...". I open them all up at the beginning of
    the code and keep them for the length of the run. The strange thing
    is that when I parse the data into the DB I do it in exactly the same
    manner, and there the multiple connections speed things up. I can't
    really see how multiple connections would suddenly become an issue,
    but strange things happen in the Perl-world, so I keep all
    possibilities open.
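
    Roughly this pattern (connection details made up):

    my $db1 = DBI->connect('dbi:mysql:database=mydb', 'user', 'pass');
    my $db2 = DBI->connect('dbi:mysql:database=mydb', 'user', 'pass');
    # ... more of the same, all opened at startup and held for the run
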
    >
    > Profile, profile, profile.  If that doesn't work, comment out chunks
    > of code and see if that affects speed (for example, do all the work except
    > for the actual printing of the html table.)


    Again with the profiling. I will, man. Thanks for that. I've never
    actually been a profiling man - I just code, and it normally works
    the way I intended. Will definitely try it out now.

    > If the CGI script itself is using 95% of a CPU, then it probably isn't
    > a browser or network issue.  Unless your CPU utilization reporting tool
    > accounts for I/O waits as CPU usage (I've seen some that do).


    I don't really know. All I have done is run top on the linux-server
    that runs it. I'm not really into the whole userland thing - could it
    be an apache issue?

    > 15,000 records is not huge.  It is just barely even medium.  Unless you
    > have giant blob or text fields.  If you do have a large database with
    > DBD::mysql, it is important to know the benefits and drawbacks of
    > mysql_use_result.


    I know 15000 is supposed to be nada. But 15000 is the magic number
    where things tend to slow down. You must excuse me, I don't normally
    write English, so I may tend to write awkwardly. I will read up on
    mysql_use_result though. Thanks for the tip. I see there is much more
    documentation to dive into.

    Thanks for all the tips and pointers and not least your time!
     
    skeldoy, Dec 18, 2008
    #7
  8. Bart Lateur

    Bart Lateur Guest

    skeldoy wrote:

    >> You should use placeholders instead for security, but that won't affect
    >> speed much.

    >
    >I know. But it is intended to be used by inhouse people, and if they
    >mess it up, I will just restore it. :)


    Tsk.

    It's not hard, and here is how:

    >>>select from db where value like "%$query%"


    # MySQL note: use concat(); a bare '||' means logical OR there
    my $sth = $dbh->prepare("select * from db where value like concat('%', ?, '%')");
    $sth->execute($query);

    How hard was that?

    --
    Bart.
     
    Bart Lateur, Dec 18, 2008
    #8
  9. skeldoy

    skeldoy Guest

    Update: Found out (with curl) that the problem is in the browsers.
    The queries complete and download in a second or so but it takes
    forever to render in all the browsers I have tested.

    Thanks for your help guys!
     
    skeldoy, Dec 18, 2008
    #9
  10. On 2008-12-17 23:53, skeldoy <> wrote:
    >> Is 700 GB and 6 billion rows in the largest table huge enough? I don't
    >> use mysql for that one, though.

    >
    > Is it a mysql issue that makes you use another db?


    Actually for us, mysql would be "another db". We are traditionally an
    Oracle shop, so using Oracle was the natural thing to do for us. We are
    using mysql for a few smaller databases, and I'm planning to look at
    postgres again - the inverted indices could be really useful for our
    application.

    hp
     
    Peter J. Holzer, Dec 18, 2008
    #10
  11. John

    John Guest

    "skeldoy" <> wrote in message
    news:...
    > Update: Found out (with curl) that the problem is in the browsers.
    > The queries complete and download in a second or so but it takes
    > forever to render in all the browsers I have tested.
    >
    > Thanks for your help guys!


    If it is a browser problem, try googling 'website optimisation'.
    There's a lot you can do to speed up the display of <table>s, and you
    mentioned earlier that you use <td>s. O'Reilly have a book called
    'Website Optimization'.
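
    One illustrative trick (untested sketch): a fixed table layout lets
    browsers lay out rows incrementally instead of waiting for the whole
    table before drawing anything:

    print qq{<table style="table-layout: fixed; width: 100%;">\n};

    Paging the output instead of emitting one huge table also helps
    (table name made up; %d keeps the integers safe to interpolate):

    my $sth = $dbh->prepare(sprintf 'select * from items limit %d, %d',
                            $offset, $per_page);
    $sth->execute;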

    Regards
    John
     
    John, Dec 18, 2008
    #11
  12. Bart Lateur

    Bart Lateur Guest

    Peter J. Holzer wrote:

    >> Is it a mysql issue that makes you use another db?

    >
    >Actually for us, mysql would be "another db". We are traditionally an
    >Oracle shop, so using Oracle was the natural thing to do for us. We are
    >using mysql for a few smaller databases, and I'm planning to look at
    >postgres again - the inverted indices could be really useful for our
    >application.


    Well, allegedly, Postgres is a lot closer to Oracle than Mysql is, so
    you'll probably feel a lot more at home in Postgres than in Mysql.

    One thing I miss is that Postgres doesn't have packages.

    --
    Bart.
     
    Bart Lateur, Dec 20, 2008
    #12
