Q: How to improve CGI performance for this?

Discussion in 'Perl Misc' started by avtanski@gmail.com, May 11, 2006.

  1. Guest

    Hello,

    I'm not sure that this is the right group for this question - Googling
    for similar questions gave me always "comp.lang.perl.misc", but if this
    is OT here, please, let me know.

    The problem I'm having is with a CGI script that needs to load and
    parse some data from a quite large file, then do some extra processing
    based on user form input and return. The slow part is (I guess)
    loading and parsing the data from the file. Although this is taking a
    fraction of a second, the script is quite often invoked by a number of
    customers simultaneously, and is causing problems with my shared hosting
    provider (reportedly 25% CPU load at some times).

    I'm looking for high-level ideas for a solution. What I can come up
    with is:

    1) Optimizing the file format for easier parsing - not much could be
    done here, this is a pretty straightforward task. The file size is about
    200K, and there is simply no way to avoid loading it. Partially loading
    also doesn't work (splitting it to pieces, things like this).

    2) Switching to PHP - this means rewriting everything, what a mess...

    3) Using something like FastCGI, provided that the hosting provider
    has it. Do you think this can help? I don't know much about FastCGI,
    can I somehow preload the data in memory and just use it from the
    script?

    4) Using mod_perl? Don't know much about it either. Can I parse the data
    from the file once, have it stored in memory and each time my script is
    invoked access it? How much cooperation is required from my provider
    for this?

    5) Do something else?

    Any help/ideas/suggestions are appreciated.

    Thanks,

    - Alex
     
    , May 11, 2006
    #1

  2. Guest

    1) I can't comment on this since you did not give any info on the
    current format, or on how you currently parse it.

    2) Why do you think PHP would be faster? Perl was born to parse data. I
    highly doubt switching to PHP would show any improvement and would
    likely be worse.

    3) I don't know much about FastCGI, sorry.

    4) mod_perl would definitely work here and would likely be the best
    solution, but it is doubtful that you can get it from a shared host.

    5) A quick search of CPAN brought me to the Cache::Cache module. It may
    be worth checking out.

    Please give more information if you want a more in depth answer. I
    mainly need to know what you are parsing and how.
     
    , May 11, 2006
    #2

  3. Guest

    Hi and thanks for the reply.

    OK, the script in question is a chat-bot. The file that is parsed in
    the beginning is in the following format:

    key.key.key.../modifier:Some text with $VARS and {OTHER_STUFF} in it.

    The slow part is reading this file (over 7000 records) each time the
    script is invoked. Splitting is pretty straightforward - for each
    record I get the list of keys, the modifier and the text (without
    parsing the text itself). I do this:

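    my (@key, @emo, @ans);    # parallel arrays, one entry per record
    while (<F>) {
        chomp;
        # Split "keys/modifier:text" at the first colon only.
        my ($keyemo, $ans) = split /:/, $_, 2;
        next unless defined $ans and length($ans) > 0;
        # Split "keys/modifier"; the modifier is optional.
        my ($key, $emo) = split /\//, $keyemo, 2;
        $emo = "" unless defined $emo;
        # Store the dot-separated keys as a set for fast lookup.
        my $keys = {};
        foreach my $k (split /\./, $key) { $keys->{$k} = 1; }
        push @key, $keys;
        push @emo, $emo;
        push @ans, $ans;
    }
    close F;
    my $turf = {};
    $turf->{KEY} = \@key;
    $turf->{EMO} = \@emo;
    $turf->{ANS} = \@ans;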

    I see some places here where the performance can be improved, but I
    don't think anything major could be done here. I would love to have
    somehow the $turf hash reference preloaded and available for each
    invocation, but I don't have any idea how this can be done in my case.

    The Cache::Cache module you mention may not help much (if any), I
    think. As far as I understand from the module doc, it lets me save the
    data and load it again from file - but this is not much different than
    what I already do, and I doubt it will be much faster.

    Thanks,

    - Alex
     
    , May 11, 2006
    #3
  4. Guest

    wrote:
    > I would love to have somehow the $turf hash reference preloaded
    > and available for each invocation


    If your source data is fairly static then you can write your hash out
    to another file (using the Storable module). Then you can do something
    like this:

    $hashref = retrieve('file');

    and load the entire hash from the Storable file - no need to parse the
    data into the hash structure each time. Storable uses an efficient
    binary format as well, so the performance should be DRAMATICALLY
    better.

    Of course, if your source data changes a lot, that's not such a good
    idea. I suppose your script could compare timestamps of the source
    file and the Storable file and re-parse and re-write the Storable file
    if the source file is newer. If your source data constantly changes,
    though, this approach will make your problem worse.
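
    A minimal sketch of the timestamp check described above, assuming the
    record format posted earlier in the thread. The file names and the
    sub names parse_source and load_turf are made up for illustration;
    nstore and retrieve are Storable's own functions.

    ```perl
    use strict;
    use warnings;
    use Storable qw(nstore retrieve);

    # Parse the text source into the same shape as the original $turf.
    sub parse_source {
        my ($src) = @_;
        my (@key, @emo, @ans);
        open my $fh, '<', $src or die "Cannot open $src: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($keyemo, $ans) = split /:/, $line, 2;
            next unless defined $ans and length $ans;
            my ($key, $emo) = split /\//, $keyemo, 2;
            $emo = '' unless defined $emo;
            push @key, { map { $_ => 1 } split /\./, $key };
            push @emo, $emo;
            push @ans, $ans;
        }
        close $fh;
        return { KEY => \@key, EMO => \@emo, ANS => \@ans };
    }

    # Return the parsed data, reading the Storable cache when it is at
    # least as new as the source file and rebuilding it otherwise.
    sub load_turf {
        my ($src, $cache) = @_;
        my $src_mtime = (stat $src)[9];
        die "Cannot stat $src: $!" unless defined $src_mtime;
        my $cache_mtime = -e $cache ? (stat $cache)[9] : undef;
        if (defined $cache_mtime && $cache_mtime >= $src_mtime) {
            return retrieve($cache);
        }
        my $turf = parse_source($src);
        # Write to a per-process temp file and rename it into place, so a
        # concurrent request never reads a half-written cache.
        my $tmp = "$cache.$$.tmp";
        nstore($turf, $tmp);
        rename $tmp, $cache or die "Cannot rename $tmp to $cache: $!";
        return $turf;
    }
    ```

    A script would then call something like load_turf('turf.txt',
    'turf.sto') in place of the parsing loop. The rename step is a design
    choice for the case where several CGI processes start at once.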

    --
    http://DavidFilmer.com
     
    , May 11, 2006
    #4
  5. Guest

    Good idea. As with "Cache::Cache" I did not expect much of an
    improvement, but now that you say that everything is stored in an
    efficient binary format, I could at least hope, :) Thanks, I didn't
    know that.

    Now I have to do some benchmarking, to check if my hosting provider
    has the Storable module and to switch to the new format. Thanks for
    the suggestion, I'll let you know how it worked out.
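
    One way such a benchmark could be set up, as a rough sketch using the
    core Benchmark module. The 7000 synthetic records, the file names and
    the iteration count are assumptions standing in for the real data
    file, so the numbers it prints only hint at what the real script
    would show.

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(timethese cmpthese);
    use File::Temp qw(tempdir);
    use Storable qw(nstore retrieve);

    # Synthetic stand-in for the real data: 7000 records in the
    # key.key/modifier:text format.
    my $dir   = tempdir(CLEANUP => 1);
    my $text  = "$dir/turf.txt";
    my $store = "$dir/turf.sto";
    open my $out, '>', $text or die "Cannot write $text: $!";
    print {$out} "word$_.other$_/mood:Answer number $_ with \$VARS\n" for 1 .. 7000;
    close $out;

    # Same parsing steps as the loop posted earlier in the thread.
    sub parse_text {
        my (@key, @emo, @ans);
        open my $fh, '<', $text or die "Cannot open $text: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($keyemo, $ans) = split /:/, $line, 2;
            next unless defined $ans and length $ans;
            my ($key, $emo) = split /\//, $keyemo, 2;
            push @key, { map { $_ => 1 } split /\./, $key };
            push @emo, defined $emo ? $emo : '';
            push @ans, $ans;
        }
        close $fh;
        return { KEY => \@key, EMO => \@emo, ANS => \@ans };
    }

    # One-off conversion to the Storable format.
    nstore(parse_text(), $store);

    # Time 20 runs of each approach, then print a comparison table.
    my $results = timethese(20, {
        parse_text => sub { parse_text() },
        storable   => sub { retrieve($store) },
    }, 'none');
    cmpthese($results);
    ```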

    Meanwhile, if anybody has another idea, I'm all ears, :)

    Thanks,

    - Alex
     
    , May 11, 2006
    #5
  6. Guest

    Hi Jim,

    Thanks for the effort. Wow!

    I modified my code and indeed it now shows better performance. I also
    tried storing the data with Storable, as other people suggested - it
    increased the performance dramatically too. Now I'm comparing all
    three options and then will go over my code to see what else can be
    simplified.

    Because I'm using shared hosting I cannot afford to load the server too
    much. I'm currently experimenting with two additional approaches:

    1) Checking /proc/loadavg to block the script if the server gets too
    busy.

    2) Reducing the answer rate, when there are more users - kind of
    artificially delaying the bot responses in order to limit the rate at
    which the script is invoked. I'm a bit fuzzy how to implement this
    one, but I'll think of something.

    Thanks everybody for the great ideas. If anybody thinks of something
    else, I'll be really grateful, but I don't want to waste your time with
    me anymore.

    Thanks,

    - Alex
     
    , May 12, 2006
    #6
  7. David Squire Guest

    wrote:
    > Hi Jim,
    >
    > Thanks for the effort. Wow!


    <snip>

    Please quote context when you reply. Read the posting guidelines for
    this group.

    DS
     
    David Squire, May 12, 2006
    #7
  8. <> wrote:

    > 2) Reducing the answer rate, when there are more users - kind of
    > artificially delaying the bot responses in order to limit the rate at
    > which the script is invoked. I'm a bit fuzzy how to implement this
    > one, but I'll think of something.



    The search term to use for doing that is "throttling".


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, May 12, 2006
    #8
  9. Guest

    Jim Gibson wrote:
    >
    > Something as simple as putting a sleep(1) in your code will slow
    > down responses. You can sleep for variable amounts of time, if
    > necessary.


    I thought of this, but it seems that it won't help much. Since this is
    a chat bot, the load mostly comes not from a single user chatting
    too fast, but from too many users chatting at the same time - I think
    that although this will have some effect, it will be small. But thanks
    for the idea anyway.

    > You might consider putting your data into an indexed database
    > and doing indexed queries.


    Good idea. I just finished writing the script to use such an index,
    loaded from a file (not actual DB) using Storable (great module this
    one, easy to use too!). Basically I rewrote some of the script logic
    and now I have a script that is exactly 2.5 times faster! I was
    hoping for more, but even this is a significant improvement.

    The last thing that I'm going to do now is to make the script
    check /proc/loadavg and stop if the server is very busy. I hope
    this will be enough to keep my hosting provider happy, :)
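
    A rough sketch of such a load check, assuming a Linux host where the
    first field of /proc/loadavg is the 1-minute load average. The sub
    names, the 503 response and the Retry-After value are illustrative
    choices, not something taken from the actual script.

    ```perl
    use strict;
    use warnings;

    # Return the 1-minute load average, or undef if it cannot be read
    # (for example on a host without /proc/loadavg).
    sub load_average {
        my ($path) = @_;
        $path = '/proc/loadavg' unless defined $path;
        open my $fh, '<', $path or return;
        my $line = <$fh>;
        close $fh;
        return unless defined $line && $line =~ /^(\d+(?:\.\d+)?)\s/;
        return $1;
    }

    # True (1) only when the load is known and above the limit.
    sub too_busy {
        my ($limit, $path) = @_;
        my $load = load_average($path);
        return (defined($load) && $load > $limit) ? 1 : 0;
    }

    # Print a short "try again later" CGI response when the server is
    # busy. Returns 1 if the response was sent, 0 otherwise.
    sub refuse_when_busy {
        my ($limit, $path) = @_;
        return 0 unless too_busy($limit, $path);
        print "Status: 503 Service Unavailable\r\n",
              "Retry-After: 10\r\n",
              "Content-Type: text/plain\r\n\r\n",
              "The bot is busy, please try again shortly.\n";
        return 1;
    }
    ```

    Calling something like "exit if refuse_when_busy(4.0);" near the top
    of the script, before the data file is loaded, keeps the cost of a
    refused request small. The limit of 4.0 is a placeholder and would
    need tuning for the host.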

    Thank you all for your help and great advice!

    Regards,

    - Alex
     
    , May 14, 2006
    #9
  10. >customers simultaneously, and is causing problems with my shared hosting
    >provider (reportedly 25% CPU load at some times).


    <Snip possible causes of action>

    The first thing to do is to identify if there is any one (or more) parts
    of your app that can be improved, in terms of speed. To do that you need
    a profiler.

    Once you have improved those parts you can then determine if maybe you need
    to go with an "always loaded" solution to avoid the startup/shutdown
    penalty of your script running. I think that would be mod-perl but
    you'll need someone other than I to advise on that.

    I'd advise against just jumping ship to PHP/Python/Ruby in the hope that
    a different interpreted language will be faster (it may be in some
    areas, but be slower in others) - you won't find out until you've put a
    lot of effort in.

    If you are on Windows (you didn't say) you can use Perl Performance
    Validator to get a performance profile of your application. No need to
    modify your app. PPV is targeted at ActiveState's Perl implementation.

    http://www.softwareverify.com/perlPerformanceValidator/index.html

    Stephen
    --
    Stephen Kellett
    Object Media Limited http://www.objmedia.demon.co.uk/software.html
    Computer Consultancy, Software Development
    Windows C++, Java, Assembler, Performance Analysis, Troubleshooting
     
    Stephen Kellett, May 14, 2006
    #10
  11. Juha Laiho Guest

    "" <> said:
    >I'm not sure that this is the right group for this question - Googling
    >for similar questions gave me always "comp.lang.perl.misc", but if this
    >is OT here, please, let me know.
    >
    >The problem I'm having is with a CGI script that needs to load and
    >parse some data from a quite large file, then do some extra processing
    >based on user form input and return. The slow part is (I guess)
    >loading and parsing the data from the file.


    Glad to see you've been able to improve the situation already.

    >2) Switching to PHP - this means rewriting everything, what a mess...


    Even though people are sceptical on this, it might be one idea.
    Not for the language, but for the execution mode. More below.

    >3) Using something like FastCGI, providing that the hosting provider
    >have it. Do you think this can help? I don't know much about FastCGI,
    >can I somehow preload the data in memory and just use it from the
    >script?
    >
    >4) Using mod_perl? Don't know much about it too. Can I parse the data
    >from the file once, have it stored in memory and each time my script is
    >invoked access it? How much cooperation is required from my provider
    >for this?


    mod_perl and FastCGI require co-operation from your provider. Both
    will provide the same effect as switching to PHP most probably would.

    The issue is, with perl run as CGI, every time someone accesses your
    script, it will be loaded, parsed and compiled. And run.

    PHP is much more commonly run as a module, and herein lies the difference.
    The page using PHP will be loaded, parsed and compiled once (per server
    worker process), which is a huge difference. Every subsequent request
    through the same worker process will have a ready-to-run version of
    the page. FastCGI will provide the same, as will mod_perl.
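
    To illustrate the "load once per worker process" idea for Perl, here
    is a small sketch. The names load_data and handle_request and the
    two-answer data set are invented for the example, and the FastCGI
    loop appears only as a comment because CGI::Fast is a separate CPAN
    module that may not be installed.

    ```perl
    use strict;
    use warnings;

    my $turf;         # survives between requests in a persistent process
    my $loads = 0;    # counts how often the expensive load actually ran

    # Stand-in for the expensive parse; a real script would read its
    # data file here.
    sub load_data {
        $loads++;
        return { ANS => [ 'Hello!', 'Goodbye!' ] };
    }

    # Called once per request; only the first call pays for loading.
    sub handle_request {
        my ($n) = @_;
        $turf ||= load_data();
        return $turf->{ANS}[ $n % @{ $turf->{ANS} } ];
    }

    # Under plain CGI this file is compiled and run from scratch for
    # every request, so load_data() runs every time. Under FastCGI the
    # request loop would look roughly like this:
    #
    #   use CGI::Fast;
    #   while (my $q = CGI::Fast->new) {
    #       print $q->header('text/plain'), handle_request(0), "\n";
    #   }

    # Simulate three requests served by one long-lived process.
    print handle_request($_), "\n" for 0 .. 2;
    print "load_data ran $loads time(s)\n";
    ```

    However many requests the simulated process serves, the load counter
    stays at 1, which is the saving a persistent environment offers over
    plain CGI.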
    --
    Wolf a.k.a. Juha Laiho Espoo, Finland
    (GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
    PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
    "...cancel my subscription to the resurrection!" (Jim Morrison)
     
    Juha Laiho, May 14, 2006
    #11