Q: How to improve CGI performance for this?

avtanski

Hello,

I'm not sure that this is the right group for this question - Googling
for similar questions always pointed me to "comp.lang.perl.misc", but if
this is OT here, please let me know.

The problem I'm having is with a CGI script that needs to load and
parse some data from a quite large file, then do some extra processing
based on user form input and return. The slow part is (I guess)
loading and parsing the data from the file. Although this is taking a
fraction of a second, the script is quite often invoked by a number of
customers simultaneously, and is causing problems with my shared hosting
provider (reportedly 25% CPU load at times).

I'm looking for high-level ideas for a solution. What I can come up
with is:

1) Optimizing the file format for easier parsing - not much can be done
here; it's a pretty straightforward task. The file size is about 200K,
and there is simply no way to avoid loading it. Partial loading doesn't
work either (splitting it into pieces, things like that).

2) Switching to PHP - this means rewriting everything, what a mess...

3) Using something like FastCGI, provided that the hosting provider
has it. Do you think this can help? I don't know much about FastCGI -
can I somehow preload the data in memory and just use it from the
script?

4) Using mod_perl? I don't know much about it either. Can I parse the
data from the file once, have it stored in memory, and access it each
time my script is invoked? How much cooperation is required from my
provider for this?

5) Do something else?

Any help/ideas/suggestions are appreciated.

Thanks,

- Alex
 
x3v0-usenet

1) I can't comment on this since you did not give any info on the
current format, or on how you currently parse it.

2) Why do you think PHP would be faster? Perl was born to parse data. I
highly doubt switching to PHP would show any improvement and would
likely be worse.

3) I don't know much about FastCGI, sorry.

4) mod_perl would definitely work here and would likely be the best
solution, but it is doubtful that you can get it from a shared host.

5) A quick search of CPAN brought me to the Cache::Cache module. It may
be worth checking out.

Please give more information if you want a more in depth answer. I
mainly need to know what you are parsing and how.
 
avtanski

Hi and thanks for the reply.

OK, the script in question is a chat-bot. The file that is parsed in
the beginning is in the following format:

key.key.key.../modifier:Some text with $VARS and {OTHER_STUFF} in it.

The slow part is reading this file (over 7000 records) each time the
script is invoked. Splitting is pretty straightforward - for each
record I get the list of keys, the modifier and the text (without
parsing the text itself). I do this:

my (@key, @emo, @ans);          # parallel arrays: key sets, modifiers, answers

while (<F>) {                   # F is opened on the data file earlier
    chomp;

    # each record: key.key.key.../modifier:answer text
    my ($keyemo, $ans) = split /:/, $_, 2;
    next unless defined $ans and length($ans) > 0;

    my ($key, $emo) = split /\//, $keyemo, 2;
    $emo = "" unless defined $emo;

    # turn the dot-separated key list into a lookup hash
    my $keys = {};
    foreach my $k (split /\./, $key) { $keys->{$k} = 1; }

    push @key, $keys;
    push @emo, $emo;
    push @ans, $ans;
}
close F;

my $turf = {};
$turf->{KEY} = \@key;
$turf->{EMO} = \@emo;
$turf->{ANS} = \@ans;

I see some places here where the performance can be improved, but I
don't think anything major could be done. I would love to somehow have
the $turf hash reference preloaded and available for each invocation,
but I don't have any idea how this can be done in my case.

The Cache::Cache module you mention may not help much (if at all), I
think. As far as I understand from the module doc, it lets me save the
data and load it again from a file - but this is not much different from
what I already do, and I doubt it will be much faster.

Thanks,

- Alex
 
usenet

I would love to have somehow the $turf hash reference preloaded
and available for each invocation

If your source data is fairly static then you can write your hash out
to another file (using the Storable module). Then you can do something
like this:

$hashref = retrieve('file');

and load the entire hash from the Storable file - no need to parse the
data into the hash structure each time. Storable uses an efficient
binary format as well, so the performance should be DRAMATICALLY
better.

Of course, if your source data changes a lot, that's not such a good
idea. I suppose your script could compare timestamps of the source
file and the Storable file and re-parse and re-write the Storable file
if the source file is newer. If your source data constantly changes,
though, this approach will make your problem worse.
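
Something along these lines (an untested sketch; the file names and the
parse_source() helper are just placeholders for your own code):

use Storable qw(store retrieve);

my $src   = 'answers.txt';     # original data file
my $cache = 'answers.stor';    # Storable copy of the parsed hash

my $turf;
if (-e $cache and (stat $cache)[9] >= (stat $src)[9]) {
    # cache is at least as new as the source: load the ready-made hash
    $turf = retrieve($cache);
} else {
    # no cache yet, or the source changed: parse as before, then save
    $turf = parse_source($src);
    store($turf, $cache);
}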
 
avtanski

Good idea. As with "Cache::Cache" I did not expect much of an
improvement, but now that you say that everything is stored in an
efficient binary format, I can at least hope, :) Thanks, I didn't
know that.

Now I have to do some benchmarking, check whether my hosting provider
has the Storable module, and switch to the new format. Thanks for
the suggestion - I'll let you know how it worked out.

Meanwhile, if anybody has another idea, I'm all ears, :)

Thanks,

- Alex
 
avtanski

Hi Jim,

Thanks for the effort. Wow!

I modified my code and indeed it now shows better performance. I also
tried storing the data with Storable, as other people suggested - it
increased the performance dramatically too. Now I'm comparing all
three options and then will go over my code to see what else can be
simplified.

Because I'm using shared hosting I cannot afford to load the server too
much. I'm currently experimenting with two additional approaches:

1) Checking /proc/loadavg to block the script if the server gets too
busy (see the sketch after this list).

2) Reducing the answer rate when there are more users - kind of
artificially delaying the bot responses in order to limit the rate at
which the script is invoked. I'm a bit fuzzy on how to implement this
one, but I'll think of something.
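
For item 1, something like this rough, untested sketch is what I have
in mind (the cutoff value and the refusal message are arbitrary):

# Return true if the 1-minute load average is above a cutoff.
sub server_too_busy {
    my $max_load = shift || 4;                      # arbitrary threshold
    open my $fh, '<', '/proc/loadavg' or return 0;  # if unreadable, don't block
    my ($one_min) = split ' ', scalar <$fh>;
    close $fh;
    return $one_min > $max_load;
}

if (server_too_busy()) {
    print "Content-type: text/plain\n\n";
    print "The server is busy right now - please try again in a moment.\n";
    exit;
}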

Thanks everybody for the great ideas. If anybody thinks of something
else, I'll be really grateful, but I don't want to waste any more of
your time.

Thanks,

- Alex
 
Tad McClellan

2) Reducing the answer rate when there are more users - kind of
artificially delaying the bot responses in order to limit the rate at
which the script is invoked. I'm a bit fuzzy on how to implement this
one, but I'll think of something.


The search term to use for doing that is "throttling".
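
As a very rough, untested illustration of one way to do it: serialize
replies through a shared timestamp file so the bot never answers more
often than some minimum interval, no matter how many users are chatting
(the file name and the interval below are made up):

use Fcntl qw(:DEFAULT :flock);
use Time::HiRes qw(sleep time);

my $min_gap = 0.5;                         # seconds between any two answers
sysopen my $lock, '/tmp/chatbot.throttle', O_RDWR | O_CREAT
    or die "can't open throttle file: $!";
flock $lock, LOCK_EX;                      # requests queue up here

my $last = <$lock> || 0;                   # time of the previous answer
chomp $last;
my $wait = $last + $min_gap - time;
sleep $wait if $wait > 0;                  # delay this reply if it came too soon

seek $lock, 0, 0;
print $lock time(), "\n";                  # record this reply's time
truncate $lock, tell $lock;
close $lock;                               # releases the lock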
 
avtanski

Jim said:
Something as simple as putting a sleep(1) in your code will slow
down responses. You can sleep for variable amounts of time, if
necessary.

I thought of this, but it seems that it won't help much. Since this is
a chat bot, the load mostly comes not from a single user chatting
too fast, but from too many users chatting at the same time - I think
that although this will have some effect, it will be small. But thanks
for the idea anyway.
You might consider putting your data into an indexed database
and doing indexed queries.

Good idea. I just finished rewriting the script to use such an index,
loaded from a file (not an actual DB) using Storable (great module, this
one - easy to use too!). Basically I rewrote some of the script logic
and now I have a script that is exactly 2.5 times faster! I was
hoping for more, but even this is a significant improvement.
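
Roughly, the idea of the index is something like this (the names are
simplified and @words_from_user stands for whatever gets extracted from
the visitor's message - not my actual code):

use Storable qw(store retrieve);

# Build step (done once, offline): map each keyword to the numbers of
# the records that contain it, using the KEY structure from the parsing
# code posted earlier.
my %index;
for my $i (0 .. $#{ $turf->{KEY} }) {
    push @{ $index{$_} }, $i for keys %{ $turf->{KEY}[$i] };
}
store \%index, 'keyindex.stor';

# Lookup step (per request): only records sharing a key with the input
# need to be scored, instead of scanning all 7000+ of them.
my $index = retrieve('keyindex.stor');
my %candidates;
$candidates{$_} = 1 for map { @{ $index->{$_} || [] } } @words_from_user;
my @hits = sort { $a <=> $b } keys %candidates;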

The last thing I'm going to do now is make the script check
/proc/loadavg and stop if the server is very busy. I hope this will be
enough to keep my hosting provider happy, :)

Thank you all for your help and great advice!

Regards,

- Alex
 
Stephen Kellett

customers simultaneously, and is causing problems with my shared hosting
provider (reportedly 25% CPU load at times).

<Snip possible causes of action>

The first thing to do is to identify whether there are any parts of
your app that can be improved in terms of speed. To do that you need
a profiler.
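
If you can't get a full profiler onto the host, even the core Benchmark
module will give you a rough idea of where the time goes (load_and_parse()
below is just a stand-in for your parsing code):

use Benchmark qw(timethis);

# Run the suspected hot spot repeatedly and report the CPU time used.
timethis(200, sub { load_and_parse('answers.txt') });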

Once you've improved those parts you can then determine whether you
need to go with an "always loaded" solution to avoid the
startup/shutdown penalty of your script. I think that would be
mod_perl, but you'll need someone other than me to advise on that.

I'd advise against just jumping ship to PHP/Python/Ruby in the hope that
a different interpreted language will be faster (it may be in some
areas, but be slower in others) - you won't find out until you've put a
lot of effort in.

If you are on Windows (you didn't say) you can use Perl Performance
Validator to get a performance profile of your application, with no need
to modify your app. PPV is targeted at ActiveState's Perl implementation.

http://www.softwareverify.com/perlPerformanceValidator/index.html

Stephen
 
Juha Laiho

I'm not sure that this is the right group for this question - Googling
for similar questions always pointed me to "comp.lang.perl.misc", but if
this is OT here, please let me know.

The problem I'm having is with a CGI script that needs to load and
parse some data from a quite large file, then do some extra processing
based on user form input and return. The slow part is (I guess)
loading and parsing the data from the file.

Glad to see you've been able to improve the situation already.
2) Switching to PHP - this means rewriting everything, what a mess...

Even though people are sceptical on this, it might be one idea.
Not for the language, but for the execution mode. More below.
3) Using something like FastCGI, providing that the hosting provider
have it. Do you think this can help? I don't know much about FastCGI,
can I somehow preload the data in memory and just use it from the
script?

4) Using mod_perl? Don't know much about it too. Can I parse the data
from the file once, have it stored in memory and each time my script is
invoked access it? How much cooperation is required from my provider
for this?

mod_perl and FastCGI require co-operation from your provider. Both
will provide the same effect as switching to PHP most probably would.

The issue is, with perl run as CGI, every time someone accesses your
script, it will be loaded, parsed and compiled. And run.

PHP is much more commonly run as a module, and herein lies the difference.
The page using PHP will be loaded, parsed and compiled once (per server
worker process), which is a huge difference. Every subsequent request
through the same worker process will have a ready-to-run version of
the page. FastCGI will provide the same, as will mod_perl.
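
As a rough sketch of what that buys you (assuming your provider offers
the FCGI module; the file name is invented), the expensive load happens
once per worker process, outside the request loop:

use FCGI;
use Storable qw(retrieve);

# Done once when the worker process starts - not once per request.
my $turf = retrieve('answers.stor');

my $request = FCGI::Request();
while ($request->Accept() >= 0) {
    # Each accepted request reuses the already-loaded $turf.
    print "Content-type: text/plain\r\n\r\n";
    print "Loaded ", scalar @{ $turf->{ANS} }, " answers.\n";
}

Under mod_perl the effect is similar: the script and its globals stay
compiled and loaded inside the Apache process between requests.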
 
