arrange form data in same order as on form


Alan J. Flavell

Some may claim it is. (For some reason that comment wasn't unexpected.
;-) )

Well, it isn't just me, is it? If you already know what answers
you're going to get, why do you raise the question, without addressing
the points that you know are going to be made?

It wasn't me who challenged you to produce the benchmarks, but it
could just as well have been. The first, second, and third rules of
optimisation are "don't optimise yet", you know.
I have already done that, so the program is prepared to be (and is
actually in a few cases) run under mod_perl. However, there are
hundreds or 1,000+ users, and most of them don't have access to
mod_perl...

Then it would seem that general robustness and resilience are of more
importance to you (and your users) than saving those last few cycles
of CPU. Furthermore, when a new version of CGI.pm came out, with some
new browser weakness to workaround, or some new obscure security
loophole discovered, they could get the benefits in short order,
instead of waiting for you to diagnose and fix the implications for
homespun code.

(Of course, you could develop a dual-mode version, that takes
advantage of mod_perl when available, and works as a regular CGI when
it isn't. I gather that CGI.pm makes it rather easy to do that...)

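
[Editor's sketch, not code from the thread: mod_perl sets the MOD_PERL environment variable for each request, so a script can branch on it at runtime. Everything beyond that variable is illustrative.]

```perl
use strict;
use warnings;

# mod_perl sets $ENV{MOD_PERL} for each request it handles, so a
# script can detect at runtime which mode it is running in.
sub run_mode {
    return exists $ENV{MOD_PERL} ? 'mod_perl' : 'cgi';
}

if (run_mode() eq 'mod_perl') {
    # persistent interpreter: compiled once, so reset
    # per-request state explicitly here
}
else {
    # plain CGI: a fresh process per request
}
```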
Even if I provided a link in my reply to Randal, I ask you to please
not do that, Alan, at least not yet...

I started to write that program more than three years ago, and at that
time my programming experience basically consisted of having modified
a couple of Matt's Scripts. :)

That's OK, we all have to start somewhere. But if I was still coding
the same Perl4-style scripts that I started Perl with in around 1994
or so, then I'd need my head examined.

And given your commendable honesty about your existing code, I really
am rather surprised that you maintain that your choice of homespun
code must be the right one for your particular situation. In the end,
you *might* sometimes turn out to be right, but I'd want to see that
proved by more than just energetic handwaving, if you'll excuse me.

all the best
 

Gunnar Hjalmarsson

Alan said:
It wasn't me who challenged you to produce the benchmarks, but it
could just as well have been. The first, second, and third rules
of optimisation are "don't optimise yet", you know.

Okay, I made a benchmark. My starting-point was Purl Gurl's benchmark:
http://groups.google.com/[email protected]

Since I wanted to include also the compilation phase in the
comparison, I rewrote it. Basically I put my 'limited' code in a
separate file, required (not 'used') both that file and CGI.pm, and
reset %INC before doing so.

This is one typical result:

           Rate  CGI.pm   myCGI
CGI.pm   5.48/s      --    -97%
myCGI     178/s   3144%      --

If I didn't make some stupid mistake, the comparison shows that the
compilation+execution time for parsing a simple query string with 4
name/value pairs is about 30 times longer when you use CGI.pm compared
to my code. CGI.pm needs about 0.2 seconds!

You find the code I used for the benchmark at the bottom of this message.

Then it would seem that general robustness and resilience are of
more importance to you (and your users) than saving those last few
cycles of CPU.

Sorry, but I fail to see how that conclusion relates to what I said.

You assume that my code is not "robust" without explaining why. Robust
code is good, no doubt, and I believe that the robustness of those few
lines for CGI parsing is sufficient.
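
[A minimal sketch for illustration -- the editor's code, not either poster's: one subtle gap such short parsers can have is that values are decoded but names are not, so an encoded character in a field name survives as a literal '%xx'.]

```perl
use strict;
use warnings;

# Minimal query-string parser in the style under discussion.
# Note that it decodes the value but not the name.
sub parse_query {
    my ($qs) = @_;
    my %in;
    for (split /[&;]/, $qs) {
        my ($name, $value) = split /=/, $_, 2;
        $value =~ tr/+/ /;
        $value =~ s/%([0-9a-fA-F]{2})/chr hex $1/ge;
        $in{$name} = $value;
    }
    return %in;
}

my %in = parse_query('user%20name=Alan+Flavell');
# $in{'user%20name'} is 'Alan Flavell'; the key was never decoded.
```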

You also disregard my view that CPU may be a critical issue for
certain users.

(Of course, you could develop a dual-mode version, that takes
advantage of mod_perl when available, and works as a regular CGI
when it isn't. I gather that CGI.pm makes it rather easy to do
that...)

My program is already dual-mode, but I don't see how CGI.pm would have
made that easier. On the contrary, certain versions of CGI.pm don't
work with certain mod_perl versions:
http://groups.google.com/[email protected]

And given your commendable honesty about your existing code, I
really am rather surprised that you maintain that your choice of
homespun code must be the right one for your particular situation.
In the end, you *might* sometimes turn out to be right, but I'd
want to see that proved by more than just energetic handwaving, if
you'll excuse me.

Surprised? That is my view, and I think I have presented reasons for
it far beyond "energetic handwaving".

This is the benchmark code:

#---------------------- cgispeed.pl ----------------------#
#!/usr/bin/perl
use strict;
use Benchmark 'cmpthese';

my (%tmpINC, $north, $south, $east, $west);
our %in;

BEGIN { %tmpINC = %INC }

$ENV{QUERY_STRING} = 'north=north&south=south&east=east&west=west';
print "Content-type: text/html\n\n<pre>";

cmpthese( -5, {
    myCGI => sub {
        %INC = %tmpINC;
        require 'myCGI';
        ($north, $south, $east, $west) =
            @in{'north', 'south', 'east', 'west'};
    },
    'CGI.pm' => sub {
        %INC = %tmpINC;
        require CGI;
        my $query = new CGI;
        $north = $query->param('north');
        $south = $query->param('south');
        $east  = $query->param('east');
        $west  = $query->param('west');
    }
} );

#------------------------- myCGI -------------------------#
use strict;

my $buffer;
if ($ENV{REQUEST_METHOD} eq 'POST') {
    my $len = $ENV{CONTENT_LENGTH};
    $len <= 131072 or die "Too much data submitted.\n";
    read(STDIN, $buffer, $len) == $len
        or die "Reading of posted data failed.\n";
} else {
    $buffer = $ENV{QUERY_STRING};
}
$buffer =~ tr/+/ /;
for (split /[&;]/, $buffer) {
    my ($name, $value) = split /=/, $_, 2;
    $value =~ s/%(..)/pack('c', hex $1)/ge;
    $value =~ tr/\r//d;    # Windows fix
    $main::in{$name} = $value;
}

1;
 

Alan J. Flavell

If I didn't make some stupid mistake, the comparison shows that the
compilation+execution time for parsing a simple query string with 4
name/value pairs is about 30 times longer when you use CGI.pm compared
to my code. CGI.pm needs about 0.2 seconds!

And how long does a typical do-nothing browser HTTP transaction and
CGI invocation need in comparison?

You also disregard my view that CPU may be a critical issue for
certain users.

Sorry, I really don't "disregard" it. I'm saying the need is to
review the overall process, including server invocation from the
client and the subsequent CGI process creation, which I'm afraid your
benchmarks don't do.

In fact with some rough benchmarking of the overall process (using
LWP::Simple to run the tests against a local webserver), it seemed to
me as if our (otherwise lightly-loaded) server could run about 14
invocations per second of wallclock with your economy-model script,
compared with some 7 per second with CGI.pm, so - a factor of around 2
(wallclock) overall, compared with your measurement of 30 (cpu) for
some portion of the process. On Windows, I even got a factor
approaching 3 between them for the overall process. (Server in both
cases was a version of Apache 1.3.*, on linux and on Win2000
respectively).

While I must admit the factor is somewhat larger than I had expected,
this does put your measurement of a factor of 30 into a rather more
realistic context, I feel.

My program is already dual-mode, but I don't see how CGI.pm would have
made that easier. On the contrary, certain versions of CGI.pm don't
work with certain mod_perl versions:

I'm sorry if you felt it had been improper of me not to mention
versioning issues, but it seems to me that adopting use of mod_perl
would inevitably call for a review of the version compatibility of any
related Perl modules that will be used, and CGI.pm would be no
exception there. mod_perl doesn't lack documentation about such
matters.

But if one is genuinely serious about saving CPU cycles, then such an
approach would seem to me to be indispensable. You can see, by the
comparison between your numbers and mine, just what a large proportion
of a CGI transaction is not accounted for by the CGI script itself.

Surprised? That is my view, and I think I have presented reasons for
it far beyond "energetic handwaving".

This is the benchmark code:

As I say, that focuses on one part of the invocation of a Perl script
in CGI context, namely the Perl code itself. But that's only a part
of the overall process, as I think we have seen.

Nevertheless, I will concede that any rule can have exceptions. What
I usually say about CGI.pm is that those who have genuinely got the
expertise to *not* use CGI.pm will know why they are doing that, and
will need no advice from me. On the other hand anyone who's in a
position to seek advice is going to get my best advice (and you know
what that's going to be, in the overwhelming proportion of cases).

I'm clearly aware that CGI.pm is in no way magical - the code doesn't
do anything that one couldn't just as well code for oneself. And the
author admits that it's grown too big, and might benefit from being
modularised. I've found the odd bug in it myself on occasion. So
this is not the uncritical adulation that some trolls accuse us of.
Nevertheless, it's overall the best thing available for doing CGI in
Perl, because the author is actively working on it and is actively
adapting it to the changing situation, to encapsulate the gathered
knowledge of browser bugs, workarounds etc.

A proportion of extra CPU cycles isn't usually too high a price to pay
for that. And as we've seen - if it _is_ too high a price to pay,
then the most productive place to make real savings is elsewhere.

Can we call a truce on this, then?

all the best
 

Gunnar Hjalmarsson

Alan said:
And how long does a typical do-nothing browser HTTP transaction and
CGI invocation need in comparison?

the need is to review the overall process, including server
invocation from the client and the subsequent CGI process creation,
which I'm afraid your benchmarks don't do.

What the need is depends on what you are actually trying to measure.
The conclusion I draw from my benchmark is that the *absolute*
time it takes to parse a query string is significant if you use
CGI.pm, while it's negligible if you use my code. Whether the factor
is 20, 30 or 50 is something I pay little regard to since, as you
point out, I did not measure the whole process.

My program supports a certain kind of web application, and is
typically used on web sites that are hosted on shared servers.
Sometimes it's used in a way that results in thousands of calls per day.

Now, if you have a busy web site on a shared hosting account, there is
always a limit where the hosting provider says: "This is too much, our
other customers are affected adversely." That's why I'm anxious to
watch the server load, and to me, 0.2 seconds appears to be
significant if there are thousands of daily calls.

mod_perl is of course suitable in order to further reduce the server
load. It's just that it's very unusual that mod_perl is available on
shared web hosting accounts. Of course, you can always say that the
program should have been written in PHP instead. However, it wasn't.

In fact with some rough benchmarking of the overall process (using
LWP::Simple to run the tests against a local webserver), it seemed
to me as if our (otherwise lightly-loaded) server could run about
14 invocations per second of wallclock with your economy-model
script, compared with some 7 per second with CGI.pm, so - a factor
of around 2 (wallclock) overall, compared with your measurement of
30 (cpu) for some portion of the process. On Windows, I even got a
factor approaching 3 between them for the overall process. (Server
in both cases was a version of Apache 1.3.*, on linux and on
Win2000 respectively).

While I must admit the factor is somewhat larger than I had
expected, this does rather put your measurement of a factor of 30
into a rather more realistic context, I feel.

As regards "realistic", see above.

It surprises me that your server would allow 7 invocations per second
with CGI.pm when you run the whole process, while I found that it
would allow about 5 per second when only the Perl part is taken into
account. Maybe the server you used is significantly faster. Btw, are
you sure that you captured the compilation time?

Anyway, this is interesting additional info. Thanks, Alan! I suppose
it indicates that, provided that the factor is 2, I would double the
server load by starting to use CGI.pm. The difference appears to be
significant also when you look at it from this angle.

These benchmarks demonstrate that the design of CGI.pm is surprisingly
'expensive'.

I will concede that any rule can have exceptions. What I usually
say about CGI.pm is that those who have genuinely got the expertise
to *not* use CGI.pm will know why they are doing that, and will
need no advice from me. On the other hand anyone who's in a
position to seek advice is going to get my best advice (and you
know what that's going to be, in the overwhelming proportion of
cases).

I'm clearly aware that CGI.pm is in no way magical - the code
doesn't do anything that one couldn't just as well code for
oneself. And the author admits that it's grown too big, and might
benefit from being modularised. I've found the odd bug in it
myself on occasion. So this is not the uncritical adulation that
some trolls accuse us of. Nevertheless, it's overall the best thing
available for doing CGI in Perl, because the author is actively
working on it and is actively adapting it to the changing
situation, to encapsulate the gathered knowledge of browser bugs,
workarounds etc.

A proportion of extra CPU cycles isn't usually too high a price to
pay for that. And as we've seen - if it _is_ too high a price to
pay, then the most productive place to make real savings is
elsewhere.

Can we call a truce on this, then?

I hear what you say. :) And it makes much sense.

Let me try to summarize my view on it from a different angle:

Good advice is a good thing, and using Perl modules is a convenient
way to reuse code. Personally I use several modules, but when I'm able
to do something with just a couple of lines of Perl code, I sometimes
do so instead of loading hundreds of lines of code by using a module.
I don't feel that I risk getting bashed for doing so, and nobody
demands that I *prove* that my choices are right.

That is, with one exception: the 'sacred cow' CGI.pm. Even if you
say that "the code doesn't do anything that one couldn't just as well
code for oneself" and "it's grown too big", your reasoning above
presupposes that you are able to explicitly justify your decision if
you choose to not use CGI.pm for parsing CGI data. That makes little
sense to me. The presumption that people don't know what they are
doing if they don't use CGI.pm is patronizing.

If the explanation is the security implications with CGI, I'd like to
see the focus moved to how desirable it is that you

- *learn* about the implied risks with CGI scripts,

- don't use code copied from random sources if you don't understand
how it works,

- carefully consider the risks with your own applications, and
validate the data accordingly, and

- enable taint mode.

I feel that these things, which I take for granted that we can agree
upon, tend to be forgotten in the 'campaign' for using CGI.pm.
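
[To make the taint-mode point concrete (editor's sketch, not from the thread): under perl -T, input from %ENV, STDIN and the like is tainted, and the standard way to untaint it is an explicit regex capture against a whitelist pattern.]

```perl
use strict;
use warnings;

# Under taint mode (perl -T), only data that has passed through a
# regex capture is considered untainted. Validate against a strict
# whitelist pattern and use the captured text, never the raw input.
sub untaint_username {
    my ($raw) = @_;
    return undef unless defined $raw;
    if ($raw =~ /\A(\w{1,32})\z/) {
        return $1;    # $1 is untainted
    }
    return undef;     # reject anything that fails the whitelist
}
```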
 
