arrange form data in same order as on form

Alan J. Flavell

Absolutely. Reading decent code is one of the best ways to learn.

Well, yes, but there's a massive difference between the elaborate code
that might be found in a well-tested and peer-reviewed module,
intended to deal well with all possible situations that it's going to
encounter in the Real World(tm), on the one hand; and a
straightforward little script to use that module, checking that all is
well but otherwise simply baling out when it recognises that it's not.

Or in clear text: CGI.pm internally appears to be contorted code, but
there are generally good reasons for what it does and how it does it;
however, it's probably not the kind of code that the average *user* of
CGI.pm should be seeking to emulate.
You do have to be sure your source is reliable, though: there is one
hell of a lot of very bad Perl floating around the web.

That too, for sure. But that's a different axis of evaluation.
I don't think this is guaranteed,

I would ask anyone interested in the following to read all of it,
carefully, or not at all. Half-measures are inadvisable.

Point 1. Read
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1 , item 2.

Read also
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2

in the paragraph beginning 'A "multipart/form-data" message contains a
series of parts'.

Thus, both of the mandatory submission formats specify that the items
are required to be submitted in the same order that they appeared in
the form.

Point 2. Client agents don't necessarily conform to the spec
(although most of them do nowadays).

Point 3. In Perl, if you get your submitted name/value pairs from the
module as a "hash", then of course the ordering has been lost by then.

However, in every other respect, the hash is very much the "natural"
way to represent these things in Perl.

Point 4. The whole point of defining the values by name/value pairs
is surely to make them accessible by name rather than by position?
If the designers of HTML forms had wanted to implement positional
parameters, they could have done so (in fact they already did - check
the <ISINDEX> element, now deprecated, from earlier versions of HTML).

My conclusion: although the HTML4 spec requires the name/value pairs
to be transmitted in the same order they appear in the form, it seems to
me that it's utterly pointless to want to rely on all client software
actually doing that. I've often met writers of scripts who seemed
completely obsessed with needing to process the items in the same
order in which they were present in the form, but on closer study I've
never found any justification for doing so, and as soon as the writer
agreed to drop their insistence that they "needed" this, they found
their scripts were easier to write, with no loss of functionality.

While I'm sure that someone could devise a requirement that depended
on the ordering, I can't see any advantage in doing so.

IMHO and YMMVWV.

You may very well want to re-write the form e.g with existing inputs
filled-in and waiting for further input from the user - but the right
way to do that is probably to use the same code to write the original
empty form as re-writes the partially completed form, and that code
will certainly know what is the proper ordering of the items on the
HTML form itself. But when the boss says the items have to come in a
different order on the web page, there will be no need for a major
rewrite of the code to take that into account, if you've written code
that isn't sensitive to the ordering in the first place.
by which I mean that it may happen
to work for you with your browser during this phase of the moon, but
under other circumstances it may well not.

Something like that; but by gaining the benefits of the hash
representation, one also discards any supposed benefits there might
have been in the original ordering, so - as I say - it seems to me to
be the wrong approach anyway.
If you need to keep separate track of the different parameters, give
them different names. Change whatever generates them to put a number
on the end, or something.

If you want to iterate through the name/value pairs that are present,
then just iterate through the keys of the hash. Write the code so
that the ordering doesn't matter. The resulting code is likely to be
simpler than trying to re-create the problem of positional parameters
all over again - would be my advice.
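
A minimal sketch of that advice, assuming the pairs have already been parsed into a hash. The hash %form, its contents, and the helper report_lines() are invented here for illustration; they are not from any module.

```perl
use strict;
use warnings;

# Hypothetical parsed form data; in a real script this would come from
# CGI.pm or whatever parser is in use.
my %form = (
    surname => 'Smith',
    email   => 'smith@example.org',
    town    => 'Glasgow',
);

# Sorting the keys gives a repeatable order that depends neither on what
# the browser sent nor on Perl's internal hash ordering.
sub report_lines {
    my ($data) = @_;
    return map { "$_ = $data->{$_}" } sort keys %$data;
}

print "$_\n" for report_lines(\%form);
```

Nothing in this code cares where a field sat on the HTML form, so reordering the form requires no change to the script.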
 
Eric J. Roode

thanks for all the help and opinions
i'm just self learning perl and found some code at
http://www.cgi101.com/class/

I just spent some time perusing this site. It's not a bad site overall,
as far as an introduction to CGI programming goes. The way they
introduce processing of input variables is fine -- but I wish they had
moved immediately on to using CGI.pm, instead of saving it until chapter
17. That code is okay for learning, but is awful for any real work.
actually i dont even know what cgi.pm and cgi lite are but will surely
find out

Yes, you should. CGI.pm is a module that comes with the Perl
distribution. It automates much of the dirty work behind processing CGI
forms, plus it has some security checks to protect you from DOS attacks.
i don't mean to try and just steal code, but have found that seeing,
using and understanding examples
really accelerates my learning curve

Absolutely. Borrowing and adapting others' code is a great way to learn.
Just be aware of the limitations of the code you're using! :)
what i've since found is that the variable containing the form input
is in fact in the same order as the form

Most (all?) browsers do submit the variables in the same order that they
appear on the form, but this is NOT guaranteed. Besides, why do you need
them to be in any particular order? They all have names.
this code keeps the original order
foreach (split(/[&;]/, $buffer)) {
    s/\+/ /g ;
    ($name, $value)= split('=', $_, 2) ;
    $name=~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge ;
    $value=~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge ;
    print "$name = $value";
    $buffer{$name}.= "\0" if defined($buffer{$name}) ; # concatenate multiple vars
    $buffer{$name}.= $value ;
}

Yes, this is much better. However, be aware that CGI.pm does all of this
for you. Less typing, and it's already debugged for you.
i have already set up sql query based inserts that expect the data
fields in order and since there are 67
on the form i want to be able to reuse that code

Well, all of your form variables are named, right? So process them in
name order.
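
One way to read that suggestion, as a rough sketch only: keep a single list of column names in the script and let it dictate the order of the SQL statement, so that the order of the incoming form data stops mattering. The table name, the column names and the helper build_insert() are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical column list: this one array fixes the order for the SQL
# statement, independent of the order in which the browser sent the fields.
my @columns = qw(surname forename email);

# Build an INSERT with placeholders plus the matching list of bind values.
sub build_insert {
    my ($table, $cols, $form) = @_;
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join(', ', @$cols),
        join(', ', ('?') x @$cols);
    my @bind = map { $form->{$_} } @$cols;   # values looked up by name
    return ($sql, @bind);
}

# The hash is filled in an arbitrary order on purpose.
my %form = (email => 'a@example.org', surname => 'Smith', forename => 'Ann');
my ($sql, @bind) = build_insert('people', \@columns, \%form);
print "$sql\n";
print join('|', @bind), "\n";
```

The statement and bind list could then be handed to DBI's prepare() and execute(). With 67 fields, the only thing to maintain is the one array of names.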
again thanks for the help, this is a great forum and would hope to
return the help someday

You're welcome.

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print
 
Eric J. Roode

Still don't understand what it is that makes the above code "buggy".

[OP's posted code]:
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$FORM{$name} = $value;
}

1. The read() may fail. No check is made to see if it does.

2. This code does not handle GET requests.

3. CGI parameters may be separated by semicolons instead of ampersands.

4. If a faulty browser fails to encode "=" with a % escape, and that "="
is part of a form variable value, this code will drop that portion of the
value. I've seen browsers do this. split() should use the limit
parameter.

5. No limit is placed on the quantity of data read, opening the script to
possible DOS attack.
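
For anyone rolling their own as a learning exercise, the five points above might be addressed roughly as follows. This is a sketch under stated assumptions, not a replacement for CGI.pm: the function names and the 64 KB cap are invented for illustration, and multipart/form-data (file uploads) is not handled at all.

```perl
use strict;
use warnings;

my $MAX_BYTES = 64 * 1024;    # arbitrary cap, chosen for illustration

# Fetch the raw query string for GET, or the request body for POST.
sub read_raw_input {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : ''
        if $method ne 'POST';                                 # point 2

    my $length = $ENV{CONTENT_LENGTH} || 0;
    die "Request body too large\n" if $length > $MAX_BYTES;   # point 5

    my ($buffer, $got) = ('', 0);
    while ($got < $length) {
        my $n = read(STDIN, $buffer, $length - $got, $got);
        die "read failed: $!\n" unless defined $n;            # point 1
        last if $n == 0;                                      # early EOF
        $got += $n;
    }
    die "Short read\n" if $got < $length;
    return $buffer;
}

# Split a raw string into decoded [name, value] pairs, in received order.
sub parse_pairs {
    my ($raw) = @_;
    my @pairs;
    for my $pair (split /[&;]/, $raw) {                       # point 3
        my ($name, $value) = split /=/, $pair, 2;             # point 4
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # undo %XX escapes
        }
        push @pairs, [$name, $value];
    }
    return @pairs;
}

# Usage with a literal string rather than a live request.
printf "%s = %s\n", @$_ for parse_pairs('a=1;b=x%3Dy&c=p+q');
```

Even this much is several times the length of the snippet under discussion, which is rather the point being argued.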
I'm not questioning the advantages with code reuse in general. I'm
just (once again) reacting to the aggressive way, sometimes not to the
point, in which some people here argue for using CGI.pm.

Surely you can't be questioning the value of CGI.pm over the above code?
I have more respect for you than that, Gunnar! :)

--
Eric
 
Gunnar Hjalmarsson

Eric said:
Gunnar said:
Still don't understand what it is that makes the above code
"buggy".

[OP's posted code]:
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$FORM{$name} = $value;
}

Note that my initial comment only referred to the two first of those
lines.
1. The read() may fail. No check is made to see if it does.

2. This code does not handle GET requests.

3. CGI parameters may be separated by semicolons instead of
ampersands.

4. If a faulty browser fails to encode "=" with a % escape, and
that "=" is part of a form variable value, this code will drop that
portion of the value. I've seen browsers do this. split() should
use the limit parameter.

5. No limit is placed on the quantity of data read, opening the
script to possible DOS attack.

Thanks for that list of CGI.pm features.

To me, a piece of code that does what it's _intended_ to do is not
"buggy". It may have _limitations_, but limitations and bugs are not
the same thing.

If I want my program to print today's date in ISO 8601 format, I may
use this code:

my $time = time;
sub myDate {
my @t = (gmtime $time)[3..5];
sprintf '%d-%02d-%02d', $t[2] += 1900, ++$t[1], $t[0];
}
print myDate();

I could have used your Time::Format module instead, but if I don't
need a variety of date and time formats in my program, I wouldn't
likely have done so.

Time::Format includes some nice tools for time formatting, no doubt.
Nevertheless, that fact wouldn't make you claim that my myDate()
function is "buggy", right?
 
A. Sinan Unur

To me, a piece of code that does what it's _intended_ to do is not
"buggy". It may have _limitations_, but limitations and bugs are not
the same thing.

On the other hand, there is usually a difference between what the author
of the code intends it to do and what the user of the code thinks it
does. In the case of OP's code, it had not been written by him (as I had
surmised) and we cannot expect the OP to have had full understanding of
the 'limitations' of the code. Hence my suggestion to either roll his own
paying attention to details (if this is for a learning exercise) or use
CGI.pm if he just wants to parse a form and feel safe.
If I want my program to print today's date in ISO 8601 format, I may
use this code:

my $time = time;
sub myDate {
my @t = (gmtime $time)[3..5];
sprintf '%d-%02d-%02d', $t[2] += 1900, ++$t[1], $t[0];
}
print myDate(); ....
Time::Format includes some nice tools for time formatting, no doubt.
Nevertheless, that fact wouldn't make you claim that my myDate()
function is "buggy", right?

Is it possible to bring a web server down using your myDate function?

Sinan.
 
Gunnar Hjalmarsson

A. Sinan Unur said:
On the other hand, there is usually a difference between what the
author of the code intends it to do and what the user of the code
thinks it does. In the case of OP's code, it had not been written
by him (as I had surmised) and we cannot expect the OP to have had
full understanding of the 'limitations' of the code.

If you don't know what you are doing, don't do it. I can agree on
that, not least when it comes to CGI.
Hence my suggestion to either roll his own paying attention to
details (if this is for a learning exercise) or use CGI.pm if he
just wants to parse a form and feel safe.

"Safe"??? That's another annoying thing with the arguments used by the
'CGI.pm fan club'. Very often you give the impression that by using
CGI.pm, you don't need to bother about anything, since other very
experienced programmers have already taken care of it for you.

You know very well that there are security implications with CGI
scripts, whether you use CGI.pm or not. So why on earth do you talk
about feeling "safe"?
Is it possible to bring a web server down using your myDate
function?

Probably not. But it can be done with a CGI script, even if CGI.pm is
used to parse form data.
 
A. Sinan Unur

"Safe"??? That's another annoying thing with the arguments used by the
'CGI.pm fan club'. Very often you give the impression that by using
CGI.pm, you don't need to bother about anything, since other very
experienced programmers have already taken care of it for you.

Well, maybe I should have fully spelt it out. I meant "feel safe that the
nuts and bolts of parsing the form is properly taken care of". I did not
mean to imply that just by sticking a use CGI; you never have to worry
about the security implications of running a program using untrusted
data. But then, that is not a Perl issue.
Probably not. But it can be done with a CGI script, even if CGI.pm is
used to parse form data.

It can be done in a CGI script regardless of the programming language and
libraries used. But the culprit should not be that you blindly copied
code that has been in circulation at least since 1996 instead of using a
peer-reviewed module.

Sinan.
 
Gunnar Hjalmarsson

A. Sinan Unur said:
[Bringing a web server down] can be done in a CGI script regardless
of the programming language and libraries used. But the culprit
should not be that you blindly copied code that has been in
circulation at least since 1996 instead of using a peer-reviewed
module.

Maybe we can finally reach an agreement about this? :)

IMO, the keyword above is "blindly". You should of course never copy
and use *any* code fragment if you don't know how it works. Doing so
cannot be an acceptable alternative to using an established module.

Isn't the real problem that many beginners copy pieces of code that
they don't *understand*, and use them in production code? If so,
wouldn't it be better to say just that, rather than claiming that
every occurrence of code that parses form data is bad or buggy by
definition?
 
Alan J. Flavell

To me, a piece of code that does what it's _intended_ to do is not
"buggy". It may have _limitations_, but limitations and bugs are not
the same thing.

I don't think there's any real disagreement over that, unless the
limitation under discussion was in the department of "inability of the
code to protect itself against dangerous input from the client", in
which case I'd rate it as not only a limitation but also a bug.
If I want my program to print today's date in ISO 8601 format, I may
use this code:

However, it's a fact of programming life that the initial design and
implementation often represents only a tiny fraction of the software's
total lifetime support implications. So a program that can only
produce a single date format might very well later be called upon to
produce a different format, or to correctly report the time in someone
else's timezone, or whatever. So an initial design which is capable
of being easily extended to do these things may offer some real
advantages over one that will need additional one-off code development
to achieve the same result, in terms of later maintenance commitments.
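
To make the extensibility point concrete, here is a minimal sketch that takes the format as a parameter instead of hard-coding it. It uses strftime from the core POSIX module; the function name format_date() is made up for this example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time in UTC; the format defaults to an ISO 8601 date.
sub format_date {
    my ($epoch, $format) = @_;
    $format = '%Y-%m-%d' unless defined $format;
    return strftime($format, gmtime $epoch);
}

print format_date(time), "\n";                          # date only
print format_date(time, '%Y-%m-%dT%H:%M:%SZ'), "\n";    # date and time
```

A later request for a different layout then becomes a change of argument rather than new one-off code. Reporting the time in another timezone would still need more than this.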

Case in point: a few days after the end of European daylight savings
time this year, I had occasion to deal with a USAn videoconference
booking system. It thought that the clock time in the UK was BST (it
was not) and numerically the same as in Geneva(CH) (it was not) and
an hour away from the time in Hamburg(DE) (it got that much right).

When I reported the discrepancy, I was told "the software can be
tweaked". I'm sure it can, but why would it need to? Computer
systems in the various locations _know_ the correct time and timezone
for any supported locale - their sysadmins do not need to "tweak"
them. Evidently the company that implemented the videoconferencing
server had re-invented a square wheel, no?
 
A. Sinan Unur

A. Sinan Unur said:
[Bringing a web server down] can be done in a CGI script regardless
of the programming language and libraries used. But the culprit
should not be that you blindly copied code that has been in
circulation at least since 1996 instead of using a peer-reviewed
module.

Maybe we can finally reach an agreement about this? :)

IMO, the keyword above is "blindly". You should of course never copy
and use *any* code fragment if you don't know how it works. Doing so
cannot be an acceptable alternative to using an established module.
Agreed.

Isn't the real problem that many beginners copy pieces of code that
they don't *understand*, and use them in production code? If so,
wouldn't it be better to say just that, rather than claiming that
every occurrence of code that parses form data is bad or buggy by
definition?

Well, I have not claimed every occurrence of such code is buggy by
definition. I reacted to the read and query string parsing bugs (later
retracted my objection to the latter). In this specific instance, I was
reacting to code that I have seen posted numerous times with no
indication that the poster was aware of potential pitfalls.

Sinan.
 
Gunnar Hjalmarsson

Purl said:
Use of modules is blindly copying code without understanding. This
Perl 5 Cargo Cult practice led to this term, "Copy And Paste
Babies."

Rather ironic, yes?

Don't see the irony. Copying a piece of code out of the context in
which it was intended to work is very different from using a CPAN
module and calling its methods in accordance with the documentation.
Unlike the former piece of code, the intended purpose of the module is
that it can be incorporated in a program even if the user doesn't
understand all its internals.
 
Eric J. Roode

Eric said:
Gunnar said:
Still don't understand what it is that makes the above code
"buggy".

[OP's posted code]:
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
....

Note that my initial comment only referred to the two first of those
lines.

Well, even just those two lines are the subject of four of my five
arguments against the whole code block. :)
To me, a piece of code that does what it's _intended_ to do is not
"buggy". It may have _limitations_, but limitations and bugs are not
the same thing.
Agreed.

If I want my program to print today's date in ISO 8601 format, I may
use this code:

my $time = time;
sub myDate {
my @t = (gmtime $time)[3..5];
sprintf '%d-%02d-%02d', $t[2] += 1900, ++$t[1], $t[0];
}
print myDate();

I could have used your Time::Format module instead, but if I don't
need a variety of date and time formats in my program, I wouldn't
likely have done so.

Time::Format includes some nice tools for time formatting, no doubt.
Nevertheless, that fact wouldn't make you claim that my myDate()
function is "buggy", right?

Your example is a bit simplistic. It is indeed simple to roll one's own
date-formatting code. Your code above has no obvious bugs that jump out
and catch my attention. It is limited in that its format is hard-coded,
but so what? That may be sufficient for your needs, and as you point out,
a limitation is not a bug.

However, the OP (and hundreds of others like him) were apparently under
the impression that their code would be sufficient to "parse CGI input
parameters". In many cases it would, but in many cases not. And it is
not so simple to write robust CGI input handling code. It's not rocket
science -- but it's a silly wheel to reinvent.

<imho>
It's foolish to write twenty or thirty lines of robust CGI-parsing code
and include it in every CGI you write. It's more foolish to write five
or ten lines of crappy CGI-parsing code and include it in every CGI
program you write. It's much less foolish to write your own robust CGI-
parsing code, wrap it up in a nice module, and use that module from your
own CGI programs.

It's even less foolish to just use the already-written, combat-tested
CGI.pm module. It's a no-brainer.
</imho>

--
Eric
 
Darin McBride

Purl said:
Then you are "blindly" copying code.

This irony is clear to a mind's eye with clear vision.

Interesting definition.
Is not your statement the same as before, a reference to
copying code and not understanding how it works?

"You should of course never copy and use *any* code fragment if
you don't know how it works."

Rather ironic, yes?

No. Copying code and tweaking it is quite different from using code
that was intended to be used, in the way it was intended to be used.
Unless you have read and fully understand all six-thousand plus
lines of Stein's module, his quarter-megabyte module, unless you
completely and fully understand every bit of code in his module,
with usage, then you are copying and pasting code you do not
understand which is one premise of Perl 5 Cargo Cultists' critiques.

Rather ironic, yes?

I assume you don't use perl at all, then, right? Do you understand all
the megabytes of C code from which the standard perl functions are
built?
Using modules without understanding is to do precisely what
Perl 5 Cargo Cultists rant about, this use of "cargo cult"
which is precisely what modules are, "cargo cult."

First time I've ever seen the term "Cargo Cultists". Care to define
the term?
 
Gunnar Hjalmarsson

Alan said:
I don't think there's any real disagreement over that, unless the
limitation under discussion was in the department of "inability of
the code to protect itself against dangerous input from the
client", in which case I'd rate it as not only a limitation but
also a bug.


However, it's a fact of programming life that the initial design
and implementation often represents only a tiny fraction of the
software's total lifetime support implications. So a program that
can only produce a single date format might very well later be
called upon to produce a different format, or to correctly report
the time in someone else's timezone, or whatever. So an initial
design which is capable of being easily extended to do these things
may offer some real advantages over one that will need additional
one-off code development to achieve the same result, in terms of
later maintenance commitments.

Absolutely. Those are things to consider when deciding whether to use a
module, but they have nothing to do with the question of whether the
alternative contains bugs or not.
 
Gunnar Hjalmarsson

Purl said:
A person complained about denial of service attack using so called
"cargo cult" code, yet Stein's module contains this major security
blunder, quite automatically.

CGI.pm does not by default limit the amount of data that can be read
from STDIN, which is something that I believe some people aren't aware
of. Is that what you are referring to?
 
Gunnar Hjalmarsson

Purl said:
Yes, precisely.

Brenner's method, which is labeled as "cargo cult" by participants
here, does this automatically.

Over the years, I have read literally thousands of examples of
scripts using the CGI.pm module, very few, perhaps a dozen, make
use of MAX POST.

Although I have not read thousands of examples using CGI.pm in this
group, those I have read, only a small handful use MAX POST syntax.

I have the same impression.
 
Eric J. Roode

Looking through this inane thread, I will add usage of Stein's
module automatically sets you up for a denial of service attack.

A person complained about denial of service attack using so called
"cargo cult" code, yet Stein's module contains this major security
blunder, quite automatically.

Stein's module also contains an easy way to avoid the security hole, and
the documentation contains a discussion of the security issues. Not so
for the code that I originally complained about.

Is this "security hole" your only complaint with CGI.pm?

--
Eric
 
James Willmore

THIS is top posting. Please don't do this.

what is 'top-post' ???
actually don't understand how the eric roode and first sinan unur
posts were not subordinated to the post immediately above them,
i simply use reply-to-group and it always subordinates to the post
i'm responding to

Please read the posting guidelines for this group
http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

<snip - because there is NO reason to repost EVERYTHING from the
thread>

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
Cinemuck, n.: The combination of popcorn, soda, and melted
chocolate which covers the floors of movie theaters. -- Rich
Hall, "Sniglets"
 
