arrange form data in same order as on form

Discussion in 'Perl Misc' started by bbxrider, Nov 13, 2003.

  1. Well, yes, but there's a massive difference between the elaborate code
    that might be found in a well-tested and peer-reviewed module,
    intended to deal well with all possible situations that it's going to
    encounter in the Real World(tm), on the one hand; and a
    straightforward little script to use that module, checking that all is
    well but otherwise simply baling out when it recognises that it's not.

    Or in plain language: CGI.pm's internal code can look contorted, but
    there are generally good reasons for what it does and how it does it;
    however, it's probably not the kind of code that the average *user* of
    CGI.pm should be seeking to emulate.

    That too, for sure. But that's a different axis of evaluation.

    I would ask anyone interested in the following to read all of it,
    carefully, or not at all. Half-measures are inadvisable.

    Point 1. Read
    http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1 , item 2.

    Read also the paragraph beginning 'A "multipart/form-data" message
    contains a series of parts' at
    http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2 .

    Thus, both mandatory submission formats require the items to be
    submitted in the same order in which they appear in the form.
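    As a minimal sketch (core Perl only; the field names and the sample string are hypothetical), decoding an application/x-www-form-urlencoded string into a list of name/value pairs, rather than straight into a hash, keeps whatever order the client actually sent, which under the HTML 4.01 rule above should be form order:

```perl
use strict;
use warnings;

# Decode one urlencoded token: '+' is a space, %XX is a hex-escaped byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Split the encoded data into [name, value] pairs, kept in arrival order.
sub ordered_pairs {
    my ($encoded) = @_;
    my @pairs;
    for my $pair (split /&/, $encoded) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        push @pairs, [ url_decode($name), url_decode($value // '') ];
    }
    return @pairs;
}

# Hypothetical submission of a three-field form.
my @pairs = ordered_pairs('name=Ann&city=Oslo&comment=hello+world');
print "$_->[0] = $_->[1]\n" for @pairs;
```

    The array only preserves the order the client used; it cannot restore an order the client did not follow.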

    Point 2. Client agents don't necessarily conform to the spec
    (although most of them do nowadays).

    Point 3. In Perl, if you get your submitted name/value pairs from the
    module as a "hash", then of course the ordering has been lost by then.

    However, in every other respect, the hash is very much the "natural"
    way to represent these things in Perl.
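    A short sketch of the trade-off (the pairs are hypothetical and already decoded): the hash gives lookup by name, and if the submission order really is wanted, it has to be recorded separately, for example in an array of names. For what it's worth, CGI.pm's param() called with no arguments is documented to return the parameter names in the order the browser sent them.

```perl
use strict;
use warnings;

# Hypothetical decoded pairs, in the order the client sent them.
my @pairs = ([ name => 'Ann' ], [ city => 'Oslo' ], [ comment => 'hi' ]);

my %form;    # access by name; keys %form come back in no particular order
my @names;   # first-seen order, kept only because the hash cannot keep it
for my $p (@pairs) {
    my ($name, $value) = @$p;
    push @names, $name unless exists $form{$name};
    $form{$name} = $value;   # in this sketch a repeated name keeps its last value
}

print "$_ = $form{$_}\n" for @names;   # submission order
```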

    Point 4. The whole point of defining the values by name/value pairs
    is surely to make them accessible by name rather than by position?
    If the designers of HTML forms had wanted to implement positional
    parameters, they could have done so (in fact they already did - check
    the <ISINDEX> element, now deprecated, from earlier versions of HTML).

    My conclusion: although the HTML4 spec requires the name/value pairs
    to be transmitted in the same order they appear in the form, it seems
    to me utterly pointless to rely on all client software actually doing
    that. I've often met script writers who seemed completely obsessed
    with processing the items in the same order in which they appeared in
    the form, but on closer study I've never found any justification for
    doing so; as soon as the writer agreed to drop the insistence that
    they "needed" this, they found their scripts were easier to write,
    with no loss of functionality.

    While I'm sure that someone could devise a requirement that depended
    on the ordering, I can't see any advantage in doing so.

    IMHO and YMMVWV.

    You may very well want to re-write the form, e.g. with existing inputs
    filled in and waiting for further input from the user; but the right
    way to do that is probably to use the same code both to write the
    original empty form and to re-write the partially completed one, and
    that code will certainly know the proper ordering of the items on the
    HTML form itself. Then, when the boss says the items have to come in a
    different order on the web page, no major rewrite of the code will be
    needed to take that into account, provided you've written code that
    isn't sensitive to the ordering in the first place.
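    A minimal sketch of that approach: one ordered list of fields drives the form writer, while everything else looks values up by name. The field names, labels, action URL and the render_form/html_escape helpers are all hypothetical; reordering the page means reordering @fields and nothing else.

```perl
use strict;
use warnings;

# Hypothetical field list: the only place where the on-page order lives.
my @fields = (
    [ email => 'E-mail address' ],
    [ name  => 'Your name' ],
);

# Escape the characters that matter inside an HTML attribute value.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Write the form, pre-filling any values already submitted (looked up by name).
sub render_form {
    my (%value) = @_;
    my $html = qq{<form method="post" action="/cgi-bin/example.cgi">\n};
    for my $f (@fields) {
        my ($name, $label) = @$f;
        my $v = html_escape($value{$name} // '');
        $html .= qq{<label>$label <input name="$name" value="$v"></label>\n};
    }
    return $html . qq{<input type="submit"></form>\n};
}

# The same code writes the empty form and the partially completed one.
print render_form();
print render_form(name => 'Ann');
```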

    Something like that; but by gaining the benefits of the hash
    representation, one also discards any supposed benefits there might
    have been in the original ordering, so - as I say - it seems to me to
    be the wrong approach anyway.

    If you want to iterate through the name/value pairs that are present,
    then just iterate through the keys of the hash, and write the code so
    that the ordering doesn't matter. The resulting code is likely to be
    simpler than trying to re-create the problem of positional parameters
    all over again. That would be my advice.
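    In that spirit, a small sketch (the field names and the %param hash are hypothetical): required fields are checked by name, and where a stable order is wanted for output, such as a log line or a mailed report, sorting the keys provides one without involving the form layout at all.

```perl
use strict;
use warnings;

# Hypothetical parsed form data (name => value).
my %param = (name => 'Ann', email => 'ann@example.com', comment => '');

# Checks are made by name, so the order of fields on the page is irrelevant.
my @required = qw(name email comment);
my @missing  = grep { !defined $param{$_} || $param{$_} eq '' } @required;

# A deterministic order for output, independent of the form layout.
for my $name (sort keys %param) {
    print "$name: $param{$name}\n";
}
print "Missing: @missing\n" if @missing;
```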
     
    Alan J. Flavell, Nov 14, 2003
    #21

  2. -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    I just spent some time perusing this site. It's not a bad site overall,
    as far as an introduction to CGI programming goes. The way they
    introduce processing of input variables is fine -- but I wish they had
    moved immediately on to using CGI.pm, instead of saving it until chapter
    17. That code is okay for learning, but is awful for any real work.
    Yes, you should. CGI.pm is a module that comes with the Perl
    distribution. It automates much of the dirty work behind processing CGI
    forms, plus it has some security checks to protect you from DOS attacks.
    Absolutely. Borrowing and adapting others' code is a great way to learn.
    Just be aware of the limitations of the code you're using! :)
    Most (all?) browsers do submit the variables in the same order that they
    appear on the form, but this is NOT guaranteed. Besides, why do you need
    them to be in any particular order? They all have names.
    Yes, this is much better. However, be aware that CGI.pm does all of this
    for you. Less typing, and it's already debugged for you.
    Well, all of your form variables are named, right? So process them in
    name order.
    You're welcome.

    - --
    Eric
    $_ = reverse sort $ /. r , qw p ekca lre uJ reh
    ts p , map $ _. $ " , qw e p h tona e and print

    -----BEGIN PGP SIGNATURE-----
    Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

    iQA/AwUBP7TImGPeouIeTNHoEQJT4QCfbqG0ESDylR8pTZDPjeaCDAh4Rf0AmgP+
    1ZIw0EXmWZEP5GzNZNCgZz06
    =fDlp
    -----END PGP SIGNATURE-----
     
    Eric J. Roode, Nov 14, 2003
    #22

  3. [OP's posted code]:
    1. The read() may fail. No check is made to see if it does.

    2. This code does not handle GET requests.

    3. CGI parameters may be separated by semicolons instead of ampersands.

    4. If a faulty browser fails to encode "=" with a % escape, and that "="
    is part of a form variable value, this code will drop that portion of the
    value. I've seen browsers do this. split() should use the limit
    parameter.

    5. No limit is placed on the quantity of data read, opening the script to
    possible DOS attack.
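    For illustration, a sketch of a hand-rolled parser that addresses each of the five points above. It is an assumption-laden sketch, not a drop-in replacement for CGI.pm: it handles only application/x-www-form-urlencoded data (no multipart/form-data file uploads), the 100 KB cap is an arbitrary choice, and parse_form is a hypothetical name. Values are kept as lists, since a name can legitimately occur more than once.

```perl
use strict;
use warnings;

# Hypothetical cap on request size (point 5); tune it for your own forms.
my $MAX_BYTES = 100 * 1024;

sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Returns name => [values]; dies on errors so the caller can send an
# error page instead of processing partial input.
sub parse_form {
    my $method = $ENV{REQUEST_METHOD} // 'GET';
    my $data;
    if ($method eq 'POST') {
        my $len = $ENV{CONTENT_LENGTH} // 0;
        die "Bad Content-Length\n"       unless $len =~ /\A\d+\z/;
        die "Request entity too large\n" if $len > $MAX_BYTES;    # point 5
        binmode STDIN;
        my $got = 0;
        $data = '';
        while ($got < $len) {                                      # point 1
            my $n = read(STDIN, $data, $len - $got, $got);
            die "read failed: $!\n" unless defined $n;
            last if $n == 0;    # EOF before Content-Length bytes arrived
            $got += $n;
        }
        die "Short read\n" if $got < $len;
    }
    else {
        $data = $ENV{QUERY_STRING} // '';                          # point 2
        die "Request entity too large\n" if length $data > $MAX_BYTES;
    }

    my %form;
    for my $pair (split /[&;]/, $data) {                           # point 3
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;                  # point 4
        push @{ $form{ url_decode($name) } }, url_decode($value // '');
    }
    return %form;
}

# Example: simulate a GET request with ';' separators and an unescaped '='.
local $ENV{REQUEST_METHOD} = 'GET';
local $ENV{QUERY_STRING}   = 'a=1;b=x%3Dy=z&a=2';
my %f = parse_form();
print "a: @{ $f{a} }\nb: @{ $f{b} }\n";
```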
    Surely you can't be questioning the value of CGI.pm over the above code?
    I have more respect for you than that, Gunnar! :)

     
    Eric J. Roode, Nov 14, 2003
    #23
  4. Note that my initial comment referred only to the first two of those
    lines.

    Thanks for that list of CGI.pm features.

    To me, a piece of code that does what it's _intended_ to do is not
    "buggy". It may have _limitations_, but limitations and bugs are not
    the same thing.

    If I want my program to print today's date in ISO 8601 format, I may
    use this code:

    my $time = time;
    sub myDate {
        # Day, month (0-11) and years since 1900, taken from UTC (gmtime)
        my @t = (gmtime $time)[3..5];
        sprintf '%d-%02d-%02d', $t[2] + 1900, $t[1] + 1, $t[0];
    }
    print myDate();

    I could have used your Time::Format module instead, but if I didn't
    need a variety of date and time formats in my program, I probably
    wouldn't have done so.

    Time::Format includes some nice tools for time formatting, no doubt.
    Nevertheless, that fact wouldn't make you claim that my myDate()
    function is "buggy", right?
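    For comparison, a sketch using the core POSIX module's strftime() instead of either hand-rolled code or Time::Format. Like myDate() it formats the date in UTC, and changing the format is a one-string edit.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Same ISO 8601 date as myDate(), in UTC (gmtime).
my $iso = strftime('%Y-%m-%d', gmtime);
print "$iso\n";

# A different format needs only a different format string, e.g. date and time.
print strftime('%Y-%m-%dT%H:%M:%SZ', gmtime), "\n";
```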
     
    Gunnar Hjalmarsson, Nov 14, 2003
    #24
  5. On the other hand, there is usually a difference between what the author
    of the code intends it to do and what the user of the code thinks it
    does. In the case of the OP's code, it had not been written by him (as I
    had surmised), and we cannot expect the OP to have fully understood the
    'limitations' of the code. Hence my suggestion either to roll his own,
    paying attention to the details (if this is a learning exercise), or to
    use CGI.pm if he just wants to parse a form and feel safe.

    Is it possible to bring a web server down using your myDate function?

    Sinan.
     
    A. Sinan Unur, Nov 14, 2003
    #25
  6. If you don't know what you are doing, don't do it. I can agree with
    that, not least when it comes to CGI.

    "Safe"??? That's another annoying thing about the arguments used by the
    'CGI.pm fan club'. Very often you give the impression that by using
    CGI.pm, you don't need to bother about anything, since other very
    experienced programmers have already taken care of it for you.

    You know very well that there are security implications with CGI
    scripts, whether you use CGI.pm or not. So why on earth do you talk
    about feeling "safe"?

    Probably not. But it can be done with a CGI script, even if CGI.pm is
    used to parse form data.
     
    Gunnar Hjalmarsson, Nov 14, 2003
    #26
  7. Well, maybe I should have spelt it out fully. I meant "feel safe that the
    nuts and bolts of parsing the form are properly taken care of". I did not
    mean to imply that just by adding a 'use CGI;' you never have to worry
    about the security implications of running a program on untrusted
    data. But then, that is not a Perl issue.

    It can be done in a CGI script regardless of the programming language and
    libraries used. But the culprit should not be that you blindly copied
    code that has been in circulation since at least 1996 instead of using a
    peer-reviewed module.

    Sinan.
     
    A. Sinan Unur, Nov 14, 2003
    #27
  8. Maybe we can finally reach an agreement about this? :)

    IMO, the keyword above is "blindly". You should of course never copy
    and use *any* code fragment if you don't know how it works. Doing so
    cannot be an acceptable alternative to using an established module.

    Isn't the real problem that many beginners copy pieces of code that
    they don't *understand*, and use them in production code? If so,
    wouldn't it be better to say just that, rather than claiming that
    every occurrence of code that parses form data is bad or buggy by
    definition?
     
    Gunnar Hjalmarsson, Nov 14, 2003
    #28
  9. I don't think there's any real disagreement over that, unless the
    limitation under discussion is the "inability of the code to protect
    itself against dangerous input from the client", in which case I'd
    rate it as not only a limitation but also a bug.

    However, it's a fact of programming life that the initial design and
    implementation often represent only a tiny fraction of the software's
    total lifetime support implications. So a program that can only
    produce a single date format might very well later be called upon to
    produce a different format, or to correctly report the time in someone
    else's timezone, or whatever. An initial design that can easily be
    extended to do these things may therefore offer real advantages, in
    terms of later maintenance commitments, over one that will need
    additional one-off code development to achieve the same result.

    Case in point: a few days after the end of European daylight saving
    time this year, I had occasion to deal with a USAn videoconference
    booking system. It thought that the clock time in the UK was BST (it
    was not) and numerically the same as in Geneva (CH) (it was not) and
    an hour away from the time in Hamburg (DE) (it got that much right).

    When I reported the discrepancy, I was told "the software can be
    tweaked". I'm sure it can, but why would it need to? Computer
    systems in the various locations _know_ the correct time and timezone
    for any supported locale - their sysadmins do not need to "tweak"
    them. Evidently the company that implemented the videoconferencing
    server had re-invented a square wheel, no?
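    To illustrate that the system already knows: a sketch using the core POSIX module, assuming a Unix-like system with the tz database installed. The instant (12:00 UTC on 13 November 2003) and the zone names are chosen for illustration, with Europe/Zurich standing in for Geneva and Europe/Berlin for Hamburg. Nothing per site is "tweaked"; only the TZ setting changes.

```perl
use strict;
use warnings;
use POSIX qw(strftime tzset);

# A fixed instant: 2003-11-13 12:00:00 UTC, shortly after European DST ended.
my $when = 1068724800;

# Assumes tz database zone names are available (e.g. /usr/share/zoneinfo).
my @zones = qw(Europe/London Europe/Zurich Europe/Berlin America/New_York);
my %shown;
for my $zone (@zones) {
    local $ENV{TZ} = $zone;
    tzset();    # make the C library re-read TZ
    $shown{$zone} = strftime('%Y-%m-%d %H:%M %Z', localtime $when);
    print "$zone: $shown{$zone}\n";
}
tzset();        # re-read TZ now that local() has restored it
```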
     
    Alan J. Flavell, Nov 14, 2003
    #29
  10. A. Sinan Unur, Nov 14, 2003
    #30
  11. Well, I have not claimed that every occurrence of such code is buggy by
    definition. I reacted to the read and query-string parsing bugs (and
    later retracted my objection to the latter). In this specific instance,
    I was reacting to code that I have seen posted numerous times with no
    indication that the poster was aware of potential pitfalls.

    Sinan.
     
    A. Sinan Unur, Nov 14, 2003
    #31
  12. I don't see the irony. Copying a piece of code out of the context in
    which it was intended to work is very different from using a CPAN
    module and calling its methods in accordance with the documentation.
    Unlike the former piece of code, a module is intended to be
    incorporated in a program even if the user doesn't understand all its
    internals.
     
    Gunnar Hjalmarsson, Nov 14, 2003
    #32
  13. Well, even just those two lines are the subject of four of my five
    arguments against the whole code block. :)

    Your example is a bit simplistic. It is indeed simple to roll one's own
    date-formatting code. Your code above has no obvious bugs that jump out
    and catch my attention. It is limited in that its format is hard-coded,
    but so what? That may be sufficient for your needs, and as you point out,
    a limitation is not a bug.

    However, the OP (and hundreds of others like him) were apparently under
    the impression that their code would be sufficient to "parse CGI input
    parameters". In many cases it would, but in many cases not. And it is
    not so simple to write robust CGI input handling code. It's not rocket
    science -- but it's a silly wheel to reinvent.

    <imho>
    It's foolish to write twenty or thirty lines of robust CGI-parsing code
    and include it in every CGI you write. It's more foolish to write five
    or ten lines of crappy CGI-parsing code and include it in every CGI
    program you write. It's much less foolish to write your own robust CGI-
    parsing code, wrap it up in a nice module, and use that module from your
    own CGI programs.

    It's even less foolish to just use the already-written, combat-tested
    CGI.pm module. It's a no-brainer.
    </imho>

     
    Eric J. Roode, Nov 14, 2003
    #33
  14. Interesting definition.

    No. Copying code and tweaking it is quite different from using code
    that was intended to be used, in the way it was intended to be used.

    I assume you don't use perl at all, then, right? Do you understand all
    the megabytes of C code that make up the standard perl functions?

    This is the first time I've ever seen the term "Cargo Cultists". Care to
    define it?
     
    Darin McBride, Nov 14, 2003
    #34
  15. No, you haven't. My apologies for that.
     
    Gunnar Hjalmarsson, Nov 14, 2003
    #35
  16. Absolutely. Those are things to consider when deciding whether to use a
    module, but they have nothing to do with the question of whether the
    alternative contains bugs.
     
    Gunnar Hjalmarsson, Nov 14, 2003
    #36
  17. CGI.pm does not by default limit the amount of data that can be read
    from STDIN, which is something that I believe some people aren't aware
    of. Is that what you are referring to?
     
    Gunnar Hjalmarsson, Nov 14, 2003
    #37
  18. I have the same impression.
     
    Gunnar Hjalmarsson, Nov 14, 2003
    #38
  19. Stein's module also contains an easy way to avoid the security hole, and
    the documentation contains a discussion of the security issues. Not so
    for the code that I originally complained about.

    Is this "security hole" your only complaint with CGI.pm?

     
    Eric J. Roode, Nov 14, 2003
    #39
  20. THIS is top posting. Please don't do this.

    Please read the posting guidelines for this group
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

    <snip - because there is NO reason to repost EVERYTHING from the
    thread>

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    Cinemuck, n.: The combination of popcorn, soda, and melted
    chocolate which covers the floors of movie theaters. -- Rich
    Hall, "Sniglets"
     
    James Willmore, Nov 14, 2003
    #40