CGI.pm and special characters in hidden inputs

Discussion in 'Perl Misc' started by tsunami@zedxinc.com, Dec 29, 2004.

  1. Guest

    Hello,

    I use CGI.pm to parse forms, and I am running into issues with certain
    special characters.

    Say I have a form element, with a value of "Mom's House". It is a
    hidden input, passed in from a previous page, so the HTML is something
    like this:

    <INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">

    I was given to understand that, for ' " > < and &, you need to use the
    encoded value to denote the character when it appears in a tag. I know
    this is the case for normal XML files, and the parsers take care of it.
    However, CGI.pm's param() function does NOT seem to be interpreting
    the special characters. In the CGI script that processes this form, I
    would have:

    $location = param('location');

    and $location would be: "Mom&apos;s House" While I could, in this
    instance, simply NOT encode the apostrophe and it would probably work,
    if it were a double quote, I know it would break it. Any ideas?
    Thanks!

    --
    Dave
     
    , Dec 29, 2004
    #1

  2. On Wed, 29 Dec 2004 wrote:

    > I use CGI.pm to parse forms, and I am running into issues


    However, you don't appear to have a Perl problem...

    > with certain special characters.


    I'm afraid you've triggered a raw nerve there. Considering the many
    thousands of Unicode characters which have been defined, what do you
    suppose is so "special" about a us-ascii apostrophe?

    > Say I have a form element, with a value of "Mom's House". It is a
    > hidden input, passed in from a previous page, so the HTML is
    > something like this:
    >
    > <INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">


    Could be...

    > I was given to understand that, for ' " > < and &, you need to use
    > the encoded value to denote the character when it appears in a tag.


    Not exactly - for details consult a group with comp.infosystems.www...
    in its name. But that's irrelevant, because the client agent has to
    parse that. So it makes no difference which of the ways you choose to
    represent your characters in the HTML source (the coded character
    itself, its numerical character reference, or its character entity).
    At submission time they're all the same.

    > However, CGI.pm's param() function does NOT seem to be interpreting
    > the special characters.


    What do you mean by "interpreting"?

    > In the CGI script that processes this form, I would have:
    >
    > $location = param('location');
    >
    > and $location would be: "Mom&apos;s House"


    It would??? Let's have a URL which demonstrates this behaviour!

    But you're off-topic here. You'd be better on a WWW authoring group
    (namely, comp.infosystems.www.authoring.cgi, but beware its
    automoderation bot).
     
    Alan J. Flavell, Dec 29, 2004
    #2

  3. Guest

    wrote:
    > Hello,
    >
    > I use CGI.pm to parse forms, and I am running into issues with certain
    > special characters.
    >
    > Say I have a form element, with a value of "Mom's House". It is a
    > hidden input, passed in from a previous page, so the HTML is something
    > like this:
    >
    > <INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">


    print hidden(-name=>'location', -value=>"Mom's House");

    Should work fine if you use CGI.pm like this.

    >
    > I was given to understand that, for ' " > < and &, you need to use the
    > encoded value to denote the character when it appears in a tag. I know
    > this is the case for normal XML files, and the parsers take care of it.
    > However, CGI.pm's param() function does NOT seem to be interpreting
    > the special characters. In the CGI script that processes this form, I
    > would have:
    >
    > $location = param('location');
    >
    > and $location would be: "Mom&apos;s House" While I could, in this
    > instance, simply NOT encode the apostrophe and it would probably work,
    > if it were a double quote, I know it would break it. Any ideas?
    > Thanks!
    >


    From CGI.pm home page: http://stein.cshl.org/WWW/software/CGI/


    <quote>
    AUTOESCAPING HTML
    By default, all HTML that are emitted by the form-generating functions
    are passed through a function called escapeHTML():
    $escaped_string = escapeHTML("unescaped string");



    Provided that you have specified a character set of ISO-8859-1 (the
    default), the standard HTML escaping rules will be used. The "<"
    character becomes "&lt;", ">" becomes "&gt;", "&" becomes "&amp;", and
    the quote character becomes "&quot;". In addition, the hexadecimal 0x8b
    and 0x9b characters, which many windows-based browsers interpret as the
    left and right angle-bracket characters, are replaced by their numeric
    HTML entities ("&#139" and "&#155;"). If you manually change the
    charset, either by calling the charset() method explicitly or by
    passing a -charset argument to header(), then all characters will be
    replaced by their numeric entities, since CGI.pm has no lookup table
    for all the possible encodings.

    Autoescaping does not apply to other HTML-generating functions, such as
    h1(). You should call escapeHTML() yourself on any data that is passed
    in from the outside, such as nasty text that people may enter into
    guestbooks.

    To change the character set, use charset(). To turn autoescaping off
    completely, use autoescape():
    $charset = charset([$charset]); # Get or set the current character set.

    $flag = autoEscape([$flag]); # Get or set the value of the autoescape flag.
    </quote>

    Hope this helps.

    wana
     
    , Dec 29, 2004
    #3
  4. wrote:
    > <INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">


    <snip>

    > In the CGI script that processes this form, I would have:
    >
    > $location = param('location');
    >
    > and $location would be: "Mom&apos;s House"


    No, it wouldn't. Before submission, that character entity would be
    converted by the browser to "'", so you don't have the problem you think
    you have. Try and see for yourself!

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 30, 2004
    #4
  5. wrote:
    > wrote:
    >> <INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">


    <snip>

    >> In the CGI script that processes this form, I would have:
    >>
    >> $location = param('location');
    >>
    >> and $location would be: "Mom&apos;s House"

    >
    > From CGI.pm home page: http://stein.cshl.org/WWW/software/CGI/
    >
    > <quote>
    > AUTOESCAPING HTML


    <snip>

    > </quote>


    In what way is that quote related to the OP's concern?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 30, 2004
    #5
  6. Matt Garrish Guest

    "Gunnar Hjalmarsson" <> wrote in message
    news:...
    > wrote:
    >> <INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">

    >
    > <snip>
    >
    >> In the CGI script that processes this form, I would have:
    >>
    >> $location = param('location');
    >>
    >> and $location would be: "Mom&apos;s House"

    >
    > No, it wouldn't. Before submission, that character entity would be
    > converted by the browser to "'", so you don't have the problem you think
    > you have. Try and see for yourself!
    >


    Huh? Did you test that yourself? I've never heard of a browser converting
    entities in a hidden form field.

    test.htm
    ------------------------------

    <html>
    <head>
    <title></title>
    </head>
    <body>
    <form name="test" action="/cgi-bin/test.cgi" method="post">
    <input type="hidden" name="location" value="what&apos;s wrong with this?" />
    <input type="submit" />
    </form>
    </body>
    </html>



    test.cgi
    ------------------

    use strict;
    use warnings;
    use CGI qw/param/;

    # Read the submitted hidden field and echo it back as plain text.
    my $location = param('location');

    print "Content-type: text/plain\n\n";
    print $location;


    Output:
    --------------------
    what&apos;s wrong with this?

    Matt
     
    Matt Garrish, Dec 30, 2004
    #6
  7. Guest


    >
    > In what way is that quote related to the OP's concern?
    >
    > --
    > Gunnar Hjalmarsson
    > Email: http://www.gunnar.cc/cgi-bin/contact.pl


    For example, I put this in my Perl program using CGI.pm:

    print textfield({name=>'Name', value=>"bob's"});

    When I view source in my browser it looks like this:

    <input type="text" name="Name" value="bob&#39;s" />

    CGI.pm handled the HTML escaping automatically as promised in the
    section I quoted. I think that's what he was asking about.

    wana
     
    , Dec 30, 2004
    #7
  8. Matt Garrish wrote:
    > "Gunnar Hjalmarsson" <> wrote in message
    > news:...
    >> wrote:
    >>>
    >>><INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">

    >>
    >><snip>
    >>
    >>>In the CGI script that processes this form, I would have:
    >>>
    >>>$location = param('location');
    >>>
    >>>and $location would be: "Mom&apos;s House"

    >>
    >>No, it wouldn't. Before submission, that character entity would be
    >>converted by the browser to "'", so you don't have the problem you think
    >>you have. Try and see for yourself!

    >
    > Huh? Did you test that yourself?


    No.

    > I've never heard of a browser converting entities in a hidden form field.


    <example code snipped>

    > Output:
    > --------------------
    > what&apos;s wrong with this?


    When running your code, I get:
    what's wrong with this?

    Hmm.. Guess Alan has to clarify again. :)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 30, 2004
    #8
  9. wrote:
    >>In what way is that quote related to the OP's concern?

    >
    > For example, I put this in my Perl program using CGI.pm:
    >
    > print textfield({name=>'Name', value=>"bob's"});
    >
    > When I view source in my browser it looks like this:
    >
    > <input type="text" name="Name" value="bob&#39;s" />
    >
    > CGI.pm handled the HTML escaping automatically as promised in the
    > section I quoted. I think that's what he was asking about.


    CGI.pm converted the ' character to a character entity.

    The OP had already a character entity, and I think he was asking about
    how to get the original character back.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 30, 2004
    #9
  10. Gunnar Hjalmarsson wrote:
    > wrote:
    >>
    >> <INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">

    >
    > <snip>
    >
    >> In the CGI script that processes this form, I would have:
    >>
    >> $location = param('location');
    >>
    >> and $location would be: "Mom&apos;s House"

    >
    > No, it wouldn't. Before submission, that character entity would be
    > converted by the browser to "'", so you don't have the problem you think
    > you have. Try and see for yourself!


    Matt's objection made me do some testing, and Firefox understands
    "&apos;", while MSIE does not, which explains the confusion. (MSIE does
    understand the other: "&quot;", "&lt;", "&gt;" and "&amp;".)

    So use the entity number "&#39;" instead of "&apos;" to avoid problems.
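
    The difference between the two browsers can be imitated in a few lines
    of Perl. This is only a sketch: decode_refs() and both lookup tables are
    made up for illustration, and real browsers do far more than this. With
    a table that lacks "apos" (as the HTML 4.01 entity list does), "&apos;"
    passes through untouched, while the numeric reference "&#39;" is
    resolved either way.

```perl
use strict;
use warnings;

# Named references relevant here, as in the HTML 4.01 list.
# "apos" is deliberately absent because it is not defined there.
my %html4 = (amp => '&', lt => '<', gt => '>', quot => '"');

# The same table plus "apos", as an XML-aware parser would have it.
my %with_apos = (%html4, apos => "'");

# decode_refs() is a hypothetical helper: it resolves decimal numeric
# references and any named reference found in the supplied table,
# and leaves unknown names exactly as they were written.
sub decode_refs {
    my ($text, $names) = @_;
    $text =~ s{&(?:#([0-9]+)|([A-Za-z]+));}{
        defined($1)            ? chr($1)
      : exists($names->{$2})   ? $names->{$2}
      :                          "&$2;"
    }ge;
    return $text;
}

print decode_refs('Mom&apos;s House', \%html4), "\n";      # Mom&apos;s House
print decode_refs('Mom&apos;s House', \%with_apos), "\n";  # Mom's House
print decode_refs('Mom&#39;s House',  \%html4), "\n";      # Mom's House
```

    The first line corresponds to what Matt saw and the second to what
    Gunnar saw when running the same test form.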

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 30, 2004
    #10
  11. Matt Garrish Guest

    "Gunnar Hjalmarsson" <> wrote in message
    news:...
    > Matt Garrish wrote:
    >> "Gunnar Hjalmarsson" <> wrote in message
    >> news:...
    >>> wrote:
    >>>>
    >>>><INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">
    >>>
    >>><snip>
    >>>
    >>>>In the CGI script that processes this form, I would have:
    >>>>
    >>>>$location = param('location');
    >>>>
    >>>>and $location would be: "Mom&apos;s House"
    >>>
    >>>No, it wouldn't. Before submission, that character entity would be
    >>>converted by the browser to "'", so you don't have the problem you think
    >>>you have. Try and see for yourself!

    >>
    >> Huh? Did you test that yourself?

    >
    > No.
    >
    >> I've never heard of a browser converting entities in a hidden form field.

    >
    > <example code snipped>
    >
    >> Output:
    >> --------------------
    >> what&apos;s wrong with this?

    >
    > When running your code, I get:
    > what's wrong with this?
    >
    > Hmm.. Guess Alan has to clarify again. :)
    >


    Something I've never considered before, if that is the case (not that I
    spend a lot of time with web forms). I can see some benefit in automatically
    converting the entities, however I don't think I'd ever want a browser
    making that decision for me.

    I'll see if I can find an explanation of this behaviour, even if it is
    getting off topic...

    Matt
     
    Matt Garrish, Dec 30, 2004
    #11
  12. Matt Garrish Guest

    "Matt Garrish" <> wrote in message
    news:_YKAd.32035$...
    >
    > I'll see if I can find an explanation of this behaviour, even if it is
    > getting off topic...
    >


    Alas, Google has let me down (or I can't find the right combination of
    terms, at least). I still can't see much benefit in translating the entities
    back to characters automatically when the form is submitted. The only
    advantage would seem to be that it means transferring slightly less data on
    the form submission. I suspect it has something to do with the attempts to
    render entities as characters within visible form fields, but I would have
    thought the hidden input type's value would be more along the lines of a
    single-quoted string in Perl.

    If someone has an official version of this behaviour, however, I'd be
    interested in hearing what it is.

    Matt
     
    Matt Garrish, Dec 30, 2004
    #12
  13. Peter Wyzl Guest

    "Gunnar Hjalmarsson" <> wrote in message
    news:...
    : Gunnar Hjalmarsson wrote:
    : > wrote:
    : >>
    : >> <INPUT TYPE="hidden" NAME="location" VALUE="Mom&apos;s House">
    : >
    : > <snip>
    : >
    : >> In the CGI script that processes this form, I would have:
    : >>
    : >> $location = param('location');
    : >>
    : >> and $location would be: "Mom&apos;s House"
    : >
    : > No, it wouldn't. Before submission, that character entity would be
    : > converted by the browser to "'", so you don't have the problem you think
    : > you have. Try and see for yourself!
    :
    : Matt's objection made me do some testing, and Firefox understands
    : "&apos;", while MSIE does not, which explains the confusion. (MSIE does
    : understand the other: "&quot;", "&lt;", "&gt;" and "&amp;".)
    :
    : So use the entity number "&#39;" instead of "&apos;" to avoid problems.

    Or, given that it's a hidden field, change its name to something which
    avoids special characters, and only use those where you need to deal with
    displays.

    --
    Wyzelli
     
    Peter Wyzl, Dec 30, 2004
    #13
  14. On Thu, 30 Dec 2004, Gunnar Hjalmarsson wrote:

    > Matt's objection made me do some testing, and Firefox understands
    > "&apos;", while MSIE does not, which explains the confusion. (MSIE
    > does understand the other: "&quot;", "&lt;", "&gt;" and "&amp;".)


    and elsewhere wrote:

    > Hmm.. Guess Alan has to clarify again. :)


    Oops. I must admit that for the moment, I forgot this twilight
    position of the &apos; character entity. I guess I wasn't properly in
    the mood for off-topic details :-}

    Thanks for supplying the missing piece. Although &apos; is fairly
    widely supported by browsers, for some reason it doesn't seem to be
    included in the list of character entities defined in W3C HTML
    specifications. So I guess it shouldn't really be used in a WWW
    context.

    > So use the entity number "&#39;" instead of "&apos;" to avoid
    > problems.


    That's true; but I think it's fair to say that it can always be
    avoided. If an attribute value contains both " and ' characters, then
    it can be enclosed in "...", and the included " characters represented
    as &quot; (which -is- in the HTML/4.01 and HTML/2.0 specifications,
    and is supported by pretty-much any browser, although it seems to have
    been accidentally omitted from HTML/3.2): the ASCII apostrophes can
    then be included literally.
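
    As a rough sketch of that approach, assuming the value is always
    written inside double quotes: attr_escape() below is a hypothetical
    helper (not a CGI.pm function) that escapes &, ", < and > and leaves
    apostrophes alone.

```perl
use strict;
use warnings;

# attr_escape() is a hypothetical helper, not part of CGI.pm. It escapes
# the characters that matter inside a double-quoted attribute value and
# leaves the ASCII apostrophe as a literal character.
sub attr_escape {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;     # first, so the output below is not re-escaped
    $value =~ s/"/&quot;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    return $value;
}

my $location = q{Mom's "House"};
printf qq{<input type="hidden" name="location" value="%s">\n},
    attr_escape($location);
# prints: <input type="hidden" name="location" value="Mom's &quot;House&quot;">
```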

    Did/does CGI.pm really emit &apos; on its own initiative? Or was this
    something that the hon. Usenaut had done deliberately?
     
    Alan J. Flavell, Dec 30, 2004
    #14
  15. Alan J. Flavell wrote:
    > Did/does CGI.pm really emit &apos; on its own initiative?


    No, the escapeHTML() method in CGI.pm replaces ' with &#39; (but only
    when the charset is ISO-8859-1 or WINDOWS-1252, if I understand it
    correctly).

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 30, 2004
    #15
  16. On Thu, 30 Dec 2004, Gunnar Hjalmarsson wrote:

    > Alan J. Flavell wrote:
    > > Did/does CGI.pm really emit &apos; on its own initiative?

    >
    > No, the escapeHTML() method in CGI.pm replaces ' with &#39;


    So it does. I'm sorry, I realise now that I should have taken the
    time to look before posting...

    > (but only when the charset is ISO-8859-1 or WINDOWS-1252, if I
    > understand it correctly).


    You do - pasting from the version of CGI.pm that I happen to have to
    hand:

    | my $latin = uc $self->{'.charset'} eq 'ISO-8859-1' ||
    |             uc $self->{'.charset'} eq 'WINDOWS-1252';
    | if ($latin) { # bug in some browsers
    |     $toencode =~ s{'}{&#39;}gso;
    |     $toencode =~ s{\x8b}{&#8249;}gso;
    |     $toencode =~ s{\x9b}{&#8250;}gso;

    But what you omitted to mention was that comment. There is *no
    theoretical need* for that code: it's meant to work-around bugs in
    specific browsers (probably now outdated, but the workarounds are
    harmless to properly-behaved client agents, so there's no particular
    need to remove the workarounds).
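
    To see the effect of that charset test in isolation, here is a
    simplified stand-in. escape_for_charset() is a hypothetical name and
    the function is not CGI.pm's code; it only assumes that the same three
    workaround substitutions are applied when the charset is ISO-8859-1 or
    WINDOWS-1252.

```perl
use strict;
use warnings;

# escape_for_charset() is a hypothetical, simplified stand-in for the
# behaviour discussed above; it is not CGI.pm's actual implementation.
sub escape_for_charset {
    my ($text, $charset) = @_;

    # Basic escaping, applied regardless of charset in this sketch.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;

    # Browser-bug workarounds, applied only for the two Latin charsets.
    my $latin = uc($charset) eq 'ISO-8859-1' || uc($charset) eq 'WINDOWS-1252';
    if ($latin) {
        $text =~ s/'/&#39;/g;
        $text =~ s/\x8b/&#8249;/g;
        $text =~ s/\x9b/&#8250;/g;
    }
    return $text;
}

print escape_for_charset("bob's", 'ISO-8859-1'), "\n";   # bob&#39;s
print escape_for_charset("bob's", 'UTF-8'), "\n";        # bob's
```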

    I distinctly remember the (security-relevant!) bug which the \x8b and
    \x9b workarounds are meant to address, and tests confirmed that the
    bug indeed seemed to be confined to documents coded in those specific
    character encodings; but I must confess I'm not exactly familiar with
    the one which prompted L.S to reformulate the apostrophe character.
     
    Alan J. Flavell, Dec 30, 2004
    #16