cgi and escapeHTML but not ampersand

Discussion in 'Perl Misc' started by Marek, Aug 30, 2009.

  1. Marek

    Marek Guest

    Hello all!

    If this is not the appropriate group, please point me to the right
    one.

    I have been trying for quite a while to find out how to convince the
    CGI module not to replace the ampersand with &amp;

    I have an array like the following, with two entity-encoded *Umlauts*:

    my @element_liste =
    (
    ...
    { type => "text", name => "email",
    bez => "Email (zur Auftragsbest&auml;tigung):", size =>
    36 },
    { type => "text", name => "fahrgast",
    bez => "Fahrg&auml;ste:", size => 36, muss => 1 },
    ...
    );

    and later the CGI script produces a form with:

    foreach my $f (@{$element_liste_ref})
    {
    print escapeHTML ($f->{bez}), " ",
    textfield (-name => $f->{name},
    -size => $f->{size}),
    br (), "\n";
    }

    How can I prevent the entity-encoded &auml; from coming back as
    &amp;auml;?
    In my @element_liste I tried every imaginable trick, like:

    bez => "Fahrgäste:",
    bez => "Fahrg\&auml;ste:",
    bez => "Fahrg\\&auml;ste:",
    bez => "Fahrg&amp;auml;ste:",
    or remove the escapeHTML

    The header of the cgi is:

    print header (),
    start_html (-dtd => '-//W3C//DTD XHTML 1.0 Transitional//EN',
    -title => "Title",
    -lang => 'de',
    -style=>{'src'=>'/style/style.css',
    -type=>'text/css',
    -media=>'screen'},
    -charset=>'utf-8'
    ),

    which produces the invalid <body charset="utf-8">. Unfortunately, the
    server is running an outdated CGI version: CGI.pm Version:
    2.752

    Thank you for your help.


    marek
    Marek, Aug 30, 2009
    #1

  2. Marek wrote:
    > I am trying to find since a good while, how to convince the CGI server
    > module, not to replace the ampersand, by &amp;


    > I have an array like follows with two entity encoded *Umlauts*:


    > my @element_liste =
    > (
    > ...
    > { type => "text", name => "email",
    > bez => "Email (zur Auftragsbest&auml;tigung):", size =>
    > 36 },
    > { type => "text", name => "fahrgast",
    > bez => "Fahrg&auml;ste:", size => 36, muss => 1 },
    > ...
    > );


    > and later the cgi I is producing a form with :


    > foreach my $f (@{$element_liste_ref})
    > {
    > print escapeHTML ($f->{bez}), " ",
    > textfield (-name => $f->{name},
    > -size => $f->{size}),
    > br (), "\n";
    > }


    > How to prevent, that the entity encoded &auml; is coming back as
    > &amp;auml; ?


    The only way to prevent replacement of characters that have a
    special meaning in HTML is not to call a function that's meant
    to do just that. And I don't see the need to call escapeHTML()
    here since what you output seems to be fully written by you
    and not derived from user input, so you can manually "escape"
    everything that needs escaping.
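A minimal sketch of why the entity comes back mangled. The helper `escape_html_sketch` is a hypothetical stand-in written for this illustration, not CGI.pm's own code; it only assumes that an HTML-escaping function replaces every `&`, including the one that starts a hand-written entity.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an HTML-escaping function such as escapeHTML().
# The "&" has to be replaced first, otherwise the "&" of "&lt;" etc. would
# be escaped a second time.
sub escape_html_sketch {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $label = 'Fahrg&auml;ste:';             # already entity-encoded by hand
my $once  = escape_html_sketch($label);   # the "&" of the entity is escaped
my $twice = escape_html_sketch($once);    # escaping again compounds it

print "$label\n$once\n$twice\n";
```

Escaping is not idempotent, so text that is already valid HTML has to bypass the function entirely.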

    > In my @element_liste I tried with every imaginable tricks, like :


    > bez => "Fahrgäste:",


    That, of course, works since there's nothing in that string that
    would need conversion. On the other hand, then the encoding for
    the page must be set correctly (probably either iso-8859-1 or
    utf-8) to get it displayed correctly on the client side.
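As a rough sketch of the encoding point, assuming the script file itself is saved as UTF-8: the label is encoded to bytes explicitly and the Content-Type line announces the same encoding. The header is printed by hand here so that the example depends only on core modules.

```perl
use strict;
use warnings;
use utf8;                                  # the source file is UTF-8 (assumption)
use Encode qw(encode);

my $label = 'Fahrgäste:';                  # character string with a literal umlaut
my $bytes = encode('UTF-8', $label);       # the bytes that go over the wire

binmode STDOUT;                            # send raw bytes, no extra translation
print "Content-Type: text/html; charset=utf-8\r\n\r\n";
print $bytes, "\n";
```

If the announced charset and the actual bytes disagree, the browser is likely to show the umlaut as two garbage characters or as a replacement character.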

    > bez => "Fahrg\&auml;ste:",
    > bez => "Fahrg\\&auml;ste:",


    The backslash isn't an "escape character" recognized by that
    function.

    > bez => "Fahrg&amp;auml;ste:",


    That can only make things worse, you will end up with
    "Fahrg&amp;amp;auml;ste:";-)

    > or remove the escapeHTML


    Looks like the way to go if the text to be output is written
    by you and doesn't incorporate elements coming from the
    outside. If you need to use text coming from the outside, then
    run escapeHTML() on it before you use it in the text you want
    to output.
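The split between trusted and outside text could look roughly like this. The helper `escape_html_sketch` and the value in `$user_input` are hypothetical, made up for illustration; in the real script the outside value would come from a form parameter and escapeHTML() would do the escaping.

```perl
use strict;
use warnings;

# Hypothetical stand-in for escapeHTML(); "&" is replaced first.
sub escape_html_sketch {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $label      = 'Fahrg&auml;ste:';          # trusted, entity-encoded by hand
my $user_input = 'Meier & Sohn <script>';    # made-up value from the outside

# Only the outside value passes through the escaping function.
my $line = $label . ' ' . escape_html_sketch($user_input);
print "$line\n";
```

The hand-written label keeps its entity intact, while the outside value can no longer inject markup.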

    > The header of the cgi is:


    > print header (),
    > start_html (-dtd => '-//W3C//DTD XHTML 1.0 Transitional//EN',
    > -title => "Title",
    > -lang => 'de',
    > -style=>{'src'=>'/style/style.css',
    > -type=>'text/css',
    > -media=>'screen'},
    > -charset=>'utf-8'
    > ),


    > which is producing the non valid <body charset="utf-8">. On the server
    > is running unfortunately an outdated CGI version: CGI.pm Version:
    > 2.752


    If you have to, you could simply forgo using start_html() and
    output the text for the page header directly. Just take what
    the call of start_html() outputs, correct it as necessary, and
    then output it with a simple print.

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
    Jens Thoms Toerring, Aug 30, 2009
    #2

  3. Marek

    Marek Guest

    Jens! Many thanks!

    I appreciate your help! You were right! I thought I had tried
    really everything, but your hints helped me out of an impasse! Here
    are my steps:


    1. I put in a blank start_html()
    2. I removed all escapeHTML calls
    3. I tried with literal umlauts, "ä" etc. (not working)
    4. So I tried entity encoding (working! Phew!)
    5. I reinserted the desired Doctype and style sheet

    Regarding 5.:

    My server returns an invalid Doctype:

    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
    "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">

    Strange, my HTML validator tells me that no such beast exists.
    This is probably due to the old CGI version: CGI.pm
    Version: 2.752

    A last question: how do I correctly set the encoding to utf-8?


    Thank you again Jens
    Marek, Aug 30, 2009
    #3
  4. Marek wrote:
    > My server is giving back a non valid Doctype:


    > <!DOCTYPE html
    > PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
    > "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">


    > Strange, my html-validator is telling me, that such kind of beast does
    > not exist. Probably this is due to the old cgi version: CGI.pm
    > Version: 2.752


    > A last question: how to set correctly the encoding to utf-8?


    I guess you will need to output something like the following
    instead of calling start_html() (if you do it all manually, I do
    not think you should call it at all, even without arguments):

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Title</title>
    <link rel="stylesheet" type="text/css" href="style.css" media="screen" />
    </head>
    <body>

    At least that seems to get accepted by the HTML validator;-)
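One way to emit that markup from the script is a single-quoted heredoc, sketched below. The variable name `$page_start` is an arbitrary choice for this example, and the HTTP header is printed by hand so that nothing depends on the installed CGI.pm version.

```perl
use strict;
use warnings;

# Hand-written page prologue, copied from the markup above. The single
# quotes around END_HTML switch off variable interpolation in the heredoc.
my $page_start = <<'END_HTML';
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Title</title>
<link rel="stylesheet" type="text/css" href="style.css" media="screen" />
</head>
<body>
END_HTML

# HTTP header first, then an empty line, then the document itself.
print "Content-Type: text/html; charset=utf-8\r\n\r\n";
print $page_start;
```

The rest of the page, including the closing body and html tags, would then be printed after the form fields.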

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
    Jens Thoms Toerring, Aug 30, 2009
    #4
  5. Marek

    Marek Guest

    Thank you Jens!


    Of course; I can still mix "hand written" code with cgi produced html.
    By the way, this produces valid xhtml:

    print start_html ({-dtd => '-//W3C//DTD XHTML 1.0 Transitional//EN',
    -title => 'Title',
    -style=>{'src'=>'style/style.css'}}
    ),

    But I still don't know how to produce a valid charset=utf-8
    declaration with CGI. I will probably stick with inserting the
    header by hand nevertheless, as you recommended.


    Greetings from Munich



    marek
    Marek, Aug 30, 2009
    #5
  6. On 2009-08-30 13:26, Marek wrote:
    > But I still don't know how to produce a valid charset=utf-8
    > declaration with cgi.


    print header(-charset=>'utf-8');

    should work.
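For context, a small sketch of that call in a complete script. It assumes CGI.pm is installed (recent Perl releases no longer bundle it) and that the installed version accepts the -charset parameter, which has not been checked against the 2.752 release mentioned earlier in the thread.

```perl
use strict;
use warnings;
use CGI qw(header);

# Build the HTTP header with an explicit charset instead of the default.
my $http_header = header(-charset => 'utf-8');

# The header already ends with the empty line that separates it from the body.
print $http_header;
print "<p>Fahrg&auml;ste:</p>\n";
```

The charset set here travels in the HTTP Content-Type header, which is separate from the meta tag inside the HTML head.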

    hp
    Peter J. Holzer, Aug 31, 2009
    #6
