cgi and escapeHTML but not ampersand

Marek · Aug 30, 2009

Hello all!

If this is not the appropriate group, please point me to the right
one.

I have been trying for a good while to find out how to convince the CGI
module not to replace the ampersand with &amp;

I have an array like the following, with two entity-encoded umlauts:

my @element_liste =
(
    ...
    { type => "text", name => "email",
      bez => "Email (zur Auftragsbest&auml;tigung):", size => 36 },
    { type => "text", name => "fahrgast",
      bez => "Fahrg&auml;ste:", size => 36, muss => 1 },
    ...
);

and later the CGI script produces a form with:

foreach my $f (@{$element_liste_ref})
{
    print escapeHTML ($f->{bez}), " ",
          textfield (-name => $f->{name},
                     -size => $f->{size}),
          br (), "\n";
}

How can I prevent the entity-encoded &auml; from coming back as
&amp;auml;?
In my @element_liste I tried every imaginable trick, like:

bez => "Fahrgäste:",
bez => "Fahrg\&auml;ste:",
bez => "Fahrg\\&auml;ste:",
bez => "Fahrg&amp;auml;ste:",
or remove the escapeHTML

The header of the cgi is:

print header (),
      start_html (-dtd   => '-//W3C//DTD XHTML 1.0 Transitional//EN',
                  -title => "Title",
                  -lang  => 'de',
                  -style => { 'src'  => '/style/style.css',
                              -type  => 'text/css',
                              -media => 'screen' },
                  -charset => 'utf-8'
      ),

which produces the invalid <body charset="utf-8">. Unfortunately, the
server is running an outdated CGI version: CGI.pm version 2.752

Thank you for your help.

marek

Jens Thoms Toerring · Aug 30, 2009

Marek said:
I have been trying for a good while to find out how to convince the CGI
module not to replace the ampersand with &amp;

I have an array like the following, with two entity-encoded umlauts:

my @element_liste =
(
    ...
    { type => "text", name => "email",
      bez => "Email (zur Auftragsbest&auml;tigung):", size => 36 },
    { type => "text", name => "fahrgast",
      bez => "Fahrg&auml;ste:", size => 36, muss => 1 },
    ...
);

and later the CGI script produces a form with:

foreach my $f (@{$element_liste_ref})
{
    print escapeHTML ($f->{bez}), " ",
          textfield (-name => $f->{name},
                     -size => $f->{size}),
          br (), "\n";
}

How can I prevent the entity-encoded &auml; from coming back as
&amp;auml;?

The only way to prevent replacement of characters that have a
special meaning in HTML is not to call a function that's meant
to do just that. And I don't see the need to call escapeHTML()
here since what you output seems to be fully written by you
and not derived from user input, so you can manually "escape"
everything that needs escaping.
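
To make the effect concrete, here is a minimal sketch, assuming a CGI.pm that lets escapeHTML() be imported by name (the variable names are made up for the example). It feeds the hand-encoded label from the list above through escapeHTML(), which treats the ampersand as an ordinary character and encodes it once more:

```perl
use strict;
use warnings;
use CGI qw(escapeHTML);

# A label that the script author has already entity-encoded by hand.
my $label = 'Fahrg&auml;ste:';

# escapeHTML() does not know about existing entities; it encodes every '&'.
my $escaped = escapeHTML($label);

print "before: $label\n";
print "after:  $escaped\n";
```

The second line printed should show &amp;auml; where the first shows &auml;, which is exactly the unwanted result from the question.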

In my @element_liste I tried every imaginable trick, like:

bez => "Fahrgäste:",

That, of course, works since there's nothing in that string that
would need conversion. On the other hand, then the encoding for
the page must be set correctly (probably either iso-8859-1 or
utf-8) to get it displayed correctly on the client side.

bez => "Fahrg\&auml;ste:",
bez => "Fahrg\\&auml;ste:",

The backslash isn't an "escape character" recognized by that
function.

bez => "Fahrg&amp;auml;ste:",

That can only make things worse, you will end up with
"Fahrg&amp;amp;auml;ste:";-)

or remove the escapeHTML

Looks like the way to go if the text to be output is written
by you and doesn't incorporate elements coming from the outside.
If you need to use text coming from the outside then
run escapeHTML() on it before you use it in the text you want
to output.
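
A minimal sketch of that split, assuming a hypothetical $user_text that stands in for anything coming from outside (the value of a form parameter, for instance). The hand-written label is used unchanged and only the outside text goes through escapeHTML():

```perl
use strict;
use warnings;
use CGI qw(escapeHTML);

# Written by the script author and already entity-encoded: use unchanged.
my $label = 'Fahrg&auml;ste:';

# Hypothetical stand-in for outside input, e.g. what param('fahrgast') returns.
my $user_text = 'Max <max@example.org> & Co.';

# Escape only the part that did not originate in the script itself.
my $line = $label . ' ' . escapeHTML($user_text);

print "$line\n";
```

This way the entity in the label survives, while the angle brackets and the ampersand in the outside text are made harmless.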

The header of the cgi is:

print header (),
      start_html (-dtd   => '-//W3C//DTD XHTML 1.0 Transitional//EN',
                  -title => "Title",
                  -lang  => 'de',
                  -style => { 'src'  => '/style/style.css',
                              -type  => 'text/css',
                              -media => 'screen' },
                  -charset => 'utf-8'
      ),

which produces the invalid <body charset="utf-8">. Unfortunately, the
server is running an outdated CGI version: CGI.pm version 2.752

If you have to, you could simply forgo using start_html() and
output the text for the page header directly. Just take what
the call of start_html() outputs, correct it as necessary, and
then output it with a simple print.

Regards, Jens

Marek · Aug 30, 2009

Jens! Thank you very much!

I appreciate your help! You were right! I thought I had tried
really everything, but your hints helped me out of an impasse! Here are
my steps:

1. I put in a blank start_html()
2. I removed all escapeHTML calls
3. I tried literal umlauts such as "ä" (not working)
4. So I tried entity encoding (working! Phew!)
5. I reinserted the desired doctype and style sheet

Regarding step 5:

My server returns an invalid doctype:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">

Strange, my HTML validator tells me that no such beast exists.
This is probably due to the old CGI version: CGI.pm
version 2.752

One last question: how do I correctly set the encoding to utf-8?

Thank you again Jens

Jens Thoms Toerring · Aug 30, 2009

Marek said:
My server returns an invalid doctype:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">

Strange, my HTML validator tells me that no such beast exists.
This is probably due to the old CGI version: CGI.pm
version 2.752

One last question: how do I correctly set the encoding to utf-8?

I guess you will need to output something like the following
instead of calling start_html() (if you do it all manually, I do
not think you should call it at all, even without arguments):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Title</title>
<link rel="stylesheet" type="text/css" href="style.css" media="screen" />
</head>
<body>

At least that seems to get accepted by the HTML validator;-)
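
From the script, such a hand-written page head could be printed with a here-document. This is only a sketch: page_head() is a made-up helper name, the title is assumed to be text written by the script author (so it is not escaped), and the HTTP header is written out by hand instead of using header().

```perl
use strict;
use warnings;

# Hypothetical helper: returns the hand-written page head as one string.
# $title is assumed to be trusted text supplied by the script author.
sub page_head {
    my ($title) = @_;
    return <<"END_HEAD";
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>$title</title>
<link rel="stylesheet" type="text/css" href="style.css" media="screen" />
</head>
<body>
END_HEAD
}

# The HTTP header must go out first, followed by an empty line.
print "Content-Type: text/html; charset=utf-8\r\n\r\n";
print page_head('Title');
```

The rest of the page, including the closing </body> and </html> tags, still has to be printed afterwards.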

Regards, Jens

Marek · Aug 30, 2009

Thank you Jens!

Of course, I can still mix hand-written code with CGI-generated HTML.
By the way, this produces valid XHTML:

print start_html ({ -dtd   => '-//W3C//DTD XHTML 1.0 Transitional//EN',
                    -title => 'Title',
                    -style => { 'src' => 'style/style.css' } }
      ),

But I still don't know how to produce a valid charset=utf-8
declaration with CGI. I will probably stick to inserting the header by
hand anyway, as you recommended.

Greetings from Munich

marek

Peter J. Holzer · Aug 31, 2009

Marek said:
But I still don't know how to produce a valid charset=utf-8
declaration with cgi.

print header(-charset=>'utf-8');

should work.
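
As a small sketch of what that call does, assuming a reasonably recent CGI.pm (the behaviour of version 2.752 was not checked here): header() returns the HTTP header as a string, so the charset ends up in the Content-Type line of the HTTP response rather than in the HTML itself. The variable name is made up for the example.

```perl
use strict;
use warnings;
use CGI qw(header);

# header() returns the HTTP header block as a string; nothing is printed yet.
my $http_header = header(-type => 'text/html', -charset => 'utf-8');

# Send it before any HTML output.
print $http_header;
```

The printed header should contain a Content-Type line that names utf-8 as the charset.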

hp
