character encoding in CGI.pm

David Lee Lambert · Nov 24, 2004

I noticed that, without setting any options, CGI.pm output of a
simple page starts as follows:

Content-Type: text/html; charset=ISO-8859-1

<?xml version="1.0" encoding="utf-8"?>

Now, is the webpage in ISO-8859-1, utf8, or some other encoding? Or
is XML defined such that this is a perfectly valid situation? If I
send a string containing Unicode characters (with \x{}), IE 6 detects
the page as Latin-1 and doesn't show those characters properly; if I
manually tell it that the encoding is UTF-8, it displays the
characters properly.

This is using perl 5.6.1; I'm not sure what version of CGI.pm I have.
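
A small sketch of the symptom described above, assuming the script really sends UTF-8 bytes while the header says ISO-8859-1. The sample string and the code_points helper are made up for illustration. Only the core Encode module is used, and the output is code points rather than raw characters so that it does not depend on the terminal.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text: "cafe" with U+00E9 (e with acute accent).
my $text = "caf\x{e9}";

# The bytes a script would send if it writes the string as UTF-8.
my $utf8_bytes = encode('UTF-8', $text);

# What a reader sees if it trusts a charset=ISO-8859-1 label instead.
my $misread = decode('ISO-8859-1', $utf8_bytes);

# Format a string as a space-separated list of code points.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

print "intended: ", code_points($text),    "\n";
print "misread:  ", code_points($misread), "\n";
```

The one intended character U+00E9 comes out as two characters, U+00C3 U+00A9, which matches the kind of garbling a Latin-1 reading of UTF-8 data produces.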

Alan J. Flavell · Nov 24, 2004

I noticed that, without setting any options, CGI.pm output of a
simple page starts as follows:

Content-Type: text/html; charset=ISO-8859-1

<?xml version="1.0" encoding="utf-8"?>

Oh dear, does it really? Can we have a CGI.pm version number on that
please?

Now, is the webpage in ISO-8859-1, utf8, or some other encoding?

Well, the only way it can be in both is if it's *really* in
us-ascii. Seriously, that's the truth.
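
The us-ascii point can be checked with a short sketch, assuming the core Encode module is available. The helper name and sample strings are hypothetical; the idea is only to compare the bytes each encoding produces for the same text.

```perl
use strict;
use warnings;
use Encode qw(encode);

# True when the text has identical bytes in ISO-8859-1 and in UTF-8.
sub same_bytes_in_both {
    my ($text) = @_;
    return encode('ISO-8859-1', $text) eq encode('UTF-8', $text);
}

my $ascii_only   = "Hello, world";
my $with_e_acute = "caf\x{e9}";    # contains U+00E9, outside us-ascii

for my $sample ($ascii_only, $with_e_acute) {
    my $verdict = same_bytes_in_both($sample) ? 'same bytes' : 'different bytes';
    printf "%d characters: %s\n", length($sample), $verdict;
}
```

Text restricted to us-ascii yields the same bytes either way, so both labels are true of it. As soon as one character above U+007F appears, the two encodings disagree and only one label can be correct.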

Or is XML defined such that this is a perfectly valid situation?

Absolutely not. Your authoritative reverence (excuse me, I meant
"reference", but the inadvertent typo was too good to take out) is
the XHTML/1.0 specification, Appendix C, since we're dealing here with
the text/html compatibility feature of XHTML/1.0.

I personally think leaping into XHTML without an overwhelming cause
was a bit premature. You can tell CGI.pm that you don't want
XHTML-flavoured HTML. But opinions vary, and this is the wrong forum
to dispute that.
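
For reference, CGI.pm documents a -no_xhtml import pragma for this. A minimal sketch, assuming CGI.pm is installed (recent Perl releases no longer bundle it); the page title is a placeholder, and the exact markup produced depends on the CGI.pm version.

```perl
use strict;
use warnings;
use CGI qw(:standard -no_xhtml);    # -no_xhtml: ask for plain HTML output

# Build the start of a page; with -no_xhtml it should carry no XML declaration.
my $page_start = start_html(-title => 'Example page');

print $page_start, "\n";
```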

This is using perl 5.6.1; I'm not sure what version of CGI.pm I have.

*Upgrade*. 5.6.1 is now old; and the version of CGI.pm that comes
bundled with Perl is generally somewhat back-level compared to the
author's latest version at any given moment. Do I need to refer you
to the FAQ if you need a private version installed due to
foot-dragging by your sysadmin?

Btw. CGI.pm will happily tell you what version it is if you ask it
nicely. It's in the source code too, of course.
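
One way of asking, sketched under the assumption that CGI.pm is installed somewhere on the module search path:

```perl
use strict;
use warnings;
use CGI ();    # load the module without importing any functions

# CGI.pm keeps its release number in the package variable $CGI::VERSION.
print "CGI.pm version: $CGI::VERSION\n";
```

The same thing fits on a command line as perl -MCGI -e 'print "$CGI::VERSION\n"'.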

Shawn Corey · Nov 25, 2004

David said:
I noticed that, without setting any options, CGI.pm output of a
simple page starts as follows:

Content-Type: text/html; charset=ISO-8859-1

<?xml version="1.0" encoding="utf-8"?>

Now, is the webpage in ISO-8859-1, utf8, or some other encoding? Or
is XML defined such that this is a perfectly valid situation? If I
send a string containing Unicode characters (with \x{}), IE 6 detects
the page as Latin-1 and doesn't show those characters properly; if I
manually tell it that the encoding is UTF-8, it displays the
characters properly.

This is using perl 5.6.1; I'm not sure what version of CGI.pm I have.

The web page is both. The ISO-8859-1 encoding is used for the HTTP
transfer. All bytes, including the web page, will be interpreted as
ISO-8859-1 encoded until handed off to the display engine in the
browser. Then it will be interpreted as UTF-8. This normally does not
mean much since the bytes after the blank line are usually not processed
by the HTTP decoding code; they are simply passed to the next part.

If you are using Perl 5.6, add 'use utf8;' to the code. For any Perl,
you can add:

print header( -charset => 'UTF-8' );

for the Content-Type header.

See perldoc CGI for details.
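
A minimal sketch of setting the charset through CGI.pm's header function, assuming CGI.pm is installed and Perl is 5.8 or later. The binmode line and the sample text are additions for illustration and not part of the advice above; without an output layer, Perl will not necessarily write the characters as UTF-8 bytes.

```perl
use strict;
use warnings;
use CGI qw(:standard);

# Encode everything printed to STDOUT as UTF-8 (needs Perl 5.8 or later).
binmode STDOUT, ':encoding(UTF-8)';

# header() builds the HTTP header block; -charset sets the charset parameter.
my $http_header = header(-type => 'text/html', -charset => 'UTF-8');

print $http_header;
print "<p>caf\x{e9} \x{263A}</p>\n";    # hypothetical sample content
```

With this in place the label in the header and the bytes in the body agree, which is the part that matters to the browser.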

--- Shawn

Ben Morrow · Nov 25, 2004

Alan J. Flavell said:
Oh dear, does it really? Can we have a CGI.pm version number on that
please?

Well, the only way it can be in both is if it's *really* in
us-ascii. Seriously, that's the truth.

Absolutely not. Your authoritative reverence (excuse me, I meant
"reference", but the inadvertent typo was too good to take out) is
the XHTML/1.0 specification, Appendix C, since we're dealing here with
the text/html compatibility feature of XHTML/1.0.

Correct me if I'm wrong, but surely XHTML cannot be served under a
text/html content type anyway? It isn't valid HTML (take this document,
for example,

<html>
<head>
<link rel="stylesheet" type="text/css" href="css"/>
</head>
<body></body>
</html>

: the '/>' on the <link> is not valid HTML, and validator.w3.org will
reject it under any HTML DTD). This means this header is wrong in three
ways:

1. the content should be labelled application/xhtml+xml

2. the charsets should match

3. the charset shouldn't be specified in the HTTP header anyway, for
precisely this reason (unlike HTML, XML has strict rules for determining
its charset; in this case, the charset given in the HTTP header
overrides that in the document, but this is Not A Good Thing). See
recent discussions on (e-mail address removed) for this; the next version of
RFC3023 (the registration for XML media types) will (probably) state
that XML entities should not be given a charset parameter.

Ben

Alan J. Flavell · Nov 25, 2004

By sheer chance, Google Groups pointed out to me that:

On Wed, 24 Nov 2004, Shawn Corey wrote:

[I'm trimming the comprehensive quote down to what I suppose you must
have interpreted as the significant part. There's no extra charge for
doing this yourself, you know...]

The web page is both.

Impossible, unless it happens to be in us-ascii, in which case it's a
valid instance of all three.

The ISO-8859-1 encoding is used for the HTTP transfer. All bytes,
including the web page, will be interpreted as ISO-8859-1 encoded
until handed off to the display engine in the browser. Then it will
be interpreted as UTF-8. This normally does not mean much since the
bytes after the blank line are usually not processed by the HTTP
decoding code; they are simply passed to the next part.

A truly remarkable castle that you've built in the air there; have you
read XHTML/1.0 Appendix C, by any chance?

See perldoc CGI for details.

Whimper.

Once again, I suppose this brings home the importance of not going
into technical detail on matters that are off-topic for the group.

Alan J. Flavell · Nov 25, 2004

Oh dear, this is desperately off-topic...

Correct me if I'm wrong, but surely XHTML cannot be served under a
text/html content type anyway?

Technically, you're right. Practically, I'd have to refer you to
XHTML/1.0 Appendix C. Well, I already did, but you seem to have
resisted the temptation to mention it.

It isn't valid HTML

Correct. Appendix C is in theory self-contradictory, but in practice
it gets away with it, since almost all "web browsers" implement
tag-soup rather than HTML "per se".

emacs-w3 indeed had to be deliberately broken in order to be
compatible with Appendix C, since it had taken the HTML specification
just a bit more seriously than anyone else (aside from SGML-conforming
browsers such as softquad panorama, but who uses those as www
browsers?).

1. the content should be labelled application/xhtml+xml

"should". Right. XHTML/1.0 Appendix C is a (misguided, IMHO)
exception to that rule.

2. the charsets should match

"must" match, except in a few degenerate cases (since us-ascii can be
validly labelled as iso-8859-anything as well as utf-8, whatever
happens to be convenient).

3. the charset shouldn't be specified in the HTTP header anyway,

Disagree; but this isn't the place to argue the point.

all the best

John W. Kennedy · Nov 26, 2004

Ben said:
Correct me if I'm wrong, but surely XHTML cannot be served under a
text/html content type anyway? It isn't valid HTML (take this document,
for example,

If you want Internet Explorer to display it, you /must/ serve it as
text/html. Internet Explorer refuses outright to render a document that
it knows to be XHTML. Fortunately, most browsers will produce acceptable
results for XHTML 1.0 served as HTML. XHTML 2.0 served as HTML, on the
other hand, will go straight into the toilet.

In short, XHTML is dead, murdered by Bill Gates' arrogance.

Ain't monopolies great?

Alan J. Flavell · Nov 26, 2004

Oh dear. Off topic, but I can't resist at least a reply... with
apologies up-front

If you want Internet Explorer to display it, you /must/ serve it as
text/html.

IE, as normally used, does not support XHTML, and it would be better
not to send it any. Faking XHTML as HTML brings no benefits at all at
the web interface, and adds a few disbenefits. It's sometimes claimed
that XML-based tools at the authoring side are a valuable benefit, and
therefore the result will be XHTML - but that is a half-truth:
XML-based tools can also emit HTML/4.01 as their end-product.

Internet Explorer refuses outright to render a document that it
knows to be XHTML.

Right from the start of the WWW, browsers which can't render a
particular MIME content-type have been configured to fire up a
suitable "helper application" to view that content type.

More recently there's been a tendency to define "plug-ins", which
render certain content types but display them in the window of the
browser.

Either of these mechanisms should be available in IE (after
sacrificing a suitable animal to XP SP2, I suppose). Years back I
configured Windows/IE to use a "helper application" for opening XHTML
MIME-types, and I defined the helper application to be Mozilla. It
worked fine. OK, I'm not promoting it in that form as a practical
solution for end-users, just offering an in-principle refutation that
if the browser-like object doesn't support it then it can't be used.

The original idea of XML was to make a clean break with "tag soup".

Fortunately, most browsers will produce acceptable results for XHTML
1.0 served as HTML.

Unfortunately, that's led to the unwashed masses of web deezyners
simply converting their HTML-flavoured tag soup into XHTML-flavoured
tag soup, and tossing the potential benefits of the clean break out of
the window (no pun intended).

XHTML 2.0 served as HTML, on the other hand, will go
straight into the toilet.

So the bottom line is:

- XHTML/1.0 Appendix C is functionally identical to HTML/4.01, and
almost - but not quite - as compatible with tag-soup slurpers. So
what's the point of deploying XHTML/1.0 to browsers which were never
designed to process it? If the original isn't HTML, XHTML/1.0 can be
converted by rote into HTML/4.01, and the result is slightly more
compatible with the browsers out there.

No other version of XHTML offers that easement. By definition, if you
serve it out as text/html it cannot be XHTML(tm), other than this
pointless, self-contradictory and counter-productive backwater:
XHTML/1.0-Appendix-C. What it would be is XHTML-flavoured tag soup,
which is no kind of improvement from what we already had.

I say choose one of:

* stay with HTML/4.01 - there's no point in XHTML/1.0; or

* make a clean break and move to Real XHTML(tm), with some kind of
Accept-type negotiation for client agents which don't grok it.

In short, XHTML is dead, murdered by Bill Gates' arrogance.

XHTML is alive and well in a subset of client agents, with useful
extras like SVG. Content-type negotiation (Accept: header) has been
working for years; IE contrives (like so much else) to get it only
vaguely right, but with a bit of sleight of hand at the server it can
be made to work with IE's default settings, and the more-aware can
adjust the Accept: header (or have it adjusted for them) to get better
results.
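
A rough sketch of what Accept-based negotiation could look like in a CGI script. The choose_media_type helper and its policy are assumptions made for illustration, not the server-side arrangement described above: XHTML is sent only to clients that list application/xhtml+xml explicitly, so an agent that sends only a wildcard falls back to text/html.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a media type from an Accept header value.
# XHTML is chosen only when the client lists application/xhtml+xml
# explicitly with a non-zero q-value; wildcards such as */* get text/html.
sub choose_media_type {
    my ($accept) = @_;
    $accept = '' unless defined $accept;
    $accept =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    my %quality_for;
    for my $range (split /\s*,\s*/, $accept) {
        my ($type, @params) = split /\s*;\s*/, $range;
        next unless defined $type && length $type;

        my $quality = 1;    # q defaults to 1 when absent
        for my $param (@params) {
            $quality = $1 if $param =~ /^q=(\d(?:\.\d{0,3})?)$/i;
        }
        $quality_for{ lc $type } = $quality;
    }

    my $xhtml_quality = $quality_for{'application/xhtml+xml'} || 0;
    return $xhtml_quality > 0 ? 'application/xhtml+xml' : 'text/html';
}

# In a CGI script the header value arrives in $ENV{HTTP_ACCEPT}.
my $media_type = choose_media_type($ENV{HTTP_ACCEPT});
print "chosen media type: $media_type\n";
```

A fuller version would compare q-values between the two types and honour wildcards; this one deliberately errs on the side of text/html.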

IMHO and YMMV.

Shawn Corey · Nov 26, 2004

Alan said:
By sheer chance, Google Groups pointed out to me that:

On Wed, 24 Nov 2004, Shawn Corey wrote:

[I'm trimming the comprehensive quote down to what I suppose you must
have interpreted as the significant part. There's no extra charge for
doing this yourself, you know...]

[Yes, now the whole world knows what a hero you are.]

A truly remarkable castle that you've built in the air there; have you
read XHTML/1.0 Appendix C, by any chance?

Please explain what XHTML/1.0 Appendix C has to do with HTTP.

--- Shawn

chris-usenet · Nov 26, 2004

Alan J. Flavell said:
Oh dear, does it really? Can we have a CGI.pm version number on that
please?

Perl 5.6.1. CGI 2.752

It's been fixed by 5.8.4 (CGI 3.04)
Chris

Tad McClellan · Nov 26, 2004

Shawn Corey said:
Alan said:

By sheer chance, Google Groups pointed out to me that:

On Wed, 24 Nov 2004, Shawn Corey wrote:

[I'm trimming the comprehensive quote down to what I suppose you must
have interpreted as the significant part. There's no extra charge for
doing this yourself, you know...]

[Yes, now the whole world knows what a hero you are.]

And now the whole world knows what an inconsiderate type of
poster you are. You shift work from yourself to others.

Shawn Corey · Nov 26, 2004

Tad said:
And now the whole world knows what an inconsiderate type of
poster you are. You shift work from yourself to others.

If you don't like these types of comments you should criticize the first
one.

BTW Tad, I thought I was on your permanent kill file.

--- Shawn

Tad McClellan · Nov 26, 2004

Shawn Corey said:
BTW Tad, I thought I was on your permanent kill file.

I went "slumming".

Michele Dondi · Nov 26, 2004

[I'm trimming the comprehensive quote down to what I suppose you must
have interpreted as the significant part. There's no extra charge for
doing this yourself, you know...]

[Yes, now the whole world knows what a hero you are.]

I don't think so. OTOH *most* clpmisc users will thank him anyway.
Now, if you could be so gentle and avoid wasting your energies writing
irrelevant cmts with that attitude I, for one, will thank you too, and
I think many others will as well.

Michele

Matt Garrish · Nov 27, 2004

Shawn Corey said:
Please explain what XHTML/1.0 Appendix C has to do with HTTP.

In other words, you haven't read the appendix. See section C.9 if it's so
painful to you to actually read something in its entirety.

Matt
