HTML or XHTML??

SteW

Toby said:
SteW wrote:


"should not" ~ "may".


W3C make a distinction between 'may' and 'should not', with the latter
being close to 'must not'.
http://www.rfc-editor.org/rfc/rfc2119.txt

<quote>
SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED" mean that
there may exist valid reasons in particular circumstances when the
particular behavior is acceptable or even useful, but the full
implications should be understood and the case carefully weighed
before implementing any behavior described with this label.

MAY This word, or the adjective "OPTIONAL", mean that an item is
truly optional. One vendor may choose to include the item because a
particular marketplace requires it or because the vendor feels that
it enhances the product while another vendor may omit the same item.
</quote>

Ste W
 
Toby A Inkster

Bertilo said:
That code is not good. You should take the "q" values into consideration.

Most of the sample code I have seen for parsing the q value on the web has
been crap and usually interprets q=0.9 as q=0.

The percentage of browsers that send "application/xhtml+xml;q=0.9" is
higher[1] than the percentage that send "application/xhtml+xml;q=0", so I
prefer my more permissive version.

If you know of a regular expression that will interpret the q value
correctly[2], please let me know as I'd like to use it.

[1] Opera 7.1+ uses q=0.9. Opera (all versions) makes up about 7% of my
logs.

[2] I don't doubt it is possible to parse correctly. Apache seems to
manage it. I've just not seen any sample PHP code that does.
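For the regular expression Toby asks about, here is a minimal PHP sketch (the helper name `range_q` is hypothetical, not from the thread). It works on a single media range, so it assumes the Accept header has already been split on commas, and it finds a `q` parameter wherever it sits among the parameters. A missing `q` defaults to 1, as RFC 2616 specifies, and `q=0.9` comes back as 0.9 rather than 0. It assumes no quoted parameter values containing semicolons.

```php
<?php
// Hypothetical helper: return the q value of ONE media range, such as
// "application/xhtml+xml;charset=utf-8;q=0.9". The Accept header is
// assumed to have been split on commas already.
function range_q(string $range): float
{
    // A ";" then "q", optional spaces, "=" and the number. Requiring the
    // ";" right before "q" (spaces allowed) avoids matching "x-q=" or "qs=".
    if (preg_match('/;\s*q\s*=\s*([0-9.]+)/i', $range, $m)) {
        return (float) $m[1]; // "0.9" becomes 0.9, not 0
    }
    return 1.0; // no q parameter: q defaults to 1
}

// Usage, one range at a time:
// range_q('application/xhtml+xml;q=0.9')   -> 0.9
// range_q('text/html')                      -> 1.0
```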
 
Bertilo Wennergren

Toby said:
Bertilo Wennergren wrote:
Most of the sample code I have seen for parsing the q value on the web has
been crap and usually interprets q=0.9 as q=0.

I believe you.
If you know of a regular expression that will interpret the q value
correctly[2], please let me know as I'd like to use it.

I once published a demo of my own PHP routines for this. You can have a
look at it here (I hope it's not crap):

<URL:http://groups.google.com/[email protected]&oe=UTF-8&output=gplain>

The code in the following page might be non-crap (I haven't checked it):

<URL:http://keystonewebsites.com/articles/mime_type.php>
 
Bertilo Wennergren

Toby said:
Bertilo Wennergren wrote:
That is indeed better than most I have seen, but still not perfect.
Consider:
Accept: application/xhtml+xml;charset=utf-8;q=1, */*;q=0.1

Oh dear... I had no idea "charset=..." could jump in there and upset the
apple cart. Should be easy to add in that eventuality though. But is
there yet more stuff that can appear there?

I had a really hard time finding out how all this works.
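Since any parameter can sit between the media type and `q`, one way around the problem is to stop matching the whole header with a single pattern and split it instead. Commas separate media ranges, semicolons separate parameters, and only a parameter named `q` is used. The following is a rough PHP sketch of that approach (the helper name `parse_accept` is hypothetical). It assumes no quoted parameter values containing commas or semicolons. If a media range appears twice, the later entry wins.

```php
<?php
// Hypothetical helper: split an Accept header into media range => q.
// Commas separate media ranges, semicolons separate parameters, and only
// a parameter named "q" is used, so charset (or anything else) may come
// before or after it. Assumes no quoted values containing "," or ";".
function parse_accept(string $header): array
{
    $result = array();
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', $part);
        $type = strtolower(trim(array_shift($pieces)));
        if ($type === '') {
            continue; // e.g. a trailing comma
        }
        $q = 1.0; // no q parameter means q=1
        foreach ($pieces as $param) {
            $kv = explode('=', $param, 2);
            if (count($kv) === 2 && strtolower(trim($kv[0])) === 'q') {
                $q = (float) trim($kv[1]);
            }
        }
        $result[$type] = $q; // a repeated media range overwrites the earlier one
    }
    return $result;
}

$prefs = parse_accept('application/xhtml+xml;charset=utf-8;q=1, */*;q=0.1');
// $prefs: 'application/xhtml+xml' => 1.0, '*/*' => 0.1
```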
 
Toby A Inkster

Bertilo said:
Oh dear... I had no idea "charset=..." could jump in there and upset the
apple cart. Should be easy to add in that eventuality though. But is
there yet more stuff that can appear there?

Any registered Content-Type parameter -- as far as any MIME type related
to XHTML is concerned this is limited only to charset.

Of course it would not be implausible for browsers to add in random
non-registered parameters (q is one of these!). For example, this
may be considered useful:

Accept: application/xhtml+xml; x-version=2.0; q=1.0,
    application/xhtml+xml; x-version=1.1; q=0.9,
    application/xhtml+xml; x-version=1.0; q=0.8,
    */*; q=0.1
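A header like this shows why a parser should ignore parameters it does not understand. Below is a rough PHP sketch (the helper name `best_q_by_type` is made up) that reduces each range to its bare media type and keeps the highest q seen for that type. Whether acting on the highest q is right depends on whether the server can actually deliver the version the parameter asks for; that policy is an assumption, not something proposed in the thread. The example uses the usual `name=value` parameter syntax.

```php
<?php
// Sketch only: collapse an Accept header onto bare media types
// (parameters such as charset or a hypothetical x-version are dropped),
// keeping the highest q seen for each type.
function best_q_by_type(string $header): array
{
    $best = array();
    foreach (explode(',', $header) as $part) {
        $pieces = array_map('trim', explode(';', $part));
        $type = strtolower(array_shift($pieces));
        if ($type === '') {
            continue;
        }
        $q = 1.0; // default when no q parameter is given
        foreach ($pieces as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        if (!isset($best[$type]) || $q > $best[$type]) {
            $best[$type] = $q;
        }
    }
    return $best;
}

$header = 'application/xhtml+xml; x-version=2.0; q=1.0, '
        . 'application/xhtml+xml; x-version=1.1; q=0.9, '
        . 'application/xhtml+xml; x-version=1.0; q=0.8, */*; q=0.1';
$best = best_q_by_type($header);
```

Here the three `application/xhtml+xml` entries collapse into one key with q=1.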
 
Bertilo Wennergren

Toby said:
Bertilo Wennergren wrote:
Any registered Content-Type parameter -- as far as any MIME type related
to XHTML is concerned this is limited only to charset.

I guess this would do the trick then:

<?php
// Read the User-Agent and Accept request headers (older PHP versions
// only provide $HTTP_SERVER_VARS).
if (isset($_SERVER['HTTP_USER_AGENT']))
    $browser = $_SERVER['HTTP_USER_AGENT'];
else if (isset($HTTP_SERVER_VARS['HTTP_USER_AGENT']))
    $browser = $HTTP_SERVER_VARS['HTTP_USER_AGENT'];
else
    $browser = '';

if (isset($_SERVER['HTTP_ACCEPT']))
    $accepts = $_SERVER['HTTP_ACCEPT'];
else if (isset($HTTP_SERVER_VARS['HTTP_ACCEPT']))
    $accepts = $HTTP_SERVER_VARS['HTTP_ACCEPT'];
else
    $accepts = '';

$accepts = strtolower($accepts);

if (substr_count($accepts, 'application/xhtml+xml') == 0) {
    // XHTML not mentioned at all: send HTML
    $media_type = 'text/html';
} else {
    // A media range without a q parameter counts as q=1. The [^,]*
    // lets other parameters (such as charset) come before q.
    $xhtml_q = 1;
    if (preg_match(
            "/application\/xhtml\+xml\s*;[^,]*\bq\s*=\s*([\d\.]+)/",
            $accepts, $m))
        $xhtml_q = (float) $m[1];

    $html_q = 1;
    if (preg_match(
            "/text\/html\s*;[^,]*\bq\s*=\s*([\d\.]+)/",
            $accepts, $m))
        $html_q = (float) $m[1];

    // Ties go to XHTML
    if ($xhtml_q >= $html_q)
        $media_type = 'application/xhtml+xml';
    else
        $media_type = 'text/html';
}

// Netscape 6 gets HTML regardless of its Accept header
if (substr_count($browser, 'Netscape6') > 0)
    $media_type = 'text/html';

header('Content-Type: ' . $media_type . '; charset=UTF-8');
?>
 
Michael Bauser

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

XHTML 1.1 (as opposed to 1.0) should be served as
application/xhtml+xml rather than text/html
Agreed.

and IE doesn't know what
to do with that content-type.
Agreed.

As most of the WWW uses IE that means
that XHTML 1.1 isn't suitable for the WWW.

Unless you content-negotiate. Give XHTML to the XHTML browsers, and
let the rest eat HTML.

Admittedly, there's no dire need to support XML browsers right now,
but it wouldn't kill people to start doing so, rather than waiting
until the need *is* dire. *Especially* if they're building new sites
and want to create URLs that are easy to negotiate.

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 6.5.1 for non-commercial use <http://www.pgp.com>

iQA+AwUBP7iT5HKbhCU9m6R7EQK1NACg7+QaEsuvANPqoqYEC+C9RuYob4oAmJBV
M8yAeiCvGvLUaZn4Iof9dUo=
=p2zw
-----END PGP SIGNATURE-----
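One practical detail when negotiating like this, not raised in the thread: the response should carry a `Vary` header naming the request headers the choice depended on. That stops shared caches from handing the XHTML version to a browser that cannot use it. Here is a small PHP sketch (the function name `negotiated_headers` is hypothetical). It only builds the header lines, which a real script would pass to `header()`. If the script also sniffs the User-Agent, as the code earlier in this thread does for Netscape6, that header belongs in `Vary` too.

```php
<?php
// Hypothetical helper: build the response header lines for a negotiated
// page. "Vary" lists the request headers the choice depended on, so a
// proxy cache does not serve one variant to a client that wanted the other.
function negotiated_headers(bool $sendXhtml, bool $usedUserAgent = false): array
{
    $type = $sendXhtml ? 'application/xhtml+xml' : 'text/html';
    $vary = $usedUserAgent ? 'Accept, User-Agent' : 'Accept';
    return array(
        'Content-Type: ' . $type . '; charset=UTF-8',
        'Vary: ' . $vary,
    );
}

// In a real script each line would be sent with header():
// foreach (negotiated_headers(true) as $line) { header($line); }
```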
 
Jukka K. Korpela

Unless you content-negotiate. Give XHTML to the XHTML browsers, and
let the rest eat HTML.

OK, so what do you give to a user agent that sends the following?

Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-powerpoint, application/vnd.ms-excel,
application/msword, */*

The Accept header is where the client does its part of content negotiation.
There is no expressed preference between the two alternatives you are
interested in, but an expressed willingness to receive anything, so to be
logical, you should send the modern stuff, right?
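To put numbers on this example: RFC 2616 (section 14.1) says that when several media ranges match a type, the most specific one applies. With this header, both `text/html` and `application/xhtml+xml` fall through to `*/*` and get q=1. The following is a PHP sketch of that lookup (the helper name `effective_q` is hypothetical). For simplicity it ignores media-type parameters such as `level`, which the RFC also counts toward specificity.

```php
<?php
// Sketch of the "most specific range wins" rule: an exact type/subtype
// entry beats type/*, which beats */*. Returns 0.0 when nothing matches.
// Media-type parameters are ignored here (a simplification).
function effective_q(string $header, string $type): float
{
    $type = strtolower($type);
    $slash = strpos($type, '/');
    $major = ($slash === false ? $type : substr($type, 0, $slash)) . '/*';

    $found = array();
    foreach (explode(',', $header) as $part) {
        $pieces = array_map('trim', explode(';', $part));
        $range = strtolower(array_shift($pieces));
        $q = 1.0; // no q parameter means q=1
        foreach ($pieces as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $found[$range] = $q;
    }

    foreach (array($type, $major, '*/*') as $candidate) {
        if (isset($found[$candidate])) {
            return $found[$candidate];
        }
    }
    return 0.0;
}

$ie = 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, '
    . 'application/vnd.ms-powerpoint, application/vnd.ms-excel, '
    . 'application/msword, */*';
// effective_q($ie, 'text/html') and effective_q($ie, 'application/xhtml+xml')
// are both 1.0, so the header expresses no preference between them.
```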
 
Chris Morris

Jukka K. Korpela said:
OK, so what do you give to a user agent that sends the following?

Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-powerpoint, application/vnd.ms-excel,
application/msword, */*

The Accept header is where the client does its part of content negotiation.
There is no expressed preference between the two alternatives you are
interested in, but an expressed willingness to receive anything, so to be
logical, you should send the modern stuff, right?

Right. Broken browsers like IE6 shouldn't be a concern. And it's not
as if it's lying about the */* either - it will let the user download
it, open it in some XHTML-aware program (or Notepad) and view it.

Hmm, given that there are no q-values in that header, it could easily be
simplified to Accept: */* anyway, I think.
 
Toby A Inkster

Jukka said:
OK, so what do you give to a user agent that sends the following?

Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-powerpoint, application/vnd.ms-excel,
application/msword, */*

Pass the HTML version through some sort of html2word or html2jpeg filter?
 
Michael Bauser

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

OK, so what do you give to a user agent that sends the following?

Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-powerpoint, application/vnd.ms-excel,
application/msword, */*

Given that situation, Apache is apparently designed to default to
whichever eligible file is *smaller*. Fortunately, that's always
the HTML version on my server, so I haven't broken MSIE in any
instances that I know of. (Right now, I'm just using tidy to convert
HTML to XHTML.)
The Accept header is where the client does its part of content
negotiation. There is no expressed preference between the two
alternatives you are interested in, but an expressed willingness
to receive anything, so to be logical, you should send the modern
stuff, right?

My gut reaction is to go the other way: if a server doesn't know
what's best, it should send the most widely-implemented format
(which is what I do if I'm content-negotiating within a CGI script,
rather than depending on Apache to do it).

The closest thing to a precedent (that I can recall) is MIME's
"multipart/alternative" format (defined in RFC 2046), which presents
the same content in different formats, but in *increasing* order of
preference. (That is, the "richest format" is the last one
presented.) The reasoning is different, but the effect is similar:
a multipart/alternative message *starts* with the
widely-implemented "text/plain" version, so that people using
non-MIME-compliant mail agents don't have to search through layers of
junk to find the content.

I'm not sure that all made sense, but it's working so far. Where I've
been testing content-negotiation (including the home page of
bauser.com), Mozilla and Opera get XHTML while MSIE gets HTML.

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 6.5.1 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7mpq3KbhCU9m6R7EQKkVACcDlvsy7mLPCLxypC5Vf1Q89gUQUYAmgJI
4BxLZ4e2JOtbzrxR9ifOlUe7
=+J9H
-----END PGP SIGNATURE-----
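Michael's "when in doubt, send the most widely-implemented format" rule can be sketched in PHP as follows (the function name `choose_media_type` is hypothetical). XHTML goes out only when the client rates it strictly higher than `text/html`, so a `*/*`-only header like the one Jukka quotes, where both come out at q=1, gets HTML. This deliberately differs from the `>=` comparison in the PHP code posted in this thread, which sends XHTML on a tie. The q values are assumed to have been worked out already from the Accept header.

```php
<?php
// Sketch of a "prefer HTML on a tie" policy. The q values are assumed
// to have been computed already (e.g. by an Accept-header parser).
function choose_media_type(float $xhtmlQ, float $htmlQ): string
{
    if ($xhtmlQ > $htmlQ) {
        return 'application/xhtml+xml'; // client clearly prefers XHTML
    }
    return 'text/html'; // ties, including "*/*"-only headers, get HTML
}
```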
 
Bertilo Wennergren

Toby A Inkster wrote (in "alt.html" about my PHP code to choose media type
for (X)HTML pages):
That is indeed better than most I have seen, but still not perfect.

Here's my latest try. It's hopefully better.

The function returns "true" if the UA (according to the Accept header)
seems to prefer "application/xhtml+xml" rather than "text/html" (or
gives them equal preference); otherwise it returns "false". It makes an
exception (returning "false") if the User-Agent string contains "Netscape6".

Feel free to cut the code to pieces (or to use it).

What do you say, Toby?

======================================================================

function UAPrefersXHTML_XML() {
    $ac = strtolower(getenv('HTTP_ACCEPT'));
    $ua = getenv('HTTP_USER_AGENT');

    // Netscape 6 is always sent HTML
    if (substr_count($ua, 'Netscape6') > 0) {
        return false;
    }
    // XHTML not mentioned at all: send HTML
    if (substr_count($ac, 'application/xhtml+xml') == 0) {
        return false;
    }

    // q for XHTML; a range without a q parameter counts as q=1
    $xhtml_q = 1;
    if (preg_match("/application\/xhtml\+xml\s*;[^,]*\bq\s*=\s*([\d\.]+)/", $ac, $m)) {
        $xhtml_q = (float) $m[1];
    }

    // q for HTML: use the most specific range that is present. Falling
    // back to text/* or */* only when text/html is absent keeps a bare
    // "text/html" (implicitly q=1) from inheriting the q of "*/*".
    if (substr_count($ac, 'text/html') > 0) {
        $range = 'text\/html';
    } else if (substr_count($ac, 'text/*') > 0) {
        $range = 'text\/\*';
    } else {
        $range = '\*\/\*';
    }
    $html_q = 1;
    if (preg_match("/" . $range . "\s*;[^,]*\bq\s*=\s*([\d\.]+)/", $ac, $m)) {
        $html_q = (float) $m[1];
    }

    // Equal preference counts as preferring XHTML
    return $xhtml_q >= $html_q;
}
======================================================================
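For testing, it can help to pass the header strings in rather than reading them with `getenv()`. Below is a hypothetical variant (`prefers_xhtml` is not Bertilo's name). It keeps his rules: the Netscape6 exception, XHTML must be listed explicitly, ties go to XHTML, and a missing HTML entry counts as q=1. It parses each range separately, so a bare `text/html` without `q` counts as q=1 instead of falling through to `*/*`.

```php
<?php
// Hypothetical testable variant of UAPrefersXHTML_XML(): the Accept and
// User-Agent strings are passed in instead of being read with getenv().
function prefers_xhtml(string $accept, string $ua): bool
{
    if (strpos($ua, 'Netscape6') !== false) {
        return false; // same exception as the original function
    }

    // media range => q, with q defaulting to 1
    $q = array();
    foreach (explode(',', strtolower($accept)) as $part) {
        $pieces = array_map('trim', explode(';', $part));
        $range = array_shift($pieces);
        $q[$range] = 1.0;
        foreach ($pieces as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/', $param, $m)) {
                $q[$range] = (float) $m[1];
            }
        }
    }

    if (!isset($q['application/xhtml+xml'])) {
        return false; // XHTML must be named explicitly, as in the original
    }

    // Most specific HTML range wins; none present means q=1 (as before)
    $htmlQ = 1.0;
    foreach (array('text/html', 'text/*', '*/*') as $candidate) {
        if (isset($q[$candidate])) {
            $htmlQ = $q[$candidate];
            break;
        }
    }
    return $q['application/xhtml+xml'] >= $htmlQ; // ties go to XHTML
}
```

With `application/xhtml+xml;q=0.9, text/html, */*;q=0.1` this returns false, since the bare `text/html` entry counts as q=1.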
 
