Robert Maas, see http://tinyurl.com/uh3t
<img width="1" height="1" alt=""/>
appears around character position 9202 in the source from Google
Groups advanced search when there's no such article matching the
search. Everything looks OK up to the / character. What is that
doing there?? Why?? In SGML it'd be a NET (is that correct?), which
would totally screw up the parse here (right?).
Here's the URL that I used to fetch this bad-looking HTML:
<http://groups.google.com/[email protected]>
When I pass it to the W3C validator, it says:
Result: Failed validation, 224 errors
although I suspect most of them are because the DOCTYPE declaration
is totally wrong, claiming the Web page to be XHTML when it's
nowhere near close to it.
I tried editing a copy to change the DOCTYPE to
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3c.org/TR/html4/loose.dtd">
here:
<http://www.rawbw.com/~rem/NewPub/try-search.html>
When I pass that to the W3C validator, it says:
Result: Failed validation, 79 errors
which I suppose is a teeny bit better?
I tried a couple other publicized doctypes, but neither of these
helped much either:
<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<http://www.rawbw.com/~rem/NewPub/try-search-2.html>
Result: Failed validation, 198 errors
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<http://www.rawbw.com/~rem/NewPub/try-search-3.html>
Result: Failed validation, 97 errors
Is there any DOCTYPE/DTD appropriate for this Google Groups page,
or is it utter trash regardless of the DOCTYPE/DTD?
Meanwhile I'm going to flush the / character from the original
WebPage I downloaded so that the HTML parser I wrote a few days ago
will accept it ... done, and parser likes it now!!
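
For anyone who wants to do the same cleanup mechanically instead of by
hand, here is a rough sketch in Python. It is not the parser mentioned
above, just an illustration: the function name strip_net_slash and the
regular expression are my own assumptions, and it only handles plain
start tags ending in "/>", not comments, scripts, or CDATA sections.

```python
import re

# Hypothetical helper: matches a start tag written in XHTML empty-element
# style, i.e. ending in "/>". Quoted attribute values are consumed as whole
# units so that a "/" or ">" inside a value is not mistaken for the tag end.
_SELF_CLOSING = re.compile(
    r"""<([A-Za-z][A-Za-z0-9]*)                  # tag name
        ((?:\s+[^\s"'>/=]+                       # attribute name
            (?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?   # optional value
        )*)
        \s*/>                                    # the trailing slash to drop
    """,
    re.VERBOSE,
)


def strip_net_slash(html: str) -> str:
    """Rewrite '<tag attrs/>' as '<tag attrs>' and leave everything else alone."""
    return _SELF_CLOSING.sub(r"<\1\2>", html)


if __name__ == "__main__":
    sample = '<img width="1" height="1" alt=""/>'
    print(strip_net_slash(sample))
```

Run on the tag quoted at the top of this article, it drops only the
slash and leaves the attributes as they were. A slash inside a quoted
attribute value, such as href="x/", is left untouched.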