Shortest possible valid HTML document?

Stewart Gordon

I've just been experimenting with creating the shortest possible HTML
document that passes validation. Here's what I've come up with:

----------
<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN"><title//s
----------

Of course, it's only valid given a character encoding in the HTTP
headers. And the W3C validator's direct input interface happens to
treat there as being one....

Further challenges:

1. Find a browser that renders this correctly!

2. Find the shortest valid code in each version of (X)HTML.

Stewart.
 
Michael Winter

> I've just been experimenting with creating the shortest possible HTML
> document that passes validation.

We've done that before. It was Toby's challenge, and I seem to remember
'winning' (though Jukka picked a few holes). :)

> Here's what I've come up with:

It can be shorter. See

Subject: Minimal HTML
Author: Toby Inkster
Date: 2004-11-24 21:56:10
Message-ID: (e-mail address removed)

and the discussion that followed. Note that if you read it through Google
Groups, the post order is slightly messed up.

[snip]
> 1. Find a browser that renders this correctly!

None of the major ones do (I mention that in the thread). You'd have to
find one implementing a proper SGML parser.

> 2. Find the shortest valid code in each version of (X)HTML.

They'll all be variations on the same theme, just changing the FPI and
coping with required content. For example, in Strict document types,
your 's' (my '.') could be replaced with <p// (HTML) or <p/> (XHTML).
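
For readers unfamiliar with the shorthand, the `<title//` and `<p//` forms rely on SGML's null end tag (NET) feature: `<name/` opens the element and the next `/` closes it. The following is only a rough sketch of that expansion, not a real SGML parser; the function name `expand_net` is made up for illustration, and it assumes the element content contains no `/` or `<`.

```python
import re

# Matches a null-end-tag (NET) element such as <title/Hello/ or <p//.
# Simplification: the content may not itself contain "/" or "<".
NET_ELEMENT = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<]*)/")


def expand_net(markup: str) -> str:
    """Rewrite SGML NET shorthand into explicit start and end tags."""
    return NET_ELEMENT.sub(r"<\1>\2</\1>", markup)


print(expand_net("<title//."))  # <title></title>.
print(expand_net("<p//"))       # <p></p>
```

Read this way, the body of the minimal document is an empty title element followed by a single character of content, which is why a tag-soup browser that does not implement NET renders it differently.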

Mike
 
Stewart Gordon

Michael said:
> It can be shorter. See
>
> Subject: Minimal HTML
> Author: Toby Inkster
> Date: 2004-11-24 21:56:10
> Message-ID: (e-mail address removed)

It's a shame message IDs look exactly like email addresses! NTS
SeaMonkey wanted to address an email to it when I clicked it.

> and the discussion that followed. Note that if you read it through Google
> Groups, the post order is slightly messed up.
<snip>

Indeed. But for anyone who wants to get at it quickly anyway:

http://tinyurl.com/rz52q

I'd no idea that there was a version of HTML that's just called HTML.
But that discussion has helped me to get down to 49 characters:

<!DOCTYPE HTML SYSTEM"http://eb.cx/2ef"><title//.

Now try and beat that!

Stewart.
 
Michael Winter

On 31/05/2006 14:57, Stewart Gordon wrote:

[snip]
> <!DOCTYPE HTML SYSTEM"http://eb.cx/2ef"><title//.
                      ^^
The SGML grammar requires at least one white space character between the
'SYSTEM' literal (or public identifier) and the system identifier, so I
don't think that quite qualifies. :p

[73] external identifier (10.1.6, 379:1) =
    ( ( "SYSTEM"
      | ( "PUBLIC",
          +ps [65],
          public identifier [74] ) ),
      ?( +ps [65],
         system identifier [75] ) )

-- #73 External Identifier, SGML Productions[1]

> Now try and beat that!

As far as I know, the system identifier is a URI reference so a relative
reference (such as a single letter) is permissible, but I could very
well be wrong. In the previous discussion, Jukka seemed to imply that an
absolute URL was necessary, but that could have been due to a limitation
of the W3C Validator (which was the required validator for the
challenge). It is based loosely upon James Clark's SP which only
supports the http scheme (or so the documentation says[2]).

So, if a relative reference is allowed, surely /the/ shortest has to be:

<!DOCTYPE html SYSTEM "d"><title//. (35 bytes)

?

I can't believe I'm doing this again. *shakes head*

Mike


[1] SGML Productions
<ftp://ftp.ifi.uio.no/pub/SGML/productions>
[2] "System identifiers" in SP
<http://www.jclark.com/sp/sysid.htm>
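
The character counts quoted in this thread are easy to check mechanically. A small sketch that measures the three candidates posted so far; the dictionary labels are descriptive names invented here, and the documents are assumed to be pure ASCII, so bytes and characters coincide.

```python
# Candidate documents copied verbatim from the posts above.
candidates = {
    "HTML 2.0 public identifier": '<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN"><title//s',
    "short absolute system identifier": '<!DOCTYPE HTML SYSTEM"http://eb.cx/2ef"><title//.',
    "relative system identifier": '<!DOCTYPE html SYSTEM "d"><title//.',
}


def byte_length(document: str) -> int:
    """Size of the document in bytes, assuming it is pure ASCII."""
    return len(document.encode("ascii"))


for label, document in candidates.items():
    print(f"{byte_length(document):3d}  {label}")
```

The second and third entries come out at 49 and 35, matching the figures given in the posts.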
 
David Håsäther

Michael Winter said:
> On 31/05/2006 14:57, Stewart Gordon wrote:
>
> [snip]
>> <!DOCTYPE HTML SYSTEM"http://eb.cx/2ef"><title//.
>                      ^^
> The SGML grammar requires at least one white space character
> between the 'SYSTEM' literal (or public identifier) and the system
> identifier, so I don't think that quite qualifies. :p
>
> [73] external identifier (10.1.6, 379:1) =
>     ( ( "SYSTEM"
>       | ( "PUBLIC",
>           +ps [65],
>           public identifier [74] ) ),
>       ?( +ps [65],
>          system identifier [75] ) )

No, it requires a _parameter separator_. However, those are not
required in all circumstances. The SGML Handbook (372:15) says this:

| A required ps that is adjacent to a delimiter or another ps can be
| omitted if no ambiguity would be created thereby.

Therefore, the document type declaration above is correct.

> As far as I know, the system identifier is a URI reference so a
> relative reference (such as a single letter) is permissible, but I
> could very well be wrong.

You're right.

> In the previous discussion, Jukka seemed
> to imply that an absolute URL was necessary, but that could have
> been due to a limitation of the W3C Validator (which was the
> required validator for the challenge).

Yes, I believe the W3C validator only supports absolute URIs.

> So, if a relative reference is allowed, surely /the/ shortest has
> to be:
>
> <!DOCTYPE html SYSTEM "d"><title//. (35 bytes)

With a properly set up catalog, you can do it even shorter since e.g.
"<!DOCTYPE HTML>" is a syntactically correct document type declaration.
Something like the following should be able to validate against any
HTML DTD:

<!doctype p><p>

Again, this needs a properly set up catalog.

I'm not going to dig deeper into this though, since I don't really see
the point in this exercise :)
 
Michael Winter

[snip]
>> The SGML grammar requires at least one white space character
>> between the 'SYSTEM' literal (or public identifier) and the system
>> identifier, so I don't think that quite qualifies. :p
[snip]

> No, it requires a _parameter separator_.

Yes, I went too far there. Even /if/ the separator couldn't be omitted,
it wouldn't necessarily require a white space character; anything
matching that production should do.

> However, those are not required in all circumstances.

I stand corrected on both counts. Thank you. :)

[snip]
> <!doctype p><p>
>
> Again, this needs a properly set up catalog.

I wondered if there might be trickery along those lines. However, a(n
irrelevant) philosophical question: would that actually count as a valid
HTML document? Yes, it may validate against a HTML DTD, but without the
html element as the root element, is it still HTML, or just a fragment?

> I'm not going to dig deeper into this though, since I don't really
> see the point in this exercise :)

It's an interesting diversion (and more pleasant than the content of
another composition window I have open :-( ).

Mike
 
David Håsäther

Michael Winter said:
>> <!doctype p><p>
> [...]
>
> I wondered if there might be trickery along those lines. However,
> a(n irrelevant) philosophical question: would that actually count
> as a valid HTML document? Yes, it may validate against a HTML DTD,
> but without the html element as the root element, is it still
> HTML, or just a fragment?

Actually, it's impossible to tell from the doctype declaration alone,
whether a document is this or that. For a lengthier discussion on this
topic, see
<http://groups.google.com/group/comp.text.sgml/msg/c3e53dee2c152a81>
 
Stewart Gordon

David Håsäther wrote:
> No, it requires a _parameter separator_. However, those are not
> required in all circumstances. The SGML Handbook (372:15) says this:
>
> | A required ps that is adjacent to a delimiter or another ps can be
> | omitted if no ambiguity would be created thereby.
<snip>

"A required ps" ... "can be omitted". What contradiction.

Stewart.
 
Andy Dingley

Stewart said:
> David Håsäther wrote:
>
> <snip>
>
> "A required ps" ... "can be omitted". What contradiction.

It's not a contradiction, it's a question of levels in the structure.
At a high level it's required (parameters must be distinguishable), at
a low lexical level it isn't (something else already makes them
distinct).
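
That distinction can be illustrated with a toy check, offered only as a sketch and not as an SGML parser. It models just the one case discussed in this thread, the text following the `SYSTEM` keyword, and treats the parameter separator as either present (white space) or safely omitted because a literal delimiter follows. The function name is hypothetical, and a real ps may also take other forms that are ignored here.

```python
import re

# Literal delimiters in the reference concrete syntax: lit (") and lita (').
LITERAL_DELIMITERS = ('"', "'")


def system_keyword_is_delimited(declaration: str) -> bool:
    """Return True if SYSTEM is followed by white space or a literal delimiter.

    Either way the keyword and the system identifier stay distinguishable,
    which is the "no ambiguity" condition in the quoted rule.
    """
    match = re.search(r"SYSTEM", declaration)
    if match is None:
        return False
    next_char = declaration[match.end():][:1]
    return next_char.isspace() or next_char in LITERAL_DELIMITERS


print(system_keyword_is_delimited('<!DOCTYPE HTML SYSTEM"http://eb.cx/2ef">'))  # True
print(system_keyword_is_delimited('<!DOCTYPE html SYSTEM "d">'))                # True
print(system_keyword_is_delimited('<!DOCTYPE html SYSTEMd>'))                   # False
```

In the first call nothing separates the keyword from the literal, yet the opening quote already marks the boundary, so the lexical separator adds nothing.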
 
