John W. Kennedy
I'm in the process of de-framing a website with a couple thousand
pages of static HTML, and I've been building a tool that works pretty
well, based on javax.swing.text.html.parser technology, which I've never
used before. Large parts of the website are HTML 3.2, and everything's
just ducky. But there are a good many pages that are HTML 4.0, and my
program goes completely ca-ca on them, because I'm stuck with only the
built-in html32.bdtd file.
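For context, here is a minimal sketch of how the javax.swing.text.html.parser machinery is usually driven, assuming a standard JDK. `ParserDelegator` loads its default DTD from the bundled html32.bdtd resource and reports what it finds to an `HTMLEditorKit.ParserCallback`, which is why HTML 4.0 markup gets judged against the 3.2 rules. The class name `TagLister` and method `startTags` are hypothetical and used only for illustration; this is not the tool described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: lists the start tags the built-in parser reports.
public class TagLister {
    public static List<String> startTags(String html) throws IOException {
        List<String> tags = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // HTML.Tag.toString() gives the lowercase tag name, e.g. "b".
                // Tags the DTD treats as implied may also be reported here.
                tags.add(t.toString());
            }
        };
        // ParserDelegator uses the HTML 3.2 DTD (html32.bdtd) by default.
        // The final argument tells it to ignore any <meta> charset declaration.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return tags;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(startTags("<html><body><b><i>blah</b></i></body></html>"));
    }
}
```

Swapping in a different DTD would mean subclassing `ParserDelegator` or using `DocumentParser` with another `DTD` object, which is exactly where a missing html401.bdtd becomes the sticking point.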
A) Is there any good reason that Sun didn't make up an html401.bdtd file
yonks ago?
B) Has anyone an html401.bdtd file to share?
C) Is there any other solution available? (No XML-based tool is going to
come close to handling this stuff. It's all hand-written, not by me, and
it was painful enough doing various text-based global fixes to make it
parse properly as 3.2 -- lots of <b><i>blah</b></i> and that sort of
thing.)