Java API for correcting malformed HTML code

MCP

Hello,
What Java APIs are out there that can simply correct malformed
HTML code, i.e. take an input stream of badly formed HTML and produce
an output stream of clean HTML code (parsable by the Swing HTML
parser)?
 
Thomas Weidenfeller

MCP said:
What Java APIs are out there that can simply correct malformed
HTML code, i.e. take an input stream of badly formed HTML and produce
an output stream of clean HTML code (parsable by the Swing HTML
parser)?

Maybe this can help: http://jtidy.sourceforge.net/ No idea if it fulfills
all your requirements.

/Thomas
 
Roedy Green

What Java APIs are out there that can simply correct malformed
HTML code, i.e. take an input stream of badly formed HTML and produce
an output stream of clean HTML code (parsable by the Swing HTML
parser)?

I have been bugging the HTMLValidator people to write such a beast. I
figured it could save me a ton of work if it did simple, unambiguous
corrections like inserting a missing </li> or converting a stray & to &amp;.

The author's fear is making a change that the user did not want. He did
not want to be morally liable for messing up the source.

I have done a number of one-shot programs to clean up various problems
in my website. They do it all with indexOf and substring. If you are
just trying to correct a single problem at a time, it can be pretty
simple.
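
To illustrate the kind of one-shot fixer described above, here is a minimal sketch that converts stray & characters to &amp; using indexOf and substring. It is not Roedy's actual code; the class name AmpersandFixer and its methods are hypothetical. It assumes that anything shaped like &name; or &#123; or &#x1F; is already a valid entity and leaves it alone, which means an entity written without its semicolon (for example &copy) would be escaped too.

```java
final class AmpersandFixer {

    /**
     * True if the '&' at index amp starts something shaped like an entity:
     * &name; or &#123; or &#x1F; (assumption: no check against the real
     * list of HTML entity names).
     */
    static boolean looksLikeEntity(String s, int amp) {
        int semi = s.indexOf(';', amp + 1);
        if (semi < 0) {
            return false; // no terminating semicolon anywhere after the '&'
        }
        String body = s.substring(amp + 1, semi);
        return body.matches("#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*");
    }

    /** Replaces every '&' that does not start an entity with "&amp;". */
    static String escapeStrayAmpersands(String html) {
        StringBuilder out = new StringBuilder(html.length() + 16);
        int from = 0;
        int amp;
        while ((amp = html.indexOf('&', from)) >= 0) {
            out.append(html.substring(from, amp));           // text before the '&'
            out.append(looksLikeEntity(html, amp) ? "&" : "&amp;");
            from = amp + 1;                                   // continue after the '&'
        }
        out.append(html.substring(from));                     // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String bad = "<a href=\"x?a=1&b=2\">Fish & chips &copy; 2004</a>";
        // prints: <a href="x?a=1&amp;b=2">Fish &amp; chips &copy; 2004</a>
        System.out.println(escapeStrayAmpersands(bad));
    }
}
```

Because it only ever touches one kind of problem, a fixer like this is easy to check by diffing its output against the original file before overwriting anything.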
 
Roedy Green

(whispers) The W3C definition for the <li>
is that it does not require a closing </li>..

what about </td> and </tr>?

Anyway I like to have the HTML consistent.
 
Andrew Thompson

what about </td> and </tr>?

I am pretty sure they need to be
explicitly closed. (shrugs) If in doubt,
leave one out and throw it at the validator
(which is usually quicker than finding the
element on W3C's site)

Anyway I like to have the HTML consistent.

;-) I know what you mean, it has taken
some training to *prevent* myself from
typing </p> and </li>..
 
Andrew Thompson

Why bother? All new browsers..

...not all browsers are new, not all users
can update, not all sites can afford to
turn away customers just because their
browser is not flavour of the month.

That's why.
 
arne thormodsen

;-) I know what you mean, it has taken
some training to *prevent* myself from
typing </p> and </li>..

Why bother? All new browsers interpret XHTML properly, so you might
as well make your HTML well-formed as XML too. Then you can use XML
tools to process it.

--arne
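
As a rough illustration of the "use XML tools" point above, the sketch below feeds a markup fragment to the JDK's standard XML parser. The class name XhtmlCheck and its methods are hypothetical. It assumes the input has no DOCTYPE and uses no HTML-only named entities such as &nbsp;, since a plain XML parser does not know them without the XHTML DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XhtmlCheck {

    /** Parses the markup as XML; throws IllegalArgumentException if it is not well-formed. */
    static Document parse(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }

    /** Counts elements with the given tag name, e.g. "li". */
    static int countElements(String xhtml, String tagName) {
        return parse(xhtml).getElementsByTagName(tagName).getLength();
    }

    /** True if the markup parses as XML. */
    static boolean isWellFormed(String xhtml) {
        try {
            parse(xhtml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String closed = "<ul><li>one</li><li>two</li></ul>";
        String unclosed = "<ul><li>one<li>two</ul>";
        System.out.println(countElements(closed, "li")); // prints 2
        System.out.println(isWellFormed(unclosed));      // prints false
    }
}
```

The second input is legal HTML 4 with the optional </li> omitted, yet an XML parser rejects it, which is the trade-off being argued about in this thread.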
 
Christophe Vanfleteren

Andrew said:
..not all browsers are new, not all users
can update, not all sites can afford to
turn away customers just because their
browser is not flavour of the month.

That's why.

I'm pretty sure even Netscape 4.7 or Lynx interprets </p> and </li>
correctly. Even pure XHTML should pose no problem for those, when you write
the empty elements like <br> as <br /> instead of <br/>. Any browser better
than those (that's all of the currently used browsers :) should have no
problems if you close your tags.

As it says in the spec, the closing tags are not *required*; it doesn't say
that they shouldn't be present. And the advantages of writing XML
compatible HTML are bigger than those of adjusting to the lowest common
denominator IMHO.

Have you got any example of a browser which breaks when you add the optional
closing tags?
 
Steven J Sobol

Christophe Vanfleteren said:
I'm pretty sure even Netscape 4.7 or Lynx interprets </p> and </li>
correctly.

I can confirm that both do. I always use <p></p> and <li></li> in my HTML.
 
Andrew Thompson

....
I'm pretty sure even Netscape 4.7 or Lynx interprets </p> and </li>
correctly. Even pure XHTML should pose no problem for those, when you write
the empty elements like <br> as <br /> instead of <br/>.

Oh, alright,.. I suppose I tuned out at
the 'new browsers' comment.

I had rejected XHTML earlier for some reason
...no 'target' for 'href's.. no applet tags or
something.. I do not quite remember.

Maybe I should take another look..

[ ..but damn-it, if it does not work on
my NN 4.08, it is *out*! ;-) ]
 
