Java API for correcting malformed HTML code

Discussion in 'Java' started by MCP, Jun 9, 2004.

  1. MCP

    MCP Guest

    Hello,
    What are the Java APIs out there that can simply correct malformed
    HTML code, like take an input stream of badly formed HTML and produce
    an output stream of clean HTML code (parsable by the Swing HTML
    parser) ?
    MCP, Jun 9, 2004
    #1

  2. MCP wrote:
    > What are the Java APIs out there that can simply correct malformed
    > HTML code, like take an input stream of badly formed HTML and produce
    > an output stream of clean HTML code (parsable by the Swing HTML
    > parser) ?


    Maybe this can help http://jtidy.sourceforge.net/ No idea if it fulfills
    all your requirements.

    /Thomas
    Thomas Weidenfeller, Jun 9, 2004
    #2

  3. Roedy Green

    Roedy Green Guest

    On 9 Jun 2004 06:03:20 -0700, MCP wrote or quoted:

    >What are the Java APIs out there that can simply correct malformed
    >HTML code, like take an input stream of badly formed HTML and produce
    >an output stream of clean HTML code (parsable by the Swing HTML
    >parser) ?


    I have been bugging the HTMLValidator people to write such a beast. I
    figured it could save me a ton of work if it did simple unambiguous
    corrections like insert missing </li> or convert stray & to &amp;.

    The author's fear is making a change that the user did not want. He does
    not want to be morally liable for messing up the source.

    I have done a number of one shot programs to clean up various problems
    in my website. They do it all with indexOf and substring. If you are
    just trying to correct a single problem at a time, it can be pretty
    simple.
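
    A one-shot fixer of this kind might look like the sketch below. It is
    not one of the programs mentioned above: the class and method names
    (AmpersandFixer, escapeStrayAmpersands, looksLikeEntity) are hypothetical,
    and the rule for what counts as an existing entity is an assumption
    chosen for illustration. It uses only indexOf and substring to turn a
    stray & into &amp; while leaving things like &amp; and &#169; alone.

```java
/** Hypothetical one-shot cleaner: escapes stray ampersands in HTML text. */
final class AmpersandFixer {

    /** Returns html with every '&' that does not start an entity replaced by "&amp;". */
    static String escapeStrayAmpersands(String html) {
        StringBuilder out = new StringBuilder(html.length() + 16);
        int from = 0;
        while (true) {
            int amp = html.indexOf('&', from);
            if (amp < 0) {
                // No more ampersands: copy the tail and finish.
                out.append(html.substring(from));
                return out.toString();
            }
            out.append(html.substring(from, amp));
            int semi = html.indexOf(';', amp + 1);
            if (semi > amp + 1 && looksLikeEntity(html.substring(amp + 1, semi))) {
                // Already an entity such as &amp; or &#38;: copy it unchanged.
                out.append(html.substring(amp, semi + 1));
                from = semi + 1;
            } else {
                // Stray ampersand: escape it and continue after it.
                out.append("&amp;");
                from = amp + 1;
            }
        }
    }

    /** Assumed rule: a short name (letters/digits), "#" + digits, or "#x" + hex digits. */
    static boolean looksLikeEntity(String body) {
        if (body.isEmpty() || body.length() > 10) {
            return false;
        }
        if (body.charAt(0) == '#') {
            boolean hex = body.length() > 2
                    && (body.charAt(1) == 'x' || body.charAt(1) == 'X');
            int start = hex ? 2 : 1;
            if (start >= body.length()) {
                return false;
            }
            for (int i = start; i < body.length(); i++) {
                if (Character.digit(body.charAt(i), hex ? 16 : 10) < 0) {
                    return false;
                }
            }
            return true;
        }
        if (!Character.isLetter(body.charAt(0))) {
            return false;
        }
        for (int i = 1; i < body.length(); i++) {
            if (!Character.isLetterOrDigit(body.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(escapeStrayAmpersands("Web & IT Help"));
    }
}
```

    The sketch does not know about <script> blocks, comments, or
    attribute values, and text such as "R&D;" would be mistaken for an
    entity and left alone, so it only suits the single-problem cleanups
    described above.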

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Jun 9, 2004
    #3
  4. On Wed, 09 Jun 2004 20:54:17 GMT, Roedy Green wrote:

    > ..it could save me a ton of work if it did simple unambiguous
    > corrections like insert missing </li>


    (whispers) The W3C definition of <li>
    is that it does not require a closing </li>..

    <http://www.w3.org/TR/1999/REC-html401-19991224/struct/lists.html#didx-list>

    --
    Andrew Thompson
    http://www.PhySci.org/ Open-source software suite
    http://www.PhySci.org/codes/ Web & IT Help
    http://www.1point1C.org/ Science & Technology
    Andrew Thompson, Jun 10, 2004
    #4
  5. Roedy Green

    Roedy Green Guest

    On Thu, 10 Jun 2004 04:03:36 GMT, Andrew Thompson
    wrote or quoted:

    >(whispers) The W3C definition of <li>
    >is that it does not require a closing </li>..


    what about </td> and </tr>?

    Anyway I like to have the HTML consistent.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Jun 10, 2004
    #5
  6. On Thu, 10 Jun 2004 06:14:58 GMT, Roedy Green wrote:

    > On Thu, 10 Jun 2004 04:03:36 GMT, Andrew Thompson
    > wrote or quoted:
    >
    >>(whispers) The W3C definition of <li>
    >>is that it does not require a closing </li>..

    >
    > what about </td> and </tr>?


    I am pretty sure they need to be
    explicitly closed. (shrugs) If in doubt,
    leave one out and throw it at the validator
    (which is usually quicker than finding the
    element on W3C's site)

    > Anyway I like to have the HTML consistent.


    ;-) I know what you mean, it has taken
    some training to *prevent* myself from
    typing </p> and </li>..

    --
    Andrew Thompson
    http://www.PhySci.org/ Open-source software suite
    http://www.PhySci.org/codes/ Web & IT Help
    http://www.1point1C.org/ Science & Technology
    Andrew Thompson, Jun 10, 2004
    #6
  7. On Thu, 10 Jun 2004 18:37:46 GMT, arne thormodsen wrote:

    >> ;-) I know what you mean, it has taken
    >> some training to *prevent* myself from
    >> typing </p> and </li>..
    >>

    >
    > Why bother? All new browsers..


    ...not all browsers are new, not all users
    can update, not all sites can afford to
    turn away customers just because their
    browser is not flavour of the month.

    That's why.

    --
    Andrew Thompson
    http://www.PhySci.org/ Open-source software suite
    http://www.PhySci.org/codes/ Web & IT Help
    http://www.1point1C.org/ Science & Technology
    Andrew Thompson, Jun 10, 2004
    #7
  8. >
    > ;-) I know what you mean, it has taken
    > some training to *prevent* myself from
    > typing </p> and </li>..
    >


    Why bother? All new browsers interpret XHTML properly, so you might
    as well make your HTML well-formed as XML too. Then you can use XML
    tools to process it.
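
    As a rough illustration of what "use XML tools" can mean in Java,
    the sketch below feeds a well-formed fragment to the JDK's own XML
    parser and counts elements. The class and method names
    (XhtmlListReader, countElements, isWellFormed) are hypothetical, and
    the fragment has no DOCTYPE on purpose: with an XHTML DOCTYPE the
    parser may try to fetch the DTD over the network.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: treats well-formed (X)HTML as plain XML. */
final class XhtmlListReader {

    /** Parses the markup as XML; throws IllegalArgumentException if it is not well-formed. */
    static Document parse(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Counts the elements with the given tag name anywhere in the document. */
    static int countElements(String xhtml, String tagName) {
        return parse(xhtml).getElementsByTagName(tagName).getLength();
    }

    /** Returns true when the XML parser accepts the markup. */
    static boolean isWellFormed(String xhtml) {
        try {
            parse(xhtml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<ul><li>one</li><li>two</li></ul>", "li"));
    }
}
```

    The same list written with the optional </li> tags left out, as in
    "<ul><li>one<li>two</ul>", is rejected by the parser (which may also
    print a diagnostic to stderr), which is the practical cost of
    omitting them if you want XML tooling.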

    --arne
    arne thormodsen, Jun 10, 2004
    #8

  9. >
    > Maybe this can help http://jtidy.sourceforge.net/ No idea if it fulfills
    > all your requirements.
    >


    I've used it extensively in the past. It works pretty well.

    --arne

    > /Thomas
    arne thormodsen, Jun 10, 2004
    #9
  10. Andrew Thompson wrote:

    > On Thu, 10 Jun 2004 18:37:46 GMT, arne thormodsen wrote:
    >
    >>> ;-) I know what you mean, it has taken
    >>> some training to *prevent* myself from
    >>> typing </p> and </li>..
    >>>

    >>
    >> Why bother? All new browsers..

    >
    > ..not all browsers are new, not all users
    > can update, not all sites can afford to
    > turn away customers just because their
    > browser is not flavour of the month.
    >
    > That's why.
    >


    I'm pretty sure even Netscape 4.7 or Lynx interprets </p> and </li>
    correctly. Even pure XHTML should pose no problem for those, when you write
    the empty elements like <br> as <br /> instead of <br/>. Any browser better
    than those (that's all of the currently used browsers :) should have no
    problems if you close your tags.

    As it says in the spec, the closing tags are not *required*, it doesn't say
    that they shouldn't be present. And the advantages of writing XML
    compatible HTML are bigger than adjusting to the lowest possible
    denominator IMHO.

    Have you got any example of a browser which breaks when you add the optional
    closing tags?

    --
    Kind regards,
    Christophe Vanfleteren
    Christophe Vanfleteren, Jun 10, 2004
    #10
  11. Christophe Vanfleteren wrote:

    > I'm pretty sure even Netscape 4.7 or Lynx interprets </p> and </li>
    > correctly.


    I can confirm that both do. I always use <p></p> and <li></li> in my HTML.

    --
    JustThe.net Internet & New Media Services, http://JustThe.net/
    Steven J. Sobol, Geek In Charge / 888.480.4NET (4638) /
    PGP Key available from your friendly local key server (0xE3AE35ED)
    Apple Valley, California Nothing scares me anymore. I have three kids.
    Steven J Sobol, Jun 10, 2004
    #11
  12. On Thu, 10 Jun 2004 20:17:16 GMT, Christophe Vanfleteren wrote:
    > Andrew Thompson wrote:
    >> On Thu, 10 Jun 2004 18:37:46 GMT, arne thormodsen wrote:
    >>
    >>>> ;-) I know what you mean, it has taken
    >>>> some training to *prevent* myself from
    >>>> typing </p> and </li>..

    ...
    >>> Why bother? All new browsers..

    >>
    >> ..not all browsers are new,

    ....
    > I'm pretty sure even Netscape 4.7 or Lynx interprets </p> and </li>
    > correctly. Even pure XHTML should pose no problem for those, when you write
    > the empty elements like <br> as <br /> instead of <br/>.


    Oh, alright,.. I suppose I tuned out at
    the 'new browsers' comment.

    I had rejected XHTML earlier for some reason
    ...no 'target' for 'href's.. no applet tags or
    something.. I do not quite remember.

    Maybe I should take another look..

    [ ..but damn-it, if it does not work on
    my NN 4.08, it is *out*! ;-) ]

    --
    Andrew Thompson
    http://www.PhySci.org/ Open-source software suite
    http://www.PhySci.org/codes/ Web & IT Help
    http://www.1point1C.org/ Science & Technology
    Andrew Thompson, Jun 11, 2004
    #12