How do I check well-formedness of HTML files?

drgonzo120 · Oct 16, 2006

Hello,

As my first assignment at my first job, I have to check the
well-formedness of about 1000 HTML files.

I assume there must already be some Java classes, packages, or libraries
on the net that do this. I can't be the first one who has had to do
this.

So, does anybody know of any libraries available online that do this?

Thanks!

hiwa · Oct 16, 2006

http://validator.w3.org/
http://www.htmlhelp.com/tools/validator/

drgonzo120 · Oct 16, 2006

It will be a console program, so I guess I need classes that accept an
HTML file and check it.

Oliver Wong · Oct 16, 2006

drgonzo120 said:
It will be a console program, so I guess I need classes that accept an
HTML file and check it.

See hiwa's reply, and also consider JTidy.

- Oliver

Martin Gregorie · Oct 16, 2006

Oliver said:
See hiwa's reply, and also consider JTidy.

- Oliver

Take a look at the HTML Tidy project, http://tidy.sourceforge.net

The original HTML Tidy is a C command-line utility, but there are Java
and Perl versions (JTidy is one of them), all referenced from the
project. It's worth a visit: there are other useful things too, such as
HTML editors which integrate HTML Tidy.

drgonzo120 · Oct 17, 2006

Hello, what I need to do is quite simple.

For example, this is a sample from the HTML files:

<table border=1 width="100%" >
<tr>
<td width=20%><noindex>Betreft :</noindex></td>
<td colspan=3>
<betreft><P><A NAME="b_betreft"></A>Kinderrechten: implementatie van
het VN-verdrag<BR>Jaarlijkse verslaggeving van de Vlaamse regering aan
het Vlaams Parlement en aan de kinderrechtencommissaris omtrent de
implementatie van het VN-verdrag van 20 november 1989 inzake de rechten
van het kind<BR>Tweede verslag d.d. 29 september 2000 <A
NAME="e_betreft"></A></betreft>
</td></tr>

For each HTML file I need to extract the contents of these special tags,
<betreft> and others, and create XML files out of them. Is it possible
to read an HTML file as an XML file and run some XPath on it?

Or should I just extract the tags as if from a simple text file?

"JTidy provides a DOM interface to the document that is being
processed, which effectively makes you able to use JTidy as a DOM
parser for real-world HTML."
But nowhere can I find a good reference for JTidy.

I still don't know how I'm going to do it; maybe I'll write it all
myself.

Greetings

Andy Dingley · Oct 17, 2006

drgonzo120 said:
As my first assignment at my first job, I have to check the
well-formedness of about 1000 HTML files.

Why use Java? The usual tool for this is HTML Tidy, which you can
drive perfectly adequately from the command line with a couple of lines
of shell script.

Sachin · Oct 17, 2006

Hi,

Have a look at the JavaCC help files and documentation.

This URL will help you:
https://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=110

Regards,
Sachin

Martin Gregorie · Oct 17, 2006

drgonzo120 said:
I still don't know how I'm going to do it; maybe I'll write it all
myself.

Have you looked at the HTML, HTMLEditorKit and HTMLDocument classes?

The HTMLEditorKit contains a parser I used as the basis for a URL
checker. It extracts <A> tags from HTML pages, sets up a URL instance
from each href attribute, and checks whether it is accessible. Access
failures are reported for manual examination and fixes.
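
For anyone following along, here is a minimal sketch of the tag-extraction half of that idea, using the parser that ships with the JDK's Swing HTML package. This is not Martin's actual checker: the class name LinkExtractor and the method hrefs are made up for illustration, and the step that opens each URL to test accessibility is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href value of every <A> start tag.
class LinkExtractor {

    static List<String> hrefs(String html) {
        List<String> found = new ArrayList<>();

        // The parser reports each start tag to this callback.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    // Named anchors such as <A NAME="b_betreft"> have no href.
                    if (href != null) {
                        found.add(href.toString());
                    }
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"http://validator.w3.org/\">W3C</a>"
                + "<A NAME=\"b_betreft\"></A></body></html>";
        System.out.println(hrefs(page));
    }
}
```

This parser is lenient about unclosed tags and unquoted attributes, so it reads real-world HTML, but for the same reason it will not tell you whether a file is well-formed.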

Oliver Wong · Oct 17, 2006

drgonzo120 said:
Is it possible to read an HTML file as an XML file and run some XPath
on it?

    This is possible if and only if the HTML file actually is an XML file
(the HTML file format and the XML file format overlap, but are not identical
to each other). Otherwise, first you'll need something like "XMLTidy" (a
fictional product I just made up) to fix the broken XML -- things like
making sure every open tag is balanced by a closing tag, etc. I noticed that
in your example the <table>, <P> and <BR> tags are never closed.

- Oliver
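
As a rough illustration of the point above, the JDK's own XML parser can serve as the well-formedness test: if the file parses, it is XML and XPath will work on it; if not, it needs tidying first. The sketch below makes that assumption concrete. The class name WellFormedCheck and the methods isWellFormed and textOf are invented for this example, and it works on strings rather than files to keep it short.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: treats "parses as XML" as the well-formedness test.
class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which prints to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // Unclosed tags, unquoted attributes, etc. end up here.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates an XPath expression and returns the string value of the result. */
    static String textOf(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<table border=1><tr><td>Betreft<BR></td></tr>";
        String fixed = "<doc><betreft>Kinderrechten</betreft></doc>";
        System.out.println(isWellFormed(broken));
        System.out.println(isWellFormed(fixed));
        System.out.println(textOf(fixed, "//betreft"));
    }
}
```

For about 1000 files, a console program could loop over the directory, read each file, and log the ones for which the check returns false so they can be run through a tidying tool first.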
