How to parse the HTML DTD?

Johnny

Hi,
Do you know of any HTML DTD parser? I want to parse the HTML DTD
file and generate a tree or graph containing all of its information. I want to
support some basic operations on that tree or graph:

1. Query which elements can be included under a specific element.
* e.g. given the "UL" element, I get the answer that only the "LI"
element can be included under "UL"

2. Query which elements are needed to construct a document
* e.g. given the "TD" element, I get the answer that to
build a document, I need the following elements in order:
* HTML
* BODY
* TABLE
* TBODY
* TR
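
To make the two queries concrete, here is a minimal Python sketch. It assumes the content models have already been extracted into a dictionary; the `CONTENT_MODEL` table is a hand-written, heavily simplified subset for illustration (not parsed from the real DTD), and the function names are hypothetical.

```python
from collections import deque

# Assumption: a hand-written, simplified subset of HTML 4.01 content models.
# A real table would be generated from the DTD itself.
CONTENT_MODEL = {
    "HTML": ["HEAD", "BODY"],
    "HEAD": ["TITLE"],
    "BODY": ["P", "UL", "TABLE"],
    "UL": ["LI"],
    "TABLE": ["CAPTION", "TBODY"],
    "TBODY": ["TR"],
    "TR": ["TH", "TD"],
}


def allowed_children(element):
    """Query 1: elements that may appear directly under `element`."""
    return list(CONTENT_MODEL.get(element.upper(), []))


def ancestors_needed(target, root="HTML"):
    """Query 2: shortest chain of ancestors from `root` down to the parent
    of `target`, found by breadth-first search. Returns None if unreachable."""
    target = target.upper()
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path[:-1]
        for child in CONTENT_MODEL.get(path[-1], []):
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return None


print(allowed_children("UL"))   # ['LI']
print(ancestors_needed("TD"))   # ['HTML', 'BODY', 'TABLE', 'TBODY', 'TR']
```

A plain dictionary is enough for both queries; occurrence indicators such as `+` or `?` and ordering constraints are deliberately left out of this sketch.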

I have tried two DTD parsers:
http://matra.sourceforge.net/
and
http://www.wutka.com/dtdparser.html

Both are written in Java, but neither can handle the HTML DTD.
As you know, the HTML DTD has its own grammar. Is there any existing parser
that can handle it? I don't think I am the first one who needs
to manipulate the HTML DTD data.

Thanks for your consideration.

Regards,
Johnny
 
Johnny

Steve said:
Because they're XML DTD parsers, not SGML DTD parsers. Did you try
giving them an XHTML DTD rather than an HTML one?

Steve

Thanks Steve. But I need to parse the HTML DTD rather than the XHTML
DTD.
I have also tried an SGML DTD parser called SP
(http://www.jclark.com/sp/),
but I still can't easily get the HTML DTD parsed or translated to
XML.

I am wondering whether there is any parser that works for the HTML DTD?

Regards,
Johnny
 
Benjamin Niemann

Johnny said:
Thanks Steve. But I need to parse the HTML DTD rather than the XHTML
DTD.
I have also tried an SGML DTD parser called SP
(http://www.jclark.com/sp/),
but I still can't easily get the HTML DTD parsed or translated to
XML.

The DTDs for HTML 4.01 and XHTML 1.0 are almost identical, with a few
exceptions caused by limitations of XML DTDs (e.g. SGML supports exclusions and
inclusions, which HTML uses, but these are not available in XML
DTDs). So the official XHTML 1.0 DTDs are already the best 'translations'
of the HTML 4.01 DTDs to XML you can get.

Johnny said:
I am wondering whether there is any parser that works for the HTML DTD?

SP, its successor OpenSP, or any other SGML parser. However, (Open)SP only does
the 'raw' parsing, not the visualisation you want. If you want to
implement that part yourself, you will probably have to access SP through its
API in order to get the required information from the parsed document. The
command-line tool (o)nsgmls only outputs an easily parseable version of
the document instance, not the document type.
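
If a full SGML parser is more than you need, a rough alternative is to pull the element declarations out of the DTD text directly. The Python sketch below is not how SP works and is not a conforming SGML parser; it is a naive regex approach under stated assumptions: parameter entities are declared with double-quoted literal values, references always end in `;`, and marked sections are absent. Inclusions and exclusions are not treated specially, so their names would simply be lumped in with the children. `SAMPLE_DTD` is a tiny excerpt written in the style of the HTML 4.01 DTD, and `parse_elements` is a hypothetical helper name.

```python
import re

# Assumption: a tiny illustrative excerpt, not the full HTML 4.01 DTD.
SAMPLE_DTD = '''
<!ENTITY % list "UL | OL">
<!ELEMENT (%list;) - - (LI)+   -- ordered and unordered lists -->
<!ELEMENT TR - O (TH|TD)+      -- table row -->
<!ELEMENT BR - O EMPTY         -- forced line break -->
'''

KEYWORDS = {"EMPTY", "CDATA", "RCDATA", "ANY"}
NAME = r"[A-Za-z][A-Za-z0-9.-]*"


def parse_elements(dtd_text):
    """Map each declared element name to the set of names in its content model."""
    # Drop SGML comments of the form -- ... --
    text = re.sub(r"--.*?--", "", dtd_text, flags=re.DOTALL)

    # Collect parameter entities that have a double-quoted literal value.
    entity_re = r'<!ENTITY\s+%\s+(' + NAME + r')\s+"([^"]*)"\s*>'
    entities = dict(re.findall(entity_re, text))

    # Expand %name; references repeatedly (bounded, in case of cycles).
    for _ in range(20):
        expanded = re.sub(
            r"%(" + NAME + r");",
            lambda m: entities.get(m.group(1), m.group(0)),
            text,
        )
        if expanded == text:
            break
        text = expanded

    # <!ELEMENT name-or-group start-tag-flag end-tag-flag content-model>
    element_re = r"<!ELEMENT\s+(\([^)]*\)|" + NAME + r")\s+[-O]\s+[-O]\s+([^>]*)>"
    model = {}
    for names, content in re.findall(element_re, text):
        tokens = re.findall(r"#?" + NAME, content)
        children = {t.upper() for t in tokens if not t.startswith("#")} - KEYWORDS
        for name in re.findall(NAME, names):
            model[name.upper()] = set(children)
    return model


model = parse_elements(SAMPLE_DTD)
print(sorted(model["TR"]))  # ['TD', 'TH']
```

The resulting dictionary only records which names appear in each content model; it discards ordering, occurrence indicators, and tag-omission flags.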


HTH
 
