Python & XML & DTD (warning: noob attack!)

Igor Fedorow

Hello all,

I have an XML file with an internal DTD which looks roughly like this:

<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (node)*>
<!ELEMENT node (description, info, node*)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT info EMPTY>
<!ATTLIST info
text CDATA #REQUIRED
>
]>
<root>
<node>
<description>node 1</description>
<info text="info 1" />
<node>
<description>node 1-1</description>
<info text="info 1-1" />
</node>
</node>
<node>
<description>node 2</description>
<info text="info 2" />
<node>
<description>node 2-1</description>
<info text="info 2-1" />
</node>
<node>
<description>node 2-2</description>
<info text="info 2-2" />
</node>
</node>
</root>

I want to parse this file into my application, modify the data (this includes
maybe creating and/or deleting nodes), and write it back into the file --
including the DTD. (It doesn't necessarily need validation, though.)

I tried xml.dom.ext.PrettyPrint, but it produces only

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE root>
<root>
...
</root>

actually lacking the document type definition.

Any help is appreciated!

Thanks in advance & cheers =)
*igor*
 
Peter Hansen

Igor said:
I have an XML file with an internal DTD which looks roughly like this:
[snip]
I want to parse this file into my application, modify the data (this includes
maybe creating and/or deleting nodes), and write it back into the file --
including the DTD. (It doesn't necessarily need validation, though.)

I tried xml.dom.ext.PrettyPrint, but it produces only
[snip]
actually lacking the document type definition.

Any help is appreciated!

Unfortunately I don't know of any way you could generate the DTD again,
and I've never seen a package which supports what you ask for (not that
it isn't possible, mind you).

On the other hand, are you sure you need the DTD? We use XML in
dozens of ways and absolutely have never benefited from attempts
to use DTDs, and don't appear to suffer from the lack thereof.

Also, aren't DTDs sort of considered either obsolete or at least
vastly inferior to the newer approaches such as XML Schema, or both?

So my recommendation is to ditch the DTD and see if any problems
arise as a result.

-Peter
 
Igor Fedorow

Igor said:
I have an XML file with an internal DTD which looks roughly like this:
[snip]
I want to parse this file into my application, modify the data (this
includes maybe creating and/or deleting nodes), and write it back into the
file -- including the DTD. (It doesn't necessarily need validation,
though.)

I tried xml.dom.ext.PrettyPrint, but it produces only [snip] actually
lacking the document type definition.

Any help is appreciated!

Unfortunately I don't know of any way you could generate the DTD again, and
I've never seen a package which supports what you ask for (not that it isn't
possible, mind you).

On the other hand, are you sure you need the DTD? We use XML in dozens of
ways and absolutely have never benefited from attempts to use DTDs, and
don't appear to suffer from the lack thereof.

Also, aren't DTDs sort of considered either obsolete or at least vastly
inferior to the newer approaches such as XML Schema, or both?

So my recommendation is to ditch the DTD and see if any problems arise as a
result.

-Peter

Actually, I don't really *need* it, but I would simply like to have it -- which
obviously isn't possible...

Anyway, thank you for your help!

Cheers =)
*igor*
 
Kevin Ballard

Igor Fedorow said:
I tried xml.dom.ext.PrettyPrint, but it produces only

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE root>
<root>
...
</root>

actually lacking the document type definition.

Why not simply use that, then replace the <!DOCTYPE root> with the DTD? I'm
sure you can parse it out from the original file.
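
A minimal sketch of that splice, assuming the DOCTYPE can be pulled out of the original text with a regular expression. The function name `restore_doctype` and both patterns are made up for illustration, and the matching is deliberately naive: it assumes `]` followed by `>` occurs only at the very end of the internal subset.

```python
import re

# DOCTYPE declaration that carries an internal subset in [...].
# Naive: assumes "]" followed by ">" appears only where the subset ends.
DOCTYPE_WITH_SUBSET = re.compile(r"<!DOCTYPE\s+[^\[>]+\[.*?\]\s*>", re.DOTALL)

# Bare DOCTYPE declaration without an internal subset, as PrettyPrint emits it.
BARE_DOCTYPE = re.compile(r"<!DOCTYPE\s+[^\[>]+>")


def restore_doctype(original_text, serialized_text):
    """Copy the full DOCTYPE from original_text over the bare one in serialized_text."""
    match = DOCTYPE_WITH_SUBSET.search(original_text)
    if match is None:
        # The original had no internal subset, so there is nothing to restore.
        return serialized_text
    full_doctype = match.group(0)
    # A function replacement avoids backslash processing in the DTD text.
    return BARE_DOCTYPE.sub(lambda _m: full_doctype, serialized_text, count=1)


original = """<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (node)*>
]>
<root/>"""

serialized = "<?xml version='1.0' encoding='UTF-8'?>\n<!DOCTYPE root>\n<root/>"

restored = restore_doctype(original, serialized)
print(restored)
```

A DTD with comments or attribute defaults containing `]>` would defeat the pattern, so this only suits simple subsets like the one in the original post.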
 
Andrew Clover

Peter Hansen said:
Unfortunately I don't know of any way you could generate the DTD again

It is possible to preserve the internal subset in DOM Level 3. You can
read it from the property DocumentType.internalSubset, and it will be
included in documents serialised by an LSSerializer.
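
As a rough illustration of reading that property, here is a sketch using the standard library's xml.dom.minidom rather than an LSSerializer. It assumes a CPython version whose minidom fills in `doctype.internalSubset` when parsing and writes it back out in `toxml()`; that is how the versions I know of behave, but treat it as an assumption and check on yours. The sample document is a shortened copy of the one from the original post.

```python
from xml.dom import minidom

SOURCE = """<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (node)*>
<!ELEMENT node (description, info, node*)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT info EMPTY>
<!ATTLIST info text CDATA #REQUIRED>
]>
<root><node><description>node 1</description><info text="info 1"/></node></root>"""

doc = minidom.parseString(SOURCE)

# The text between "[" and "]" of the DOCTYPE, or None if there is no subset.
subset = doc.doctype.internalSubset

# Modify the tree: append a second <node> with its required children.
new_node = doc.createElement("node")
description = doc.createElement("description")
description.appendChild(doc.createTextNode("node 2"))
info = doc.createElement("info")
info.setAttribute("text", "info 2")
new_node.appendChild(description)
new_node.appendChild(info)
doc.documentElement.appendChild(new_node)

# Serialise again; the DOCTYPE is expected to keep its internal subset.
output = doc.toxml()
print(output)
```

Nothing here validates the modified tree against the DTD; the subset is carried along as plain text.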

It is not, however, possible to write to the internalSubset, and you can't
create a new DocumentType object with a non-empty internalSubset, for some
reason. So the only standard way to copy an internalSubset would be to make
the new document by parsing a document that carries the same subset, e.g.:

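dtd = oldDocument.doctype.internalSubset
parser = oldDocument.implementation.createLSParser(1, None)  # 1 = MODE_SYNCHRONOUS
input = oldDocument.implementation.createLSInput()
# Assumed completion (DOM Level 3 LS calls): parse a stub with the same subset
input.stringData = '<!DOCTYPE root [%s]><root/>' % dtd
newDocument = parser.parse(input)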

Peter Hansen said:
I've never seen a package which supports what you ask for

Plug time: the only package I know of to support DOM Level 3 is my own:

http://www.doxdesk.com/software/py/pxdom.html

Currently this is based on the November 2003 CR spec; there have been a
number of fixes and changes to L3 functionality since, but I'm waiting for
W3C to publish the next draft (presumably Proposed Recommendation) before
releasing 1.0.

Peter Hansen said:
Also, aren't DTDs sort of considered either obsolete or at least
vastly inferior to the newer approaches such as XML Schema, or both?

Certainly they have their drawbacks: they're namespace-ignorant, not
flexible enough for some purposes, and they're a legacy bag on the side of
XML rather than something built on top of it in XML syntax.

Still, they're well-understood and widely supported, and simpler to learn
than Schema at least.
 
