Which line number a Node is from

Nicolas Raoul

Hello all,

My Java application uses XML files. These files are parsed using DOM.
The XML files are usually written by developers, and may contain
errors. In such cases, I would like to report the line where the error is.
The problem is that DOM does not allow this: An org.w3c.dom.Node object
does not contain any kind of reference to the originating file.

XML Schema is not expressive enough to detect all the possible errors
(some conformance rules are stored in a database). That's why I must do some
error-checking at run-time, in Java.

How can I parse an XML file in Java and still be able to tell which
line a particular Node is on?
Is there any alternative/extension to DOM for this?

Thanks,
Nicolas Raoul.
http://nrw.free.fr
 
jan V

> My Java application uses XML files. These files are parsed using DOM.
> The XML files are usually written by developers, and may contain
> errors.

Some people would argue that this use of XML is broken. XML should be
written and read by programs, i.e. computers... not people. Why don't you
write a program to generate those XML files? Then you can ensure that you
don't produce garbage in the first place (cf. GIGO).

By trying to solve your line number problem, you're addressing a
symptom, not the cause... and what do you prefer at the end of the day?
Tackling the cause, or just papering over the cracks?
 
Andrew Thompson

> My Java application uses XML files. These files are parsed using DOM.
> The XML files are usually written by developers, and may contain
> errors. In such cases, I would like to report the line where the error is.
> The problem is that DOM does not allow this: An org.w3c.dom.Node object
> does not contain any kind of reference to the originating file.

So don't validate them that way.

I had some experiences recently using the Ant xmlvalidate
task[1]. Easy peasy when you kick it off from inside an
IDE. The error output will allow you to 'double click'/
'jump to' the line in error.

[1] <http://ant.apache.org/manual/OptionalTasks/xmlvalidate.html>

I'll now clear the floor for the 'I hate XML' crew. ;-)

HTH

--
Andrew Thompson
physci.org 1point1c.org javasaver.com lensescapes.com athompson.info
"I talk of freedom, you talk of the flag. I talk of revolution, you'd much
rather brag.."
Live 'White Discussion'
 
bugbear

Nicolas said:
> Hello all,
>
> My Java application uses XML files. These files are parsed using DOM.
> The XML files are usually written by developers, and may contain
> errors. In such cases, I would like to report the line where the error is.
> The problem is that DOM does not allow this: An org.w3c.dom.Node object
> does not contain any kind of reference to the originating file.
>
> XML Schema is not expressive enough to detect all the possible errors
> (some conformance rules are stored in a database). That's why I must do some
> error-checking at run-time, in Java.
>
> How can I parse an XML file in Java and still be able to tell which
> line a particular Node is on?
> Is there any alternative/extension to DOM for this?

No. So I wrote one; I use a SAX parser and a handler
to build up a DOM tree. And I fake/force/fudge
extra line number information (from the SAX events)
into the DOM nodes.

It's a Java equivalent of this:
http://search.cpan.org/~enno/libxml-enno-1.02/lib/XML/Handler/BuildDOM.pm

It's foul, I know, but I have happy users.
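
A minimal sketch of the technique BugBear describes (this is not his code): a SAX DefaultHandler that builds the DOM tree itself and attaches the Locator's line number to each element. It assumes a Java 6+ runtime, because it stores the number with DOM Level 3 `Node.setUserData`, which Java 1.4 does not have, and uses `ArrayDeque`. The class name `LineNumberDomBuilder` and the key `"lineNumber"` are hypothetical. Only elements, attributes and text are copied; comments, processing instructions and namespace declaration attributes are dropped.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: builds a DOM from SAX events and tags each element with its line. */
class LineNumberDomBuilder extends DefaultHandler {
    static final String LINE_KEY = "lineNumber"; // hypothetical user-data key

    private final Document doc;
    private final Deque<Node> open = new ArrayDeque<Node>(); // open ancestors, innermost first
    private final StringBuilder text = new StringBuilder();  // pending character data
    private Locator locator;

    private LineNumberDomBuilder(Document doc) {
        this.doc = doc;
        open.push(doc);
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // supplied by the parser before the document events
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        Element el = doc.createElementNS(uri.length() == 0 ? null : uri, qName);
        for (int i = 0; i < atts.getLength(); i++) {
            String attUri = atts.getURI(i);
            el.setAttributeNS(attUri.length() == 0 ? null : attUri, atts.getQName(i), atts.getValue(i));
        }
        if (locator != null) {
            // SAX reports the position at the end of the start tag
            el.setUserData(LINE_KEY, Integer.valueOf(locator.getLineNumber()), null);
        }
        open.peek().appendChild(el);
        open.push(el);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    private void flushText() {
        if (text.length() > 0) {
            open.peek().appendChild(doc.createTextNode(text.toString()));
            text.setLength(0);
        }
    }

    /** Line recorded for an element, or -1 if none was stored. */
    static int lineOf(Node node) {
        Object line = node.getUserData(LINE_KEY);
        return line == null ? -1 : ((Integer) line).intValue();
    }

    static Document parse(InputSource in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.newSAXParser().parse(in, new LineNumberDomBuilder(doc));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>\n  <rule name=\"a\"/>\n  <rule name=\"b\"/>\n</config>";
        Document doc = parse(new InputSource(new StringReader(xml)));
        NodeList rules = doc.getElementsByTagName("rule");
        for (int i = 0; i < rules.getLength(); i++) {
            Element rule = (Element) rules.item(i);
            System.out.println(rule.getAttribute("name") + " is on line " + lineOf(rule));
        }
    }
}
```

The rest of the program keeps working with plain DOM; only error reporting needs to call `lineOf(node)`.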

BugBear
 
jan V

> I had some experiences recently using the Ant xmlvalidate
> task[1]. Easy peasy when you kick it off from inside an
> IDE. The error output will allow you to 'double click'/
> 'jump to' the line in error.

Now you've done it... you've proven to Nicolas that it's possible to get at
those line numbers, since your IDE does it. ;-)

> I'll now clear the floor for the 'I hate XML' crew. ;-)

Not in this thread, Josephine.
 
flazzarino

Use Xerces; I think it might be in the 5.0 SDK too. Validation is not
as hard as it seems.


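import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidatingParse {
    public static Document parse(InputStream someInputStream) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Validate against the W3C XML Schema the document refers to.
        // The JAXP schemaLanguage attribute needs namespaces and validation turned on.
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                "http://www.w3.org/2001/XMLSchema");

        DocumentBuilder builder = factory.newDocumentBuilder();
        ErrorHandler eh = new ErrorHandler() {
            public void warning(SAXParseException e) throws SAXException {
                // e.getLineNumber() and e.getColumnNumber() locate the problem
                System.err.println("Warning, line " + e.getLineNumber() + ": " + e.getMessage());
            }
            public void error(SAXParseException e) throws SAXException {
                // Validity error: report it and keep parsing (or rethrow to stop)
                System.err.println("Error, line " + e.getLineNumber() + ": " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException {
                // Not well-formed: parsing cannot continue
                throw e;
            }
        };
        builder.setErrorHandler(eh);
        return builder.parse(someInputStream);
    }
}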
 
Roedy Green

> [1] <http://ant.apache.org/manual/OptionalTasks/xmlvalidate.html>
>
> I'll now clear the floor for the 'I hate XML' crew. ;-)

His experience confirms my major XML complaint. XML encourages the
propagation of invalid files.

My key theory in the Abundance database language was to go to extreme
lengths to avoid getting any invalid data in the binary files. It
makes coding ever so much simpler if you can completely trust your
files to contain only valid and complete data.

XML is the antithesis of that approach.

It is the duty of the USER of the XML file to defend himself against
error.

This to me is completely illogical. An XML file has one writer and
potentially many readers. It should be the WRITER's job to produce a
syntactically clean and provably clean file. The only way to do that
is with some sort of binary format that can't easily be tampered with
by a last-minute change.
 
Hemal Pandya

> Some people would argue that this use of XML is broken. XML should be
> written and read by programs, i.e. computers... not people. Why don't you
> write a program to generate those XML files? Then you can ensure that you
> don't produce garbage in the first place (cf. GIGO).
>
> By trying to solve your line number problem, you're addressing a
> symptom, not the cause... and what do you prefer at the end of the day?
> Tackling the cause, or just papering over the cracks?

I am not trying to hijack this thread into a discussion about valid uses
of XML, but I have a few points to make:

- The OP didn't say which developers write the XML. For all you know they
could be at the other end of the and he can't force them to write a
program to generate the XML.

- Even if their program generates the XML, humans still have to read
it. The generated XML may (will?) have errors in it, and having
line number information helps.
 
Nicolas Raoul

I believe that XML should be human-readable and human-editable. And if
a program encounters something that should never happen, then it should
not just crash, but point out the problem in an intelligible way,
including the line number information.

It is unfortunate that I don't have the resources to write a quality
generation tool for developers to easily write this kind of XML. But
even if I had written one, I would have no way to enforce its use; I can't
dictate how developers must edit the XML files.

I can't just hope that every XML file out there will be valid. On the
contrary, I must assume some of them will be invalid, and it is my
responsibility to handle that case.

I don't think GIGO (Garbage In => Garbage Out) is a good thing, so here
is yet another new acronym: GIHO (Garbage In => Help Out) ;-)

Thanks for the ideas anyway,
Nicolas Raoul
 
Nicolas Raoul

This Ant task is useful; it uses a DTD or XML Schema definition.
However, it is not applicable here, since in my particular case, XML
validity can't be expressed in these languages.

XML validity in my case depends on environment factors. An XML file
that is valid on one system is probably invalid on another system.
That's because it depends on constraints that are stored in a database.
XML Schema can't access a database, as far as I know :-(
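
Since those rules come from a database, one way to get line numbers for the run-time checks is to run them during a SAX pass and report failures as a `SAXParseException`: its `(String, Locator)` constructor copies the parser's current line and column. A minimal sketch follows; the class name, the `table` attribute and the hard-coded set standing in for rows read from the database are all hypothetical.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical check: every "table" attribute must name a table known to the database. */
class DatabaseRuleChecker extends DefaultHandler {
    private final Set<String> knownTables; // stands in for values loaded from the database
    private Locator locator;

    DatabaseRuleChecker(Set<String> knownTables) {
        this.knownTables = knownTables;
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String table = atts.getValue("table");
        if (table != null && !knownTables.contains(table)) {
            // The Locator-based constructor records the current line and column
            throw new SAXParseException("Unknown table '" + table + "'", locator);
        }
    }

    /** Stops at the first violation, like a fatal parse error. */
    static void check(InputSource in, Set<String> knownTables) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DatabaseRuleChecker(knownTables));
    }

    public static void main(String[] args) throws Exception {
        Set<String> tables = new HashSet<String>(Arrays.asList("CUSTOMER", "ORDERS"));
        String xml = "<mapping>\n  <field table=\"CUSTOMER\"/>\n  <field table=\"INVOICE\"/>\n</mapping>";
        try {
            check(new InputSource(new StringReader(xml)), tables);
        } catch (SAXParseException e) {
            System.out.println("Line " + e.getLineNumber() + ": " + e.getMessage());
        }
    }
}
```

This suits rules that can be decided one element at a time; rules that compare distant parts of the file still need the whole tree.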

> you've proven that it's possible to get at those line numbers

Fortunately, XML Schema validators do report the line numbers of errors,
probably because they are not built on DOM.

Thanks,
Nicolas Raoul
 
Nicolas Raoul

You describe the case of a single actor producing untouchable files
that are then read by some users.
That is not the case in my situation. There are many writers, many
readers, and I have no control over them. This kind of situation has
become pretty common, and I guess that's why XML has become a standard
for data exchanges.

> It makes coding ever so much simpler if you can completely trust your files

Trusting the files is definitely what I don't want to do. I will always
check every input, to make the application secure and robust in whatever
environment it is asked to run in.

Thanks anyway,
Raoul Nicolas.
 
Nicolas Raoul

It sounds interesting :)
I am thinking about writing such a tool.
Is your DOM builder open source, or otherwise available somewhere?

Thanks a lot !
Nicolas Raoul.
 
Nicolas Raoul

Well, I could parse everything using SAX, but I really don't like SAX.
DOM is much better suited to object-oriented programming, in my opinion.

For example, an object may parse a Node, recognize some known nodes
inside and pass them to appropriate new objects. Each object knows how
to parse its node, rather than having a big SAX class that does
everything and would be less extensible.
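
The object-per-node design described above can be sketched as a small registry that maps element names to reader objects; each reader builds its own model object and hands its child elements back to the registry. Everything here (`ElementReader`, `ReaderRegistry`, `Table`, `Column`, `ModelLoader` and the `table`/`column` element names) is hypothetical and only illustrates the structure.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical: an object that knows how to turn one kind of element into a model object. */
interface ElementReader {
    Object read(Element element, ReaderRegistry registry);
}

/** Maps element names to readers and dispatches child elements to them. */
class ReaderRegistry {
    private final Map<String, ElementReader> readers = new HashMap<String, ElementReader>();

    void register(String tagName, ElementReader reader) {
        readers.put(tagName, reader);
    }

    List<Object> readChildren(Element parent) {
        List<Object> result = new ArrayList<Object>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text, comments, etc.
            }
            ElementReader reader = readers.get(n.getNodeName());
            if (reader == null) {
                throw new IllegalArgumentException("Unexpected element <" + n.getNodeName() + ">");
            }
            result.add(reader.read((Element) n, this));
        }
        return result;
    }
}

class Column {
    final String name;
    Column(String name) { this.name = name; }
}

class Table {
    final String name;
    final List<Object> columns;
    Table(String name, List<Object> columns) { this.name = name; this.columns = columns; }
}

class ModelLoader {
    static ReaderRegistry registry() {
        ReaderRegistry reg = new ReaderRegistry();
        reg.register("column", new ElementReader() {
            public Object read(Element e, ReaderRegistry r) {
                return new Column(e.getAttribute("name"));
            }
        });
        reg.register("table", new ElementReader() {
            public Object read(Element e, ReaderRegistry r) {
                // A table reads its own attributes and delegates its children
                return new Table(e.getAttribute("name"), r.readChildren(e));
            }
        });
        return reg;
    }

    static List<Object> load(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        return registry().readChildren(root);
    }

    public static void main(String[] args) throws Exception {
        List<Object> tables = load("<schema><table name=\"CUSTOMER\">"
                + "<column name=\"ID\"/><column name=\"NAME\"/></table></schema>");
        Table t = (Table) tables.get(0);
        System.out.println(t.name + " has " + t.columns.size() + " columns");
    }
}
```

Adding a new element type means registering one more reader, without touching the existing ones.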

Thanks for the idea,
Nicolas Raoul
 
Nicolas Raoul

Your code uses validity rules defined in XML Schema.
Please have a look at the answer I have just written for Andrew:
XML Schema is not expressive enough to match my needs.

Thanks anyway :)
Nicolas Raoul.
 
Hemal Pandya

Nicolas said:
> Hello all,
>
> How can I parse an XML file in Java and still be able to tell which
> line a particular Node is on?
> Is there any alternative/extension to DOM for this?

Some time spent searching revealed that this can be done by extending
DOMParser. The Xerces samples include DOMAddLines, which prints line
numbers for each node while using a DOM Parser. You can probably use
Proxy to do this with arbitrary DOMParser classes.

I was able to find source by looking for
http://www.google.com/search?hl=en&lr=&q=DOMAddlines+ext:java&btnG=Search
and viewing from google cache.
 
Nicolas Raoul

Great !
That is exactly the kind of solution I need :)

The Apache sample Hemal linked to is an extension to DOMParser that
overrides the startDocument() and startElement() methods to store the
XMLLocator.getLineNumber() into the userData attribute of each Node.

Since DOMParser is org.apache.xerces.parsers.DOMParser, this is an
implementation-specific solution. Getting the line numbers does not seem
to be possible with the standard J2SE API, so it has to depend on a specific
implementation. Anyway, I am already using Xerces.

I am still using Java 1.4 (which is a shame, I know...) and
unfortunately userData (DOM Level 3) is not available in 1.4. This will
probably lead me to create for each node a sub-node (or attribute) to store
its line number, a solution similar to what BugBear suggested.
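
The attribute variant can be sketched without writing a DOM builder by hand: an `XMLFilterImpl` sits between the SAX parser and an identity `Transformer`, adding a line-number attribute to every start tag, and the transformer builds the DOM into a `DOMResult`. The sketch avoids Java 5 syntax and only uses JAXP/SAX classes that Java 1.4 already shipped, so it should in principle run there too; that is an assumption. The attribute name `_line` and the class name are hypothetical. Note that the extra attribute becomes part of the document, so code that iterates over all attributes, or a later schema validation of this DOM, will see it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: adds a line-number attribute to every element on its way to the DOM. */
class LineAttributeFilter extends XMLFilterImpl {
    static final String LINE_ATTR = "_line"; // hypothetical; must not clash with real attributes
    private Locator locator;

    LineAttributeFilter(XMLReader parent) {
        super(parent);
    }

    public void setDocumentLocator(Locator locator) {
        this.locator = locator;            // keep the parser's locator...
        super.setDocumentLocator(locator); // ...and still pass it downstream
    }

    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl withLine = new AttributesImpl(atts);
        if (locator != null) {
            withLine.addAttribute("", LINE_ATTR, LINE_ATTR, "CDATA",
                    String.valueOf(locator.getLineNumber()));
        }
        super.startElement(uri, localName, qName, withLine);
    }

    /** Parses through the filter; an identity Transformer turns the events into a DOM. */
    static Document parse(InputSource in) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer()
                .transform(new SAXSource(new LineAttributeFilter(parser), in), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>\n  <rule name=\"a\"/>\n</config>";
        Document doc = parse(new InputSource(new StringReader(xml)));
        Element rule = (Element) doc.getElementsByTagName("rule").item(0);
        System.out.println("rule is on line " + rule.getAttribute(LINE_ATTR));
    }
}
```

When reporting an error for a text or attribute node, walk up to the owning element and read its `_line` attribute.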

I will try to write something reusable for this, and I will let you
know.

Thanks a lot !
Nicolas Raoul.
 
bugbear

Nicolas said:
> It sounds interesting :)
> I am thinking about writing such a tool.
> Is your DOM builder open source, or otherwise available somewhere?

No. But I'd suggest that a transcription of the Perl
I linked to might not be too hard.

BugBear
 
