Parsing XML with DOM

nuthinking · Sep 27, 2007

I can't believe I'm stuck on this, but
DocumentBuilderFactory.setIgnoringElementContentWhitespace doesn't
seem to work at all: I still get the newlines as text nodes :S

Any ideas?

Here is the small piece of code I used:

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// The enclosing class name is arbitrary; the post only showed the methods.
public class DomDump {

    protected static void parseDom(File file)
    {
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        factory.setIgnoringElementContentWhitespace(true);

        try {
            DocumentBuilder parser = factory.newDocumentBuilder();
            Document document = parser.parse(file);
            // Top-level nodes of the document (root element, etc.)
            NodeList list = document.getChildNodes();
            int len = list.getLength();
            System.out.println("#parseDom: len:" + len);
            for (int i = 0; i < len; i++) {
                Node element = list.item(i);
                parseNode(element);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }

    // Prints a node, its attributes, and then recurses into its children.
    private static void parseNode(Node node)
    {
        System.out.println("#parseNode:" + node.getNodeName() + " = " +
            node.getNodeValue() + " type:" + node.getNodeType());
        NamedNodeMap attributes = node.getAttributes();
        if (attributes != null) {
            int len = attributes.getLength();
            for (int i = 0; i < len; i++) {
                Node attr = attributes.item(i);
                parseAttribute(attr);
            }
        }
        if (!node.hasChildNodes()) return;

        NodeList list = node.getChildNodes();
        int len = list.getLength();
        System.out.println("-- num children: " + len);
        for (int i = 0; i < len; i++) {
            Node child = list.item(i);
            parseNode(child);
        }
        System.out.println("------");
    }

    private static void parseAttribute(Node node)
    {
        System.out.println("#parseAttribute:" + node.getNodeName() + " = " +
            node.getNodeValue());
    }
}

Thanks,

chr

Andrew Thompson · Sep 28, 2007

nuthinking said:
DocumentBuilderFactory.setIgnoringElementContentWhitespace doesn't
seem to work at all, I still get the new lines as text elements :S ...
Here the small code I used:

A highly motivated* master of the craft** might be able to spot
the mistake in your 63-line snippet by eye. To get the help
of 'the rest of us', you are better off posting an SSCCE***.

* Highly enough motivated to try and spot mistakes by
simply reading the code, as opposed to seeing the code
work/fail when run.

** ..and they would probably need to know XML processing
inside and out. Often, mistakes are spotted by people who do
not know an API that well, but were simply interested enough
to run a code sample.

*** <http://www.physci.org/codes/sscce.html>
It would be best to pull a small XML file directly from
a URL on a web site. If you cannot manage to upload
it to somewhere that is open to being fetched by Java,
try including a small sample in your post.
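
For instance, a self-contained reproduction could look roughly like
the sketch below. The class name WhitespaceSscce, the helper
countRootChildren and the inline XML are made up for illustration;
they are not taken from the original code. It only counts the child
nodes of the root element with the flag switched on and no DTD or XSD
in play.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical minimal reproduction; names are illustrative only.
public class WhitespaceSscce {

    // Parses the XML with the flag enabled and returns how many
    // child nodes the root element ends up with.
    public static int countRootChildren(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringElementContentWhitespace(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // No DTD: the whitespace around <one> stays in the tree as text
        // nodes, so the root has 3 children (text, element, text).
        String xml = "<all>\n <one>A</one>\n</all>";
        System.out.println("root children: " + countRootChildren(xml));
    }
}
```

Something this small can be pasted, compiled and run as-is, which is
the whole point of an SSCCE.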

--
Andrew Thompson
http://www.athompson.info/andrew/

nuthinking · Sep 28, 2007

The problem seems to be that setIgnoringElementContentWhitespace
only works if the XML refers to either an XSD or a DTD.

Thanks anyway, chr

Arne Vajhøj · Sep 30, 2007

nuthinking said:
The problem seemed it is that setIgnoringElementContentWhitespace
works if the xml refers to either to xsd or dtd.

To some extent I think that makes sense.

Only with a DTD or XSD is it possible to identify something
as content whitespace.
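
If no DTD or XSD is available, one possible workaround is to strip
whitespace-only text nodes yourself after parsing. The following is
only a sketch under that assumption: the class WhitespaceStripper and
its methods are hypothetical helpers, not part of JAXP. Note that it
is cruder than what the parser does with a DTD, because it also
removes whitespace-only text inside elements where that whitespace
might be meaningful.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: removes whitespace-only text nodes after parsing.
public class WhitespaceStripper {

    // Recursively removes every TEXT_NODE below 'node' whose value is
    // empty after trim(). CDATA sections are a different node type and
    // are left alone.
    public static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first; removing 'child' detaches it.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Plain parse of an XML string, no DTD or schema involved.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<all>\n <one>A</one>\n <one>BB</one>\n</all>\n");
        Node root = doc.getDocumentElement();
        System.out.println("before: " + root.getChildNodes().getLength());
        stripWhitespace(root);
        System.out.println("after: " + root.getChildNodes().getLength());
    }
}
```

For the sample document in main, the root element goes from five
child nodes (three whitespace text nodes plus two elements) to two.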

Arne

Arne Vajhøj · Sep 30, 2007

Arne said:
To some extent I think that makes sense.

Only with a DTD or XSD is it possible to identify something
as content whitespace.

Try looking at the attached example.

Arne

====================================

package september;

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class XMLandWS {
    // Parses the XML and prints every text node under the root element,
    // showing newlines as \n and spaces as _ so whitespace is visible.
    public static void parse(String xml) throws Exception {
        System.out.print(xml);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setIgnoringElementContentWhitespace(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        TreeWalker walk = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, false);
        Node n;
        while ((n = walk.nextNode()) != null) {
            System.out.println("=" + n.getNodeValue().replace("\n", "\\n")
                    .replace(" ", "_"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: no DTD, so nothing says what kind of content <all> has.
        parse("<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
        // Case 2: DTD declares <all> as element-only content.
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
        // Case 3: DTD declares <all> as mixed content (text allowed).
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (#PCDATA|one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
    }
}
