Parsing XML with DOM



nuthinking

I can't believe I'm stuck on this, but
DocumentBuilderFactory.setIgnoringElementContentWhitespace doesn't
seem to work at all. I still get the newlines as text nodes :S

Any ideas?


Here is the small piece of code I used:


protected static void parseDom(File file)
{
    // TODO Auto-generated method stub
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setIgnoringComments(true);
    factory.setIgnoringElementContentWhitespace(true);

    DocumentBuilder parser;
    try {
        parser = factory.newDocumentBuilder();
        Document document = parser.parse(file);
        NodeList list = document.getChildNodes();
        int len = list.getLength();
        System.out.println("#parseDom: len:" + len);
        for (int i = 0; i < len; i++) {
            Node element = list.item(i);
            parseNode(element);
        }
    } catch (ParserConfigurationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (SAXException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

private static void parseNode(Node node)
{
    System.out.println("#parseNode:" + node.getNodeName() + " = " +
            node.getNodeValue() + " type:" + node.getNodeType());
    NamedNodeMap attributes = node.getAttributes();
    if (attributes != null) {
        int len = attributes.getLength();
        for (int i = 0; i < len; i++) {
            Node attr = attributes.item(i);
            parseAttribute(attr);
        }
    }
    if (!node.hasChildNodes()) return;

    NodeList list = node.getChildNodes();
    int len = list.getLength();
    System.out.println("-- num children: " + len);
    for (int i = 0; i < len; i++) {
        Node child = list.item(i);
        parseNode(child);
    }
    System.out.println("------");
}

private static void parseAttribute(Node node)
{
    // TODO Auto-generated method stub
    System.out.println("#parseAttribute:" + node.getNodeName() + " = " +
            node.getNodeValue());
}


Thanks,

chr
 

Andrew Thompson

nuthinking said:
DocumentBuilderFactory.setIgnoringElementContentWhitespace doesn't
seem to work at all, I still get the new lines as text elements :S ...
Here the small code I used:

A highly motivated* master of the craft** might be able to spot
the mistake in your 63-line snippet by eye. To get the help
of 'the rest of us', you are better off posting an SSCCE***.

* Highly enough motivated to try to spot mistakes by
simply reading the code, as opposed to seeing the code
work/fail when run.

** ...and they would probably need to know XML processing
inside and out. Mistakes are often spotted by people who do
not know an API that well, but were simply interested enough
to run a code sample.

*** <http://www.physci.org/codes/sscce.html>
It would be best to pull a small XML file directly from
a URL on a web site. If you cannot manage to upload
it to somewhere that is open to being fetched by Java,
try including a small sample in your post.
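
For illustration, an SSCCE for this question might look roughly like the sketch below. The class name WhitespaceSscce and the inline sample document are made up here, not taken from the original post. It embeds a tiny XML string with no DTD or schema, asks the factory to ignore element content whitespace, and prints how many children the root element ends up with.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class WhitespaceSscce {
    // Parses the XML with whitespace ignoring requested and returns the
    // number of direct children of the root element.
    static int countRootChildren(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder parser = factory.newDocumentBuilder();
        Document document = parser.parse(new InputSource(new StringReader(xml)));
        NodeList children = document.getDocumentElement().getChildNodes();
        return children.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; note that it declares no DTD or schema.
        String xml = "<all>\n  <one>A</one>\n</all>\n";
        // One might hope for 1 here, but the two whitespace text nodes
        // around <one> are kept, so this prints 3.
        System.out.println("children of root: " + countRootChildren(xml));
    }
}
```

Because the XML lives in a string, anyone can paste the class into a file and run it without needing the original input file.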

--
Andrew Thompson
http://www.athompson.info/andrew/

Message posted via JavaKB.com
http://www.javakb.com/Uwe/Forums.aspx/java-general/200709/1
 

nuthinking

The problem seems to be that setIgnoringElementContentWhitespace
only works if the XML refers to either an XSD or a DTD.

Thanks anyway, chr
 

Arne Vajhøj

nuthinking said:
The problem seemed it is that setIgnoringElementContentWhitespace
works if the xml refers to either to xsd or dtd.

To some extent I think that makes sense.

Only with a DTD or XSD is it possible to identify something
as element content whitespace, because only the content model
says whether an element may contain text at all.
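
When no DTD or XSD is available, one possible workaround is to drop whitespace-only text nodes by hand after parsing. The sketch below is only an illustration; the class name WhitespaceStripper and its methods are made up here. Since it has no content model to consult, it also removes whitespace that would be significant in mixed content, so it only suits documents where that does not matter.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for this sketch.
class WhitespaceStripper {
    // Recursively removes text nodes that contain nothing but whitespace.
    // Caveat: with no content model to consult, this also strips whitespace
    // that is significant in mixed content, e.g. <p><b>a</b> <i>b</i></p>.
    static void stripWhitespaceOnlyText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }

    // Parses the XML without any DTD support and then strips the tree.
    static Document parseAndStrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespaceOnlyText(doc);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndStrip("<all>\n <one>A</one>\n <one>BB</one>\n</all>\n");
        // Only the two <one> elements remain under <all>, so this prints 2.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```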

Arne
 

Arne Vajhøj

Arne said:
To some extent I think that makes sense.

Only with a DTD or XSD is it possible to identify something
as content whitespace.

Try looking at the attached example.

Arne

====================================

package september;

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class XMLandWS {
    // Echoes the XML, parses it, then prints every text node under the root
    // with newlines shown as \n and spaces shown as _.
    public static void parse(String xml) throws Exception {
        System.out.print(xml);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setIgnoringElementContentWhitespace(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        TreeWalker walk = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, false);
        Node n;
        while ((n = walk.nextNode()) != null) {
            System.out.println("=" + n.getNodeValue().replace("\n", "\\n").replace(" ", "_"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: no DTD at all.
        parse("<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
        // Case 2: DTD declares <all> with element-only content.
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
        // Case 3: DTD declares <all> with mixed content.
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (#PCDATA|one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              " <one>A</one>\n" +
              " <one>BB</one>\n" +
              " <one>CCC</one>\n" +
              "</all>\n");
    }
}
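
The Javadoc for setIgnoringElementContentWhitespace says the setting requires the parser to be in validating mode, which the example above does not switch on. A variant that calls setValidating(true) explicitly might look like the sketch below. The class name ValidatingWhitespace and the error handling policy are assumptions made for this illustration, and the sample document reuses the element-only DTD from the second case above.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical class name, used only for this sketch.
class ValidatingWhitespace {
    // Parses with DTD validation on and returns the number of direct
    // children of the root element.
    static int countRootChildren(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true); // validate against the DTD in the document
        dbf.setIgnoringElementContentWhitespace(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Assumed policy for this sketch: fail on validation errors rather
        // than relying on the parser's default error reporting.
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                System.err.println("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE all [\n" +
                     "<!ELEMENT all (one)*>\n" +
                     "<!ELEMENT one (#PCDATA)>\n" +
                     "]>\n" +
                     "<all>\n" +
                     " <one>A</one>\n" +
                     " <one>BB</one>\n" +
                     " <one>CCC</one>\n" +
                     "</all>\n";
        // With the whitespace between the <one> elements dropped, only the
        // three <one> elements should be counted.
        System.out.println("children of root: " + countRootChildren(xml));
    }
}
```

A document that does not match its DTD makes this version throw, which is a stricter behaviour than the non-validating example and may or may not be what you want.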
 
