extracting part of xml

puzzlecracker

Let's say I have the following XML file:
<info>
<item>
............
</item>

<item>
............
</item>

<item>
............
</item>

</info>

I want to extract each <item> in its entirety; thus, for the file above,
I want to create 3 files, each containing just
<item>
............
</item>

I tried using XPath, but it didn't help; I am not sure how to read the
actual tags.

Thanks.
 
Jean-Francois Briere

This is how to retrieve the nodes:

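import javax.xml.xpath.*;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

String xpathExpr = "/info/item";
String inputFilename = "yourFile.xml";
XPath xpath = XPathFactory.newInstance().newXPath();
// the String passed to InputSource is treated as a system ID (a URI)
InputSource inputSource = new InputSource(inputFilename);
NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, inputSource, XPathConstants.NODESET);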
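
To get from the node list to one file per <item>, each node can be serialized with an identity Transformer from the JDK. The following is only a minimal sketch: the class name ItemSplitter, the method split and the file names item1.xml, item2.xml, ... are made up for illustration, and the output encoding is assumed to be UTF-8.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
public class ItemSplitter {

    // Writes every node matched by xpathExpr to its own file in outDir
    // and returns the number of files written.
    public static int split(InputSource input, String xpathExpr, File outDir) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, input, XPathConstants.NODESET);

        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); // assumed encoding

        for (int i = 0; i < nodes.getLength(); i++) {
            File out = new File(outDir, "item" + (i + 1) + ".xml");
            OutputStream stream = new FileOutputStream(out);
            try {
                // The element, including its tags and attributes, becomes the file content.
                identity.transform(new DOMSource(nodes.item(i)), new StreamResult(stream));
            } finally {
                stream.close();
            }
        }
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<info><item>hello</item><item>world</item><item>!</item></info>";
        int written = split(new InputSource(new StringReader(xml)), "/info/item", new File("."));
        System.out.println(written + " files written");
    }
}
```

Writing through an OutputStream lets the serializer produce bytes that match the encoding named in the XML declaration it emits.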

Regards
 
Jean-Paul

- Download the JDOM library from http://www.jdom.org.

- Import the library in your project / favorite IDE

Given an XML file called items.xml with the following
contents:
<?xml version="1.0"?>
<info>
<item>hello</item>
<item>world</item>
<item>!</item>
</info>

We will be producing 3 files, each named item1.xml, item2.xml,
item3.xml with the following piece of code using the JDOM library:

import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;
import java.io.*;
import java.util.*;

public class XMLItemManipulator {

private List<Element> items;

public XMLItemManipulator() {
items = null;
}

    public void readItems(File xmlFile) throws IOException {

        // make sure the file exists and can be read
        if (!xmlFile.exists())
            throw new FileNotFoundException("cannot find the xml file");

        if (!xmlFile.canRead())
            throw new IOException("file exists but does not have *read* permission");

        // now that we have made sure we got the file, just get the objects
        // necessary to read it and create an XML doc out of it
        SAXBuilder builder = new SAXBuilder();
        Document doc;

        try {
            doc = builder.build(xmlFile);
        } catch (JDOMException e) {
            // without a document we cannot go on, so report the failure to the
            // caller instead of running into a NullPointerException below
            throw new IOException("An error occurred while building the XML doc: " + e.getMessage());
        }

        // get the root element, in your case this would be <info>
        Element root = doc.getRootElement();

        // get the list of children of the root element
        // which have the "item" tag.
        // meaning that even if you had other tags that
        // were children of the root, we really wouldn't care
        // perfect for a heterogeneous xml file containing more
        // than the "item" elements
        items = root.getChildren("item");
    }


// now that you got the items you might want to manipulate them
// it depends on what you wanna do with them while they're in
// memory. I recommend you have a look at the JDOM doc for more info.
public void manipulateItems() {
// put some code here
}


    // once you have manipulated them or since you got the items,
    // you can now decide to write them separately to files.
    // To do this, it's very simple.
    public void writeItems() throws IOException {
        XMLOutputter out = new XMLOutputter();
        int size = items.size();

        for (int counter = 0; counter < size; counter++) {
            // clone the whole element so that its attributes are kept as well
            Element root = (Element) items.get(counter).clone();
            Document doc = new Document(root);

            // number the files from 1: item1.xml, item2.xml, item3.xml
            File target = new File("item" + (counter + 1) + ".xml");

            // write through an OutputStream rather than a FileWriter, so the
            // bytes match the encoding named in the XML declaration
            OutputStream stream = new FileOutputStream(target);
            try {
                out.output(doc, stream);
            } finally {
                stream.close(); // close every file, not only the last one
            }
            out.output(doc, System.out);
        }
    }


// testing all of this with a main method (normally you'd write
// a full test case to do this, but that's your decision)
public static void main(String[] args) {
XMLItemManipulator manip = new XMLItemManipulator();
File file = new File("items.xml");

try {
manip.readItems(file);
manip.manipulateItems(); // this is optional
manip.writeItems();

} catch(Exception e) {
e.printStackTrace();
}
}
}

There you go. Let us know how it goes.

Regards,

Jean-Paul H.
 
ab2305

Thanks, but it didn't work.

Item is not the root tag; the items are scattered throughout the doc...


<Info>

<Item>
...............
</Item>

etc
</Info>

Any suggestions?
 
Jean-Paul

Even so, you should be able to modify the code to make it work. This
code gives you the basics. From here, and with the documentation of the
JDOM library, you should be able to get to a solution on your own. Also
try to read up on how to properly manipulate XML with Java.
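
For elements that are scattered at different depths, one option is the XPath descendant axis (//Item) from the JDK's javax.xml.xpath package instead of a fixed path from the root. The sketch below is an illustration only: the class ScatteredItems and the sample document are invented, and it assumes the tags are really spelled <Info> and <Item> as in the post above. XML names are case-sensitive, so a lookup for "item" does not match <Item>.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
public class ScatteredItems {

    // Counts the nodes that the XPath expression matches in the given XML text.
    public static int count(String xml, String xpathExpr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                xpathExpr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Invented sample: one <Item> under the root, one nested deeper.
        String xml = "<Info><Item>a</Item><Group><Item>b</Item></Group></Info>";
        System.out.println(count(xml, "/Info/Item")); // direct children of the root: 1
        System.out.println(count(xml, "//Item"));     // every <Item> at any depth: 2
        System.out.println(count(xml, "//item"));     // wrong case, no match: 0
    }
}
```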
 
ab2305

I am wary about using JDOM in commercial software. Is it possible to
achieve the same with the standard tools that are part of 1.5?

Thanks.
 
James McGill

ab2305 said:
I am wary about using JDOM in commercial software. Is it possible to
achieve the same with the standard tools that are part of 1.5?

What are you trying to do (I missed the thread?)

The JDK has reference implementations of DOM and SAX, all in JAXP which
shares ancestry with Xerces. I prefer DOM4J but I can't give you an
intelligent rationale other than, "it's always worked well when I've
used it".

Since 1.5, it seems like it should be unnecessary to use anything
additional for XML processing, unless you need a particular
implementation for performance or compatibility reasons. But I must
admit, I didn't see the original question, so I might be naive.
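
As a rough idea of what the JDK-only route could look like for the original question (one standalone document per <item>), here is a sketch that uses nothing beyond the JAXP DOM and Transformer classes. The class DomItemExtractor and its extract method are invented for this example. Note that getElementsByTagName matches descendants at any depth, so nested elements with the same name would be returned as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library.
public class DomItemExtractor {

    // Returns each element named tagName as the bytes of a standalone XML document.
    public static List<byte[]> extract(byte[] xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(new ByteArrayInputStream(xml));
        NodeList items = source.getElementsByTagName(tagName);

        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        List<byte[]> result = new ArrayList<byte[]>();
        for (int i = 0; i < items.getLength(); i++) {
            // Deep-copy the element into a fresh document so that it becomes the root.
            Document single = builder.newDocument();
            single.appendChild(single.importNode(items.item(i), true));

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            identity.transform(new DOMSource(single), new StreamResult(bytes));
            result.add(bytes.toByteArray());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<info><item>hello</item><item>world</item><item>!</item></info>"
                .getBytes("UTF-8");
        for (byte[] doc : extract(xml, "item")) {
            System.out.println(new String(doc, "UTF-8"));
        }
    }
}
```

Returning byte arrays keeps the sketch free of file handling; each array could be written to its own file with a FileOutputStream.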
 
ab2305

After I do the extraction, I save the items to files. However, when I
read them back, one item at a time (using the SAX parser provided by
Eclipse), I get the following exception for a few of them, though rarely.
Can someone point out the problem? Thanks

org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at com.touchgraph.amazoncache.io.AmazonParser.parse(AmazonParser.java:33)
at com.touchgraph.amazoncache.io.AmazonCacheReader.readCache(AmazonCacheReader.java:35)
at com.touchgraph.amazoncache.io.AmazonCacheStore.getBooksFromCache(AmazonCacheStore.java:185)
at com.touchgraph.amazoncache.io.AmazonCacheStore.loadSimilarFromCache(AmazonCacheStore.java:131)
at com.touchgraph.amazoncache.io.AmazonCacheStore.getSimilarBooks(AmazonCacheStore.java:44)
at com.touchgraph.amazoncache.io.AmazonDataModel.addSimilarBooks(AmazonDataModel.java:70)
at com.touchgraph.amazoncache.io.AmazonCacheFrame$1.actionPerformed(AmazonCacheFrame.java:85)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
null
 
Jean-Paul

It seems that there is a problem with the way you're reading the files
back into your system. Can you show us some code?
 
puzzlecracker

Jean-Paul said:
It seems that there is a problem on the way you're reading the files
back into your system. Can you show us some code?
I already solved it. All I needed to do was write the files with a
different encoding.

thanks
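
The post does not say which encoding was used, so the following is only a guess at the kind of fix involved. An "Invalid byte ... of 3-byte UTF-8 sequence" error typically appears when a file is parsed as UTF-8 but its bytes were written in another charset, for example by a FileWriter, which historically used the platform default. The sketch below writes the characters explicitly as UTF-8 and reads them back; the class EncodingRoundTrip and its methods are invented for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of any library.
public class EncodingRoundTrip {

    // Writes the XML text as UTF-8 bytes, whatever the platform default charset is.
    public static void writeUtf8(File file, String xml) throws IOException {
        Writer writer = new OutputStreamWriter(new FileOutputStream(file), Charset.forName("UTF-8"));
        try {
            writer.write(xml);
        } finally {
            writer.close();
        }
    }

    // Parses the file and returns the text content of its root element.
    public static String readRootText(File file) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("item", ".xml");
        // The declaration says UTF-8, and the bytes written really are UTF-8.
        writeUtf8(file, "<?xml version=\"1.0\" encoding=\"UTF-8\"?><item>caf\u00e9</item>");
        System.out.println(readRootText(file));
    }
}
```

The important point is that the encoding named in the XML declaration and the charset used to produce the bytes agree; which of the two is changed matters less.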
 
