extracting part of xml

puzzlecracker · Feb 16, 2006

Let's say I have the following XML file:
<info>
<item>
............
</item>

<item>
............
</item>

<item>
............
</item>

</info>

I want to extract each <item> in its entirety; thus, in the example
above, I want to create 3 files,
each containing just
<item>
............
</item>

I tried using XPath but it didn't help; I'm not sure how to read off
the actual tags.

Thanks.

Jean-Francois Briere · Feb 16, 2006

This is how to retrieve the nodes:

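import javax.xml.xpath.*;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

String xpathExpr = "/info/item";
String inputFilename = "yourFile.xml";
XPath xpath = XPathFactory.newInstance().newXPath();
InputSource inputSource = new InputSource(inputFilename);
// evaluate() throws XPathExpressionException if the expression or input is invalid
NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, inputSource, XPathConstants.NODESET);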
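
The snippet above stops once the nodes are retrieved. As a minimal sketch of the remaining step, each node can be written to its own file with an identity Transformer from the JDK. The class name ItemSplitter, the method split and the item1.xml, item2.xml naming are assumptions made for illustration only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class ItemSplitter {

    // Writes every node matched by xpathExpr to its own file in outputDir
    // (item1.xml, item2.xml, ...) and returns the number of files written.
    static int split(InputSource input, String xpathExpr, File outputDir)
            throws XPathExpressionException, TransformerException, IOException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, input, XPathConstants.NODESET);

        // A Transformer created without a stylesheet copies its source unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        for (int i = 0; i < nodes.getLength(); i++) {
            File target = new File(outputDir, "item" + (i + 1) + ".xml");
            OutputStream out = new FileOutputStream(target);
            try {
                // The matched element becomes the root of the output document.
                transformer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
            } finally {
                out.close();
            }
        }
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<info><item>hello</item><item>world</item><item>!</item></info>";
        int written = split(new InputSource(new StringReader(xml)), "/info/item", new File("."));
        System.out.println(written + " files written");
    }
}
```

Writing through an OutputStream leaves the byte encoding to the serializer, so the bytes on disk should agree with the encoding named in the XML declaration.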

Regards

Denis · Feb 16, 2006

Have you tried using http://jaxen.org/ ?

DM

Jean-Paul · Feb 16, 2006

- Download the JDOM library from http://www.jdom.org.

- Import the library in your project / favorite IDE

Given an XML file called items.xml with the following contents:
<?xml version="1.0"?>
<info>
<item>hello</item>
<item>world</item>
<item>!</item>
</info>

We will produce 3 files, named item1.xml, item2.xml and item3.xml,
with the following piece of code using the JDOM library:

import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;
import java.io.*;
import java.util.*;

public class XMLItemManipulator {

    private List<Element> items;

    public XMLItemManipulator() {
        // start with an empty list so writeItems() is safe before readItems()
        items = new ArrayList<Element>();
    }

    @SuppressWarnings("unchecked") // JDOM 1.x getChildren() returns an untyped List
    public void readItems(File xmlFile) throws IOException {

        // make sure the file exists and can be read
        if (!xmlFile.exists())
            throw new FileNotFoundException("cannot find the xml file");

        if (!xmlFile.canRead())
            throw new IOException("file exists but does not have *read* permission");

        // now that we have made sure we have the file, get the objects
        // necessary to read it and build an XML document out of it
        SAXBuilder builder = new SAXBuilder();
        Document doc;

        try {
            doc = builder.build(xmlFile);
        } catch (JDOMException e) {
            // without a document there is nothing to extract, so fail here
            // instead of hitting a NullPointerException further down
            IOException ioe = new IOException("An error occurred while building the XML doc");
            ioe.initCause(e);
            throw ioe;
        }

        // get the root element, in your case this would be <info>
        Element root = doc.getRootElement();

        // get the list of children of the root element
        // which have the "item" tag.
        // meaning that even if you had other tags that
        // were children of the root, we really wouldn't care.
        // perfect for a heterogeneous xml file containing more
        // than the "item" elements
        items = root.getChildren("item");
    }

    // now that you have the items you might want to manipulate them.
    // it depends on what you want to do with them while they're in
    // memory. I recommend you have a look at the JDOM doc for more info.
    public void manipulateItems() {
        // put some code here
    }

    // once you have manipulated them, or as soon as you have the items,
    // you can write them separately to files.
    public void writeItems() throws IOException {
        XMLOutputter out = new XMLOutputter();
        int size = items.size();

        for (int counter = 0; counter < size; counter++) {
            // clone the whole element (attributes included); the clone has
            // no parent, so it can become the root of a new document
            Element root = (Element) items.get(counter).clone();
            Document doc = new Document(root);

            // files are numbered from 1: item1.xml, item2.xml, ...
            // an OutputStream (rather than a FileWriter) lets the outputter
            // write bytes in the encoding named in the XML declaration
            OutputStream stream = new FileOutputStream("item" + (counter + 1) + ".xml");
            try {
                out.output(doc, stream);
            } finally {
                stream.close(); // close every file, even if the write fails
            }
            out.output(doc, System.out);
        }
    }

    // testing all of this with a main method (normally you'd write
    // a full test case to do this, but that's your decision)
    public static void main(String[] args) {
        XMLItemManipulator manip = new XMLItemManipulator();
        File file = new File("items.xml");

        try {
            manip.readItems(file);
            manip.manipulateItems(); // this is optional
            manip.writeItems();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

There you go. Let us know how it goes.

Regards,

Jean-Paul H.

ab2305 · Feb 17, 2006

Thanks, but it didn't work.

Item is not the root tag; the items are scattered throughout the doc...

<Info>

<Item>
...............
</Item>

etc
</Info>

Any suggestions?

Jean-Paul · Feb 17, 2006

Even so, you should be able to modify the code to make it work. The
code gives you the basics. From here, and with the documentation of
the JDOM library, you should be able to reach a solution on your own.
Also try reading up on how to manipulate XML properly with Java.

ab2305 · Feb 19, 2006

I am wary of using JDOM in commercial software. Is it possible to
achieve the same with the standard tools that are part of 1.5?

Thanks.

James McGill · Feb 19, 2006

ab2305 said:
I am wary of using JDOM in commercial software. Is it possible to
achieve the same with the standard tools that are part of 1.5?

What are you trying to do? (I missed the thread.)

The JDK has reference implementations of DOM and SAX, all in JAXP, which
shares ancestry with Xerces. I prefer DOM4J but I can't give you an
intelligent rationale other than, "it's always worked well when I've
used it".

Since 1.5, it seems like it should be unnecessary to use anything
additional for XML processing, unless you need a particular
implementation for performance or compatibility reasons. But I must
admit, I didn't see the original question and I might be being naive.
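
As a rough sketch of the JDK-only route, and assuming the <Item> elements may sit at any depth as described earlier in the thread, the DOM method getElementsByTagName collects them without any third-party library. The class name ScatteredItems and the method extract are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class ScatteredItems {

    // Returns the serialized form of every element named tagName, at any depth.
    static List<String> extract(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // getElementsByTagName searches all descendants, in document order.
        // Element names are case-sensitive: "Item" does not match "item".
        NodeList nodes = doc.getElementsByTagName(tagName);

        // Identity transformer; the XML declaration is omitted so that each
        // result is just the element itself.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> result = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            StringWriter buffer = new StringWriter();
            transformer.transform(new DOMSource(nodes.item(i)), new StreamResult(buffer));
            result.add(buffer.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Info><Item>a</Item><Group><Item>b</Item></Group></Info>";
        System.out.println(extract(xml, "Item").size() + " elements named Item");
        System.out.println(extract(xml, "item").size() + " elements named item");
    }
}
```

Note that the sample in this thread uses <Item> with a capital I, so a lookup for "item" finds nothing. If items can be nested inside other items, this lookup returns the inner ones as separate entries too.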

ab2305 · Feb 23, 2006

After I do the extraction, I save the items to files. However, when I
read them back, one item at a time (using the SAX parser provided by
Eclipse), I occasionally get the following exception for some of them.
Can someone point out the problem? Thanks.

org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at com.touchgraph.amazoncache.io.AmazonParser.parse(AmazonParser.java:33)
at com.touchgraph.amazoncache.io.AmazonCacheReader.readCache(AmazonCacheReader.java:35)
at com.touchgraph.amazoncache.io.AmazonCacheStore.getBooksFromCache(AmazonCacheStore.java:185)
at com.touchgraph.amazoncache.io.AmazonCacheStore.loadSimilarFromCache(AmazonCacheStore.java:131)
at com.touchgraph.amazoncache.io.AmazonCacheStore.getSimilarBooks(AmazonCacheStore.java:44)
at com.touchgraph.amazoncache.io.AmazonDataModel.addSimilarBooks(AmazonDataModel.java:70)
at com.touchgraph.amazoncache.io.AmazonCacheFrame$1.actionPerformed(AmazonCacheFrame.java:85)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
null

Jean-Paul · Feb 25, 2006

It seems that there is a problem with the way you're reading the files
back into your system. Can you show us some code?

puzzlecracker · Feb 25, 2006

Jean-Paul said:
It seems that there is a problem with the way you're reading the files
back into your system. Can you show us some code?

I already solved it. All I needed to do was write the files with a
different encoding.

thanks
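
The thread does not say which encoding was used in the end. One common cause of "Invalid byte 2 of 3-byte UTF-8 sequence" is a file whose bytes were written in one charset while the parser decodes them as UTF-8. The sketch below reproduces that mismatch under that assumption; the class name EncodingRoundTrip and the methods encode and canParse are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demonstration class, not taken from the poster's code.
class EncodingRoundTrip {

    // Encodes the XML text with an explicitly named charset instead of
    // relying on the platform default.
    static byte[] encode(String xml, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bytes, charsetName);
        try {
            writer.write(xml);
        } finally {
            writer.close();
        }
        return bytes.toByteArray();
    }

    // Returns true if the bytes parse as a well-formed XML document.
    static boolean canParse(byte[] data) throws ParserConfigurationException {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(data));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // The declaration promises UTF-8; \u00e9 is a non-ASCII character.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><item>caf\u00e9</item>";

        System.out.println("written as UTF-8:      " + canParse(encode(xml, "UTF-8")));
        System.out.println("written as ISO-8859-1: " + canParse(encode(xml, "ISO-8859-1")));
    }
}
```

The second case fails because a single ISO-8859-1 byte for the accented character is not a valid UTF-8 sequence. Files containing only ASCII characters are identical in both charsets, which would explain why only some items triggered the exception.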
