Help processing two very big XML files

fuel

Hello,
I have two big XML files (around 50-60 MB each), and I need to
process the data in each of them. The problem is that I need to
process each node and compare it with the nodes in the other XML
file. After iterating through all the nodes, I need to find the
nodes that have changed or that have been newly introduced.
Assume the following XML structure:

<?xml version="1.0"?>
<root>
<nodeToProcess>

</nodeToProcess>
.....
</root>
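Whatever reads the files, the comparison step itself can stay small: if each file is reduced to a map from a node key to a fingerprint of that node's content, then "newly introduced" means a key missing from the reference map, and "changed" means a key whose fingerprint differs. A minimal Java sketch of that step (Java 16+, since it uses a record), assuming such maps already exist. The class name `NodeDiff`, the keys and the fingerprint strings are hypothetical; the structure above shows no identifying attribute on `nodeToProcess`, so some key has to be chosen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NodeDiff {

    // Keys that exist only in the second file, and keys whose fingerprint differs.
    record Diff(List<String> added, List<String> changed) {}

    // reference / current: hypothetical node key -> fingerprint of that node's content.
    static Diff diff(Map<String, String> reference, Map<String, String> current) {
        List<String> added = new ArrayList<>();
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String old = reference.get(e.getKey());
            if (old == null) {
                added.add(e.getKey());   // not in the reference file: newly introduced
            } else if (!old.equals(e.getValue())) {
                changed.add(e.getKey()); // present in both, content differs: changed
            }
        }
        return new Diff(added, changed);
    }

    public static void main(String[] args) {
        Map<String, String> reference = new LinkedHashMap<>();
        reference.put("a", "fp-1");
        reference.put("b", "fp-2");
        Map<String, String> current = new LinkedHashMap<>();
        current.put("a", "fp-1");
        current.put("b", "fp-9");
        current.put("c", "fp-3");
        Diff d = diff(reference, current);
        System.out.println("added=" + d.added() + " changed=" + d.changed());
        // prints: added=[c] changed=[b]
    }
}
```

Deleted nodes, if they are ever needed, would be the reference keys that never appear in the second map.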

I have two such XML files. I keep one as the reference and
compare the other against it. To solve this problem, I thought I could
use XPath. However, for now, the only XPath processors available are
DOM-based, and since the files are very large, I don't think I can
afford DOM (memory constraint).
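One possible way around the DOM memory problem: the JDK's StAX API (`javax.xml.stream`) reads the file as a stream of events, so each `nodeToProcess` element can be reduced to a digest as soon as its end tag is seen and then forgotten. The sketch below is an illustration under stated assumptions, not a finished tool: it assumes each `nodeToProcess` has an `id` attribute (hypothetical; nodes without one are keyed by position, which is a weak hypothetical fallback), ignores leading and trailing whitespace in text, and uses SHA-256 as the fingerprint. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class NodeFingerprints {

    // Streams the document with StAX and returns key -> SHA-256 fingerprint (Base64)
    // for every <nodeToProcess> element. No DOM tree is built; only the current node's
    // digest state and the result map are kept in memory.
    // ASSUMPTION (hypothetical): nodes are identified by an "id" attribute; nodes
    // without one get a positional key such as "#1". Attribute order as written
    // in the file affects the fingerprint.
    static Map<String, String> fingerprints(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, true); // merge adjacent text events
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);  // no DTD/external entities
        XMLStreamReader r = factory.createXMLStreamReader(in);

        MessageDigest md = sha256();
        Map<String, String> result = new LinkedHashMap<>();
        String key = null; // key of the node being read, or null when outside any node
        int depth = 0;     // element depth inside the current <nodeToProcess>
        int unnamed = 0;   // counter for positional fallback keys

        while (r.hasNext()) {
            int event = r.next();
            if (key == null) {
                if (event != XMLStreamConstants.START_ELEMENT
                        || !r.getLocalName().equals("nodeToProcess")) {
                    continue; // not inside a node yet: skip
                }
                String id = r.getAttributeValue(null, "id");
                key = (id != null) ? id : "#" + (++unnamed);
                md.reset();
            }
            switch (event) {
                case XMLStreamConstants.START_ELEMENT -> {
                    depth++;
                    update(md, "<" + r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        update(md, " " + r.getAttributeLocalName(i) + "=\"" + r.getAttributeValue(i) + "\"");
                    }
                    update(md, ">");
                }
                case XMLStreamConstants.CHARACTERS, XMLStreamConstants.CDATA ->
                        update(md, r.getText().strip()); // surrounding whitespace ignored
                case XMLStreamConstants.END_ELEMENT -> {
                    update(md, "</" + r.getLocalName() + ">");
                    if (--depth == 0) { // end tag of the <nodeToProcess> itself
                        result.put(key, Base64.getEncoder().encodeToString(md.digest()));
                        key = null;
                    }
                }
                default -> { } // comments, processing instructions, etc. are ignored
            }
        }
        r.close();
        return result;
    }

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // Java SE platforms must provide SHA-256
        }
    }

    private static void update(MessageDigest md, String s) {
        md.update(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><nodeToProcess id=\"a\"><v>1</v></nodeToProcess>"
                + "<nodeToProcess id=\"b\"><v>2</v></nodeToProcess></root>";
        Map<String, String> f = fingerprints(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(f.keySet()); // prints [a, b]
    }
}
```

Running this over the reference file and then over the other file gives two maps whose size grows with the number of nodes rather than with the full document tree; comparing them key by key shows which nodes are new or changed.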

How can I approach this problem? What would be the right way to start?

P.S. I am trying to access these elements through Java.
 
Manuel Collado

fuel wrote:
> How can I approach this problem? What would be the right way to start?

There are ready-to-run tools for diffing XML files. Try googling
for "xml-diff".
> P.S. I am trying to access these elements through Java.

Some of these tools are written in Java, and some are open source.

I don't know how well these tools perform with big files.

Hope this helps.
 
Peyo

fuel wrote:
> How can I approach this problem?
> P.S. I am trying to access these elements through Java.

How about using an XML database, preferably a lightweight one, through
its Java API? That would give you optimized access to the nodes you need to process.

Cheers,

p.
 
