Help processing two very big XML files

Discussion in 'XML' started by fuel, Jun 11, 2008.

  1. fuel

    fuel Guest

    Hello,
    I have two big XML files (around 50-60 MB each), and I need to
    process the data within each of them. The problem is that I need to
    process each node and compare it with the nodes in the other XML
    file. After iterating through all the nodes, I need to find the
    nodes that have changed or have been newly introduced.
    Assume the following XML structure:

    <?xml version="1.0"?>
    <root>
    <nodeToProcess>

    </nodeToProcess>
    .....
    </root>

    I have two such XML files. I keep one as the reference and
    compare it with the other. To solve this problem, I thought I could
    use XPath. However, for now, only DOM-based XPath processors seem
    to be available. Since the files are so large, I don't think I can
    afford DOM (memory constraint).

    How can I approach this problem? What would be the right way to
    start?

    P.S. I am trying to access these elements from Java.
     
    fuel, Jun 11, 2008
    #1

  2. fuel wrote:
    > Hello,
    > I have two big XML files (around 50-60 MB each), and I need to
    > process the data within each of them. The problem is that I need to
    > process each node and compare it with the nodes in the other XML
    > file. After iterating through all the nodes, I need to find the
    > nodes that have changed or have been newly introduced.
    > Assume the following XML structure:
    >
    > <?xml version="1.0"?>
    > <root>
    > <nodeToProcess>
    >
    > </nodeToProcess>
    > .....
    > </root>
    >
    > I have two such XML files. I keep one as the reference and
    > compare it with the other. To solve this problem, I thought I could
    > use XPath. However, for now, only DOM-based XPath processors seem
    > to be available. Since the files are so large, I don't think I can
    > afford DOM (memory constraint).
    >
    > How can I approach this problem? What would be the right way to
    > start?


    There are ready-to-run tools for diffing XML files. Try searching
    for "xml-diff".

    >
    > P.S. I am trying to access these elements from Java.


    Some of the tools are written in Java and some of them are open-source.

    I don't know how well these tools perform with big files.

    Hope this helps.
    --
    Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
     
    Manuel Collado, Jun 11, 2008
    #2

  3. Peyo

    Peyo Guest

    fuel wrote:

    > How can I approach this problem?


    > P.S. I am trying to access these elements from Java.


    Perhaps use an XML database (a lightweight one, if possible)
    through its Java API, to get optimized access to the nodes you
    need to process?

    Cheers,

    p.
     
    Peyo, Jun 11, 2008
    #3
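
For readers who would rather stay within the JDK: the comparison asked about in post #1 can probably also be done without DOM by streaming both files with StAX (javax.xml.stream). What follows is only a sketch under assumptions, not one of the tools mentioned in this thread. It assumes every <nodeToProcess> element carries a unique "id" attribute (hypothetical; the original post does not say how nodes are matched between the files), and the class and method names are made up for illustration. It indexes the reference file as id -> SHA-256 digest, then streams the second file and reports the ids that are new or whose digest differs.

```java
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class XmlStreamDiff {
    static final String NODE = "nodeToProcess"; // element name taken from post #1
    static final String KEY_ATTR = "id";        // ASSUMPTION: hypothetical unique key attribute

    /** Returns id -> "new" or "changed" for nodes of candidate, relative to reference. */
    static Map<String, String> diff(Reader reference, Reader candidate) {
        try {
            // Pass 1: keep only one small digest per reference node, never a tree.
            Map<String, String> ref = new HashMap<>();
            scan(reference, ref::put);
            // Pass 2: stream the other file and compare node by node.
            Map<String, String> report = new TreeMap<>();
            scan(candidate, (key, digest) -> {
                String old = ref.get(key);
                if (old == null) report.put(key, "new");
                else if (!old.equals(digest)) report.put(key, "changed");
            });
            return report;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Streams one document and hands (id, digest) of every NODE element to sink. */
    private static void scan(Reader xml, BiConsumer<String, String> sink)
            throws XMLStreamException, NoSuchAlgorithmException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text chunks so chunk boundaries do not matter.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        XMLStreamReader r = factory.createXMLStreamReader(xml);
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && NODE.equals(r.getLocalName())) {
                    String key = r.getAttributeValue(null, KEY_ATTR);
                    if (key == null) {
                        throw new XMLStreamException("missing " + KEY_ATTR, r.getLocation());
                    }
                    sink.accept(key, digestSubtree(r, md));
                }
            }
        } finally {
            r.close();
        }
    }

    /** Digests the element the reader is on, through its matching end tag. */
    private static String digestSubtree(XMLStreamReader r, MessageDigest md)
            throws XMLStreamException {
        int depth = 0;
        for (int ev = r.getEventType(); ; ev = r.next()) {
            if (ev == XMLStreamConstants.START_ELEMENT) {
                depth++;
                feed(md, "<" + r.getLocalName());
                // Attributes are fed in document order, so reordering counts as a change.
                for (int i = 0; i < r.getAttributeCount(); i++) {
                    feed(md, " " + r.getAttributeLocalName(i) + "=" + r.getAttributeValue(i));
                }
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                feed(md, ">");
                if (--depth == 0) break;
            } else if (ev == XMLStreamConstants.CHARACTERS && !r.isWhiteSpace()) {
                feed(md, r.getText()); // whitespace-only text (indentation) is ignored
            }
        }
        return Base64.getEncoder().encodeToString(md.digest()); // digest() also resets md
    }

    private static void feed(MessageDigest md, String s) {
        md.update(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

To try it, pass two readers, for example from java.nio.file.Files.newBufferedReader(...), to XmlStreamDiff.diff(reference, candidate). Memory use grows with the number of nodes in the reference file (one id and one digest each), not with the size of the documents. Nodes that exist only in the reference file are not reported, since the question asks only for changed and newly introduced nodes.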
  4. Martin Honnen, Jun 11, 2008
    #4
