Re: Method to compare two XML documents

Discussion in 'XML' started by GrindKore, Aug 9, 2004.

  1. GrindKore

    GrindKore Guest

    Record an MD5 hash of each file. On subsequent scans, compare the hash
    values: if they match, the document has not changed; otherwise, do your
    imaging and update the XML manifest with the new hash value.
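
A minimal sketch of the hash-and-compare idea, assuming the hashes from the previous scan are available as a dictionary keyed by file path. The names `md5_of_file`, `changed_files`, and `previous_hashes` are hypothetical and not part of the original ActiveX DLL; how the hashes are stored in the XML manifest is left open.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, previous_hashes):
    """Return (paths needing re-imaging, hash table for the current scan).

    previous_hashes is an assumed mapping of path string -> hex digest
    recorded on the last scan; new files are simply missing from it.
    """
    current_hashes = {}
    to_reimage = []
    for path in paths:
        key = str(path)
        current_hashes[key] = md5_of_file(path)
        # One dictionary lookup per file instead of looping through both manifests.
        if previous_hashes.get(key) != current_hashes[key]:
            to_reimage.append(key)
    return to_reimage, current_hashes
```

Because each file costs one dictionary lookup, the comparison step grows linearly with the number of files rather than with the product of the two manifest sizes. Reading every file to hash it is still the dominant cost.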

    "GrinKore" <> wrote in message
    news:...
    > Hello, I'm working on the intranet document imaging application where every
    > 24 hours my program scans all network servers for various documents and
    > creates raster images of them to be placed on company's intranet server.
    >
    > I have created ActiveX DLL that scans FSO and returns XML document as a
    > manifest of all compatible document files stored on those servers. See
    > attached sample XML output for more details.
    >
    > What I want to do is to compare two xml documents so that I can
    > determine what files have changed since last scan. Since production system
    > has to be able to handle 100,000 + nodes looping through both XML documents
    > takes considerable amount of time. Is there any other ways to do this?
    >
    > Thanks in advance...
     
    GrindKore, Aug 9, 2004

  2. "GrindKore" <> wrote in message
    news:...
    > Record MD5 hash of your file and later on subsequent scans compare hash
    > value, if same then document has not changed else do your imaging and update
    > xml manifest with new hash value.


    As a refinement to this idea, you could parse the XML, generate ESIS
    ("element structure information set") and hash the output. Comparing ESIS
    rather than "raw" XML will avoid false positives where input differs in ways
    that would not affect the information that a processing application would
    receive. (Comments, whitespace, DTD changes...)

    Parsing every document before hashing costs more than hashing the raw
    bytes, so in many cases this approach might be undesirable in terms of
    performance. On the other hand, this capability might be added at reasonable
    cost to applications that already have access to the element structure
    information.

    For an example of an SGML/XML parser that generates ESIS, see James Clark's
    SP: http://www.jclark.com/sp

    /kmc
     
    Keith M. Corbett, Aug 10, 2004

  3. Nick Kew

    Nick Kew Guest

    In article <>,
    "Keith M. Corbett" <> writes:

    > As a refinement to this idea, you could parse the XML, generate ESIS
    > ("element structure information set") and hash the output. Comparing ESIS
    > rather than "raw" XML will avoid false positives where input differs in ways
    > that would not affect the information that a processing application would
    > receive. (Comments, whitespace, DTD changes...)


    This can be further refined to detect or ignore selected types of
    difference. For example, you can detect whether two versions of a
    document differ only in their attributes or attribute values while
    the element tree stays the same. Check the archives of the WAI-ER
    working group (at lists.w3.org) for further discussion, including
    a prototype implementation of change detection that successfully
    distinguishes 'significant' changes on, for example, a news site
    where stories change frequently and adverts (which we ignore)
    change with every hit.
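
One way such selective comparison might look is sketched below: compute several fingerprints per document, each covering a different layer (element tree only, tree plus text, everything), and classify a change by which layers differ. This is an illustrative assumption, not the WAI-ER prototype; all function names and category labels are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def _walk(element, with_attributes, with_text):
    """Yield structural events, optionally including attributes and text."""
    if with_attributes:
        for name, value in sorted(element.attrib.items()):
            yield f"A{name} {value}"
    yield f"({element.tag}"
    if with_text and element.text and element.text.strip():
        yield f"-{element.text.strip()}"
    for child in element:
        yield from _walk(child, with_attributes, with_text)
        if with_text and child.tail and child.tail.strip():
            yield f"-{child.tail.strip()}"
    yield f"){element.tag}"


def _digest(lines):
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()


def fingerprints(xml_text):
    """Return one digest per layer of the document."""
    root = ET.fromstring(xml_text)
    return {
        "tree": _digest(_walk(root, False, False)),  # element names and nesting
        "text": _digest(_walk(root, False, True)),   # tree plus character data
        "full": _digest(_walk(root, True, True)),    # tree, text and attributes
    }


def classify_change(old_xml, new_xml):
    """Classify the difference between two versions of a document."""
    old, new = fingerprints(old_xml), fingerprints(new_xml)
    if old["full"] == new["full"]:
        return "unchanged"
    if old["tree"] != new["tree"]:
        return "structure"
    if old["text"] == new["text"]:
        return "attributes only"
    return "content"
```

An application could then decide per category what counts as significant, for instance ignoring "attributes only" changes while acting on "structure" or "content" changes.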

    > For an example of an SGML/XML parser that generates ESIS, see James Clark's
    > SP: http://www.jclark.com/sp


    Or the more up-to-date OpenSP (at openjade.sourceforge.net).
    Don't expose the old SP on the Web (e.g. via CGI): it's not designed
    for it and has security issues.

    --
    Nick Kew
     
    Nick Kew, Aug 10, 2004
