XML equality

Discussion in 'XML' started by onetitfemme, Dec 21, 2005.

  1. onetitfemme

    onetitfemme Guest

    Hi *,

    I have been looking for a definition or at least some workable concept
    of "XML equality".

    Searching on "XML equality" in comp.text.xml, microsoft.public.xsl and
    microsoft.public.xml resulted in no hits.

    I also searched the same newsgroups for the single words XML, equality
    and schema, which turned up very few links, none of them to the point.

    I have read that the commercial "XMLBooster" now addresses these
    issues by generating code to:
    - Check for equality among XML instances
    - Compute the distance between two XML instances
    - Compute the minimal set of changes required to go from one instance
    to another, similar in spirit to what the diff Unix command does for
    text files.

    But it is hard to tell what exactly they mean by "equality among
    XML instances" and "distance between two XML instances". I spent some
    time at their web site and I think they are just using sales pitches. I
    couldn't find any docs defining or at least clarifying their
    claims/terminology.

    I know XML is basically (structured) text, and there are no such
    definitions for texts or natural-language grammars (their usefulness
    and validity are actually more semantic than syntactic).

    Do you know of works dealing with the definition of such terms?

    Thanks
    otf
    onetitfemme, Dec 21, 2005
    #1

  2. mgungora

    mgungora Guest

    Look for "xml diff" instead...
    mgungora, Dec 21, 2005
    #2

  3. onetitfemme wrote:

    > I have been looking for a definition or at least some workable concept
    >of "XML equality".


    A natural definition would use the infoset. Norm Walsh has a
    definition:

    http://norman.walsh.name/2004/05/19/infoset-equal

    -- Richard
    Richard Tobin, Dec 21, 2005
    #3
  4. onetitfemme

    onetitfemme Guest

    // - - - - - - - - - - - - - - - - - - - -
    > Look for "xml diff" instead...


    mgungora, this is how I started. Search comp.text.xml for "OSS,
    java-based XML Diff?"

    I could not find much either; as a matter of fact, no one replied to me.

    // - - - - - - - - - - - - - - - - - - - -
    > > I have been looking for a definition or at least some workable concept
    > >of "XML equality".


    > A natural definition would use the infoset. Norm Walsh has a
    > definition:


    > http://norman.walsh.name/2004/05/19/infoset-equal


    Richard, thank you for pointing me to Norman Walsh's article:

    // __
    Infoset Equality
    19 May 2004 (modified 11 Sep 2005)
    Volume 7, Issue 86
    by norman walsh

    http://norman.walsh.name/2004/05/19/infoset-equal
    // __

    His article, in which he approaches the concept from the perspective
    of infosets (http://www.w3.org/TR/xml-infoset/), is definitely a good
    start, but there are a number of issues that I see right away just by
    looking at his definitions. For example:

    // __ in def. 2:
    2. Element Information Items

    Two element information items are equal if the following properties
    are equal:

    - [namespace name]
    - [local name]
    - [children]
    - [attributes]

    Children are compared in order, attributes without respect to order.
    // __
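
A minimal sketch of the definition quoted above, assuming Python's standard xml.etree.ElementTree. The function name elements_equal is hypothetical, and this is not Norman Walsh's own code. ElementTree keeps the namespace name and local name together in .tag, and by default it drops comments and processing instructions while parsing, so those are ignored here.

```python
import xml.etree.ElementTree as ET

def elements_equal(a, b):
    """Compare two elements in the spirit of the quoted definition."""
    # .tag is "{namespace}local", so one comparison covers both
    # [namespace name] and [local name].
    if a.tag != b.tag:
        return False
    # .attrib is a dict, so this comparison ignores attribute order.
    if a.attrib != b.attrib:
        return False
    # Character data counts as children; ElementTree keeps it in .text
    # (before the first child) and .tail (after each child).
    if (a.text or "") != (b.text or ""):
        return False
    if len(a) != len(b):
        return False
    # Child elements are compared pairwise, in document order.
    return all(
        (ca.tail or "") == (cb.tail or "") and elements_equal(ca, cb)
        for ca, cb in zip(a, b)
    )

one = ET.fromstring('<n x="1" y="2"><c>Mary</c><c>Paul</c></n>')
two = ET.fromstring('<n y="2" x="1"><c>Mary</c><c>Paul</c></n>')
three = ET.fromstring('<n x="1" y="2"><c>Paul</c><c>Mary</c></n>')
print(elements_equal(one, two))    # True: only the attribute order differs
print(elements_equal(one, three))  # False: the child order differs
```

Whitespace is compared exactly in this sketch, so two documents that differ only in indentation would count as different.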
    ._ I would also include the path to the element, just the path, NOT
    the content of all elements in the path (unless he understands it as
    part of the "[namespace name]"). To me, it is very natural to include
    the path to an element, and I wonder why it escaped his consideration.
    ._ Also, to even compare documents (and/or doc sections), they should
    first have structural and type affinity in their schemas, at least in
    the sections that are being compared.
    ._ The order of similar child elements under the same path should
    not really matter (this can be easily/practically solved by sorting
    them all). These two sections of XML "instances" should be equal:
    <node4>
    <children>younger child: Paul</children>
    <children>older child: Mary</children>
    </node4>

    and

    <node4>
    <children>older child: Mary</children>
    <children>younger child: Paul</children>
    </node4>

    ._ if an attribute is not mandatory, should these two sections be the
    same?

    <node4>
    <children>older child: Mary</children>
    <children>younger child: Paul</children>
    </node4>

    and

    <node4>
    <children adopted="true">older child: Mary</children>
    <children>younger child: Paul</children>
    </node4>

    Also, it would be obvious that you should exclude comments while
    comparing XML docs, but why ignore processing instructions, when they
    give important type and reference info that defines the included data?

    Thanks
    otf
    onetitfemme, Dec 21, 2005
    #4
  5. onetitfemme wrote:
    > ._ I would also include the path to the element, just the path, NOT
    >the content of all elements in the path


    I don't understand why you would do that. If the elements don't have
    the same path from the root, you wouldn't be comparing them at all.

    Unless you are considering comparison of fragments of documents, in
    which case you probably don't care about the position in the document.

    > ._ also, to even compare documents (and/or doc sections) they should
    >first have structural and type affinity on their schemas, at least on
    >the sections that are being compared,


    XML documents aren't required to have any kind of schema. This would
    be equality on documents+schemas, not documents.

    > ._ the order of elements of similar children from the same path should
    >not really matter (this can be easily/practically solved by sorting
    >them all).


    This requires knowledge of the interpretation of the document that is not
    inherent in the document itself. Given some kind of schema, it might be
    appropriate to interpret the children as a set rather than a sequence,
    but in that case you are again not comparing documents themselves, but
    the data models resulting from application of a schema to the documents.
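
To make the sequence-versus-set distinction concrete, here is a small sketch, assuming Python's standard xml.etree.ElementTree. The helper name unordered_key is hypothetical. It treats sibling order as insignificant only because the caller has chosen that interpretation; nothing in the documents themselves says so. Surrounding whitespace is stripped, and text between child elements (mixed content) is ignored.

```python
import xml.etree.ElementTree as ET

def unordered_key(elem):
    """Build a nested tuple that is the same for elements that differ
    only in the order of their child elements."""
    # Sorting the children's keys discards document order on purpose.
    children = sorted(unordered_key(child) for child in elem)
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(children),
    )

first = ET.fromstring("""
<node4>
  <children>younger child: Paul</children>
  <children>older child: Mary</children>
</node4>""")
second = ET.fromstring("""
<node4>
  <children>older child: Mary</children>
  <children>younger child: Paul</children>
</node4>""")
third = ET.fromstring("""
<node4>
  <children adopted="true">older child: Mary</children>
  <children>younger child: Paul</children>
</node4>""")

print(unordered_key(first) == unordered_key(second))  # True: same set of children
print(unordered_key(first) == unordered_key(third))   # False: extra attribute
```

Under this interpretation the first two fragments compare equal, while the fragment with the extra attribute still differs.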

    > ._ if an attribute is not mandatory, should these two sections be the
    >same?


    As XML documents, they would be different. According to some
    interpretation, they might be the same. Optional attributes
    are not always interpreted as supplying optional information: their
    absence may be as significant as their presence.

    > Also, it would be obvious that you should exclude comments while
    >comparing XML docs, but why ignore processing instructions, when they
    >give important type and reference info that defines the included data?


    Processing instructions are used for many different purposes. But their
    obvious canonical use is to specify the processing of (part of) the
    document rather than its content.

    -- Richard
    Richard Tobin, Dec 21, 2005
    #5
  6. onetitfemme

    onetitfemme Guest

    > Richard Tobin wrote ...
    Hi *,

    > > ._ I would also include the path to the element, just the path, NOT
    > >the content of all elements in the path


    > I don't understand why you would do that. If the elements don't have
    > the same path from the root, you wouldn't be comparing them at all.


    "If the elements don't have the same path from the root, you
    wouldn't be comparing them at all"
    otf: exactly! Here I might be a little biased, and/or some intuition
    artifacts might be kicking in. We theoretical physicists
    "naturally" think this way. You may go LOL, but to us, if more
    people board a train, it might still reach its destination, but the
    trajectory will definitely not be the same ;-)
    Jokes aside now, to me (in an ontology, i.e. a well-structured,
    hierarchical, tree-like dependency) the path to an element is as
    important as the element itself.
    > Unless you are considering comparison of fragments of documents, in
    > which case you probably don't care about the position in the document.

    "fragments of documents"
    otf: I am considering them, but I still care about the position in
    the document.
    > > ._ also, to even compare documents (and/or doc sections) they should
    > >first have structural and type affinity on their schemas, at least on
    > >the sections that are being compared,



    > XML documents aren't required to have any kind of schema. This would
    > be equality on documents+schemas, not documents.


    "equality on documents+schemas, not documents."
    otf: exactly! "structural and type affinity on their schemas ..."
    should be very important to even consider any type of comparison

    > > ._ the order of elements of similar children from the same path should
    > >not really matter (this can be easily/practically solved by sorting
    > >them all).


    > This requires knowledge of the interpretation of the document that is not
    > inherent in the document itself. Given some kind of schema, it might be
    > appropriate to interpret the children as a set rather than a sequence,
    > but in that case you are again not comparing documents themselves, but
    > the data models resulting from application of a schema to the documents.


    otf: granted! But why would you not interpret the children as a set,
    if nothing else has been explicitly indicated in the schema?
    Actually, I would say the data models result from the COMPLIANCE of
    documents to a schema, so that they become actionable data for an XML
    application.

    > > ._ if an attribute is not mandatory, should these two sections be the
    > >same?

    > As XML documents, they would be different. According to some
    > interpretation, they might be the same. Optional attributes
    > are not always interpreted as supplying optional information: their
    > absence may be as significant as their presence.


    otf: OK. I think I have started to see that there might not be such
    a thing as "XML equality" (as you have, e.g., for mathematical
    magnitudes), but only degrees thereof.

    > > Also, it would be obvious that you should exclude comments while
    > >comparing XML docs, but why ignore processing instructions, when they
    > >give important type and reference info that defines the included data?


    > Processing instructions are used for many different purposes. But their
    > obvious canonical use is to specify the processing of (part of) the
    > document rather than its content.

    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    I am thinking of tons of web pages (and/or any other marked-up docs)
    as a huge forest of texts where "links" among them are given not only
    through URLs, but through their structure as well.
    I understood something from your comments when you talked about the
    "position in the document" (of an element), but I think I am missing
    something. Even the path to the elements might not be enough for an
    accurate description of "equality", but since "degrees thereof"
    might be important as well, even the closed graphs up to the point
    where an element sits should be considered.

    Thanks
    otf
    onetitfemme, Dec 22, 2005
    #6
  7. onetitfemme

    onetitfemme Guest

    I just found a really good article which answers my XML diffing
    doubts to a large extent:

    http://www.mulberrytech.com/Extreme/Proceedings/html/2005/Schaffert01/EML2005Schaffert01.html

    Structure-Preserving Difference Search for XML Documents
    by E. Schubert, S. Schaffert, and F. Bry
    abstract:
    Current XML differencing applications usually try to find a minimal
    sequence of edit operations that transform one XML document to another
    XML document (the so-called "edit script"). In our conviction, this
    approach often produces increments that are unintuitive for human
    readers and do not reflect the actual changes. We therefore propose in
    this article a different approach trying to maximize the retained
    structure instead of minimizing the edit sequence. Structure is thereby
    not limited to the usual tree structure of XML - any kind of structural
    relations can be considered (like parent-child, ancestor-descendant,
    sibling, document order). In our opinion, this approach is very
    flexible and able to adapt to the user's requirements. It produces more
    readable results while still retaining a reasonably small edit
    sequence.
    Keywords: Web; XML; Difference
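
As a rough illustration of what an "edit script" is (the conventional approach the abstract contrasts itself with), here is a sketch assuming Python's standard difflib and xml.etree.ElementTree. The helper child_edit_script is hypothetical and only looks at the top-level children. Note that difflib does not guarantee a minimal script, and this is not the algorithm from the article.

```python
import difflib
import xml.etree.ElementTree as ET

def child_edit_script(old, new):
    """List the operations that turn old's child sequence into new's."""
    # Serialize each child so that whole subtrees are compared as units.
    old_items = [ET.tostring(c, encoding="unicode").strip() for c in old]
    new_items = [ET.tostring(c, encoding="unicode").strip() for c in new]
    matcher = difflib.SequenceMatcher(a=old_items, b=new_items, autojunk=False)
    # Keep only the operations that change something.
    return [
        (op, old_items[i1:i2], new_items[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]

old_doc = ET.fromstring("<r><a/><b/><c/></r>")
new_doc = ET.fromstring("<r><a/><c/><d/></r>")
for op, removed, added in child_edit_script(old_doc, new_doc):
    print(op, removed, added)
```

An empty script means the two child sequences serialize identically, which is one workable notion of "distance zero".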
    onetitfemme, Dec 23, 2005
    #7
