XML and Ontologies


Alex Fawcett

I am interested in XML mediation and the use of ontologies to link
similar but different element names in XML schemas. Am I correct in my
understanding that an ontology is a language or set of commands that
is agreed upon, thus making mediation between XML element names
unnecessary? Also, is this the best method of mediation between XML
files?
Thanks for any help
Alex
 

Andy Dingley

I am interested in XML mediation and the use of ontologies to link
similar but different element names in XML schemas.

XML is a bit of an unhappy fit with ontologies - you start to
appreciate the differences between RDF and XML.

I suggest giving Protégé a whirl
http://protege.stanford.edu

It's an environment for editing both ontologies and instance data, in
a very approachable style. Certainly worth a look.

I spent much of last week here:
http://protege.stanford.edu/workshop_vi/schedule.html
and blogged a brief trip report here:
http://www.livejournal.com/users/quercus/20830.html


It's a frames-based approach, rather than a description logics
approach. This makes big differences, but you need to get a little
hands-on with both (and frames is perhaps simpler to start with). We
don't know where we'll end up finally, and we might need to combine
both approaches.

Take a look at the W3C's OWL (Web Ontology Language) and the older
work (SHOE, OIL, DAML+OIL) too. These are generally DL-based (take a
look at Manchester's OilEd, if you want a contrast to Protégé)

Am I correct in my
understanding that an ontology is a language or set of commands that
is agreed upon, thus making mediation between XML element names
unnecessary?

What's an ontology ? I've written the "30 second elevator pitch" on
this about a dozen times over the last few years. It's very hard to
give one simple definition that meets all needs. Everyone who comes to
ontologies (and it's almost a stampede now) approaches from a
different angle.

Natasha Noy and Deborah McGuinness's classic paper "Ontology
Development 101" is a good place to start
http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html

Broadly, I'd say that it was one definition of a set of entities and
their related properties, expressed in a style that was understood by
other systems.

It may also describe their metaphysical "meanings", which is the
difference between an ontology and a schema (or between DAML and
DAML+OIL)

An ontology does not describe mappings or mediation between two XML
schemas. Depending on your meaning of "mediation" this might be easy
(if you know they're ontologically identical, but you just need to
match up the names), but mapping is generally speaking a fiendishly
difficult problem.
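
To make the easy case concrete, here is a minimal Python sketch of
name-only mediation, assuming the two schemas really are ontologically
identical. The element names and the NAME_MAP table are invented for
illustration; they don't come from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names to target element names.
NAME_MAP = {"client": "customer", "surname": "family-name"}

def rename_elements(root, name_map):
    """Rename elements in place; names missing from the map are kept."""
    for elem in root.iter():
        elem.tag = name_map.get(elem.tag, elem.tag)
    return root

source = ET.fromstring("<client><surname>Smith</surname><id>7</id></client>")
target = rename_elements(source, NAME_MAP)
result = ET.tostring(target, encoding="unicode")
print(result)
```

This only works because the structure is untouched. As soon as the two
models disagree about structure, a lookup table is no longer enough.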

You can approach it with ontologies. You use two ontologies,
describing both the source and target. Then you apply some form of
complex reasoning to identify commonality and as much "mapping" as is
possible. From this you then generate (or auto-generate) code to do
the mapping. Easy.

The problem is that any ontology beyond the trivial has no simple
mapping between entities. Does an employee have a "works-for"
relationship with their boss, or a "works-in" with their department
and a "manages" relationship between boss and department ? This stuff
just doesn't overlay cleanly, so an improved description technique
alone isn't going to fix things.
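
The employee example can be sketched in a few lines of Python. The
people and departments are made up, and plain sets of pairs stand in
for a real ontology language. Going from the richer model to the
simpler one is a join; going the other way is not possible without
inventing data.

```python
# Model A: a direct "works-for" relationship between employee and boss.
works_for = {("alice", "carol"), ("bob", "carol")}

# Model B: "works-in" links an employee to a department,
# "manages" links a boss to a department.
works_in = {("alice", "sales"), ("bob", "sales"), ("carol", "board")}
manages = {("carol", "sales")}

def derive_works_for(works_in, manages):
    """Map model B onto model A by joining on the department."""
    return {
        (emp, boss)
        for emp, dept in works_in
        for boss, managed in manages
        if dept == managed and emp != boss
    }

derived = derive_works_for(works_in, manages)
print(derived == works_for)

# The reverse mapping (A to B) has nothing to work with: model A never
# mentions departments, so "sales" and "board" cannot be recovered.
```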

Also, is this the best method of mediation between XML
files?

Depends on the scale of your problem. What's an "XML file" ? Are
these the same two schemas you see every day, or is it a dynamic
problem with every new message ? How different are the two models ?

Incidentally, the same problem between one XML document and an RDBMS
is also common.

There's a lot of very rudimentary work being passed off around this
problem (Oracle 9i being a case in point) where people in suits with a
product to sell are pushing very simple (often XSLT-based) solutions
as a panacea. Those who are seriously in the field know it's not so
easy.


There's also the problem of meta-languages. Many people are already
encountering this with database output, and it has a huge effect on
the use of XSLT.

Consider an RDBMS with a generic XML export filter. What should the
output look like ?

<order>
  <order-item>
    <a>1</a><b>2</b>
  </order-item>
  <order-item>
    <b>3</b><c>4</c>
  </order-item>
</order>


<query name="order">
  <row name="order-item">
    <column name="a">1</column><column name="b">2</column>
  </row>
  <row name="order-item">
    <column name="b">3</column><column name="c">4</column>
  </row>
</query>


The first of these maps column names onto element names. It generates
compact XML that's probably how most XML coders would do it manually.
The trouble is that it's a new DTD for every query.

The second is a meta-format. The DTD is the same for every query
output and only the name="" metadata changes. It's verbose (but we
don't care, because our computers deal with that for us)

Ontologically these are _identical_ (they ought to be, or our export
filter is broken). In terms of ease of use though, they're quite
different. The first is unstable and somewhat unpredictable
(although you can easily auto-export a DTD or even ontology at the
same time), the second is hard to process (with XSLT).
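
A rough way to check the "identical" claim is to read both outputs
into the same in-memory rows. This is a Python sketch using the
standard library's ElementTree; the two function names are my own, and
the XML is the two examples from this post.

```python
import xml.etree.ElementTree as ET

TYPE_1 = """<order>
<order-item><a>1</a><b>2</b></order-item>
<order-item><b>3</b><c>4</c></order-item>
</order>"""

TYPE_2 = """<query name="order">
<row name="order-item">
<column name="a">1</column><column name="b">2</column>
</row>
<row name="order-item">
<column name="b">3</column><column name="c">4</column>
</row>
</query>"""

def rows_from_type_1(xml_text):
    """Names live in the element names themselves."""
    root = ET.fromstring(xml_text)
    rows = [(item.tag, {col.tag: col.text for col in item}) for item in root]
    return root.tag, rows

def rows_from_type_2(xml_text):
    """Names live in name="" attributes; element names are fixed."""
    root = ET.fromstring(xml_text)
    rows = [(row.get("name"), {c.get("name"): c.text for c in row}) for row in root]
    return root.get("name"), rows

same = rows_from_type_1(TYPE_1) == rows_from_type_2(TYPE_2)
print(same)
```

Note that the first reader works for any query but tells you nothing
in advance about which names to expect, while the second reader is
fixed once and for all.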

XSLT is a language for transformations of XML data at the structural
level. This works fine for our "type 1" data above, or for much XML,
because XML's data model is inferred from the structure (go read
XML-Infoset). A structural transformation _is_ a transformation at the
level of the data-model.

The second one becomes much harder. We've now separated the structural
level (and the data model of our consistent "generic export format")
from the data model of our "real" data. An XSLT transform still
operates at the structural level (it has to - that's what XSLT does)
and so it's now divorced from the level the interesting data is
residing at. Using XSLT to make real "data-level" transformations
like this becomes a real PITA. In some formats it's straightforward
but long-winded; in others (like RDF) it becomes well-nigh impossible.
Schematron can sometimes help.
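
The difference shows up even in a simple path expression. The sketch
below uses Python's ElementTree and its limited XPath subset as a
stand-in for XSLT match patterns (so it illustrates the addressing
problem, not XSLT itself). It pulls every "b" value out of both
formats.

```python
import xml.etree.ElementTree as ET

type_1 = ET.fromstring(
    "<order>"
    "<order-item><a>1</a><b>2</b></order-item>"
    "<order-item><b>3</b><c>4</c></order-item>"
    "</order>"
)
type_2 = ET.fromstring(
    '<query name="order">'
    '<row name="order-item">'
    '<column name="a">1</column><column name="b">2</column></row>'
    '<row name="order-item">'
    '<column name="b">3</column><column name="c">4</column></row>'
    "</query>"
)

# Type 1: the path is written in terms of the real data's names.
b_from_type_1 = [e.text for e in type_1.findall("./order-item/b")]

# Type 2: the path is written in terms of the generic export format,
# and the real names only appear as attribute-value predicates.
b_from_type_2 = [
    e.text
    for e in type_2.findall("./row[@name='order-item']/column[@name='b']")
]

print(b_from_type_1, b_from_type_2)
```

Every step of the second path needs a predicate, and that overhead is
repeated in every template or match you write against the meta-format.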

RDF is a bit like "type 2" data, with a "generic export format" that's
already defined by the RDF/XML standards. You can't work with
non-trivial RDF in XSLT, because of just this problem. That's why RDF
is manipulated by tools such as Jena, that work at the data model
level.
 

Richard Light

Further to Andy's excellent thoughts on this issue, I would add the
suggestion that you could look into using Topic Maps
(http://www.topicmaps.org/) to represent equivalences between concepts
in schemas. As it happens, I was doing exactly this only last week, as
preparation for a data mapping exercise.

I took the two schemas I wanted to compare, and used XSLT to convert
them to Topic Maps. I then wrote a "links" document containing
relationships between individual concepts. As it happens, I wrote this
in the sort of "compact" style Andy described, e.g.:

<link type="exact">
<member schema="nt" id="condition-check"/>
<member schema="spectrum" id="check"/>
</link>

but I could easily use XSLT to convert this to a proper Topic Map
(containing nothing but Associations).

What I actually did was to convert this "links" document into an HTML
table of links between equivalent concepts in the two schemas. This was
sufficient for the task at hand.
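
For anyone without XSLT to hand, the same links-to-table step can be
sketched in Python. This is not the stylesheet I used; the wrapping
<links> root element and the function name are assumptions made for
the example, while the link itself is the one quoted above.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed wrapper: a <links> root holding any number of <link> elements.
LINKS = """<links>
<link type="exact">
<member schema="nt" id="condition-check"/>
<member schema="spectrum" id="check"/>
</link>
</links>"""

def links_to_html(xml_text, left="nt", right="spectrum"):
    """Render each <link> as one table row: left id, right id, link type."""
    root = ET.fromstring(xml_text)
    lines = ["<table>",
             f"<tr><th>{escape(left)}</th><th>{escape(right)}</th><th>type</th></tr>"]
    for link in root.iter("link"):
        # Index the members of this link by the schema they belong to.
        ids = {m.get("schema"): m.get("id") for m in link.findall("member")}
        cells = [ids.get(left, ""), ids.get(right, ""), link.get("type", "")]
        lines.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)

table = links_to_html(LINKS)
print(table)
```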

In principle I could instead have made my "links" document into a TM in
its own right, and then used it to merge the two schemas into a single
TM with all the correspondences expressed as TM Associations. This sort
of approach lets you work at a higher level of abstraction than the raw
XML (i.e. at a "Topic Map concepts" level). Conversely, TM XML is
pretty simple (if verbose) in its structure, so you may get more mileage
using XSLT than Andy suggests you would with RDF.

Richard Light
 
