I am interested in XML mediation and the use of ontologies to link
similar but different element names in XML schema.
XML is a bit of an unhappy fit with ontologies - you start to
appreciate the differences between RDF and XML.
I suggest giving Protégé a whirl
http://protege.stanford.edu
It's an environment for editing both ontologies and instance data, in
a very approachable style. Certainly worth a look.
I spent much of last week here:
http://protege.stanford.edu/workshop_vi/schedule.html
and blogged a brief trip report here:
http://www.livejournal.com/users/quercus/20830.html
It's a frames-based approach, rather than a description logics
approach. This makes big differences, but you need to get a little
hands-on with both (and frames is perhaps simpler to start with). We
don't know where we'll end up finally, and we might need to combine
both approaches.
Take a look at the W3C's OWL (Web Ontology Language) and the older
work (SHOE, OIL, DAML+OIL) too. These are generally DL-based (take a
look at Manchester's OilEd, if you want a contrast to Protégé).
Am I correct in my
understanding that an ontology is a language or set of commands that
is agreed upon thus making mediation between XML element names
unnecessary?
What's an ontology ? I've written the "30 second elevator pitch" on
this about a dozen times over the last few years. It's very hard to
give one simple definition that meets all needs. Everyone who comes to
ontologies (and it's almost a stampede now) approaches from a
different angle.
Natasha Noy and Deborah McGuinness's classic paper "Ontology
Development 101" is a good place to start
http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html
Broadly, I'd say that it's a definition of a set of entities and
their related properties, expressed in a form that other systems can
understand.
It may also describe their metaphysical "meanings", which is the
difference between an ontology and a schema (or between DAML and
DAML+OIL).
An ontology does not describe mappings or mediation between two XML
schemas. Depending on your meaning of "mediation" this might be easy
(if you know they're ontologically identical, but you just need to
match up the names), but mapping is generally speaking a fiendishly
difficult problem.
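
For the easy case only (two schemas already known to mean the same
thing, differing just in names), mediation can be little more than a
lookup table. A minimal sketch in Python, where the element names and
the NAME_MAP table are invented for illustration and are not taken
from any real schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical name mapping between two schemas that are ASSUMED to be
# ontologically identical. Building this table is the hard part.
NAME_MAP = {"customer": "client", "surname": "family-name"}

def rename_elements(root, name_map):
    """Rename, in place, every element whose tag appears in name_map."""
    for elem in root.iter():
        elem.tag = name_map.get(elem.tag, elem.tag)
    return root

source = ET.fromstring(
    "<customer><surname>Smith</surname><city>Bristol</city></customer>"
)
result = ET.tostring(rename_elements(source, NAME_MAP), encoding="unicode")
print(result)
```

Nothing here knows what a "customer" is. The table only works because
someone has already decided the two names mean the same thing.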
You can approach it with ontologies. You use two ontologies,
describing both the source and target. Then you apply some form of
complex reasoning to identify commonality and as much "mapping" as is
possible. From this you then generate (or auto-generate) code to do
the mapping. Easy.
The problem is that any ontology beyond the trivial has no simple
mapping between entities. Does an employee have a "works-for"
relationship with their boss, or a "works-in" with their department
and a "manages" relationship between boss and department ? This stuff
just doesn't overlay cleanly, so an improved description technique
alone isn't going to fix things.
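
A toy illustration of that mismatch, in Python. The names and the
one-boss-per-department rule are assumptions made up for the sketch:

```python
# Model A: each employee points directly at a boss ("works-for").
works_for = {"alice": "carol", "bob": "carol"}

# Model B: employees sit in departments ("works-in") and bosses
# manage departments ("manages").
works_in = {"alice": "sales", "bob": "sales"}
manages = {"carol": "sales"}

def derive_works_for(works_in, manages):
    """Derive model-A facts from model B by joining on department.

    Assumes exactly one boss per department, which real data may
    well violate.
    """
    boss_of_dept = {dept: boss for boss, dept in manages.items()}
    return {
        emp: boss_of_dept[dept]
        for emp, dept in works_in.items()
        if dept in boss_of_dept
    }

derived = derive_works_for(works_in, manages)
print(derived == works_for)
```

Going from B to A needs a join and an extra assumption. Going from A
to B can't be done at all without inventing department names, because
model A never recorded them.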
Also is this the best method of mediation between XML
files?
Depends on the scale of your problem. What's an "XML file" ? Are
these the same two schemas you see every day, or is it a dynamic
problem with every new message ? How different are the two models ?
Incidentally, the same problem between one XML document and an RDBMS
is also common.
There's a lot of very rudimentary work being passed off around this
problem (Oracle 9i being a case in point) where people in suits with a
product to sell are pushing very simple (often XSLT-based) solutions
as a panacea. Those who are seriously in the field know it's not so
easy.
There's also the problem of meta-languages. Many people are already
encountering this with database output, and it has a huge effect on
the use of XSLT.
Consider an RDBMS with a generic XML export filter. What should the
output look like ?

  <order>
    <order-item>
      <a>1</a><b>2</b>
    </order-item>
    <order-item>
      <b>3</b><c>4</c>
    </order-item>
  </order>

  <query name="order" >
    <row name="order-item" >
      <column name="a" >1</column><column name="b" >2</column>
    </row>
    <row name="order-item" >
      <column name="b" >3</column><column name="c" >4</column>
    </row>
  </query>

The first of these maps column names onto element names. It generates
compact XML that's probably how most XML coders would do it manually.
The trouble is that it's a new DTD for every query.
The second is a meta-format. The DTD is the same for every query
output and only the name="" metadata changes. It's verbose (but we
don't care, because our computers deal with that for us).
Ontologically these are _identical_ (they ought to be, or our export
filter is broken). In terms of ease of use though, they're quite
different. The first is unstable and somewhat unpredictable
(although you can easily auto-export a DTD or even ontology at the
same time), the second is hard to process (with XSLT).
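
One way to check the "identical" claim is to read both export formats
into the same plain data structure and compare. A minimal Python
sketch using only the standard library; the function names are
invented for the example:

```python
import xml.etree.ElementTree as ET

TYPE1 = """<order>
  <order-item><a>1</a><b>2</b></order-item>
  <order-item><b>3</b><c>4</c></order-item>
</order>"""

TYPE2 = """<query name="order">
  <row name="order-item">
    <column name="a">1</column><column name="b">2</column>
  </row>
  <row name="order-item">
    <column name="b">3</column><column name="c">4</column>
  </row>
</query>"""

def model_from_type1(xml_text):
    """Names come from the element names themselves."""
    root = ET.fromstring(xml_text)
    rows = [(item.tag, {col.tag: col.text for col in item}) for item in root]
    return root.tag, rows

def model_from_type2(xml_text):
    """Names come from the name="" attributes of fixed elements."""
    root = ET.fromstring(xml_text)
    rows = [
        (row.get("name"),
         {col.get("name"): col.text for col in row.findall("column")})
        for row in root.findall("row")
    ]
    return root.get("name"), rows

model1 = model_from_type1(TYPE1)
model2 = model_from_type2(TYPE2)
print(model1 == model2)
```

Both readers end up with the same query name, row names and column
values. Only the place where the names live in the XML differs.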
XSLT is a language for transformations of XML data at the structural
level. This works fine for our "type 1" data above, or for much XML,
because XML's data model is inferred from the structure (go read the
XML Infoset spec). A structural transformation _is_ a transformation
at the level of the data-model.
The second one becomes much harder. We've now separated the structural
level (and the data model of our consistent "generic export format")
from the data model of our "real" data. An XSLT transform still
operates at the structural level (it has to - that's what XSLT does)
and so it's now divorced from the level where the interesting data
resides. Using XSLT to make real "data-level" transformations
like this becomes a real PITA. In some formats it's straightforward
but long-winded; in others (like RDF) it becomes well-nigh impossible.
Schematron can sometimes help.
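
The same gap shows up with a simple query such as "give me b for every
order-item where a is 1". The sketch below uses Python's ElementTree
rather than XSLT, so it only illustrates the structural point; the
function names are invented for the example:

```python
import xml.etree.ElementTree as ET

TYPE1 = ET.fromstring(
    "<order>"
    "<order-item><a>1</a><b>2</b></order-item>"
    "<order-item><b>3</b><c>4</c></order-item>"
    "</order>"
)

TYPE2 = ET.fromstring(
    '<query name="order">'
    '<row name="order-item">'
    '<column name="a">1</column><column name="b">2</column></row>'
    '<row name="order-item">'
    '<column name="b">3</column><column name="c">4</column></row>'
    "</query>"
)

def b_where_a_is_1_type1(root):
    # The data names ARE the element names, so one path expression does it.
    return [b.text for b in root.findall("order-item[a='1']/b")]

def b_where_a_is_1_type2(root):
    # The data names are hidden in attributes, so each row has to be
    # rebuilt into a name -> value table before it can be tested.
    found = []
    for row in root.findall("row[@name='order-item']"):
        cells = {col.get("name"): col.text for col in row.findall("column")}
        if cells.get("a") == "1" and "b" in cells:
            found.append(cells["b"])
    return found

answer1 = b_where_a_is_1_type1(TYPE1)
answer2 = b_where_a_is_1_type2(TYPE2)
print(answer1, answer2)
```

The second function is doing by hand the step of lifting the data
model back out of the generic format before any real work can start.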
RDF is a bit like "type 2" data, with a "generic export format" that's
already defined by the RDF/XML standards. You can't work with
non-trivial RDF in XSLT, because of just this problem. That's why RDF
is manipulated by tools such as Jena, that work at the data model
level.
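
To see why, note that RDF/XML allows the same statement to be written
in more than one XML shape. The Python sketch below uses a deliberately
tiny, hypothetical triple extractor (toy_triples handles only these two
shapes and is in no way a real RDF parser) and an example.org subject
invented for the purpose:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The property written as a child element.
FORM_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/report">
    <dc:title>Trip report</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# The same property written as an attribute.
FORM_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/report"
                   dc:title="Trip report"/>
</rdf:RDF>"""

def to_uri(clark_name):
    """Turn ElementTree's '{namespace}local' form into one URI string."""
    namespace, local = clark_name[1:].split("}")
    return namespace + local

def toy_triples(xml_text):
    """Toy extractor: covers only the two shapes shown above."""
    root = ET.fromstring(xml_text)
    triples = set()
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                triples.add((subject, to_uri(name), value))
        for child in desc:
            triples.add((subject, to_uri(child.tag), child.text))
    return triples

triples_a = toy_triples(FORM_A)
triples_b = toy_triples(FORM_B)
print(triples_a == triples_b)
```

A stylesheet written against the first shape sees nothing it
recognises in the second, even though the triples are the same. A
tool working on triples never notices the difference.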