rdf, xmp

Imbaud Pierre

I have to add access to some XMP data to an existing Python
application.
XMP is built on RDF, RDF is built on XML.
I am trying to reuse as much existing code as possible.
By the way, don't confuse XMP (http://www.adobe.com/products/xmp/) with
XMPP (http://www.faqs.org/rfcs/rfc3920.html), which is supported by
PyXMPP (http://pyxmpp.jajcus.net/). XMP is Adobe's standard for storing
metadata in files (JPEG, PDF).
Is anyone else out there dealing with the same problem?
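
For readers who have not seen XMP inside a file: a minimal sketch of locating an embedded packet by scanning the raw bytes for the `<?xpacket begin=...?>` and `<?xpacket end=...?>` wrapper. This is a naive scan, not a format-aware reader. It assumes the packet is stored uncompressed and in UTF-8, which will not hold for every file. The function name `find_xmp_packet` and the sample bytes are made up for illustration.

```python
import re

# Naive scan for the xpacket wrapper around an XMP packet.
# Assumption: the packet is stored uncompressed and encoded as UTF-8.
_XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def find_xmp_packet(data):
    """Return the body of the first XMP packet in data as text, or None."""
    match = _XPACKET_RE.search(data)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace").strip()


# Made-up bytes standing in for the contents of an image file.
sample = (
    b"\xff\xd8 binary data "
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>\n'
    b'<?xpacket end="w"?>'
    b" more binary data \xff\xd9"
)
packet = find_xmp_packet(sample)
```

In a real application, `data` would come from `open(path, "rb").read()`.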

It seemed logical to use existing RDF libraries. I found two: RDFLib
(http://rdflib.net) and pyrple (http://infomesh.net/pyrple/).
My first contact with RDFLib was disappointing: a really intricate
library (lots of modules, lots of methods), almost no documentation (an
almost empty Epydoc-generated documentation frame), and puzzling
results when I experimented with it.

I guess XMP uses only a tiny subset of RDF's possibilities. Maybe
RDFLib is fine for ambitious designs, but too heavy in this case?
I'll now dig a little into pyrple.
Any feedback on these matters, please?
 
Andy Dingley

Imbaud said:
I have to add access to some XMP data to an existing Python
application.
XMP is built on RDF, RDF is built on XML.

RDF is _NOT_ built on top of XML. Thinking that it is causes a lot of
trouble in the architecture of big RDF projects. RDF is a data model,
not a serialisation. The data model is also a graph (more than XML can
cope with) and can span multiple documents. It's only RDF/XML that's
the serialisation of RDF into XML and good architectures start from
thinking about the RDF data model, not this RDF/XML serialisation.
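
To make the "data model, not a serialisation" point concrete, here is a minimal sketch that holds RDF statements as plain (subject, predicate, object) tuples, with no XML anywhere. It does not use any RDF library. All identifiers (`urn:example:...`, `creator`, `name`) and the helper `subjects` are made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# Two "documents", each contributing some triples to the same graph.
doc_a = {
    ("urn:example:photo1", "creator", "urn:example:jsmith"),
    ("urn:example:jsmith", "name", "John Smith"),
}
doc_b = {
    ("urn:example:photo2", "creator", "urn:example:jsmith"),
}

# Merging graphs from several documents is a set union of their triples.
graph = doc_a | doc_b


def subjects(graph, predicate, obj):
    """Return every subject that points at obj through predicate."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


# The node urn:example:jsmith has two incoming edges, one from each document.
# A tree-shaped XML document cannot express that without some kind of reference.
created_by_jsmith = subjects(graph, "creator", "urn:example:jsmith")
```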

As for RDF handling, the usual toolset is Jena (in Java). Redland has a
Python binding, although Redland is fairly aged now.

I'm unfamiliar with XMP and won't have a chance to look at it until
Monday. However if XMP is strongly "XML like" despite claiming to be
RDF, then you might find that handling a pure XMP problem is quite
easily done with XML tools.

Famously RDF/XML is unprocessable with XSLT if it's sophisticated, but
quite easy if it's restricted to only a simple XML-like RDF model. XMP
could well be similar.
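
A minimal sketch of the "plain XML tools" approach, using Python's standard `xml.etree.ElementTree` rather than an RDF library. It assumes the packet only uses simple properties, written either as attributes on `rdf:Description` or as child elements with text content; anything structured is skipped. The sample packet and the function name `simple_properties` are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up sample packet containing only simple properties.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description
      xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmp:CreatorTool="ExampleTool 1.0">
   <dc:format>image/jpeg</dc:format>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def simple_properties(xmp_text):
    """Collect simple properties as {"{namespace}name": value}."""
    root = ET.fromstring(xmp_text)
    props = {}
    for desc in root.iter("{%s}Description" % RDF_NS):
        # Properties written as XML attributes on rdf:Description.
        for name, value in desc.attrib.items():
            if not name.startswith("{%s}" % RDF_NS):
                props[name] = value
        # Properties written as child elements with plain text content.
        for child in desc:
            if len(child) == 0 and child.text and child.text.strip():
                props[child.tag] = child.text.strip()
    return props


props = simple_properties(SAMPLE)
```

The result is a flat dictionary keyed by namespace-qualified names, which is about as far as this approach goes without real RDF handling.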
 
Imbaud Pierre

Andy Dingley wrote:
RDF is _NOT_ built on top of XML. Thinking that it is causes a lot of
trouble in the architecture of big RDF projects. RDF is a data model,
not a serialisation. The data model is also a graph (more than XML can
cope with) and can span multiple documents. It's only RDF/XML that's
the serialisation of RDF into XML and good architectures start from
thinking about the RDF data model, not this RDF/XML serialisation.

Granted, I oversimplified, and my statement was misleading. I was trying
to help readers unfamiliar with the subject understand what it was about.

Andy Dingley wrote:
As for RDF handling, the usual toolset is Jena (in Java). Redland has a
Python binding, although Redland is fairly aged now.
I'm unfamiliar with XMP and won't have a chance to look at it until
Monday. However if XMP is strongly "XML like" despite claiming to be
RDF, then you might find that handling a pure XMP problem is quite
easily done with XML tools.

This was my wild guess: the data model I deal with (XMP data, I mean) is
hardly more than a bunch of key-value pairs, with some vocabulary
control and some typing.

Andy Dingley wrote:
Famously RDF/XML is unprocessable with XSLT if it's sophisticated, but
quite easy if it's restricted to only a simple XML-like RDF model. XMP
could well be similar.

Still unclear.
Thanks for your help!
 
Andy Dingley

Imbaud said:
I have to add access to some XMP data to an existing Python
application.
XMP is built on RDF,

I'm just looking at the XMP Spec from the Adobe SDK. First impressions
only, as I don't have time to read the whole thing in detail.

This spec doesn't inspire me with confidence as to its accuracy and
consistency. I think I've already seen some obscure conditions where
developers will be unable to unambiguously interpret the spec. Compared
to MPEG-7 however, at least it's not 700 pages long!

The spec does state that property values can be structured, which is
one of the best reasons to start using RDF for storing metadata.
However I think actual use of these would be minimal in "typical" XML
applications. At worst it's a simple data typing exercise of a
two-valued tuple for "dimensions", rather than separate height and
width properties. These are no problem to process.
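
A minimal sketch of reading such a two-valued structure into a tuple with `xml.etree.ElementTree`. The `ex:` namespace and the property names `dimensions`, `width` and `height` are hypothetical, not taken from the XMP spec; the structure is written here with `rdf:parseType="Resource"`, one of the standard RDF/XML ways to nest properties.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

EX = "{http://example.org/ns/}"  # hypothetical namespace, for illustration only

Dimensions = namedtuple("Dimensions", "width height")

# Made-up sample: one structured property with two fields.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns/">
 <rdf:Description>
  <ex:dimensions rdf:parseType="Resource">
   <ex:width>640</ex:width>
   <ex:height>480</ex:height>
  </ex:dimensions>
 </rdf:Description>
</rdf:RDF>"""


def read_dimensions(rdf_text):
    """Return the ex:dimensions structure as a Dimensions tuple, or None."""
    root = ET.fromstring(rdf_text)
    node = root.find(".//%sdimensions" % EX)
    if node is None:
        return None
    return Dimensions(
        width=int(node.findtext(EX + "width")),
        height=int(node.findtext(EX + "height")),
    )


dims = read_dimensions(SAMPLE)
```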

In particular, the XMP data model is a single-rooted tree, i.e. there
is an external model of "a resource" (i.e. one image file) and an XMP
document only addresses a single "resource" at a time.

A major restriction in XMP is that it has no concept of shared
resources between properties (and it can't, as there's no rdf:ID or
rdf:about allowed). This is always hard to process, but it's also very
valuable for doing metadata. Imagine a series of wildlife images that
all refer to a particular safari, national park and species. We might
be able to share a species reference between images easily enough by
referring to a well-known public vocabulary, but it would also be
useful (and concise) to be able to define one "expedition" in a subject
property on one image, then share that same resource to others. As it
is, we'd have to duplicate the full definition. Even in XMP's "separate
document for each image resource" model we still might wish to do
something similar, such as both photographer and director being the
same person. When you start having 20MB+ of metadata per video
resource (been there, done that!) then this sort of duplication is a
huge problem. Not just because of the data volume, but because we need
to identify that referenced resources are identical, not merely that
they have the same property values (i.e. I'm the same John Smith, not
just two people with the same name).
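
The difference between "same values" and "same resource" can be sketched in a few lines of plain Python. Every name and identifier here (`duplicated`, `shared`, `urn:example:person:1`, `same_person`) is made up for illustration and is not part of XMP or any RDF library.

```python
# Duplicated style: each role carries its own copy of the property values.
duplicated = {
    "photographer": {"name": "John Smith"},
    "director": {"name": "John Smith"},
}

# Shared style: people are defined once under an identifier,
# and each role refers to that identifier.
people = {
    "urn:example:person:1": {"name": "John Smith"},
    "urn:example:person:2": {"name": "John Smith"},
}
shared = {
    "photographer": "urn:example:person:1",
    "director": "urn:example:person:1",
    "editor": "urn:example:person:2",
}


def same_person(record, role_a, role_b):
    """True only when both roles refer to the same identifier."""
    return record[role_a] == record[role_b]


# Equal values in the duplicated style say nothing about identity.
values_match = duplicated["photographer"] == duplicated["director"]

# With identifiers, identity is explicit even when the names coincide.
photographer_is_director = same_person(shared, "photographer", "director")
photographer_is_editor = same_person(shared, "photographer", "editor")
```

In the duplicated style there is no way to tell whether the two John Smiths are one person; in the shared style the identifiers settle it.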

There is no visible documentation of vocabularies, internal or
external. Some pre-defined schemas are given that define property sets,
but there's nothing on the values of these, or how to describe that
values are being taken from a particular external vocabulary (you can
do this with RDF, but they don't describe it). This isn't widely seen
as important, except by people who've already been through large media
annotation projects.

It's RDF-like, not just XML. However it's also a subset of RDF - in
particular rdf:about isn't supported, which removes many of the graph
structure constructs that make RDF such a pain to process with the
basic XML tools. Read their explicit note on which RDF features aren't
supported -- they're enough to make XMP easily processable with XSLT.

The notes on embedding of XMP in XML and XML in XMP are both simplistic
and ugly.

I still don't see much _point_ in XMP. I could achieve all this much
with two cups of coffee, RDF and Dublin Core and a whiteboard pen.
Publishing metadata is good, publishing new _ways_ of publishing
metadata is very bad!


Overall, it could be far better, it could be better without being more
complicated, and it's at least 5 years behind industry best practice
for fields like museums and libraries. It's also a field that's still
so alien to media and creative industries that the poor description and
support of XMP will cause them to invent many bad architectures and
data models for a few years to come.
 
