How to approach an XSLT task

John Harrison

This is my first XSLT project and I need some guidance on the approach
to take.

The problem is to take a document in one XML format and transform it
into another XML format. The source format is fixed but the
destination format will evolve. The two formats are completely
unrelated, so there are source elements we will ignore and destination
elements we will never receive.

However the management has decreed that we mustn't throw away any
data, since we may modify the destination format in such a way that we
can now accommodate source data that we couldn't previously. Therefore
the destination format contains a catch all area that is designed for
all *unprocessed* source data.

Perhaps an example will make this clear:

<source>
<team>
<name>Arsenal</name>
<manager><name><first>Arsene</first><last>Wenger</last></name></manager>
<star-player><name><first>Thierry</first><last>Henri</last></name></star-player>
</team>
</source>

If we aren't interested in the star player, this might be transformed
to, say:

<dest>
<team name="Arsenal">
<manager>Arsene Wenger</manager>
</team>
<unprocessed>
<team>
<star-player><name><first>Thierry</first><last>Henri</last></name></star-player>
</team>
</unprocessed>
</dest>

Notice how the processed portion has been transformed but the
unprocessed portion (including the team tag) has been dumped unchanged
into the unprocessed tag.

So, any ideas on what approach I can take to code this in an elegant
and robust way? I want to do some sort of 'catch all' processing that
will automatically recognise which portions of the document are
left unprocessed by the rest of my XSLT code.

MTIA
John
 
DFN-CIS NetNews Service

On 22/01/2004, around 14:23, John Harrison wrote:

JH> Notice how the processed portion has been transformed but the
JH> unprocessed portion (including the team tag) has been dumped unchanged
JH> into the unprocessed tag.
This bit is at odds with everything else you've said; the <team>
element is processed but you want to include it in the <unprocessed>
area.

I think I understand why; you need some qualification of where the
unprocessed data came from. But if you follow that line a little
further, what would you expect <unprocessed> to contain if, say, you
decided you didn't want first names but did want to include the last
name of <star-player>? I'm guessing you'd want something like this...

<unprocessed>
<team>
<manager>
<last>Wenger</last>
</manager>
<star-player>
<last>Henri</last>
</star-player>
</team>
</unprocessed>

Now even that's ok until your data contains more than one team and you
end up with ...

<unprocessed>
<team>
<manager>
<last>Wenger</last>
</manager>
<star-player>
<last>Henri</last>
</star-player>
</team>
<team>
<manager>
<last>Keegan</last>
</manager>
<star-player>
<last>Somebloke</last>
</star-player>
</team>
</unprocessed>

Similar, but different, problems occur if, say, a team has 2 'star
players'.

How do you know which bit of unprocessed data belongs to which
source?

I think that you need to think about tagging your elements with an ID
and using that ID to link the unprocessed data back to the relevant
bits of the destination data.
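
One way the ID idea could be sketched in XSLT 1.0 is with generate-id(), which returns the same string for the same source node throughout a single transformation. The `id` and `ref` attribute names and the `leftover` mode below are assumptions made for illustration; neither format in this thread defines them.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/source">
    <dest>
      <!-- first pass: the processed view of each team -->
      <xsl:apply-templates select="team"/>
      <!-- second pass: the leftovers, keyed back to the same team -->
      <unprocessed>
        <xsl:apply-templates select="team" mode="leftover"/>
      </unprocessed>
    </dest>
  </xsl:template>

  <!-- processed team: stamped with an id derived from the source node -->
  <xsl:template match="team">
    <team id="{generate-id()}" name="{name}">
      <manager>
        <xsl:value-of
            select="concat(manager/name/first, ' ', manager/name/last)"/>
      </manager>
    </team>
  </xsl:template>

  <!-- leftover view of the same team: the ref value equals the id above -->
  <xsl:template match="team" mode="leftover">
    <team ref="{generate-id()}">
      <!-- hypothetical: star-player is the only part we chose to skip -->
      <xsl:copy-of select="star-player"/>
    </team>
  </xsl:template>

</xsl:stylesheet>
```

With several teams, each `<team ref="...">` under `<unprocessed>` then points at exactly one `<team id="...">`. Note that the leftover template still has to name what was skipped, so it does not by itself detect unprocessed data automatically.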

If any of the above is relevant to your problem, let me know and we
can look at ways around it.
 
Andy Dingley

> So any ideas on what approach can I take to code this in an elegant
> and robust way

Namespaces. Just "don't throw any data away" by preserving the whole
damn lot. For each resource (player, team, whatever), give it a
property called "source" and use xsl:copy to make a simple copy
of the source resource there.

It's bulky, but that's very rarely a real problem.
 
John Harrison

Andy Dingley said:
> Namespaces. Just "don't throw any data away" by preserving the whole
> damn lot. For each resource (player, team, whatever) then have a
> property on it called "source" and use xsl:copy to make a simple copy
> of the source resource.
>
> It's bulky, but that's very rarely a real problem.

That's certainly an option, and probably what we'll go for. I thought
I could do something smarter but it seems to be more difficult than I
appreciated.

I don't get the namespace angle however, could you explain that in a
bit more detail (and shouldn't it be xsl:copy-of not xsl:copy?).

John
 
John Harrison

DFN-CIS NetNews Service said:
> On 22/01/2004, around 14:23, John Harrison wrote:
>
> JH> Notice how the processed portion has been transformed but the
> JH> unprocessed portion (including the team tag) has been dumped unchanged
> JH> into the unprocessed tag.
> This bit is at odds with everything else you've said; the <team>
> element is processed but you want to include it in the <unprocessed>
> area.
>
> I think I understand why; you need some qualification of where the
> unprocessed data came from.

Right.

> But if you follow that line a little
> further, what would you expect <unprocessed> to contain if, say, you
> decided you didn't want first names but did want to include the last
> name of <star-player>? I'm guessing you'd want something like this...
>
> <unprocessed>
> <team>
> <manager>
> <last>Wenger</last>
> </manager>
> <star-player>
> <last>Henri</last>
> </star-player>
> </team>
> </unprocessed>
>
> Now even that's ok until your data contains more than one team and you
> end up with ...
>
> <unprocessed>
> <team>
> <manager>
> <last>Wenger</last>
> </manager>
> <star-player>
> <last>Henri</last>
> </star-player>
> </team>
> <team>
> <manager>
> <last>Keegan</last>
> </manager>
> <star-player>
> <last>Somebloke</last>
> </star-player>
> </team>
> </unprocessed>
>
> Similar, but different, problems occur if, say, a team has 2 'star
> players'.
>
> How do you know which bit of unprocessed data belongs to which
> source?

It's a good point and one I hadn't thought of.

> I think that you need to think about tagging your elements with an ID
> and using that ID to link the unprocessed data back to the relevant
> bits of the destination data.
>
> If any of the above is relevant to your problem, let me know and we
> can look at ways around it.

Obviously this is more complex than I realised. I can't think of a way
to make your ID tagging idea work in principle (let alone in
practice). Presumably identical attributes would be added to tags that
are common to both the processed and unprocessed portions of the
destination file, but unfortunately the processing we do could
completely remove or rearrange tags, so matching the processed to the
unprocessed data seems impossible without writing a lot of ugly
special-case code, something I'd hoped to avoid.

Oh well, thanks for your input. I can do it the ugly way, I was just
hoping for an elegant solution.

John
 
Andy Dingley

> I don't get the namespace angle however,

You not only need to preserve this source content, you also want to
make it non-obvious to future processing. If you're transforming from
one DTD to another (frequently the target "DTD" will be (X)HTML plus a
load of class selectors) then an easy way to achieve this is to place
the copied source in a different namespace from the "useful"
destination content.

> (and shouldn't it be xsl:copy-of not xsl:copy?).

Probably. Depends what you're doing and how flexible it needs to be,
but yes, I think you could probably do this example with a single
xsl:copy-of.
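
As a rough sketch of the namespace idea applied to the Arsenal example: the `src` prefix, the `urn:example:source` URI, the `src:source` wrapper and the `archive` mode are all made up for illustration. One caveat worth knowing is that a plain xsl:copy-of leaves the copied elements in whatever namespace they had in the source (here, none), so this sketch rebuilds each element under the new namespace instead.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:src="urn:example:source">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/source">
    <dest>
      <xsl:apply-templates select="team"/>
    </dest>
  </xsl:template>

  <xsl:template match="team">
    <team name="{name}">
      <manager>
        <xsl:value-of
            select="concat(manager/name/first, ' ', manager/name/last)"/>
      </manager>
      <!-- keep the whole original team, moved into the src namespace -->
      <src:source>
        <xsl:apply-templates select="." mode="archive"/>
      </src:source>
    </team>
  </xsl:template>

  <!-- archive mode: recreate every element with the src prefix;
       text is copied by the built-in rule, comments and PIs are dropped -->
  <xsl:template match="*" mode="archive">
    <xsl:element name="src:{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="archive"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Later processing that only looks at unprefixed `team` or `manager` elements will not match the archived `src:team` copy, which is the point of putting it in a separate namespace.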
 
Peter Flynn

John Harrison wrote:
[snip]
> So any ideas on what approach can I take to code this in an elegant
> and robust way?

Provided the unprocessed material is well-formed,

<!ELEMENT unprocessed ANY>

is your friend, and will let the file be valid while allowing any
element to go in there (each such element still needs its own
declaration somewhere in the DTD).

> I want to do some sort of 'catch all' processing that
> will automatically recognise which portions of the document are
> unprocessed by the rest of my XSLT code.

A specified default template would do this provided you don't mind
having multiple <unprocessed> elements in the output.

<xsl:template match="*">
<unprocessed>
<xsl:copy-of select="."/>
</unprocessed>
</xsl:template>

But it's 2.40am and I haven't tested this :)
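
For context, here is a minimal sketch of how that catch-all might sit alongside specific templates for the example in the original post. The rules for `source`, `team` and `manager` are assumptions about what "processed" means here. The specific templates win over `match="*"` because a name test has a higher default priority (0) than the wildcard (-0.5).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <!-- ignore whitespace-only text between source elements -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="/source">
    <dest>
      <xsl:apply-templates/>
    </dest>
  </xsl:template>

  <!-- team: name becomes an attribute, other children are dispatched -->
  <xsl:template match="team">
    <team name="{name}">
      <xsl:apply-templates select="*[not(self::name)]"/>
    </team>
  </xsl:template>

  <xsl:template match="manager">
    <manager>
      <xsl:value-of select="concat(name/first, ' ', name/last)"/>
    </manager>
  </xsl:template>

  <!-- catch-all: any element without a rule of its own is copied whole -->
  <xsl:template match="*">
    <unprocessed>
      <xsl:copy-of select="."/>
    </unprocessed>
  </xsl:template>

</xsl:stylesheet>
```

On the sample input, `<star-player>` falls through to the catch-all and is emitted as an `<unprocessed>` element inside the output `<team>`, rather than in one top-level `<unprocessed>` area. Since xsl:copy-of copies the whole subtree without applying templates to it, an element is either fully processed or fully copied; partially processed elements would need extra rules.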

///Peter
 
