how to approach an XSLT task

Discussion in 'XML' started by John Harrison, Jan 22, 2004.

  1. This is my first XSLT project and I need some guidance on the approach
    to take.

    The problem is to take a document in one XML format and transform it
    into another XML format. The source format is fixed but the
    destination format will evolve. The two formats are completely
    unrelated, so there are source elements we will ignore and destination
    elements we will never receive.

    However the management has decreed that we mustn't throw away any
    data, since we may modify the destination format in such a way that we
    can now accommodate source data that we couldn't previously. Therefore
    the destination format contains a catch all area that is designed for
    all *unprocessed* source data.

    Perhaps an example will make this clear:

    <source>
      <team>
        <name>Arsenal</name>
        <manager><name><first>Arsene</first><last>Wenger</last></name></manager>
        <star-player><name><first>Thierry</first><last>Henri</last></name></star-player>
      </team>
    </source>

    If we aren't interested in the star player, this might be transformed
    to, say,

    <dest>
      <team name="Arsenal">
        <manager>Arsene Wenger</manager>
      </team>
      <unprocessed>
        <team>
          <star-player><name><first>Thierry</first><last>Henri</last></name></star-player>
        </team>
      </unprocessed>
    </dest>

    Notice how the processed portion has been transformed but the
    unprocessed portion (including the team tag) has been dumped unchanged
    into the unprocessed tag.

    So any ideas on what approach I can take to code this in an elegant
    and robust way? I want to do some sort of 'catch all' processing that
    will automatically recognise which portions of the document are
    left unprocessed by the rest of my XSLT code.

    MTIA
    John
     
    John Harrison, Jan 22, 2004
    #1

  2. On 22/01/2004, around 14:23, John Harrison wrote:

    JH> Notice how the processed portion has been transformed but the
    JH> unprocessed portion (including the team tag) has been dumped unchanged
    JH> into the unprocessed tag.
    This bit is at odds with everything else you've said; the <team>
    element is processed but you want to include it in the <unprocessed>
    area.

    I think I understand why; you need some qualification of where the
    unprocessed data came from. But if you follow that line a little
    further, what would you expect <unprocessed> to contain if, say, you
    decided you didn't want first names but did want to include the last
    name of <star-player>? I'm guessing you'd want something like this...

    <unprocessed>
      <team>
        <manager>
          <last>Wenger</last>
        </manager>
        <star-player>
          <last>Henri</last>
        </star-player>
      </team>
    </unprocessed>

    Now even that's ok until your data contains more than one team and you
    end up with ...

    <unprocessed>
      <team>
        <manager>
          <last>Wenger</last>
        </manager>
        <star-player>
          <last>Henri</last>
        </star-player>
      </team>
      <team>
        <manager>
          <last>Keegan</last>
        </manager>
        <star-player>
          <last>Somebloke</last>
        </star-player>
      </team>
    </unprocessed>

    Similar, but different, problems occur if, say, a team has 2 'star
    players'.

    How do you know which bit of unprocessed data belongs to which
    source?

    I think that you need to think about tagging your elements with an ID
    and using that ID to link the unprocessed data back to the relevant
    bits of the destination data.

    If any of the above is relevant to your problem, let me know and we
    can look at ways around it.
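
    A minimal sketch of how the ID idea might look in XSLT 1.0, assuming the
    sample <source> document from the first post. The "ref" attribute and the
    "leftover" mode name are made up for illustration and are not part of
    either format. generate-id() returns the same string for the same source
    node within one transformation, so the processed and unprocessed copies
    of a team can be paired up afterwards.

    ```xml
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>

      <xsl:template match="/source">
        <dest>
          <!-- Processed view of every team. -->
          <xsl:apply-templates select="team"/>
          <unprocessed>
            <!-- Second walk over the same teams, in a separate mode. -->
            <xsl:apply-templates select="team" mode="leftover"/>
          </unprocessed>
        </dest>
      </xsl:template>

      <!-- Processed team: stamp it with an ID derived from the source node. -->
      <xsl:template match="team">
        <team name="{name}" ref="{generate-id()}">
          <manager>
            <xsl:value-of
                select="concat(manager/name/first, ' ', manager/name/last)"/>
          </manager>
        </team>
      </xsl:template>

      <!-- Leftover team: same ID, plus every child the template above
           did not use (here: anything except name and manager). -->
      <xsl:template match="team" mode="leftover">
        <team ref="{generate-id()}">
          <xsl:copy-of select="*[not(self::name or self::manager)]"/>
        </team>
      </xsl:template>
    </xsl:stylesheet>
    ```

    The generated IDs are only stable within a single run, so they link the
    two halves of one output document but should not be stored as permanent
    keys. The sketch also assumes the processed output keeps one element per
    source team; if processing merges or drops teams, the pairing breaks down.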

    --
    Stuart
    It's all fun and games 'til someone loses an eye! Then it's a SPORT!
     
    DFN-CIS NetNews Service, Jan 22, 2004
    #2

  3. Andy Dingley (Guest)

    On 22 Jan 2004 05:58:29 -0800, John Harrison wrote:

    >So any ideas on what approach I can take to code this in an elegant
    >and robust way


    Namespaces. Just "don't throw any data away" by preserving the whole
    damn lot. For each resource (player, team, whatever), give it a
    property called "source" and use xsl:copy to make a simple copy
    of the source resource.

    It's bulky, but that's very rarely a real problem.

    --
    Die Gotterspammerung - Junkmail of the Gods
     
    Andy Dingley, Jan 22, 2004
    #3
  4. Andy Dingley wrote:
    > On 22 Jan 2004 05:58:29 -0800, John Harrison wrote:
    >
    > >So any ideas on what approach I can take to code this in an elegant
    > >and robust way
    >
    > Namespaces. Just "don't throw any data away" by preserving the whole
    > damn lot. For each resource (player, team, whatever), give it a
    > property called "source" and use xsl:copy to make a simple copy
    > of the source resource.
    >
    > It's bulky, but that's very rarely a real problem.


    That's certainly an option, and probably what we'll go for. I thought
    I could do something smarter but it seems to be more difficult than I
    appreciated.

    I don't get the namespace angle, however. Could you explain that in a
    bit more detail? (And shouldn't it be xsl:copy-of, not xsl:copy?)

    John
     
    John Harrison, Jan 23, 2004
    #4
  5. DFN-CIS NetNews Service wrote:
    > On 22/01/2004, around 14:23, John Harrison wrote:
    >
    > JH> Notice how the processed portion has been transformed but the
    > JH> unprocessed portion (including the team tag) has been dumped unchanged
    > JH> into the unprocessed tag.
    > This bit is at odds with everything else you've said; the <team>
    > element is processed but you want to include it in the <unprocessed>
    > area.
    >
    > I think I understand why; you need some qualification of where the
    > unprocessed data came from.


    Right.

    > But if you follow that line a little
    > further, what would you expect <unprocessed> to contain if, say, you
    > decided you didn't want first names but did want to include the last
    > name of <star-player>? I'm guessing you'd want something like this...
    >
    > <unprocessed>
    > <team>
    > <manager>
    > <last>Wenger</last>
    > </manager>
    > <star-player>
    > <last>Henri</last>
    > </star-player>
    > </team>
    > </unprocessed>
    >
    > Now even that's ok until your data contains more than one team and you
    > end up with ...
    >
    > <unprocessed>
    > <team>
    > <manager>
    > <last>Wenger</last>
    > </manager>
    > <star-player>
    > <last>Henri</last>
    > </star-player>
    > </team>
    > <team>
    > <manager>
    > <last>Keegan</last>
    > </manager>
    > <star-player>
    > <last>Somebloke</last>
    > </star-player>
    > </team>
    > </unprocessed>
    >
    > Similar, but different, problems occur if, say, a team has 2 'star
    > players'.
    >
    > How do you know which bit of unprocessed data belongs to which
    > source?


    It's a good point and one I hadn't thought of.

    >
    > I think that you need to think about tagging your elements with an ID
    > and using that ID to link the unprocessed data back to the relevant
    > bits of the destination data.
    >
    > If any of the above is relevant to your problem, let me know and we
    > can look at ways around it.


    Obviously this is more complex than I realised. I can't think of a way
    to make your ID tagging idea work in principle (let alone in
    practice). Presumably identical attributes would be added to tags that
    are common to both the processed and unprocessed portions of the
    destination file, but unfortunately the processing we do could
    completely remove or rearrange tags, so matching the processed to the
    unprocessed data seems impossible without writing a lot of ugly
    special-case code, something I'd hoped to avoid.

    Oh well, thanks for your input. I can do it the ugly way, I was just
    hoping for an elegant solution.

    John
     
    John Harrison, Jan 23, 2004
    #5
  6. Andy Dingley (Guest)

    On 23 Jan 2004 01:28:05 -0800, John Harrison wrote:

    >I don't get the namespace angle however,


    You not only need to preserve this source content, you also want to
    make it non-obvious to future processing. If you're transforming from
    one DTD to another (frequently the target "DTD" will be (X)HTML plus a
    load of class selectors) then an easy way to achieve this is to place
    the copied source in a different namespace from the "useful"
    destination content.

    >(and shouldn't it be xsl:copy-of not xsl:copy?).


    Probably. It depends on what you're doing and how flexible it needs to
    be, but yes, I think you could probably do this example with a single
    xsl:copy-of.
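
    A rough sketch of the namespace approach in XSLT 1.0, assuming the sample
    <source> document from the first post. The namespace URI
    "urn:example:original-source", the "src" prefix and the "preserve" mode
    are placeholders invented for this example. A plain xsl:copy-of would
    leave the copied elements in their original (empty) namespace, so this
    version rebuilds each element under the new namespace instead.

    ```xml
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:src="urn:example:original-source">
      <xsl:output method="xml" indent="yes"/>

      <xsl:template match="/source">
        <dest>
          <xsl:apply-templates select="team"/>
        </dest>
      </xsl:template>

      <!-- Each destination team carries a full copy of its source team
           in a "source" property that lives in the src namespace. -->
      <xsl:template match="team">
        <team name="{name}">
          <manager>
            <xsl:value-of
                select="concat(manager/name/first, ' ', manager/name/last)"/>
          </manager>
          <src:source>
            <xsl:apply-templates select="." mode="preserve"/>
          </src:source>
        </team>
      </xsl:template>

      <!-- Rebuild every source element under the src namespace, keeping
           its attributes. Text nodes are copied by the built-in rule;
           comments and processing instructions are dropped. -->
      <xsl:template match="*" mode="preserve">
        <xsl:element name="src:{local-name()}"
                     namespace="urn:example:original-source">
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="node()" mode="preserve"/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Later processing that only matches unprefixed names such as team or
    manager would not see the src:* elements, which seems to be the point of
    the suggestion. If an exact copy is enough and the namespace separation
    is not needed, the "preserve" template could be replaced by a single
    <xsl:copy-of select="."/> inside the wrapper.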
     
    Andy Dingley, Jan 23, 2004
    #6
  7. Peter Flynn (Guest)

    John Harrison wrote:
    [snip]
    > So any ideas on what approach I can take to code this in an elegant
    > and robust way?


    Provided the unprocessed material is well-formed,

    <!ELEMENT unprocessed ANY>

    is your friend, and will let the file be valid while allowing any
    element to go in there.

    > I want to do some sort of 'catch all' processing that
    > will automatically recognise which portions of the document are
    > unprocessed by the rest of my XSLT code.


    A specified default template would do this provided you don't mind
    having multiple <unprocessed> elements in the output.

    <xsl:template match="*">
      <unprocessed>
        <xsl:copy-of select="."/>
      </unprocessed>
    </xsl:template>

    But it's 2.40am and I haven't tested this :)
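
    One possible way to end up with a single <unprocessed> element, sketched
    in XSLT 1.0 against the sample <source> document from the first post. The
    "leftover" mode name is an assumption of this sketch. The idea is to walk
    the source a second time in that mode: a catch-all copies anything nobody
    has claimed, containers that were only partly used keep their tag and
    recurse, and fully used elements get an empty template.

    ```xml
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="/source">
        <dest>
          <!-- First pass: the normal transformation. -->
          <xsl:apply-templates select="team"/>
          <!-- Second pass: collect whatever the first pass ignored. -->
          <unprocessed>
            <xsl:apply-templates select="*" mode="leftover"/>
          </unprocessed>
        </dest>
      </xsl:template>

      <xsl:template match="team">
        <team name="{name}">
          <manager>
            <xsl:value-of
                select="concat(manager/name/first, ' ', manager/name/last)"/>
          </manager>
        </team>
      </xsl:template>

      <!-- Catch-all: an element nobody claimed is copied whole. -->
      <xsl:template match="*" mode="leftover">
        <xsl:copy-of select="."/>
      </xsl:template>

      <!-- Partly used container: keep the tag, then look at its children. -->
      <xsl:template match="team" mode="leftover">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="*" mode="leftover"/>
        </xsl:copy>
      </xsl:template>

      <!-- Fully used elements: emit nothing in the leftover pass. -->
      <xsl:template match="team/name | team/manager" mode="leftover"/>
    </xsl:stylesheet>
    ```

    For the sample input this should give the <dest> document shown in the
    first post, with <team><star-player>...</star-player></team> inside one
    <unprocessed> element. XSLT 1.0 keeps no record of which nodes the first
    pass used, so the "leftover" templates have to be kept in step with the
    main ones by hand, and a team with nothing left over still produces an
    empty <team/>.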

    ///Peter
     
    Peter Flynn, Jan 24, 2004
    #7