Techniques for truly reusable structures in XML schemas

Discussion in 'XML' started by Steve Jorgensen, Sep 8, 2005.

  1. I recently produced an XML Schema to support several kinds of transactions
    within a particular business domain. In the process, I learned pretty much
    all of how W3C XML Schema works, learned some Schematron, read up on XML
    design patterns and best practices, and thought I knew what I was doing.

    Since there was more overlap than not between the contents of the different
    transaction types, I designed a single schema with a single root, and used
    some xs:choice elements to handle the different variations. I thought I was
    doing a good job of designing a schema that could be gracefully extended to
    handle different cases.

    Next, the requirements had a medium-big change, so I went to try to extend the
    schema to handle a new transaction type that was a bit different, and the
    whole schema came tumbling down. My exquisitely designed schema built to deal
    with change turned out to be a house of cards that blew over in the next

    I realized a few things from this:

    1. Hierarchical systems are even less flexible than they first appear.
    2. Making XML Schemas flexible is really, really hard and requires a knowledge
    of specific techniques to achieve it.
    3. There's lots of good advice on the Internet regarding how to best use
    schema constructs and namespaces, but not much on how to actually design the
    node hierarchies in a schema for maximum flexibility.
    4. After my schema downfall, I invented a couple of patterns that really seem
    to help, but I still don't know where to find more advice along these lines.

    A big difficulty with XML is that it encourages us to commit early to a
    hierarchical arrangement that may not work for all cases. XML is built on
    hierarchies, and the alternative seems to be adding IDREFs or keyrefs, which
    make the code and the document more convoluted; the result starts to look
    more like a relational database schema than a tree structure. The partial
    solution I've found is to add a layer of abstraction: instead of making one
    entity a child of another, give both elements a common parent.


    <BillingAccount>
      <Customer><Name value="Foo, Inc."/></Customer>
      <Invoice><InvoiceNumber value="1234"/></Invoice>
      <Invoice><InvoiceNumber value="1255"/></Invoice>
    </BillingAccount>

    Say we start with a schema that contains billing account details and invoices,
    and there is a 1-to-many relationship between accounts and invoices. The
    obvious construction for 1-to-many is to make invoices children of billing
    accounts (as above), but if we go down that path, then what do we do about
    another document type with billing accounts, but not invoices? We can't just
    reuse the billing account element without allowing invoice elements as well,
    and that makes no sense. We can make a complex type definition and restrict
    or extend the type, but that's another can of worms that gets quickly out of
    hand. We can un-nest the elements and use keys and keyrefs, but now the
    documents are much harder to process.
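
    In XSD terms, the nested design looks roughly like the sketch below. The
    element names come from the examples in this post; the ValueHolder type and
    the attribute types are assumptions made for illustration. Because Invoice
    is part of BillingAccount's content model, any other document type that
    references BillingAccount admits invoices as well.

    ```xml
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

      <!-- Hypothetical helper type for elements like <Name value="..."/> -->
      <xs:complexType name="ValueHolder">
        <xs:attribute name="value" type="xs:string" use="required"/>
      </xs:complexType>

      <xs:element name="Customer">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Name" type="ValueHolder"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

      <xs:element name="Invoice">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="InvoiceNumber" type="ValueHolder"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

      <!-- Invoice is baked into the account's content model, so every
           reuse of BillingAccount drags the invoices along with it. -->
      <xs:element name="BillingAccount">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="Customer"/>
            <xs:element ref="Invoice" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

    </xs:schema>
    ```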

    <BillingAccount id="123">
      <Customer><Name value="Foo, Inc."/></Customer>
    </BillingAccount>

    <Invoice billingAccountID="123">
      <InvoiceNumber value="1234"/>
    </Invoice>
    <Invoice billingAccountID="123">
      <InvoiceNumber value="1255"/>
    </Invoice>
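
    For comparison, here is a rough sketch of what the key/keyref version needs
    on the schema side. The Transactions root element and the constraint names
    billingAccountKey and invoiceAccountRef are hypothetical; identity
    constraints have to be declared on some element that contains both the
    accounts and the invoices, so a wrapper like this is assumed. The content
    models of BillingAccount and Invoice are abbreviated to the attributes that
    matter here.

    ```xml
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

      <!-- Hypothetical root that holds the un-nested accounts and invoices -->
      <xs:element name="Transactions">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="BillingAccount" maxOccurs="unbounded"/>
            <xs:element ref="Invoice" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>

        <!-- Every BillingAccount/@id must be present and unique -->
        <xs:key name="billingAccountKey">
          <xs:selector xpath="BillingAccount"/>
          <xs:field xpath="@id"/>
        </xs:key>

        <!-- Every Invoice/@billingAccountID must match some account id -->
        <xs:keyref name="invoiceAccountRef" refer="billingAccountKey">
          <xs:selector xpath="Invoice"/>
          <xs:field xpath="@billingAccountID"/>
        </xs:keyref>
      </xs:element>

      <!-- Child content (Customer, InvoiceNumber) omitted for brevity -->
      <xs:element name="BillingAccount">
        <xs:complexType>
          <xs:attribute name="id" type="xs:string" use="required"/>
        </xs:complexType>
      </xs:element>

      <xs:element name="Invoice">
        <xs:complexType>
          <xs:attribute name="billingAccountID" type="xs:string" use="required"/>
        </xs:complexType>
      </xs:element>

    </xs:schema>
    ```

    The schema can enforce the references, but whatever consumes the document
    still has to join invoices to accounts by ID itself.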

    To get out of this, what we need is a shared parent for the 2 element types in
    the original schema. If we add an element for account activity such that each
    account activity element has one billing account child and zero or more
    invoice children, that supports the needs of the current schema, but it still
    leaves the billing account element usable in a document that does not deal
    with invoices.

    <AccountActivity>
      <BillingAccount>
        <Customer><Name value="Foo, Inc."/></Customer>
      </BillingAccount>
      <Invoice><InvoiceNumber value="1234"/></Invoice>
      <Invoice><InvoiceNumber value="1255"/></Invoice>
    </AccountActivity>
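
    A minimal XSD sketch of the shared-parent pattern, assuming the wrapper is
    named AccountActivity (my name for the "account activity" element). The
    AccountList element is a hypothetical second document type, added only to
    show BillingAccount being reused without invoices. Child content is reduced
    to xs:string to keep the sketch short.

    ```xml
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

      <!-- BillingAccount knows nothing about invoices -->
      <xs:element name="BillingAccount">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Customer" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

      <xs:element name="Invoice">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="InvoiceNumber" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

      <!-- Shared parent: one account plus zero or more invoices.
           The 1-to-many relationship lives here, not in BillingAccount. -->
      <xs:element name="AccountActivity">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="BillingAccount"/>
            <xs:element ref="Invoice" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

      <!-- Hypothetical other document type: accounts only, no invoices -->
      <xs:element name="AccountList">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="BillingAccount" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

    </xs:schema>
    ```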

    This is a partial solution, because it's still not terribly hard to come up
    with business rule changes that can break it, but it's much more resilient
    than the original account/invoice hierarchy, and it's much less messy than
    looking up the accounts by key reference. All in all, a good compromise.

    Where, if anywhere, can I go to find more helpful advice along these lines?
    Steve Jorgensen, Sep 8, 2005