container elements for repeating elements ('element farms') needed?

Discussion in 'XML' started by Wolfgang Lipp, Jan 27, 2004.

  1. my question is: do we need container elements for
    repeating elements in data-centric xml documents? or is
    it for some reason very advisable to introduce
    containers in xml documents even where not strictly
    needed? how would one answer this in light of existing
    tools such as w3c xml schema and relax ng, as well as
    established practice? i would greatly appreciate any
    words, pointers, and links.

    my exposition of the problem has become rather long,
    partly because writing it helped me make the matter
    clear to myself; most people will probably not need to
    read all of it.

    to ease the discussion, let me introduce a very simple
    data schema, one that describes a library with books,
    employees, and readers. it looks like this:

    #=====================================================
    library
        address
        *book
        *employee
        *reader

    book
        *author
        title
        isbn

    author extends person

    employee extends person

    reader extends person
        card-id

    person
        name
            last
            first
    #=====================================================

    the star is to be read in the usual way as 'zero or more
    instances of'. i believe the above structure, where
    repeating elements are introduced without explicit
    container elements, to be sufficient and extensible: if
    i later want to describe individual employees in more
    detail, i can always amend the schema of <employee>
    (which presently only holds first and last name) and
    leave the schema of the <library> element untouched. (i
    also believe that mixed content and order between
    elements should be eschewed in most data-centric xml, so
    i do not make an effort to express mixed content or
    order between sibling elements in the above.)
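
    to make the implicit model concrete, here is a minimal
    python sketch using the standard library's
    xml.etree.ElementTree. the instance document, its values,
    and the helper name employee_last_names are made up for
    illustration; the only point is that repeated <employee>
    elements can be collected straight from <library>
    without any container:

```python
import xml.etree.ElementTree as ET

# Made-up instance of the implicit model: repeated <book>, <employee>
# and <reader> elements are direct children of <library>.
IMPLICIT_XML = """
<library>
  <address>Example Street 1</address>
  <book>
    <author><name><last>Doe</last><first>Jane</first></name></author>
    <title>An Example Title</title>
    <isbn>0000000000</isbn>
  </book>
  <employee><name><last>Roe</last><first>Rick</first></name></employee>
  <employee><name><last>Poe</last><first>Pat</first></name></employee>
  <reader>
    <name><last>Loe</last><first>Lou</first></name>
    <card-id>17</card-id>
  </reader>
</library>
"""


def employee_last_names(library: ET.Element) -> list[str]:
    # Hypothetical helper. A bare tag in findall() matches direct
    # children only, so no <employees> wrapper is needed.
    return [e.findtext("name/last") for e in library.findall("employee")]


library = ET.fromstring(IMPLICIT_XML.strip())
print(employee_last_names(library))  # ['Roe', 'Poe']
```

    changing what an <employee> contains would change only
    the "name/last" path inside the helper, not the way
    <library> is traversed.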

    now, there are people who do not agree with this kind of
    schema (let's call it the implicit model) and insist on
    container elements for repeatables. this means we have
    to explicitly introduce <books>, <employees>, and
    <readers>, so the library schema will look like this:

    #=====================================================
    library
        books
        employees
        readers

    books
        *book

    employees
        *employee

    readers
        *reader

    book
        authors
        title
        isbn

    authors
        *author

    author extends person

    employee extends person

    reader extends person
        card-id

    person
        name
            last
            first
    #=====================================================
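
    the difference between the two models is mechanical
    enough to automate. the following python sketch (again
    xml.etree.ElementTree; the function add_farms and its
    mapping argument are my own hypothetical names) rewraps
    an implicit-model <library> into the explicit model by
    placing each repeated child into its container.
    non-repeating children such as <address> are kept as
    they are:

```python
import xml.etree.ElementTree as ET


def add_farms(parent: ET.Element, farms: dict[str, str]) -> ET.Element:
    """Hypothetical helper: build a new element in which every child whose
    tag is a key of `farms` is placed inside one container named by the
    matching value. Children are reused, not copied.

    Containers are created even when they stay empty, mirroring the
    explicit model, where e.g. <books> is required but may hold no <book>.
    """
    result = ET.Element(parent.tag, parent.attrib)
    containers = {item: ET.Element(farm) for item, farm in farms.items()}
    for child in parent:
        if child.tag in containers:
            containers[child.tag].append(child)
        else:
            result.append(child)  # non-repeating children stay at this level
    result.extend(list(containers.values()))
    return result


FARMS = {"book": "books", "employee": "employees", "reader": "readers"}

implicit = ET.fromstring(
    "<library><address>x</address><book/><employee/><employee/><reader/></library>"
)
explicit = add_farms(implicit, FARMS)
print([c.tag for c in explicit])  # ['address', 'books', 'employees', 'readers']
print(len(explicit.findall("employees/employee")))  # 2
```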

    the argument, if i understand it correctly, is that if i
    want to change the structure of a contained element,
    only the explicit model lets me do so by redefining,
    e.g., <employee> (and perhaps <employees>) without
    touching the <library> element. it is also claimed that
    only then will i be able to use typing and treat
    employees as an entity that i can change later on, with
    the change taking effect everywhere it appears. third,
    it is claimed that container elements are desirable for
    object-oriented mapping.

    i would like to dub explicit container elements 'element
    farms' (think of server farms -- many of the same
    bundled) for short, and call the above set of claims the
    'element farm constraint', which in essence says that
    you should introduce a container element (a farm)
    whenever you allow the repetition of elements in data-
    centric xml.

    now, the second argument is obviously correct insofar
    as i can *only* in the explicit model modify an element
    <employees> and have that change propagate everywhere,
    for the simple reason that there is no such element in
    the implicit model. the question is, why should i want
    to do such a thing? i think it is a design decision
    whether a given entity or set of entities is modelled
    explicitly. i do not have <books>, <readers>, or
    <employees> in the implicit model since i have nothing
    to say about these groups in general, only about each
    individual. this could change: for example, at some
    point we discover that all readers pay the same fee and
    may take out at most a fixed number of books. then the
    set of readers becomes more tangible, and i will have
    to change the implicit model like this:

    #=====================================================
    library
        address
        *book
        *employee
        readers

    readers
        fee
        maximum-number-of-books
        *reader

    reader extends person
        card-id
    #=====================================================
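
    in code, the amended model looks like this (a python
    sketch; the fee and limit values and the helper name
    reader_policy are invented, and other children of
    <library> are left out). the collective data now lives
    in <readers>, and a query written against the earlier
    implicit model, library/reader, silently finds nothing:

```python
import xml.etree.ElementTree as ET

# Made-up instance of the amended model: <readers> carries data about
# the collective as well as the individual <reader> elements.
AMENDED_XML = """
<library>
  <readers>
    <fee>10</fee>
    <maximum-number-of-books>5</maximum-number-of-books>
    <reader><card-id>1</card-id></reader>
    <reader><card-id>2</card-id></reader>
  </readers>
</library>
"""


def reader_policy(library: ET.Element) -> tuple[int, int, list[str]]:
    # Hypothetical helper: read the collective data and the members.
    readers = library.find("readers")
    fee = int(readers.findtext("fee"))
    max_books = int(readers.findtext("maximum-number-of-books"))
    card_ids = [r.findtext("card-id") for r in readers.findall("reader")]
    return fee, max_books, card_ids


library = ET.fromstring(AMENDED_XML.strip())
print(reader_policy(library))     # (10, 5, ['1', '2'])
print(library.findall("reader"))  # [] (the old implicit-model path)
```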

    this is in fact a change in the model that did not
    automatically percolate through all tiers -- i had to
    modify my definition of <library>. so what? new facts
    are in town, and we make space for them. we did not
    build a complete, all-embracing, all-extensible data
    model on the first shot, but who ever does? sure, the
    explicit model would have made this change easier, but
    it is also somewhat bulkier. second, what do you do when
    you find you have something new to say about the
    library itself? you will have to change the <library>
    element in both models. but third, and devastatingly,
    we are faced in both models with the situation that not
    all repeated elements are covered by container elements
    -- the <readers> element above has two more children.
    that's all right for the implicit model, but in order
    to satisfy the element farm constraint, we must
    introduce one more container <xxx>, like so:

    #=====================================================
    readers
        fee
        maximum-number-of-books
        xxx

    xxx
        *reader
    #=====================================================
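
    the element farm constraint itself can be phrased as a
    mechanical check. the python sketch below is my own
    reading of it (the name farm_violations is
    hypothetical), applied to instance documents rather than
    to a schema, so a <readers> that happens to contain a
    single <reader> would pass. it reports every element
    that holds a repeated child alongside children of some
    other name:

```python
from collections import Counter
import xml.etree.ElementTree as ET


def farm_violations(root: ET.Element) -> list[str]:
    """Hypothetical check: return the tags of elements (root included) that
    contain a repeated child tag next to children with a different tag."""
    bad = []
    for elem in root.iter():  # iter() visits root and all descendants
        counts = Counter(child.tag for child in elem)
        has_repeat = any(n > 1 for n in counts.values())
        if has_repeat and len(counts) > 1:
            bad.append(elem.tag)
    return bad


# <readers> mixes collective data with repeated <reader> elements ...
without_xxx = ET.fromstring(
    "<readers><fee>10</fee><maximum-number-of-books>5</maximum-number-of-books>"
    "<reader/><reader/></readers>"
)
# ... which the extra <xxx> farm resolves.
with_xxx = ET.fromstring(
    "<readers><fee>10</fee><maximum-number-of-books>5</maximum-number-of-books>"
    "<xxx><reader/><reader/></xxx></readers>"
)
print(farm_violations(without_xxx))  # ['readers']
print(farm_violations(with_xxx))     # []
```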

    at this juncture, it becomes clear that

    * explicit containers for repeated elements will, under
    * the element farm constraint, never be truly useful
    * entities in the sense of data modelling, since they
    * are never allowed to hold any data pertaining to
    * them per se.

    by the way, i see no compelling reason not to add a
    <readers> element without making it the container for
    the <reader> elements -- sounds strange? well:

    #=====================================================
    library
        address
        *book
        *employee
        *reader
        readers

    readers
        fee
        maximum-number-of-books

    reader extends person
        card-id
    #=====================================================

    this structure allows you to query for a collective
    'readers' and to scan for individual instances of
    'reader' -- in a way the collective is independent of
    its members, since we can still say that there is a fee
    to pay and a maximum number of books to take home even
    with zero readers.
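
    a quick python sketch of this layout (element values
    and the helper name reader_summary are invented) shows
    the collective and its members being queried
    independently; the fee and the limit are there even when
    no <reader> exists:

```python
import xml.etree.ElementTree as ET

# Made-up instance: <readers> holds only collective data, while the
# individual <reader> elements (none yet) would be its siblings.
LIBRARY_XML = """
<library>
  <readers>
    <fee>10</fee>
    <maximum-number-of-books>5</maximum-number-of-books>
  </readers>
</library>
"""


def reader_summary(library: ET.Element) -> dict[str, object]:
    # Hypothetical helper: the collective and the members are separate queries.
    policy = library.find("readers")     # the collective
    members = library.findall("reader")  # the individuals
    return {
        "fee": policy.findtext("fee"),
        "max_books": policy.findtext("maximum-number-of-books"),
        "count": len(members),
    }


library = ET.fromstring(LIBRARY_XML.strip())
print(reader_summary(library))  # {'fee': '10', 'max_books': '5', 'count': 0}
```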

    lastly, it is possible to model employees and readers
    alike as sets of generic persons. in that case, we must
    have both collective elements:

    #=====================================================
    library
        ...
        employees
        readers

    employees
        *person

    readers
        *person
    #=====================================================

    however, since subclassing is easy and it is quite
    foreseeable that employees and readers will differ from
    generic persons in the eyes of a library's data
    administration, this approach is perhaps not to be
    recommended.
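
    in this variant the role of a <person> is known only
    from its container, so every query has to go through
    <employees> or <readers>. a python sketch (names and the
    helper people_by_role are invented):

```python
import xml.etree.ElementTree as ET

# Made-up instance: employees and readers are both plain <person>
# elements; only the enclosing container tells them apart.
LIBRARY_XML = """
<library>
  <employees>
    <person><name><last>Roe</last></name></person>
  </employees>
  <readers>
    <person><name><last>Loe</last></name></person>
    <person><name><last>Moe</last></name></person>
  </readers>
</library>
"""


def people_by_role(library: ET.Element) -> dict[str, list[str]]:
    # Hypothetical helper: the role comes from the container path.
    return {
        role: [p.findtext("name/last") for p in library.findall(f"{role}/person")]
        for role in ("employees", "readers")
    }


library = ET.fromstring(LIBRARY_XML.strip())
print(people_by_role(library))
# {'employees': ['Roe'], 'readers': ['Loe', 'Moe']}
```

    the catch shows up as soon as readers need a <card-id>:
    a generic <person> would have to allow it as an optional
    child for employees too.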

    sorry again for the longish mail,

    _wolfgang lipp
    w.lipp at bgbm dot org
    Wolfgang Lipp, Jan 27, 2004

  2. In article <>,
    Wolfgang Lipp <> wrote:

    % my question is: do we need container elements for
    % repeating elements in data-centric xml documents?

    You can often get away without them, but you may find that leaving
    them out limits you in unexpected ways. For instance, if you wanted
    to move the lists of employees and readers from your example to
    external documents, then you would need a containing element for
    each of them. If you wanted to include those documents as external
    parsed entities, then your library schema must allow for the
    containing element.
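
    The underlying reason is that a standalone XML document must have
    exactly one root element. A quick check with Python's
    xml.etree.ElementTree (a sketch; the sample content and the helper
    name is_well_formed are made up) shows that a bare list of employees
    is rejected until it gets a container:

```python
import xml.etree.ElementTree as ET

BARE_LIST = "<employee><last>Roe</last></employee><employee><last>Poe</last></employee>"
WRAPPED = "<employees>" + BARE_LIST + "</employees>"


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(BARE_LIST))  # False: a second top-level element follows the first
print(is_well_formed(WRAPPED))    # True
```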

    There are certainly cases where people have elected to leave off
    containers and it's made it more difficult to process the data. If your
    book element didn't exist, and you just had a list of titles, authors,
    and isbns, the data could still be unambiguous, but more complicated
    to process. I'm inclined to think that it's not worth the effort to
    decide whether any given container might turn out not to be useful;
    I'd put it in if it represents some identifiable entity (the library's
    collection and its subscriber base can each be thought of as distinct
    entities).
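
    To illustrate the point about a missing book element: with the
    container, each book is one query away; without it, the fields have
    to be regrouped by position. The Python sketch below simplifies
    authors to plain text and assumes, hypothetically, that each book's
    fields appear in document order and that <isbn> always closes a
    book, which is exactly the kind of ordering rule the container makes
    unnecessary:

```python
import xml.etree.ElementTree as ET


def books_with_container(library: ET.Element) -> list[dict]:
    # Hypothetical helper: each <book> groups its own fields.
    return [
        {
            "authors": [a.text for a in b.findall("author")],
            "title": b.findtext("title"),
            "isbn": b.findtext("isbn"),
        }
        for b in library.findall("book")
    ]


def books_without_container(library: ET.Element) -> list[dict]:
    # Hypothetical helper. Assumption: fields appear in document
    # order and <isbn> ends each book.
    books, current = [], {"authors": []}
    for child in library:
        if child.tag == "author":
            current["authors"].append(child.text)
        elif child.tag == "title":
            current["title"] = child.text
        elif child.tag == "isbn":
            current["isbn"] = child.text
            books.append(current)
            current = {"authors": []}
    return books


WITH = ET.fromstring(
    "<library><book><author>A</author><author>B</author>"
    "<title>T1</title><isbn>1</isbn></book>"
    "<book><author>C</author><title>T2</title><isbn>2</isbn></book></library>"
)
WITHOUT = ET.fromstring(
    "<library><author>A</author><author>B</author><title>T1</title><isbn>1</isbn>"
    "<author>C</author><title>T2</title><isbn>2</isbn></library>"
)
print(books_with_container(WITH) == books_without_container(WITHOUT))  # True
```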
    --

    Patrick TJ McPhee
    East York Canada
    Patrick TJ McPhee, Jan 30, 2004
