element farms (containers for repeated elements) needed?

Discussion in 'XML' started by Wolfgang Lipp, Feb 9, 2004.

  1. <annotation>
    the first eleven contributions in this thread started
    as an off-list email discussion; i have posted them
    here with the consent of their authors. -- _w.lipp
    </annotation>

    From: Eric van der Vlist [mailto:]
    Sent: Tuesday, 27-January-2004 13:53

    Hi,

    On Tue, 2004-01-27 at 13:25, Lipp, Wolfgang wrote:

    > my question is: do we need container elements for
    > repeating elements in data-centric xml documents?


    No, I don't think so.

    > or is
    > it for some reason very advisable to introduce
    > containers in xml documents even where not strictly
    > needed? how can a recommendation on this in the light of
    > existing tools like w3c xml schema and relaxng


    I tend to think that tools should have a limited impact on document
    design (of course not going to the point where the documents can't be
    processed at all) and that a good design isn't necessarily one which
    imports all the restrictions of all the tools :) ...

    That being said, there is absolutely no restriction in using RELAX NG
    without container elements and even W3C XML Schema won't bite you either
    unless you say that you want to allow the elements to appear in any
    order (using xs:all is the only case I can think of that mandates
    containers with WXS).

    > as well
    > as established practice be answered?


    As you said, some developers (and even good ones for whom I have a lot
    of respect) consider that containers are a good practice but I don't.

    > i would greatly
    > appreciate any words, pointers, and links.


    Hope this helps.

    Eric
    --
    Lisez-moi sur XMLfr.
    http://xmlfr.org/index/person/eric van der vlist/
    Upcoming XML schema languages tutorial:
    - Santa Clara -half day- (15/03/2004) http://masl.to/?J24916E96
    ------------------------------------------------------------------------
    Eric van der Vlist http://xmlfr.org http://dyomedea.com
    (ISO) RELAX NG ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
    (W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
    ------------------------------------------------------------------------
     
    Wolfgang Lipp, Feb 9, 2004
    #1

  2. From: Uche Ogbuji [mailto:]
    Wednesday, 28-January-2004 19:18


    Eric has already given an excellent answer to this. Especially mark
    his words:

    "I tend to think that tools should have a limited impact on document
    design (of course not going to the point where the documents can't be
    processed at all) and that a good design isn't necessarily one which
    imports all the restrictions of all the tools :) ..."

    That's a para that should be engraved somewhere.

    I believe there is no one rule that works in all cases when deciding
    whether or not to use container elements. Here is the informal rule of
    thumb I use in my own practice:

    * Use a container element only when it has a natural analogue to some
    meaningful entity in the problem space.

    In other words, don't invent an abstract concept for no other reason
    than to hold elements together.

    So using your example, I would go with

    library
      *book
      *employee
      *reader

    Each element then conforms to an actual concern in the problem space.
    If you use

    library
      books
        *book
      employees
        *employee
      readers
        *reader

    Then in my opinion the added elements are purely contrivances to make
    one feel more comfortable about not having a container. I believe in
    most cases they don't correspond to any useful entity in the problem
    space.
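    To make the two outlines concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element names come from the outlines above; the titles and names are invented sample data. It only shows that a consumer can reach the books in either layout, and that the difference lies in the path ("book" versus "books/book").

```python
import xml.etree.ElementTree as ET

# Flat layout: the repeated elements sit directly under <library>.
FLAT = """<library>
  <book><title>Dune</title></book>
  <employee><name>Ann</name></employee>
  <book><title>Emma</title></book>
  <reader><name>Bob</name></reader>
</library>"""

# Container layout: each kind of repeated element has its own wrapper.
CONTAINED = """<library>
  <books>
    <book><title>Dune</title></book>
    <book><title>Emma</title></book>
  </books>
  <employees><employee><name>Ann</name></employee></employees>
  <readers><reader><name>Bob</name></reader></readers>
</library>"""

def book_titles(xml_text, path):
    """Return the title of every <book> reached by a path relative to the root."""
    library = ET.fromstring(xml_text)
    return [book.findtext("title") for book in library.findall(path)]

flat_titles = book_titles(FLAT, "book")
contained_titles = book_titles(CONTAINED, "books/book")
print(flat_titles)       # ['Dune', 'Emma']
print(contained_titles)  # ['Dune', 'Emma']
```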

    Just for clarity, if I wanted to organize my library into a collection
    of books donated at the same time, I might be comfortable with:

    library
      books (@donor='George Soros')
        *book
      books (@donor='Warren Buffett')
        *book

    Although I would probably find a name more suitable to the
    corresponding entity:

    library
      endowment (@donor='George Soros')
        *book
      endowment (@donor='Warren Buffett')
        *book
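    When a grouping element does correspond to something in the problem space, its attributes carry real information. A small sketch, again assuming ElementTree and invented sample data, that counts the books held under each donor's endowment:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Sample document following the endowment outline (data invented).
LIBRARY = """<library>
  <endowment donor="George Soros">
    <book><title>Dune</title></book>
    <book><title>Emma</title></book>
  </endowment>
  <endowment donor="Warren Buffett">
    <book><title>Ulysses</title></book>
  </endowment>
</library>"""

def books_per_donor(xml_text):
    """Count the books in each endowment, keyed by the donor attribute."""
    library = ET.fromstring(xml_text)
    counts = Counter()
    for endowment in library.findall("endowment"):
        counts[endowment.get("donor")] += len(endowment.findall("book"))
    return dict(counts)

print(books_per_donor(LIBRARY))
# {'George Soros': 2, 'Warren Buffett': 1}
```

    Two endowments from the same donor would be merged into one count here; whether that is correct depends on what an endowment means in the application.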

    HTH.


    --
    Uche Ogbuji Fourthought, Inc.
    http://uche.ogbuji.net http://4Suite.org http://fourthought.com
    A survey of XML standards: Part 1 -
    http://www-106.ibm.com/developerworks/xml/library/x-stand1.html
    Building Dictionaries With SAX -
    http://www.xml.com/pub/a/2004/01/14/py-xml.html
    Learning Objects Metadata -
    http://www-106.ibm.com/developerworks/xml/library/x-think21.html
    Python Web services developer: The real world, Part 1 -
    http://www-106.ibm.com/developerworks/webservices/library/ws-pyth14/
    The State of the Python-XML Art, 2003 -
    http://www.xml.com/pub/a/2003/09/10/py.html
    Objects. Encapsulation. XML? -
    http://www.adtmag.com/article.asp?id=8596
     
    Wolfgang Lipp, Feb 9, 2004
    #2

  3. From: David Mertz, Ph.D. [mailto:]
    Wednesday, 28-January-2004 20:37

    > "I tend to think that tools should have a limited impact on document
    > design (of course not going to the point where the documents can't be
    > processed at all) and that a good design isn't necessarily one which
    > imports all the restrictions of all the tools :) ..."
    >
    > That's a para that should be engraved somewhere.


    Generally, I quite concur with my colleagues Uche and Eric. There
    certainly is a negative tendency to over-abstract in XML document
    design.

    > * Use a container element only when it has a natural analogue to some
    > meaningful entity in the problem space.


    I wonder if Uche dislikes Java for this reason (or most C++ class
    libraries, for that matter). It's not exactly the same thing, but
    abstract classes--or generally, deep class hierarchies--are a definite
    analogue of container elements. And I tend to dislike them for the
    same reason.

    > So using your example, I would go with
    >
    > library
    >   *book
    >   *employee
    >   *reader
    >
    > Each element then conforms to an actual concern in the problem space.
    > If you use
    >
    > library
    >   books
    >     *book
    >   employees
    >     *employee
    >   readers
    >     *reader
    >
    > Then in my opinion the added elements are purely contrivances to make
    > one feel more comfortable about not having a container.


    I'm not sure if I quite agree here. While there is certainly a point
    to not forcing the data structure into the mold of the programming
    tool, there are a lot of XML bindings that deal nicely with category
    hierarchies. For example, using gnosis.xml.objectify, I might
    enumerate over books in the latter scheme with:

    for book in library.books:
        doSomething(book)

    Under Uche's preferred system, I'd have to do something more like:

    for book in filter(lambda e: tagname(e)=='book', library):
        doSomething(book)

    The first is certainly clearer in intent. Of course, some bindings use
    XPath to do the filtering instead (ElementTree, Anobind, REXML,
    etc.)... but while there is something desirable in that uniform syntax,
    it is still basically just a filter. Enumerating over books seems like
    a pretty natural thing to want to do, IMO.
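    For comparison, both styles can be written with Python's standard xml.etree.ElementTree, whose findall() accepts a limited XPath subset. This sketch does not use gnosis.xml.objectify, and the sample document is invented; on the flat layout, the explicit filter over heterogeneous children and the path expression select the same elements:

```python
import xml.etree.ElementTree as ET

library = ET.fromstring(
    "<library>"
    "<book><title>Dune</title></book>"
    "<employee><name>Ann</name></employee>"
    "<book><title>Emma</title></book>"
    "</library>"
)

# Explicit filter over the heterogeneous children (the lambda style).
filtered = list(filter(lambda e: e.tag == "book", library))

# The same selection as an ElementTree path expression (an XPath subset).
by_path = library.findall("book")

for book in by_path:
    print(book.findtext("title"))
```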

    Think of what you'd do in an OOP framework also--never mind the XML
    issue. If I were generating a library object, I would find it much
    more natural to have it contain a .books attribute that was a
    list/array of books than I would to create a .everything attribute that
    was a heterogeneous list of books, employees and readers.

    In a way, I would suggest that Uche and Wolfgang are avoiding the
    Scylla of letting the data follow the tools, but falling to the
    Charybdis of letting the surface representation of XML dictate the data
    structure.

    Yours, David...
     
    Wolfgang Lipp, Feb 9, 2004
    #3
  4. From: Uche Ogbuji [mailto:]
    Wednesday, 28-January-2004 21:29


    On Wed, 2004-01-28 at 12:37, David Mertz, Ph.D. wrote:

    > > * Use a container element only when it has a natural analogue to some
    > > meaningful entity in the problem space.

    >
    > I wonder if Uche dislikes Java for this reason (or most C++ class
    > libraries, for that matter). It's not exactly the same thing, but
    > abstract classes--or generally, deep class hierarchies--are a definite
    > analogue of container elements. And I tend to dislike them for the
    > same reason.


    Yes. I have the same problem with deep object
    hierarchies. The C++ NIH classes were the classic
    example: you either saw them as the paragon of OO
    design, or thought they were the perfect demonstration
    that OO without generics is bad. I fell quickly into
    the latter camp.


    > > So using your example, I would go with
    > >
    > > library
    > >   *book
    > >   *employee
    > >   *reader
    > >
    > > Each element then conforms to an actual concern in the problem space.
    > > If you use
    > >
    > > library
    > >   books
    > >     *book
    > >   employees
    > >     *employee
    > >   readers
    > >     *reader
    > >
    > > Then in my opinion the added elements are purely contrivances to make
    > > one feel more comfortable about not having a container.

    >
    > I'm not sure if I quite agree here. While there is certainly a point
    > to not forcing the data structure into the mold of the programming
    > tool, there are a lot of XML bindings that deal nicely with category
    > hierarchies. For example, using gnosis.xml.objectify, I might
    > enumerate over books in the latter scheme with:
    >
    > for book in library.books:
    >     doSomething(book)


    That's pull processing. For a long time I've preferred
    push processing in XML precisely because I think pull
    processing often results in contrivances for the benefit
    of the code, rather than the best pure XML design.


    > Under Uche's preferred system, I'd have to do something more like:
    >
    > for book in filter(lambda e: tagname(e)=='book', library):
    >     doSomething(book)
    >
    > The first is certainly clearer in intent. Of course, some bindings use
    > XPath to do the filtering instead (ElementTree, Anobind, REXML,
    > etc.)... but while there is something desirable in that uniform syntax,
    > it is still basically just a filter. Enumerating over books seems like
    > a pretty natural thing to want to do, IMO.


    I think you hit the nail on the head by talking about
    XPath. XPath-based triggers, I think, are the best way
    to perform such processing in an imperative language
    such as Python. I hope I don't give offense by saying
    that the fact that this is not all that well supported
    in earlier Python data bindings was the primary
    motivation for my developing Anobind rather than
    adopting any similar, existing tool.

    But I don't want to muddy the issue with comparisons
    *between* tools. I can illustrate just as well with
    XSLT:

    Pull:

    <xsl:template match="books">
      <xsl:for-each select="book">
        <xsl:value-of select="title"/>
        <!-- Spurious in this case, but battles between apply-templates
             and value-of are almost inevitable in non-trivial
             pull-type processing -->
      </xsl:for-each>
    </xsl:template>

    Push:

    <xsl:template match="title">
      <xsl:apply-templates/>
    </xsl:template>

    <!-- For this trivial example, a template for book is
         not needed, but it usually is for non-trivial cases -->
    <xsl:template match="book">
      <xsl:apply-templates/>
    </xsl:template>

    I'm a strong advocate of push processing, and I think
    almost all XSLT experts agree that it leads to clearer
    and more maintainable code.
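    Push processing is not tied to XSLT. A rough Python analogue, using only the standard library, registers a handler per element name and lets a recursive walk dispatch to it, loosely as xsl:apply-templates does. The function names and the handler table are illustrative only, not taken from any tool mentioned in this thread:

```python
import xml.etree.ElementTree as ET

def apply_templates(element, handlers, out):
    """Process each child: call the handler registered for its tag, or recurse.

    Loosely mirrors xsl:apply-templates. Unlike XSLT's built-in rules,
    text of unmatched elements is not copied to the output here.
    """
    for child in element:
        handler = handlers.get(child.tag)
        if handler is not None:
            handler(child, handlers, out)
        else:
            apply_templates(child, handlers, out)

def title_template(element, handlers, out):
    # Counterpart of <xsl:template match="title">: emit the title text.
    out.append(element.text or "")

HANDLERS = {"title": title_template}

def run(xml_text):
    out = []
    apply_templates(ET.fromstring(xml_text), HANDLERS, out)
    return out

with_container = run(
    "<library><books>"
    "<book><title>Dune</title><isbn>1</isbn></book>"
    "<book><title>Emma</title><isbn>2</isbn></book>"
    "</books></library>"
)
without_container = run(
    "<library>"
    "<book><title>Dune</title><isbn>1</isbn></book>"
    "<reader><name>Bob</name></reader>"
    "<book><title>Emma</title><isbn>2</isbn></book>"
    "</library>"
)
print(with_container)     # ['Dune', 'Emma']
print(without_container)  # ['Dune', 'Emma']
```

    Because dispatch is by element name rather than by path, the same handlers work whether or not the books sit inside a books container.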


    > Think of what you'd do in an OOP framework also--never mind the XML
    > issue.


    I think this is a different topic. I do not design for
    XML as I do for OO. In fact I argue strenuously against
    such a (IMHO) mix-up.


    > If I were generating a library object, I would find it much
    > more natural to have it contain a .books attribute that was a
    > list/array of books than I would to create a .everything attribute that
    > was a heterogeneous list of books, employees and readers.


    The conceptual confusion between the slot and the
    referent frame itself is a problem that OO has inherited
    from its ancestors. I think it argues a problem with OO
    rather than a good direction for XML design.


    > In a way, I would suggest that Uche and Wolfgang are avoiding the


    and Eric?


    > Scylla of letting the data follow the tools, but falling to the
    > Charybdis of letting the surface representation of XML dictate the data
    > structure.


    Structure of the code? In most cases I've worked on,
    the code serves the data, not the other way around, so I
    think it's right to let the data representation of the
    problem space dictate the structure of the code.

    This is a really nice topic, but I may not be able to
    contribute too much more to the thread: I've already
    been neglecting burning fires at work to converse this
    much :)

    Thanks, all.


     
    Wolfgang Lipp, Feb 9, 2004
    #4
  5. From: Eric van der Vlist [mailto:]
    Wednesday, 28-January-2004 21:32


    Hi David,

    On Wed, 2004-01-28 at 20:37, David Mertz, Ph.D. wrote:
    > While there is certainly a point
    > to not forcing the data structure into the mold of the programming
    > tool, there are a lot of XML bindings that deal nicely with category
    > hierarchies. For example, using gnosis.xml.objectify, I might
    > enumerate over books in the latter scheme with:
    >
    > for book in library.books:
    >     doSomething(book)


    That doesn't necessarily mean that a container needs to
    be found in the XML document. I am working on my own
    library (similar to gnosis.xml.objectify but not ready
    to be published yet), and without any container I can
    write:

    for book in library.book:
        book.doSomething()

    > Under Uche's preferred system, I'd have to do something more like:
    >
    > for book in filter(lambda e: tagname(e)=='book', library):
    >     doSomething(book)
    >
    > The first is certainly clearer in intent. Of course, some bindings use
    > XPath to do the filtering instead (ElementTree, Anobind, REXML,
    > etc.)... but while there is something desirable in that uniform syntax,
    > it is still basically just a filter. Enumerating over books seems like
    > a pretty natural thing to want to do, IMO.


    Sure, but the abstraction layer can easily be smart
    enough to let you do so without imposing it in the XML
    document.
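    As an illustration of such a layer, here is a minimal sketch assuming only ElementTree. Node is a hypothetical wrapper class, neither Eric's unpublished library nor gnosis.xml.objectify: attribute access on a node returns all child elements with that tag name, so library.book works on the flat layout without any container element.

```python
import xml.etree.ElementTree as ET

class Node:
    """Hypothetical read-only wrapper: node.<tag> returns matching children."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Called only when normal lookup fails; treat the name as a tag name.
        if name.startswith("_"):
            raise AttributeError(name)
        return [Node(child) for child in self._element.findall(name)]

    @property
    def text(self):
        return self._element.text

library = Node(ET.fromstring(
    "<library>"
    "<book><title>Dune</title></book>"
    "<reader><name>Bob</name></reader>"
    "<book><title>Emma</title></book>"
    "</library>"
))

# No <books> container in the XML, yet iteration reads naturally.
for book in library.book:
    print(book.title[0].text)
```

    One trade-off of this design: a misspelled attribute silently returns an empty list, and a child element named text is shadowed by the text property.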

    > Think of what you'd do in an OOP framework also--never mind the XML
    > issue. If I were generating a library object, I would find it much
    > more natural to have it contain a .books attribute that was a
    > list/array of books than I would to create a .everything attribute that
    > was a heterogeneous list of books, employees and readers.
    >
    > In a way, I would suggest that Uche and Wolfgang are avoiding the
    > Scylla of letting the data follow the tools, but falling to the
    > Charybdis of letting the surface representation of XML dictate the data
    > structure.


    Hmmm... aren't you the one who assumes that the data
    structure is directly derived from the "surface
    representation of XML" when you say that a container is
    needed because a list of homogeneous objects is easier
    to manage with an XML binding tool :) ???

    My feeling is that it's because the data model isn't
    necessarily dictated by the XML that containers aren't
    required.

    Eric

    > Yours, David...

    --
    Don't you think all these XML schema languages should work together?
    http://dsdl.org
     
    Wolfgang Lipp, Feb 9, 2004
    #5
  6. From: Lipp, Wolfgang
    Thursday, 29-January-2004 11:03

    i think i have learned that occam's razor
    applies to xml modelling as well: if an element is not
    arguably needed, don't use it -- with the addition that
    when designing an xml format for a specific application,
    the requirement to enable straightforward iteration
    over some kind of repeated element using a given api may
    mean the set of repeated elements becomes an entity
    since there is something i do with 'it'. the
    availability of techniques like xpath etc. somewhat
    weakens the point. generally, there seems to be a
    feeling that one should do things the xml way in xml,
    and the oop way in oop, and not let too many concerns
    from one domain influence decisions in the other. for
    someone writing a lot of oop things, this may be hard to
    do, since ~.books is such a natural and inevitable
    choice there.

    Uche Ogbuji [] wrote:
    > The conceptual confusion between the slot and the
    > referent frame itself is a problem that OO has
    > inherited from its ancestors. I think it argues a
    > problem with OO rather than a good direction for XML
    > design.


    can you elaborate a bit on this? i *think* it is about
    the thing that made me wonder a lot about xml until i
    found out that the things in the pointy brackets are
    really 'element type names', but i do not fully grasp
    the meaning of your remark.

    _wolfgang
     
    Wolfgang Lipp, Feb 9, 2004
    #6
  7. From: Robert A. Morris [mailto:]
    Thursday, 29-January-2004 14:12


    It's interesting that the thread seems to, slightly,
    reflect these points of view:

    service centric => rigorously use containers
    data centric => model as convenient

    I happen to think the former point of view, being more
    abstract, is more extensible and robust, and subsumes
    the latter(*). But several writers would naturally put
    me in the camp of the over-general. Despite the well
    reasoned examples from quite respected writers in the
    discussion, my own experience remains that abstraction
    takes you longer to develop with, but at the end has
    products with longer life and cheaper maintenance. On
    the other hand, I live in a world of 3-5 year funding
    cycles, from an agency that /wants/ to see me develop
    with targets that are over the horizon rather than get
    something useful as fast as possible. Indeed, if you
    make a proposal to the U.S. National Science Foundation
    which "merely" proposes the rapid deployment of a
    database---no matter how important---it will rarely, if
    ever, be funded. On the other hand, if you can make the
    case that, when done, lots of other projects and
    constituencies can use your work, you have a good shot at
    funding. For NSF proposals, you are /required/ to make
    an explicit case for broader impact than the specific
    science at hand. Right now I am working on a proposal to
    develop a framework for producing spatially referenced
    scientific observation systems. The major instance
    supporting the proof of concept will be a production
    quality invasive species reporting system with data
    referenced to the earth, and deployed by an organization
    that is presently gathering data with a brittle but
    useful system. But another will be a clinical breast
    cancer management system with data referenced to the
    organ, extending the personal database of a clinical
    oncologist who learned some Access on his own. My
    feeling is that, left to the data centric community,
    these two systems would take, say, 2X the effort of
    either one of them because they would basically repeat
    most of the infrastructure. A more abstract approach
    might make the total time 1.1 times the time for either
    one.

    Given the reasonableness of both sides of this argument,
    my guess is that matters will come down to social
    arguments, not technical arguments. This is too bad. One
    of the things we teach software engineers is that the
    client for a system should dictate the behavior, not the
    implementation. We go to great lengths in our year-long
    software engineering course to keep development details
    out of view of the people who commission the project.
    After a month of intensive requirements negotiation with
    them, they rarely get more than a few hours a month with
    the development team until something has started to
    emerge that purports to meet the requirements.

    Bob

    (*)It probably will come as no surprise that I have a
    mathematics Ph.D. and before turning to computer science
    spent 10 years as an algebraic geometer and homological
    algebraist. These subjects are so abstract that,
    literally I can no longer understand the very papers I
    published in the 1970s...



    Robert A. Morris
    Professor of Computer Science
    UMASS-Boston
    http://www.cs.umb.edu/~ram
    phone (+1)617 287 6466
     
    Wolfgang Lipp, Feb 9, 2004
    #7
  8. From: Lipp, Wolfgang
    Thursday, 29-January-2004 17:07

    > Thanks. It's interesting that the thread seems to, slightly, reflect
    > these points of view:
    > service centric => rigorously use containers
    > data centric => model as convenient


    i can see that the service centric thing has something
    to it, and it is also a very simple rule to unify data
    structures. as in procedural programming, quality
    probably improves when following clear design patterns.

    btw, i am more on the receiving end of the schema -- i
    don't develop it, i have to use it -- but i'd rather
    live with fewer containers, shorter element paths and
    slightly more involved oo mapping. you wrote earlier
    that the attempt to get rid of containers is most of the
    time done on the fallacious assumption that people have
    to be able to read the xml. well, in our case i think
    this is exactly what happens, because people who map
    databases according to our xml schema do so in terms of
    associating xpaths to database entities -- which is why
    the schema has elements named in a way so humans can
    read them in the first place. however, i do not want to
    put obstacles in the way of a future development of the
    schema, and you mentioned there may be trouble ahead
    when it comes to questions of schema extensibility:

    > Furthermore, if you use strong enough typing,
    > this means that you can have "group of elements
    > of type X" be reused in many places and have
    > only to change the type definition of X to change
    > them all. I could probably go further down this
    > road invoking inheritance examples that are at
    > least as persuasive, though those might be too
    > technical for the people who make these requests.


    in my example, i had

    > library
    >   address
    >   *book
    >   *employee
    >   *reader
    >
    > book
    >   *author
    >   title
    >   isbn


    -- can you point out to me where inheritance bites
    you with this kind of structure?

    _wolfgang
     
    Wolfgang Lipp, Feb 9, 2004
    #8
  9. From: Robert A. Morris [mailto:]
    Friday, 30-January-2004 05:25


    I agree you are exactly the audience that has to read
    XML documents. But to a certain extent, your case is
    similar to a point made in the discussion: that the
    schema shouldn't be held hostage to the code
    implementing applications on it. (However, my
    recollection is that this argument was /against/ my
    position on containers!) In the SDD work, we are taking
    the position that we should rather build some tools to
    address maintenance and consumption needs than change
    the schema. For example, SDD is heavily dependent on
    key/keyref mechanisms and it can be quite difficult for
    a human to understand to which key a particular keyref
    points, because you also have to examine the identity
    constraints on the keys and the XPaths involved. So
    rather than give up the mechanism, every place SDD has a
    keyref attribute, we also have an optional attribute
    named "refdebug" and one of my graduate students wrote a
    small XSLT utility that does the necessary traversals
    and heuristically chooses a label from the element that
    has the correct key and inserts it in the refdebug. In
    an RDB, this would be the same as examining secondary
    keys, tracing all the relations, and replacing the
    secondary key with some reasonably meaningful---if not
    unique---value from the related table.
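    The refdebug idea can be sketched in Python instead of XSLT. This is a simplification under stated assumptions: keys are plain id attributes and references are plain ref attributes, rather than the xs:key/xs:keyref identity constraints SDD actually declares; the element and attribute names other than refdebug are invented; and the label heuristic (first label or name child, else the key value) is hypothetical:

```python
import xml.etree.ElementTree as ET

def fill_refdebug(root, key_attr="id", ref_attr="ref"):
    """Copy a human-readable label from each referenced element into refdebug.

    Simplified stand-in for key/keyref resolution: every element carrying
    key_attr is a key target; every element carrying ref_attr points at one.
    """
    targets = {el.get(key_attr): el for el in root.iter() if el.get(key_attr)}
    for el in root.iter():
        ref = el.get(ref_attr)
        if ref is None:
            continue
        target = targets.get(ref)
        if target is None:
            el.set("refdebug", "UNRESOLVED")
            continue
        # Heuristic label choice: first <label> or <name> child, else the key.
        label = target.findtext("label") or target.findtext("name") or ref
        el.set("refdebug", label)

doc = ET.fromstring(
    "<dataset>"
    "<taxon id='t1'><name>Quercus alba</name></taxon>"
    "<observation ref='t1'/>"
    "<observation ref='t9'/>"
    "</dataset>"
)
fill_refdebug(doc)
print(ET.tostring(doc, encoding="unicode"))
```

    Like the SDD utility, the label is advisory only: it helps a human reader, while the ref attribute remains the authoritative link.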

    Thanks for provoking this discussion. We certainly had
    it many times during the drafting of SDD, and it will
    certainly come up again in the discussion of the draft.
    When you put it all together, send it to me and I'll put
    it on our wiki,
    http://efgblade.cs.umb.edu/twiki/bin/view/SDD/WebHome at
    which we invite discussion!

    Bob



    Robert A. Morris
    Professor of Computer Science
    UMASS-Boston
    http://www.cs.umb.edu/~ram
    phone (+1)617 287 6466
     
    Wolfgang Lipp, Feb 9, 2004
    #9