Huh?
What in blazes does that mean, and how does that differ from CSV
processing or any other transmission format?
In the light of the remark about CSV, i interpret it as follows. Consider
this format:
<invoice number="Invoice1A">
  <customer number="Cust1"/>
  <product id="Product_a" qty="1" priceEach="75.00"/>
</invoice>
You write a parser for this. The parser works by getting the invoice
element, then going through its children and dealing with them.
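To make that concrete, here is a rough sketch of such a parser using DOM and plain iteration. The class and method names are made up for illustration, and it only pulls out product ids; the point is the default branch, which rejects any child element it doesn't recognise.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical strict parser: knows about customer and product, nothing else.
class StrictInvoiceParser {

    static List<String> productIds(String xml) throws Exception {
        Element invoice = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        List<String> ids = new ArrayList<>();
        for (Node n = invoice.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // whitespace between elements shows up as text nodes
            }
            Element child = (Element) n;
            switch (child.getTagName()) {
                case "customer":
                    break; // known, but not needed for this sketch
                case "product":
                    ids.add(child.getAttribute("id"));
                    break;
                default:
                    // Anything unrecognised is treated as a format violation.
                    throw new IllegalStateException(
                            "unexpected element: " + child.getTagName());
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<invoice number=\"Invoice1A\">"
                + "<customer number=\"Cust1\"/>"
                + "<product id=\"Product_a\" qty=\"1\" priceEach=\"75.00\"/>"
                + "</invoice>";
        System.out.println(productIds(xml));
    }
}
```

Whether throwing in the default branch is a bug or a feature depends on who controls the format.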
Now it changes to:
<invoice number="Invoice1A">
  <customer number="Cust1"/>
  <product id="Product_a" qty="1" priceEach="75.00"/>
  <payment status="paid"/>
</invoice>
Your parser will now encounter that payment element when it wasn't
expecting one, and may break. If you've been cautious, it won't, but in
many natural ways of writing a parser, it will. For example, if i was
doing StAX, i'd be looping over product elements until i hit an end tag
for invoice, and if i saw something else, i'd probably explode. StAX makes
it *particularly* awkward to skip over unknown elements, because you have
to walk over their innards too. If i was doing DOM and iteration, i'd have
the same problem. If i was doing it with DOM and XPath, it would probably
be okay, but personally, i wouldn't write a parser like that for this
problem. If i was doing SAX, it would depend on what i did in startElement
with unrecognised elements. I'd probably blow up.
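For comparison, here is a minimal sketch of the cautious StAX version, assuming the invoice format above with no text mixed in directly under invoice. The names TolerantInvoiceReader, skipElement and productIds are invented for the example. The depth counter in skipElement is the "walking over the innards" part: StAX has no built-in skip, so you have to consume every nested event yourself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical tolerant reader: collects product ids, ignores everything else.
class TolerantInvoiceReader {

    // Precondition: the reader is positioned on a START_ELEMENT.
    // Consumes events up to and including the matching END_ELEMENT.
    static void skipElement(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    static List<String> productIds(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> ids = new ArrayList<>();
        r.nextTag(); // move onto the <invoice> start tag
        // Each pass sees either a child start tag or the invoice end tag.
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            if ("product".equals(r.getLocalName())) {
                ids.add(r.getAttributeValue(null, "id"));
            }
            // Known or unknown, consume the rest of this child element.
            skipElement(r);
        }
        r.close();
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<invoice number=\"Invoice1A\">"
                + "<customer number=\"Cust1\"/>"
                + "<product id=\"Product_a\" qty=\"1\" priceEach=\"75.00\"/>"
                + "<payment status=\"paid\"/>"
                + "</invoice>";
        System.out.println(productIds(xml));
    }
}
```

It isn't much code, but it is code you have to remember to write, which is the point: the natural loop without skipElement is the one that explodes.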
Mind you, much the same applies to CSV. In the CSV (actually
pipe-separated values) parsing code in the system i work on now, we check
the number of values in each line, and if it isn't what we expect, we
reject it. Someone else makes the data, and is supposed to tell us if the
format changes, so for us, it's better to scream bloody murder about a
change, so we can point the finger at them, than to try to deal with it.
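The check itself is trivial. This is not our actual code, just a sketch of the idea; the class name and the expected field count of 3 are assumptions for the example.

```java
import java.util.regex.Pattern;

// Hypothetical strict line parser for pipe-separated values.
class StrictPipeParser {

    // Assumed field count, purely for illustration.
    static final int EXPECTED_FIELDS = 3;

    static String[] parseLine(String line, int lineNumber) {
        // A limit of -1 keeps trailing empty fields, so "a|b|" is 3 fields.
        String[] fields = line.split(Pattern.quote("|"), -1);
        if (fields.length != EXPECTED_FIELDS) {
            // Fail loudly instead of guessing what the sender meant.
            throw new IllegalArgumentException("line " + lineNumber
                    + ": expected " + EXPECTED_FIELDS
                    + " fields but found " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("Invoice1A|Cust1|Product_a", 1).length);
        try {
            parseLine("Invoice1A|Cust1|Product_a|paid", 2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note this does no quoting or escaping at all, so a pipe inside a value would also trip the count check, which for this purpose is fine.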
XML *is* plain text, at least as much as CSV is.
Hmm. Things like XHTML are certainly highly readable, and my little
invoice thing above is on a par with CSV, but have you tried reading a
WSDL file lately? The few bytes of useful information are obscured by a
mountain of namespaces, wrapper elements, and god knows what.
And CSV is complex, to the point where there is no one standard way of
doing it.
That's not complexity, it's lack of standardisation. They're orthogonal.
Text files are bone simple, but they aren't standardised - CR vs LF, word
wrapping, meaning of a trailing space on a line, line break at the end,
etc.
tom