Obsessive Compulsive XMLing

jan V

- Any format is fragile if it is not thoroughly described and validated.

Of course, but the problem with XML is that, because it's "plain text", the
majority of people seem to think it needn't be rigidly documented. And
that's an "Emperor with new clothes" situation...

To summarize, the bottom line reality of average XML use is that it has NOT
been a revolutionary step up from structured binary files, but it HAS been
yet another almighty technological mushroom for everyone to slam into, learn
(struggle) and exploit (struggle) ... in short, it keeps very many of us in
a job, thank you very much.

I don't know how old you are, Andrew, but I'm not that young anymore. People
like Roedy and I have seen "revolutions" a number of times, people *always*
go berserk claiming the magic bullet has been found, when in reality such
claimed magic bullets often turn out to be the latest layer of problems
mixed with a small dose of progress. The thing that comes with age is that
you become more critical of the younger/naive attitude of people whose
experience is too limited to see the big picture... the picture which - in
part - says "This is madness, but virtually nobody realises it."

Who knows, maybe in 10 years time XML will have become distinctly
"surpassed".. and you and others will be concentrating on newer, fresher
pastures. Then maybe you'll reflect on the wonder years of XML mania, and go
"Geee... people did go a bit mad with it." (cfr the 60s).
 
Andrew Thompson

Of course, but the problem with XML is that, because it's "plain text", the
majority of people seem to think it needn't be rigidly documented. And
that's an "Emperor with new clothes" situation...

OK.. yeah. I see your point.

In the specific situation I was quoting, I doubt the
author of this arcane format felt it required any further
documentation than what he originally distributed.

OTOH the results of the validation, in combination with
a slew of factors about uncertainties in the format that
prompted me to create a schema in the first place, support
your basic argument.
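
For anyone wanting to try the same exercise, here is a minimal sketch of validating a document against a schema with the standard javax.xml.validation API. The <point> format and the PointValidator class are made up purely for illustration; they are not the format I was validating.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class PointValidator {
    // Hypothetical schema: one <point> element with required integer x and y.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='point'><xs:complexType>"
      + "<xs:attribute name='x' type='xs:int' use='required'/>"
      + "<xs:attribute name='y' type='xs:int' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Returns true when the document conforms to XSD, false otherwise.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, validation errors surface as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

The point is only that the schema turns "what the author assumed" into something a machine can check; a real format would need a far richer schema than this.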
Who knows, maybe in 10 years time XML will have become distinctly
"surpassed".. and you and others will be concentrating on newer, fresher
pastures. Then maybe you'll reflect on the wonder years of XML mania, and go
"Geee... people did go a bit mad with it." (cfr the 60s).

(laughs) Nahhh! I'll be too busy, trying to convert
all my XML! ;-)

--
Andrew Thompson
physci.org 1point1c.org javasaver.com lensescapes.com athompson.info
"I don't wanna' be like other people are. Don't wanna' own a key, don't
wanna' wash my car.."
New Order 'Turn My Way'
 
Chris Uppal

No doubt that XML has its great and countless benefits. *But*, haven't
programmers developed an out-of-control addiction to such data format?

IMO, yes.

What is causing the dependency of such practice? Is it laziness or rush
for production (one day we or someone else will build a GUI editor for
this), Yet, this could be the most productive practice in software
development (editing and maintaining raw XML documents of various and
complex type definitions) but not properly appreciated by a very very
few of us.

I think it was jan V who first mentioned the band-wagon effect in this thread,
and I agree that that's a big part of it, but I think there are deeper reasons.
It seems (to me) that the programming community -- specifically the /Java/
programming community -- was fertile ground in which such a seed could grow
into an ecological disaster...

Some reasons why I think that Java programmers could be seen as
"pre-conditioned" to (over)use XML. In no very particular order...

1) Java, as a language, belongs in the bondage-and-discipline camp. Its design
emphasises (mistakenly, IMO) statically-verifiable safety. It attempts to
provide that by using techniques like a static, declarative, type-system, and
as much early binding as possible (the recent addition of generics is an
example of this, but the idea pervades the whole language design). The problem
is that such languages tend to produce brittle systems. In the real world
things change, and all that up-front investment in safety just gets in the way.
I think that programmers are reacting to that tension between how the language
works, and how their systems need to behave, by moving as much information as
possible /out/ of the source code, and into configuration files of one sort or
another. XML has many faults (can anyone tell me /exactly/ what the syntax for
XML comments is, for example ? The actual XML1.0 standard wimps out and just
gives an approximation !), but a lack of flexibility is not one of them.
Obviously there are other ways of getting flexibility back into applications
that don't necessarily involve XML (including an interpreter in the app,
storing config data -- aka "business rules" in databases, etc, etc), but I
think that XML (considered as a meme) is benefiting from that pressure.

2) Java, as a community, is relatively young. It itself originated in a
bandwagon effect, so there may be quite a strong correlation between being a
Java programmer, and being prone to jump on bandwagons ;-)

3) The Java platform (class hierarchy, JVM, and standard tools) doesn't include
a parser generator, or anything similar. Programmers tend to use the tools
that they are familiar with, and which are available out-of-the-box. Other
tools, such as third-party parser generators, get used far less often and far
less widely. Since the standard toolset doesn't include a good simple way of
creating simple languages for configuration files, scripts, and so forth, Java
programmers are left with the choice of (mis)using something like XML, or of
writing their own special-purpose parsers. Understandably -- and perhaps even
justifiably -- XML is seen as the lesser of these two evils.

4) My impression is that Java programmers -- speaking statistically -- seem to
be younger, or for some other reason more ignorant, than the programming
community in general. Obviously there are many, many, exceptions to this
(cough), but there do seem to be a lot of Java programmers who don't know
anything /except/ Java (and not all of that). For such programmers, there may
not be an obvious alternative to XML, so that's what gets used whether or not a
programmer with a wider perspective would agree that XML was a good fit for the
problem at hand. In a way, this is a generalisation of point (3).
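
On the comment-syntax aside in point (1): as far as I can read the Comment production in the XML 1.0 grammar, the body of a comment may not contain "--" and may not end in "-". A rough sketch of that check follows. The XmlCommentCheck class is my own illustration, and it deliberately ignores the separate question of which characters are legal XML characters at all.

```java
final class XmlCommentCheck {
    // True when 'body' could sit between "<!--" and "-->" under the
    // no-double-hyphen rule. Legal-character checks are NOT done here.
    static boolean isWellFormedBody(String body) {
        return !body.contains("--") && !body.endsWith("-");
    }

    // Wraps a body in comment delimiters, refusing bodies that break the rule.
    static String toComment(String body) {
        if (!isWellFormedBody(body)) {
            throw new IllegalArgumentException("comment body contains '--' or ends in '-'");
        }
        return "<!--" + body + "-->";
    }
}
```

So a separator line made of hyphens, which people habitually type inside comments, is exactly what the rule forbids.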

By way of illustration of (or maybe even evidence for) these points:

If point (1) is valid, then you'd expect to see less XML-addiction in users of
less brittle languages such as Smalltalk or Ruby. This is certainly true of
Smalltalk programmers, and my impression is that it's true of Ruby programmers
too. I don't know enough about the Python community to know whether it applies
to them as well, but it would make an interesting test-case.

Proof of (2) is left as an exercise for the reader...

As an illustration of (3) and/or (4), look at the way that the relatively new
regexp stuff has been taken up. It seems sometimes that half the posts here
are about regexps in some way or other (ok, that's an exaggeration...).
Incidentally, I think the parallel here is quite good. As far as I can tell,
regexps are being used in all sorts of situations where they are completely
inappropriate, not just for the (rather rare, IMO) applications to which they
are well suited.

-- chris
 
Chris Uppal

Andrew said:
(laughs) Nahhh! I'll be too busy, trying to convert
all my XML! ;-)

Easy. Just use XSLT.

Of course, that will mean you have to write your XSLT script by hand, since
there won't be any XML editors available...

Phah!

-- chris
 
Andrew Thompson

Easy. Just use XSLT.

Of course, that will mean you have to write your XSLT script by hand, since
there won't be any XML editors available...

LOL. I am, at this very moment, writing my first XSLT,
and.. they have *editors* for these things!?!

Wow! The technology! ;-)
 
Mark Thornton

Chris said:
3) The Java platform (class hierarchy, JVM, and standard tools) doesn't include
a parser generator, or anything similar. Programmers tend to use the tools
that they are familiar with, and which are available out-of-the-box. Other
tools, such as third-party parser generators, get used far less often and far
less widely.

Support for character sets beyond basic ascii or 8859 has been slow in
coming to many parser generators. Many programmers also find these tools
rather challenging to use. Designing a good domain specific language is
not a task you can give to a junior programmer.

Mark Thornton
 
jan V

Support for character sets beyond basic ascii or 8859 has been slow in
coming to many parser generators. Many programmers also find these tools
rather challenging to use. Designing a good domain specific language is
not a task you can give to a junior programmer.

Or even to very experienced programmers who may not have had
university-style CS education. I've been programming for 20+ years now, but
when you throw lex/yacc-type stuff at me, it's like pushing me out on a
frozen lake with thin ice... I know the concepts quite well, but actually
exploiting these tools, that's a whole different ball game.

[But what one lacks in formal education, one can sometimes make up for with
creativity. I still managed to write a Java source-code parser (1.0
language, mind you) without using the conventional approach]
 
Chris Uppal

Mark said:
[me:]
3) The Java platform (class hierarchy, JVM, and standard tools) doesn't
include a parser generator, or anything similar. Programmers tend to
use the tools that they are familiar with, and which are available
out-of-the-box. Other tools, such as third-party parser generators,
get used far less often and far less widely.

Support for character sets beyond basic ascii or 8859 has been slow in
coming to many parser generators. Many programmers also find these tools
rather challenging to use.

All true. I'm not sure if you intend these comments as just observations, or
if you are disagreeing with my theory that the absence of such tools has made
the overuse of XML more likely ?

BTW, I don't think that such tools should necessarily be difficult to use.
Programmers in general seem to be OK with regexps, so I wouldn't expect many
people to have problems using regular expressions for defining a scanner
(whether or not there is an explicit scanner in the parser generator). I think
the problems come when the tool (or its documentation, or -- especially -- its
tutorial documentation) starts throwing terms like LALR(N) around. To my mind,
parsing technologies that require an understanding of such terms are unsuitable
for general use -- they artificially restrict the range of grammars they can
recognise all in the name of marginally faster parsing. That
efficiency-centric approach might be fine if you were considering parsing many
millions of lines of input, but for the typical small language, it is complete
overkill -- premature optimisation, in fact -- and as such merely adds
confusion to the system. OTOH, I must admit that I don't know of any little
language machine that /is/ easy to use...
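
To make the scanner point concrete, here is a minimal sketch of a regexp-driven tokenizer for a toy config language, using only java.util.regex. The TinyScanner class, the token kinds, and the toy grammar are all assumptions of mine for illustration, and the named groups need a newer JDK than many people are running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TinyScanner {
    // One alternative per token kind; whitespace is matched but discarded.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<WS>\\s+)|(?<NUM>\\d+)|(?<ID>[A-Za-z_][A-Za-z0-9_]*)|(?<OP>[=+\\-*/;])");
    private static final String[] KINDS = {"NUM", "ID", "OP"};

    // Splits the input into "KIND:text" tokens, failing on unknown characters.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(kind + ":" + m.group(kind));
                }
            }
            pos = m.end();
        }
        return tokens;
    }
}
```

Nothing here needs any parsing theory; the hard part only starts once the tokens have to be assembled into a grammar.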

(Just an observation)

Designing a good domain specific language is
not a task you can give to a junior programmer.

Fortunately, for many purposes, you don't /need/ a good domain-specific
language -- as is proved by the fact that people /can/ get away with (mis)using
XML ;-)

-- chris
 
Mark Thornton

Chris said:
Mark Thornton wrote:

[me:]
3) The Java platform (class hierarchy, JVM, and standard tools) doesn't
include a parser generator, or anything similar. Programmers tend to
use the tools that they are familiar with, and which are available
out-of-the-box. Other tools, such as third-party parser generators,
get used far less often and far less widely.

Support for character sets beyond basic ascii or 8859 has been slow in
coming to many parser generators. Many programmers also find these tools
rather challenging to use.


All true. I'm not sure if you intend these comments as just observations, or
if you are disagreeing with my theory that the absence of such tools has made
the overuse of XML more likely ?

Observations of course. :)

recognise all in the name of marginally faster parsing. That
efficiency-centric approach might be fine if you were considering parsing many
millions of lines of input, but for the typical small language, it is complete
overkill -- premature optimisation, in fact -- and as such merely adds
confusion to the system. OTOH, I must admit that I don't know of any little
language machine that /is/ easy to use...

It isn't just about efficiency but also about consistency. Languages
designed without a good theoretical grounding tend to have nasty
inconsistencies. Something that is LALR(k) or LR(k) for too high a value
of k can be hard for humans to use too.

Fortunately, for many purposes, you don't /need/ a good domain-specific
language -- as is proved by the fact that people /can/ get away with (mis)using
XML ;-)

In which case misusing XML is perhaps the best achievable solution in
practice. There just aren't enough Goslings to go around (nor could
most commercial enterprises afford to employ teams full of programmers
with his level of skill).

Mark Thornton
 
jan V

There just aren't enough Goslings to go around (nor could
most commercial enterprises afford to employ teams full of programmers
with his level of skill).

I find this observation really important. I recently worked for a company
where the boss boasted at the interview stage that they only hire top
people.. and sure enough lots of my new colleagues were clever cookies. The
problem with this is that there is such a thing as 'too clever', especially
for the long-term good of a company. Let me explain.

These guys were so clever (partly because they were mostly in their prime -
early 20s) that they could juggle obscene amounts of complexity (read: messy
source code and messy architecture), without bothering to comment much, if
any, of it.

Yet that company is growing fast. And sooner rather than later, the market
place is not going to cough up more of these semi-genius "kids", and the
company will have to do with normal (in the mathematical sense) programmers.
Programmers who need docs, and structure, and rather less complexity. Also,
programmers who are not going to be able to cope with maintaining and
extending the codebase already created by the company's existing team. So
that company is heading for some tough lessons on the scalability of its
business.

My thesis is therefore: any boss who aims to constantly hire top people is
badly mistaken. What any boss should do is to hire average to good people,
and ensure he's got a cast-iron team of managers capable of ***enforcing***
process and quality. That, in my eyes, is the way to run a successful,
scalable software company. (clarification: this is for the scenario where
your company does fairly bread-and-butter projects, obviously if you're
trying to create world-class innovations, then tracking down the geniuses
may still make perfect sense).
 
Tim Tyler

jan V said:
No, that's the whole point, programmers - people in general -, shouldn't
have to wade through XML structures, not using ASCII editors, not using XML
editors, full stop.

XML is not meant to be routine reading material for people, it's meant to
communicate structured data in a portable way from machine to machine (or
application to application in general, but NOT from machine to human).

The OP was complaining about a lack of GUI tools to edit and maintain
data stored as XML.

I'm still having a hard time seeing the problem he is referring to.
 
Tim Tyler

Speaking of Ant (which is a great build tool to use as long as you don't
have to write the script), here are a few interesting words from
James Duncan Davidson: http://x180.net/Journal/2004/03/31.html

For those who don't know James, he is the original author of both Tomcat
& Ant. See http://ant.apache.org/faq.html#history and
http://en.wikipedia.org/wiki/James_Duncan_Davidson for more info.

The XML effect on Ant is a classic case of XML obsession. This is now a
scripting language, not data. Even its original programmer regrets the
day he decided on using XML as a format; he chose XML because of the
hype plus the facts that he didn't need to write a parser, nor did he
have to write a front end. [...]

A scripting language is a good example of when *not* to use XML.

"Now, I never intended for the file format to become a scripting language"

- http://x180.net/Journal/2004/03/31.html

With a build script - which attempted to replace batch files with
something more portable - that may have been naive.
 
Tor Iver Wilhelmsen

Tim Tyler said:
I'm still having a hard time seeing the problem he is referring to.

Indeed. Writing an Ant script from scratch is not really hard. What
could be somewhat harder is writing XML with stricter DTDs like
web.xml's. That said:

1) Users of a language that uses { for begin, } for end, & for and, |
for or etc. should not complain about the occasional <tag> in data.
If you are going to complain about XML you should be writing in
PL/SQL, Pascal, Lisp or something.
2) XML beats the alternatives.
2.1) Binary data is more compact, but more susceptible to corruption.
It's also less manageable when it comes to versioning: XML can
let you mix-in elements from different "versions" of a protocol
using namespaces.
2.2) It's data instead of code. One of the side-effects of dropping
XML for annotations in EJB 3.0 will be that the code needs to be
recompiled when you make a small change
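
A rough sketch of the namespace point in 2.1, using the standard DOM API: elements from two "versions" of a protocol live side by side in one document and can be picked apart by namespace URI. The urn:example URIs, the element names, and the VersionedMessage class are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class VersionedMessage {
    // Hypothetical namespace URIs standing in for two protocol versions.
    static final String V1 = "urn:example:proto:v1";
    static final String V2 = "urn:example:proto:v2";

    // Returns {number of v1 elements, number of v2 elements} in the document.
    static int[] countByVersion(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default, so must be enabled
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return new int[] {
            doc.getElementsByTagNameNS(V1, "*").getLength(),
            doc.getElementsByTagNameNS(V2, "*").getLength()
        };
    }
}
```

A v1-only reader can simply skip anything in the v2 namespace, which is the kind of mixing that is awkward to retrofit onto a fixed binary layout.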

Now, designing Swing GUI apps without a visual designer, that is ebil.
Discuss.
 
Dale King

jan said:
I find this observation really important. I recently worked for a company
where the boss boasted at the interview stage that they only hire top
people.. and sure enough lots of my new colleagues were clever cookies. The
problem with this is that there is such a thing as 'too clever', especially
for the long-term good of a company. Let me explain.

These guys were so clever (partly because they were mostly in their prime -
early 20s) that they could juggle obscene amounts of complexity (read: messy
source code and messy architecture), without bothering to comment much, if
any, of it.

Yet that company is growing fast. And sooner rather than later, the market
place is not going to cough up more of these semi-genius "kids", and the
company will have to do with normal (in the mathematical sense) programmers.
Programmers who need docs, and structure, and rather less complexity. Also,
programmers who are not going to be able to cope with maintaining and
extending the codebase already created by the company's existing team. So
that company is heading for some tough lessons on the scalability of its
business.

Then your boss lied in the interview. He was not hiring top people.

My thesis is therefore: any boss who aims to constantly hire top people is
badly mistaken. What any boss should do is to hire average to good people,
and ensure he's got a cast-iron team of managers capable of ***enforcing***
process and quality. That, in my eyes, is the way to run a successful,
scalable software company. (clarification: this is for the scenario where
your company does fairly bread-and-butter projects, obviously if you're
trying to create world-class innovations, then tracking down the geniuses
may still make perfect sense).

I don't think the answer is hiring less skilled programmers but using
real measures for what is a good programmer. Skill and good design,
clear coding and good documentation are not mutually exclusive. The
answer is not hiring average programmers. As Brooks discovered, a good
programmer is 10 times more productive than an average one (Brooks
1995). The answer is hiring good programmers who can follow the advice
of Steve Maguire in "Writing Solid Code": write code for the
"average" programmer.

And oh, I disagree with most of the opinions expressed against XML.
 
Chris Uppal

Dale said:
I don't think the answer is hiring less skilled programmers but using
real measures for what is a good programmer. Skill and good design,
clear coding and good documentation are not mutually exclusive.

Well said.

-- chris
 
