XML in XMPP

Ivan Shmakov

I've found a short discussion of XMPP as an XML application at
[1], which makes some points I cannot agree with. But then, I'm
not really that confident in my knowledge of XMPP particulars,
so I'd appreciate it if someone could comment on my arguments
below.

[1] http://search.cpan.org/~elmex/AnyEvent-XMPP-0.52/lib/AnyEvent/XMPP/Writer.pm
The whole "XML" concept of XMPP is fundamentally broken anyway. It's
supposed to be a subset of XML. But a subset of XML productions is
not XML.

It's true, but such a subset could satisfy the definition of an
XML application (AIUI), which XMPP is intended to be.
Strictly speaking you need a special XMPP "XML" parser and writer to
be 100% conformant.

OTOH, the requirement of a custom XMPP parser certainly doesn't
fit the notion of an XML application.
On top of that XMPP requires you to parse these partial "XML"
documents. But a partial XML document is not well-formed, heck, it's
not even an XML document! And a parser should bail out with an error.
But XMPP doesn't care, it just relies on implementation-dependent
behaviour of chunked parsing modes for SAX parsing. This
functionality isn't even specified by the XML recommendation in any
way. The recommendation even says that it's undefined what happens
if you process not-well-formed XML documents.

And as long as it's undefined (and not denied outright), the
particular interpretation of XML "fragments" used by XMPP seems
more like a natural extension, than a failure to comply with the
standard.
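
To make the "fragments" point concrete, here is a minimal Python
sketch (not taken from AnyEvent::XMPP) of how an incremental parser
can hand over each stanza as soon as its end tag arrives, long before
the document is complete. The function name and the sample stream are
made up for illustration; it assumes the standard library's
xml.etree.ElementTree.XMLPullParser, which is built on expat.

```python
import xml.etree.ElementTree as ET


def stanzas(chunks):
    """Yield each child of the stream root as soon as it is complete."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)  # the document is never "finished" here
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of <stream:stream> just closed
                    yield elem


# Illustrative data only: the stream root is opened but never closed.
chunks = [
    "<stream:stream xmlns='jabber:client' "
    "xmlns:stream='http://etherx.jabber.org/streams'>",
    "<message to='juliet@example.com'><body>Hi",
    " there</body></message>",
    "<presence/>",
]

tags = [elem.tag for elem in stanzas(chunks)]
```

Whether this counts as an extension of XML or a violation of it is
exactly the question above; the sketch only shows that the behaviour
is easy to get from an ordinary event-based parser.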
But I try to be as XMPP "XML" conformant as possible (it should be
around 99-100%). But it's hard to say what XML is conformant, as the
specifications of XMPP "XML" and XML are contradicting. For example
XMPP also says you only have to generate and accept UTF-8 encodings
of XML, but the XML recommendation says that each parser has to
accept UTF-8 and UTF-16.

Once again, this is a specialization, and it's my understanding
that an XML application may choose to explicitly define an
acceptable subset of XML.

Though, of course, this allows for XMPP parsers that aren't XML
parsers at the same time.
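
A small Python sketch of that difference, assuming the standard
library's expat-based ElementTree as the "XML parser" and a purely
hypothetical UTF-8-only receiver standing in for an XMPP-style
parser (both function names are invented for this example):

```python
import xml.etree.ElementTree as ET

doc = "<greeting>caf\u00e9</greeting>"


def accepts_as_xml(data: bytes) -> bool:
    """A general XML parser detects the encoding by itself."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def accepts_as_utf8_only(data: bytes) -> bool:
    """Hypothetical receiver that refuses anything that is not UTF-8."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return accepts_as_xml(text.encode("utf-8"))


utf8_bytes = doc.encode("utf-8")
utf16_bytes = doc.encode("utf-16")  # includes a byte order mark

results = {
    "xml/utf-8": accepts_as_xml(utf8_bytes),
    "xml/utf-16": accepts_as_xml(utf16_bytes),
    "utf8-only/utf-8": accepts_as_utf8_only(utf8_bytes),
    "utf8-only/utf-16": accepts_as_utf8_only(utf16_bytes),
}
```

The same document is accepted in both encodings by the general
parser, but only in UTF-8 by the restricted one.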
So, what do you do? Do you use an XML conformant parser or do you
write your own?
I'm using XML::Parser::Expat because expat knows how to parse broken
(aka 'partial') "XML" documents, as XMPP requires. Another argument
is that if you capture an XMPP conversation to the end, and even if a
'</stream:stream>' tag was captured, you won't have a valid XML
document. The problem is that you have to resend a <stream> tag
after TLS and SASL authentication each! Awww... I'm repeating
myself.

This one indeed may be a problem, but probably not as much in
practice as in theory.
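
For what it's worth, the theoretical problem is easy to reproduce.
The Python sketch below uses a heavily simplified, made-up capture
of one direction of a session: the stream header is sent a second
time after negotiation without the first stream being closed, so
only one closing tag ever appears. The helper name is invented.

```python
import xml.etree.ElementTree as ET

HEADER = (
    "<stream:stream xmlns='jabber:client' "
    "xmlns:stream='http://etherx.jabber.org/streams'>"
)

# Simplified, illustrative capture; real sessions carry much more.
captured = (
    HEADER
    + "<stream:features/>"  # negotiation on the first stream
    + HEADER                # restart: header again, old stream not closed
    + "<presence/>"
    + "</stream:stream>"    # the only closing tag in the capture
)

single_stream = HEADER + "<presence/>" + "</stream:stream>"


def is_well_formed(text: str) -> bool:
    """True if a conforming parser accepts the text as one document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


capture_ok = is_well_formed(captured)
single_ok = is_well_formed(single_stream)
```

A stream without a restart parses as one document; the capture with
a restart does not, because one of the two roots is never closed. In
practice an implementation can simply start a fresh parser at each
restart.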
But well... AnyEvent::XMPP does its best with expat to cope with
the fundamental brokenness of "XML" in XMPP.
Back to the issue with "XML" generation: I've discovered that many
XMPP servers (e.g. jabberd14 and ejabberd) have problems with XML
namespaces. That's the reason why I'm assigning the namespace
prefixes manually: The servers just don't accept validly namespaced
XML. The draft 3921bis does even state that a client SHOULD generate
a 'stream' prefix for the <stream> tag.

Indeed, and such a problem seems to be quite common.
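
As an illustration of what "validly namespaced" means here: to a
namespace-aware parser the prefix is irrelevant, so the three
spellings in this Python sketch are the same element. The variants
are made up for the example; only the first uses the 'stream' prefix
that the servers mentioned above apparently insist on.

```python
import xml.etree.ElementTree as ET

NS = "http://etherx.jabber.org/streams"

# Three namespace-equivalent spellings of the same (empty) element.
variants = [
    f"<stream:stream xmlns:stream='{NS}'/>",  # conventional prefix
    f"<s:stream xmlns:s='{NS}'/>",            # arbitrary prefix
    f"<stream xmlns='{NS}'/>",                # default namespace
]

# ElementTree reports names in expanded form: {namespace-uri}local-name
expanded_names = {ET.fromstring(v).tag for v in variants}
```

A receiver that compares the raw string "stream:stream" instead of
the expanded name would accept only the first variant.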

Note that the XHTML 1.1 + MathML 2.0 + SVG 1.1 profile [2]
(as implemented by, e. g., the W3C validator [3]) explicitly
requires that the embedded MathML and SVG documents use the m:
and svg: namespace prefixes, respectively.

My understanding is that it simplifies the task of DTD-based
validation, but DTD doesn't seem such a major part of XML as it
was of SGML, and I doubt whether it's really necessary to
continue to enforce such restrictions.

[2] http://w3.org/TR/XHTMLplusMathMLplusSVG/
[3] http://validator.w3.org/
 
Joe Kesselman

It's true, but such a subset could satisfy the definition of an
XML application (AIUI), which XMPP is intended to be.

Not at all familiar with XMPP, but it sounds like it bears the same sort
of relationship to XML that XML did to SGML -- subset, _possibly_
"backward compatible syntax" in that you could run it through tools
intended for the other syntax if you didn't have something XMPP-specific
available, but Not XML and not really interoperable with XML at anything
beyond that most basic syntactic-subset level.

If an application doesn't use all of XML, that's fine. BUT:
OTOH, the requirement of a custom XMPP parser certainly doesn't
fit the notion of an XML application.

Yep. If it can't _tolerate_ all of XML, it isn't XML.
And as long as it's undefined (and not denied outright), the
particular interpretation of XML "fragments" used by XMPP seems
more like a natural extension, than a failure to comply with the
standard.

XML has a clear definition of well-formed document fragment. If XMPP is
complying with that, it may be fine. If not, no.
Once again, this is a specialization, and it's my understanding
that an XML application may choose to explicitly define an
acceptable subset of XML.

Marginally. There are indeed ASCII-only XML-subset parsers. But they
don't claim to satisfy the XML Recommendation.

If you *can't* use an XML parser, it isn't XML. If you *choose* not to
use an XML parser, that's a different matter.

If the document isn't a well-formed XML document or XML document
fragment, it isn't XML. Period.


What's the advantage of all this breakage supposed to be? Why didn't
they just use XML properly?
My understanding is that it simplifies the task of DTD-based
validation, but DTD doesn't seem such a major part of XML as it
was of SGML, and I doubt whether it's really necessary to
continue to enforce such restrictions.

DTDs should be abandoned. They are simply not compatible with XML
Namespaces, and Namespaces should now be considered an essential part of
serious XML processing.

(Believe me, we *tried* to find a model which could reasonably handle
both. There really isn't a reasonable way to retrofit namespaces into
DTDs. DTDs are too bound to raw syntax to work with something that has
semantic behaviors.)


I don't have time to investigate XMPP, but it sounds like its creator
was either lazy and took unreasonable shortcuts, or diverged
simply to suit their own biases and had no interest in working with the
rest of the XML universe. Unless you like those answers (I don't), I
suggest looking for something else which isn't gratuitously incompatible.

--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Ivan Shmakov

Joe Kesselman said:
[...]
OTOH, the requirement of a custom XMPP parser certainly doesn't fit
the notion of an XML application.
Yep. If it can't _tolerate_ all of XML, it isn't XML..

My guess is that, leaving aside the interpretation of XML
"fragments", the whole recorded XMPP session /should/ comprise a
well-formed XML document.

Yet, once again, an XMPP parser is /not/ required to implement
the whole XML (though it may choose to do so.)
XML has a clear definition of well-formed document fragment.

Huh? Where is it?
If XMPP is complying with that, it may be fine. If not, no.

Unfortunately, I don't know for sure.
Marginally. There are indeed ASCII-only XML-subset parsers. But
they don't claim to satisfy the XML Recommendation.

AIUI, XMPP parsers don't claim to have full XML support. Or, at
least, they're not required to.

[...]
What's the advantage of all this breakage supposed to be? Why didn't
they just use XML properly?

The purpose of XMPP is to pass around "messages" (either human-
or machine-readable) in real-time.

Apparently, the idea was that the complete recorded XMPP session
/should/ comprise an XML document. But as the XMPP
implementation is required to take action before the session is
over, it has to interpret the bits of XML it receives as soon as
it has a complete bit (or, in XMPP parlance, a "stanza.")
DTDs should be abandoned. They are simply not compatible with XML
Namespaces, and Namespaces should now be considered an essential part
of serious XML processing.
Yes.

(Believe me, we *tried* to find a model which could reasonably handle
both. There really isn't a reasonable way to retrofit namespaces
into DTDs. DTDs are too bound to raw syntax to work with something
that has semantic behaviors.)

Do I understand it correctly that http://validator.w3.org/ is
based on DTD?

BTW, is there a W3C recommendation that explicitly allows for
inclusion of MathML and SVG within an XHTML document (and is
/not/ based on DTD)?
I don't have time to investigate XMPP, but it sounds like its creator
was either lazy and took unreasonable shortcuts, or diverged
simply to suit their own biases and had no interest in working with
the rest of the XML universe. Unless you like those answers (I
don't), I suggest looking for something else which isn't gratuitously
incompatible.

For instance? I'm interested in a "reasonably well supported"
protocol for passing messages in "real-time" (where messages may
contain some XML.) XMPP is so far the only one I've found.
{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."

... And what about XHTML mail?
 
Alain Ketterlin

Ivan Shmakov said:
[...]
XML has a clear definition of well-formed document fragment.

Huh? Where is it?

In the XML recommendation: it's called an "external parsed entity".
Apparently, the idea was that the complete recorded XMPP session
/should/ comprise an XML document. But as the XMPP
implementation is required to take action before the session is
over, it has to interpret the bits of XML it receives as soon as
it has a complete bit (or, in XMPP parlance, a "stanza.")

SAX should handle the task. The problem is that by the time an error is
detected, some parts of the "document" have already been processed. The
protocol should specify what to do in these cases.
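
A minimal Python sketch of that situation, assuming the standard
library's XMLPullParser as the event-based parser (the function name
and the sample input are invented): the first stanza has already been
handed to the application when the well-formedness error in a later
chunk shows up.

```python
import xml.etree.ElementTree as ET


def process(chunks):
    """Return (tags handled before any error, error message or None)."""
    parser = ET.XMLPullParser(events=("end",))
    handled = []
    try:
        for chunk in chunks:
            parser.feed(chunk)
            for _event, elem in parser.read_events():
                handled.append(elem.tag)  # the application has acted on this
    except ET.ParseError as err:
        return handled, str(err)
    return handled, None


# Illustrative input: the last chunk closes the wrong element.
chunks = ["<stream>", "<message/>", "<iq></message>"]

handled, error = process(chunks)
```

Here <message/> is delivered and acted upon before the mismatched
tag is seen, so whatever the protocol mandates on error cannot undo
that processing.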
BTW, is there a W3C recommendation that explicitly allows for
inclusion of MathML and SVG within an XHTML document (and is
/not/ based on DTD)?

A working draft (http://www.w3.org/TR/XHTMLplusMathMLplusSVG/)
For instance? I'm interested in a "reasonably well supported"
protocol for passing messages in "real-time" (where messages may
contain some XML.) XMPP is so far the only one I've found.

I guess SOAP is an example.

-- Alain.
 
Joe Kesselman

This profile /is/ /based/ on a DTD. The OP explicitly asked about a
recommendation /not/ /based/ on a DTD.

XHTML modularization covers the concept -- which basically consists of
"that's exactly what namespaces are for". See
http://www.w3.org/TR/xhtml-modularization/ and related.




--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Ivan Shmakov

XHTML modularization covers the concept -- which basically consists
of "that's exactly what namespaces are for".

The problem is that the only way http://validator.w3.org/ allows
for XHTML to contain SVG and MathML is via the working draft
cited above, which uses DTD as part of its definition, and thus,
as it was already pointed out, is "poorly compatible" with XML
namespaces.

My guess is that for W3C Validator to be updated to allow for a
fuller understanding of XHTML's "XML nature" there has to be a
W3C recommendation, or a working draft, that explicitly allows
for any XML namespace prefixes in XHTML. AIUI, such a
specification has to be based on something other than DTD.

Thus was my question.

Regarding the XHTML modularization, it was my understanding that
its whole idea was to allow for easier creation of XHTML
profiles. Which seems like an independent issue.

What exactly are the "related" documents?
 
Ivan Shmakov

In the XML recommendation: it's called an "external parsed entity".

XMPP stanzas are hardly "external" to XMPP sessions.
SAX should handle the task.

Yes. Indeed, AnyEvent::XMPP::Parser uses XML::Parser::Expat,
which is event-based.
The problem is that by the time an error is detected, some parts of
the "document" have already been processed. The protocol should
specify what to do in these cases.

AIUI, it does.

[...]
I guess SOAP is an example.

ACK, thanks.

Though it looks like I'll have to stick to XMPP, since what I'm
looking for is a way to extend XMPP clients anyway.
 
