XML in XMPP

Discussion in 'XML' started by Ivan Shmakov, Jul 6, 2012.

  1. Ivan Shmakov

    I've found a short discussion of XMPP as an XML application at
    [1], which contains some points I cannot agree with. But then,
    I'm not really that confident in my knowledge of XMPP particulars,
    so I'd appreciate it if someone could comment on my arguments
    below.

    [1] http://search.cpan.org/~elmex/AnyEvent-XMPP-0.52/lib/AnyEvent/XMPP/Writer.pm

    > The whole "XML" concept of XMPP is fundamentally broken anyway. It's
    > supposed to be a subset of XML. But a subset of XML productions is
    > not XML.


    It's true, but such a subset could satisfy the definition of an
    XML application (AIUI), which XMPP is intended to be.

    > Strictly speaking you need a special XMPP "XML" parser and writer to
    > be 100% conformant.


    OTOH, the requirement of a custom XMPP parser certainly doesn't
    fit the notion of an XML application.

    > On top of that XMPP requires you to parse these partial "XML"
    > documents. But a partial XML document is not well-formed, heck, it's
    > not even an XML document! And a parser should bail out with an error.
    > But XMPP doesn't care, it just relies on implementation dependent
    > behaviour of chunked parsing modes for SAX parsing. This
    > functionality isn't even specified by the XML recommendation in any
    > way. The recommendation even says that it's undefined what happens
    > if you process not-well-formed XML documents.


    And as long as it's undefined (and not denied outright), the
    particular interpretation of XML "fragments" used by XMPP seems
    more like a natural extension, than a failure to comply with the
    standard.

    > But I try to be as XMPP "XML" conformant as possible (it should be
    > around 99-100%). But it's hard to say what XML is conformant, as the
    > specifications of XMPP "XML" and XML are contradicting. For example
    > XMPP also says you only have to generate and accept UTF-8 encodings
    > of XML, but the XML recommendation says that each parser has to
    > accept UTF-8 and UTF-16.


    Once again, this is a specialization, and it's my understanding
    that an XML application may choose to explicitly define an
    acceptable subset of XML.

    Though, of course, this allows for XMPP parsers that aren't XML
    parsers at the same time.
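
    To illustrate the encoding point, here is a minimal Python sketch
    (the sample element and its text are made up for illustration).
    A general-purpose parser such as expat, used here through
    xml.etree.ElementTree, reads the same document in UTF-8 and in
    UTF-16, whereas a parser written only for XMPP would be free to
    reject the second form.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element name and text are illustrative.
text = "<message>Grüße</message>"

# str.encode("utf-16") prepends a byte order mark, which the parser
# uses to detect the encoding.
utf8_root = ET.fromstring(text.encode("utf-8"))
utf16_root = ET.fromstring(text.encode("utf-16"))

# Both byte sequences decode to the same element content.
print(utf8_root.text == utf16_root.text)
```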

    > So, what do you do? Do you use a XML conformant parser or do you
    > write your own?


    > I'm using XML::Parser::Expat because expat knows how to parse broken
    > (aka 'partial') "XML" documents, as XMPP requires. Another argument
    > is that if you capture an XMPP conversation to the end, and even if a
    > '</stream:stream>' tag was captured, you won't have a valid XML
    > document. The problem is that you have to resend a <stream> tag
    > after TLS and SASL authentication each! Awww... I'm repeating
    > myself.


    This one indeed may be a problem, but probably not as much in
    practice as in theory.

    > But well... AnyEvent::XMPP does its best with expat to cope with
    > the fundamental brokenness of "XML" in XMPP.


    > Back to the issue with "XML" generation: I've discovered that many
    > XMPP servers (e.g. jabberd14 and ejabberd) have problems with XML
    > namespaces. That's the reason why I'm assigning the namespace
    > prefixes manually: The servers just don't accept validly namespaced
    > XML. The draft 3921bis does even state that a client SHOULD generate
    > a 'stream' prefix for the <stream> tag.


    Indeed, and such a problem seems to be quite common.
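
    For reference, a small Python sketch of why the prefix should not
    matter to a namespace-aware consumer. The two stream headers below
    are simplified examples written for this illustration; only the
    namespace URIs are the ones XMPP actually uses. ElementTree reports
    names as "{uri}local", so both spellings yield the same name.

```python
import xml.etree.ElementTree as ET

STREAMS_NS = "http://etherx.jabber.org/streams"

# The same element, with two different prefixes bound to the same URI.
with_stream_prefix = (
    f'<stream:stream xmlns:stream="{STREAMS_NS}" '
    'xmlns="jabber:client" version="1.0"/>'
)
with_other_prefix = (
    f'<s:stream xmlns:s="{STREAMS_NS}" '
    'xmlns="jabber:client" version="1.0"/>'
)

a = ET.fromstring(with_stream_prefix)
b = ET.fromstring(with_other_prefix)

# The prefix is gone after parsing; only the URI and local name remain.
print(a.tag)
print(a.tag == b.tag)
```

    A server that insists on the literal "stream:" spelling is
    therefore comparing raw syntax rather than namespaced names.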

    Note that the XHTML 1.1 + MathML 2.0 + SVG 1.1 profile [2]
    (as implemented by, e.g., the W3C validator [3]) explicitly
    requires that the embedded MathML and SVG documents use the m:
    and svg: namespace prefixes, respectively.

    My understanding is that it simplifies the task of DTD-based
    validation, but DTDs don't seem to be as major a part of XML as
    they were of SGML, and I doubt whether it's really necessary to
    continue to enforce such restrictions.

    [2] http://w3.org/TR/XHTMLplusMathMLplusSVG/
    [3] http://validator.w3.org/

    --
    FSF associate member #7257
    Ivan Shmakov, Jul 6, 2012
    #1

  2. On 7/6/2012 5:54 AM, Ivan Shmakov wrote:
    > > The whole "XML" concept of XMPP is fundamentally broken anyway. It's
    > > supposed to be a subset of XML. But a subset of XML productions is
    > > not XML.

    >
    > It's true, but such a subset could satisfy the definition of an
    > XML application (AIUI), which XMPP is intended to be.


    Not at all familiar with XMPP, but it sounds like it bears the same sort
    of relationship to XML that XML did to SGML -- subset, _possibly_
    "backward compatible syntax" in that you could run it through tools
    intended for the other syntax if you didn't have something XMPP-specific
    available, but Not XML and not really interoperable with XML at anything
    beyond that most basic syntactic-subset level.

    If an application doesn't use all of XML, that's fine. BUT:

    > OTOH, the requirement of a custom XMPP parser certainly doesn't
    > fit the notion of an XML application.


    Yep. If it can't _tolerate_ all of XML, it isn't XML.

    > And as long as it's undefined (and not denied outright), the
    > particular interpretation of XML "fragments" used by XMPP seems
    > more like a natural extension, than a failure to comply with the
    > standard.


    XML has a clear definition of well-formed document fragment. If XMPP is
    complying with that, it may be fine. If not, no.

    > Once again, this is a specialization, and it's my understanding
    > that an XML application may choose to explicitly define an
    > acceptable subset of XML.


    Marginally. There are indeed ASCII-only XML-subset parsers. But they
    don't claim to satisfy the XML Recommendation.

    > > So, what do you do? Do you use a XML conformant parser or do you
    > > write your own?


    If you *can't* use an XML parser, it isn't XML. If you *choose* not to
    use an XML parser, that's a different matter.

    If the document isn't a well-formed XML document or XML document
    fragment, it isn't XML. Period.


    What's the advantage of all this breakage supposed to be? Why didn't
    they just use XML properly?

    > My understanding is that it simplifies the task of DTD-based
    > validation, but DTD doesn't seem such a major part of XML as it
    > was of SGML, and I doubt whether it's really necessary to
    > continue to enforce such restrictions.


    DTDs should be abandoned. They are simply not compatible with XML
    Namespaces, and Namespaces should now be considered an essential part of
    serious XML processing.

    (Believe me, we *tried* to find a model which could reasonably handle
    both. There really isn't a reasonable way to retrofit namespaces into
    DTDs. DTDs are too bound to raw syntax to work with something that has
    semantic behaviors.)


    I don't have time to investigate XMPP, but it sounds like its creator
    was either lazy and took unreasonable shortcuts, or diverged
    simply to suit their own biases and had no interest in working with the
    rest of the XML universe. Unless you like those answers (I don't), I
    suggest looking for something else which isn't gratuitously incompatible.

    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Jul 8, 2012
    #2

  3. Ivan Shmakov

    >>>>> Joe Kesselman writes:
    >>>>> On 7/6/2012 5:54 AM, Ivan Shmakov wrote:


    [...]

    >> OTOH, the requirement of a custom XMPP parser certainly doesn't fit
    >> the notion of an XML application.


    > Yep. If it can't _tolerate_ all of XML, it isn't XML.


    My guess is that, leaving aside the interpretation of XML
    "fragments", the whole recorded XMPP session /should/ comprise a
    well-formed XML document.

    Yet, once again, an XMPP parser is /not/ required to implement
    the whole of XML (though it may choose to do so).

    >> And as long as it's undefined (and not denied outright), the
    >> particular interpretation of XML "fragments" used by XMPP seems more
    >> like a natural extension, than a failure to comply with the
    >> standard.


    > XML has a clear definition of well-formed document fragment.


    Huh? Where is it?

    > If XMPP is complying with that, it may be fine. If not, no.


    Unfortunately, I don't know for sure.

    >> Once again, this is a specialization, and it's my understanding that
    >> an XML application may choose to explicitly define an acceptable
    >> subset of XML.


    > Marginally. There are indeed ASCII-only XML-subset parsers. But
    > they don't claim to satisfy the XML Recommendation.


    AIUI, XMPP parsers don't claim to have full XML support. Or, at
    least, they're not required to.

    [...]

    > What's the advantage of all this breakage supposed to be? Why didn't
    > they just use XML properly?


    The purpose of XMPP is to pass around "messages" (either human-
    or machine-readable) in real-time.

    Apparently, the idea was that the complete recorded XMPP session
    /should/ comprise an XML document. But as the XMPP
    implementation is required to take action before the session is
    over, it has to interpret the bits of XML it receives as soon as
    it has a complete bit (or, in XMPP parlance, a "stanza.")
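
    A minimal Python sketch of that mode of operation, using the
    standard xml.parsers.expat module. The wire data and the helper
    name make_stanza_parser are made up for illustration, and the root
    element is written without the real stream namespace to keep the
    example short. The root is never closed, yet each direct child is
    reported as soon as its end tag arrives.

```python
import xml.parsers.expat


def make_stanza_parser(on_stanza):
    """Return an expat parser that calls on_stanza(name) for every
    completed direct child of the (still open) root element."""
    parser = xml.parsers.expat.ParserCreate()
    depth = 0

    def start(name, attrs):
        nonlocal depth
        depth += 1

    def end(name):
        nonlocal depth
        depth -= 1
        if depth == 1:  # a direct child of the root has just closed
            on_stanza(name)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    return parser


stanzas = []
parser = make_stanza_parser(stanzas.append)

# Hypothetical wire data, split at arbitrary byte positions.
chunks = [
    b"<stream><message><bo",
    b"dy>hi</body></mess",
    b"age><presence/>",
]
for chunk in chunks:
    parser.Parse(chunk, False)  # False: more data may follow
    print(stanzas)
```

    The final Parse(b"", True) call is never made here, which is the
    "chunked parsing mode" the quoted text complains about: the parser
    is simply left waiting for the rest of the document.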

    >> My understanding is that it simplifies the task of DTD-based
    >> validation, but DTD doesn't seem such a major part of XML as it was
    >> of SGML, and I doubt whether it's really necessary to continue to
    >> enforce such restrictions.


    > DTDs should be abandoned. They are simply not compatible with XML
    > Namespaces, and Namespaces should now be considered an essential part
    > of serious XML processing.


    Yes.

    > (Believe me, we *tried* to find a model which could reasonably handle
    > both. There really isn't a reasonable way to retrofit namespaces
    > into DTDs. DTDs are too bound to raw syntax to work with something
    > that has semantic behaviors.)


    Do I understand it correctly that http://validator.w3.org/ is
    based on DTD?

    BTW, is there a W3C recommendation that explicitly allows for
    inclusion of MathML and SVG within an XHTML document (and is
    /not/ based on DTD)?

    > I don't have time to investigate XMPP, but it sounds like its creator
    > was either lazy and took unreasonable shortcuts, or diverged
    > simply to suit their own biases and had no interest in working with
    > the rest of the XML universe. Unless you like those answers (I
    > don't), I suggest looking for something else which isn't gratuitously
    > incompatible.


    For instance? I'm interested in a "reasonably well supported"
    protocol for passing messages in "real-time" (where messages may
    contain some XML.) XMPP is so far the only one I've found.

    > {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    > /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."


    ... And what about XHTML mail?

    --
    FSF associate member #7257
    Ivan Shmakov, Jul 9, 2012
    #3
  4. Ivan Shmakov writes:

    >>>>>> Joe Kesselman writes:


    [...]
    > > XML has a clear definition of well-formed document fragment.

    >
    > Huh? Where is it?


    In the XML recommendation: it's called an "external parsed entity".
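
    A rough Python sketch of the distinction, under the assumption
    that a run of stanzas is treated as entity content rather than as a
    document. The fragment and the <wrapper> element are made up for
    illustration. On its own the fragment is rejected as a document,
    but placed inside an element, much as an entity reference would
    place it, the same text parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two sibling elements and no single root.
fragment = "<message><body>hi</body></message><presence/>"

# As a document this is not well-formed: more than one top-level element.
try:
    ET.fromstring(fragment)
    parsed_as_document = True
except ET.ParseError:
    parsed_as_document = False

# Inside a made-up root element the same text is acceptable content.
wrapped = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
names = [child.tag for child in wrapped]

print(parsed_as_document, names)
```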

    > Apparently, the idea was that the complete recorded XMPP session
    > /should/ comprise an XML document. But as the XMPP
    > implementation is required to take action before the session is
    > over, it has to interpret the bits of XML it receives as soon as
    > it has a complete bit (or, in XMPP parlance, a "stanza.")


    SAX should handle the task. The problem is that at the time an error is
    detected, some parts of the "document" have already been processed. The
    protocol should specify what to do in these cases.
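
    A short Python sketch of that situation, using xml.parsers.expat
    on deliberately broken, made-up input. The handler has already
    been called for three elements by the time the parser reports the
    mismatched end tag, so whatever the application did with them
    cannot be undone by the parser.

```python
import xml.parsers.expat

seen = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: seen.append(name)

# Hypothetical input: <b> is never closed, so </stream> is a mismatched tag.
data = b"<stream><a/><b></stream>"

try:
    parser.Parse(data, True)
    error = None
except xml.parsers.expat.ExpatError as exc:
    error = exc

# Events delivered before the error, followed by the error itself.
print(seen)
print(error)
```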

    > BTW, is there a W3C recommendation that explicitly allows for
    > inclusion of MathML and SVG within an XHTML document (and is
    > /not/ based on DTD)?


    A working draft (http://www.w3.org/TR/XHTMLplusMathMLplusSVG/)

    > For instance? I'm interested in a "reasonably well supported"
    > protocol for passing messages in "real-time" (where messages may
    > contain some XML.) XMPP is so far the only one I've found.


    I guess SOAP is an example.

    -- Alain.
    Alain Ketterlin, Jul 9, 2012
    #4
  5. On 09/07/2012 10:38, Alain Ketterlin wrote:
    > Ivan Shmakov writes:
    > ...
    >> BTW, is there a W3C recommendation that explicitly allows for
    >> inclusion of MathML and SVG within an XHTML document (and is
    >> /not/ based on DTD)?

    >
    > A working draft (http://www.w3.org/TR/XHTMLplusMathMLplusSVG/)


    This profile /is/ /based/ on a DTD. The OP explicitly asked about a
    recommendation /not/ /based/ on a DTD.

    --
    Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
    Manuel Collado, Jul 9, 2012
    #5
  6. On 7/9/2012 11:04 AM, Manuel Collado wrote:
    >>> BTW, is there a W3C recommendation that explicitly allows for
    >>> inclusion of MathML and SVG within an XHTML document (and is
    >>> /not/ based on DTD)?


    > This profile /is/ /based/ on a DTD. The OP explicitly asked about a
    > recommendation /not/ /based/ on a DTD.


    XHTML modularization covers the concept -- which basically consists of
    "that's exactly what namespaces are for". See
    http://www.w3.org/TR/xhtml-modularization/ and related.




    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Jul 11, 2012
    #6
  7. Ivan Shmakov

    XHTML 1.1, SVG, MathML

    >>>>> Joe Kesselman writes:
    >>>>> On 7/9/2012 11:04 AM, Manuel Collado wrote:


    >>>> BTW, is there a W3C recommendation that explicitly allows for
    >>>> inclusion of MathML and SVG within an XHTML document (and is /not/
    >>>> based on DTD)?


    >>> A working draft (http://www.w3.org/TR/XHTMLplusMathMLplusSVG/)


    >> This profile /is/ /based/ on a DTD. The OP explicitly asked about a
    >> recommendation /not/ /based/ on a DTD.


    > XHTML modularization covers the concept -- which basically consists
    > of "that's exactly what namespaces are for".


    The problem is that the only way http://validator.w3.org/ allows
    for XHTML to contain SVG and MathML is via the working draft
    cited above, which uses DTD as part of its definition, and thus,
    as it was already pointed out, is "poorly compatible" with XML
    namespaces.

    My guess is that for W3C Validator to be updated to allow for a
    fuller understanding of XHTML's "XML nature" there has to be a
    W3C recommendation, or a working draft, that explicitly allows
    for any XML namespace prefixes in XHTML. AIUI, such a
    specification has to be based on something other than DTD.

    Thus was my question.

    Regarding the XHTML modularization, it was my understanding that
    its whole idea was to allow for easier creation of XHTML
    profiles. Which seems like an independent issue.

    > See http://www.w3.org/TR/xhtml-modularization/ and related.


    What exactly are the "related" documents?

    --
    FSF associate member #7257 http://sf-day.org/
    Ivan Shmakov, Jul 11, 2012
    #7
  8. Ivan Shmakov

    >>>>> Alain Ketterlin writes:
    >>>>> Ivan Shmakov writes:
    >>>>> Joe Kesselman writes:


    >>> XML has a clear definition of well-formed document fragment.


    >> Huh? Where is it?


    > In the XML recommendation: it's called an "external parsed entity".


    XMPP stanzas are hardly "external" to XMPP sessions.

    >> Apparently, the idea was that the complete recorded XMPP session
    >> /should/ comprise an XML document. But as the XMPP implementation
    >> is required to take action before the session is over, it has to
    >> interpret the bits of XML it receives as soon as it has a complete
    >> bit (or, in XMPP parlance, a "stanza.")


    > SAX should handle the task.


    Yes. Indeed, AnyEvent::XMPP::Parser uses XML::Parser::Expat,
    which is event-based.

    > The problem is that at the time an error is detected, some parts of
    > the "document" have already been processed. The protocol should
    > specify what to do in these cases.


    AIUI, it does.

    [...]

    >> For instance? I'm interested in a "reasonably well supported"
    >> protocol for passing messages in "real-time" (where messages may
    >> contain some XML.) XMPP is so far the only one I've found.


    > I guess SOAP is an example.


    ACK, thanks.

    Though it looks like I'll have to stick with XMPP, since what I'm
    after is a way to extend XMPP clients anyway.

    --
    FSF associate member #7257 http://sf-day.org/
    Ivan Shmakov, Jul 11, 2012
    #8
  9. Re: XHTML 1.1, SVG, MathML

    On 7/10/2012 11:36 PM, Ivan Shmakov wrote:
    > What exactly are the "related" documents?


    Searching the W3C website for "modularization" finds the ones I know of.


    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Kesselman, Jul 12, 2012
    #9