Re: Space between ending and starting tag not ignorable in an XML document? ...

Discussion in 'XML' started by Mayeul, Jul 12, 2011.

  1. Mayeul

    Mayeul Guest

    On 12/07/2011 00:15, lbrt chx _ gemale kom wrote:
    > OK I understand about the need to process those inline characters if you are processing XHTML (and XHTML only! (and/or other types of XML-ish formats such as RTF))
    > ~
    > The actual text of the XML file (which I am validating using the specified schema) looks like this:
    > ~
    > <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/ http://www.mediawiki.org/xml/export-0.5.xsd" version="0.5" xml:lang="en">
    > <siteinfo>
    > <sitename>Wikipedia</sitename>
    > <base>http://en.wikipedia.org/wiki/Main_Page</base>
    > <generator>MediaWiki 1.17wmf1</generator>
    > <case>first-letter</case>
    > <namespaces>
    > <namespace key="-2" case="first-letter">Media</namespace>
    > <namespace key="109" case="first-letter">Book talk</namespace>
    > </namespaces>
    > </siteinfo>
    > </mediawiki>
    > ~
    > What do you call the carriage return and the four running spaces after the ending "</sitename>" and before the starting "<base>" if you know this is not an XHTML document?
    > "inline text" anyway?


    Well, yes. No rule anywhere states that XHTML is the only XML format
    permitted to use mixed content. A handful of other XML formats do so
    as well, and I have designed a few myself.

    > ~
    >> Bottomline: whitespace between tags is just the same whitespace as
    >> whitespace within tags, because it is indeed within a tag.

    > ~
    > Then there is something I don't get: What would be the XPath declaration to address the text between an ending and another starting tag?


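    In your example, with the prefix m bound to the namespace
    http://www.mediawiki.org/xml/export-0.5/:

    /m:mediawiki/m:siteinfo/m:sitename/following-sibling::text()[1]
    (first text node following <sitename> among its siblings, here its
    immediate next neighbour; without the [1], the expression selects
    every later text sibling as well)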

    Would also work:

    /m:mediawiki/m:siteinfo/text()[2]
    (second text run which is a child of <siteinfo>. The first one is before
    <sitename>)

    Would also work:

    /m:mediawiki/m:siteinfo/node()[3]
    (third child node of <siteinfo>. The second one is <sitename>)
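
The three expressions above can be tried with the JDK's built-in XPath API. This is only a sketch under some assumptions: the class name, the helper methods and the trimmed sample document are made up for illustration, the indentation is assumed to be a newline plus four spaces as described in the question, and [1] is appended to the first expression so that it names a single node.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SiteinfoWhitespaceXPath {
    static final String NS = "http://www.mediawiki.org/xml/export-0.5/";

    // Trimmed copy of the quoted export; the indentation is an assumption.
    static final String XML =
        "<mediawiki xmlns=\"" + NS + "\">\n"
      + "  <siteinfo>\n"
      + "    <sitename>Wikipedia</sitename>\n"
      + "    <base>http://en.wikipedia.org/wiki/Main_Page</base>\n"
      + "  </siteinfo>\n"
      + "</mediawiki>";

    private static final Document DOC = parse();
    private static final XPath XPATH = newXPath();

    private static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so that m:... can match
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the prefix "m" used in the expressions to the MediaWiki namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "m".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return NS.equals(uri) ? "m" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singleton(prefix).iterator();
            }
        });
        return xpath;
    }

    /** Returns the first node selected by the expression, or null. */
    static Node select(String expression) {
        try {
            return (Node) XPATH.evaluate(expression, DOC, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    /** Returns the string value of the selected node, or null. */
    static String value(String expression) {
        Node node = select(expression);
        return node == null ? null : node.getNodeValue();
    }

    /** True when both expressions select the very same DOM node. */
    static boolean sameNode(String a, String b) {
        Node node = select(a);
        return node != null && node.isSameNode(select(b));
    }

    public static void main(String[] args) {
        String first = "/m:mediawiki/m:siteinfo/m:sitename/following-sibling::text()[1]";
        String[] others = {
            "/m:mediawiki/m:siteinfo/text()[2]",
            "/m:mediawiki/m:siteinfo/node()[3]"
        };
        System.out.println("whitespace only: " + value(first).trim().isEmpty());
        for (String other : others) {
            System.out.println(other + " same node: " + sameNode(first, other));
        }
    }
}
```

Since no DTD is involved here, the DOM keeps the whitespace as ordinary text nodes, which is why all three expressions have something to select.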

    > ~
    > How is it that, knowing this is XML, this sequence of characters could be relevant?
    > ~


    That XML may well be XHTML, or any other format with mixed content.

    > This may be a parser dependent flag but probably you know where the shoe hurts and can just let me know what to do. (I am using Xerces2-J)


    I know of no Xerces-J feature that decides which whitespace is
    ignorable and which isn't, other than the method described
    previously: use a DTD.
    Xerces then detects whether whitespace is ignorable even when it is
    not set to validate, so the DTD can be minimal and declare just one
    element, for instance.
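
A minimal sketch of that DTD approach with SAX, assuming the JDK's bundled JAXP parser (which is Xerces-derived) or Xerces2-J itself. The sample is a cut-down, namespace-free version of the quoted document, and the one-line internal DTD subset and the class names are made up for illustration. Note that the SAX specification only says a non-validating parser may report ignorable whitespace, so another parser could deliver it through characters() instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class IgnorableWhitespaceDemo {
    // Hypothetical minimal DTD: only <siteinfo> is declared, with element-only content.
    static final String DTD =
        "<!DOCTYPE siteinfo [\n"
      + "  <!ELEMENT siteinfo (sitename, base)>\n"
      + "]>\n";

    static final String BODY =
        "<siteinfo>\n"
      + "    <sitename>Wikipedia</sitename>\n"
      + "    <base>http://en.wikipedia.org/wiki/Main_Page</base>\n"
      + "</siteinfo>";

    /** Collects ordinary character data and ignorable whitespace separately. */
    static final class Collector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        final StringBuilder ignorable = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorable.append(ch, start, length);
        }
    }

    static Collector parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false); // the DTD is read, but nothing is validated
            Collector collector = new Collector();
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), collector);
            return collector;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Collector withDtd = parse(DTD + BODY);
        Collector withoutDtd = parse(BODY);
        System.out.println("with DTD:    ignorable chars = " + withDtd.ignorable.length()
            + ", text = [" + withDtd.text + "]");
        System.out.println("without DTD: ignorable chars = " + withoutDtd.ignorable.length()
            + ", text length = " + withoutDtd.text.length());
    }
}
```

Without the DOCTYPE the parser has no content model to go by, so the same whitespace arrives through characters() along with the real text.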

    > ~
    >> However, some applications -- such as those which
    >> want to retain the formatting of the document -- would NOT want to
    >> ignore them. So the parser passes them along, but flags them as being a
    >> special cases so you can decide what you want to do with them.

    > ~
    > As far as I know XMLReaders (I mostly code Java) can read into the actual document and get the encoding and readjust their own encoding, so I thought that they could also notice if they are processing XML or XHTML and do their thing internally


    XHTML is an XML format too.

    > ~
    > Anyway you could do it explicitly with code (using some sort of declaration or guessing based on the extension of the file), but my guess is that parsers may do this
    > ~
    > Once you set up the Xerces SAX Parser by going:
    > ~
    > java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser Inline00Test
    > ~
    > There should be some way to let xerces know I only want the characters within the starting and corresponding closing tag if they validate. How do you do such a thing?
    > ~


    None that I know of, aside from using a DTD. You are, however,
    entirely free to ignore these characters when you receive them.
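
Ignoring them in application code could look roughly like this. It is a sketch only: the handler name, the sample document and the rule "drop any text run that is whitespace-only" are assumptions made for illustration, and that rule is only safe for formats without mixed content. Text is buffered until the next tag because SAX may deliver one run in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class WhitespaceSkippingHandler extends DefaultHandler {
    // Cut-down, namespace-free sample; no DTD is needed for this approach.
    static final String SAMPLE =
        "<siteinfo>\n"
      + "    <sitename>Wikipedia</sitename>\n"
      + "    <base>http://en.wikipedia.org/wiki/Main_Page</base>\n"
      + "</siteinfo>";

    private final StringBuilder buffer = new StringBuilder();
    final List<String> textRuns = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run, so only accumulate here.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    /** Keeps the buffered run unless it consists of whitespace only. */
    private void flush() {
        String run = buffer.toString();
        buffer.setLength(0);
        // trim() removes everything up to U+0020, which covers XML whitespace.
        if (!run.trim().isEmpty()) {
            textRuns.add(run);
        }
    }

    static List<String> textRuns(String xml) {
        try {
            WhitespaceSkippingHandler handler = new WhitespaceSkippingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.textRuns;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String run : textRuns(SAMPLE)) {
            System.out.println("[" + run + "]");
        }
    }
}
```

The trade-off is that the handler, not the parser, now decides what counts as insignificant, so it should not be used on documents where whitespace between inline elements carries meaning.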

    --
    Mayeul
    Mayeul, Jul 12, 2011