Processing Instructions

Dominic Olivastro

Hi all:

I'm new to this newsgroup, and new to XML.

We receive documents in XML, and I am trying to tear them apart to obtain
information. I decided that, for my purposes, it would be fairly easy to
write a simple XML parser, which it was. But now suddenly I find that some
of the information I need is in the form of a Processing Instruction, and
not tagged in the usual way. So I get information like this:

<description>
<?BRFSUM description="Brief Summary" end="lead"?>
The present text related to a photodecter and so on in this vein
<?BRFSUM description="Brief Summary" end="tail"?>

Questions:

1. Why is this not placed in the usual tag format?
2. Can I assume that end="lead" will always open the text, and end="tail"
will always close it? Is this usual for Processing Instructions? From what
I've read, there generally isn't any end tag.

Thanks for any help you can give me.

Dom
mailto: (e-mail address removed)
 
Ashmodai

Dominic Olivastro scribbled something along the lines of:
Hi all:

I'm new to this newsgroup, and new to XML.

We receive documents in XML, and I am trying to tear them apart to obtain
information. I decided that, for my purposes, it would be fairly easy to
write a simple XML parser, which it was. But now suddenly I find that some
of the information I need is in the form of a Processing Instruction, and
not tagged in the usual way. So I get information like this:

<description>
<?BRFSUM description="Brief Summary" end="lead"?>
The present text related to a photodecter and so on in this vein
<?BRFSUM description="Brief Summary" end="tail"?>

Questions:

1. Why is this not placed in the usual tag format?

Processing instructions aren't elements. The idea is that they tell
the processor something about the document. PHP, a server-side scripting
language, uses PI brackets, for example, because it is executed
("processed") on the server.
The most common PI is the xml PI, which gives the version and character
encoding of the document. It doesn't say anything about the actual
content, but it explains how to process it (e.g. which version must be
supported and which character encoding to use).
Stylesheets are also linked in PIs because they tell how to render the
document.

The idea, to my understanding, is that PIs are namespace, vocabulary and
subset independent. <?xml ...?> means the same in an XForms document as
it does in an SVG file.
I suppose something along the lines of <xml:info version=""/> would have
worked as well, but then it'd have to be inside the root element and
that's a bit too late for the processor.

2. Can I assume that end="lead" will always open the text, and end="tail"
will always close it? Is this usual for Processing Instructions? From what
I've read, there generally isn't any end tag.

PIs don't have ending tags as they aren't normal elements or even tags.
I've never seen a set of two or more PIs used to enclose anything, but
then again, I'm only using XML for the web, so maybe I missed
something.

I don't think it's much of a flaw if you don't support EVERY PI there
is, but you should try covering the basics (xml and xml-stylesheet most
importantly).
 
Richard Tobin

Dominic Olivastro said:
<description>
<?BRFSUM description="Brief Summary" end="lead"?>
The present text related to a photodecter and so on in this vein
<?BRFSUM description="Brief Summary" end="tail"?>
1. Why is this not placed in the usual tag format?

You'll have to ask the document designer. A couple of possibilities
are:

- The markup is not necessarily nested. You can't do that with start
and end tags, but you can use processing instructions or "point
elements" (i.e. empty elements used to mark the start and end of
something). This doesn't seem very likely given the example.

- The document has to adhere to a fixed DTD that does not provide an
element for "brief summary", and processing instructions are being
used to provide the additional markup.

2. Can I assume that end="lead" will always open the text, and end="tail"
will always close it?

Again, you'll have to ask the document designer.

-- Richard
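
Whatever the designer's answer turns out to be, the lead/tail pairing in
the sample can be handled mechanically. Below is a minimal sketch in
Python, assuming the two BRFSUM instructions are siblings of the text
they bracket and that end="lead" always comes before end="tail" (which is
exactly the assumption that still needs confirming). The helper names
pseudo_attrs and text_between are made up for illustration, and a closing
</description> tag has been added so the fragment parses.

```python
import re
from xml.dom import minidom

SAMPLE = """<description>
<?BRFSUM description="Brief Summary" end="lead"?>
The present text related to a photodecter and so on in this vein
<?BRFSUM description="Brief Summary" end="tail"?>
</description>"""

# A PI's content is just a string; name="value" pairs are only a
# convention, so they have to be picked apart by hand.
PSEUDO_ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def pseudo_attrs(pi):
    """Return the name="value" pairs found in a PI's data as a dict."""
    return dict(PSEUDO_ATTR.findall(pi.data))


def text_between(parent, target):
    """Collect direct text children between end="lead" and end="tail" PIs."""
    chunks = []
    inside = False
    for node in parent.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == target:
            end = pseudo_attrs(node).get("end")
            if end == "lead":
                inside = True
            elif end == "tail":
                inside = False
        elif inside and node.nodeType == node.TEXT_NODE:
            chunks.append(node.data)
    return "".join(chunks).strip()


doc = minidom.parseString(SAMPLE)
summary = text_between(doc.documentElement, "BRFSUM")
print(summary)
```

This only looks at text that is a direct child of the parent element. If
the bracketed region can contain nested elements, or if the lead and tail
instructions can sit in different elements, the walk would need to be
extended accordingly.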
 
arnold m. slotnik

[...]
The most common PI is the xml PI which tells the version and
character encoding of the document. It doesn't say anything
about the actual content, but it explains how to process it (eg.
what version must be supported and what the character encoding
setting should be). Stylesheets are also linked in PIs because
they tell how to render the document.

That's the XML Declaration (or Text Declaration in an external parsed
entity), not an "XML PI".

REC-xml-20001006, section 2.8.
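
The distinction is visible in parser APIs as well. As a small
illustration (using Python's standard xml.dom.minidom, which is just one
parser among many, and a made-up style.css reference), the declaration is
not reported as a processing-instruction node, while xml-stylesheet is:

```python
from xml.dom import minidom

SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<doc/>"""

doc = minidom.parseString(SAMPLE)

# Gather the targets of every PI node that sits at document level.
pi_targets = [
    node.target
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(pi_targets)   # the xml-stylesheet PI shows up here
print(doc.version)  # the declaration's version is exposed separately
```

The version from the declaration ends up as a property of the document
object rather than as a node in the tree.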
 
Richard Tobin

arnold m. slotnik said:
That's the XML Declaration (or Text Declaration in an external parsed
entity), not an "XML PI".

True, but it's not just coincidence that it shares the syntax of
PIs. XML is a subset of SGML, and from the SGML point of view the
XML declaration is a PI.

-- Richard
 
arnold m. slotnik

(e-mail address removed) (Richard Tobin) wrote in
True, but it's not just coincidence that it shares the syntax of
PIs. XML is a subset of SGML, and from the SGML point of view
the XML declaration is a PI.

From the XSLT point of view, though, there's a big difference between
an XML Declaration and a PI.

Making sure we get the terminology right now can save questions later
on...
 
Ashmodai

arnold m. slotnik scribbled something along the lines of:
(e-mail address removed) (Richard Tobin) wrote in

From the XSLT point of view, though, there's a big difference between
an XML Declaration and a PI.

Making sure we get the terminology right now can save questions later
on...

The XML declaration is a mandatory[1] PI in the eyes of the author (and
probably also in the eyes of SGML). What it does when the document is
parsed is not the author's concern; they just have to know it's
mandatory[1] and maybe also that it's used to determine the version and
character encoding.
That's like saying the root is not an element because it's the root,
which is a special element.

[1] Okay, maybe not mandatory, but strongly recommended.
 
arnold m. slotnik

The XML declaration is a mandatory[1] PI in the eyes of the
author (and probably also in the eyes of SGML). What it does
when the document is parsed is not the author's concern; they
just have to know it's mandatory[1] and maybe also that it's
used to determine the version and character encoding.
That's like saying the root is not an element because it's the
root, which is a special element.

[1] Okay, maybe not mandatory, but strongly recommended.


<rant>
I know what the XML Declaration is--and what it isn't. It isn't a
PI--looks like one, but the editors of the spec were very clear
that it isn't a PI. It's a special construct, recommended
("should") in XML 1.0 and mandatory ("must") in XML 1.1.

XSLT has a special function for attaching an XML Declaration to an
output tree, a different function for creating PIs in the output
tree.

How many times have we seen in this and other venues, "How do I
write the XML PI on my output?" Ask the right question, it's easy
to find the right answer.

Tool vendors have confused the XML Declaration, the Text
Declaration, and a garden-variety PI in their tools. (Anyone
besides me really annoyed by editing packages that put <?xml
version="1.0"?> on *everything*?)

It doesn't belong on a DTD--a DTD is not an XML document.

It doesn't belong on an external parsed entity--they take a Text
Declaration, which must contain the encoding and *may* contain the
version.

It's very specifically the XML Declaration--with a specific set of
related functions and usage--not "the XML PI".
</rant>
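
The same split between "emit the declaration" and "create a PI" shows up
on the output side outside XSLT too. Here is a rough Python analogue
(not XSLT, where the relevant constructs are the omit-xml-declaration
attribute of xsl:output and the xsl:processing-instruction element).
With xml.dom.minidom the declaration is written by the serializer, while
a PI is a node you build and place yourself. The element name report and
the style.xsl reference are placeholders.

```python
from xml.dom import minidom

# Build an empty document with a single root element.
doc = minidom.getDOMImplementation().createDocument(None, "report", None)

# A genuine processing instruction: created as a node, placed before the root.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/xsl" href="style.xsl"'
)
doc.insertBefore(pi, doc.documentElement)

# The XML declaration is never created as a node; the serializer writes it.
out = doc.toxml()
print(out)
```

Only one processing-instruction node exists in the tree even though the
serialized text begins with something that looks like a second one.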
 
Ashmodai

arnold m. slotnik scribbled something along the lines of:
The XML declaration is a mandatory[1] PI in the eyes of the
author (and probably also in the eyes of SGML). What it does
when the document is parsed is not the author's concern; they
just have to know it's mandatory[1] and maybe also that it's
used to determine the version and character encoding.
That's like saying the root is not an element because it's the
root, which is a special element.

[1] Okay, maybe not mandatory, but strongly recommended.



<rant>
I know what the XML Declaration is--and what it isn't. It isn't a
PI--looks like one, but the editors of the spec were very clear
that it isn't a PI. It's a special construct, recommended
("should") in XML 1.0 and mandatory ("must") in XML 1.1.

XSLT has a special function for attaching an XML Declaration to an
output tree, a different function for creating PIs in the output
tree.

How many times have we seen in this and other venues, "How do I
write the XML PI on my output?" Ask the right question, it's easy
to find the right answer.

Tool vendors have confused the XML Declaration, the Text
Declaration, and a garden-variety PI in their tools. (Anyone
besides me really annoyed by editing packages that put <?xml
version="1.0"?> on *everything*?)

It doesn't belong on a DTD--a DTD is not an XML document.

It doesn't belong on an external parsed entity--they take a Text
Declaration, which must contain the encoding and *may* contain the
version.

It's very specifically the XML Declaration--with a specific set of
related functions and usage--not "the XML PI".
</rant>

I feel so loved.


Actually, putting XML PIs on everything is as dumb as putting, say, the
XHTML 1.1 doctype declaration on everything -- why would anybody be THAT
stupid?
 
