Reading RSS XML with IE

Andrew Poulos

When loading an RSS feed into Windows IE, doc.childNodes.length always
equals 0. If I manually delete the <!DOCTYPE tag, doc.childNodes.length
is correct.

I'm using
doc = new ActiveXObject("Microsoft.XMLDOM");
to load the rss. Is this where the problem lies?

(Using document.implementation.createDocument with FF reads the XML
correctly with or without a DOCTYPE.)

Andrew Poulos
 
Martin Honnen

Andrew said:
When loading an RSS feed into Windows IE, doc.childNodes.length always
equals 0. If I manually delete the <!DOCTYPE tag, doc.childNodes.length
is correct.

I'm using
doc = new ActiveXObject("Microsoft.XMLDOM");
to load the rss. Is this where the problem lies?

Hard to tell, we need to see more code, whether you load synchronously
or asynchronously.
Some settings to play with are
doc.resolveExternals = false
doc.validateOnParse = false
And what exactly does that !DOCTYPE declaration look like? Is the XML
valid with regard to that declared document type?
Have you checked
doc.parseError.errorCode
doc.parseError.reason

While Mozilla uses Expat, a non-validating parser that ignores
external resources, IE uses MSXML, and MSXML can validate against a DTD.
If you want to validate against the DTD then you need to check whether
there is a parseError.
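
A minimal sketch of the checks suggested above, assuming a synchronous load. The helper name loadFeed and the shape of its return value are hypothetical; async, resolveExternals, validateOnParse, load and parseError are members of the MSXML DOM document. The document object is passed in as an argument so the function itself does not depend on ActiveXObject.

```javascript
// Hypothetical helper: configure an MSXML DOM document, load a URL
// synchronously, and report the parse error (if any) instead of
// silently returning an empty document.
function loadFeed(doc, url) {
  doc.async = false;             // block until loading has finished
  doc.resolveExternals = false;  // do not fetch external DTDs or entities
  doc.validateOnParse = false;   // require well-formedness only
  doc.load(url);

  var err = doc.parseError;
  if (err.errorCode !== 0) {
    return { ok: false, errorCode: err.errorCode, reason: err.reason };
  }
  return { ok: true, root: doc.documentElement };
}

// In IE this might be called as:
//   var result = loadFeed(new ActiveXObject("Microsoft.XMLDOM"), "feed.rss");
//   alert(result.ok ? result.root.nodeName : result.reason);
```

Checking errorCode before touching childNodes makes the failure visible; an empty childNodes list on its own does not say why the load failed.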
 
Andrew Poulos

Martin said:
Hard to tell, we need to see more code, whether you load synchronously
or asynchronously.
Some settings to play with are
doc.resolveExternals = false
doc.validateOnParse = false
And what exactly does that !DOCTYPE declaration look like? Is the XML
valid with regard to that declared document type?
Have you checked
doc.parseError.errorCode
doc.parseError.reason

Checking the error reason did it. I copied the RSS feed to my hard drive
and I must've inadvertently edited a tag name.

Andrew Poulos
 
Andrew Poulos

Andrew said:
Checking the error reason did it. I copied the RSS feed to my hard drive
and I must've inadvertently edited a tag name.

I spoke too soon. I tried parsing the RSS from this link:
<url: http://www.nasa.gov/rss/image_of_the_day.rss >
but IE tells me that "The element 'rss' is used but not declared in the
DTD/Schema."

This seems odd to me. Does this mean that the XML itself is invalid, or
that there's some resource that I don't have access to that is causing
the problem?

Andrew Poulos
 
VK

Duncan said:
Yes, it means the XML is invalid.

The XML contains a DTD embedded inline in the document, but the DTD only
defines some entities & not any elements, so the document will fail to
validate. If you can turn off validation, or tell the parser to ignore the
inline DTD, you may be able to handle it.

Did you try (IE-only):

<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
</head>

<body onload="alert(document.scripts[0])">
<script type="text/xml"
src="http://www.nasa.gov/rss/image_of_the_day.rss"></script>
</body>
</html>

That works just fine (means no parsing errors).
 
Andrew Poulos

VK said:
Duncan said:
Yes, it means the XML is invalid.

The XML contains a DTD embedded inline in the document, but the DTD only
defines some entities & not any elements, so the document will fail to
validate. If you can turn off validation, or tell the parser to ignore the
inline DTD, you may be able to handle it.

Did you try (IE-only):

<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
</head>

<body onload="alert(document.scripts[0])">
<script type="text/xml"
src="http://www.nasa.gov/rss/image_of_the_day.rss"></script>
</body>
</html>

That works just fine (means no parsing errors).

I'm not sure what you're doing. How would I access nodes etc?

I'm using the ActiveX object to load the XML file so that I can later
walk its DOM. I agree with Duncan Booth that the DTD fails to define
any ELEMENTS and so the XML is invalid.

Andrew Poulos
 
Martin Honnen

Andrew Poulos wrote:

I spoke too soon. I tried parsing the RSS from this link:
<url: http://www.nasa.gov/rss/image_of_the_day.rss >
but IE tells me that "The element 'rss' is used but not declared in the
DTD/Schema."

As already suggested you can set
xmlDocument.validateOnParse = false;
before calling the load method and that way you can ensure that the DOM
is built if the XML is well-formed without being valid.
 
Andrew Poulos

Martin said:
As already suggested you can set
xmlDocument.validateOnParse = false;
before calling the load method and that way you can ensure that the DOM
is built if the XML is well-formed without being valid.

I used parseError.reason to show me what was wrong with the sample RSS
XML I was testing and I didn't understand what this "new" error meant on
"real" RSS. I think I'll try to read the XML twice. If it fails trying
to validate the XML I'll try it without validating.

thanks
Andrew Poulos
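
The two-pass idea described above might look roughly like this. It is only a sketch: loadWithFallback and its return shape are hypothetical names, and createDoc is assumed to be a function that returns a fresh MSXML-style document on each call (for example one that wraps new ActiveXObject("Microsoft.XMLDOM")).

```javascript
// Hypothetical helper: try a validating load first, then fall back to a
// non-validating load. createDoc must return a fresh MSXML-style document.
function loadWithFallback(createDoc, url) {
  var modes = [true, false];   // validateOnParse values to try, in order
  var lastReason = "";
  for (var i = 0; i < modes.length; i++) {
    var doc = createDoc();
    doc.async = false;
    doc.validateOnParse = modes[i];
    doc.load(url);
    if (doc.parseError.errorCode === 0) {
      return { doc: doc, validated: modes[i], reason: "" };
    }
    lastReason = doc.parseError.reason;
  }
  // Both attempts failed: the XML is not well-formed or could not be loaded.
  return { doc: null, validated: false, reason: lastReason };
}
```

The validated flag lets the caller tell a valid feed from one that is merely well-formed, which is the situation with the NASA feed discussed in this thread.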
 
VK

Andrew said:
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
</head>

<body onload="alert(document.scripts[0])">
<script type="text/xml"
src="http://www.nasa.gov/rss/image_of_the_day.rss"></script>
</body>
</html>
I'm not sure what you're doing. How would I access nodes etc?

It is called a "Dynamic Data Island" and it implants a dynamic XML data
source into the document. You access the data later (after script.onload)
using document.scripts.XMLDocument plus standard XML DOM methods.

Nevertheless it fails on the data source in question,
"image_of_the_day.rss", for the same reason that responseXML is sometimes
not set: because an RSS feed is *not* XML though it uses XML format. Its
MIME type (if served properly) should be, say, "application/rss+xml" or
another one (depending on which feed format is used: RSS, Atom). In any
case it is nothing similar to the needed "text/xml", as you can see. You
need to have the MIME type registered in your browser (this comes with
installed RSS readers). Or you have to read it as plain vanilla text and
parse it manually, or feed it manually to the browser's XML parser.

Andrew said:
DTD fails to define
any ELEMENTS and so the XML is invalid.

This sentence has no meaning for my humble mind. XML by definition can
consist of any proprietary nodes, as long as they are paired properly. I
can use:
<foobar>
<foo>Foo</foo>
<bar>Bar</bar>
</foobar>

w/o any DTD "permissions" to use <foobar>, <foo>, <bar>

It fails because the Content-Type doesn't match the expected one: "wants
text/xml, got application/rss+xml"
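
The "read it as text and feed it to the XML parser" route mentioned above could be sketched as follows. The function names xmlFromResponse and parseWithMsxml are hypothetical; responseXML and responseText are standard XMLHttpRequest properties and loadXML is an MSXML method. Whether responseXML is actually empty for a given Content-Type depends on the browser, so the fallback only runs when no parsed document is available.

```javascript
// Hypothetical helper: return an XML DOM for a finished request.
// xhr is an XMLHttpRequest-like object; parseText turns a string into
// a document, or returns null when the text is not well-formed.
function xmlFromResponse(xhr, parseText) {
  var xml = xhr.responseXML;
  if (xml && xml.documentElement) {
    return xml;                        // the browser already parsed it
  }
  return parseText(xhr.responseText);  // parse the raw text ourselves
}

// One possible parseText for IE, based on MSXML's loadXML method:
function parseWithMsxml(text) {
  var doc = new ActiveXObject("Microsoft.XMLDOM");
  doc.async = false;
  doc.validateOnParse = false;  // entity-only DTD: skip validation
  return doc.loadXML(text) ? doc : null;
}
```

A call such as xmlFromResponse(request, parseWithMsxml) would then return a DOM in both cases, as long as the feed text is well-formed.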
 
VK

Duncan said:
Yes, that's fine: you can use whatever tags you want without a DTD and the
only problem is that the parser won't be able to validate it.

The problem with the XML in question was that a DTD *was* specified, and
when a DTD or schema is supplied a validating parser will reject XML which
does not validate correctly.

Uhm... Are we still talking about the same file?
<http://www.nasa.gov/rss/image_of_the_day.rss>

I don't see any DTD tag, just a set of named entity definitions like
"nbsp" or "pound" (what are they doing anyway in a UTF-8 doc?)

IE 6.0 displays it with no problem (no error of any kind) if you trick it
into believing that this is indeed XML - I just did it. BTW it's RSS 2.0
format.

<rss version="2.0">...</rss> has no value to IE by itself, it's not a
DTD, just another proprietary XML tag. One could use instead <foo
version="0.1 beta">...</foo> with the same results.

I guess the real problem is that both Firefox (?) and Opera (for sure)
come with an RSS reader built in, so they know in advance what
parser to use for the given Content-Type.
IE doesn't have an RSS reader built in, so you have to either install one or
get the text and handle it manually. Just a suggestion.
 
VK

Duncan said:
The DTD is specified using an internal subset rather
than an external subset. The internal subset only contains entity
declarations, but its presence is sufficient to give something for the
parser to try to validate the document. If they are going to put some
entity declarations there then they *must* either make it a complete DTD
including element declarations as well or specify an external subset.

From Extensible Markup Language (XML) 1.0, section 2.8 Prolog and Document
Type Declaration:


So there is a DTD, but it is incomplete.

Thanks for the insight. It still brings us back to round one: an RSS file is
not XML, it's a data package using XML format. If the client machine has an
RSS reader, then it knows what, where and how to read. Otherwise the most you
can do is spoof the XML parser by feeding RSS data into it as if it were real
valid XML. The parser may eat it or spit it out, and it has every right to do
the latter too.

Truthfully I consider both RSS and Atom feeds as unfit to eat :) but I
had to learn them while working on JSONet news feeds.

To OP:
Try to add the relevant namespace before reading the feed (no guarantee
though):
....
document.namespaces.add('dc','http://purl.org/dc/elements/1.1/');
....

There are more RSS specs than bugs on the tree, so you may experiment
with the namespace reference.
 
Thomas 'PointedEars' Lahn

VK said:
Thanks for the insight. It still brings us back to round one: an RSS file is not
XML,

Of course, since XML is but a metalanguage that defines its applications.

VK said:
it's a data package using XML format.

First, RSS is an XML application. And if it is one that is to be used, for
adhering to the XML well-formedness standard, its DTD has to be complete or
the undeclared elements and attributes MUST NOT be used. If the DTD is
incomplete and the DTD author is not inclined to change that, authors will
have to augment the DTD through subset declarations in order to produce a
Valid and well-formed XML document, one that is possible to be parsed
without fatal error by an XML parser.


PointedEars
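
Whether a feed carries an internal subset that declares entities but no elements can be checked roughly before loading it. The sketch below is only a crude regular-expression scan, not an XML parser; inspectInternalSubset is a hypothetical name, and an entity value containing "]>" would confuse it.

```javascript
// Hypothetical helper: count <!ENTITY> and <!ELEMENT> declarations in the
// internal DTD subset of an XML string (the part between [ and ]> in the
// DOCTYPE declaration). Crude text scan, for diagnostics only.
function inspectInternalSubset(xmlText) {
  var m = /<!DOCTYPE\s+[^\[>]+\[([\s\S]*?)\]\s*>/.exec(xmlText);
  if (!m) {
    return { hasSubset: false, entities: 0, elements: 0 };
  }
  var subset = m[1];
  return {
    hasSubset: true,
    entities: (subset.match(/<!ENTITY\s/g) || []).length,
    elements: (subset.match(/<!ELEMENT\s/g) || []).length
  };
}
```

A result with hasSubset true and elements equal to 0 is the case described in this thread: a validating parser has a DTD to validate against, but that DTD declares no elements, so turning off validateOnParse is the practical way to get a DOM.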
 
Dag Sunde

VK said:
Thanks for the insight. It still brings us back to round one: an RSS file is
not XML, it's a data package using XML format. If the client machine has an
RSS reader, then it knows what, where and how to read. Otherwise the most you
can do is spoof the XML parser by feeding RSS data into it as if it were real
valid XML. The parser may eat it or spit it out, and it has every right to do
the latter too.

Oh! Interesting...

Explain to me why rss isn't XML...?

(http://www-128.ibm.com/developerworks/library/w-rss.html?dwzone=web#h0)
 
VK

Dag said:
Explain to me why rss isn't XML...?

Because its content type is neither "text/xml" nor
"application/xml".

Or, to make it more visual, a .cpp file is not a C++ program, but it
can become a program if you pass it through the right parser. Or it may
remain what it is: a chaotic (from the machine's point of view)
agglomeration of characters in a text file.

XML is eXtensible Markup Language, and an RSS feed is a data source (with
its own MIME type) where data chunks are *marked up* using XML syntax.
 
Thomas 'PointedEars' Lahn

VK said:
A perfect piece of unfit-to-read junk that has nothing to do with
reality (unless it is OK to skip 80%-95% of your visitors). Any
better links? (at least >50% coverage)

"The document type specification itself does not have anything to do
with the reality."

YMMD.


PointedEars
 
VK

Thomas said:
"The document type specification itself does not have anything to do
with the reality."

Totally right - if you're dealing with the Web. In this case there is
only His Majesty Content-Type.

Like you can name your file reallyRealXML.xml.xml.xml with a bunch of
declarations inside but if it's served w/o "text/xml" Content-Type then
it will be treated equally with some looseText.txt file. It may seem
unfair but it is as it is.
 
Dag Sunde

VK said:
Because its content type is neither "text/xml" nor
"application/xml".

???

What the hell has Content-Type got to do with XML or RSS, for that
matter?

Content-Type belongs to HTTP. It doesn't have *anything* to do with
RSS or XML!

Incidentally, if you happen to send RSS XML over HTTP and don't send
the correct HTTP headers to the UA, you're *still* sending RSS/XML.
You're just running the risk that the UA doesn't understand what you're
sending.

The majority of XML doesn't even care about web browsers, HTTP, or the
internet at all. In that context, Content-Type and MIME don't make sense.

You are confusing/mixing a lot of terms here.

VK said:
XML is eXtensible Markup Language, and an RSS feed is a data source (with
its own MIME type) where data chunks are *marked up* using XML syntax.

RSS is no such thing!
RSS is a very specific application of XML.
There is nothing "chunky" about it! To be proper RSS, it must be
completely well-formed, and validating XML.
 
VK

Dag said:
What the hell has Content-Type got to do with XML or RSS, for that
matter?

Syllabically:

1) If the Content-Type is *not* "text/xml" or "application/xml" (for newer
IE), then the XML parser is not turned on and all input goes as plain
vanilla text into the responseText reservoir, while the responseXML
reservoir remains empty.

2) If the Content-Type is "application/rss+xml", then what to do and how
to deal with such content depends on the MIME association on the current UA.

If there is no association for the given Content-Type, then no
parser is used and all content goes unparsed to the responseText
reservoir.

Exceptions: some Content-Types are recognized but access to their content
is limited for security reasons. For instance, binary files like .exe or
images are recognized, but the responseText will contain only the file
header string.
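
Rule 1 above can be written down as a small check on the Content-Type header. This is only a sketch of the rule as stated in this post, not a description of what any particular browser does internally; isXmlContentType and its strict flag are hypothetical names. With strict set, only the two types named above count as XML; without it, any "+xml" subtype such as "application/rss+xml" is accepted as well.

```javascript
// Hypothetical helper: classify a Content-Type header value.
// strict = true  -> only "text/xml" and "application/xml" count as XML
// strict = false -> any "+xml" subtype (e.g. application/rss+xml) counts too
function isXmlContentType(header, strict) {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  var type = String(header || "")
    .split(";")[0]
    .replace(/^\s+|\s+$/g, "")
    .toLowerCase();
  if (type === "text/xml" || type === "application/xml") {
    return true;
  }
  return !strict && /\+xml$/.test(type);
}

// Possible use with a finished request:
//   if (!isXmlContentType(xhr.getResponseHeader("Content-Type"), true)) {
//     // expect responseXML to be empty and parse xhr.responseText instead
//   }
```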
 
