InnerHTML not grabbing entire HTML if <p> is present

MaryA

Let me preface this with the fact that I am a newbie to HTML, XML and
Javascript. Having said that, let me explain my dilemma:

I am having a difficult time getting innerHTML to consistently return
the entire HTML string when a <p> is part of the text. It somehow
throws everything off. I am trying to store exam questions in an
Oracle Database as XML.
The XML I started with looked like this:

<aicpcu id="XMLTEXT"><stem id="STEM">This is my stem<p>this is the
second line</p></stem><options id="OPTIONS"><distracter
id="DISTRACTER">distracter 1</distracter>"><distracter
id="DISTRACTER">distracter 2</distracter>"><distracter
id="DISTRACTER">distracter 3</distracter>"><distracter
id="DISTRACTER">distracter 4</distracter></options></aicpcu>

I used this code to get at the text I wanted for the stem:
var s = document.getElementById("STEM");
alert(s.innerHTML);

I would only get This is my stem.

However, when I changed the XML to:
<div id="STEM" mix="Y">This is my stem<p><b>This is the second
line.</b></p></div>

I would get both lines.

Now for the distracter part. Since I got the stem working properly, I
tried to modify what I had in a similar manner. I've tried many
iterations of this, but this gives me the closest to what I want;
<div id="DISTRACTERS"><distracter id="DISTRACTER" label="A">First
distracter</distracter><distracter id="DISTRACTER" label="B"><b>Second
distracter</b></distracter><distracter id="DISTRACTER"
label="C"><i>Third distracter</i></distracter><distracter
id="DISTRACTER" label="D">Fourth distracter<p>second
sentence</p></distracter></div>

The code I use to extract the innerHTML is:
var d = document.getElementById("DISTRACTERS");
alert(d.childNodes.length); -- returns 5 (the <p> apparently shows as a
separate child)
var dList = document.getElementsByTagName("DISTRACTER");
alert(dList.length); -- returns 4 (correct)
for (var j=0; dList.length > j; j++)
{
htp.p(' alert(dList[j].innerHTML);');
}

All the innerHTML is displayed properly except the 4th one that has 2
paragraphs to it. I just get the text "Fourth distracter".

Any help would be greatly appreciated.
 
Martin Honnen

MaryA wrote:

The XML I started with looked like this:

<aicpcu id="XMLTEXT"><stem id="STEM">This is my stem<p>this is the
second line</p></stem><options id="OPTIONS"><distracter
id="DISTRACTER">distracter 1</distracter>"><distracter
id="DISTRACTER">distracter 2</distracter>"><distracter
id="DISTRACTER">distracter 3</distracter>"><distracter
id="DISTRACTER">distracter 4</distracter></options></aicpcu>

I used this code to get at the text I wanted for the stem:
var s = document.getElementById("STEM");
alert(s.innerHTML);

Why do you expect innerHTML to make any sense on XML? What kind of
document is that document object you call getElementById on?
 
MaryA

It is just sitting in an HTML document. It is written at the bottom of
the document, for now. I have to find a way to hide it and still be able to
access it. If I put it in as a comment, the getElementById does not
find it. I was under the impression that the DOM standard can be used
to access XML and HTML. It seems to work, in part.
 
Lasse Reichstein Nielsen

MaryA said:
I am having a difficult time getting innerHTML to consistently return
the entire HTML string when a <p> is part of the text. It somehow
throws everything off. I am trying to store exam questions in an
Oracle Database as XML.
The XML I started with looked like this:

<aicpcu id="XMLTEXT"><stem id="STEM">This is my stem<p>this is the
second line</p></stem><options id="OPTIONS"><distracter
id="DISTRACTER">distracter 1</distracter>"><distracter
id="DISTRACTER">distracter 2</distracter>"><distracter
id="DISTRACTER">distracter 3</distracter>"><distracter
id="DISTRACTER">distracter 4</distracter></options></aicpcu>

Your first problem is that you are using the HTML DOM, and probably
an HTML parser, on something that isn't HTML. It's just XML.

I used this code to get at the text I wanted for the stem:
var s = document.getElementById("STEM");
alert(s.innerHTML);

I would only get This is my stem.

My guess is that the parser, not knowing the "stem" element, is
reading "This is my stem" as part of an inline element (or should it
be "span"?). That element is ended by the opening tag of the block
level element "p".

However, when I changed the XML to:
<div id="STEM" mix="Y">This is my stem<p><b>This is the second
line.</b></p></div>

I would get both lines.

Because "div" is a block level element that can contain other
block level elements, so the "p" element can be part of it.

The code I use to extract the innerHTML is:
var d = document.getElementById("DISTRACTERS");
alert(d.childNodes.length); -- returns 5 (the <p> apparently shows as a
separate child)

Again, the same explanation can hold. If the "distracter" element
is considered an inline element, then the <p> ends it and starts
a new child.

/L
 
Lasse Reichstein Nielsen

MaryA said:
It is just sitting in an HTML document.

Well, it's not an HTML document then, since it doesn't follow the
DTD of any version of HTML. The parser appears to parse it as
"tag soup", with gratuitous amounts of error correction. It's
anybody's guess how each browser will treat non-standard code.

It is written at the bottom of the document, for now. I have to find a
way to hide it and still be able to access it.

IE supports "XML islands" inside (otherwise) HTML documents. Other
browsers don't. Why use XML at all? If you want to access it with
Javascript anyway, you could just create a Javascript representation
of the data directly (e.g., using a JSON library on the server).
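A rough sketch of what such a Javascript representation might look like for the exam question in this thread. The field names ("stem", "distracters", "label", "text") are made up for illustration, and it assumes a Javascript engine that has the built-in JSON object; the server would emit the JSON string instead of the XML.

```javascript
// Hypothetical JSON for one exam question, as the server might emit it.
// The field names are illustrative only; they are not from the original post.
var questionJson = [
  '{',
  '  "stem": "This is my stem<p>this is the second line</p>",',
  '  "distracters": [',
  '    {"label": "A", "text": "distracter 1"},',
  '    {"label": "B", "text": "distracter 2"},',
  '    {"label": "C", "text": "distracter 3"},',
  '    {"label": "D", "text": "distracter 4"}',
  '  ]',
  '}'
].join("\n");

// Turn the string into ordinary Javascript objects; no HTML parser is involved,
// so the <p> inside the stem stays exactly where it was put.
var question = JSON.parse(questionJson);

console.log(question.stem);
for (var i = 0; i < question.distracters.length; i++) {
  var d = question.distracters[i];
  console.log(d.label + ": " + d.text);
}
```

Because the markup inside the stem is just part of a string here, the browser's error correction never gets a chance to split it.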
If I put it in as a comment, the getElementById does not find it.

Obviously, since inside a comment they are no longer elements in the DOM.

I was under the impression that the DOM standard can be used to
access XML and HTML.

The DOM standard doesn't care what the element names of the document
are. The parser that processes the character sequence and turns it
into DOM elements does, since it is expecting HTML. HTML has specific
rules about which elements can be children of which other
elements. E.g., a div element cannot be the child of a paragraph (p)
element. Parsers attempt to be nice to you by accepting any tag
soup, and then do error correction on the document to try to make
it into a valid HTML document. This includes ending elements where
necessary, so the invalid HTML:
<p>lala<div>didi</div>lala</p>
is read as:
<p>lala</p><div>didi</div>lala
where all tags occur in an allowed order.

You are using non-HTML tags. The parser is, understandably, confused.
It appears the browser you use guesses unknown elements to be inline
elements, which cannot contain the block level p element.

It seems to work, in part.

That's also known as "not working" :)
/L
 
RobG

MaryA said on 29/03/2006 3:57 AM AEST:
It is just sitting in an HTML document. It is written at the bottom of
the document, for now. I have to find a way to hide it and still be able to
access it. If I put it in as a comment, the getElementById does not
find it. I was under the impression that the DOM standard can be used
to access XML and HTML. It seems to work, in part.

Just to be clear on what is happening, the browser parses the 'HTML' and
turns it into a DOM. The innerHTML property is a serialisation of the
DOM back to HTML.

innerHTML is not the same as 'view source'. For example, given the
valid HTML source:

<div>foo<p> bar</div>


In Firefox, the innerHTML property of the div is shown as:

foo<p> bar</p>

and in IE as:

foo
<P>bar</P>


Both browsers close the P tags, IE also capitalises the tag names,
inserts a return before the <P> tag and removes the leading whitespace
before 'bar'. So even in this apparently trivial case there are
significant differences between the source and innerHTML property, and
between browsers.
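To make the "serialisation" point concrete, here is a toy model, not what any browser actually runs. It uses plain objects in place of real DOM nodes (the `tag` and `children` property names are invented for this sketch) and ignores attributes and character escaping. The output is generated from the tree, so the end tag for the p appears even though the source never had one.

```javascript
// Toy stand-in for a parsed DOM: an element is { tag: "...", children: [...] },
// a text node is a plain string. These shapes are assumptions of this sketch.
function serializeChildren(element) {
  return element.children
    .map(function (child) {
      if (typeof child === "string") {
        return child; // text is written out as-is (no escaping in this toy)
      }
      // Every element gets an explicit end tag, whether or not the source had one.
      return "<" + child.tag + ">" + serializeChildren(child) + "</" + child.tag + ">";
    })
    .join("");
}

// The tree a parser would build for the source: <div>foo<p> bar</div>
var div = { tag: "div", children: ["foo", { tag: "p", children: [" bar"] }] };

console.log(serializeChildren(div)); // foo<p> bar</p>
```

Whatever the parser decided while building the tree (closing a p, splitting an unknown element) is what comes back out, which is why innerHTML can differ from the source.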

I guess you could say IE is 'right' because innerHTML is a proprietary
Microsoft invention that has been widely copied. It is not defined in
any standard and behaves differently in various browsers. Depending on
it to be supported consistently is not a good idea, though it is very
handy in some situations.
 
MaryA

Ok - given what I've read here - I did some more investigating and came
up with this code:
function loadForm()
{
var theString =''<?xml version=\"1.0\"?><aicpcu
id="XMLTEXT"><stem id="STEM" mix="N">This is my stem<p>with 2 lines in
it!</p></stem><options id="OPTIONS" cnt="3"><distracter id="DISTRACTER"
label="A" mix="N">distracter text 1</distracter><distracter
id="DISTRACTER" label="B" mix="N">distracter text
2</distracter><distracter id="DISTRACTER" label="C" mix="N">distracter
text 3</distracter></options></aicpcu>'';
var parser = new DOMParser();
var dom = parser.parseFromString(theString, "text/xml");
var root = dom.getElementsByTagName("aicpcu")[0];
var stem = root.getElementsByTagName("stem");
for (var i = 0 ; i < stem.length ; i++)
{
var stemEl = stem[i];
var stemText = stemEl.firstChild.nodeValue;
alert(stemText);
}
alert("done");
}

But I still only get the first part of the stem (This is my stem) and
not the "with 2 lines in it" part. What the heck am I doing wrong now?
 
Thomas 'PointedEars' Lahn

MaryA said:
Ok - given what I've read here - I did some more investigating and came
up with this code:
function loadForm()
{
var theString =''<?xml version=\"1.0\"?><aicpcu ^^
id="XMLTEXT"><stem id="STEM" mix="N">This is my stem<p>with 2 lines in
it!</p></stem><options id="OPTIONS" cnt="3"><distracter id="DISTRACTER"
label="A" mix="N">distracter text 1</distracter><distracter
id="DISTRACTER" label="B" mix="N">distracter text
2</distracter><distracter id="DISTRACTER" label="C" mix="N">distracter
text 3</distracter></options></aicpcu>''; ^^
var parser = new DOMParser();
if (parser)
{
var dom = parser.parseFromString(theString, "text/xml");
if (dom)
{
var root = dom.getElementsByTagName("aicpcu")[0];
if (root)
{
var stem = root.getElementsByTagName("stem");
for (var i = 0 ; i < stem.length ; i++)

for (var i = 0, len = stem.length; i < len; i++)
{
var stemEl = stem[i];
var stemText = stemEl.firstChild.nodeValue;
alert(stemText);
}
alert("done"); }
}
}
}

But I still only get the first part of the stem (This is my stem) and
not the "with 2 lines in it" part. What the heck am I doing wrong now?


It is rather astonishing that you get anything at all.

| Error: syntax error
| Source file: javascript:var theString =''<?xml
| version=\"1.0\"?><aicpcuid="XMLTEXT"><stem id="STEM" mix="N">This is my
| stem<p>with 2 lines init!</p></stem><options id="OPTIONS"
| cnt="3"><distracter id="DISTRACTER"label="A" mix="N">distracter text
| 1</distracter><distracterid="DISTRACTER" label="B" mix="N">distracter
| text2</distracter><distracter id="DISTRACTER" label="C"
| mix="N">distractertext 3</distracter></options></aicpcu>'';
| Line: 1, Column: 18
| Source code:
| var theString =''<?xml version=\"1.0\"?><aicpcuid="XMLTEXT"><stem
--------------------^
| id="STEM" mix="N">This is my stem<p>with 2 lines init!</p></stem><options
| id="OPTIONS" cnt="3"><distracter id="DISTRACTER"label="A"
| mix="N">distracter text 1</distracter><distracterid="DIST

Either you have been using the ' character too often, or you should not
use ' and parseFromString() at all, because the E4X InputElementRegExp
literal expression already creates an object.


PointedEars
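On the question of why only "This is my stem" comes back once the string does parse: in a DOM tree the stem element has two children, a text node followed by the p element, and firstChild.nodeValue reads only that first text node. A minimal sketch of gathering all the text instead follows. The helper names (`collectText`, `textNode`, `elementNode`) are made up here, and the mock objects only imitate the few DOM properties the function touches so the snippet runs outside a browser.

```javascript
// Node type values as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE).
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Concatenate the text of a node and of all its descendants, in document order.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  var text = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    text += collectText(node.childNodes[i]);
  }
  return text;
}

// Hypothetical mock nodes standing in for what DOMParser would produce.
function textNode(value) {
  return { nodeType: TEXT_NODE, nodeValue: value, childNodes: [] };
}
function elementNode(name, children) {
  return { nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null, childNodes: children };
}

// <stem>This is my stem<p>with 2 lines in it!</p></stem>
var stem = elementNode("stem", [
  textNode("This is my stem"),
  elementNode("p", [textNode("with 2 lines in it!")])
]);

console.log(stem.childNodes[0].nodeValue); // This is my stem
console.log(collectText(stem));            // This is my stemwith 2 lines in it!
```

In a browser the same function should work on a real node, e.g. collectText(dom.getElementsByTagName("stem")[0]). Where it is supported, the textContent property gives the same concatenated text, and XMLSerializer's serializeToString() can be used on each child if the <p> markup itself needs to be kept.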
 
