innerHTML problem in IE6

Kiran Makam

I am setting the content of a div dynamically using innerHTML
property. If the content contains an ampersand, text after the
ampersand is disappearing in IE6. It works properly in Firefox.

This is my code:
----------------
<body>

<div id='div1'></div>
<script>
var div = document.getElementById('div1');
div.innerHTML = "A&B";
</script>

</body>
---------------

IE6 renders the content of div1 as 'A'
Firefox renders the content properly as 'A&B'

If there is a space after ampersand, IE6 renders it properly. So I
think that IE is assuming anything after ampersand as an HTML entity
( like &nbsp; ).

Is this a bug in IE6? Is there any workaround for this?

Thanks
Kiran Makam
 
Jonathan N. Little

Kiran said:
I am setting the content of a div dynamically using innerHTML
property. If the content contains an ampersand, text after the
ampersand is disappearing in IE6. It works properly in Firefox.

This is my code:
----------------
<body>

<div id='div1'></div>
<script>
var div = document.getElementById('div1');
div.innerHTML = "A&B";
</script>

Try:

div.innerHTML = "A&amp;B";
 
Lars Eighner

In our last episode,
<[email protected]>,
the lovely and talented Kiran Makam
broadcast on alt.html:
I am setting the content of a div dynamically using innerHTML
property. If the content contains an ampersand, text after the
ampersand is disappearing in IE6. It works properly in Firefox.
This is my code:
<div id='div1'></div>
<script>
var div = document.getElementById('div1');
div.innerHTML = "A&B";
</script>

IE6 renders the content of div1 as 'A'
Firefox renders the content properly as 'A&B'
If there is a space after ampersand, IE6 renders it properly. So I
think that IE is assuming anything after ampersand as an HTML entity
( like &nbsp; ).
Is this a bug in IE6?

No, it is a bug in your markup. & should always be &amp; The browser is
entitled to suppose any string starting with & is an attempt at a character
entity. It may be that FF has a better error correction ability, but
you can't blame a browser for how it handles errors.
Is there any workaround for this?

Yes. Enter & as &amp;
 
Harlan Messinger

Lars said:
In our last episode,
<[email protected]>,
the lovely and talented Kiran Makam
broadcast on alt.html:

No, it is a bug in your markup. & should always be &amp; The browser is
entitled to suppose any string starting with & is an attempt at a character
entity. It may be that FF has a better error correction ability, but
you can't blame a browser for how it handles errors.


Yes. Enter & as &amp;

To clarify for the original poster: this isn't a workaround, it's the
proper way to escape the ampersand in HTML when it's being used as a
literal instead of in its special role as first character in an entity code.
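
If the text comes from somewhere else (user input, a database) rather than a string constant, the same escaping can be done in script before the assignment. A minimal sketch, assuming a helper named escapeHtml, which is made up here for illustration and is not a built-in browser function:

```javascript
// Hypothetical helper (not from the thread): escape the characters that are
// special in HTML so the string can be assigned to innerHTML as literal text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first, or the later escapes get re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Usage in a page (assumes the div1 element from the original post):
//   document.getElementById('div1').innerHTML = escapeHtml("A&B");
```

The ampersand replacement has to come first; otherwise the "&" introduced by "&lt;" and friends would itself be escaped again.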
 
Jukka K. Korpela

As so often, a URL would have been needed, even for an apparently trivial
piece of code. Experienced authors know this, and others should just believe
it. :)

The markup is invalid due to the lack of the required type="..." attribute, but
this is really just a formality. More importantly, we don't know whether this is
supposed to be HTML or XHTML and how it has been served.
No, it is a bug in your markup.

Whether the markup is correct depends on whether this is HTML or XHTML. In
HTML, the content model of <script> is CDATA, which means that entity
references are not recognized, so "&B" means just the character "&" followed
by the character "B". In XHTML, the content model is #PCDATA, in which
case...
& should always be &amp;

... or something equivalent.
The browser is entitled to suppose any string starting with & is an
attempt at a character entity.

No, not in HTML when inside <script> (or <style>). Otherwise, it is
_required_ to treat "&" as potentially starting an entity reference or a
character reference. Error processing rules are then different for different
situations and flavors of HTML. In HTML 4.01, "&B" must be parsed as an
entity reference, but since no such entity has been defined, we're in the
error processing area, and treating "&" as a data character is conventional
in browsers in such cases. In XHTML, "&B", when not followed by a semicolon
(possibly after some name characters) is a well-formedness violation and XML
processors should simply report an error and refuse to display the document
at all.

Note: There are no grounds for assuming &B to be a "character entity" in any
flavor of HTML. The pseudo-term "character entity" is, at best, shorthand
for "entity reference that happens to evaluate to a one-character string".
The entity reference &B does not evaluate to anything; it is undefined.

Confused? Fine. Just outsource the script, avoiding the mess!
It may be that FF has a better error
correction ability, but
you can't blame a browser for how it handles errors.

Oh we can, both on practical grounds and, in some cases, on formal grounds.
Yes. Enter & as &amp;

The best way to solve the problem is to put the script in an external file
and reference it via <script type="text/javascript" src="foo.js"></script>

Yucca
 
Jukka K. Korpela

Ben said:
In markup like:

<script>
div.innerHTML = "A&B";
</script>

"A&B" is certainly inside a script element. But is it also inside a
<div> element?

A tricky question, which I tried to avoid. In terms of HTML specifications,
it is not inside any <div> element, since whatever happens via scripting is
outside the scope of those specs.

As http://msdn.microsoft.com/en-us/library/ms533897.aspx says so eloquently,
"There is no public standard that applies to this [innerHTML] property".
That vendor-specific page says:
"When the innerHTML property is set, the given string completely replaces
the existing content of the object. If the string contains HTML tags, the
string is parsed and formatted as it is placed into the document."

I think it is fair to read this so that they promise to parse the content as
HTML. This in turn means that &B would be detected as undefined entity
reference. If, on the other hand, A&amp;B were used, then it would be first
parsed (as <script> element content, assuming HTML 4.01 rules) as such, and
the second parsing would recognize &amp; as a reference that denotes the &
character. But they don't say exactly how the parsing works.
We can imagine that the browser recursively enters its HTML parser to
evaluate innerHTML,

Why, oh why, do people speak of recursion when they mean iteration?
so its HTML parser will see something like this:

<div>
A&B
</div>

where it would be required to treat & as potentially starting an
entity reference or a character reference as you say.

No, I don't think it sees any <div> tag. It is parsing the string "A&B", and
I agree with the idea that here "&" should be treated as a special
character, here starting an entity reference. But the widely accepted
fallback for undefined entity references is to treat them "literally", i.e.
as if e.g. "&B" were really defined to mean "&B".
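
The "treat it literally" fallback can be sketched in a few lines. This is only an illustration of the idea, not a description of what any particular browser does; the entity table and the function name decodeEntitiesLeniently are assumptions made up for the example, and real browsers know far more entities and have more elaborate rules:

```javascript
// Illustrative sketch only: a tiny, hypothetical entity table and decoder.
var KNOWN_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', nbsp: "\u00A0" };

function decodeEntitiesLeniently(text) {
  // Match "&name" with an optional terminating semicolon.
  return text.replace(/&([A-Za-z][A-Za-z0-9]*);?/g, function (match, name) {
    if (Object.prototype.hasOwnProperty.call(KNOWN_ENTITIES, name)) {
      return KNOWN_ENTITIES[name];   // defined entity: substitute its value
    }
    return match;                    // undefined entity: keep the source text as is
  });
}

var literal = decodeEntitiesLeniently("A&B");      // "&B" is undefined, kept as typed
var escaped = decodeEntitiesLeniently("A&amp;B");  // "&amp;" is defined, becomes "&"
```

Under this fallback both spellings end up as the same three characters, which matches what the original poster reported seeing in Firefox.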

Yucca
 
Jukka K. Korpela

Ben said:
I don't know why people would do that. I only speak of recursion when
I mean recursion.

Which recursion is involved when a browser, having parsed HTML data, starts
interpreting it, finds some client-side script code, executes it, then
starts parsing the data that results from the execution? (In this case, as
so often, the generation of that data is trivial, since it is a string
constant, but that's irrelevant here.) Answer: There is no recursion
involved. The parsing was finished long before the script execution started,
at the logical level at least, and then new parsing was initiated. It's
really not even iteration, except in a trivial sense.

Parsing HTML could itself be recursive (i.e., a parser routine might call
itself), and that would be natural in a sense since HTML is defined
recursively. But tag soup slurpers don't do that, and generally, recursive
parsing is less efficient than non-recursive parsing.

Yucca
 
Sherm Pendley

Ben C said:
How would you parse HTML more efficiently than by using recursive
parsing?

I don't know about other parsers, but Expat uses callback functions
that it calls when it finds an opening tag, closing tag, text node,
comment, etc. It's event driven, not recursive - the parser function
never calls itself.
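
To make the callback style concrete, here is a rough sketch in JavaScript. It is not Expat's API; the function scan and the handler names onStartTag, onEndTag and onText are invented for the example, and the tag syntax is deliberately simplified (no attributes, comments, or error recovery):

```javascript
// Hypothetical event-driven scanner: one loop, no function ever calls itself.
function scan(source, handlers) {
  // Either a start/end tag such as <b> or </b>, or a run of text without "<".
  // A stray "<" that does not begin a tag is silently skipped.
  var pattern = /<(\/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)/g;
  var m;
  while ((m = pattern.exec(source)) !== null) {
    if (m[3] !== undefined) {
      handlers.onText(m[3]);
    } else if (m[1] === "/") {
      handlers.onEndTag(m[2]);
    } else {
      handlers.onStartTag(m[2]);
    }
  }
}

// Usage: record the events in the order the scanner reports them.
var events = [];
scan("<p>A<b>B</b></p>", {
  onStartTag: function (name) { events.push("start:" + name); },
  onEndTag: function (name) { events.push("end:" + name); },
  onText: function (text) { events.push("text:" + text); }
});
// events: start:p, text:A, start:b, text:B, end:b, end:p
```

The scanner knows nothing about nesting; whatever structure is needed has to be tracked by the code in the callbacks.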

sherm--
 
Neredbojias

I said we can "imagine" that the browser recursively enters its HTML
parser. I'm not talking about particular implementations, although I see
no reason why they wouldn't use recursion here.

From the Neredbojias dictionary:

Recursion - The proximate deployment of more than one swear word, any of
which is not phraseologically related to the others.

Iteration - Improper or excessive use of a pronoun.

Hope that clears this up.
 
Jukka K. Korpela

Ben said:
I said we can "imagine" that the browser recursively enters its HTML
parser.

There's no reason to imagine anything more complex than I described.
I'm not sure what you mean by "interpreting" HTML data.

Processing it by some semantic rules, such as the rule that <script> element
content is script code that needs to be passed to a script interpreter. This
is something that can only be performed after the element has been parsed.
The basic
operation here is to build a DOM tree out of HTML.

That's irrelevant. The point is that the HTML markup _has been parsed_, and
then you start doing something else. If you will then start parsing HTML
again, it ain't no recursion. It's just another instance of parsing.
Who cares about tag soup slurpers or knows what the hell they do?

The innerHTML construct is all about tag slurpers, existing browsers, not
ideal browsers as defined in specifications.
How would you parse HTML more efficiently than by using recursive
parsing?

Browsers have done that for years. You just look at tags and turn them to
actions. You see <strong>, you start bolding. You see </strong>, you turn
bolding off. There are browser features that resemble structural processing,
and newer browsers might even be good at it, but in fact structural
processing can be performed by using explicit stacks, instead of the
implicit stacking involved in recursion.

I could write a nonrecursive HTML parser for you, but then I would have
to... charge you for it.

Yucca
 
Jukka K. Korpela

Ben said:
OK, but the <script> element has to be interpreted before elements
after it in the source are.

Not at all. Actually, it need not be interpreted at all. Browsers may well
ignore the content of <script> elements, and they often do, but they still
need to _parse_ them (if not for anything else, in order to recognize the
end of the element).
You're presupposing an unnecessarily complicated implementation.

No, I'm just describing what happens conceptually. A parser is a parser even
if integrated into a grotesquely large program.
You're saying the program looks something like this:

No, I'm not saying anything about timing, such as processing some part of an
HTML document while the rest is still being parsed. Running a parser and a
script interpreter in parallel does not imply that if the script interpreter
invokes another instance of the parser, it would be some kind of recursion.

So you _are_ confusing recursion with iteration, or actually mere new
invocation - as many people do.
Yes I realize Microsoft invented innerHTML, but Opera/Firefox/Safari
implement it and they are not tag slurpers.

They slurp tags more than you'd think. Check out whether they are still
Yes everyone knows that. But it's normal when describing an algorithm
to say it is "recursive" even if when you come to implement it you
avoid actually writing a function that calls itself.

There's no recursive algorithm involved in the handling of innerHTML.

Yucca
 
Sherm Pendley

Ben C said:
Indeed, and neither does the tree builder you implement in the
callbacks-- it has to either maintain an explicit stack or use parent
pointers on the tree nodes it is generating.

But none of that is any more efficient than doing it recursively, it's
just one way of trying to separate things.

It's not faster, but I'd say it's more memory-efficient. Instead of a
deep call stack + your data tree, you have just the tree. And it's
easier for a lot of programmers to understand - for some reason, a lot
of people have trouble with recursion.
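
A rough sketch of the "just the tree" variant, using parent pointers on the nodes instead of either recursion or a separate stack. The function name buildTree and the node shape are assumptions made up for the example, the tag syntax is simplified (no attributes, no error recovery beyond ignoring stray end tags), and no claim about speed or memory is intended:

```javascript
// Hypothetical non-recursive tree builder: "current" walks down on a start tag
// and back up via the parent pointer on a matching end tag.
function buildTree(source) {
  var root = { name: "#root", parent: null, children: [] };
  var current = root;
  var pattern = /<(\/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)/g;
  var m;
  while ((m = pattern.exec(source)) !== null) {
    if (m[3] !== undefined) {
      current.children.push(m[3]);              // text is stored as a plain string
    } else if (m[1] === "/") {
      // Close the innermost open element if the name matches; ignore stray end tags.
      if (current !== root && current.name === m[2]) {
        current = current.parent;
      }
    } else {
      var node = { name: m[2], parent: current, children: [] };
      current.children.push(node);
      current = node;                           // descend into the new element
    }
  }
  return root;
}

var tree = buildTree("<p>A<b>B</b>C</p>");
// tree.children[0] is the p node, whose children are "A", the b node, and "C".
```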

sherm--
 
