Small question from a rookie

Sharon

Hiya
I have a small question, I saw this piece of code somewhere (it's for
creating a customized context menu) and I was wondering: Why is it
that the STYLE and SCRIPT-tags are broken up into parts? I hope
someone can answer my question, thanks! Sharon

html+='<TABLE STYLE="border:1pt solid #808080" BGCOLOR="#CCCCCC" ' +
  'WIDTH="140" HEIGHT="220" CELLPADDING="0" CELLSPACING="1">';
html+='<ST'+'YLE TYPE="text/css">\n';
html+='a:link {text-decoration:none;font-family:Arial;font-size:8pt;}\n';
html+='a:visited {text-decoration:none;font-family:Arial;font-size:8pt;}\n';
html+='td {font-size:8pt;}\n';
html+='</ST'+'YLE>\n';
html+='<SC'+'RIPT LANGUAGE="JavaScript">\n';
html+='\n<'+'!--\n';
html+='window.onerror=null;\n';
html+='/'+'/ -'+'->\n';
html+='</'+'SCRIPT>\n';
html+='<TR><TD STYLE="border:1pt solid #CCCCCC" ' +
  'ONMOUSEOVER="this.style.background=\'#CFD6E8\';this.style.border=\'1pt solid #737B92\';" ' +
  'ONMOUSEOUT="this.style.background=\'#CCCCCC\';this.style.border=\'1pt solid #CCCCCC\';" ' +
  'ONCLICK="window.history.go(-1);">&nbsp;Filter op:' + Value + '</TD></TR>';
html+='</TABLE>';
 
Klaus Johannes Rusch

Sharon said:
I have a small question, I saw this piece of code somewhere (it's for
creating a customized context menu) and I was wondering: Why is it
that the STYLE and SCRIPT-tags are broken up into parts?

Some browsers would take any occurrence of "<script" as the start tag of
another script and assume that the previous script element had not been
closed.

Writing "<script" as "<sc" + "ript" (or similar) stops browsers from
incorrectly interpreting the element in the Javascript code.
 
Lee

Sharon said:
Hiya
I have a small question, I saw this piece of code somewhere (it's for
creating a customized context menu) and I was wondering: Why is it
that the STYLE and SCRIPT-tags are broken up into parts? I hope
someone can answer my question, thanks! Sharon

A web page is parsed by a routine that knows how to
parse HTML code, looking for tags like <SCRIPT> and
<STYLE> and when it finds a <SCRIPT> tag, it passes
the contents off to a separate routine to interpret
the script.

The author was worried that the HTML parser would be
confused by HTML tags inside the <SCRIPT> tag.
 
Richard Cornford

Sharon said:
I have a small question, I saw this piece of code somewhere
(it's for creating a customized context menu)

Test that with an Opera browser and it might not seem like such a good
idea.

Sharon said:
and I was wondering: Why is it
that the STYLE and SCRIPT-tags are broken up into parts?

The script tags are broken into parts because when a browser encounters
a script element on a page it needs to know how much of what follows to
send to the javascript interpreter; it needs to know where the closing
SCRIPT tag is, so it looks for one. But HTML parsers don't understand
javascript (just HTML) so when they see - "</script>" - in a javascript
string they assume that it represents the end of the SCRIPT element.

They send everything before the "</script>" to the javascript
interpreter, which errors with an unterminated string, and treat
everything that follows as page content. Bad news and best avoided.
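
A rough sketch of that behaviour, assuming a parser that simply cuts the
script at the first "</script" it meets. The function name
splitAtFirstScriptClose and the sample page text are made up for
illustration; real browsers are more involved than this.

```javascript
// Hypothetical model of an HTML parser deciding where a SCRIPT element ends.
// It knows nothing about javascript strings; it only looks for "</script".
function splitAtFirstScriptClose(textAfterScriptTag) {
  var end = textAfterScriptTag.toLowerCase().indexOf('</script');
  if (end === -1) {
    // No closing tag found: everything is treated as script.
    return { script: textAfterScriptTag, rest: '' };
  }
  return {
    script: textAfterScriptTag.slice(0, end), // handed to the interpreter
    rest: textAfterScriptTag.slice(end)       // treated as page markup
  };
}

// The page source that follows an opening <script> tag (illustrative only).
var page = "document.write('<script>alert(1)</script>');\n</script><p>Hello</p>";

var parts = splitAtFirstScriptClose(page);
console.log(parts.script); // document.write('<script>alert(1)
console.log(parts.rest);   // begins with </script>'); and ends up as page content
```

The script part stops in the middle of the string literal, which is where
the unterminated string error comes from.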

The reason STYLE tags (and opening script tags) are also broken in this
script is that whoever wrote it didn't understand what they were doing,
or why, and resorted to programming by mystical incantation to deal with
a problem that they had perceived but misidentified.

In practice it is only necessary to prevent the HTML parser from
recognising the closing script tag but string concatenation is not the
best way of doing so as it is a relatively heavyweight operation and
unnecessary. Inserting a javascript escape character "\" into the
sequence "</script>" would render it unrecognisable as a closing script
tag to the HTML parser.

However, by HTML specification the closing script tag is not the only
one that should cause problems, though browsers seem to be universally
implemented such that it is. Theoretically the character sequence "</"
alone should mark the end of any script element so it is that sequence
that needs to be unrecognisable within javascript strings to truly
achieve safety.

html+='</'+'SCRIPT>\n';
       ^^
<snip>

That is usually done by inserting the javascript escape character
between the two to produce "<\/". And it should be done to any closing
HTML tag appearing in a javascript string on an HTML page (imported
javascript does not have this problem because it is never looked at by
the HTML parser). The obscured closing script tag would become
"<\/script>".
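
A small check of the escape, offered as a sketch: in a javascript string
literal "\/" is read as "/", so the escaped spelling and the concatenated
spelling give the same string, while the escaped source text no longer
contains "</". The variable names are only for illustration.

```javascript
// "\/" in a string literal is an unnecessary but harmless escape: it yields "/".
var escaped = '<\/script>';
var split = '</' + 'script>'; // the concatenation approach from the original code

console.log(escaped === split); // true
console.log(escaped.length);    // 9

// What the HTML parser sees is the source text, not the string value.
// Here the source of one statement is held in a string so it can be inspected;
// "\\" in this literal stands for a single backslash in that source.
var sourceText = "html+='<\\/SCRIPT>\\n';";
console.log(sourceText);                      // html+='<\/SCRIPT>\n';
console.log(sourceText.indexOf('</') === -1); // true
```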

Richard.
 
Lasse Reichstein Nielsen

Klaus Johannes Rusch said:
Some browsers would take any occurrence of "<script" as the start tag
of another script and assume that the previous script element had not
been closed.

Do you have an example of any browser that works like that?

AFAIK, the only problem is with the first occurrence of "</" after a
"<script>" tag marking the end of the script (in practice browsers
only end it at "</script"). The solution recommended in the HTML
specification is to write "<\/" instead of "</". It results in the
same Javascript string value, but is not parsed the same by the
HTML parser.

/L
 
Klaus Johannes Rusch

Lasse said:
Do you have an example of any browser that works like that?

AFAIK, the only problem is with the first occurrence of "</" after a
"<script>" tag marking the end of the script (in practice browsers
only end it at "</script"). The solution recommended in the HTML
specification is to write "<\/" instead of "</". It results in the
same Javascript string value, but is not parsed the same by the
HTML parser.

Casual testing on a few browsers did not show any problems with the
start tag so you are probably right that only the end tag needs to be
modified, although it doesn't hurt (except for the few extra bytes) to
split both the start and the end tag, just in case :)

"<\/script>" works just as well of course.
 
Thomas 'PointedEars' Lahn

Richard said:
The script tags are broken into parts because when a browser
encounters a script element on a page it needs to know how much of
what follows to send to the javascript interpreter; it needs to know
where the closing SCRIPT tag is, so it looks for one. But HTML
parsers don't understand javascript (just HTML) so when they see -
"</script>" - in a javascript string they assume that it represents
the end of the SCRIPT element.

They send everything before the "</script>" to the javascript
interpreter, which errors with an unterminated string, and treat
everything that follows as page content. Bad news and best avoided.

It is not only "</script>" that causes problems and not only the
SCRIPT element where this peculiarity of HTML/SGML should be watched
for.

Richard said:
The reason STYLE tags (and opening script tags) are also broken in
this script is that whoever wrote it didn't understand what they were
doing, or why, and resorted to programming by mystical incantation to
deal with a problem that they had perceived but misidentified.

For an SGML parser (which should be used for HTML, as HTML is an SGML
application), *any* ETAGO (End TAG Open) delimiter ("</") is considered
the end of CDATA (a start tag of an element should not be; if it is, that
would be broken parser behavior). The contents of the HTML[1] SCRIPT
element are defined as CDATA (character data; opposed to PCDATA -- parsed
character data), so *every* ETAGO delimiter (including those of end tags
of other elements than the SCRIPT element) in it must be escaped (using
the means the embedded language provides; in ECMAScript/J[ava]Script
that is string splitting or, better in string literals, string escaping)
to prevent the element from ending too early. However, the end tags have
been split at exactly the wrong place here.
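
As a sketch of that rule: the first function below scans script text for
"</" followed by a letter, which is how HTML 4.01 (appendix B.3.2)
describes the end of CDATA content, and the second escapes every "</" the
javascript way. Both function names are invented for this example, and the
scan assumes the strict reading of the specification, not what browsers
actually do.

```javascript
// Return the index of every "</" that is directly followed by a letter,
// i.e. every place a strict parser could take as the end of the SCRIPT element.
function findEtagoPositions(scriptText) {
  var positions = [];
  var pattern = /<\/[a-zA-Z]/g;
  var match;
  while ((match = pattern.exec(scriptText)) !== null) {
    positions.push(match.index);
  }
  return positions;
}

// Turn each "</" into "<\/". Only meant for text that sits inside string
// literals, where "\/" is read as "/" and the string value is unchanged.
function escapeEtago(stringLiteralText) {
  return stringLiteralText.replace(/<\//g, '<\\/');
}

// Two statements from the original post, held as source text
// ("\\" in these literals stands for one backslash in that source).
var styleLine = "html+='</ST'+'YLE>\\n';";
var scriptLine = "html+='</'+'SCRIPT>\\n';";

console.log(findEtagoPositions(styleLine));  // [ 7 ]
console.log(findEtagoPositions(scriptLine)); // []
console.log(escapeEtago(styleLine));         // html+='<\/ST'+'YLE>\n';
console.log(findEtagoPositions(escapeEtago(styleLine))); // []
```

Under that strict reading the scan flags the "</ST" split, while the
"</'+'SCRIPT>" split is not flagged because a quote, not a letter, follows
the "</" there.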


PointedEars
___________
[1] That is different in XHTML, where it is defined as PCDATA, so you
are wise to use a <![CDATA[ ... ]]> declaration there for parts
that should not be parsed by the XML parser (e.g.: "y=x&&deg;"
=> "y=x&°;"); usually you would (re-)declare the whole script
as CDATA.
 
Richard Cornford

Thomas said:
Richard Cornford wrote:

It is not only "</script>" that causes problems and not
                                ^^^^^^
only the SCRIPT element where this peculiarity of HTML/SGML
should be watched for.

When you write "causes" you are implying that you are aware of a browser
which does follow the letter of the HTML specification. If so you should
name it because all previous discussion on this subject has failed to
reveal a single instance of a web browser that is interested in any
closing tag other than </script>.

Thomas said:
For an SGML parser (that should be used for HTML as HTML
is an SGML application), *any* ETAGO (End TAG Open) delimiter
("</") is considered the end of CDATA (a start tag of an element
should not; if it is, it would be broken parser behavior).

Why have you cut the sections of my original post that said:-

| However, by HTML specification the closing script tag is not
| the only one that should cause problems, though browsers seem
| to be universally implemented such that it is. Theoretically
| the character sequence "</" alone should mark the end of any
| script element so it is that sequence that needs to be
| unrecognisable within javascript strings to truly achieve
| safety.

- and -

| That is usually done by inserting the javascript escape character
| between the two to produce "<\/". And it should be done to any
| closing HTML tag appearing in a javascript string on an HTML page
| (imported javascript does not have this problem because it is never
| looked at by the HTML parser). The obscured closing script tag
| would become "<\/script>".

- and then re-produced essentially the same information in a form that
appears to be intended as a correction?

Thomas said:
The contents of the HTML[1] SCRIPT element are
defined as CDATA (character data; opposed to PCDATA -- parsed
character data), so *every* ETAGO delimiter (including those of end
tags of other elements than the SCRIPT element) in it must be escaped
                                                      ^^^^
<snip>

"Must" implies necessity and without a single browser actually caring
about closing tags other than </script> there is no necessity. The most
that can be accurately said is "should" (and then it should be done out
of a desire for valid HTML). Though there doesn't seem much point in
saying anything when it has already been said in the post you are
replying to (and a couple of others in this thread).

Richard.
 
