Null character and JavaScript strings

VK

A fork of the thread "FAQ Topic - How can I create a Date object from
a String?"

As it has very little to do with Date and the FAQ, I moved it into a new
thread.

Thomas Lahn is totally correct that in JavaScript the beginning and
the end of each string literal are denoted by single or double quotes. The \0
escape sequence (NUL char) is neither, so it cannot denote the end
of a string. The logic is clear, straightforward and plausible
... yet as naïve as claiming that while debugging a JavaScript program one
is exempt from power outages because ECMA-262 3rd Edition says
nothing about power outages :)

The NUL character (a.k.a. NUL terminator) does terminate strings. And the
JavaScript core engine handles strings as NUL-terminated character
arrays. And every NUL byte in a character array is automatically the
end of the string. This outcome was and is simply overlooked in the specs,
so until now each UA producer either lets it run wild or patches it at
its own discretion.

As a practical outcome, these are easy ways to creatively crash IE,
Safari (except iOS) and Opera. For peaceful programming there are
such subtle cases as
http://www.remotesynthesis.com/post.cfm/Handling-Null-Characters-in-a-String

To facilitate the promised PointedEars job I am giving the test and
its behavior for 99.9% of the current desktop UA market.

<!DOCTYPE html>
<html>
<head>
<title>Demo</title>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8">

<script>

var obj = {'f\0oo' : 'b\0ar'};

for (var x in obj) {
  window.alert(''.concat(
    'property name = ', x,
    '\n',
    'string length = ', x.length,
    '\n\n',
    'obj["f\0oo"] = ', obj['f\0oo']
  ));
}

</script>

</head>
<body>

<p>Demo</p>

</body>
</html>


Outcome:

[ Windows Vista SP2 ]

Firefox 3.6.13
property name: foo
string length: 4
obj["foo"] = bar

IE 8.0.6001
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)

Safari 5.0.3
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)

Google Chrome 10.0.648.127 beta
property name: f oo
string length: 4
obj["f oo"] = b ar
(NUL interpreted as BLANK SPACE ?)

Opera 11.01
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)


Note: iOS Safari (iPhone, iPad)
property name: foo
string length: 4
obj["foo"] = bar
(so just like Firefox)
 
Thomas 'PointedEars' Lahn

VK said:
Thomas Lahn is totally correct that in JavaScript the beginning and
the end of each string literal are denoted by single or double quotes. The \0
escape sequence (NUL char) is neither, so it cannot denote the end
of a string. The logic is clear, straightforward and plausible
... yet as naïve as claiming that while debugging a JavaScript program one
is exempt from power outages because ECMA-262 3rd Edition says
nothing about power outages :)

The NUL character (a.k.a. NUL terminator) does terminate strings.

That does not follow from your observations.

And the JavaScript core engine handles strings as NUL-terminated character
arrays.

How can you possibly know?

And every NUL byte in a character array is automatically the
end of the string.

Yes, but for this to be relevant here the premise must be true first.

This outcome was and is simply overlooked in the specs, so until now each UA
producer either lets it run wild or patches it at its own discretion.

Your logic is flawed.

As a practical outcome, these are easy ways to creatively crash IE,
Safari (except iOS) and Opera. For peaceful programming there are
such subtle cases as
http://www.remotesynthesis.com/post.cfm/Handling-Null-Characters-in-a-String

To facilitate the promised PointedEars job I am giving the test and
its behavior for 99.9% of the current desktop UA market.

Your figures are wrong. Anyhow, assuming they were right, then you would
have proven yourself wrong in at least 33% of cases (2 out of 6) below.

<!DOCTYPE html>

It would be best to start with an HTML 4.01 document. HTML5 is currently
experimental, and as HTML5 concerns both layout and script engine, through
the DOM, it would be best not to include that variable in the equation.

<html>
<head>
<title>Demo</title>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8">

<script>

See above.

var obj = {'f\0oo' : 'b\0ar'};

for (var x in obj) {
window.alert(''.concat(

Why are you including String.prototype.concat() as yet another variable
in the equation? You should have used the concatenation operator, `+',
instead.

'property name = ', x,
'\n',
'string length = ', x.length,
'\n\n',
'obj["f\0oo"] = ', obj['f\0oo']
));
}

But if you absolutely must call a method, you should have called
Array.prototype.join() to avoid having to concatenate "\n".
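
A minimal sketch of the two alternatives suggested here, written so that it runs in any ECMAScript engine without a browser. The variable names `viaOperator` and `viaJoin` are made up for illustration, and the label is written with an escaped backslash so that only the values carry a NUL. Both variants build the same message, and the embedded U+0000 survives in both.

```javascript
var obj = {'f\0oo': 'b\0ar'};
var x = 'f\0oo';

// Variant 1: the concatenation operator.
var viaOperator = 'property name = ' + x + '\n'
  + 'string length = ' + x.length + '\n\n'
  + 'obj["f\\0oo"] = ' + obj[x];

// Variant 2: Array.prototype.join(). Each element is one output line,
// so "\n" is written only once; the empty element yields the blank line.
var viaJoin = [
  'property name = ' + x,
  'string length = ' + x.length,
  '',
  'obj["f\\0oo"] = ' + obj[x]
].join('\n');

console.log(viaOperator === viaJoin);
```

In a browser test, either string could then be passed to window.alert() in place of console.log().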
</script>

</head>
<body>

<p>Demo</p>

</body>
</html>


Outcome:

[ Windows Vista SP2 ]

Firefox 3.6.13
property name: foo
string length: 4
obj["foo"] = bar

This disproves what you have stated above.

IE 8.0.6001
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)

One wonders why you have not simply done so to make sure that it is not
U+0000 that merely causes problems with outputting the next characters.
Are you trying to conceal your misconception here?

Safari 5.0.3
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)

See above.

Google Chrome 10.0.648.127 beta
property name: f oo
string length: 4
obj["f oo"] = b ar
(NUL interpreted as BLANK SPACE ?)

You have still not understood that U+0000 is a Unicode character without a
glyph.

Opera 11.01
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)

See above.

Note: iOS Safari (iPhone, iPad)
property name: foo
string length: 4
obj["foo"] = bar
(so just like Firefox)

This also disproves what you have stated, if we assume in your favor that
there was a universal "JavaScript" language implemented everywhere. (There
is not: Firefox supports Mozilla.org JavaScript; Safari Mobile — which is
the correct name for the browser — supports Apple JavaScriptCore.)


PointedEars
 
VK

U+0000 that merely causes problems with outputting the next characters

Are you proposing a new PREVENT_DISPLAY Unicode char? It could be
possible (?) for a free code position, but we are dealing with U+0000
and its NUL character a.k.a. NUL terminator. Here is an updated case
with "sweet heart" text/javascript :)

Use IE or Opera to watch this interesting case of "programmed
schizophrenia": the left hemisphere with quotes and the right
hemisphere with NUL both know exactly what a "string" is, which makes the
whole body move convulsively :)

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>NUL in Javascript strings</title>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8">
<script type="text/javascript">

var obj = {'f\0oo' : 'b\0ar'};

for (var x in obj) {
  window.alert(x);
  window.alert(x.length);
  window.alert(x.charCodeAt(1));
  window.alert(obj[x] + '|');
}

</script>

</head>
<body>
<p>NUL in Javascript strings</p>
</body>
</html>
 
Lasse Reichstein Nielsen

VK said:
Thomas Lahn is totally correct that in JavaScript the beginning and
the end of each string literal are denoted by single or double quotes. The \0
escape sequence (NUL char) is neither, so it cannot denote the end
of a string. The logic is clear, straightforward and plausible
... yet as naïve as claiming that while debugging a JavaScript program one
is exempt from power outages because ECMA-262 3rd Edition says
nothing about power outages :)

The NUL character (a.k.a. NUL terminator) does terminate strings.

Sometimes it does, sometimes it doesn't. In C it does. In C++,
it might not (std::string isn't NUL-terminated). In ECMAScript
it doesn't.

And the JavaScript core engine handles strings as NUL-terminated character
arrays.

No. Since you say JavaScript, I'm guessing you're thinking of either
SpiderMonkey, JaegerMonkey or Rhino, which are JavaScript
implementations. None of these use NUL-termination internally.
I doubt you can find any spec-compliant ECMAScript implementation that
does.

And every NUL byte in a character array is automatically the
end of the string.

In a C string, yes.

This outcome was and is simply overlooked in the specs,
so until now each UA producer either lets it run wild or patches it at
its own discretion.

The spec is quite clear. NUL characters have no special meaning in
any ECMAScript string operation.

As a practical outcome, these are easy ways to creatively crash IE,
Safari (except iOS) and Opera. For peaceful programming there are
such subtle cases as
http://www.remotesynthesis.com/post.cfm/Handling-Null-Characters-in-a-String

It seems some browsers' host functions use NUL-termination internally.
That might be a silly choice, but it's got nothing to do with ECMAScript.

....
var obj = {'f\0oo' : 'b\0ar'};

for (var x in obj) {
window.alert(''.concat(
....

If you check it, I think you'll find that the culprit is the alert
function. The concat function will create strings with embedded
NUL characters without any problem.
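
This can be checked with no output function that might truncate. A small sketch, runnable in any ECMAScript engine (console.log is assumed to be available; the variable names are only illustrative):

```javascript
// Build a string with an embedded NUL via String.prototype.concat().
var s = ''.concat('f', '\0', 'oo');

console.log(s.length);         // 4: the NUL counts as one code unit
console.log(s.charCodeAt(1));  // 0
console.log(s.charAt(3));      // "o": characters after the NUL are still there

// Property names are ordinary strings too: 'f\0oo' and 'f' are different keys.
var obj = {'f\0oo': 'b\0ar'};
console.log(obj.hasOwnProperty(s));    // true
console.log(obj.hasOwnProperty('f'));  // false
console.log(obj[s].length);            // 4
```
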
Outcome:

[ Windows Vista SP2 ]

Firefox 3.6.13
property name: foo
string length: 4
obj["foo"] = bar

Alert doesn't get confused by embedded NULs.

IE 8.0.6001
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)

Alert does get confused by embedded NULs.

Google Chrome 10.0.648.127 beta
property name: f oo
string length: 4
obj["f oo"] = b ar
(NUL interpreted as BLANK SPACE ?)

It's printed as a blank space. For a non-printable character, I guess
that's a valid choice. Sure beats ^@.

/L
 
VK

Sorry to disappoint you. To make things clearer I've made a new demo;
see my answer to Thomas.

Oh, "does get", affirmative. You mean other methods with string arg
don't?
 
Lasse Reichstein Nielsen

VK said:
Are you proposing a new PREVENT_DISPLAY Unicode char?

What makes you think so?

It could be possible (?) for a free code position, but we are
dealing with U+0000 and its NUL character a.k.a. NUL
terminator.

It's not also known as NUL terminator. The code point is just the
NUL character.

Here is an updated case with "sweet heart" text/javascript :)

Use IE or Opera to watch this interesting case of "programmed
schizophrenia": the left hemisphere with quotes and the right
hemisphere with NUL both know exactly what a "string" is, which makes the
whole body move convulsively :)

So you have figured out that the alert function calls into
non-JavaScript code, and that code stops printing the string content
at the first NUL character?
And it has nothing to do with ECMAScript strings?

var obj = {'f\0oo' : 'b\0ar'};

for (var x in obj) {
window.alert(x);
window.alert(x.length);
window.alert(x.charCodeAt(1));
window.alert(obj[x] + '|');
}

Yes, the alert truncates the output at the NUL character. The string
does not.
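
One way to keep a truncating output function out of the picture is to make the NUL visible before displaying the string. A minimal sketch; `showNul` is a hypothetical helper name, not part of any standard API, and console.log stands in for whatever output function is under test:

```javascript
// Replace every U+0000 with the two visible characters "\0" so that an
// output function which stops at NUL can no longer hide the rest.
function showNul(s) {
  return String(s).replace(/\u0000/g, '\\0');
}

var obj = {'f\0oo': 'b\0ar'};
for (var x in obj) {
  console.log(showNul(x));             // f\0oo
  console.log(x.length);               // 4
  console.log(x.charCodeAt(1));        // 0
  console.log(showNul(obj[x]) + '|');  // b\0ar|
}
```

If the escaped text shows up in full where the raw string was cut off, the truncation happened in the output path and not in the string.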

/L
 
Lasse Reichstein Nielsen

VK said:
Oh, "does get", affirmative. You mean other methods with string arg
don't?

I'm sure there are other *host* functions that do the same thing
(e.g., confirm), and probably some DOM functions too. A quick check
shows that:
var d = document.createElement("div");
var s = "ab\0cd";
d.setAttribute("class", s);
alert([d.className == "ab", d.className == s, s.length]);
// alerts [true, false, 5] in Opera.

The language-specified string operations work on ECMAScript strings,
and they don't care about NUL-termination.
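
A short sketch of that point, using only language-level operations (console.log is the one host function, used purely for output), so it should behave the same in any conforming engine; the variable name is illustrative:

```javascript
var s = 'ab\0cd';

console.log(s.length);                // 5
console.log(s.indexOf('cd'));         // 3: searching continues past the NUL
console.log(s.slice(3));              // "cd"
console.log(s.split('\0').length);    // 2: NUL works as an ordinary separator
console.log(s === 'ab');              // false: no truncation on comparison
console.log(s.toUpperCase().length);  // 5
```
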

The external DOM functions (including DOM-0's alert) are either not specified
in terms of ECMAScript strings or (in HTML 5) don't say what to do
if the string argument contains non-printable characters.
I.e., the browser behaviors aren't in violation of any specification.

/L
 
John G Harris

On Fri, 4 Mar 2011 at 14:33:00, in comp.lang.javascript, VK wrote:

The NUL character (a.k.a. NUL terminator) does terminate strings. And the
JavaScript core engine handles strings as NUL-terminated character
arrays. And every NUL byte in a character array is automatically the
end of the string. This outcome was and is simply overlooked in the specs,
so until now each UA producer either lets it run wild or patches it at
its own discretion.

Outcome:

[ Windows Vista SP2 ]

Firefox 3.6.13
property name: foo
string length: 4
obj["foo"] = bar

IE 8.0.6001
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)
<snip>

All you've proved for sure is that the Windows API for MessageBox uses
C-style strings, terminated by Nul, which we already knew. This is not
very interesting except when you are using alert for debugging or
testing.

Unless you inspect the numeric values of the characters in your strings
you cannot deduce anything about the browser's ECMAScript or DOM
behaviours.
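
A minimal sketch of such an inspection; `charCodes` is a made-up helper name. It lists the numeric value of every code unit, so the result does not depend on how a display function treats U+0000:

```javascript
// Return the UTF-16 code unit values of a string as an array of numbers.
function charCodes(s) {
  var codes = [];
  for (var i = 0; i < s.length; i++) {
    codes.push(s.charCodeAt(i));
  }
  return codes;
}

var v = {'fo\0o': 'hello'};
for (var name in v) {
  console.log(charCodes(name).join(' '));  // 102 111 0 111
}
```

The joined result contains only digits and spaces, so in a browser it can be passed to alert without any risk of truncation.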

The simplest explanation for the difference between Firefox and IE8 is
that Firefox edits text before sending it to a Windows display API and
IE does not. As there is no standard for alert this difference is not
surprising.


By the way, a simple way to test the Nul behaviour is to use the pseudo
URL :

javascript: var v = { 'fo\0o': 'hello' }; alert( v ['fo\0o' ] );

and appropriate variations on this.


John
 
VK

All you've proved for sure is that the Windows API for MessageBox uses
C-style strings, terminated by Nul, which we already knew. This is not
very interesting except when you are using alert for debugging or
testing.

In that case it is not the Windows API but IE, Safari and Opera for Windows
(equal behavior). Their window.alert dialogs differ too much in look and feel
and functionality, so I would speculate that the differences lie with the
development teams: some use direct C calls (IE, Safari, Opera), some a C++
tier (Firefox, Chrome).

<...>
 
VK

In that case it is not the Windows API but IE, Safari and Opera for Windows
(equal behavior). Their window.alert dialogs differ too much in look and feel
and functionality, so I would speculate that the differences lie with the
development teams: some use direct C calls (IE, Safari, Opera), some a C++
tier (Firefox, Chrome).

And at least Opera 11.02 shows a lone "f" on the page here, so it is
not about the Windows API for MessageBox:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>NUL in Javascript strings</title>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8">
<script type="text/javascript">

window.onload = function() {
  var out = document.getElementById('out');
  var obj = {'f\0oo' : 'b\0ar'};
  for (var x in obj) {
    out.innerHTML = x;
  }
}
</script>

</head>
<body>
<p id="out">NUL in Javascript strings</p>
</body>
</html>
 
Thomas 'PointedEars' Lahn

VK said:
Are you proposing a new PREVENT_DISPLAY Unicode char?

No. The original paragraph went:

You really want to stop mutilating quotes. (In German-speaking Usenet
we call this inappropriate behavior "quotemardern", from the Steinmarder
[Beech/Stone Marten, Martes foina] that frequently mutilates our cars.)

[…] we are dealing with U+0000 and its NUL character

It is not. The name for the character U+0000 in the Unicode Character
Database is "NULL".

a.k.a. NUL terminator.

There is no such thing. There is the NUL character, in US-ASCII and its
super-character sets.

Use IE or Opera to watch this interesting case of "programmed
schizophrenia": […]

You do not get it, do you? I am/we are not interested in your
fairy^Wtheories. I am/we are interested in your results, since I/some
cannot test on Windows (ATM). Presenting test code without presenting
results, and without saying where exactly the results can be reproduced,
goes a long way to have the statement ignored as yet another rubbish from
you.


PointedEars
 
Thomas 'PointedEars' Lahn

John said:
VK said:
[ Windows Vista SP2 ]

Firefox 3.6.13
property name: foo
string length: 4
obj["foo"] = bar

IE 8.0.6001
property name: f
(the rest of alert string is "swallowed" by NUL char, separate alerts
needed)
<snip>

All you've proved for sure is that the Windows API for MessageBox uses
C-style strings, terminated by Nul, which we already knew. This is not
very interesting except when you are using alert for debugging or
testing.

Unless you inspect the numeric values of the characters in your strings
you cannot deduce anything about the browser's ECMAScript or DOM
behaviours.

The simplest explanation for the difference between Firefox and IE8 is
that Firefox edits text before sending it to a Windows display API and
IE does not. As there is no standard for alert this difference is not
surprising.

An even simpler explanation that is likely to be the correct one here is
that IE uses the Windows API for its chrome while Firefox uses Gecko
instead. Several other issues with IE/MSHTML that do not exist in
Firefox/Mozilla, like that in IE/MSHTML `select' elements always have the
highest z-index in their frame (and are CSS z-index ignorant) because they
are so-called "Windowed Controls", are caused by this difference.


PointedEars
 
VK

I/some cannot test on Windows (ATM).

Here is a new test case w/o window.alert but with innerHTML and innerText/
textContent. Results are below the test. If you think of another test
case to run, I will do it. For tests I have default installs of Windows
Vista SP2 and Windows XP SP3.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>NUL in Javascript strings</title>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8">
<script type="text/javascript">

var text = (/*@cc_on true || @*/ false) ?
  'innerText' : 'textContent';

window.onload = function() {

  var out11 = document.getElementById('out11');
  var out12 = document.getElementById('out12');
  var out21 = document.getElementById('out21');
  var out22 = document.getElementById('out22');

  var obj = {'f\0oo' : 'b\0ar'};

  for (var x in obj) {
    out11.innerHTML = x;
    out12.innerHTML = obj[x];

    out21[text] = x;
    out22[text] = obj[x];
  }

}
</script>

</head>
<body>
<p><span id="out11"></span> = <span id="out12"></span></p>
<p><span id="out21"></span> = <span id="out22"></span></p>
</body>
</html>

Results:

[ Windows Vista SP2 ]

Firefox 3.6.13
[] for "no glyph char" on display
f[]oo = b[]ar
f[]oo = b[]ar

IE 8.0.6001
foo= bar
f= b

Safari 5.0.3
display:
foo = bar
foo = bar
copy-paste:
foo = bar
f

Anything after the NUL is displayed and selectable but cannot be copied
(or can be copied but does not get pasted, God knows). Put another way:
a new bulletproof "copy-paste protection", but for Safari only.

Google Chrome 10.0.648.127 beta
foo = bar
foo = bar

Opera 11.01
f = b
f = b
 
Thomas 'PointedEars' Lahn

VK said:
Here is a new test case w/o window.alert but with innerHTML and innerText/
textContent. Results are below the test. If you think of another test
case to run, I will do it.

No, although its methodology is unsurprisingly questionable, I think this
one suffices to prove you wrong. Thanks.


PointedEars
 
Alter Ego

VK said:
On Mar 5, 9:53 pm, Thomas 'PointedEars' Lahn wrote:

<snip BS and nonsense>

People, for Christ's sake, stop this shit. What's the big issue if a fucking
string is NUL terminated or what? Are you nuts, guys? Hold your peace, have
some beer and stop bloating the Usenet with such crappy threads.
 
