"heredoc" in JavaScript

Matt Kruse

In my Greasemonkey code written specifically for Firefox, I use this
"heredoc" syntax a lot:

var myBigBlob = (<><![CDATA[

... insert a bunch of free-form text here ...

]]></>).toString();

Now that Chrome offers built-in Greasemonkey support, I'd like to
support it as well, but it breaks on this syntax.

Does anyone know of a better method that will work in both browsers?

Matt Kruse
 
Thomas 'PointedEars' Lahn

Matt said:
In my Greasemonkey code written specifically for Firefox, I use this
"heredoc" syntax a lot:

var myBigBlob = (<><![CDATA[

... insert a bunch of free-form text here ...

]]></>).toString();

Now that Chrome offers built-in Greasemonkey support, I'd like to
support it as well, but it breaks on this syntax.

Does anyone know of a better method that will work in both browsers?

You mean in both *ECMAScript implementations* (Mozilla.org Gecko's
SpiderMonkey/TraceMonkey, and Google Chrome's V8).

The syntax you have used to date apparently makes use of a bug in Gecko's
ECMAScript for XML (E4X, ECMA-357) implementation. V8 breaks on this
syntax because syntactically valid ECMAScript for XML is expected, and `<>'
is not well-formed XML:

,-<http://www.w3.org/TR/REC-xml/#sec-starttags>
|
| [40] STag ::= '<' Name (S Attribute)* S? '>'
| [...]
| [5] Name ::= NameStartChar (NameChar)*

(As you can see, the `Name' is not optional.)

In particular, `<>' cannot be produced by the following productions in E4X,
section 8.3:

| InputElementXMLTag ::
| XMLTagCharacters
| XMLTagPunctuator
| XMLAttributeValue
| XMLWhitespace
| {
| [...]
| XMLTagCharacters ::
| SourceCharacters but no embedded XMLTagPunctuator
| or left-curly { or quote ' or double-quote " or forward-slash /
| or XMLWhitespaceCharacter
| [...]
| XMLTagPunctuator :: one of
| =
| >
| />

Therefore, the following should work with both implementations:

var myBigBlob = (<foo><![CDATA[
...
]]></foo>).toString();

(see also ECMA-357, 10.1.1). This also points out that the text in-between
is _not_ free-form; it must not contain `]]>' if the code following it is
not well-formed.
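
To make the `]]>' restriction concrete, here is a minimal sketch in plain JavaScript (no E4X required). The helper names `isCdataSafe' and `wrapInCdata' are hypothetical and not part of any library. The splitting trick is the common XML idiom of ending one CDATA section in the middle of the terminator and opening a new one; it is an assumption that this suits your data, not something ECMA-357 prescribes.

```javascript
// Hypothetical helper: true if `text` can sit inside a single CDATA section,
// i.e. it does not contain the terminator sequence "]]>".
function isCdataSafe(text) {
  return text.indexOf("]]>") === -1;
}

// Hypothetical helper: wrap `text` in CDATA markup, splitting every "]]>"
// across two adjacent sections so that no section contains the terminator.
function wrapInCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}
```

A blob for which `isCdataSafe' returns false cannot be pasted verbatim between `<![CDATA[' and `]]>'.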


PointedEars
 
Matt Kruse

You mean in both *ECMAScript implementations* (Mozilla.org Gecko's
SpiderMonkey/TraceMonkey, and Google Chrome's V8).

blah blah blah
The syntax you have used to date apparently makes use of a bug in Gecko's
ECMAScript for XML (E4X, ECMA-357) implementation.  V8 breaks on this
syntax because syntactically valid ECMAScript for XML is expected
No.

Therefore, the following should work with both implementations:
  var myBigBlob = (<foo><![CDATA[
    ...
  ]]></foo>).toString();

Did you try it? It doesn't work.

Matt Kruse
 
Thomas 'PointedEars' Lahn

Matt said:
blah blah blah

I like you, too.

Then I am at a loss to explain why it breaks in Chrome, unless V8 does not
support E4X.
Therefore, the following should work with both implementations:
var myBigBlob = (<foo><![CDATA[
...
]]></foo>).toString();

Did you try it?

No. I would have worded my reply differently if I had the opportunity to
try it, wouldn't I?
It doesn't work.

Perhaps "it doesn't work" (read the FAQ Notes about proper error reporting)
because Chrome's V8 does not support E4X to begin with, and Greasemonkey
has to use the underlying script engine. In that case you would need to
find another way, such as escaped newlines in string literals, provided
V8 already supports them.


PointedEars
 
Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
Then I am at a loss to explain why it breaks in Chrome, unless V8 does not
support E4X.

It doesn't. Nor does SquirrelFish, Trident or Carakan (so far).
E4X is, afaik, only implemented by Mozilla and Adobe.
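
Since support differs per engine, a script can probe for E4X at run time instead of sniffing the browser. This is only a sketch: the helper name `supportsE4X' is hypothetical, and it assumes the environment allows `new Function' (some restricted contexts may not, in which case the probe simply reports false).

```javascript
// Hypothetical helper: reports whether the running engine parses E4X.
// The XML literal is kept inside a string so that an engine without E4X
// throws a catchable SyntaxError instead of rejecting the whole script.
function supportsE4X() {
  try {
    new Function("return <a/>;");
    return true;
  } catch (e) {
    return false;
  }
}

var hasE4X = supportsE4X();
```

Any code that actually uses XML literals must then live in a separately compiled string or file, because one literal in the main script is enough to stop it from parsing in an engine without E4X.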

And XML sucks, so I hope it stays that way! :)

/L 'If you are writing XML by hand, you are doing something wrong!'
 
Garrett Smith

Matt said:
On Feb 2, 12:23 pm, Thomas 'PointedEars' Lahn wrote:
[snip]
Therefore, the following should work with both implementations:
var myBigBlob = (<foo><![CDATA[
...
]]></foo>).toString();

Did you try it? It doesn't work.

A SyntaxError should be expected, right?

In Chrome I get the error:
Refused to execute a JavaScript script. Source code of script found
within request.

Tracemonkey/Spidermonkey parses the XML and executes the code. It fails
to correctly produce a syntax error, as specified in Ecma-357, and fails
on that account.

http://code.google.com/p/chromium/issues/detail?id=30975
 
Thomas 'PointedEars' Lahn

Lasse said:
It doesn't.

I have since found a Chrome 4.0.249.43 package for Debian and can confirm
it for its V8.
Nor does SquirrelFish, Trident or Carakan (so far).

By contrast, Trident is not a script engine. But JScript 5.x does not
support E4X, indeed.
E4X is, afaik, only implemented by Mozilla and Adobe.
ACK

And XML sucks, so I hope it stays that way! :)

/L 'If you are writing XML by hand, you are doing something wrong!'

You are joking, right?


PointedEars
 
Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
Lasse Reichstein Nielsen wrote:
By contrast, Trident is not a script engine. But JScript 5.x does not
support E4X, indeed.

Ah, yes. JScript, of course.
You are joking, right?

No, not really.
XML is needlessly verbose and yet not very readable. It's fine for
interoperability between systems, as a serialization format for
structured data. It's the least common denominator of structured
serialization formats. But writing it by hand, or even just editing,
should only be done for debugging purposes. At all other times, use
an editor that actually understands the data being edited, and create
the XML from that.
And that's just for plain XML. If it has to match a schema too, it
just got even easier to make an editing mistake.

/L
 
Michael Haufe (\TNO\)

The syntax you have used to date apparently makes use of a bug in Gecko's
ECMAScript for XML (E4X, ECMA-357) implementation.  V8 breaks on this
syntax because syntactically valid ECMAScript for XML is expected, and `<>'
is not well-formed XML:

"<>"..."</>" is an XMLList initializer described under ECMA-357
section 11.1.5, so I don't see how it's a bug.
 
Thomas 'PointedEars' Lahn

Michael said:
"<>"..."</>" is an XMLList initializer described under ECMA-357
section 11.1.5, so I don't see how it's a bug.

Thank you, that is correct. Obviously now, E4X is more complex than
I/we thought. The relevant productions here are:

| PrimaryExpression :
| PropertyIdentifier
| XMLInitialiser
| XMLListInitialiser
|
| [...]
| XMLListInitialiser :
| < > XMLElementContent_opt </ >
|
| [...]
| XMLElementContent :
| XMLMarkup XMLElementContent_opt
| XMLText XMLElementContent_opt
| XMLElement XMLElementContent_opt
| { Expression } XMLElementContent_opt
|
| [...]
| XMLMarkup ::
| XMLComment
| XMLCDATA
| XMLPI
|
| [...]
| XMLCDATA ::
| <![CDATA[ XMLCDATACharacters_opt ]]>
|
| XMLCDATACharacters ::
| SourceCharacters but no embedded sequence ]]>

And

| 10.1.2 ToString Applied to the XMLListType
|
| [...]
| Note that the result of calling ToString on a list of size one is
| identical to the result of calling ToString on the single item
| contained in the XMLList. This treatment intentionally blurs the
| distinction between a single XML object and an XMLList containing
| only one value to simplify the programmer’s task. It allows E4X
| programmers to access the value of an XMLList containing only a
| single primitive value in much the same way they access object
| properties.

makes sure that if E4X were supported, Matt's approach would work.
In fact, it should even be possible to omit the parentheses (WFM
in Gecko/TraceMonkey 1.9.1.6).


PointedEars
 
Thomas 'PointedEars' Lahn

Michael said:
<offtopic>

It's not.
Of course, E4X in its current form is considered a mistake by many.

By whom, and do their voices count?
I wouldn't be surprised if there are no other
implementations in the future.

I would:
engine/browse_thread/thread/6566b430328bc3ef/9f9005c6525464c6

That's *one* and the same voice (Brendan Eich) in both discussions. One
with some weight, no doubt, but not one that provides any indication that
other vendors would not implement what is already implemented elsewhere (in
JavaScript and ActionScript). In fact, the more it is used in Mozilla, and
cannot be used elsewhere (as in this Greasemonkey example), the greater the
pressure for other vendors to do something about that.


PointedEars
 
Thomas 'PointedEars' Lahn

Stefan said:
Matt said:
In my Greasemonkey code written specifically for Firefox, I use this
"heredoc" syntax a lot:

var myBigBlob = (<><![CDATA[

... insert a bunch of free-form text here ...

]]></>).toString();

Now that Chrome offers built-in Greasemonkey support, I'd like to
support it as well, but it breaks on this syntax.

Does anyone know of a better method that will work in both browsers?

Sorry, I don't. I've always wondered why multi-line string literals
aren't possible in JavaScript. AFAICS, there's nothing ambiguous about
this syntax:

var heredoc = "I am a multi-line
string; I end when the tokenizer
sees a double quote";

Nothing ambiguous? To begin with, should that be the equivalent of

var heredoc = "I am a multi-line string; I end when the tokenizer sees a
double quote";

or

var heredoc = "I am a multi-linestring; I end when the tokenizersees a
double quote";

? Besides, you forgot to consider `\"' in your description.
This is standard practice in many other programming languages.

All other programming languages that I know to support this except PHP
require special syntax there. So which "many other programming languages"
are you referring to?
Maybe ES4 had something like this in the queue, but with ES5 geared
towards backwards compatibility, we probably won't see it for a long
time.

You are not making sense, see below.
The two usual workarounds are

var str = "I wish\n"
+ "I was\n"
+ "a multi-line string";

Joining an Array of strings on newlines appears to be easier to maintain:

var str = [
"I wish",
"I was",
"a multi-line string"
].join("\n");
and

var str = "I am the closest thing\
to multi-line strings that we can get\
with JavaScript";

That is not equivalent, of course, as the leading spaces are part of the
string value.
I don't know how well supported the second version is.

I have a fair idea. My tests indicate that this feature, first specified
in ECMAScript Edition 5, has been supported since JavaScript 1.8.1 (at least),
JScript 5.1.5010, V8 1.3 (at least), JavaScriptCore 525.13 (at least),
Opera ECMAScript 10.10 (not before), and KJS 4.3.2 (at least).

The next release of the ECMAScript Support Matrix will feature that test
case.


PointedEars
 
John G Harris

I've always wondered why multi-line string literals
aren't possible in JavaScript. AFAICS, there's nothing ambiguous about
this syntax:

var heredoc = "I am a multi-line
string; I end when the tokenizer
sees a double quote";

This is standard practice in many other programming languages.
<snip>

In C++ you would write

var heredoc = "I am a multi-line "
"string; I end when the tokenizer "
"sees a double quote";

or

var heredoc = "I am a multi-line\n"
"string; I end when the tokenizer\n"
"sees a double quote";

Note the double quotes at both ends of the string parts. This way, you
know exactly which whitespace chars belong to the string parts and which
are just there for layout.

John
 
Thomas 'PointedEars' Lahn

Stefan said:
Fair enough, "unescaped double quote", then.

The example was meant to include all contained white space. Its result
would be "(...) multi-line\n string (...)", because it was indented.

ACK. Sorry, trimming the leading spaces when quoting is a bug of my
newsreader that I should have been aware of.
Some examples are Perl, PHP, Ruby, and Bash scripts. There are others.

Perl and Ruby I do not know well enough. Bash (in fact, all Bourne-shell
compatible shells, if I am not mistaken) I know well, but I forgot about
them.

Still, it remains to be seen whether there are enough programming languages
that do not require special syntax to justify your "many". For example, it
does not apply to the following languages I know rather well: BASIC (and
variants), Pascal (variants and derivatives), C (variants and derivatives),
Tcl (and derivatives), Java, and Python. (And maybe I forgot some.)
The two usual workarounds are

var str = "I wish\n"
+ "I was\n"
+ "a multi-line string";

Joining an Array of strings on newlines appears to be easier to
maintain:

var str = [
"I wish",
"I was",
"a multi-line string"
].join("\n");

I guess that's a matter of preference. I think the concatenation is more
readable.

Only if you align `=' and `+'; but then you waste characters and have to
take care not to forget a trailing "\n".
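
The trade-off can be checked directly. The following sketch reuses the two workarounds quoted above and adds a third, deliberately broken variant (the variable names are made up for illustration) to show the failure mode: one forgotten "\n" is still valid code but silently fuses two lines.

```javascript
// Concatenation: every line except the last needs its own "\n".
var viaConcat = "I wish\n"
              + "I was\n"
              + "a multi-line string";

// Array join: the separator is stated exactly once.
var viaJoin = [
  "I wish",
  "I was",
  "a multi-line string"
].join("\n");

// Concatenation with one "\n" forgotten: valid code, wrong value.
var forgotten = "I wish\n"
              + "I was"
              + "a multi-line string";

var sameValue = (viaConcat === viaJoin);
var lineCount = forgotten.split("\n").length; // fewer lines than intended
```

Both correct forms yield the same string value; which one reads better remains a matter of preference.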
It's also faster,

It is not generally faster.
and concatenation of literals can be optimized away by the parser.
True.


It's also different text, so of course they aren't equivalent. The
spaces are unimportant, I was talking about the line breaks.

But the spaces are important: to get an equivalent value (regardless of
the words) you need to write

var str = "I am the closest thing\
to multi-line strings that we can get\
with JavaScript";

which is harder to read.
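
A minimal sketch of that effect, assuming an implementation that supports the backslash-newline continuation as specified in ECMAScript Edition 5 (the variable names are made up for illustration):

```javascript
// Continuation line starts in column 0: no extra characters enter the value.
var flush = "I am the closest thing \
to multi-line strings";

// Continuation line indented by four spaces: those spaces are part of the value.
var indented = "I am the closest thing \
    to multi-line strings";

var extra = indented.length - flush.length; // number of stray characters
```

So either the continuation lines break the indentation of the surrounding code, or the indentation leaks into the string.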
I've never seen a browser where it didn't work,

It does not appear to work in Opera before version 10.10, so if what you
say is true, you must have never tested with Opera (10.10 is currently the
latest version).

In which browsers (other than those using the implementations above) do you
remember this to have worked, i.e. that

var s = "foo\
bar";

would be equivalent to

var s = "foobar";

?


PointedEars
 
Thomas 'PointedEars' Lahn

John said:
<snip>

In C++ you would write

var heredoc = "I am a multi-line "
"string; I end when the tokenizer "
"sees a double quote";

No, that is not C++ code.
or

var heredoc = "I am a multi-line\n"
"string; I end when the tokenizer\n"
"sees a double quote";

Nor is this.

Perhaps you meant

std::string heredoc = "I am a multi-line"
" string; I end when the tokenizer"
" sees a double quote";
Note the double quotes at both ends of the string parts. This way, you
know exactly which whitespace chars belong to the string parts and which
are just there for layout.

And by contrast it requires *special syntax*, so Stefan's assumption does
_not_ apply here. Thank you for the counter-example, albeit a bogus one.


PointedEars
 
John G Harris

No, that is not C++ code.

You time-waster. It's perfectly valid C++ code. var is obviously a
type-name, for a type having a constructor with the signature
var(const char * )

Nor is this.
ditto.


Perhaps you meant

std::string heredoc = "I am a multi-line"
" string; I end when the tokenizer"
" sees a double quote";

That's another example, but not the one I was using.

And by contrast it requires *special syntax*, so Stefan's assumption does
_not_ apply here. Thank you for the counter-example, albeit a bogus one.

But Stefan was suggesting additional special syntax!

Score adjusted.

John
 
Thomas 'PointedEars' Lahn

John said:
You time-waster.

Pot, kettle, black.
It's perfectly valid C++ code.

No, it is not.
var is obviously a type-name, for a type having a constructor with the
signature
var(const char * )

You really are pitiable.

Given the existence of preprocessor macros, almost any piece of junk can be
made to compile by a C++ compiler provided there are further definitions
and declarations. Your code *as it is*, however, is clearly not C++ code;
that is, code that compiles *as it is* (in a main() function) without
syntax error messages.

Get a life.
That's another example, but not the one I was using.

Yes, and by contrast this example compiles as it is (in a main() function).
But Stefan was suggesting additional special syntax!

No, he suggested that there are "many other programming languages" that
support simple string literals with newlines in them.
Score adjusted.

YMMD.


PointedEars
 
Thomas 'PointedEars' Lahn

kangax said:
Thomas said:
Stefan said:
[...]
var str = "I am the closest thing\
to multi-line strings that we can get\
with JavaScript";

That is not equivalent, of course, as the leading spaces are part of the
string value.
I don't know how well supported the second version is.

I have a fair idea. My tests indicate that this feature first specified
in ECMAScript Edition 5 is supported since JavaScript 1.8.1 (at least),
JScript 5.1.5010, V8 1.3 (at least), JavaScriptCore 525.13 (at least),
Opera ECMAScript 10.10 (not before), and KJS 4.3.2 (at least).

It certainly does work in Opera <10.10.

If by that you mean that it works in some versions before Opera 10.10,
then you are correct. It certainly does not work in all of them.
[...]
However resulting string definitely varies among implementations. In
Opera 7.54, for example:

var x = 'a\
b';

x === 'a\u000Ab'; // true

I rest my case. The correct value would have been `false'.
Note how there's a line feed (U+000A) in between 2 characters.

Tell me something I don't know yet.
In Opera 8.01 (and later), `x` already evaluates to 'ab'.

Interesting. Apparently it worked in some versions between those in which
I have tested this feature so far (5.02, 6.06, 10.10). More fine-tuned
testing is clearly indicated. However, there can be no doubt that
this feature is not safe for general use for the time being.
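
Given these differences, a script that wants to rely on the feature could probe for it first. The following is only a sketch: the helper name `lineContinuationBehavior' and its return labels are hypothetical, and it assumes that `new Function' is available. It distinguishes the three outcomes discussed in this thread.

```javascript
// Hypothetical helper: classify how the engine treats a backslash followed
// by a newline inside a string literal. The literal is compiled from a
// string so that an engine rejecting it throws a catchable error.
function lineContinuationBehavior() {
  var value;
  try {
    // Function body is: return 'a\<LF>b';
    value = new Function("return 'a\\\nb';")();
  } catch (e) {
    return "unsupported";      // the literal did not parse
  }
  if (value === "ab") {
    return "es5";              // continuation contributes nothing
  }
  if (value === "a\nb") {
    return "keeps-newline";    // as reported above for Opera 7.54
  }
  return "other";
}
```

Only the "es5" result means that `"foo\' followed by `bar"' on the next line is equivalent to "foobar".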


PointedEars
 
