Thomas 'PointedEars' Lahn said:
Lasse Reichstein Nielsen wrote:
That is the flaw in your argumentation. You distinguish between begin and
end of the process where you should distinguish between input/output pairs.
There are languages, if seen as a whole, that are not either compiled or
interpreted but both (first) compiled and (then) interpreted, as it is e.g.
with JS, and with Perl (ref. `man 1 perlcompile').
As you later point out, that would be a compiler for Perl and an
interpreter for Perl parse trees. I say it's just Perl having
a parser for Perl available that is also the same parser being
used by the Perl interpreter.
But that is beside my point. There are many ways to run a program.
You can either make an interpreter to run it, or you can make a
compiler to another language, and then use an interpreter for that
language (the second language can be machine code with a CPU doing
the interpretation).
Also, there are many ways to make an interpreter for a language (which
is what I call something that takes a program in that language as
input and then runs it). All interpreters of source code (text based)
programs will parse the source and have some internal representation.
That internal representation can be simple data structures, parse
trees (like Perl) or some kind of byte code. You might even call the
parser a "compiler", but that does not change that it is just a part
of an interpreter: something taking a program as input and running it.
For a language to be compiled, I require that the result of the
compilation can be meaningfully stored and later run as a program of
its own. If it can't, then the "compiled program" is simply an
intermediate result of the interpreter (or whatever program it is part
of).
A compiler transforms a program in one language into another form,
which is probably another language. That other form can be native
code, byte code (e.g. Java or OCAML) or some other language source
code (e.g., assembler as from "gcc -S" or the language itself as from
"gcc -E"). You can store this other program and run it later, several
times, without having to recreate it from the source code each time.
Looking at input/output pairs (if I correctly understand what you mean
by that), an interpreter takes a program as input and gives no output
(except whatever I/O the program itself might do). A compiler takes a
program as input and gives another program as output, in another
language or format. If we ignore what happens inside, that's the I/O
characteristics of an interpreter and a compiler. More importantly, a
compiler doesn't run the program it processes, an interpreter does.
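
To make the I/O difference concrete, here is a minimal sketch. The toy language (space-separated reverse-Polish arithmetic) and the names interpret, compile and runLater are made up for illustration only; they come from no real implementation.

```javascript
// Hypothetical toy language: space-separated reverse-Polish arithmetic.

// Interpreter: takes a program as input and runs it. The only "output"
// is whatever the program itself produces (here, its final value).
function interpret(source) {
  const stack = [];
  for (const token of source.trim().split(/\s+/)) {
    if (token === "+" || token === "*") {
      const b = stack.pop();
      const a = stack.pop();
      stack.push(token === "+" ? a + b : a * b);
    } else {
      stack.push(Number(token));
    }
  }
  return stack.pop();
}

// Compiler: takes a program as input and returns another program,
// here JavaScript source text. It never runs the program itself.
function compile(source) {
  const stack = [];
  for (const token of source.trim().split(/\s+/)) {
    if (token === "+" || token === "*") {
      const b = stack.pop();
      const a = stack.pop();
      stack.push("(" + a + " " + token + " " + b + ")");
    } else {
      stack.push(String(Number(token)));
    }
  }
  return "return " + stack.pop() + ";";
}

const program = "2 3 + 4 *";
const compiled = compile(program);       // a string that could be saved to a file
const runLater = new Function(compiled); // some JavaScript engine runs it later
```

The string in compiled can be stored and run any number of times without going back to the toy source, which is what I require of a compiler's output. Running it still takes an interpreter for the target language.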
Actually, it always requires an interpreter to *run* a program.
Through compilation, you can get to the point where that interpreter
is in the CPU, or you can stop earlier and have your own interpreter
(which is run by the CPU).
[...]
I'd say that JavaScript was designed for interpretation rather than
compilation.
Apparently you did not understand what I wrote. JavaScript[tm] was
*designed* as a language specifying human-readable source code to be
compiled into bytecode and then have this bytecode interpreted by a
Virtual Machine in the first place (which makes it cross-platform).
Where do you find this (that JavaScript was designed for being
parsed into bytecode)?
It is designed for ease of writing and being cross-platform, I can
agree on as much, but whether the actual implementation of it should
use bytecode or not is not something I can read from the link you
give, nor from the ECMAScript standard.
It was not specified as source code that has to be JIT-compiled,
although that was a reasonable approach if used in an HTML UA
environment for which it was, undeniably, originally designed:
What is clear, is that the primary format of exchange of Javascript
code is source code, and that every web page containing Javascript (as
you say, the original design target of JavaScript) will start from
source code every time it is read. While there might be internal
compilation in the interpreter, it has the I/O characteristics of
an interpreter.
Strictly speaking, JavaScript is a language to be compiled into JavaScript
bytecode, and JavaScript bytecode is a language to be interpreted by a VM
(according to the platform it is run on).
What is this "JavaScript bytecode" language? Is it standardised
anywhere? Or maybe it is just an artifact of the implementation of
one interpreter.
I know that Rhino uses an internal JavaScript bytecode, and so does
SpiderMonkey. I don't know if it's the same format, though. I'm pretty
sure that other ECMAScript implementations, even if they use bytecode
internally, don't use the same bytecode (e.g., JavaScriptCore, KJS,
Opera's ECMAScript, JScript). I'm certain that JScript 2.0 is compiled
into CLI, not JavaScript bytecode (as used by Rhino and/or
SpiderMonkey). Resin even has a compiler from Javascript to Java
bytecode, yet another bytecode format.
This, however, does not make JavaScript an interpreted language;
strictly speaking, it makes JavaScript rather a compiled language
because JavaScript bytecode would then be a different language.
I would agree, if the bytecode was used for exchanging Javascript
programs, and not only internally in the interpreter.
If JavaScript was only an interpreted language (e.g. like bash), source
code would be executed line by line and only if a line was reached it
would be checked for syntax errors prior to execution.
That's a very narrow view of what an interpreter is. What you describe
is characteristic of an *interactive* interpreter, or just something
with a very simple grammar. But even then, as I'll show later, you
*can* do exactly that with ECMAScript.
However, JavaScript syntax errors are recognized before the script
is run.
So does Perl, which I still maintain is (primarily) an interpreted
language. So as an argument, it won't change my mind.
The reason is that a compiler checks the source code for
correctness, compiles it into bytecode if it is syntactically
correct and then the resulting byte code is interpreted,
i.e. executed, by the JavaScript VM (which may result in runtime
errors). The compound that makes up at least the JavaScript
JIT-compiler and the JSVM is called the JavaScript engine:
<http://lxr.mozilla.org/mozilla/source/js/src/README.html>
I assume you mean that this is an argument for JavaScript being
compiled. Let's remember this for a little later.
That's the SpiderMonkey implementation of JavaScript. It's an
efficient implementation. If one were to implement a more efficient
interpreter for bash scripts, then it would probably also parse
the script into an internal format before executing it. But that's
an implementation detail of the interpreter. It still doesn't
create an external, retainable compiled version, which means, to
me, that it is still an interpreter.
[...]
So, I'd call JavaScript, and ECMAScript, interpreted languages,
not compiled ones, because they are typically run from source
every time, and are designed to be so.
No, they certainly are not! While this is a common misconception, neither
the Netscape JavaScript Reference nor the ECMAScript Specification state
anything of the kind.
I never claimed that they did, but that I would, based on the design
choices and typical use of JavaScript, classify it as designed as an
interpreted language.
Especially, ECMAScript Ed. 3 contains the following paragraphs:
...
| The intent is that the incoming source text has been
| converted to normalised form C before it reaches the compiler.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is one of four occurrences of words starting with "compile" in
ECMA262v3. The meaning of "compiler" here is not defined anywhere
(this is the first occurrence of the four).
| 15.10.2.2 Pattern
| [...]
| Informative comments: A Pattern evaluates ("compiles") to an internal
^^^^^^^^^^^^^^^^^^^^^^
Let's not confuse Regular Expressions, which are a language of their
own that are embedded in Javascript, with Javascript itself. Regular
expressions are traditionally converted to a more efficient internal
representation before using, and that transformation is traditionally
called "compilation". That is the behavior of regular expression
libraries, as included in many other languages as well. It's also
somehow reasonable (but I can see a problem for me coming up), because
the compiled version of the regular expression *is* stored and then,
later, interpreted, possibly several times.
My problem is that the compiled version of a regular expression is not
*external* to the Javascript interpreter ... although it is to the
regular expression interpreter. At least, a user (in this case another
program instead of a person) can wait arbitrarily long between
compiling and running the result of the compilation.
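
As a small illustration of that compile-once, run-many usage (the pattern and test strings are arbitrary examples of mine):

```javascript
// Compile the pattern once. What the RegExp object holds internally is
// up to the implementation; the script only ever sees the object.
const pattern = new RegExp("^[a-z]+[0-9]*$");

// Use the compiled object any number of times, at any later point.
const results = ["abc", "abc123", "123abc"].map((s) => pattern.test(s));
```

The compiled form lives only as long as the script's RegExp object does. It is retained between uses, but never stored outside the running Javascript program.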
While the former may be debatable as a proof because "to compile" can have
different meanings (including, but not limited to: to construct, to build,
to contain, to compose), the relation between compilation and syntax errors
is made quite visible here, so ISTM that the process a source code compiler
(more exact: the source code parser which is part of the compiler) performs is
referred to here as well.
Parsers can be part of interpreters too (with my definition of
interpreter). Actually, they pretty much have to be part of any
program that understands source code.
Furthermore:
| 16 Errors
|
| An implementation shall not report other kinds of runtime errors early
| even if the compiler can prove that a construct cannot execute without
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| error under any circumstances. An implementation may issue an early
| warning in such a case, but it should not report the error until the
| relevant construct is actually executed.
This uses "compiler" in much the same way as the first use (and it is
the fourth and final occurrence).
However, you left out the *first* paragraph of that section:
---
An implementation should report runtime errors at the time the
relevant language construct is evaluated. An implementation may
report syntax errors in the program at the time the program is read
in, or it may, at its option, defer reporting syntax errors until
the relevant statement is reached. An implementation may report
syntax errors in *eval* code at the time *eval* is called, or it may, at
its option, defer reporting syntax errors until the relevant
statement is reached.
---
Now, remember that SpiderMonkey caught syntax errors when first
parsing the program into its internal bytecode. I said then that it
was an implementation choice of SpiderMonkey, not a requirement of the
language. This says so, much more directly.
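
A small sketch of how this looks from inside a script. The helper name tryEval and the log array are mine, for illustration. The outcome described is what engines that parse the whole eval string before running any of it will give; the paragraph quoted above would equally allow an implementation to defer the error.

```javascript
const log = [];

// Run a piece of source text with eval and report how it went.
function tryEval(source) {
  try {
    eval(source);
    return "ok";
  } catch (e) {
    return e.name;
  }
}

log.push("before");
// The first statement is fine; the second is a syntax error.
const outcome = tryEval("log.push('inside'); var x = ;");
log.push("after");
```

In an engine that reports syntax errors when the source is read in, outcome is "SyntaxError" and 'inside' never reaches log, because nothing in the string runs. The script calling eval is already running at that point, so its source was being processed during execution, not ahead of it.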
This is, if anything, characteristic of an interpreted language.
Together with the existence of "eval" and even just calling it a
"scripting language", it convinces me that JavaScript was designed
as an interpreted language (again, with what I consider the most
reasonable definition of being an interpreted language: that
each execution starts from source code).
So, *if* the distinction between a compiled and an interpreted
language exists and makes any sense, then I will put JavaScript and
ECMAScript into the interpreted group. However, if the distinction was
always clear, we wouldn't have this discussion.
Let's try to imagine three groups of languages:
1) languages that are always compiled
2) languages that are both compiled and interpreted
3) languages that are always interpreted
Then I doubt there is any language that, by necessity, has to be
in either group 1 or 3. I have seen interpreters for C and Java source
code, and compilers for the Bourne shell and DOS .bat-files. Any
language can be interpreted. Any language can be compiled.
<URL:http://root.cern.ch/root/Cint.html>
<URL:http://koala.ilog.fr/djava/>
<URL:http://www.comeaucomputing.com/faqs/ccshfaq.html>
<URL:http://www.softempire.com/batch-file-compiler-downloads.html>
A distinction must be on what the *typical* or *intended* use of the
language is, not what possible, or actual but rare, exceptions might
exist. Both C and Java are typical compiled languages. Both sh-
and .bat-language are typical interpreted languages. A language
like OCAML can reasonably claim to be somewhere in the middle.
All programs that handle source programs need the ability to parse the
source code, both compilers and interpreters included. For a few
languages, an interpreter can do this incrementally as the program is
executed (typically languages also intended for interactive use, with
simple, command-like statements). Other languages have more
complicated grammars, and an interpreter needs to parse entire files at
once, or even entire groups of files. It will then store the parsed
program internally in some format. It might even pick an efficient
representation, which could be some sort of bytecode.
You might call this transformation "compilation", but when you always have
this compilation followed by execution, it *is* just a part of an
interpreter, an accident of implementation.
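
A sketch of what I mean, again with a made-up toy language (reverse-Polish arithmetic) and made-up names (runFromSource, the op names). It is not modelled on any real engine.

```javascript
// Hypothetical toy language: space-separated reverse-Polish arithmetic.
function runFromSource(source) {
  // Step 1: "compile" the source into an internal instruction list
  // (a toy bytecode). This is the part one might call a compiler.
  const code = source.trim().split(/\s+/).map((token) =>
    token === "+" ? { op: "add" }
    : token === "*" ? { op: "mul" }
    : { op: "push", value: Number(token) }
  );

  // Step 2: execute the instruction list straight away.
  const stack = [];
  for (const insn of code) {
    if (insn.op === "push") {
      stack.push(insn.value);
    } else {
      const b = stack.pop();
      const a = stack.pop();
      stack.push(insn.op === "add" ? a + b : a * b);
    }
  }
  // The instruction list is dropped here; nothing is stored for later.
  return stack.pop();
}

const value = runFromSource("1 2 + 3 *");
```

From the outside, runFromSource takes source text and runs it. Whether it goes through an instruction list internally or walks the tokens directly cannot be observed by the caller, and every run starts from the source again.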
So, back to my way of distinguishing compiled and interpreted
languages:
What characterizes a compiled language is that the compiled program
can be stored arbitrarily long, and then executed several times.
What characterizes an interpreted language is that each execution
starts from the original source code.
Typical usage of Javascript puts it clearly in the latter category.
/L