If we want to write a language translator that translates some other
language into something that runs on the JVM (Java source code or bytecode),
which of the following approaches is better?
1) Non-Java Source Code -> Language Translator -> Java Source Code
2) Non-Java Source Code -> Language Translator -> Java Byte Code
It depends on a number of factors.
a) Are you already familiar with Java at a fairly deep technical level ? If not
then either give up the project or resign yourself to the idea that you will
/become/ familiar by the time you've finished.
b) Are you already familiar with the JVM classfile format, either directly or
via some bytecode library such as BCEL, ASM, or the GNU Bytecode library ? If
not then generating straight Java will be simpler.
c) Are you already familiar with the JVM bytecode instruction set ? If not then
generating straight Java will be simpler.
d) Do you have any particular reason to avoid creating files ? If so then
you'll have difficulty if you go via straight Java, whereas if you generate
bytecode you can create and use byte arrays in class file format on the fly.
e) Do you need to generate code that does not correspond to any legal Java ?
I'd be quite surprised if you did since there aren't many bytecode sequences
that /don't/ correspond to legal Java[*]. The only examples I can think of are
overlapping, but not nested, exception handlers; and the possibility (but
you'll have to be careful of the verifier's flow-of-control analysis) to create
what in Java would be illegally overlapping control structures. E.g. (totally
meaningless example):
    switch (x)
    {
        while (x > 10)
        {
    case 20:
            y = x;
        }
    case 30:
        y++;
        break;
    }
f) Do you need to avoid the Java compiler's insistence on checking stuff
statically that will be checked at runtime anyway ? E.g. calling a method for
which no definition exists /yet/.
g) other factors that I've forgotten...;-)
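To make point (d) concrete, here is a rough sketch of the "byte arrays on the fly" route. It hand-assembles the smallest class file I could think of (a class Gen with one constant field) and hands it to ClassLoader.defineClass without touching the disk. The names InMemoryClassDemo, ByteArrayLoader, Gen and ANSWER are made up for illustration; a real translator would normally use a bytecode library rather than write the bytes out by hand.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class InMemoryClassDemo {

    /** Hypothetical loader that exposes defineClass for raw class-file bytes. */
    static class ByteArrayLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Assembles: public class Gen { public static final int ANSWER = 42; } */
    static byte[] buildClassFile() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(0xCAFEBABE);                            // magic number
            out.writeShort(0);                                   // minor version
            out.writeShort(52);                                  // major version (52 = Java 8)
            out.writeShort(9);                                   // constant pool count = 8 entries + 1
            out.writeByte(7); out.writeShort(2);                 // #1 Class -> #2
            out.writeByte(1); out.writeUTF("Gen");               // #2 Utf8
            out.writeByte(7); out.writeShort(4);                 // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8
            out.writeByte(1); out.writeUTF("ANSWER");            // #5 Utf8 (field name)
            out.writeByte(1); out.writeUTF("I");                 // #6 Utf8 (descriptor: int)
            out.writeByte(1); out.writeUTF("ConstantValue");     // #7 Utf8 (attribute name)
            out.writeByte(3); out.writeInt(42);                  // #8 Integer constant
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                   // this_class  = #1
            out.writeShort(3);                                   // super_class = #3
            out.writeShort(0);                                   // no interfaces
            out.writeShort(1);                                   // one field
            out.writeShort(0x0019);                              // ACC_PUBLIC | ACC_STATIC | ACC_FINAL
            out.writeShort(5);                                   // field name       = #5
            out.writeShort(6);                                   // field descriptor = #6
            out.writeShort(1);                                   // one attribute on the field
            out.writeShort(7);                                   // attribute name = #7
            out.writeInt(2);                                     // attribute length in bytes
            out.writeShort(8);                                   // constant value = #8
            out.writeShort(0);                                   // no methods
            out.writeShort(0);                                   // no class attributes
        } catch (IOException e) {
            throw new UncheckedIOException(e);                   // not expected for an in-memory stream
        }
        return buf.toByteArray();
    }

    /** Defines Gen from the byte array and reads its constant via reflection. */
    static int loadAnswer() {
        try {
            Class<?> gen = new ByteArrayLoader().define("Gen", buildClassFile());
            return gen.getField("ANSWER").getInt(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAnswer());
    }
}
```

Even this trivial class needs a constant pool and a fair amount of bookkeeping, which is really the substance of points (b) and (c).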
Unless one of (d, e, f) is an important consideration for you, then I'd say
that generating straight Java will be easier to do, and considerably easier to
debug.
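For comparison, here is a minimal sketch of the "generate straight Java" route, assuming a toy source language of my own invention: postfix arithmetic over a single variable x. The class RpnToJava and its methods are hypothetical. The point is only that the translator's output is ordinary text you can read, diff, and hand to javac.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RpnToJava {

    /** Translates a postfix expression such as "x 2 * 1 +" into a Java expression. */
    static String translateExpr(String rpn) {
        Deque<String> stack = new ArrayDeque<>();
        for (String tok : rpn.trim().split("\\s+")) {
            if (tok.matches("[-+*/]")) {
                // Binary operator: combine the two most recent sub-expressions.
                String right = stack.pop();
                String left = stack.pop();
                stack.push("(" + left + " " + tok + " " + right + ")");
            } else {
                // Operand (a literal or the variable x): emitted unchanged.
                stack.push(tok);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("malformed expression: " + rpn);
        }
        return stack.pop();
    }

    /** Wraps the translated expression in a compilable Java class. */
    static String translateClass(String className, String rpn) {
        return "public class " + className + " {\n"
             + "    public static long eval(long x) {\n"
             + "        return " + translateExpr(rpn) + ";\n"
             + "    }\n"
             + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(translateClass("Generated", "x 2 * 1 +"));
    }
}
```

When the generated code misbehaves you can open the emitted source in an editor and step through it in an ordinary Java debugger, which is most of what I mean by "easier to debug".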
BTW (just for completeness since I imagine you've already considered and
rejected the idea) another option would be to write an /interpreter/ in Java
rather than attempting a translation. Or even use a hybrid approach --
interpret most of it but generate classfile/Java for the hotspots. JITs within
JITs...
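As a sketch of the interpreter option, assuming a made-up toy language (postfix arithmetic over a single variable x), the whole thing can be a loop over tokens with no code generation at all. RpnInterpreter and its methods are hypothetical names for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RpnInterpreter {

    /** Evaluates a postfix program such as "x 2 * 1 +" for the given value of x. */
    static long eval(String program, long x) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String tok : program.trim().split("\\s+")) {
            if (tok.matches("[-+*/]")) {
                // Binary operator: pop the right operand first, then the left.
                long right = stack.pop();
                long left = stack.pop();
                stack.push(apply(tok.charAt(0), left, right));
            } else if (tok.equals("x")) {
                stack.push(x);
            } else {
                stack.push(Long.parseLong(tok));
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("malformed program: " + program);
        }
        return stack.pop();
    }

    /** Applies one arithmetic operator to two operands. */
    static long apply(char op, long left, long right) {
        switch (op) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("x 2 * 1 +", 20));
    }
}
```

The cost is that every evaluation re-parses and re-dispatches each token, which is where the hybrid idea comes from: interpret by default and generate real code only for the parts that run often.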
([*] when I say that most bytecode sequences correspond to legal Java, I mean
that most legal bytecode sequences are identical /in effect/ to some sequence
that would be produced by the Java compiler, not that every possible sequence
can be directly mapped onto Java.)
-- chris