C++ to JVM compiler

jhc0033

There used to be an alpha-quality JVM backend for Red Hat's branch of
GCC, but it was abandoned because FOSS luminaries did not approve of
Java (strange, because both are out to crush Microsoft).

Now that Java is going open-source again, I hope this project will be
resurrected. C++ and Java are two of the most widely used languages. It's a
shame that C++ cannot be compiled to the JVM, even though there are a ton
of platforms that it can be compiled to, and a ton of
languages that do compile to the JVM.

Let me list the advantages:

* definitively detect memory corruption caused by a C++ program
(Valgrind and VC++ 2005's debug mode help, but cannot always detect
it, and they are much slower than a JVM; it would also be nice to be
able to detect uninitialized memory access, but I don't think the JVM can
help with that, since it initializes everything)

* call C++ programs more easily from Java, presumably

* run C++ on some exotic platforms with JVM and no C++ compiler (rare)

* run some C++ code faster (Java sometimes beats C++ on numerics-heavy
tests - currently rare, but things seem to be changing in favor of
JIT)

* run C++ safely - I know there are arrogant C++ coders who would
claim that their C++ code cannot be exploited, but is there an
automated way to prove that?

(This could make a great summer-of-code project, unless it's too hard
- I don't actually know)
 
jhc0033

http://nestedvm.ibex.org/

NestedVM provides binary translation for Java Bytecode. This is done
by having GCC compile to a MIPS binary which is then translated to a
Java class file. Hence any application written in C, C++, Fortran, or
any other language supported by GCC can be run in 100% pure Java with
no source changes.


Very interesting. For C++ debugging, though, I don't think this
approach would always detect memory corruption, e.g.

#include <iostream>

struct pr {
    double x;
    double get_y() const { return y; }
    pr() : x(0), y(0) {}
private:
    const double y;
};

int main() {
    pr p;
    (&p.x)[1] = 3; // write to const private y
    std::cout << p.get_y() << '\n';
}
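
To make the concern concrete, here is a minimal sketch of why a flat emulated address space would miss this write. It assumes (without checking this against NestedVM's source) that the translated program sees all of its memory as one big array; `FlatMemory`, `X_OFFSET`, `Y_OFFSET` and `corrupt_y_through_x` are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model of an emulated flat address space: every object lives
// in one big array, and only the outer bounds of that array are checked.
class FlatMemory {
public:
    explicit FlatMemory(std::size_t cells) : cells_(cells, 0.0) {}

    double load(std::size_t address) const { return cells_.at(address); }
    void store(std::size_t address, double value) { cells_.at(address) = value; }

private:
    std::vector<double> cells_;
};

// Assumed layout of the struct `pr` above: two adjacent cells.
constexpr std::size_t X_OFFSET = 0;
constexpr std::size_t Y_OFFSET = 1;

// Mimics `(&p.x)[1] = 3`: start at the address of x and step one cell forward.
// Returns the value of y afterwards.
inline double corrupt_y_through_x(FlatMemory& memory, std::size_t object_base) {
    std::size_t address_of_x = object_base + X_OFFSET;
    memory.store(address_of_x + 1, 3.0);  // inside the big array, so no error
    return memory.load(object_base + Y_OFFSET);
}
```

In this model the store is only rejected when it leaves the whole emulated memory, so overwriting a neighbouring field goes unnoticed. Catching it would need per-object or per-field bounds, which is a different design from the one sketched here.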
 
Joshua Cranmer

jhc0033 said:
Now that Java is going open-source again, I hope this project will be
resurrected. C++ and Java are two of the most widely used languages. It's a
shame that C++ cannot be compiled to the JVM, even though there are a ton
of platforms that it can be compiled to, and a ton of
languages that do compile to the JVM.

Any C++-to-JVM compilation would have to be imprecise and rely on
several heuristics. Pointer arithmetic, rather common in C++ code, does
not convert to JVM bytecode well. Quirks of templates in C++ make
conversion to Java generics near impossible unless Java gains
reification. Finally, any crazy stuff I could do with function pointers
would not translate well.

jhc0033 said:
* run some C++ code faster (Java sometimes beats C++ on numerics-heavy
tests - currently rare, but things seem to be changing in favor of
JIT)

This is a large bone of contention, but the two languages are more or
less equally fast these days. Likely the C++ code would be slowed down
as some imprecise translations would use hackier crutches.

jhc0033 said:
* run C++ safely - I know there are arrogant C++ coders who would
claim that their C++ code cannot be exploited, but is there an
automated way to prove that?

(This could make a great summer-of-code project, unless it's too hard
- I don't actually know)

I am not an expert in this area, but I think it would be on the harder
side. The GSoC application time frame for 2008 has already passed, though...
 
jhc0033

Joshua Cranmer said:
Any C++-to-JVM compilation would have to be imprecise and rely on
several heuristics. Pointer arithmetic, rather common in C++ code, does
not convert to JVM bytecode well.

A pointer into an array, for example, is just a pair of
* array (reference)
* index into it

Dereferencing the pointer outside of the array is undefined behavior
in C++, so I would like the Java version to fail in a case like that.
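
The pair-of-reference-and-index idea can be sketched as follows. This is only an illustration written in C++ itself, not the output of any real translator; `CheckedPtr` is a hypothetical name, and the sketch assumes that checking on dereference (rather than on arithmetic) is the behavior wanted.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical "fat pointer": a reference to the array plus an index into it.
template <typename T>
class CheckedPtr {
public:
    CheckedPtr(std::vector<T>& array, std::ptrdiff_t index)
        : array_(&array), index_(index) {}

    // Pointer arithmetic only moves the index; nothing is checked yet.
    CheckedPtr operator+(std::ptrdiff_t n) const {
        return CheckedPtr(*array_, index_ + n);
    }

    // Dereferencing checks the index and fails loudly when it is out of range.
    T& operator*() const {
        if (index_ < 0 || static_cast<std::size_t>(index_) >= array_->size()) {
            throw std::out_of_range("pointer outside of its array");
        }
        return (*array_)[static_cast<std::size_t>(index_)];
    }

private:
    std::vector<T>* array_;
    std::ptrdiff_t index_;
};
```

With `std::vector<int> data{10, 20, 30}` and `CheckedPtr<int> p(data, 0)`, the expression `*(p + 2)` works and `*(p + 3)` throws. On the JVM the explicit check would come for free, since an out-of-range array access throws ArrayIndexOutOfBoundsException.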

Joshua Cranmer said:
Quirks of templates in C++ make
conversion to Java generics near impossible unless Java gains
reification.

Templates shouldn't be translated into generics. C++ templates are
semantically closer to C++ macros than to Java generics.
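
A small sketch of what "closer to macros" means in practice: each instantiation behaves like a separate copy of the definition, with its own static data. `Counter` and `bump_some` are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Each instantiation of a class template is a separate class with its own
// static data, much as if the definition had been textually copied per type.
template <typename T>
struct Counter {
    static int count;
    static void bump() { ++count; }
};

template <typename T>
int Counter<T>::count = 0;

// Bumps the int instantiation twice and the double instantiation once.
inline void bump_some() {
    Counter<int>::bump();
    Counter<int>::bump();
    Counter<double>::bump();
}
```

After `bump_some()`, `Counter<int>::count` and `Counter<double>::count` hold different values. A static field of a Java generic class is shared by all parameterizations because of erasure, so a translator would presumably have to emit one JVM class per instantiation instead of mapping the template onto a generic class.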

Joshua Cranmer said:
Finally, any crazy stuff I could do with function pointers
would not translate well.

I don't see any difficulties there either.
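
One way this could look, as a rough sketch only: a function pointer becomes a reference to an object with a single virtual method, which corresponds to an interface call on the JVM. `IntFunction`, `Twice`, `Negate` and `apply` are hypothetical names, and the sketch covers only ordinary calls through a pointer, not casts between function pointers and integers.

```cpp
#include <cassert>

// Hypothetical translation target: a function pointer becomes a reference to
// an object with one virtual method, which maps onto a JVM interface call.
struct IntFunction {
    virtual ~IntFunction() = default;
    virtual int invoke(int argument) const = 0;
};

// One small class per C++ function whose address is taken.
struct Twice : IntFunction {
    int invoke(int argument) const override { return 2 * argument; }
};

struct Negate : IntFunction {
    int invoke(int argument) const override { return -argument; }
};

// Stands in for C++ code such as `int apply(int (*f)(int), int x)`.
inline int apply(const IntFunction& f, int x) { return f.invoke(x); }
```

For example, `apply(Twice{}, 21)` plays the role of calling `apply(&twice, 21)` in the original program.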

Joshua Cranmer said:
This is a large bone of contention, but the two languages are more or
less equally fast these days. Likely the C++ code would be slowed down
as some imprecise translations would use hackier crutches.

For debugging, anything less than a 10x slow-down would beat MSVC++
debug mode, and anything less than a 100x slow-down would beat
valgrind.

Joshua Cranmer said:
GCC Dehydra? <http://wiki.mozilla.org/Dehydra_GCC>. Of course, you have
to write the tests first.

Not sure what it can do for me, but thanks for the link. I don't write
utterly bad C'ish C++ with strcpy, etc. However, I once spent days if
not weeks looking for an uninitialized value usage bug.
 
LR

Stefan said:
Java has a 64 KByte limit for the size of methods.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4262078

So a C++ function cannot be translated to a single Java method
(unless its bytecode does not exceed 64 KBytes).

I'm sorry, but I don't see why this particular limitation of the JVM
limits what sort of C++ can be compiled to JVM code.

Is there some limitation in the JVM that prevents turning a single
method into a number of methods that are functionally equivalent to the
single method?

LR
 
Steve Wampler

LR said:
Is there some limitation in the JVM that prevents turning a single
method into a number of methods that are functionally equivalent to the
single method?

I don't think I'd call it a limitation of the JVM, but of automatic
language translations in general. It can be very hard to perform such
a split while preserving the original semantics and keeping the
efficiency close to what was there originally. Remember that you'd
like such a process to work correctly on the pathological cases as
well as the simple ones.
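
For a simple case, the kind of split being discussed might look like the sketch below, written in C++ for illustration. It assumes the translator hoists every local that lives across a split point into a shared frame object; `Frame`, `part1`, `part2` and `sum_to` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical result of splitting one oversized function into pieces.
// Locals that live across a split point move into a shared frame object.
struct Frame {
    int n = 0;    // the original function's parameter
    int sum = 0;  // a local that both pieces touch
    int i = 1;    // loop counter carried across the split
};

// First piece: runs the first half of the original loop's iterations.
inline void part1(Frame& f) {
    for (; f.i <= f.n / 2; ++f.i) f.sum += f.i;
}

// Second piece: continues exactly where part1 stopped.
inline void part2(Frame& f) {
    for (; f.i <= f.n; ++f.i) f.sum += f.i;
}

// Replacement for the original function: same signature, same result.
inline int sum_to(int n) {
    Frame f;
    f.n = n;
    part1(f);
    part2(f);
    return f.sum;
}
```

This easy case shows the cost mentioned above: every former local becomes a field access plus extra calls. The pathological cases, such as jumps that cross a split point or exception handlers that span it, are not handled by this sketch at all.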
 
Stefan Ram

Steve Wampler said:
I don't think I'd call it a limitation of the JVM, but of
automatic language translations in general. It can be very
hard to perform such a split while preserving the original
semantics and keeping the efficiency close to what was there
originally. Remember that you'd like such a process to work
correctly on the pathological cases as well as the simple ones.

Scheme can be compiled to C or C++ using »trampolining«.

This will create one big function from the whole Scheme
program, like

while( running )
{ switch( program_counter )
  { case 0: /* compiler output for a Scheme closure */ break;
    case 1: /* compiler output for another Scheme closure */ break;
    case 2: /* and so on */ break; ... }}

Since the whole Scheme program is being compiled to one
large switch, this switch might be larger than 64 K.

Because (possibly small) closures are activated often,
one does not want to have a call overhead for each closure
activation.
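
A runnable miniature of the dispatch loop described above, as a sketch only: the three cases are hand-written stand-ins for compiled closures, not the output of any real Scheme compiler, and `run_sum` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical output for a tiny program that sums n + (n-1) + ... + 1.
// Each `case` stands in for one compiled closure; instead of calling the next
// closure, a case stores its number in program_counter and returns to the loop.
inline long long run_sum(long long n) {
    long long accumulator = 0;
    int program_counter = 0;
    bool running = true;

    while (running) {
        switch (program_counter) {
        case 0:  // loop test: continue or finish
            program_counter = (n > 0) ? 1 : 2;
            break;
        case 1:  // loop body: add and count down, then jump back to the test
            accumulator += n;
            --n;
            program_counter = 0;
            break;
        case 2:  // exit
            running = false;
            break;
        }
    }
    return accumulator;
}
```

Because control passes between cases by assigning `program_counter` rather than by calling, the stack depth stays constant however many activations happen. The price is that the whole program lives in one function body, which is where a per-method size limit would bite.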
 
jhc0033

Stefan Ram said:
Since the whole Scheme program is being compiled to one
large switch, this switch might be larger than 64 K.

Because (possibly small) closures are activated often,
one does not want to have a call overhead for each closure
activation.

The authors of Bigloo and Kawa would sure like to know that, because
of closures, their Scheme->JVM compilers are even MORE impossible
than C++ -> JVM.
 
Stefan Ram

jhc0033 said:
FOSS luminaries did not approve of Java

I don't know if this was already mentioned in this thread,
but here is one source for this (dated 2001):

»RMS thinks having gcc both generate and accept as an
input java bytecode allows folks to do nasty proprietary
things with gcc so he's not interested in the backend for
the jvm which I wrote 18 months ago (and doesn't think
anyone else should be).«

For more details, see

http://gcc.gnu.org/ml/gcc/2001-02/msg00895.html
 
