Writing custom compiler

daniel.w.gelder

Hi,

I'm in the process of writing a custom compiler for my own language
that will target the JVM. I'm just getting started but I've got a
ClassInfo file successfully streamed. I have a question for anyone who
knows.

Apparently I have to define my own constructor, even if it doesn't do
anything. I seem to need to define an <init> method with signature
"()V", otherwise myClass.newInstance() throws InstantiationException.

Anyway, there's nothing in <init> except a return statement. So I get
this exception:

java.lang.VerifyError: (class: Dan, method: <init> signature: ()V)
Constructor must call super() or this()

So apparently I have to call Object.<init>() too. Makes sense, but I
thought you could never call <init> yourself. Is there a trick here?

Thanks.
Dan
 
Kent Paul Dolan

[I don't have much knowledge to offer. Instead, let me
play potted plant here and see if it helps.]
Apparently I have to define my own constructor,
even if it doesn't do anything.

That shouldn't be the case. If you don't explicitly
define a parameterless void constructor, the system
creates a default one for you which calls super()
and returns. In fact, creating such a constructor
and making it private so it can't be invoked is one
frequently seen trick to prevent the system from
supplying such a default constructor unbeknownst to
you and having it invoked where you had no such
intention.
I seem to need to define an <init> method with signature
"()V", otherwise myClass.newInstance() throws InstantiationException.

1) Are those angle brackets a literal part of the name "<init>"?
Is that some template parameter naming, or what?

2) What does the "V" in "()V" mean? "Void return?"
Anyway, there's nothing in <init> except a return statement. So I get
this exception:
java.lang.VerifyError: (class: Dan, method: <init> signature: ()V)
Constructor must call super() or this()

Is that a compile time error, or a run time error? It looks
like a runtime invocation of MyClass.init() has encountered
a problem with a constructor for MyClass being missing, but
as noted above, one should be created (and then invoked) by
default if you don't prevent that happening.
So apparently I have to call Object.<init>() too. Makes sense, but I
thought you could never call <init> yourself. Is there a trick here?

I'm still confused by those angle brackets, but it isn't "init"
that you are being told to call, you are being told, at a default
invocation of MyClass.init() at startup time, that you haven't
yet instantiated an object from MyClass, thus there is no object
whose instance (as opposed to static) method init() can be invoked.

A common pattern is to have your main class inherit from
Applet, to instantiate it in main(...), then to invoke init()
on that instance.

import java.applet.Applet;

class MyClass extends Applet
{
    private static MyClass mc = null;

    public MyClass() // constructor
    {
        super();
    }

    public static void main(String[] args) // must exist in some class of your app
    {
        mc = new MyClass();
        mc.init(); // yes, you _can_ call init() yourself
        // ... do more stuff
    }

    @Override
    public void init() // must be public to override Applet.init()
    {
        // initialize stuff for instance object mc of MyClass.
    }
}

Or something vaguely like that. Until I figure out how
to boot Debian Linux using grub on and from an external
drive (to leave the internal drive's MS-Windows garbage
unmolested) on my replacement laptop, I'm temporarily
out of the Java business.

FWIW

xanthian.
 
daniel.w.gelder

Yeeeeeaaahhhkaayyyy.....try re-reading my first sentence. :)

Thanks anyway though for replying.
Dan
 
Chris Uppal

Apparently I have to define my own constructor, even if it doesn't do
anything.

Yup. There has to be a constructor or there's nothing there for other code to
call.

I seem to need to define an <init> method with signature
"()V", otherwise myClass.newInstance() throws InstantiationException.

That's correct.
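
The same failure can be reproduced from plain Java, which may help when checking generated classes. This is only a sketch: `MissingNoArgConstructor` and `NeedsArgument` are made-up names, and it uses the deprecated `Class.newInstance()` because that is the call discussed here.

```java
public class MissingNoArgConstructor {

    // Hypothetical class whose only constructor takes an argument,
    // so there is no <init> with descriptor ()V.
    static class NeedsArgument {
        NeedsArgument(int value) { }
    }

    // Returns the simple name of the exception thrown, or "created" on success.
    @SuppressWarnings("deprecation")
    static String tryNewInstance() {
        try {
            NeedsArgument.class.newInstance();
            return "created";
        } catch (InstantiationException | IllegalAccessException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNewInstance());
    }
}
```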

So apparently I have to call Object.<init>() too. Makes sense, but I
thought you could never call <init> yourself. Is there a trick here?

"Rules change in the reaches"[*]. I.e. this is bytecode land -- a high-level,
mostly-dynamic, mostly-OO programming language with an interesting hybrid
static/dynamic type system. The rules you learned in Java are only a rough
approximation to the rules which apply here.

In this case you are correct. You have to supersend <init> (or use some other
flavour of <init>). BTW, if you do supersend then it has to be an
invokespecial instruction, invokevirtual isn't allowed here.
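
For illustration, here is a minimal sketch of a class file whose only method is that constructor, written out by hand and then loaded. The names `MinimalClassDemo`, `buildDanClass` and `instantiate` are invented for this example, and class file version 49 is an assumption chosen to keep the file simple.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class MinimalClassDemo {

    /** Builds the bytes of a public class "Dan" containing only a no-arg constructor. */
    static byte[] buildDanClass() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(0xCAFEBABE);   // magic
            out.writeShort(0);          // minor_version
            out.writeShort(49);         // major_version (assumed: Java 5 format)

            out.writeShort(10);         // constant_pool_count: 9 entries + 1
            out.writeByte(7);  out.writeShort(2);                    // #1 Class -> #2
            out.writeByte(1);  out.writeUTF("Dan");                  // #2 Utf8
            out.writeByte(7);  out.writeShort(4);                    // #3 Class -> #4
            out.writeByte(1);  out.writeUTF("java/lang/Object");     // #4 Utf8
            out.writeByte(1);  out.writeUTF("<init>");               // #5 Utf8
            out.writeByte(1);  out.writeUTF("()V");                  // #6 Utf8
            out.writeByte(1);  out.writeUTF("Code");                 // #7 Utf8
            out.writeByte(10); out.writeShort(3); out.writeShort(9); // #8 Methodref Object.<init>
            out.writeByte(12); out.writeShort(5); out.writeShort(6); // #9 NameAndType <init>:()V

            out.writeShort(0x0021);     // access_flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);          // this_class  = #1 (Dan)
            out.writeShort(3);          // super_class = #3 (java/lang/Object)
            out.writeShort(0);          // interfaces_count
            out.writeShort(0);          // fields_count

            out.writeShort(1);          // methods_count
            out.writeShort(0x0001);     // method access_flags: ACC_PUBLIC
            out.writeShort(5);          // name_index       = "<init>"
            out.writeShort(6);          // descriptor_index = "()V"
            out.writeShort(1);          // method attributes_count
            out.writeShort(7);          // attribute_name_index = "Code"
            out.writeInt(17);           // attribute_length: 12 fixed bytes + 5 code bytes
            out.writeShort(1);          // max_stack
            out.writeShort(1);          // max_locals
            out.writeInt(5);            // code_length
            out.writeByte(0x2A);        // aload_0: push 'this'
            out.writeByte(0xB7);        // invokespecial ...
            out.writeShort(8);          //   ... #8 = java/lang/Object.<init>()V
            out.writeByte(0xB1);        // return
            out.writeShort(0);          // exception_table_length
            out.writeShort(0);          // Code attributes_count

            out.writeShort(0);          // class attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Defines the class from the given bytes and calls its no-arg constructor. */
    static Object instantiate(byte[] classBytes) {
        ClassLoader loader = new ClassLoader(MinimalClassDemo.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                if (!name.equals("Dan")) throw new ClassNotFoundException(name);
                return defineClass(name, classBytes, 0, classBytes.length);
            }
        };
        try {
            return loader.loadClass("Dan").getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiate(buildDanClass()).getClass().getName());
    }
}
```

Removing the `aload_0` and `invokespecial` bytes (and setting both `code_length` and `attribute_length` to match) should bring back the VerifyError quoted at the top of the thread.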

The JVM spec is irritatingly incomplete, occasionally ambiguous, and even
within those limits, not especially well-written, but it does cover this stuff.
You should probably read the whole thing at least once (if you haven't
already). Much of it is a non-normative (and largely irrelevant) rehashing of
the JLS. Resign yourself to the idea that you are going to be reading the
/other/ bits over and over again ;-)

-- chris

[*] Or, if you prefer, "You're not in Kansas anymore". Or even, "Welcome to
the /real/ world" ;-)
 
daniel.w.gelder

I seem to have gotten it at last. I'm kind of surprised how much actual
bytecode javac always had to make for

public class Test {
}

It uses a lot more space than the original file, that's for sure. Oh
well. Time to optimize my compiler frontend and do a little coding.

Dan
 
Patricia Shanahan

Kent said:
[I don't have much knowledge to offer. Instead, let me
play potted plant here and see if it helps.]

Apparently I have to define my own constructor,
even if it doesn't do anything.


That shouldn't be the case. If you don't explicitly
define a parameterless void constructor, the system
creates a default one for you which calls super()
and returns. In fact, creating such a constructor
and making it private so it can't be invoked is one
frequently seen trick to prevent the system from
supplying such a default constructor unbeknownst to
you and having it invoked where you had no such
intention.
....

I think Daniel is using bytecode, rather than Java, as his target
language, so there will be things a Java compiler would do automatically
that he needs to do explicitly.

However, this does suggest a procedure for solving his problem:

1. Write a Java class with no specified superclass and no constructor
declaration.

2. Compile it.

3. Examine the bytecode. See what the compiler generates to represent
the default constructor. It will contain a call to the Object constructor.

4. Make the new compiler generate the same thing.
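
Steps 1 to 3 might look like the sketch below. `DefaultConstructorDemo` and `Empty` are made-up names. The reflection call only confirms that javac added a constructor; to see its instructions, run `javap -c` on the compiled class.

```java
import java.lang.reflect.Constructor;

public class DefaultConstructorDemo {

    // Step 1: no superclass named, no constructor declared.
    static class Empty { }

    // Counts the constructors present in the compiled class.
    static int declaredConstructorCount() {
        return Empty.class.getDeclaredConstructors().length;
    }

    public static void main(String[] args) {
        // Lists the constructor javac synthesized for Empty.
        for (Constructor<?> constructor : Empty.class.getDeclaredConstructors()) {
            System.out.println(constructor);
        }
        // For step 3, "javap -c" on the class file typically shows the body as:
        //   aload_0
        //   invokespecial  java/lang/Object."<init>":()V
        //   return
    }
}
```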

Patricia
 
Mike Schilling

Hi,

I'm in the process of writing a custom compiler for my own language
that will target the JVM. I'm just getting started but I've got a
ClassInfo file successfully streamed. I have a question for anyone who
knows.

I know (obviously) nothing about your language, but I'm wondering:

Might it be easier to use Java as an intermediate language? That is,
generate Java from your language and then use javac to compile that?

As Chris points out, the JVM spec is irritatingly incomplete about the
precise requirements for bytecode, while the JLS and assorted other books
are far better at explaining Java. And should you run into trouble, you'll
have a much easier time debugging your generated Java than debugging
bytecode directly.
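
A rough sketch of that route, using the standard `javax.tools` compiler API rather than any internal javac classes. The names `JavaAsIntermediate` and `StringSource` are invented for this example, and the temporary output directory is an assumption; a real front end would pick its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class JavaAsIntermediate {

    /** Generated Java source held in memory instead of on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles one generated class into a temporary directory; true on success. */
    static boolean compile(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system compiler available; run on a JDK");
        }
        try {
            Path outputDir = Files.createTempDirectory("generated-classes");
            List<String> options = List.of("-d", outputDir.toString());
            List<StringSource> units = List.of(new StringSource(className, source));
            // Null arguments select the default writer, file manager and diagnostics.
            return compiler.getTask(null, null, null, options, null, units).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compile("Hello", "public class Hello { }"));
    }
}
```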
 
Chris Uppal

Mike said:
Might it be easier to use Java as an intermediate language? That is,
generate Java from your language and then use javac to compile that?

Or maybe an intermediate level technology like Javassist.

Just mentioning options; personally I'd pop a beer and get stuck right into the
bytecode ;-)

-- chris
 
dimitar

In addition to the JVM spec, you can also check Bill Venners's "Inside
the JVM". It's out of print, but you might find a copy in your library.

Dimitar
 
daniel.w.gelder

I've already popped several beers and quite a lot of coffee beans too
on it!

Actually I tried using Java as an intermediate language first, calling
into sun.tools.javac. It worked, but it was really shockingly
inefficient in a lot of ways and I lost interest.

Bytecode, while tricky, is at least a challenge.

Dan
 
daniel.w.gelder

Now I'm getting deep. I have a question to anyone who knows: what is
the real difference between local variables and the operand stack in
the JVM? Both exist only within a method frame. Operations push and pop
only from the operand stack, granted, but it seems like the 'dup' and
'swap' commands are entirely sufficient to compile optimized code,
given a non-naive compiler. After all, if you know you'll need the
results of an operation more than once, just 'dup' it the first time.
If it's not in the right place, 'swap' it. Right?
 
Mike Schilling

Now I'm getting deep. I have a question to anyone who knows: what is
the real difference between local variables and the operand stack in
the JVM?

There isn't one, really. I vaguely recall reading a paper by a .NET
advocate saying that MSIL is superior to Java bytecode because MSIL does
make that distinction while bytecode doesn't (I don't remember why this was
supposed to be an advantage.)
 
Chris Uppal

Operations push and pop
only from the operand stack, granted, but it seems like the 'dup' and
'swap' commands are entirely sufficient to compile optimized code,
given a non-naive compiler

Remember that the JVM bytecode instructions will be translated into real
machine operations. If that machine doesn't have an operation stack (or if the
JITer -- if any -- doesn't use it) then stack twiddling instructions will be
converted into variable-to-variable, or register-to-register, movements. It
might be harder or even impossible for the JITer to optimise such code.

Also, don't forget that the operand stack is cleared whenever an exception is
thrown.

OTOH, don't go overboard in avoiding stack twiddling -- after all those
instructions are there and they are intended to be used. It's probably best to
take the output of javac as a guide to how much use to make of the stack.
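
As a small data point, here is a method that reuses one value three times. `StackVersusLocals` is a made-up name, and the bytecode in the comments is what javac typically emits, so check it with `javap -c` on your own JDK.

```java
public class StackVersusLocals {

    // javac typically reloads the parameter from its local variable slot
    // each time instead of duplicating it on the operand stack:
    //   iload_0, iload_0, imul, iload_0, iadd, ireturn
    // A hand-written dup-based sequence computing the same result would be:
    //   iload_0, dup, dup, imul, iadd, ireturn
    static int squarePlusSelf(int x) {
        return x * x + x;
    }

    public static void main(String[] args) {
        System.out.println(squarePlusSelf(3));
    }
}
```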

-- chris
 
Chris Smith

Chris Uppal said:
Remember that the JVM bytecode instructions will be translated into real
machine operations. If that machine doesn't have an operation stack (or if the
JITer -- if any -- doesn't use it) then stack twiddling instructions will be
converted into variable-to-variable, or register-to-register, movements. It
might be harder or even impossible for the JITer to optimise such code.

With a few exceptions, though, optimizers don't optimize native machine
code. They optimize intermediate representations. My guess is that an
aggressive optimizer does the following:

1. Break up the method into basic blocks.

2. Convert all local variables AND the operand stack into SSA form with
temporary variables.

3. Optimize and generate code (including register allocation) from the
SSA representation.

So I believe that it wouldn't make much difference in code quality
whether data is stored in local variable slots or on the operand stack,
because the two will be indistinguishable after the conversion to SSA
form. Both dividing the code into basic blocks and writing the SSA
representation is relatively cheap (linear on the method length), so
this may even be the process for quick optimizations as well.
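
A toy version of step 2 is sketched below. It handles straight-line code only, so it is not full SSA (no basic blocks, no phi nodes), and the textual opcode format and the name `StackToTemps` are assumptions made for this example. It shows a dup-based sequence and a reload-based sequence lowering to the same temporaries.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class StackToTemps {

    /** Symbolically executes stack code and returns three-address statements. */
    static List<String> lower(String... ops) {
        Deque<String> stack = new ArrayDeque<>();   // holds value names, not values
        List<String> statements = new ArrayList<>();
        int nextTemp = 0;
        for (String op : ops) {
            String[] parts = op.split(" ");
            switch (parts[0]) {
                case "iload":                       // reading a local pushes its name
                    stack.push("local" + parts[1]);
                    break;
                case "dup":                         // same name twice, no new value
                    stack.push(stack.peek());
                    break;
                case "swap": {
                    String top = stack.pop();
                    String below = stack.pop();
                    stack.push(top);
                    stack.push(below);
                    break;
                }
                case "imul":
                case "iadd": {
                    String right = stack.pop();
                    String left = stack.pop();
                    String temp = "t" + nextTemp++;
                    String symbol = parts[0].equals("imul") ? "*" : "+";
                    statements.add(temp + " = " + left + " " + symbol + " " + right);
                    stack.push(temp);
                    break;
                }
                case "ireturn":
                    statements.add("return " + stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unsupported op: " + op);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        System.out.println(lower("iload 0", "iload 0", "imul", "ireturn"));
        System.out.println(lower("iload 0", "dup", "imul", "ireturn"));
    }
}
```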

Where this does make a difference is in the size of the bytecode, and I
suspect that was part of the reason for the design choices. When Java
code was supposed to be transferred via IR beams between set-top cable
television boxes, code size was important. Arguably (though to a lesser
extent), it is so again with J2ME.
It's probably best to
take the output of javac as a guide to how much use to make of the stack.

To get the best optimization possible, it's probably best to take the
output of javac as a guide for as much as you possibly can. There has
undoubtedly been much work done to make the JIT in most major virtual
machines work as well as possible with common types of code that are
written by javac. The same optimizations aren't likely to happen with
someone's one-use code generator. :)

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Mike Schilling

Chris Smith said:
To get the best optimization possible, it's probably best to take the
output of javac as a guide for as much as you possibly can. There has
undoubtedly been much work done to make the JIT in most major virtual
machines work as well as possible with common types of code that are
written by javac. The same optimizations aren't likely to happen with
someone's one-use code generator. :)

And this becomes automatic if your compiler outputs Java. But I repeat
myself :)
 
Roedy Green

Now I'm getting deep. I have a question to anyone who knows: what is
the real difference between local variables and the operand stack in
the JVM? Both exist only within a method frame. Operations push and pop
only from the operand stack, granted, but it seems like the 'dup' and
'swap' commands are entirely sufficient to compile optimized code,
given a non-naive compiler. After all, if you know you'll need the
results of an operation more than once, just 'dup' it the first time.
If it's not in the right place, 'swap' it. Right?

The stack has:
1. the return address (where to carry on when the method ends).
2. local variables.
3. temporaries needed to evaluate expressions.
 
Roedy Green

but it seems like the 'dup' and
'swap' commands are entirely sufficient to compile optimized code,
given a non-naive compiler. After all, if you know you'll need the
results of an operation more than once, just 'dup' it the first time.
If it's not in the right place, 'swap' it. Right?

FORTH is a stack based machine similar to the JVM.

In FORTH besides SWAP and DUP you have other operators, most notably
PICK to let you get at any element arbitrarily deep in the stack. The
JVM does not have nearly as many stack operators as FORTH, but it does
let you do stack relative addressing which gives you pick. It also
lets you do frame relative addressing to let you access the locals and
parms with fixed offsets.
 
Roedy Green

The stack has:
1. the return address (where to carry on when the method ends).
2. local variables.
3. temporaries needed to evaluate expressions.

and 1.5 parameters passed to this method.
 
Mike Schilling

Roedy Green said:
and 1.5 parameters passed to this method.

I've seen methods with one parameter and methods with two parameters, but
I've never seen a method with 1.5 parameters.
 
