Writing custom compiler

Discussion in 'Java' started by daniel.w.gelder@gmail.com, May 14, 2006.

  1. Guest

    Hi,

    I'm in the process of writing a custom compiler for my own language
    that will target the JVM. I'm just getting started but I've got a
    ClassInfo file successfully streamed. I have a question for anyone who
    knows.

    Apparently I have to define my own constructor, even if it doesn't do
    anything. I seem to need to define an <init> method with signature
    "()V", otherwise myClass.newInstance() throws InstantiationException.

    Anyway, there's nothing in <init> except a return statement. So I get
    this exception:

    java.lang.VerifyError: (class: Dan, method: <init> signature: ()V)
    Constructor must call super() or this()

    So apparently I have to call Object.<init>() too. Makes sense, but I
    thought you could never call <init> yourself. Is there a trick here?

    Thanks.
    Dan
    , May 14, 2006
    #1

  2. "daniel.w.gelder" <> wrote:

    [I don't have much knowledge to offer. Instead, let me
    play potted plant here and see if it helps.]

    > Apparently I have to define my own constructor,
    > even if it doesn't do anything.


    That shouldn't be the case: if you don't explicitly
    define a parameterless void constructor, the system
    creates a default one for you which calls super()
    and returns. In fact, creating such a constructor
    and making it private so it can't be invoked is one
    frequently seen trick to prevent the system from
    supplying such a default constructor unbeknownst to
    you and having it invoked where you had no such
    intention.
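At the Java-source level, the behaviour described here is easy to check with reflection. The following is a minimal sketch; the class names `Plain`, `NoInstances` and `DefaultCtorDemo` are made up for illustration. javac gives `Plain` a default no-arg constructor. Declaring a private constructor in `NoInstances` suppresses that default, so reflective instantiation from another class fails with `IllegalAccessException` unless `setAccessible(true)` is used. Note that javac is what supplies the default; a compiler that emits bytecode directly gets nothing for free.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

// No constructor declared: javac supplies a default one that just calls super().
class Plain {
}

// A private no-arg constructor suppresses the default one, so code outside this
// class cannot create instances (reflection also refuses without setAccessible).
final class NoInstances {
    private NoInstances() {
    }
}

public class DefaultCtorDemo {

    /** Parameter count of the constructor javac generated for Plain. */
    public static int defaultCtorParamCount() {
        Constructor<?>[] ctors = Plain.class.getDeclaredConstructors();
        return ctors[0].getParameterCount();
    }

    /** True if the only constructor of NoInstances is private. */
    public static boolean ctorIsPrivate() {
        Constructor<?> c = NoInstances.class.getDeclaredConstructors()[0];
        return Modifier.isPrivate(c.getModifiers());
    }

    /** Attempts reflective instantiation of NoInstances from outside that class. */
    public static String tryInstantiate() {
        try {
            NoInstances.class.getDeclaredConstructor().newInstance();
            return "created";
        } catch (IllegalAccessException e) {
            return "IllegalAccessException";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(Plain.class.getDeclaredConstructors().length); // 1
        System.out.println(defaultCtorParamCount());                      // 0
        System.out.println(ctorIsPrivate());                              // true
        System.out.println(tryInstantiate());                             // IllegalAccessException
    }
}
```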

    > I seem to need to define an <init> method with signature
    > "()V", otherwise myClass.newInstance() throws InstantiationException.


    1) Are those angle brackets a literal part of the name "<init>"?
    Is that some template parameter naming, or what?

    2) What does the "V" in "()V" mean? "Void return?"
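For reference, the angle brackets are a literal part of the name in the class file format. `<init>` is the reserved name the JVM uses for constructors, and `<clinit>` is used for static initializers. Neither is a legal Java identifier, which is why Java source can never name them directly. `()V` is a method descriptor: the parameter types go between the parentheses, the return type follows, and `V` means void.

The sketch below prints a few descriptors using the standard `java.lang.invoke.MethodType.toMethodDescriptorString()`. The class and helper names are illustrative only.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {

    /** Builds the JVM method descriptor for a return type and parameter types. */
    public static String descriptor(Class<?> ret, Class<?>... params) {
        return MethodType.methodType(ret, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // A no-arg constructor, <init>, has descriptor ()V: no parameters, void return.
        System.out.println(descriptor(void.class));                          // ()V
        // Primitives use one letter (I = int, J = long); classes use L<internal name>;
        System.out.println(descriptor(int.class, String.class, long.class)); // (Ljava/lang/String;J)I
        // Array types are prefixed with [.
        System.out.println(descriptor(void.class, int[].class));             // ([I)V
    }
}
```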

    > Anyway, there's nothing in <init> except a return statement. So I get
    > this exception:


    > java.lang.VerifyError: (class: Dan, method: <init> signature: ()V)
    > Constructor must call super() or this()


    Is that a compile-time error or a run-time error? It looks
    like a runtime invocation of MyClass.init() has encountered
    a problem with a constructor for MyClass being missing, but
    as noted above, one should be created (and then invoked) by
    default if you don't prevent that happening.

    > So apparently I have to call Object.<init>() too. Makes sense, but I
    > thought you could never call <init> yourself. Is there a trick here?


    I'm still confused by those angle brackets, but it isn't "init"
    that you are being told to call, you are being told, at a default
    invocation of MyClass.init() at startup time, that you haven't
    yet instantiated an object from MyClass, thus there is no object
    whose instance (as opposed to static) method init() can be invoked.

    A common pattern is to have your main class extend
    Applet, instantiate it in main(...), and then invoke
    init() on that instance.

    import java.applet.Applet;

    public class MyClass extends Applet
    {
        private static MyClass mc = null;

        public MyClass() // constructor
        {
            super();
        }

        public static void main(String[] args) // must exist in some class of your app
        {
            mc = new MyClass();
            mc.init(); // yes, you _can_ call init() yourself
            // ... do more stuff
        }

        @Override
        public void init() // overrides Applet.init(), which is public
        {
            // initialize stuff for instance object mc of MyClass.
        }
    }

    Or something vaguely like that. Until I figure out how
    to boot Debian Linux using grub on and from an external
    drive (to leave the internal drive's MS-Windows garbage
    unmolested) on my replacement laptop, I'm temporarily
    out of the Java business.

    FWIW

    xanthian.


    Kent Paul Dolan, May 14, 2006
    #2

  3. Guest

    Yeeeeeaaahhhkaayyyy.....try re-reading my first sentence. :)

    Thanks anyway though for replying.
    Dan
    , May 14, 2006
    #3
  4. Chris Uppal Guest

    wrote:

    > Apparently I have to define my own constructor, even if it doesn't do
    > anything.


    Yup. There has to be a constructor or there's nothing there for other code to
    call.


    > I seem to need to define an <init> method with signature
    > "()V", otherwise myClass.newInstance() throws InstantiationException.


    That's correct.


    > So apparently I have to call Object.<init>() too. Makes sense, but I
    > thought you could never call <init> yourself. Is there a trick here?


    "Rules change in the reaches"[*]. I.e. this is bytecode land -- a high-level,
    mostly-dynamic, mostly-OO programming language with an interesting hybrid
    static/dynamic type system. The rules you learned in Java are only a rough
    approximation to the rules which apply here.

    In this case you are correct. You have to supersend <init> (or use some other
    flavour of <init>). BTW, if you do supersend then it has to be an
    invokespecial instruction; invokevirtual isn't allowed here.
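To make this concrete, here is a minimal sketch that writes a class file byte by byte using nothing beyond the JDK. It produces a public class (named `Dan` here, echoing the error message above) whose only method is `<init>` with descriptor `()V`.

With `callSuper` set, the constructor body is `aload_0; invokespecial Object.<init>:()V; return`, which is essentially what javac emits for a default constructor. Without it, the body is a bare `return`, and the verifier should reject the class with a `VerifyError` like the one Dan saw. The exact message text varies between verifier versions.

`MinimalClassWriter`, `ByteLoader` and `tryLoad` are hypothetical names. Class-file version 52 is an assumption; it is safe here because the code has no branches and so needs no StackMapTable.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class MinimalClassWriter {

    /**
     * Assembles "public class NAME" (superclass java/lang/Object) with one public
     * no-arg constructor. If callSuper is false, the constructor body is a bare
     * "return", which the verifier rejects.
     */
    public static byte[] assemble(String name, boolean callSuper) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);   // minor_version
            out.writeShort(52);  // major_version (Java 8); no branches, so no StackMapTable needed

            // Constant pool; writeUTF emits the u2 length + modified UTF-8 that CONSTANT_Utf8 uses.
            out.writeShort(10);  // constant_pool_count = number of entries + 1
            out.writeByte(1);  out.writeUTF(name);                   // #1 Utf8 class name
            out.writeByte(7);  out.writeShort(1);                    // #2 Class #1
            out.writeByte(1);  out.writeUTF("java/lang/Object");     // #3 Utf8
            out.writeByte(7);  out.writeShort(3);                    // #4 Class #3
            out.writeByte(1);  out.writeUTF("<init>");               // #5 Utf8, brackets are literal
            out.writeByte(1);  out.writeUTF("()V");                  // #6 Utf8 descriptor
            out.writeByte(12); out.writeShort(5); out.writeShort(6); // #7 NameAndType <init>:()V
            out.writeByte(10); out.writeShort(4); out.writeShort(7); // #8 Methodref Object.<init>:()V
            out.writeByte(1);  out.writeUTF("Code");                 // #9 Utf8 attribute name

            out.writeShort(0x0021); // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);      // this_class
            out.writeShort(4);      // super_class
            out.writeShort(0);      // interfaces_count
            out.writeShort(0);      // fields_count
            out.writeShort(1);      // methods_count

            byte[] code = callSuper
                    ? new byte[] {0x2A, (byte) 0xB7, 0x00, 0x08, (byte) 0xB1} // aload_0; invokespecial #8; return
                    : new byte[] {(byte) 0xB1};                              // return only
            out.writeShort(0x0001); // method access: ACC_PUBLIC
            out.writeShort(5);      // name_index -> <init>
            out.writeShort(6);      // descriptor_index -> ()V
            out.writeShort(1);      // attributes_count
            out.writeShort(9);      // attribute_name_index -> Code
            out.writeInt(2 + 2 + 4 + code.length + 2 + 2); // attribute_length
            out.writeShort(1);      // max_stack
            out.writeShort(1);      // max_locals (slot 0 holds this)
            out.writeInt(code.length);
            out.write(code);
            out.writeShort(0);      // exception_table_length
            out.writeShort(0);      // attributes_count of the Code attribute

            out.writeShort(0);      // class attributes_count
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
    }

    /** Minimal loader that turns raw class-file bytes into a Class. */
    static final class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Defines the generated class in a fresh loader and tries to instantiate it. */
    public static String tryLoad(String name, boolean callSuper) {
        byte[] bytes = assemble(name, callSuper);
        try {
            Class<?> c = new ByteLoader().define(name, bytes);
            Object o = c.getDeclaredConstructor().newInstance();
            return "ok: " + o.getClass().getName();
        } catch (VerifyError e) {
            return "VerifyError";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLoad("Dan", true));  // ok: Dan
        System.out.println(tryLoad("Dan", false)); // VerifyError
    }
}
```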

    The JVM spec is irritatingly incomplete, occasionally ambiguous, and even
    within those limits, not especially well-written, but it does cover this stuff.
    You should probably read the whole thing at least once (if you haven't
    already). Much of it is a non-normative (and largely irrelevant) rehashing of
    the JLS. Resign yourself to the idea that you are going to be reading the
    /other/ bits over and over again ;-)

    -- chris

    [*] Or, if you prefer, "You're not in Kansas anymore". Or even, "Welcome to
    the /real/ world" ;-)
    Chris Uppal, May 14, 2006
    #4
  5. Guest

    I seem to have gotten it at last. I'm kind of surprised how much actual
    bytecode javac always had to make for

    public class Test {
    }

    It uses a lot more space than the original file, that's for sure. Oh
    well. Time to optimize my compiler frontend and do a little coding.

    Dan
    , May 14, 2006
    #5
  6. Kent Paul Dolan wrote:
    > "daniel.w.gelder" <> wrote:
    >
    > [I don't have much knowledge to offer. Instead, let me
    > play potted plant here and see if it helps.]
    >
    >
    >>Apparently I have to define my own constructor,
    >>even if it doesn't do anything.

    >
    >
    > That shouldn't be the case: if you don't explicitly
    > define a parameterless void constructor, the system
    > creates a default one for you which calls super()
    > and returns. In fact, creating such a constructor
    > and making it private so it can't be invoked is one
    > frequently seen trick to prevent the system from
    > supplying such a default constructor unbeknownst to
    > you and having it invoked where you had no such
    > intention.

    ....

    I think Daniel is using bytecode, rather than Java, as his target
    language, so there will be things a Java compiler would do automatically
    that he needs to do explicitly.

    However, this does suggest a procedure for solving his problem:

    1. Write a Java class with no specified superclass and no constructor
    declaration.

    2. Compile it.

    3. Examine the bytecode. See what the compiler generates to represent
    the default constructor. It will contain a call to the Object constructor.

    4. Make the new compiler generate the same thing.
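The steps above can be run from Java itself. This sketch assumes JDK 11 or later: it uses `Files.writeString`, plus the `javac` and `javap` tools exposed through `java.util.spi.ToolProvider`, which a bare JRE does not ship. It writes `public class Test {}` to a temporary directory, compiles it, and returns the `javap -c` listing. The constructor in that listing should show `aload_0`, an `invokespecial` of `java/lang/Object."<init>":()V`, and `return`. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.spi.ToolProvider;

public class DefaultCtorBytecode {

    /** Step 1-3: write an empty class, compile it with javac, return "javap -c" output. */
    public static String disassembleEmptyClass() {
        try {
            Path dir = Files.createTempDirectory("ctor-demo");
            Path src = dir.resolve("Test.java");
            // Step 1: no superclass and no constructor declaration.
            Files.writeString(src, "public class Test {\n}\n");

            // Step 2: compile it (requires a JDK, not just a JRE).
            ToolProvider javac = ToolProvider.findFirst("javac").orElseThrow();
            int rc = javac.run(System.out, System.err, "-d", dir.toString(), src.toString());
            if (rc != 0) {
                throw new IllegalStateException("javac failed with exit code " + rc);
            }

            // Step 3: examine the bytecode javac generated for the default constructor.
            ToolProvider javap = ToolProvider.findFirst("javap").orElseThrow();
            StringWriter text = new StringWriter();
            PrintWriter pw = new PrintWriter(text);
            javap.run(pw, pw, "-c", "-cp", dir.toString(), "Test");
            pw.flush();
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(disassembleEmptyClass());
    }
}
```

Step 4 then amounts to emitting the same three instructions, as in the hand-assembled constructor discussed earlier in the thread.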

    Patricia
    Patricia Shanahan, May 14, 2006
    #6
  7. <> wrote in message
    news:...
    > Hi,
    >
    > I'm in the process of writing a custom compiler for my own language
    > that will target the JVM. I'm just getting started but I've got a
    > ClassInfo file successfully streamed. I have a question for anyone who
    > knows.


    I know (obviously) nothing about your language, but I'm wondering:

    Might it be easier to use Java as an intermediate language? That is,
    generate Java from your language and then use javac to compile that?

    As Chris points out, the JVM spec is irritatingly incomplete about the
    precise requirements for bytecode, while the JLS and assorted other books
    are far better at explaining Java. And should you run into trouble, you'll
    have a much easier time debugging your generated Java than debugging
    bytecode directly.
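This alternative can be sketched under a few assumptions. The "front end" here is a hypothetical `emitJava` that only wraps a single integer expression in `x` into a static method; a real translator would emit whole classes. The generated source is compiled in-process with the `javac` `ToolProvider` (JDK 11+ assumed) and loaded through a `URLClassLoader`. The point of the design is that all class-file details, constructors included, become javac's problem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.spi.ToolProvider;

public class JavaBackend {

    /** Hypothetical code generator: wraps one int expression in x as a static method. */
    public static String emitJava(String className, String expr) {
        return "public class " + className + " {\n"
             + "    public static int eval(int x) {\n"
             + "        return " + expr + ";\n"
             + "    }\n"
             + "}\n";
    }

    /** Generates Java for expr, compiles it with javac, loads it and calls eval(x). */
    public static int compileAndRun(String expr, int x) {
        try {
            Path dir = Files.createTempDirectory("java-backend");
            Path src = dir.resolve("Generated.java");
            Files.writeString(src, emitJava("Generated", expr));

            ToolProvider javac = ToolProvider.findFirst("javac").orElseThrow();
            if (javac.run(System.out, System.err, "-d", dir.toString(), src.toString()) != 0) {
                throw new IllegalStateException("generated Java did not compile");
            }

            // javac wrote Generated.class into dir; load it from there.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> c = loader.loadClass("Generated");
                Method eval = c.getMethod("eval", int.class);
                return (Integer) eval.invoke(null, x);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndRun("x * x + 1", 4)); // 17
    }
}
```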
    Mike Schilling, May 16, 2006
    #7
  8. Chris Uppal Guest

    Mike Schilling wrote:

    > Might it be easier to use Java as an intermediate language? That is,
    > generate Java from your language and then use javac to compile that?


    Or maybe an intermediate level technology like Javassist.

    Just mentioning options; personally I'd pop a beer and get stuck right into the
    bytecode ;-)

    -- chris
    Chris Uppal, May 16, 2006
    #8
  9. dimitar Guest

    In addition to the JVM spec, you can also check Bill Venners's "Inside
    the JVM". It's out of print, but you might find a copy in your library.

    Dimitar
    dimitar, May 16, 2006
    #9
  11. Guest

    I've already popped several beers and quite a lot of coffee beans too
    on it!

    Actually I tried using Java as an intermediate language first, calling
    into sun.tools.javac. It worked, but it was really shockingly
    inefficient in a lot of ways and I lost interest.

    Bytecode, while tricky, is at least a challenge.

    Dan
    , May 17, 2006
    #11
  12. Guest

    Now I'm getting deep. I have a question for anyone who knows: what is
    the real difference between local variables and the operand stack in
    the JVM? Both exist only within a method frame. Operations push and pop
    only from the operand stack, granted, but it seems like the 'dup' and
    'swap' commands are entirely sufficient to compile optimized code,
    given a non-naive compiler. After all, if you know you'll need the
    results of an operation more than once, just 'dup' it the first time.
    If it's not in the right place, 'swap' it. Right?
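This intuition can be tried out on a toy interpreter. It is not the JVM, just a sketch that models a handful of its int instructions. An `ArrayDeque` plays the operand stack and an `int[]` plays the local variable array. The spelling `iload 0` (instead of `iload_0`) and the class name are assumptions.

Both programs compute `c - (a + b) * (a + b)`. One parks the sum in a local slot, roughly as javac would for `int t = a + b; return c - t * t;`. The other uses only `dup` and `swap`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class StackOnly {

    // c - (a + b) * (a + b), keeping the sum in local slot 3.
    public static final List<String> WITH_LOCALS = List.of(
            "iload 0", "iload 1", "iadd", "istore 3",
            "iload 2", "iload 3", "iload 3", "imul", "isub", "ireturn");

    // The same expression using only dup and swap; no extra local slot.
    public static final List<String> STACK_ONLY = List.of(
            "iload 0", "iload 1", "iadd", "dup", "imul",
            "iload 2", "swap", "isub", "ireturn");

    /** Interprets a tiny JVM-like int program; args fill the first local slots. */
    public static int run(List<String> program, int maxLocals, int... args) {
        int[] locals = Arrays.copyOf(args, maxLocals);
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : program) {
            String[] p = insn.split(" ");
            switch (p[0]) {
                case "iload" -> stack.push(locals[Integer.parseInt(p[1])]);
                case "istore" -> locals[Integer.parseInt(p[1])] = stack.pop();
                case "iadd" -> stack.push(stack.pop() + stack.pop());
                case "imul" -> stack.push(stack.pop() * stack.pop());
                case "isub" -> { int b = stack.pop(); int a = stack.pop(); stack.push(a - b); }
                case "dup" -> stack.push(stack.peek());
                case "swap" -> { int b = stack.pop(); int a = stack.pop(); stack.push(b); stack.push(a); }
                case "ireturn" -> { return stack.pop(); }
                default -> throw new IllegalArgumentException("unknown instruction: " + insn);
            }
        }
        throw new IllegalStateException("program ended without ireturn");
    }

    public static void main(String[] args) {
        // a = 2, b = 3, c = 30  ->  30 - 25 = 5
        System.out.println(run(WITH_LOCALS, 4, 2, 3, 30) + " in " + WITH_LOCALS.size() + " instructions"); // 5 in 10 instructions
        System.out.println(run(STACK_ONLY, 3, 2, 3, 30) + " in " + STACK_ONLY.size() + " instructions");   // 5 in 9 instructions
    }
}
```

Here the stack-only version saves one instruction and one local slot. With more reused values it gets harder. The JVM's stack-shuffling instructions are limited to `pop`, `pop2`, `dup`, `dup_x1`, `dup_x2`, `dup2`, `dup2_x1`, `dup2_x2` and `swap`, so a value buried more than a few slots deep cannot be copied to the top directly.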
    , May 20, 2006
    #12
  13. <> wrote in message
    news:...
    > Now I'm getting deep. I have a question for anyone who knows: what is
    > the real difference between local variables and the operand stack in
    > the JVM?


    There isn't one, really. I vaguely recall reading a paper by a .NET
    advocate saying that MSIL is superior to Java bytecode because MSIL does
    make that distinction while bytecode doesn't (I don't remember why this was
    supposed to be an advantage.)
    Mike Schilling, May 21, 2006
    #13
  14. Chris Uppal Guest

    wrote:

    > Operations push and pop
    > only from the operand stack, granted, but it seems like the 'dup' and
    > 'swap' commands are entirely sufficient to compile optimized code,
    > given a non-naive compiler


    Remember that the JVM bytecode instructions will be translated into real
    machine operations. If that machine doesn't have an operand stack (or if the
    JITer -- if any -- doesn't use it) then stack twiddling instructions will be
    converted into variable-to-variable, or register-to-register, movements. It
    might be harder or even impossible for the JITer to optimise such code.

    Also, don't forget that the operand stack is cleared whenever an exception is
    thrown.

    OTOH, don't go overboard in avoiding stack twiddling -- after all those
    instructions are there and they are intended to be used. It's probably best to
    take the output of javac as a guide to how much use to make of the stack.

    -- chris
    Chris Uppal, May 21, 2006
    #14
  15. Chris Smith Guest

    Chris Uppal <-THIS.org> wrote:
    > Remember that the JVM bytecode instructions will be translated into real
    > machine operations. If that machine doesn't have an operand stack (or if the
    > JITer -- if any -- doesn't use it) then stack twiddling instructions will be
    > converted into variable-to-variable, or register-to-register, movements. It
    > might be harder or even impossible for the JITer to optimise such code.


    With a few exceptions, though, optimizers don't optimize native machine
    code. They optimize intermediate representations. My guess is that an
    aggressive optimizer does the following:

    1. Break up the method into basic blocks.

    2. Convert all local variables AND the operand stack into SSA form with
    temporary variables.

    3. Optimize and generate code (including register allocation) from the
    SSA representation.

    So I believe that it wouldn't make much difference in code quality
    whether data is stored in local variable slots or on the operand stack,
    because the two will be indistinguishable after the conversion to SSA
    form. Both dividing the code into basic blocks and writing the SSA
    representation are relatively cheap (linear in the method length), so
    this may even be the process for quick optimizations as well.
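This conjecture can be illustrated with a toy translator. The following is a minimal sketch under stated assumptions: a single basic block (so no phi nodes are needed) and made-up instructions written as plain strings. It is not how any particular JIT works; real JITs use their own, far richer IRs.

The operand stack is simulated symbolically, holding value names rather than values. As a result, `istore`/`iload` and `dup`/`swap` both reduce to renaming. Fed two versions of `c - (a + b) * (a + b)`, one using a local slot and one using only `dup` and `swap`, it produces identical three-address code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class StackToSsa {

    /**
     * Translates ONE basic block of JVM-like int instructions into SSA-style
     * three-address code. Stack slots and local slots both hold value names,
     * so after translation they are indistinguishable.
     */
    public static List<String> translate(List<String> block, int paramCount, int maxLocals) {
        List<String> out = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        String[] locals = new String[maxLocals];  // SSA name currently held by each slot
        for (int i = 0; i < paramCount; i++) {
            locals[i] = "p" + i;                    // incoming parameters
        }
        int next = 0;                               // counter for fresh temporaries
        for (String insn : block) {
            String[] p = insn.split(" ");
            switch (p[0]) {
                case "iload" -> stack.push(locals[Integer.parseInt(p[1])]);
                case "istore" -> locals[Integer.parseInt(p[1])] = stack.pop(); // rebinds a name, emits nothing
                case "dup" -> stack.push(stack.peek());
                case "swap" -> { String b = stack.pop(); String a = stack.pop(); stack.push(b); stack.push(a); }
                case "iadd", "isub", "imul" -> {
                    String b = stack.pop();
                    String a = stack.pop();
                    String op = switch (p[0]) { case "iadd" -> "+"; case "isub" -> "-"; default -> "*"; };
                    String t = "t" + next++;
                    out.add(t + " = " + a + " " + op + " " + b); // each temporary assigned exactly once
                    stack.push(t);
                }
                case "ireturn" -> out.add("return " + stack.pop());
                default -> throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> withLocals = List.of("iload 0", "iload 1", "iadd", "istore 3",
                "iload 2", "iload 3", "iload 3", "imul", "isub", "ireturn");
        List<String> stackOnly = List.of("iload 0", "iload 1", "iadd", "dup", "imul",
                "iload 2", "swap", "isub", "ireturn");
        // Both print [t0 = p0 + p1, t1 = t0 * t0, t2 = p2 - t1, return t2]
        System.out.println(translate(withLocals, 3, 4));
        System.out.println(translate(stackOnly, 3, 4));
    }
}
```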

    Where this does make a difference is in the size of the bytecode, and I
    suspect that was part of the reason for the design choices. When Java
    code was supposed to be transferred via infrared (IR) beams between set-top cable
    television boxes, code size was important. Arguably (though to a lesser
    extent), it is so again with J2ME.

    > It's probably best to
    > take the output of javac as a guide to how much use to make of the stack.


    To get the best optimization possible, it's probably best to take the
    output of javac as a guide for as much as you possibly can. There has
    undoubtedly been much work done to make the JIT in most major virtual
    machines work as well as possible with common types of code that are
    written by javac. The same optimizations aren't likely to happen with
    someone's one-use code generator. :)

    --
    www.designacourse.com
    The Easiest Way To Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
    Chris Smith, May 21, 2006
    #15
  16. "Chris Smith" <> wrote in message
    news:...
    > To get the best optimization possible, it's probably best to take the
    > output of javac as a guide for as much as you possibly can. There has
    > undoubtedly been much work done to make the JIT in most major virtual
    > machines work as well as possible with common types of code that are
    > written by javac. The same optimizations aren't likely to happen with
    > someone's one-use code generator. :)


    And this becomes automatic if your compiler outputs Java. But I repeat
    myself :)
    Mike Schilling, May 21, 2006
    #16
  17. Roedy Green Guest

    On 20 May 2006 15:59:45 -0700, wrote, quoted
    or indirectly quoted someone who said :

    >Now I'm getting deep. I have a question for anyone who knows: what is
    >the real difference between local variables and the operand stack in
    >the JVM? Both exist only within a method frame. Operations push and pop
    >only from the operand stack, granted, but it seems like the 'dup' and
    >'swap' commands are entirely sufficient to compile optimized code,
    >given a non-naive compiler. After all, if you know you'll need the
    >results of an operation more than once, just 'dup' it the first time.
    >If it's not in the right place, 'swap' it. Right?


    the stack has:
    1. return value where to carry on when the method ends.
    2. local variables.
    3. temporaries needed to evaluate expressions
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
    Roedy Green, May 21, 2006
    #17
  18. Roedy Green Guest

    On 20 May 2006 15:59:45 -0700, wrote, quoted
    or indirectly quoted someone who said :

    > but it seems like the 'dup' and
    >'swap' commands are entirely sufficient to compile optimized code,
    >given a non-naive compiler. After all, if you know you'll need the
    >results of an operation more than once, just 'dup' it the first time.
    >If it's not in the right place, 'swap' it. Right?


    FORTH is a stack-based machine similar to the JVM.

    In FORTH besides SWAP and DUP you have other operators, most notably
    PICK to let you get at any element arbitrarily deep in the stack. The
    JVM does not have nearly as many stack operators as FORTH, but it does
    let you do stack-relative addressing which gives you pick. It also
    lets you do frame-relative addressing to let you access the locals and
    parameters with fixed offsets.

    Roedy Green, May 21, 2006
    #18
  19. Roedy Green Guest

    On Sun, 21 May 2006 20:32:22 GMT, Roedy Green
    <> wrote, quoted or
    indirectly quoted someone who said :

    >the stack has:
    >1. return value where to carry on when the method ends.
    >2. local variables.
    >3. temporaries needed to evaluate expressions


    and 1.5 parameters passed to this method.
    Roedy Green, May 21, 2006
    #19
  20. "Roedy Green" <> wrote in
    message news:p...
    > On Sun, 21 May 2006 20:32:22 GMT, Roedy Green
    > <> wrote, quoted or
    > indirectly quoted someone who said :
    >
    >>the stack has:
    >>1. return value where to carry on when the method ends.
    >>2. local variables.
    >>3. temporaries needed to evaluate expressions

    >
    > and 1.5 parameters passed to this method.


    I've seen methods with one parameter and methods with two parameters, but
    I've never seen a method with 1.5 parameters.
    Mike Schilling, May 22, 2006
    #20
