Questions about JVM internals

Discussion in 'Java' started by Scott Balmos, Jul 18, 2005.

  1. Scott Balmos

    Scott Balmos Guest

    Hi all,

    This is more for morbid curiosity, but I'm reading up on the internals
    of the JVM and how the bytecode is interpreted. I plan on writing a JVM
    emulator, more just for academic sake. But I have a few questions.

    In classloading, what is generally meant by defining a class? I
    understand that you transfer the .class file bytestream into a Class
    object. But is that really it? Are Class objects stored on the stack
    (or heap) just like all other objects? And I'm guessing resolution
    during classloading means translating the textual string references in
    the classfile's pools into actual references to Class objects.

    If I were writing a JVM, with no JIT functionality (e.g. nothing in
    java.lang.compiler), I'm assuming that each opcode is "run" at JVM
    runtime independently of other opcodes. Thus, if I were being really
    stupid and writing this academic JVM in PHP or Perl (yeah, I know, real
    stupid), I could have a global array as my opcode stack, and then
    separate individual functions that implement each opcode.

    Are Class and Object implemented "natively", inside the JVM? I'm
    guessing ClassLoader also. I guess a more general question is how much
    of the Java library, or really java.lang, is implemented "natively" by
    the JVM, and not just as normal .class files that are loaded & executed
    as normal code.

    That might do it for now. I'm sure to come up with other questions
    later. I just need reassurance I'm on the right path, after reading the
    JVM spec, jNode source code, and some of the Sun JVM community code
    (that *really* made my head spin).

    Thanks!

    --Scott
     
    Scott Balmos, Jul 18, 2005
    #1

  2. Scott Balmos wrote:
    >
    > In classloading, what is generally meant by defining a class? I
    > understand that you transfer the .class file bytestream into a Class
    > object. But is that really it? Are Class objects stored on the stack
    > (or heap) just like all other objects? And I'm guessing resolution
    > during classloading means translating the textual string references in
    > the classfile's pools into actual references to Class objects.


    The Class object itself is just an object like any other. As I
    understand it, the byte and compiled code sits in a special area of the
    heap that doesn't get garbage collected so often. There is no point
    running the GC regularly on long lived objects, because they'll almost
    certainly not have died since the last time you looked.

    Some lookups are done. For instance the superclass gets loaded too.
    However, method calls are generally looked up lazily. Sun's JRE
    interpreter switches opcodes for direct custom versions on first use
    (details are in an appendix of the first edition JVM spec if you can
    find a copy). Compiled byte code for a method is thrown away if it needs
    to look up a method reference.
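
    A minimal sketch of the "look it up lazily, then remember the answer"
    idea. This is not how Sun's interpreter does it (that one rewrites the
    opcode in place); it only shows a constant-pool slot that keeps the
    symbolic name until first use. The class name LazySymbolicRef and the
    resolver function are hypothetical.

```java
import java.util.function.Function;

/** Hypothetical constant-pool slot: symbolic until first use, cached afterwards. */
final class LazySymbolicRef<T> {
    private final String symbolicName;          // e.g. "java/lang/String"
    private final Function<String, T> resolver; // assumed lookup supplied by the VM
    private T resolved;                         // null until resolve() first runs

    LazySymbolicRef(String symbolicName, Function<String, T> resolver) {
        this.symbolicName = symbolicName;
        this.resolver = resolver;
    }

    /** Resolves on the first call only; later calls return the cached target. */
    synchronized T resolve() {
        if (resolved == null) {
            resolved = resolver.apply(symbolicName);
        }
        return resolved;
    }
}
```

    The resolver runs at most once per slot, so the cost of the lookup is
    paid only by code paths that actually execute.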

    > If I were writing a JVM, with no JIT functionality (e.g. nothing in
    > java.lang.compiler), I'm assuming that each opcode is "run" at JVM
    > runtime independently of other opcodes. Thus, if I were being really
    > stupid and writing this academic JVM in PHP or Perl (yeah, I know, real
    > stupid), I could have a global array as my opcode stack, and then
    > separate individual functions that implement each opcode.


    You could, but it wouldn't be very fast. You would have to deal with
    shared state and multiple threads.
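
    A minimal sketch of that design, in Java rather than PHP or Perl. It
    handles only six real JVM opcodes (iconst_1, iconst_2, bipush, iadd,
    imul, ireturn) and keeps the operand stack local to each call instead
    of in a global array, which avoids the shared-state problem. The class
    name TinyInterpreter and its run method are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical toy interpreter: a switch dispatching on a few real JVM opcodes. */
final class TinyInterpreter {
    static final int ICONST_1 = 0x04;
    static final int ICONST_2 = 0x05;
    static final int BIPUSH   = 0x10;
    static final int IADD     = 0x60;
    static final int IMUL     = 0x68;
    static final int IRETURN  = 0xac;

    /** Runs one method body; the operand stack belongs to this invocation only. */
    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc++] & 0xff;   // Java bytes are signed, opcodes are not
            switch (opcode) {
                case ICONST_1: stack.push(1); break;
                case ICONST_2: stack.push(2); break;
                case BIPUSH:   stack.push((int) code[pc++]); break; // operand is a signed byte
                case IADD:     stack.push(stack.pop() + stack.pop()); break;
                case IMUL:     stack.push(stack.pop() * stack.pop()); break;
                case IRETURN:  return stack.pop();
                default:
                    throw new IllegalStateException(
                        "unimplemented opcode 0x" + Integer.toHexString(opcode));
            }
        }
        throw new IllegalStateException("fell off the end of the code array");
    }
}
```

    For example, the bytes { 0x10, 20, 0x05, 0x68, 0x04, 0x60, (byte) 0xac }
    encode bipush 20, iconst_2, imul, iconst_1, iadd, ireturn, so run
    returns 41.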

    Actually java.lang.Compiler does very little.

    > Are Class and Object implemented "natively", inside the JVM? I'm
    > guessing ClassLoader also. I guess a more general question is how much
    > of the Java library, or really java.lang, is implemented "natively" by
    > the JVM, and not just as normal .class files that are loaded & executed
    > as normal code.


    They are normal Java classes. However, many classes have native code.
    Mostly through JNI, although some methods are handled specially for
    performance.

    It's easy to see which methods are native. If you run, say,

    javap java.lang.Object

    You'll see that most of its methods are native. Some of the library
    source code is included in src.zip within JDKs. You can download the
    entire source for the JDK if you like. For instance, current mustang
    (Java SE 6.0) builds are downloadable from https://mustang.dev.java.net/.
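
    The same check can be done from inside Java with reflection. A small
    sketch using the standard java.lang.reflect API; the class name
    NativeMethods is hypothetical, and the exact list printed depends on
    the JDK version you run it on.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: lists the methods a class declares as native. */
final class NativeMethods {
    /** Returns the sorted names of declared methods carrying the native modifier. */
    static Set<String> nativeMethodNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(nativeMethodNames(Object.class));
    }
}
```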

    Tom Hawtin
    --
    Unemployed English Java programmer
     
    Thomas Hawtin, Jul 18, 2005
    #2

  3. Scott Balmos

    Scott Balmos Guest

    On 2005-07-18 15:27:16 -0400, Thomas Hawtin said:

    > Scott Balmos wrote:
    >
    > You could, but it wouldn't be very fast. You would have to deal with
    > shared state and multiple threads.


    That's kind of the point. I don't care about performance, and I could
    find some way to emulate threading.

    >> Are Class and Object implemented "natively", inside the JVM? I'm
    >> guessing ClassLoader also. I guess a more general question is how much
    >> of the Java library, or really java.lang, is implemented "natively" by
    >> the JVM, and not just as normal .class files that are loaded & executed
    >> as normal code.

    >
    > They are normal Java classes. However, many classes have native code.
    > Mostly through JNI, although some methods are handled specially for
    > performance.
    >
    > It's easy to see which methods are native. If you run, say,
    >
    > javap java.lang.Object
    >
    > You'll see that most of its methods are native. Some of the library
    > source code is included in src.zip within JDKs. You can download the
    > entire source for the JDK if you like. For instance, current mustang
    > (Java SE 6.0) builds are downloadable from
    > https://mustang.dev.java.net/.


    Reading the JDK source makes me go cross-eyed. Insane amounts of code
    going every which way, cross-referencing things, etc.

    The basic idea is I want to write a barebones, no-frills JVM, just to
    see if I can parse the classfiles correctly, do all the necessary
    initialization, and correctly run the opcodes. So, as I understand, the
    necessary JVM would have these parts:

    1) Opcode implementations
    2) Rudimentary GC
    3) Classfile parser / verifier
    4) Bootstrap classloader

    #4 is the one that I have the biggest question mark with. If Class,
    Object, and ClassLoader (especially ClassLoader, since it's abstract)
    are all normal objects, it just seems like the logic is circular. The
    bootstrap classloader would get the bytestream from the classfile
    loader/verifier, and create Class objects, right? But before that, it
    parses Class & Object, which have references to Class arrays, String
    arrays, and etc, so you can see where my head goes in circles.

    I guess I need more info on what a rudimentary bootstrap classloader
    looks like, other than the JVM Spec that says it's implementation
    dependent and implements all the initialization, verification, etc.

    >
    > Tom Hawtin


    --Scott
     
    Scott Balmos, Jul 18, 2005
    #3
  4. Scott Balmos wrote:

    >
    > The basic idea is I want to write a barebones, no-frills JVM, just to
    > see if I can parse the classfiles correctly, do all the necessary
    > initialization, and correctly run the opcodes. So, as I understand, the
    > necessary JVM would have these parts:
    >
    > 1) Opcode implementations
    > 2) Rudimentary GC


    Not strictly necessary. You can just allocate, there is no requirement
    for deallocation. Look in the JLS for "garbage collector".
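
    A minimal sketch of an allocate-only heap, assuming object "addresses"
    are plain offsets into one byte array. The class name BumpHeap is
    hypothetical; a real VM would also need object headers and alignment.

```java
/** Hypothetical allocate-only heap: hands out offsets and never reclaims them. */
final class BumpHeap {
    private final byte[] memory;   // backing store for every object in the toy VM
    private int top = 0;           // first free offset

    BumpHeap(int capacityBytes) {
        memory = new byte[capacityBytes];
    }

    /** Returns the offset of a fresh block, or throws once the heap is used up. */
    int allocate(int sizeBytes) {
        if (sizeBytes < 0 || sizeBytes > memory.length - top) {
            throw new OutOfMemoryError("toy heap exhausted");
        }
        int address = top;
        top += sizeBytes;
        return address;
    }
}
```

    Without a collector the program simply ends with OutOfMemoryError when
    the array is full, which is acceptable for short test programs.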

    > 3) Classfile parser / verifier
    > 4) Bootstrap classloader


    You would also need a ZIP file reader (hopefully without as many
    security bugs as zlib), lots of libraries, signal handling, thread
    handling, etc.

    > #4 is the one that I have the biggest question mark with. If Class,
    > Object, and ClassLoader (especially ClassLoader, since it's abstract)
    > are all normal objects, it just seems like the logic is circular. The
    > bootstrap classloader would get the bytestream from the classfile
    > loader/verifier, and create Class objects, right? But before that, it
    > parses Class & Object, which have references to Class arrays, String
    > arrays, and etc, so you can see where my head goes in circles.
    >
    > I guess I need more info on what a rudimentary bootstrap classloader
    > looks like, other than the JVM Spec that says it's implementation
    > dependent and implements all the initialization, verification, etc.


    Well yes, the bootstrap class loader needs some magic about it.
    Obviously you need machine code to run to start with. Sun's
    implementation uses C(++). I believe one of the IBM JVMs has a Java
    implementation, but an image of the compiled code can be dumped to disk
    for the next time the JVM starts.

    Class.getClassLoader may return null to indicate the bootstrap class
    loader. There are many places within the bootstrap classes that depend
    upon this.
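
    A quick way to see the null convention from ordinary Java code. The
    class name BootstrapCheck is hypothetical; getClassLoader is the
    standard java.lang.Class method.

```java
/** Hypothetical helper showing how the bootstrap loader is reported. */
final class BootstrapCheck {
    /** True when the class was defined by the bootstrap loader, reported as null. */
    static boolean loadedByBootstrap(Class<?> type) {
        return type.getClassLoader() == null;
    }

    public static void main(String[] args) {
        System.out.println("String: " + loadedByBootstrap(String.class));
        System.out.println("BootstrapCheck: " + loadedByBootstrap(BootstrapCheck.class));
    }
}
```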

    Which class is loaded first? You can set -Xbootclasspath: and see. 1.3
    to 1.6 report:

    java/lang/NoClassDefFoundError: java/lang/Object

    So how did it find that error class? IIRC, older versions reported not
    finding Thread.

    Tom Hawtin
    --
    Unemployed English Java programmer
     
    Thomas Hawtin, Jul 19, 2005
    #4
  5. Roedy Green

    Roedy Green Guest

    On Mon, 18 Jul 2005 18:18:35 -0400, Scott Balmos wrote or quoted:

    >The basic idea is I want to write a barebones, no-frills JVM, just to
    >see if I can parse the classfiles correctly, do all the necessary
    >initialization, and correctly run the opcodes. So, as I understand, the
    >necessary JVM would have these parts:


    you might start with Kaffe, a third party implementation.
    or at least study it to see how such beasts work.

    --
    Bush crime family lost/embezzled $3 trillion from Pentagon.
    Complicit Bush-friendly media keeps mum. Rumsfeld confesses on video.
    http://www.infowars.com/articles/us/mckinney_grills_rumsfeld.htm

    Canadian Mind Products, Roedy Green.
    See http://mindprod.com/iraq.html photos of Bush's war crimes
     
    Roedy Green, Jul 19, 2005
    #5
  6. Scott Balmos

    Scott Balmos Guest

    Thanks Roedy. Was reading most of the stuff last night. With deference
    to Thomas, I already know the JVM has to be written in a host language.
    Reading the Mustang, or even IBM's Jikes RVM, source code was painful
    because of all the abstraction, multi-platform, and bytecode-compiling
    code they have.

    I read some of Kaffe, and started getting a better idea of things. Once
    you start seeing the same code over and over in different JVM
    implementations, it gets hammered in. :D <sco> I'm a little suspect
    that Kaffe says they're completely clean-room and never seen any of the
    Sun JVM code, when a lot of their startup and initial environment code
    looks downright *exact*. </sco>

    Anyway, like I thought, creating a Class object from a classfile really
    is just a matter of copying around bits from the file. Nothing really
    magical. And the bootstrap classloader is internal to the JVM.
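
    To illustrate the "copying around bits" part, here is a minimal sketch
    that reads only the fixed prefix of a class file as laid out in the
    JVM spec: magic, minor_version, major_version and constant_pool_count.
    The class name ClassFileHeader is hypothetical, and everything after
    the constant pool count is left unparsed.

```java
import java.nio.ByteBuffer;

/** Hypothetical holder for the first ten bytes of a class file. */
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Parses magic (u4), minor (u2), major (u2) and constant_pool_count (u2). */
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes);   // big-endian by default
        int magic = in.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException(
                "not a class file: bad magic 0x" + Integer.toHexString(magic));
        }
        int minor = in.getShort() & 0xffff;            // u2 values are unsigned
        int major = in.getShort() & 0xffff;
        int cpCount = in.getShort() & 0xffff;          // valid indices are 1..count-1
        return new ClassFileHeader(minor, major, cpCount);
    }
}
```

    A file compiled for Java 5 has major version 49; input shorter than
    ten bytes makes ByteBuffer throw BufferUnderflowException.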

    So lemme make sure about this again...

    1. Startup of JVM with Foo.class as the main class
    2. JVM attempts to load Foo.class, creating a Class object for Foo
    3. Foo, during linking, loads String, loads Object, creating Class
    objects for those also
    4. Build args, run Foo::main

    Doesn't Class have to be built before you can build Foo in step 2? I
    guess that's the main question remaining - how does a Class object for
    Class get built (much less the fact that Class is a subclass of
    Object)? That's where my circular logic comes in.

    --S
     
    Scott Balmos, Jul 19, 2005
    #6
  7. Scott Balmos wrote:
    ...
    > Doesn't Class have to be built before you can build Foo in step 2? I
    > guess that's the main question remaining - how does a Class object for
    > Class get built (much less the fact that Class is a subclass of
    > Object)? That's where my circular logic comes in.
    >
    > --S
    >


    Warning - I've never looked specifically at Java start-up, so I may have
    some things completely wrong for that case, but I do have a lot of
    experience with machine boot sequences and program loading, and I
    believe you are looking for the idea that makes those things work.

    During this sort of process one has environment A, with one set of
    capabilities. For booting, environment A is the raw machine plus some
    ROM or PROM code. For your task, environment A will be a program in a
    host language such as C.

    You need environment B, the environment you want. For booting, it might
    be the base modules of a loaded operating system with enough capability
    to load additional modules and start normal operation. For your task, it
    is whatever it takes for your JVM to do normal class loading.

    The objective is to initialize enough of B, using A, for B to be able to
    finish the job off by itself. For example, you know you need a
    ClassLoader to load more classes, so you also need Class, Object, etc.
    List those things. Call them B's "essential infrastructure". You can't
    run your B, and use it to load more stuff, until you have those things
    built, so A is going to have to build them.

    For a JVM, I suspect the essential infrastructure will include the
    structures the JVM uses to find already loaded classes (essentially, a
    system ClassLoader) and a sufficient subset of the API classes for
    normal ClassLoader operation.

    To get from A to B, you need a weird sort of double vision. Using A, you
    build bit patterns in memory, or C structs, or whatever, but made up to
    LOOK as though B had just finished loading its own essential
    infrastructure. Of course, B can't really have done that, because B
    can't load anything until that essential infrastructure is there, but
    you can work out what B's data structures would look like if it had just
    happened.

    You then turn around and hand control over to the subset of B that you just
    finished loading. B's only way of knowing what has happened is to look
    at its data structures, and they all look as though its essential
    infrastructure is loaded.

    Now that bit string in memory is not merely a bit string you wrote in
    memory using the host language, it is your JVM's representation of the
    Class object for Class. It is already properly linked to the Class
    object for its immediate superclass, Object, which from the JVM's point
    of view has also just magically appeared in memory.

    I think the thing you are missing is the double vision that lets you see
    the same data structures as bit patterns you create using the host
    language, outside the JVM's normal operation, and as the JVM's essential
    infrastructure, including key classes. Once you get that double vision,
    the circularity goes away because the host language doesn't need any JVM
    infrastructure to create bit patterns that represent JVM infrastructure.
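
    A minimal sketch of that idea, with Java standing in for the host
    language. The host code creates plain records first and patches the
    circular links afterwards, so nothing needs to be "loaded" before it
    exists. VmClass, Bootstrap and their fields are hypothetical names,
    not taken from any real JVM.

```java
/** Hypothetical internal class record, built by host code before class loading works. */
final class VmClass {
    final String name;
    VmClass superclass;   // stays null only for java/lang/Object
    VmClass classOf;      // the record describing this record's own class

    VmClass(String name) {
        this.name = name;
    }
}

/** Hypothetical bootstrap step run once by the host program. */
final class Bootstrap {
    /** Hand-builds the records for Object and Class, then closes the cycle. */
    static VmClass[] bootstrapCoreClasses() {
        VmClass object = new VmClass("java/lang/Object");
        VmClass clazz = new VmClass("java/lang/Class");
        clazz.superclass = object;   // Class extends Object
        object.classOf = clazz;      // Object's record is an instance of Class
        clazz.classOf = clazz;       // and so is the record for Class itself
        return new VmClass[] { object, clazz };
    }
}
```

    Once both records exist, the rest of the VM can treat them exactly
    like records produced by normal class loading.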

    Of course, your own reading of JVM implementations will be a much better
    guide than my guesses as to how exactly to do this for Java. Once you
    get the double vision, I don't think you will have any trouble applying
    your reading.

    Patricia
     
    Patricia Shanahan, Jul 19, 2005
    #7
  8. Scott Balmos

    Scott Balmos Guest

    ah k... Not that this is specific to Patricia's response, but I'm
    pretty sure I've gotten it figured out now. Spent most of the afternoon
    wading through Kaffe's source, and I've gotten it pretty much figured
    out (save for four lines of code where it looks like they're writing
    circular method dispatch table references).

    My fatal mistake was assuming that java.lang.Object & Class were the
    canonical representation of the "JVM internal class object" referenced
    in the JVM spec. In fact, they are not. There are internal structures for
    Class & Object, which include all the gory stuff like the constant
    pool, method dispatch tables, superclass, classloader, etc etc etc.
    java.lang.Class & Object just happen to be core classes which
    represent most of that *to the Java world*. It's almost-completely
    separate from (and unrelated to?) the JVM representation of a class /
    object.

    Further, java.lang.Object is parsed first, just like any other normal
    class file, since it's the superclass. Contrary to what Patricia was
    suggesting, there is no internal representation of Object. Everything's
    read in like a normal class file. Certain class files, like Object,
    Class, System, String, etc are read in during JVM startup, before it
    gets to reading the user's main class, as a matter of establishing an
    expected system environment.

    Like I said, my fatal mistake was not realizing that the JVM class &
    object data structures (structs in Kaffe, written in C) really had
    almost nothing to do with classes Class & Object in Java.

    If anyone else wants to chime in and correct any of my thought process
    above, feel free. But the mental lightbulbs are glowing brightly now, I
    think.
     
    Scott Balmos, Jul 19, 2005
    #8