Questions about JVM internals

Scott Balmos

Hi all,

This is more for morbid curiosity, but I'm reading up on the internals
of the JVM and how the bytecode is interpreted. I plan on writing a JVM
emulator, more just for academic sake. But I have a few questions.

In classloading, what is generally meant by defining a class? I
understand that you transfer the .class file bytestream into a Class
object. But is that really it? Are Class objects stored on the stack
(or heap) just like all other objects? And I'm guessing resolution
during classloading means translating the textual string references in
the classfile's pools into actual references to Class objects.

If I were writing a JVM, with no JIT functionality (e.g. nothing in
java.lang.compiler), I'm assuming that each opcode is "run" at JVM
runtime independently of other opcodes. Thus, if I were being really
stupid and writing this academic JVM in PHP or Perl (yeah, I know, real
stupid), I could have a global array as my opcode stack, and then
separate individual functions that implement each opcode.

Are Class and Object implemented "natively", inside the JVM? I'm
guessing ClassLoader also. I guess a more general question is how much
of the Java library, or really java.lang, is implemented "natively" by
the JVM, and not just as normal .class files that are loaded & executed
as normal code.

That might do it for now. I'm sure to come up with other questions
later. I just need reassurance I'm on the right path, after reading the
JVM spec, jNode source code, and some of the Sun JVM community code
(that *really* made my head spin).

Thanks!

--Scott
 
Thomas Hawtin

Scott said:
In classloading, what is generally meant by defining a class? I
understand that you transfer the .class file bytestream into a Class
object. But is that really it? Are Class objects stored on the stack
(or heap) just like all other objects? And I'm guessing resolution
during classloading means translating the textual string references in
the classfile's pools into actual references to Class objects.

The Class object itself is just an object like any other. As I
understand it, the bytecode and compiled code sit in a special area of the
heap that doesn't get garbage collected so often. There is no point
running the GC regularly on long-lived objects, because they'll almost
certainly not have died since the last time you looked.

Some lookups are done at load time. For instance, the superclass gets loaded too.
However, method calls are generally looked up lazily. Sun's JRE
interpreter switches opcodes for direct custom versions on first use
(details are in an appendix of the first edition JVM spec if you can
find a copy). Compiled byte code for a method is thrown away if it needs
to look up a method reference.

Scott said:
If I were writing a JVM, with no JIT functionality (e.g. nothing in
java.lang.compiler), I'm assuming that each opcode is "run" at JVM
runtime independently of other opcodes. Thus, if I were being really
stupid and writing this academic JVM in PHP or Perl (yeah, I know, real
stupid), I could have a global array as my opcode stack, and then
separate individual functions that implement each opcode.

You could, but it wouldn't be very fast. You would have to deal with
shared state and multiple threads.
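
For illustration, here is a minimal sketch of that one-opcode-at-a-time approach, in Java rather than PHP or Perl. The opcode values are the ones from the JVM spec (iconst_1, bipush, iadd, imul, ireturn); the class name TinyInterpreter and its run method are made up for this example, and a real interpreter would need frames, locals, the constant pool and some 200 more opcodes. Note that the operand stack here is local to each call rather than a global array, which sidesteps part of the shared-state problem.

```java
// Hypothetical sketch: a switch-based interpreter for a handful of JVM opcodes.
final class TinyInterpreter {
    // Opcode values as listed in the JVM specification.
    static final int ICONST_1 = 0x04;
    static final int BIPUSH   = 0x10;
    static final int IADD     = 0x60;
    static final int IMUL     = 0x68;
    static final int IRETURN  = 0xAC;

    // Executes one method body and returns the int handed back by ireturn.
    static int run(byte[] code) {
        int[] stack = new int[16]; // operand stack, private to this invocation
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // index of the next byte to decode
        while (true) {
            int opcode = code[pc++] & 0xFF;
            switch (opcode) {
                case ICONST_1:
                    stack[sp++] = 1;
                    break;
                case BIPUSH:
                    stack[sp++] = code[pc++]; // operand is one signed byte
                    break;
                case IADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case IMUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException(
                            "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // bipush 20; iconst_1; iadd; bipush 2; imul; ireturn
        byte[] code = {0x10, 20, 0x04, 0x60, 0x10, 2, 0x68, (byte) 0xAC};
        System.out.println(run(code)); // computes (20 + 1) * 2
    }
}
```

Splitting each case into its own function, as Scott describes, would work just as well; the switch is only the most compact way to show the dispatch.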

Actually java.lang.Compiler does very little.

Scott said:
Are Class and Object implemented "natively", inside the JVM? I'm
guessing ClassLoader also. I guess a more general question is how much
of the Java library, or really java.lang, is implemented "natively" by
the JVM, and not just as normal .class files that are loaded & executed
as normal code.

They are normal Java classes. However, many classes have native code.
Mostly through JNI, although some methods are handled specially for
performance.

It's easy to see which methods are native. If you run, say,

javap java.lang.Object

You'll see that most of its methods are native. Some of the library
source code is included in src.zip within JDKs. You can download the
entire source for the JDK if you like. For instance, current mustang
(Java SE 6.0) builds are downloadable from https://mustang.dev.java.net/.
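
The same check can be done from inside a running program with reflection. This is only a sketch: the class name NativeMethodLister is made up, and the exact set of native methods it reports depends on the JDK version you run it on.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: lists the methods of a class that carry the native modifier.
final class NativeMethodLister {
    static Set<String> nativeMethodNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        // getDeclaredMethods() includes private methods, unlike plain javap output.
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(nativeMethodNames(Object.class));
    }
}
```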

Tom Hawtin
 
Scott Balmos

Thomas Hawtin wrote:

You could, but it wouldn't be very fast. You would have to deal with
shared state and multiple threads.

That's kind of the point. I don't care about performance, and I could
find some way to emulate threading.

Thomas Hawtin wrote:
They are normal Java classes. However, many classes have native code.
Mostly through JNI, although some methods are handled specially for
performance.

It's easy to see which methods are native. If you run, say,

javap java.lang.Object

You'll see that most of its methods are native. Some of the library
source code is included in src.zip within JDKs. You can download the
entire source for the JDK if you like. For instance, current mustang
(Java SE 6.0) builds are downloadable from
https://mustang.dev.java.net/.

Reading the JDK source makes me go cross-eyed. Insane amounts of code
going every which way, cross-referencing things, etc.

The basic idea is I want to write a barebones, no-frills JVM, just to
see if I can parse the classfiles correctly, do all the necessary
initialization, and correctly run the opcodes. So, as I understand, the
necessary JVM would have these parts:

1) Opcode implementations
2) Rudimentary GC
3) Classfile parser / verifier
4) Bootstrap classloader

#4 is the one that I have the biggest question mark with. If Class,
Object, and ClassLoader (especially ClassLoader, since it's abstract)
are all normal objects, it just seems like the logic is circular. The
bootstrap classloader would get the bytestream from the classfile
loader/verifier, and create Class objects, right? But before that, it
parses Class & Object, which have references to Class arrays, String
arrays, and etc, so you can see where my head goes in circles.

I guess I need more info on what a rudimentary bootstrap classloader
looks like, other than the JVM Spec that says it's implementation
dependent and implements all the initialization, verification, etc.

--Scott
 
Thomas Hawtin

Scott said:
The basic idea is I want to write a barebones, no-frills JVM, just to
see if I can parse the classfiles correctly, do all the necessary
initialization, and correctly run the opcodes. So, as I understand, the
necessary JVM would have these parts:

1) Opcode implementations
2) Rudimentary GC

Not strictly necessary. You can just allocate; there is no requirement
for deallocation. Look in the JLS for "garbage collector".

Scott said:
3) Classfile parser / verifier
4) Bootstrap classloader

You will also need a ZIP file reader (hopefully without as many security
bugs as zlib), lots of libraries, signal handling, thread handling, etc.

Scott said:
#4 is the one that I have the biggest question mark with. If Class,
Object, and ClassLoader (especially ClassLoader, since it's abstract)
are all normal objects, it just seems like the logic is circular. The
bootstrap classloader would get the bytestream from the classfile
loader/verifier, and create Class objects, right? But before that, it
parses Class & Object, which have references to Class arrays, String
arrays, and etc, so you can see where my head goes in circles.

I guess I need more info on what a rudimentary bootstrap classloader
looks like, other than the JVM Spec that says it's implementation
dependent and implements all the initialization, verification, etc.

Well yes, the bootstrap class loader needs some magic about it.
Obviously you need machine code to run to start with. Sun's
implementation uses C(++). I believe one of the IBM JVMs has a Java
implementation, but an image of the compiled code can be dumped to disk
for the next time the JVM starts.

Class.getClassLoader may return null to indicate the bootstrap class
loader. There are many places within the bootstrap classes that depend
upon this.
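
A quick way to see this, as a sketch (the class name BootstrapLoaderCheck is made up; the null result for core classes is what Sun-derived JVMs do, and the API documentation only says an implementation may do so):

```java
// Hypothetical demo: core classes report a null class loader on Sun-derived JVMs.
final class BootstrapLoaderCheck {
    // True when getClassLoader() returns null, the convention for the bootstrap loader.
    static boolean loadedByBootstrap(Class<?> type) {
        return type.getClassLoader() == null;
    }

    public static void main(String[] args) {
        System.out.println("String: " + loadedByBootstrap(String.class));
        System.out.println("this class: " + loadedByBootstrap(BootstrapLoaderCheck.class));
    }
}
```

An application class such as the demo class itself comes from an ordinary ClassLoader object, so the second check is expected to be false.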

Which class is loaded first? You can set -Xbootclasspath: and see. 1.3
to 1.6 report:

java/lang/NoClassDefFoundError: java/lang/Object

So how did it find that error class? IIRC, older versions reported not
finding Thread.

Tom Hawtin
 
Roedy Green

The basic idea is I want to write a barebones, no-frills JVM, just to
see if I can parse the classfiles correctly, do all the necessary
initialization, and correctly run the opcodes. So, as I understand, the
necessary JVM would have these parts:

You might start with Kaffe, a third-party implementation,
or at least study it to see how such beasts work.

--
Bush crime family lost/embezzled $3 trillion from Pentagon.
Complicit Bush-friendly media keeps mum. Rumsfeld confesses on video.
http://www.infowars.com/articles/us/mckinney_grills_rumsfeld.htm

Canadian Mind Products, Roedy Green.
See http://mindprod.com/iraq.html photos of Bush's war crimes
 
Scott Balmos

Thanks Roedy. Was reading most of the stuff last night. With deference
to Thomas, I already know the JVM has to be written in a host language.
Reading the Mustang, or even IBM's Jikes RVM, source code was painful
because of all the abstraction, multi-platform, and bytecode-compiling
code they have.

I read some of Kaffe, and started getting a better idea of things. Once
you start seeing the same code over and over in different JVM
implementations, it gets hammered in. :D <sco> I'm a little suspect
that Kaffe says they're completely clean-room and never seen any of the
Sun JVM code, when a lot of their startup and initial environment code
looks downright *exact*. </sco>

Anyway, like I thought, creating a Class object from a classfile really
is just a matter of copying around bits from the file. Nothing really
magical. And the bootstrap classloader is internal to the JVM.
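
As a rough sketch of what "copying around bits from the file" means for the first few fields, the following reads the fixed header of a class file: the 0xCAFEBABE magic, the minor and major version, and the constant pool count, in the order the JVM spec lays them out. The class name ClassFileHeader is invented here, and everything after the header (the constant pool entries, fields, methods, attributes) is left out.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: decodes only the fixed-size header of a class file.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader read(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number, not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;   // u2 minor_version
        int major = buf.getShort() & 0xFFFF;   // u2 major_version
        int cpCount = buf.getShort() & 0xFFFF; // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor 0, major 50, constant_pool_count 16.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                         0, 0, 0, 50, 0, 16};
        ClassFileHeader h = read(header);
        System.out.println(h.majorVersion + "." + h.minorVersion
                + ", constant pool count " + h.constantPoolCount);
    }
}
```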

So lemme make sure about this again...

1. Startup of JVM with Foo.class as the main class
2. JVM attempts to load Foo.class, creating a Class object for Foo
3. Foo, during linking, loads String, loads Object, creating Class
objects for those also
4. Build args, run Foo::main

Doesn't Class have to be built before you can build Foo in step 2? I
guess that's the main question remaining - how does a Class object for
Class get built (much less the fact that Class is a subclass of
Object)? That's where my circular logic comes in.

--S
 
Patricia Shanahan

Scott Balmos wrote:
...
Doesn't Class have to be built before you can build Foo in step 2? I
guess that's the main question remaining - how does a Class object for
Class get built (much less the fact that Class is a subclass of
Object)? That's where my circular logic comes in.

--S

Warning - I've never looked specifically at Java start-up, so I may have
some things completely wrong for that case, but I do have a lot of
experience with machine boot sequences and program loading, and I
believe you are looking for the idea that makes those things work.

During this sort of process one has environment A, with one set of
capabilities. For booting, environment A is the raw machine plus some
ROM or PROM code. For your task, environment A will be a program in a
host language such as C.

You need environment B, the environment you want. For booting, it might
be the base modules of a loaded operating system with enough capability
to load additional modules and start normal operation. For your task, it
is whatever it takes for your JVM to do normal class loading.

The objective is to initialize enough of B, using A, for B to be able to
finish the job off by itself. For example, you know you need a
ClassLoader to load more classes, so you also need Class, Object, etc.
List those things. Call them B's "essential infrastructure". You can't
run your B, and use it to load more stuff, until you have those things
built, so A is going to have to build them.

For a JVM, I suspect the essential infrastructure will include the
structures the JVM uses to find already loaded classes (essentially, a
system ClassLoader) and a sufficient subset of the API classes for
normal ClassLoader operation.

To get from A to B, you need a weird sort of double vision. Using A, you
build bit patterns in memory, or C structs, or whatever, but made up to
LOOK as though B had just finished loading its own essential
infrastructure. Of course, B can't really have done that, because B
can't load anything until that essential infrastructure is there, but
you can work out what B's data structures would look like if it had just
happened.

You then turn around and hand control over to the subset of B that you just
finished loading. B's only way of knowing what has happened is to look
at its data structures, and they all look as though its essential
infrastructure is loaded.

Now that bit string in memory is not merely a bit string you wrote in
memory using the host language, it is your JVM's representation of the
Class object for Class. It is already properly linked to the Class
object for its immediate superclass, Object, which from the JVM's point
of view has also just magically appeared in memory.

I think the thing you are missing is the double vision that lets you see
the same data structures as bit patterns you create using the host
language, outside the JVM's normal operation, and as the JVM's essential
infrastructure, including key classes. Once you get that double vision,
the circularity goes away because the host language doesn't need any JVM
infrastructure to create bit patterns that represent JVM infrastructure.
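
To make the double vision concrete, here is a small sketch with Java standing in for the host language. VmClass and Bootstrap are invented names, not taken from any real JVM, and the sketch only wires up the two records Patricia mentions. The host code simply assigns fields, so nothing circular has to be resolved; afterwards the records look as if Class and Object had been loaded normally.

```java
// Hypothetical sketch: the host language builds the VM's records for Object and Class by hand.
final class Bootstrap {
    // The VM-internal record for a loaded class (not java.lang.Class itself).
    static final class VmClass {
        final String name;
        VmClass superclass; // stays null only for java/lang/Object
        VmClass classOf;    // which VM class this record is an instance of

        VmClass(String name) {
            this.name = name;
        }
    }

    final VmClass objectClass = new VmClass("java/lang/Object");
    final VmClass classClass = new VmClass("java/lang/Class");

    Bootstrap() {
        // Plain field assignments in the host language; no JVM machinery is needed yet.
        classClass.superclass = objectClass; // Class extends Object
        objectClass.classOf = classClass;    // Object's record is an instance of Class
        classClass.classOf = classClass;     // and so is Class's own record
    }

    public static void main(String[] args) {
        Bootstrap b = new Bootstrap();
        System.out.println(b.classClass.name + " extends " + b.classClass.superclass.name);
    }
}
```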

Of course, your own reading of JVM implementations will be a much better
guide than my guesses as to how exactly to do this for Java. Once you
get the double vision, I don't think you will have any trouble applying
your reading.

Patricia
 
Scott Balmos

ah k... Not that this is specific to Patricia's response, but I'm
pretty sure I've gotten it figured out now. Spent most of the afternoon
wading through Kaffe's source, and I've gotten it pretty much figured
out (save for four lines of code where it looks like they're writing
circular method dispatch table references).

My fatal mistake was assuming that java.lang.Object & Class were the
canonical representation of the "JVM internal class object" referenced
in the JVM spec. In fact, they are not. There are internal structures for
Class & Object, which include all the gory stuff like the constant
pool, method dispatch tables, superclass, classloader, etc etc etc.
java.lang.Class & Object just happen to be core classes which
represent most of that *to the Java world*. It's almost-completely
separate from (and unrelated to?) the JVM representation of a class /
object.

Further, java.lang.Object is parsed first, just like any other normal
class file, since it's the superclass. Contrary to what Patricia was
suggesting, there is no internal representation of Object. Everything's
read in like a normal class file. Certain class files, like Object,
Class, System, String, etc are read in during JVM startup, before it
gets to reading the user's main class, as a matter of establishing an
expected system environment.
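
A sketch of the "superclass gets parsed first" ordering, assuming a map of class name to superclass name as a stand-in for real class file parsing. SuperFirstLoader and its methods are made-up names, and this is not how Kaffe structures its loader; it only shows why Object ends up defined before anything else.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a loader that always defines the superclass before the subclass.
final class SuperFirstLoader {
    // Stand-in for parsed class files: class name -> superclass name (null for Object).
    private final Map<String, String> superclassOf;
    // Classes defined so far, in the order they were defined.
    private final Set<String> loaded = new LinkedHashSet<>();

    SuperFirstLoader(Map<String, String> superclassOf) {
        this.superclassOf = superclassOf;
    }

    void load(String name) {
        if (loaded.contains(name)) {
            return; // already defined, nothing to do
        }
        if (!superclassOf.containsKey(name)) {
            throw new IllegalArgumentException("no class file for " + name);
        }
        String superName = superclassOf.get(name);
        if (superName != null) {
            load(superName); // recurse up the hierarchy first
        }
        loaded.add(name);
    }

    List<String> loadOrder() {
        return new ArrayList<>(loaded);
    }

    public static void main(String[] args) {
        Map<String, String> supers = new HashMap<>();
        supers.put("java/lang/Object", null);
        supers.put("java/lang/String", "java/lang/Object");
        supers.put("Foo", "java/lang/Object");
        SuperFirstLoader loader = new SuperFirstLoader(supers);
        loader.load("Foo");
        loader.load("java/lang/String");
        System.out.println(loader.loadOrder());
    }
}
```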

Like I said, my fatal mistake was not realizing that the JVM class &
object data structures (structs in Kaffe, written in C) really had
almost nothing to do with classes Class & Object in Java.

If anyone else wants to chime in and correct any of my thought process
above, feel free. But the mental lightbulbs are glowing brightly now, I
think.
 
