Java VM Address Space

Roedy Green

Since the codes for the VM are like ordinary machine codes, do they lie in the
same address space for all loaded classes?
Is that address space a chunk of segmented memory 4 GB in size? Or does that
code lie in the address space of the virtual machine, in the form of
instantiated objects of the VM itself?

The virtual machine is quite unlike an Intel machine. Addressing
within a method is relative to start of the method. Addressing within
an object is by field name. Addressing on the stack is by stack slot
relative to the current invocation, not letting you peek at the
caller's stack. References are black boxes designed to fetch an
object. You can't do arithmetic on them.

There is great flexibility in how the JVM actually works. It can even
use 64 bit references if it wants.
 
a

Hello Everybody.
I heard that security is not covered completely in the Java resources available
from Sun Microsystems.
So I'm trying to understand several things related to the internals of the Java
VM.
Since the codes for the VM are like ordinary machine codes, do they lie in the
same address space for all loaded classes?
Is that address space a chunk of segmented memory 4 GB in size? Or does that
code lie in the address space of the virtual machine, in the form of
instantiated objects of the VM itself?
Actually, how many kinds of binding (linking) exist?
Late binding, compile-time binding, and dynamic binding: are they all present in
Java?

Jack.
 
Roedy Green

Finally the
class is not compiled code at all.
The class seems to be, and it is, the same source file, but without human
readable crap.

It is not Pentium native machine code. It is machine code for the Java
Virtual machine. It is the native machine code for PicoJava chips. See
http://mindprod.com/jgloss/picojava.html

It is not source with the comments removed, even though much of it is
intelligible in a hex viewer. See http://mindprod.com/jasm.html

Try using javap -c to disassemble some class files. You will see the
byte code resembles FORTH, a postfix, stack-based language.

see http://mindprod.com/jgloss/disassembler.html
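
As a small illustration of the disassembly suggested above, here is a sketch of a class to compile and inspect. The class name `StackDemo` and its method are made up for this example. The instructions in the comment are what javac typically emits for such a method; the exact listing may differ between compiler versions.

```java
// StackDemo is a hypothetical class, written only so there is
// something small to disassemble.
class StackDemo {
    // javac typically compiles this body to four stack instructions:
    //   iload_0   push local slot 0 (a)
    //   iload_1   push local slot 1 (b)
    //   iadd      pop two ints, push their sum
    //   ireturn   pop the result and return it
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Compile it with `javac StackDemo.java`, then run `javap -c StackDemo` to see the listing. Note that the operands are addressed by local slot number, not by a memory address.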
 
Roedy Green

While looking at a hex view of a class file I found limits of 16 bits
(pool entries) and 32 bits for lengths

There are a number of limits in the class file format which are not in
the running JVM itself. For example, inside the JVM you can have 64-bit
references; the class file stays the same either way. The class file
limits a method's bytecode to 64K, but the size of the native code
compiled from it is not limited that way. Inside the JVM, you can have
all the strings that will fit in the address space.
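
To make the 16-bit and 32-bit fields concrete, here is a minimal sketch that reads the first few fields of a class file. `ClassHeader` is a hypothetical helper, not a JDK API. It assumes the class file for `java.lang.Object` can be read as a system resource, which holds on the JDKs I know of but is still an assumption about your runtime.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// ClassHeader is a hypothetical helper for illustration only.
class ClassHeader {
    final int magic;             // u4, 0xCAFEBABE in every class file
    final int minorVersion;      // u2
    final int majorVersion;      // u2
    final int constantPoolCount; // u2, so it can never exceed 65535

    ClassHeader(int magic, int minor, int major, int poolCount) {
        this.magic = magic;
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = poolCount;
    }

    // Reads the leading fields of a class file found on the class path
    // or in the runtime image, e.g. "java/lang/Object.class".
    static ClassHeader read(String resourceName) {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(resourceName)) {
            if (raw == null) {
                throw new IllegalArgumentException("not found: " + resourceName);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();               // 32-bit field
            int minor = in.readUnsignedShort();     // 16-bit field
            int major = in.readUnsignedShort();     // 16-bit field
            int poolCount = in.readUnsignedShort(); // 16-bit field
            return new ClassHeader(magic, minor, major, poolCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassHeader h = read("java/lang/Object.class");
        System.out.printf("magic=%08X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

These widths belong to the file format only. Nothing here says how the running JVM lays the same data out in memory.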
 
Chris Smith

Jack said:
Finally the class is not compiled code at all.

By everyone else's definition of the word "compile" Java bytecode is
definitely compiled. Perhaps you meant to say that it is not native, in
which case you'd be right. A second stage of the compiler (the JIT)
generally runs as the program is running to translate that bytecode into
machine code for the processor on which the code will run.
The class seems to be, and it is, the same source file, but without human
readable crap.

The class file contains a set of VM-level methods and fields and
constants, which mostly resemble the set written in the Java language.
However, the *contents* of the methods (the bytecode) bear only some
resemblance to Java. The bytecode does not radically depart from the memory
model, but it throws out decisions about choice of control flow
constructs, block scope, order of operation rules, variable names, etc.
The Java VM under Windows obviously uses its own address space and keeps
classes in the form of instantiated objects (which are not directly related to
Java objects).

I'm not at all sure what you mean by "instantiated objects", which
you've said a couple of times. I get the feeling from context that you
don't mean the same thing as most other people would when they say the
same thing.

All major Java virtual machines for conventional CPUs with MMUs reside
in a single address space, and load all of their code into that address
space. I am not aware of any JVM that makes use of shared memory or
other IPC between separate processes (except insofar as pre-NPTL
versions of Linux tend to treat threads as multiple processes that share
everything). There will exist in that address space one or more copies
of the code: the bytecode itself, a natively compiled (JITed) version,
and potentially other JITed versions if the JIT compiler determines
that it's worth recompiling for special cases to improve efficiency.
All three kinds of binding are used in Java technology.

I don't know what distinction you're making between the three kinds of
binding. In fact, you seem to have made a habit of assuming that
everyone shares your understanding of terminology. In reality,
terminology differs from context to context; and textbook authors,
technical presenters, etc. often make it up on the fly to express the
distinctions that happen to be useful for them, right now.

This might help. There are the following different bytecodes used in
the JVM to invoke methods:

invokevirtual
invokestatic
invokespecial
invokeinterface

(and the special cases: invokevirtual_quick, invokenonvirtual_quick,
invokesuper_quick, invokestatic_quick, invokeinterface_quick,
invokevirtualobject_quick, invokevirtual_quick_w)

Ignoring the special cases, the four invoke opcodes could be said to
correspond loosely to kinds of method binding, but I don't think that
they correspond directly to the terms you've used. The most dynamic is
invokeinterface. Two of the four -- virtual and interface -- are
polymorphic, whereas the other two -- special and static -- can be
completely resolved earlier in the class loading and JIT process. The
only difference between special and static is whether to pass an
implicit this pointer, which is irrelevant to a discussion of method
binding.
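
As a rough illustration of the four opcodes, here is a sketch in Java source with comments naming the instruction javac typically emits for each call. `Shape`, `Square` and `InvokeDemo` are names invented for this example. You can check the comments against your own compiler with `javap -c InvokeDemo`.

```java
// All names here (Shape, Square, InvokeDemo) are hypothetical.
interface Shape {
    int area();
}

class Square implements Shape {
    private final int side;

    Square(int side) {
        // The implicit super() call here is an invokespecial
        // of java/lang/Object.<init>.
        this.side = side;
    }

    @Override
    public int area() {
        return side * side;
    }
}

class InvokeDemo {
    static int twice(int x) {
        return 2 * x;
    }

    static int run() {
        Square square = new Square(3); // invokespecial Square.<init>
        Shape shape = square;
        int a = square.area();         // invokevirtual Square.area
        int b = shape.area();          // invokeinterface Shape.area
        int c = twice(a);              // invokestatic InvokeDemo.twice
        return a + b + c;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The two `area()` calls run the same method body. Only the static type of the variable differs, and that is what decides which opcode the source compiler picks.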

Note that the term "compile-time" is problematic when discussing the
JVM, since there are two compilers involved -- the source compiler, and
the JIT compiler. No methods are ever completely bound by the source
compiler; it always embeds the names of methods into the class file and
leaves the runtime class loader to resolve them. The JIT compiler will
typically completely resolve an invokespecial or invokestatic all the
way to a direct memory address for a jump or jsr instruction. It will
resolve invokevirtual to a simple indirect jump (as in, jump to
[obj_base + offset]). Unless it has some specialized global knowledge
of the application, the JIT will generate table lookup code for an
invokeinterface instruction.
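
The idea of resolving a name once and then jumping directly can be sketched with a toy model. This is only an illustration under loud assumptions: `ToyCallSite`, its method table, and the name `Demo.twice:(I)I` are all invented here, and real JVMs resolve constant-pool entries and compile call sites in their own implementation-specific ways.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// A toy model only, not how any real JVM is implemented.
class ToyCallSite {
    private final Map<String, IntUnaryOperator> methodTable; // stands in for loaded classes
    private final String symbolicName;                       // what a class file would store
    private IntUnaryOperator resolved;                       // cached target after first call
    int lookups = 0;                                         // counts name-based searches

    ToyCallSite(Map<String, IntUnaryOperator> methodTable, String symbolicName) {
        this.methodTable = methodTable;
        this.symbolicName = symbolicName;
    }

    int invoke(int arg) {
        if (resolved == null) {
            // First call: search by name, then remember the result.
            lookups++;
            resolved = methodTable.get(symbolicName);
            if (resolved == null) {
                throw new NoSuchMethodError(symbolicName);
            }
        }
        // Later calls skip the search and go straight to the target.
        return resolved.applyAsInt(arg);
    }

    // Calls the same site three times; returns { last result, lookup count }.
    static int[] demo() {
        Map<String, IntUnaryOperator> table = new HashMap<>();
        table.put("Demo.twice:(I)I", x -> 2 * x);
        ToyCallSite site = new ToyCallSite(table, "Demo.twice:(I)I");
        int last = 0;
        for (int i = 0; i < 3; i++) {
            last = site.invoke(21);
        }
        return new int[] { last, site.lookups };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("result=" + r[0] + " lookups=" + r[1]);
    }
}
```

In this model the name search happens once per call site, not once per call, so the cost of lookup by name does not grow with how often the code runs.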

All of this, of course, is typical implementation. The JVM is free to
choose any alternate implementation, so long as the observed behavior is
the same.

Now, I'm confused as to why this would have any impact on security...
but there it is, anyway.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Jack

The virtual machine is quite unlike an Intel machine. Addressing
within a method is relative to start of the method. Addressing within
an object is by field name. Addressing on the stack is by stack slot
relative to the current invocation, not letting you peek at the
caller's stack. References are black boxes designed to fetch an
object. You can't do arithmetic on them.

There is great flexibility in how the JVM actually works. It can even
use 64 bit references if it wants.

While looking at a hex view of a class file I found limits of 16 bits
(pool entries) and 32 bits for lengths.

Therefore classes should be distinguished by the version number. Finally the
class is not compiled code at all.
The class seems to be, and it is, the same source file, but without human
readable crap.

From your answer I can form the answers to my questions. Please correct me
if I'm wrong.

The Java VM under Windows obviously uses its own address space and keeps
classes in the form of instantiated objects (which are not directly related to
Java objects).
All three kinds of binding are used in Java technology.
 
Roedy Green

Now I have to learn what FORTH and postfix are.

In forth every word is a verb. You evaluate strictly left to right, in
a stack based machine.

So for example

2 3 + .

is a forth program that prints 5.

It works like this

2 is a verb that pushes 2 to the stack

3 is a verb that pushes 3 to the stack. Your stack now looks like:
2 3 ( with 2 deeper in the stack )

+ is a verb that adds the top two stack elements, discards them and
pushes the sum to the stack. Your stack now looks like:
5

. is a verb that displays the top of stack as an int and discards it.
Your stack is now empty.

Postfix is the natural order of calculation. You have to calculate
operands before you can do an operation on them. It is used in
PostScript, HP calculators, and the Java JVM.
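
The left-to-right evaluation described above can be sketched as a very small stack machine. `Postfix` is a name invented for this example and it is not how any real Forth is built. Where Forth would print with `.`, this sketch simply returns the top of the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Postfix is a hypothetical toy evaluator for integer postfix programs.
class Postfix {
    // Evaluates whitespace-separated words strictly left to right
    // and returns whatever is on top of the stack at the end.
    static int eval(String program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String word : program.trim().split("\\s+")) {
            switch (word) {
                case "+": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "-": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left - right);
                    break;
                }
                case "*": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                case "/": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left / right);
                    break;
                }
                default:
                    // Any other word is taken to be a number: push it.
                    stack.push(Integer.parseInt(word));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(eval("2 3 +"));
    }
}
```

There is no precedence table anywhere in the code. The order of the words is the order of evaluation.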

Another example

a = b + ( c - d ) / e

becomes in postfix

b c d - e / + a !

where ! is the store operator
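
Here is a hedged sketch of how that line could be evaluated, following the simplified notation of the post, where a bare name pushes the variable's value and `name !` stores into it. Real Forth is stricter about fetching values, so treat `PostfixVars` and its rules as assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// PostfixVars is a hypothetical toy; it follows the post's notation,
// not the rules of a real Forth system.
class PostfixVars {
    static Map<String, Integer> run(String program, Map<String, Integer> initial) {
        Map<String, Integer> vars = new HashMap<>(initial);
        Deque<Integer> stack = new ArrayDeque<>();
        String[] words = program.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            boolean storeTarget = i + 1 < words.length && words[i + 1].equals("!");
            if (storeTarget) {
                vars.put(word, stack.pop()); // "a !" pops the top of stack into a
                i++;                         // skip the "!" word itself
            } else if (word.length() == 1 && "+-/".contains(word)) {
                int right = stack.pop();
                int left = stack.pop();
                switch (word) {
                    case "+": stack.push(left + right); break;
                    case "-": stack.push(left - right); break;
                    default:  stack.push(left / right); break;
                }
            } else {
                Integer value = vars.get(word);
                if (value == null) {
                    throw new IllegalArgumentException("unknown word: " + word);
                }
                stack.push(value);           // a bare name pushes its value
            }
        }
        return vars;
    }

    // Runs the line from the post with sample values b=10, c=8, d=2, e=3.
    static int demo() {
        Map<String, Integer> vars = new HashMap<>();
        vars.put("b", 10);
        vars.put("c", 8);
        vars.put("d", 2);
        vars.put("e", 3);
        return run("b c d - e / + a !", vars).get("a");
    }

    public static void main(String[] args) {
        System.out.println("a = " + demo());
    }
}
```

With those sample values the steps are 8 - 2 = 6, then 6 / 3 = 2, then 10 + 2 = 12, which is stored in a.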

if ( a < b )
{
c = d;
f = g;
}
else
{
c = e;
}

becomes in FORTH

a b < IF d g f ! ELSE e THEN c !

It looks very strange at first, but if you work with it for a while it
becomes easier than Java notation because there are no precedence
rules. Everything proceeds strictly left to right.
 
Jack

So thank you for the help; it clarifies many things for me.

Now I have to learn what FORTH and postfix are.

Of course I understand that Java bytecodes are not the machine codes of the
physical system's processor.
So if language constructs are translated to Java bytecodes at compile
time, the addresses (of fields and methods) cannot be determined until
the class is loaded into the VM.
Thus they have to be found by either string names or indexes.
Nor can the addresses be found just after class loading.
The VM will have to look up the real address of a field or method (even though
it is not executed by a real CPU).
This, I suppose, is what is called call-time binding.
Such a thing can be done only with a database that consists of names
and addresses. The database should be indexed for faster search. Thus the
bigger the program, the slower it will run. But that is not always true.
But after checking the access rights, the substitution of real addresses
should be totally safe.
But there is still no freedom to use arbitrary numbers, which could
potentially lead to security risks.

Thank you a lot. I'll go learn a bit more.
 
Roedy Green

So what protection is used to prevent the stack overflow?

seems to be like: a b < IF d c ! g f ! ELSE e THEN c !

FORTH protection comes from writing very small routines and debugging
each one exhaustively before you move on to the next. This technique
is much more powerful than you would imagine. In Forth there are no
safety nets. You can do an explicit stack check with ?STACK. I
designed BBL with a bit of slop in the stacks so small amounts of
overflow/underflow would do no damage during debugging.

The most common bug in Forth is to leave something on the stack you
did not intend or consume something you did not intend. When you get
your stack balanced to the method spec, nearly always your code is
correct.
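
The habit of balancing the stack against a routine's spec can be sketched as an explicit check. This is written in Java, not Forth, and `StackEffect` with its `check` method is invented for this example. It is not the `?STACK` word itself, only the same idea of comparing stack depth before and after a routine.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// StackEffect is a hypothetical helper for illustration only.
class StackEffect {
    // Runs a word and verifies that the net stack depth changed as declared:
    // it should consume `consumes` items and leave `produces` items.
    static void check(Deque<Integer> stack, int consumes, int produces,
                      Consumer<Deque<Integer>> word) {
        int before = stack.size();
        if (before < consumes) {
            throw new IllegalStateException(
                    "underflow: need " + consumes + ", have " + before);
        }
        word.accept(stack);
        int expected = before - consumes + produces;
        if (stack.size() != expected) {
            throw new IllegalStateException(
                    "unbalanced: expected depth " + expected + ", got " + stack.size());
        }
    }

    // ( a b -- sum ) a correct word.
    static void plus(Deque<Integer> s) {
        int b = s.pop();
        int a = s.pop();
        s.push(a + b);
    }

    // Buggy on purpose: peeks instead of popping, so one item is left behind.
    static void leakyPlus(Deque<Integer> s) {
        int b = s.pop();
        int a = s.peek();
        s.push(a + b);
    }

    static int balancedSum() {
        Deque<Integer> s = new ArrayDeque<>();
        s.push(2);
        s.push(3);
        check(s, 2, 1, StackEffect::plus);
        return s.peek();
    }

    static boolean detectsLeak() {
        Deque<Integer> s = new ArrayDeque<>();
        s.push(2);
        s.push(3);
        try {
            check(s, 2, 1, StackEffect::leakyPlus);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("sum=" + balancedSum() + " leakDetected=" + detectsLeak());
    }
}
```

The check only compares net depth, so a word that pops too deep and pushes the same number back would slip through. It catches the common case of a leftover or missing item.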

Even though FORTH and Java are very similar underneath the hood, FORTH
has a quite different philosophy. You are permitted to tinker with
the inner workings of everything. They are so simple, you can
understand every last instruction in the entire system, even more so if
you write your own FORTH engine from scratch. You could do a simple
one in about a month. My 32-bit one was more complex. I had to
simulate a 32-bit virtual machine on a 16-bit 8086.
 
Jack

Wow! Cool!
So what protection is used to prevent the stack overflow?

seems to be like: a b < IF d c ! g f ! ELSE e THEN c !
 
Jack

FORTH protection comes from writing very small routines and debugging
each one exhaustively before you move on to the next. This technique
is much more powerful than you would imagine. In Forth there are no
safety nets. You can do an explicit stack check with ?STACK. I
designed BBL with a bit of slop in the stacks so small amounts of
overflow/underflow would do no damage during debugging.

The most common bug in Forth is to leave something on the stack you
did not intend or consume something you did not intend. When you get
your stack balanced to the method spec, nearly always your code is
correct.

Even though FORTH and Java are very similar underneath the hood, FORTH
has a quite different philosophy. You are permitted to tinker with
the inner workings of everything. They are so simple, you can
understand every last instruction in the entire system, even more so if
you write your own FORTH engine from scratch. You could do a simple
one in about a month. My 32-bit one was more complex. I had to
simulate a 32-bit virtual machine on a 16-bit 8086.

I know how to resolve the problem of stack over/underflow FOREVER.

FOREVER.
 
