How to create virtual machine?

B

Borneq

I am planning to create virtual machine. How is better, use stack like
Java or .Net vm or static single assignment form as LLVM has ?
How garbage collectors gets roots from local variables? In article
http://www.artima.com/insidejvm/ed2/gcP.html is described only garbage
collection while the root nodes are already found.
First I must stop all but GC threads. Next I should see through all
stack frames and see local variables - how do this step?
 
G

Gene

  I am planning to create virtual machine. How is better, use stack like
Java or .Net vm or static single assignment form as LLVM has ?
How garbage collectors gets roots from local variables? In articlehttp://www.artima.com/insidejvm/ed2/gcP.htmlis described only garbage
collection while the root nodes are already found.
First I must stop all but GC threads. Next I should see through all
stack frames and see local variables - how do this step?

I assume you are implementing a VM in C because you are posting here.
The VM stacks and registers must be stored _only_ in C data structures
that the GC can inspect for roots. If the GC is running in a thread
different from the program, appropriate mutexes are needed to make
pointer copies look atomic to the GC. C code that "GC safe" often
looks wierd and verbose and is tricky to get right. A favorite
example is the rather innocent-looking

*context.register = allocate(&context, somebytes);

where context is being exposed to the GC as a source of roots
and .register (obviously) holds a pointer.

Since the C compiler is free to emit code that computes the LHS L-val
first (and many do), and the allocation may cause the pointer target
to be relocated by the GC (therefore .register holds a new value
afterward), this is a hard-to-find bug. You must instead say

context.tmp = allocate(somebytes);
*context.register = context.tmp;
 
G

Gene

I am planning to create virtual machine. How is better, use stack like
Java or .Net vm or static single assignment form as LLVM has ?
How garbage collectors gets roots from local variables? In articlehttp://www.artima.com/insidejvm/ed2/gcP.htmlis described only garbage
collection while the root nodes are already found.
First I must stop all but GC threads. Next I should see through all
stack frames and see local variables - how do this step?

I assume you are implementing a VM in C because you are posting
here.

You are confused about LLVM. SSL is an intermediate form
representation for compilers, not generally for interpreters like
VMs. VMs normally use stacks wherever possible because allocation/
eallocation is extremely fast. But they're LIFO, which doesn't work
for everything. For this there is the heap.

The functional language community has looked at various schemes to
eliminate the stack completely in favor of keeping all storage on a
GC'ed heap. They have never been seriously competitive with stack.
Even though the allocation/deallocation instruction count can be made
comparable for both approaches, the heap approach doesn't play well
with memory hierarchies.

Re GC root-finding, the answer is pretty obvious: VM stacks and
registers must be stored only in C data structures that the GC can
inspect for roots. If the GC is running in a thread different from
the program, appropriate mutexes are needed to make pointer copies
look atomic to the GC. C code that's "GC safe" often looks wierd and
verbose and is tricky to get right. A favorite example is the rather
innocent-looking

*context.register = allocate(&context, 42);
/* allocate 42 bytes to the location pointed to by the register */

where context is being exposed to the GC as a source of roots
and .register (obviously) holds a pointer.

Since the C compiler is free to emit code that computes the LHS L-val
first (and many do), and the allocation (by a copying collector) may
cause the pointer target to be relocated by the GC
(therefore .register holds a new value afterward), this is a hard-to-
find bug. You must instead say

context.tmp = allocate(&context, 42);
*context.register = context.tmp;

Many people have tried "conservative" GC that inspects arbitrary C
stack contents for possible roots (see for example the Boehm
collector, which is about as well as this can be done). You can't do
this portably. Here, "portable" extends even to different command
line options for the same compiler on the same machine. It's a risky
approach.

The lccwin compiler supports GC as an extension, avoiding the above
risk because the compiler-writer is in the loop.
 
K

Keith Thompson

Borneq said:
I am planning to create virtual machine. How is better, use stack like
Java or .Net vm or static single assignment form as LLVM has ?
How garbage collectors gets roots from local variables? In article
http://www.artima.com/insidejvm/ed2/gcP.html is described only garbage
collection while the root nodes are already found.
First I must stop all but GC threads. Next I should see through all
stack frames and see local variables - how do this step?

I see nothing about C in your question. You'll probably get better
answers in, say, comp.programming. Feel free to come back here if
you have any C-specific questions.
 
K

Kenny McCormack

I see nothing about C in your question. You'll probably get better
answers in, say, comp.programming. Feel free to come back here if
you have any C-specific questions.

But only if.

Otherwise, Gatekeeper Kiki will have you forcibly ejected. He does have
that power, you know...

--
"The anti-regulation business ethos is based on the charmingly naive notion
that people will not do unspeakable things for money." - Dana Carpender

Quoted by Paul Ciszek (pciszek at panix dot com). But what I want to know
is why is this diet/low-carb food author doing making pithy political/economic
statements?

Nevertheless, the above quote is dead-on, because, the thing is - business
in one breath tells us they don't need to be regulated (which is to say:
that they can morally self-regulate), then in the next breath tells us that
corporations are amoral entities which have no obligations to anyone except
their officers and shareholders, then in the next breath they tell us they
don't need to be regulated (that they can morally self-regulate) ...
 
B

BGB / cr88192

Gene said:
I assume you are implementing a VM in C because you are posting
here.

You are confused about LLVM. SSL is an intermediate form
representation for compilers, not generally for interpreters like
VMs. VMs normally use stacks wherever possible because allocation/
eallocation is extremely fast. But they're LIFO, which doesn't work
for everything. For this there is the heap.

actually, a VM can also be a compiler, just typically compilation is done
just before running the code (as often called, JIT).

now, the tradeoffs are, as I see it:
for interpreters, stack machines are easier to implement and work with,
whereas register machines tend to be somewhat more complicated.

however, stack machines are also a problem when it comes to flexibility,
such as the freedom to reorganize code or adapt ordering to be better suited
to a particular target architecture (more important for compilers), which
gives an advantage to more flexible forms (such as SSA, or the use of ASTs).

The functional language community has looked at various schemes to
eliminate the stack completely in favor of keeping all storage on a
GC'ed heap. They have never been seriously competitive with stack.
Even though the allocation/deallocation instruction count can be made
comparable for both approaches, the heap approach doesn't play well
with memory hierarchies.

yeah, heap allocation tends to be slower...

"sliding pointer" allocators can also be very fast, but tend to require
either memory copying (like in a copy collector or compacting collector), or
periodically destroying an entire region of the heap (the app being
responsible for not having any important data there when this happens),
which limits applicability as a general-purpose allocation strategy...

Re GC root-finding, the answer is pretty obvious: VM stacks and
registers must be stored only in C data structures that the GC can
inspect for roots. If the GC is running in a thread different from
the program, appropriate mutexes are needed to make pointer copies
look atomic to the GC. C code that's "GC safe" often looks wierd and
verbose and is tricky to get right. A favorite example is the rather
innocent-looking

*context.register = allocate(&context, 42);
/* allocate 42 bytes to the location pointed to by the register */

where context is being exposed to the GC as a source of roots
and .register (obviously) holds a pointer.

Since the C compiler is free to emit code that computes the LHS L-val
first (and many do), and the allocation (by a copying collector) may
cause the pointer target to be relocated by the GC
(therefore .register holds a new value afterward), this is a hard-to-
find bug. You must instead say

context.tmp = allocate(&context, 42);
*context.register = context.tmp;

yeah.

we call this precise GC, and IME in most cases (in C), it is too much of a
hassle to bother with...
multi-threading with a precise GC is even worse, as then one has to start
being very pedantic about everything, ...


I much prefer the use of either conservative GC, or the "slide memory and
destroy the heap" strategy (this being well suited for cases where one knows
that something will only run a short time, and none of its local data will
be needed later, so to the code in question it is like having a GC...).

also, the above strategy can also give better performance in many usage
patterns than does having an actual GC.

Many people have tried "conservative" GC that inspects arbitrary C
stack contents for possible roots (see for example the Boehm
collector, which is about as well as this can be done). You can't do
this portably. Here, "portable" extends even to different command
line options for the same compiler on the same machine. It's a risky
approach.

The lccwin compiler supports GC as an extension, avoiding the above
risk because the compiler-writer is in the loop.

I do conservative GC, and in general it works fairly well...
(although I tend to use a custom GC, rather than Boehm...).


a partial downside though is performance (during a GC), namely that a
conservative GC with a large heap generally doesn't perform nearly as well
as the other options (largely since a conservative).

another problem is one of "write barriers":
there is no good and portable way to do implicit write barriers with a GC,
which is needed to be able to safely do concurrent GC (where the GC runs at
the same time as the program is running);
this means either having to forsake a little safety (running without write
barriers, and having some added risk of crashes), or having to use pedantic
little functions or macros whenever assigning regions of memory on the heap,
which is annoying.

like, if:
gcSet(foo->ptr, value);
has to be used rather than:
foo->ptr=value;

and so on...


there are also incremental strategies (such as reference counting), which
have their own tradeoffs.

sadly, all this means that one will often end up using a number of different
memory-management strategies in an app depending on what one is doing,
leading in some cases to "hidden rules" regarding the treatment and handling
of memory objects.


or such...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top