How to create a virtual machine?

Discussion in 'C Programming' started by Borneq, Aug 21, 2010.

  1. Borneq

    I am planning to create a virtual machine. Which is better: using a stack,
    like the Java or .NET VM, or static single assignment form, as LLVM has?
    How do garbage collectors get roots from local variables? The article
    http://www.artima.com/insidejvm/ed2/gcP.html describes only garbage
    collection once the root nodes have already been found.
    First I must stop all threads except the GC threads. Next I should walk
    through all stack frames and look at the local variables. How do I do
    this step?
     
    Borneq, Aug 21, 2010
    #1

  2. Gene

    On Aug 21, 9:56 am, Borneq <> wrote:
    >   I am planning to create virtual machine. How is better, use stack like
    > Java or .Net vm or static single assignment form as LLVM has ?
    > How garbage collectors gets roots from local variables? In article
    > http://www.artima.com/insidejvm/ed2/gcP.html is described only garbage
    > collection while the root nodes are already found.
    > First I must stop all but GC threads. Next I should see through all
    > stack frames and see local variables - how do this step?


    I assume you are implementing a VM in C because you are posting here.
    The VM stacks and registers must be stored _only_ in C data structures
    that the GC can inspect for roots. If the GC is running in a thread
    different from the program, appropriate mutexes are needed to make
    pointer copies look atomic to the GC. C code that is "GC safe" often
    looks weird and verbose and is tricky to get right. A favorite
    example is the rather innocent-looking

    *context.reg = allocate(&context, somebytes);

    where context is being exposed to the GC as a source of roots
    and .reg (obviously) holds a pointer.

    Since the C compiler is free to evaluate the left-hand side (the
    lvalue *context.reg) before calling allocate() (and many do), and
    the allocation may cause the pointer target to be relocated by the
    GC (so .reg holds a new value afterward), this is a hard-to-find
    bug. You must instead say

    context.tmp = allocate(&context, somebytes);
    *context.reg = context.tmp;
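    Gene's rule that the VM stacks and registers live only in C data structures the GC can inspect is also the practical answer to the original question about local variables: if every VM-level local sits in a VM-managed array, the collector finds roots by looping over that array instead of walking C stack frames. A minimal mark-and-sweep sketch of that idea, assuming a fixed pool of objects with a single pointer field; `Obj`, `VM`, `vm_push`, `vm_alloc`, `gc_collect`, and the sizes are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE  16
#define STACK_SIZE 32

/* Hypothetical heap object with one pointer field the GC must trace. */
typedef struct Obj {
    bool in_use;
    bool marked;
    struct Obj *next;
} Obj;

/* All VM state lives here, so the GC can enumerate every root. */
typedef struct {
    Obj *stack[STACK_SIZE];   /* VM operand stack: locals and temporaries */
    size_t sp;                /* number of live stack slots */
    Obj *acc;                 /* a VM "register" */
    Obj pool[POOL_SIZE];      /* the whole GC heap */
} VM;

static void vm_push(VM *vm, Obj *o) {
    assert(vm->sp < STACK_SIZE);
    vm->stack[vm->sp++] = o;
}

/* Mark an object and everything reachable through its next pointers. */
static void mark(Obj *o) {
    while (o != NULL && !o->marked) {
        o->marked = true;
        o = o->next;
    }
}

/* Stop-the-world collection: mark from the roots, then sweep the pool.
   Returns the number of objects freed. */
static size_t gc_collect(VM *vm) {
    for (size_t i = 0; i < vm->sp; i++)   /* roots: live stack slots */
        mark(vm->stack[i]);
    mark(vm->acc);                        /* roots: VM registers */

    size_t freed = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        Obj *o = &vm->pool[i];
        if (o->in_use && !o->marked) {
            o->in_use = false;
            freed++;
        }
        o->marked = false;                /* clear for the next cycle */
    }
    return freed;
}

/* Hand out a free pool slot, collecting once if none is available. */
static Obj *vm_alloc(VM *vm) {
    for (int attempt = 0; attempt < 2; attempt++) {
        for (size_t i = 0; i < POOL_SIZE; i++) {
            if (!vm->pool[i].in_use) {
                vm->pool[i] = (Obj){ .in_use = true };
                return &vm->pool[i];
            }
        }
        (void)gc_collect(vm);
    }
    return NULL;   /* heap exhausted even after collecting */
}
```

    In this sketch an object is only safe across a `vm_alloc` call once it is reachable from `vm->stack` or `vm->acc`; keeping it only in a C local while another allocation runs lets the sweep free it, a non-moving cousin of the bug described above.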
     
    Gene, Aug 21, 2010
    #2

  3. Gene

    On Aug 21, 9:56 am, Borneq <> wrote:
    > I am planning to create virtual machine. How is better, use stack like
    > Java or .Net vm or static single assignment form as LLVM has ?
    > How garbage collectors gets roots from local variables? In article
    > http://www.artima.com/insidejvm/ed2/gcP.html is described only garbage
    > collection while the root nodes are already found.
    > First I must stop all but GC threads. Next I should see through all
    > stack frames and see local variables - how do this step?


    I assume you are implementing a VM in C because you are posting
    here.

    You are confused about LLVM. SSA is an intermediate representation
    for compilers, not generally for interpreters like VMs. VMs normally
    use stacks wherever possible because allocation/deallocation is
    extremely fast. But stacks are LIFO, which doesn't work for
    everything. For that there is the heap.
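    To make the stack side of the comparison concrete: in a stack VM each instruction implicitly pops its operands and pushes its result, so "allocating" a temporary is just incrementing an index, which is why it is so cheap. A minimal interpreter-loop sketch, assuming a made-up instruction set (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_HALT`) over integers only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Run bytecode until OP_HALT and return the value on top of the stack.
   OP_PUSH is followed by one immediate operand in the code array. */
static long run(const long *code) {
    long stack[STACK_MAX];
    size_t sp = 0;      /* index of the next free slot */
    size_t pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];   /* allocation: bump sp */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;                       /* deallocation: drop sp */
            stack[sp - 1] = stack[sp - 1] + stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] * stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

    For example, `(2 + 3) * 4` becomes `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT`, and `run` returns 20.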

    The functional language community has looked at various schemes to
    eliminate the stack completely in favor of keeping all storage on a
    GC'ed heap. They have never been seriously competitive with stacks.
    Even though the allocation/deallocation instruction count can be made
    comparable for both approaches, the heap approach doesn't play well
    with memory hierarchies.

    Re GC root-finding, the answer is pretty obvious: VM stacks and
    registers must be stored only in C data structures that the GC can
    inspect for roots. If the GC is running in a thread different from
    the program, appropriate mutexes are needed to make pointer copies
    look atomic to the GC. C code that's "GC safe" often looks weird and
    verbose and is tricky to get right. A favorite example is the rather
    innocent-looking

    *context.reg = allocate(&context, 42);
    /* allocate 42 bytes and store a pointer to them where .reg points */

    where context is being exposed to the GC as a source of roots
    and .reg (obviously) holds a pointer.

    Since the C compiler is free to evaluate the left-hand side (the
    lvalue) first (and many do), and the allocation (by a copying
    collector) may cause the pointer target to be relocated by the GC
    (so .reg holds a new value afterward), this is a hard-to-find bug.
    You must instead say

    context.tmp = allocate(&context, 42);
    *context.reg = context.tmp;
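    The evaluation-order hazard is easier to see with a real moving collector. Below is a toy semispace collector in the style of Cheney's algorithm, assuming fixed-size objects with two pointer fields and a `Context` whose register holds a pointer to a whole object (rather than to a slot, as in the post) to keep relocation simple; the register is spelled `reg` because `register` is a reserved word in C, and all names are hypothetical. `unsafe_store` writes out by hand one evaluation order the compiler is allowed to choose, and `safe_store` is the `tmp` pattern from the post:

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_OBJS 8

/* Hypothetical fixed-size object: a forwarding pointer plus two traced fields. */
typedef struct Obj {
    struct Obj *forward;    /* set once the object has been copied */
    struct Obj *field[2];
} Obj;

/* Everything the GC treats as a root. */
typedef struct {
    Obj *reg;   /* VM register holding a heap pointer */
    Obj *tmp;   /* scratch root for fresh allocations */
} Context;

static Obj space_a[HEAP_OBJS], space_b[HEAP_OBJS];
static Obj *from_space = space_a, *to_space = space_b;
static size_t used;   /* objects in use in from_space */

/* Copy one object to to_space (once) and return its new address. */
static Obj *evacuate(Obj *o) {
    if (o == NULL) return NULL;
    if (o->forward != NULL) return o->forward;
    Obj *copy = &to_space[used++];
    *copy = *o;
    copy->forward = NULL;
    o->forward = copy;
    return copy;
}

/* Cheney collection: copy the roots, scan to_space, then flip spaces. */
static void collect(Context *ctx) {
    used = 0;
    ctx->reg = evacuate(ctx->reg);   /* the GC rewrites the roots */
    ctx->tmp = evacuate(ctx->tmp);
    for (size_t scan = 0; scan < used; scan++) {
        to_space[scan].field[0] = evacuate(to_space[scan].field[0]);
        to_space[scan].field[1] = evacuate(to_space[scan].field[1]);
    }
    Obj *old = from_space;
    from_space = to_space;
    to_space = old;
}

/* May run a collection, which moves every live object. */
static Obj *allocate(Context *ctx) {
    if (used == HEAP_OBJS) collect(ctx);
    assert(used < HEAP_OBJS);        /* toy heap: no growth */
    Obj *o = &from_space[used++];
    *o = (Obj){ 0 };
    return o;
}

/* One order the compiler may emit for ctx->reg->field[0] = allocate(ctx). */
static void unsafe_store(Context *ctx) {
    Obj *dest = ctx->reg;            /* LHS address computed first */
    dest->field[0] = allocate(ctx);  /* dest is stale if allocate() collected */
}

/* The fix from the post: park the result in a root, then reload reg. */
static void safe_store(Context *ctx) {
    ctx->tmp = allocate(ctx);
    ctx->reg->field[0] = ctx->tmp;
}

/* Fill the heap with unreachable garbage so the next allocate() collects. */
static void fill_heap(Context *ctx) {
    while (used < HEAP_OBJS) (void)allocate(ctx);
}
```

    Running `unsafe_store` right after the heap fills leaves `ctx.reg->field[0]` NULL, because the store lands in the abandoned from-space copy of the object. `safe_store` reloads `ctx->reg` after the call, at the cost of one extra store and load. Here the GC runs synchronously inside `allocate`, so no threads are involved.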

    Many people have tried "conservative" GC, which inspects arbitrary C
    stack contents for possible roots (see for example the Boehm
    collector, which does this about as well as it can be done). You
    can't do this portably. Here, "portable" extends even to different
    command-line options for the same compiler on the same machine. It's
    a risky approach.
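    For contrast, the core test a conservative collector applies to each word it scans is "could this be the address of one of my objects?" The sketch below runs that test over a simulated stack (an array of `uintptr_t`) instead of the real C stack, since reading the real stack is precisely the unportable part; the heap layout and `looks_like_object` are assumptions, and only exact object addresses are accepted, whereas real collectors often accept interior pointers too:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOBJ 8

/* Hypothetical heap: a contiguous array of fixed-size objects. */
typedef struct {
    bool marked;
    long payload;
} Obj;

static Obj heap[NOBJ];

/* Does this word hold the exact start address of some heap object?
   Converting pointers to uintptr_t and comparing them is
   implementation-defined, one reason the technique is not portable. */
static bool looks_like_object(uintptr_t w, size_t *index) {
    assert(index != NULL);
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = (uintptr_t)&heap[NOBJ];   /* one past the end */
    if (w < lo || w >= hi)
        return false;                         /* outside the heap */
    if ((w - lo) % sizeof(Obj) != 0)
        return false;                         /* not an object start */
    *index = (size_t)((w - lo) / sizeof(Obj));
    return true;
}

/* Treat every plausible word as a root; returns newly marked objects. */
static size_t scan_conservatively(const uintptr_t *words, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        size_t idx;
        if (looks_like_object(words[i], &idx) && !heap[idx].marked) {
            heap[idx].marked = true;
            hits++;
        }
    }
    return hits;
}
```

    An integer that happens to equal an object's address keeps that object alive, which is why a conservative collector can retain garbage and cannot move objects that such words might refer to.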

    The lccwin compiler supports GC as an extension, avoiding the above
    risk because the compiler-writer is in the loop.
     
    Gene, Aug 21, 2010
    #3
  4. Borneq <> writes:
    > I am planning to create virtual machine. How is better, use stack like
    > Java or .Net vm or static single assignment form as LLVM has ?
    > How garbage collectors gets roots from local variables? In article
    > http://www.artima.com/insidejvm/ed2/gcP.html is described only garbage
    > collection while the root nodes are already found.
    > First I must stop all but GC threads. Next I should see through all
    > stack frames and see local variables - how do this step?


    I see nothing about C in your question. You'll probably get better
    answers in, say, comp.programming. Feel free to come back here if
    you have any C-specific questions.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 21, 2010
    #4
  5. In article <>,
    Keith Thompson <> wrote:
    >Borneq <> writes:
    >> I am planning to create virtual machine. How is better, use stack like
    >> Java or .Net vm or static single assignment form as LLVM has ?
    >> How garbage collectors gets roots from local variables? In article
    >> http://www.artima.com/insidejvm/ed2/gcP.html is described only garbage
    >> collection while the root nodes are already found.
    >> First I must stop all but GC threads. Next I should see through all
    >> stack frames and see local variables - how do this step?

    >
    >I see nothing about C in your question. You'll probably get better
    >answers in, say, comp.programming. Feel free to come back here if
    >you have any C-specific questions.


    But only if.

    Otherwise, Gatekeeper Kiki will have you forcibly ejected. He does have
    that power, you know...

    --
    "The anti-regulation business ethos is based on the charmingly naive notion
    that people will not do unspeakable things for money." - Dana Carpender

    Quoted by Paul Ciszek (pciszek at panix dot com). But what I want to know
    is why this diet/low-carb food author is making pithy political/economic
    statements.

    Nevertheless, the above quote is dead-on, because, the thing is, business
    in one breath tells us they don't need to be regulated (which is to say:
    that they can morally self-regulate), then in the next breath tells us that
    corporations are amoral entities which have no obligations to anyone except
    their officers and shareholders, then in the next breath they tell us they
    don't need to be regulated (that they can morally self-regulate) ...
     
    Kenny McCormack, Aug 21, 2010
    #5
  6. "Gene" <> wrote in message
    news:...
    > On Aug 21, 9:56 am, Borneq <> wrote:
    >> I am planning to create virtual machine. How is better, use stack like
    >> Java or .Net vm or static single assignment form as LLVM has ?
    >> How garbage collectors gets roots from local variables? In article
    >> http://www.artima.com/insidejvm/ed2/gcP.html is described only garbage
    >> collection while the root nodes are already found.
    >> First I must stop all but GC threads. Next I should see through all
    >> stack frames and see local variables - how do this step?

    >
    > I assume you are implementing a VM in C because you are posting
    > here.
    >
    > You are confused about LLVM. SSA is an intermediate representation
    > for compilers, not generally for interpreters like VMs. VMs normally
    > use stacks wherever possible because allocation/deallocation is
    > extremely fast. But stacks are LIFO, which doesn't work for
    > everything. For that there is the heap.
    >


    actually, a VM can also be a compiler; it is just that compilation is
    typically done right before running the code (often called JIT).

    now, the tradeoffs are, as I see it:
    for interpreters, stack machines are easier to implement and work with,
    whereas register machines tend to be somewhat more complicated.

    however, stack machines are also a problem when it comes to flexibility,
    such as the freedom to reorganize code or adapt ordering to be better suited
    to a particular target architecture (more important for compilers), which
    gives an advantage to more flexible forms (such as SSA, or the use of ASTs).
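    For comparison, here is the same kind of loop in register form. Each instruction names its operands explicitly, so instructions are wider and the code generator has to assign registers, but independent instructions can be reordered freely. A minimal sketch with a made-up three-address encoding (`op, dst, a, b`); the opcode names and register count are assumptions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-address opcodes: each names its registers explicitly. */
enum { R_LOADI, R_ADD, R_MUL, R_RET };

#define NREGS 8

typedef struct { int op, dst, a, b; } Insn;   /* R_LOADI: regs[dst] = a */

static int reg_ok(int r) { return r >= 0 && r < NREGS; }

static long run_regs(const Insn *code) {
    long regs[NREGS] = { 0 };
    for (size_t pc = 0;; pc++) {
        const Insn *in = &code[pc];
        assert(reg_ok(in->dst));
        switch (in->op) {
        case R_LOADI:
            regs[in->dst] = in->a;            /* a is an immediate here */
            break;
        case R_ADD:
            assert(reg_ok(in->a) && reg_ok(in->b));
            regs[in->dst] = regs[in->a] + regs[in->b];
            break;
        case R_MUL:
            assert(reg_ok(in->a) && reg_ok(in->b));
            regs[in->dst] = regs[in->a] * regs[in->b];
            break;
        case R_RET:
            return regs[in->dst];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

    The three `LOADI`s in a program such as `(2 + 3) * 4` are independent of each other, so a compiler may emit them in any order; in a stack encoding the order is pinned down by the stack discipline.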


    > The functional language community has looked at various schemes to
    > eliminate the stack completely in favor of keeping all storage on a
    > GC'ed heap. They have never been seriously competitive with stack.
    > Even though the allocation/deallocation instruction count can be made
    > comparable for both approaches, the heap approach doesn't play well
    > with memory hierarchies.
    >


    yeah, heap allocation tends to be slower...

    "sliding pointer" allocators can also be very fast, but tend to require
    either memory copying (like in a copy collector or compacting collector), or
    periodically destroying an entire region of the heap (the app being
    responsible for not having any important data there when this happens),
    which limits applicability as a general-purpose allocation strategy...
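    A "sliding pointer" allocator (often called a bump-pointer or arena allocator) fits in a few lines, and the restriction described above shows up in `arena_reset`, which frees everything at once and leaves every pointer into the region dangling. A minimal sketch over a caller-supplied buffer; the `Arena` type, the function names, and the 16-byte alignment are assumptions, and alignment is computed relative to `base`, so the buffer itself must be suitably aligned:

```c
#include <assert.h>
#include <stddef.h>

#define ALIGN 16   /* assumed sufficient for anything stored here */

typedef struct {
    unsigned char *base;
    size_t size;
    size_t top;    /* offset of the next free byte */
} Arena;

static void arena_init(Arena *a, void *buf, size_t size) {
    assert(a != NULL && (buf != NULL || size == 0));
    a->base = buf;
    a->size = size;
    a->top = 0;
}

/* Allocation is one round-up and one add; NULL when the region is full. */
static void *arena_alloc(Arena *a, size_t n) {
    size_t start = (a->top + (ALIGN - 1)) & ~(size_t)(ALIGN - 1);
    if (start > a->size || n > a->size - start)
        return NULL;
    a->top = start + n;
    return a->base + start;
}

/* "Destroy the region": every object allocated from it dies at once. */
static void arena_reset(Arena *a) {
    a->top = 0;
}
```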


    > Re GC root-finding, the answer is pretty obvious: VM stacks and
    > registers must be stored only in C data structures that the GC can
    > inspect for roots. If the GC is running in a thread different from
    > the program, appropriate mutexes are needed to make pointer copies
    > look atomic to the GC. C code that's "GC safe" often looks weird and
    > verbose and is tricky to get right. A favorite example is the rather
    > innocent-looking
    >
    > *context.reg = allocate(&context, 42);
    > /* allocate 42 bytes and store a pointer to them where .reg points */
    >
    > where context is being exposed to the GC as a source of roots
    > and .reg (obviously) holds a pointer.
    >
    > Since the C compiler is free to evaluate the left-hand side (the
    > lvalue) first (and many do), and the allocation (by a copying
    > collector) may cause the pointer target to be relocated by the GC
    > (so .reg holds a new value afterward), this is a hard-to-find bug.
    > You must instead say
    >
    > context.tmp = allocate(&context, 42);
    > *context.reg = context.tmp;
    >


    yeah.

    we call this precise GC, and IME in most cases (in C), it is too much of a
    hassle to bother with...
    multi-threading with a precise GC is even worse, as then one has to start
    being very pedantic about everything, ...


    I much prefer the use of either conservative GC, or the "slide memory and
    destroy the heap" strategy (this being well suited for cases where one knows
    that something will only run a short time, and none of its local data will
    be needed later, so to the code in question it is like having a GC...).

    also, the above strategy can give better performance in many usage
    patterns than an actual GC does.


    > Many people have tried "conservative" GC, which inspects arbitrary C
    > stack contents for possible roots (see for example the Boehm
    > collector, which does this about as well as it can be done). You
    > can't do this portably. Here, "portable" extends even to different
    > command-line options for the same compiler on the same machine. It's
    > a risky approach.
    >
    > The lccwin compiler supports GC as an extension, avoiding the above
    > risk because the compiler-writer is in the loop.
    >


    I do conservative GC, and in general it works fairly well...
    (although I tend to use a custom GC, rather than Boehm...).


    a partial downside though is performance (during a GC), namely that a
    conservative GC with a large heap generally doesn't perform nearly as well
    as the other options (largely since a conservative).

    another problem is one of "write barriers":
    there is no good and portable way (in C) to do implicit write barriers,
    which are needed to be able to safely do concurrent GC (where the GC runs
    at the same time as the program is running);
    this means either having to forsake a little safety (running without write
    barriers, and having some added risk of crashes), or having to use pedantic
    little functions or macros whenever storing pointers into memory on the
    heap, which is annoying.

    like, if:
    gcSet(foo->ptr, value);
    has to be used rather than:
    foo->ptr=value;

    and so on...
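    One common job for a `gcSet`-style barrier is keeping an incremental or concurrent marker from missing objects. The sketch below is a Dijkstra-style insertion barrier over tri-color marking: while marking is in progress, storing a pointer to a white (unvisited) object shades it gray so the marker will still visit it. `gc_set` here is a function taking the holder object rather than the macro in the post, and `Obj`, `mark_step`, and the fixed gray stack are hypothetical; a truly concurrent version would also need atomics or locking around these updates:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { WHITE, GRAY, BLACK } Color;

typedef struct Obj {
    Color color;
    struct Obj *ptr;   /* the single pointer field in this toy object */
} Obj;

#define GRAY_MAX 64
static Obj *gray_stack[GRAY_MAX];
static size_t gray_top;
static bool marking;   /* true while an incremental mark is in progress */

/* Move a white object to the gray worklist. */
static void shade(Obj *o) {
    if (o != NULL && o->color == WHITE) {
        assert(gray_top < GRAY_MAX);
        o->color = GRAY;
        gray_stack[gray_top++] = o;
    }
}

/* Every heap pointer store goes through this instead of a plain `=`. */
static void gc_set(Obj *holder, Obj *value) {
    if (marking)
        shade(value);   /* never let a black object hide a white one */
    holder->ptr = value;
}

/* One increment of marking: blacken a single gray object.
   Returns false when the gray worklist is empty. */
static bool mark_step(void) {
    if (gray_top == 0)
        return false;
    Obj *o = gray_stack[--gray_top];
    shade(o->ptr);
    o->color = BLACK;
    return true;
}
```

    Without the `shade` call in `gc_set`, storing a white object into an already-black one would leave it white at the end of marking, and the sweep would free a live object.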


    there are also incremental strategies (such as reference counting), which
    have their own tradeoffs.
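    Reference counting spreads the collector's work across every pointer copy instead of doing it in one pause. A minimal single-threaded sketch, assuming nodes with one owned child; `rc_new`, `rc_retain`, and `rc_release` are hypothetical names. One of its tradeoffs is visible immediately: a cycle of nodes never reaches a count of zero, so it is never freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    size_t refs;
    struct Node *child;   /* owned reference, released with the parent */
} Node;

static size_t live_nodes;   /* lets the sketch observe frees */

/* Create a node holding one reference; takes over the caller's
   reference to child. */
static Node *rc_new(Node *child) {
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->refs = 1;
    n->child = child;
    live_nodes++;
    return n;
}

static Node *rc_retain(Node *n) {
    if (n != NULL)
        n->refs++;
    return n;
}

/* Drop one reference; free the node (and release its child) at zero.
   Iterates instead of recursing so long chains cannot overflow the stack. */
static void rc_release(Node *n) {
    while (n != NULL) {
        assert(n->refs > 0);          /* releasing a dead node is a bug */
        if (--n->refs != 0)
            return;
        Node *child = n->child;
        free(n);
        live_nodes--;
        n = child;
    }
}
```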

    sadly, all this means that one will often end up using a number of different
    memory-management strategies in an app depending on what one is doing,
    leading in some cases to "hidden rules" regarding the treatment and handling
    of memory objects.


    or such...
     
    BGB / cr88192, Aug 21, 2010
    #6