Tiny VM in C

Discussion in 'C Programming' started by tekk.nolagi, Feb 11, 2014.

  1. tekk.nolagi

    tekk.nolagi Guest

    Hi all,

    I decided to go ahead and try and write... something like a VM in C. I'm not quite sure what I'm doing, but it seems OK right now.

    If anyone is interested in looking at <200 LoC, I'd appreciate it.

    github.com/tekknolagi/myvm

    Thanks,
    Max
     
    tekk.nolagi, Feb 11, 2014
    #1
    1. Advertisements

  2. Interesting.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Feb 11, 2014
    #2
    1. Advertisements

  3. tekk.nolagi

    Eric Sosman Guest

    Based only on a quick look, one problem jumps right out:
    The "globals.h" and "instructions.h" files contain actual
    definitions (not merely declarations) of the `regs' array and
    of the instr_xxx() functions. If your project ever grows beyond
    the everything-in-one-file stage, this will be a Bad Thing:
    If two or more .c files include either or both of these headers
    and then are linked together into the same program, there will
    be multiple definitions of these items -- and even though the
    definitions agree in every respect, they clash. Technically,
    "the behavior is undefined." References:

    - Section 6.9 paragraph 5 of the C language Standard

    - Questions 1.7 and 10.6 on the comp.lang.c Frequently
    Asked Questions (FAQ) page at <http://www.c-faq.com/>

    As for the VM itself, a machine without branching or looping
    capability is not particularly powerful (hence, not particularly
    interesting). I realize you're just getting started and you may
    want to add such capabilities later -- but that's when the problems
    with multiple .c files and clashing definitions will start to
    become more pressing.
     
    Eric Sosman, Feb 11, 2014
    #3
  4. Branching or looping could be implemented on top of this base. A series
    of commands for the fixed portion could be sent to this "runner" portion
    which, in turn, sets the results in registers, or later flags might be
    added, and upon return from the fixed runner portion, the flags can be
    inspected in another module which branches appropriately.

    On the x86, it branches on average every six instructions. By sending
    those "six instructions" here, it can do the sequential work while
    leaving the register and flag states for the caller module which
    handles the branching, and looping.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Feb 11, 2014
    #4
  5. That's good.

    My main comment would be that you have not yet thought about how to
    handle jumps. Doing that is going to mean that the functions that do the
    work will need to be able to adjust the program counter. The PC is just
    another bit of state, like the registers, which needs to be maintained,
    and there may be more state as you consider things like condition flags
    and so on. Anyway, the upshot is the it will be time to make a struct
    to store the machine state, rather than passing more and more arguments
    to the various functions, or having to have the machine state stored in
    globals.

    What sort of data layout is your VM going to have? A stack?
     
    Ben Bacarisse, Feb 11, 2014
    #5
  6. tekk.nolagi

    Kaz Kylheku Guest

    ^^^^^^^^^^^^^^^^

    That's ... branching.
     
    Kaz Kylheku, Feb 11, 2014
    #6
  7. tekk.nolagi

    BartC Guest

    As others have said, you need to add a conditional jump instruction to start
    making it more interesting:

    (1) you can run some actual programs
    (2) you can see how fast it is compared with real machine code
     
    BartC, Feb 11, 2014
    #7
  8. Here's an example of what I meant. In this case, there is a master
    structure that contains pakcets of fixed run-length packets that
    contain no branching instructions, interspersed with branching
    instructions. The fixed length portion is executed in one function,
    and the branching is handled in another.

    // For illustration purposes only. :)

    struct Sx {
    // Instruction stuff here
    };

    // A repeated series of instructions
    struct SMaster {
    bool is_branch;
    int count;
    Sx* first_instruction;
    };

    void mainLoop(SMaster* packet)
    {
    while (packet)
    {
    if (packet->is_branch) {
    packet = executeBranch(packets->first_instruction);
    // Right now, packet is pointing to the next packet

    } else {
    executeStream(packet->first_instruction, packets->count);
    // Move to next packet
    ++packet;
    }
    }
    }

    void executeStream(SX* instr, int count)
    {
    int i;

    // Run the fixed portion (which has no branching instruction)
    for (i = 0; i < count; i++, instr++)
    decode(instr);
    }

    SMaster* executeBranch(Sx* instr)
    {
    SMaster* newPacket;

    // Determine the branch instruction and get the new packet

    return(newPacket);
    }

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Feb 11, 2014
    #8
  9. tekk.nolagi

    tekk.nolagi Guest

    Hi all,

    Eric: I took your advice and separated the definitions and declarations. Thanks for the tip!

    Eric & Ben: I'm not sure how I would go about implementing branching or looping. I do like the idea about a struct though. I'm looking into that now. Would, then, each instruction function take the stack as a parameter?

    Rick: I'm rather new to this, so I'm not quite sure what you mean by branching in this regard. Is it different than the idea of control flow?

    Ben: I would like to have a stack, yes.

    Bart: I'm not quite sure about a benchmark, but I would love for this to be a learning experience, or to write something that compiles a language into this.
     
    tekk.nolagi, Feb 12, 2014
    #9
  10. tekk.nolagi

    tekk.nolagi Guest

    Ben: I went ahead and added a stack. Trying to figure out how a JMP would work.
     
    tekk.nolagi, Feb 12, 2014
    #10
  11. tekk.nolagi

    tekk.nolagi Guest

    tekk.nolagi, Feb 12, 2014
    #11
  12. If you are looking to really know, search for "Intel IA-32 Instruction Set Manual" and you'll find thick books outlining the x86.

    You can also take a look at Bochs source code. It is a full x86 emulation program.

    Download Visual Studio express, and compile your program in it. Step into, then turn on disassembly, then view registers. You'll get a "learn by example" view.
     
    Rick C. Hodgin, Feb 12, 2014
    #12
  13. tekk.nolagi

    BartC Guest

    Branching is just control flow, usually conditional. To implement this, then
    some instructions will have to modify the program counter (pc). You already
    have this kind of loop:

    for (mstate.pc = 0; mstate.running; mstate.pc++)

    This changes, so start you start off with mstate.pc=0, but then each
    instruction can be made to return the next pc value, usually just pc+1.
    Instructions such as jump, jump on condition, call, return etc. can return a
    different new pc - the destination label, which will be one of the operands,
    and will return that pc instead of pc+1.

    So an instruction such as jump_not_zero R0, L5 will return either pc+1 when
    R0 is zero, or L5 (L5 being the index of some other instruction).

    (You might be able to assume pc=pc+1 for the majority of instructions if you
    can somehow tell which instructions can manipulate the pc, and which don't.)

    To write virtual code which has labels, you will need to number each
    instruction 0, 1, 2 etc (this can be in the comment) to make it easy to know
    what label numbers to use. (A label is just the number of the target
    instruction.) Writing longer code though will be difficult (to modify and
    maintain) unless you create an assembler or somehow generate the code with
    software.

    (BTW I wouldn't have bothered encoding the instructions into a compact form,
    as it can take time to unpack them. Maybe you want to have a emulation close
    to an actual machine; that's fine, but I would have designed a machine with
    byte-wide instruction fields!)
     
    BartC, Feb 12, 2014
    #13
  14. tekk.nolagi

    David Brown Guest


    Avoid looking at anything remotely "x86" - it is a hideous architecture.
    You don't want anything nearly that complicated.

    If you want to look at an existing processor's architecture and
    instruction set (and I think that is a very good idea), then you want to
    pick something small and simple. My recommendation here would be the
    "MSP430". It is a 16-bit architecture with quite a small, clean,
    orthogonal instruction set. This has an extra advantage that if you
    copy the instruction set directly, then you already have tools available
    for generating code for your VM - binutils and gcc support it.

    Ignore the details of the peripherals and other non-cpu parts of the
    devices, of course. And ignore the newer 20-bit extensions to the cpu -
    they just add complications.

    <http://en.wikipedia.org/wiki/TI_MSP430#MSP430_CPU>
    <http://www.ti.com/lsds/ti/microcontroller/16-bit_msp430/overview.page>
     
    David Brown, Feb 12, 2014
    #14
  15. If you want to look at an existing processor's architecture and

    So I am looking at the instruction set (on Wikipedia) and seeing JZ and JNZ. What advice should I take about implementing those?
     
    Maxwell Bernstein, Feb 12, 2014
    #15
  16. Like in labeling instructions or sets of instructions...
     
    Maxwell Bernstein, Feb 12, 2014
    #16
  17. tekk.nolagi

    Eric Sosman Guest

    Since they're written in CAPITAL LETTERS, they're probably
    (not certainly, but probably) macro names. Look for the #define
    directives that --

    Oh, wait: You *are* asking about C, right?
     
    Eric Sosman, Feb 12, 2014
    #17
  18. I am not asking *directly* about C; I am asking advice for an implementation IN C.
     
    Maxwell Bernstein, Feb 12, 2014
    #18
  19. You need to test the result of operations and set a flag. The JZ means "jump
    if zero" and the JNZ means "jump if not zero". Other forms can say BZ for
    "branch if zero" and BNZ for "branch if not zero".

    In your add operation, for example, if the two integer values add up to be
    zero (-1 added to +1 equals 0) then based on the result you set a flag.
    The flag nomenclature is typically ?Z or Z? and indicates the flag state.
    When it is ?Z the flag is raised (upper case letters) which means that
    state is signaling. In this case -1 added to +1 equals 0 would cause ?Z
    because the result is zero, and the flag now indicates that condition.

    In the case of +5 added to +2 equals +7, the state would be ?z (lowered),
    because the result was not zero.

    Then for branching instructions, you test the condition of that flag, and
    based on whatever the branching conditions you're setting are, you either
    branch or don't branch.

    If you have an x86 box with Windows, and can download Visual Studio express,
    you'll find it very helpful. You can email me offline if you'd like some
    assistance on stepping through x86 assembly. But, seeing the practical
    results in a real, graphical debugger, which shows the register and flag
    states change as you step through assembly instructions ... for the task
    you're on about it would be most beneficial in my estimation.

    Other common conditions:

    ?Z or ?z -- Zero
    ?O or ?o -- Overflow (adding 200 and 200 in an 8-bit quantity overflows)
    ?U or ?u -- Underflow (same thing with subtracting)
    ?P or ?p -- Parity (if the number of bits in the result was odd)

    And other flags. Check out the EFLAGS register in the Intel IA-32 if you'd
    like to see the ones used by the x86 CPU.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Feb 12, 2014
    #19
  20. tekk.nolagi

    BartC Guest

    These will work on condition flags usually, set by a previous instruction
    (such as CMP).

    If the condition is true, jump to the label (set the PC to value specified
    as the operand). Otherwise carry on with the next instruction (set PC to
    PC+1 for example).

    It's not that simple! And even if gcc might generate code for it, the
    challenge then is disentangling the object, binary or executable file
    formats. There might also be a hefty runtime library to emulate and debug.

    For emulating a real processor as an exercise, probably an 8-bit cpu from
    the late 70s would be simplest (eg. 8080 or Z80, 6800, 6502). There is
    little peripheral stuff to worry about, and the data sheets might be a dozen
    pages. (And any emulation would likely run considerably faster than the real
    thing!)
     
    BartC, Feb 12, 2014
    #20
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.