Tiny VM in C

tekk.nolagi · Feb 11, 2014

Hi all,

I decided to go ahead and try and write... something like a VM in C. I'm not quite sure what I'm doing, but it seems OK right now.

If anyone is interested in looking at <200 LoC, I'd appreciate it.

github.com/tekknolagi/myvm

Thanks,
Max

Rick C. Hodgin · Feb 11, 2014

Hi all,
I decided to go ahead and try and write... something like a VM in C.
I'm not quite sure what I'm doing, but it seems OK right now.

If anyone is interested in looking at <200 LoC, I'd appreciate it.
github.com/tekknolagi/myvm
Thanks,
Max

Interesting.

Best regards,
Rick C. Hodgin

Eric Sosman · Feb 11, 2014

Hi all,

I decided to go ahead and try and write... something like a VM in C. I'm not quite sure what I'm doing, but it seems OK right now.

If anyone is interested in looking at <200 LoC, I'd appreciate it.

github.com/tekknolagi/myvm

Based only on a quick look, one problem jumps right out:
The "globals.h" and "instructions.h" files contain actual
definitions (not merely declarations) of the `regs' array and
of the instr_xxx() functions. If your project ever grows beyond
the everything-in-one-file stage, this will be a Bad Thing:
If two or more .c files include either or both of these headers
and then are linked together into the same program, there will
be multiple definitions of these items -- and even though the
definitions agree in every respect, they clash. Technically,
"the behavior is undefined." References:

- Section 6.9 paragraph 5 of the C language Standard

- Questions 1.7 and 10.6 on the comp.lang.c Frequently
Asked Questions (FAQ) page at <http://www.c-faq.com/>

As for the VM itself, a machine without branching or looping
capability is not particularly powerful (hence, not particularly
interesting). I realize you're just getting started and you may
want to add such capabilities later -- but that's when the problems
with multiple .c files and clashing definitions will start to
become more pressing.

Rick C. Hodgin · Feb 11, 2014

As for the VM itself, a machine without branching or looping
capability is not particularly powerful (hence, not particularly
interesting).

Branching or looping could be implemented on top of this base. A series
of commands for the fixed portion could be sent to this "runner" portion
which, in turn, sets the results in registers, or later flags might be
added, and upon return from the fixed runner portion, the flags can be
inspected in another module which branches appropriately.

On the x86, it branches on average every six instructions. By sending
those "six instructions" here, it can do the sequential work while
leaving the register and flag states for the caller module which
handles the branching, and looping.

Best regards,
Rick C. Hodgin

Ben Bacarisse · Feb 11, 2014

I decided to go ahead and try and write... something like a VM in
C. I'm not quite sure what I'm doing, but it seems OK right now.

If anyone is interested in looking at <200 LoC, I'd appreciate it.

github.com/tekknolagi/myvm

That's good.

My main comment would be that you have not yet thought about how to
handle jumps. Doing that is going to mean that the functions that do the
work will need to be able to adjust the program counter. The PC is just
another bit of state, like the registers, which needs to be maintained,
and there may be more state as you consider things like condition flags
and so on. Anyway, the upshot is the it will be time to make a struct
to store the machine state, rather than passing more and more arguments
to the various functions, or having to have the machine state stored in
globals.

What sort of data layout is your VM going to have? A stack?

Kaz Kylheku · Feb 11, 2014

Branching or looping could be implemented on top of this base. A series
of commands for the fixed portion could be sent to this "runner" portion
which, in turn, sets the results in registers, or later flags might be
added, and upon return from the fixed runner portion, the flags can be

^^^^^^^^^^^^^^^^

That's ... branching.

BartC · Feb 11, 2014

Hi all,

I decided to go ahead and try and write... something like a VM in C. I'm
not quite sure what I'm doing, but it seems OK right now.

If anyone is interested in looking at <200 LoC, I'd appreciate it.

github.com/tekknolagi/myvm

As others have said, you need to add a conditional jump instruction to start
making it more interesting:

(1) you can run some actual programs
(2) you can see how fast it is compared with real machine code

Rick C. Hodgin · Feb 11, 2014

Branching or looping could be implemented on top of this base. A series
of commands for the fixed portion could be sent to this "runner" portion
which, in turn, sets the results in registers, or later flags might be
added, and upon return from the fixed runner portion, the flags can be
inspected in another module which branches appropriately.

Here's an example of what I meant. In this case, there is a master
structure that contains pakcets of fixed run-length packets that
contain no branching instructions, interspersed with branching
instructions. The fixed length portion is executed in one function,
and the branching is handled in another.

// For illustration purposes only.

struct Sx {
// Instruction stuff here
};

// A repeated series of instructions
struct SMaster {
bool is_branch;
int count;
Sx* first_instruction;
};

void mainLoop(SMaster* packet)
{
while (packet)
{
if (packet->is_branch) {
packet = executeBranch(packets->first_instruction);
// Right now, packet is pointing to the next packet

} else {
executeStream(packet->first_instruction, packets->count);
// Move to next packet
++packet;
}
}
}

void executeStream(SX* instr, int count)
{
int i;

// Run the fixed portion (which has no branching instruction)
for (i = 0; i < count; i++, instr++)
decode(instr);
}

SMaster* executeBranch(Sx* instr)
{
SMaster* newPacket;

// Determine the branch instruction and get the new packet

return(newPacket);
}

Best regards,
Rick C. Hodgin

tekk.nolagi · Feb 12, 2014

Hi all,

Eric: I took your advice and separated the definitions and declarations. Thanks for the tip!

Eric & Ben: I'm not sure how I would go about implementing branching or looping. I do like the idea about a struct though. I'm looking into that now. Would, then, each instruction function take the stack as a parameter?

Rick: I'm rather new to this, so I'm not quite sure what you mean by branching in this regard. Is it different than the idea of control flow?

Ben: I would like to have a stack, yes.

Bart: I'm not quite sure about a benchmark, but I would love for this to be a learning experience, or to write something that compiles a language into this.

tekk.nolagi · Feb 12, 2014

Ben: I went ahead and added a stack. Trying to figure out how a JMP would work.

tekk.nolagi · Feb 12, 2014

So it turns out I have much to learn about instruction pointers and such. What is the best way to learn? So far I have found:

http://en.wikibooks.org/wiki/X86_Assembly/Control_Flow#Jump_on_Zero
http://en.wikipedia.org/wiki/X86_assembly_language#Using_the_instruction_pointer_register

Rick C. Hodgin · Feb 12, 2014

If you are looking to really know, search for "Intel IA-32 Instruction Set Manual" and you'll find thick books outlining the x86.

You can also take a look at Bochs source code. It is a full x86 emulation program.

Download Visual Studio express, and compile your program in it. Step into, then turn on disassembly, then view registers. You'll get a "learn by example" view.

BartC · Feb 12, 2014

Eric & Ben: I'm not sure how I would go about implementing branching or
looping. I do like the idea about a struct though. I'm looking into that
now. Would, then, each instruction function take the stack as a parameter?

Rick: I'm rather new to this, so I'm not quite sure what you mean by
branching in this regard. Is it different than the idea of control flow?

Bart: I'm not quite sure about a benchmark, but I would love for this to
be a learning experience, or to write something that compiles a language
into this.

Branching is just control flow, usually conditional. To implement this, then
some instructions will have to modify the program counter (pc). You already
have this kind of loop:

for (mstate.pc = 0; mstate.running; mstate.pc++)

This changes, so start you start off with mstate.pc=0, but then each
instruction can be made to return the next pc value, usually just pc+1.
Instructions such as jump, jump on condition, call, return etc. can return a
different new pc - the destination label, which will be one of the operands,
and will return that pc instead of pc+1.

So an instruction such as jump_not_zero R0, L5 will return either pc+1 when
R0 is zero, or L5 (L5 being the index of some other instruction).

(You might be able to assume pc=pc+1 for the majority of instructions if you
can somehow tell which instructions can manipulate the pc, and which don't.)

To write virtual code which has labels, you will need to number each
instruction 0, 1, 2 etc (this can be in the comment) to make it easy to know
what label numbers to use. (A label is just the number of the target
instruction.) Writing longer code though will be difficult (to modify and
maintain) unless you create an assembler or somehow generate the code with
software.

(BTW I wouldn't have bothered encoding the instructions into a compact form,
as it can take time to unpack them. Maybe you want to have a emulation close
to an actual machine; that's fine, but I would have designed a machine with
byte-wide instruction fields!)

David Brown · Feb 12, 2014

So it turns out I have much to learn about instruction pointers and
such. What is the best way to learn? So far I have found:

http://en.wikibooks.org/wiki/X86_Assembly/Control_Flow#Jump_on_Zero
http://en.wikipedia.org/wiki/X86_assembly_language#Using_the_instruction_pointer_register

Avoid looking at anything remotely "x86" - it is a hideous architecture.
You don't want anything nearly that complicated.

If you want to look at an existing processor's architecture and
instruction set (and I think that is a very good idea), then you want to
pick something small and simple. My recommendation here would be the
"MSP430". It is a 16-bit architecture with quite a small, clean,
orthogonal instruction set. This has an extra advantage that if you
copy the instruction set directly, then you already have tools available
for generating code for your VM - binutils and gcc support it.

Ignore the details of the peripherals and other non-cpu parts of the
devices, of course. And ignore the newer 20-bit extensions to the cpu -
they just add complications.

<http://en.wikipedia.org/wiki/TI_MSP430#MSP430_CPU>
<http://www.ti.com/lsds/ti/microcontroller/16-bit_msp430/overview.page>

Maxwell Bernstein · Feb 12, 2014

If you want to look at an existing processor's architecture and

instruction set (and I think that is a very good idea), then you want to

pick something small and simple. My recommendation here would be the

"MSP430". It is a 16-bit architecture with quite a small, clean,

orthogonal instruction set. This has an extra advantage that if you

copy the instruction set directly, then you already have tools available

for generating code for your VM - binutils and gcc support it.

So I am looking at the instruction set (on Wikipedia) and seeing JZ and JNZ. What advice should I take about implementing those?

Maxwell Bernstein · Feb 12, 2014

Like in labeling instructions or sets of instructions...

Eric Sosman · Feb 12, 2014

So I am looking at the instruction set (on Wikipedia) and seeing JZ and JNZ. What advice should I take about implementing those?

Since they're written in CAPITAL LETTERS, they're probably
(not certainly, but probably) macro names. Look for the #define
directives that --

Oh, wait: You *are* asking about C, right?

Maxwell Bernstein · Feb 12, 2014

I am not asking *directly* about C; I am asking advice for an implementation IN C.

Rick C. Hodgin · Feb 12, 2014

So I am looking at the instruction set (on Wikipedia) and seeing JZ and
JNZ. What advice should I take about implementing those?

You need to test the result of operations and set a flag. The JZ means "jump
if zero" and the JNZ means "jump if not zero". Other forms can say BZ for
"branch if zero" and BNZ for "branch if not zero".

In your add operation, for example, if the two integer values add up to be
zero (-1 added to +1 equals 0) then based on the result you set a flag.
The flag nomenclature is typically ?Z or Z? and indicates the flag state.
When it is ?Z the flag is raised (upper case letters) which means that
state is signaling. In this case -1 added to +1 equals 0 would cause ?Z
because the result is zero, and the flag now indicates that condition.

In the case of +5 added to +2 equals +7, the state would be ?z (lowered),
because the result was not zero.

Then for branching instructions, you test the condition of that flag, and
based on whatever the branching conditions you're setting are, you either
branch or don't branch.

If you have an x86 box with Windows, and can download Visual Studio express,
you'll find it very helpful. You can email me offline if you'd like some
assistance on stepping through x86 assembly. But, seeing the practical
results in a real, graphical debugger, which shows the register and flag
states change as you step through assembly instructions ... for the task
you're on about it would be most beneficial in my estimation.

Other common conditions:

?Z or ?z -- Zero
?O or ?o -- Overflow (adding 200 and 200 in an 8-bit quantity overflows)
?U or ?u -- Underflow (same thing with subtracting)
?P or ?p -- Parity (if the number of bits in the result was odd)

And other flags. Check out the EFLAGS register in the Intel IA-32 if you'd
like to see the ones used by the x86 CPU.

Best regards,
Rick C. Hodgin

BartC · Feb 12, 2014

So I am looking at the instruction set (on Wikipedia) and seeing JZ and
JNZ. What advice should I take about implementing those?

These will work on condition flags usually, set by a previous instruction
(such as CMP).

If the condition is true, jump to the label (set the PC to value specified
as the operand). Otherwise carry on with the next instruction (set PC to
PC+1 for example).

It's not that simple! And even if gcc might generate code for it, the
challenge then is disentangling the object, binary or executable file
formats. There might also be a hefty runtime library to emulate and debug.

For emulating a real processor as an exercise, probably an 8-bit cpu from
the late 70s would be simplest (eg. 8080 or Z80, 6800, 6502). There is
little peripheral stuff to worry about, and the data sheets might be a dozen
pages. (And any emulation would likely run considerably faster than the real
thing!)

Weird Behavior with Rays in C and OpenGL	4	Feb 12, 2024
Mini Web Server in C++ (Part One)	4	Oct 2, 2025
How can I view / open / render / display a pdf file with c code?	0	Sep 23, 2023
Rich Text Format (RTF) Document Builder in C++: Code and Features	0	Sep 28, 2025
Copy string from 2D array to a 1D array in C	1	Nov 1, 2023
Help with finding difference between two bodies of text in order	0	Sep 10, 2024
Exception propagation in C	63	Dec 29, 2011
Hello everyone	0	Jul 28, 2023

Tiny VM in C

tekk.nolagi

Rick C. Hodgin

Eric Sosman

Rick C. Hodgin

Ben Bacarisse

Kaz Kylheku

BartC

Rick C. Hodgin

tekk.nolagi

tekk.nolagi

tekk.nolagi

Rick C. Hodgin

BartC

David Brown

Maxwell Bernstein

Maxwell Bernstein

Eric Sosman

Maxwell Bernstein

Rick C. Hodgin

BartC

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads