So your first
step should be to identify logical groupings of code that you can
move off into their own functions. Don't worry for now about
refactoring for commonality - just refactor for function size!
<snip>
<--
Thanks Richard, yes I have considered this. However, what is the
overhead of adding a function? I obviously want my bytecode to
execute as efficiently as possible. If every single bytecode
instruction launches a function to perform its role, it's not going to
be so quick, is it? That's why it's one big monolithic function. The
other problem with adding functions is that all kinds of variables
needed to keep the engine running then have to be passed to each
function. I could encapsulate these in one big structure and pass a
single pointer, but then I'm making more data available to each
function than it really needs, which goes against my nature (usually I
only want to pass the data I know the function will need).
So, functions aside, are there any other suggestions from anyone
else? Speed is of the essence; I don't want to do anything that will
slow it down (execution speed before style, Richard).
-->
with this code, you will not lose much.
noting how damn near every case is already a full block (with variables, if
statements, loops, and such...), the relative cost of adding a function call
is trivial.
it can be noted that, on average, if/for/while/... are fairly expensive vs
other operations, and your code uses them endlessly, so one more call per
case will barely register.
putting state in a big structure:
I, and probably most others I know of, also do this...
other ideas:
don't re-do high-level opcode decoding inline for each opcode; if you really
must have high-level opcode decoding, put it in its own code, separate from
execution. this is what I do for x86 machine code, where one piece of code is
responsible for decoding opcodes, and another piece of code for executing
them.
so, in this case, I decode opcodes, and then run them through a switch based
on argument configuration (Reg, RM, Imm, RegRM, RMReg, RegImm, RMImm,
...), and then through another big switch (per-opcode number).
typically, each of these has its own functions, so the argument
configuration switch calls the function for handling the particular
configuration, which does a big switch for handling the various opcodes.
another route is to make each opcode have fixed-form arguments (sort of like
the JVM's JBC, MSIL, ...), where opcodes with different argument forms also
get different opcode names and numbers.
another thing:
if possible, try to make operations fairly close to atomic.
if it is an atomic operation, then maybe it goes in the switch;
if it is not (as in, it involves a big ugly chunk of logic code), then maybe
it is better served by a function.
FWIW, I handled pretty much the entire integer subset of x86 in around 2
kloc (for the per-instruction code), with about 700 lines going just into
dealing with REP/REPE/REPZ opcode forms (I give them a lot of special
treatment...).
most of the rest is simple operations, although a lot of this is function
calls (given x86 has fairly complicated register-handling logic, and the
great terror known as 'eflags', which is awkward to simulate
effectively...). so a lot of the code for dealing with registers and eflags
is done elsewhere, as is the code for managing the virtual address space,
which is essentially independent of the interpreter (mostly so that they
don't step on each other, ...).
it is then about 1.5 kloc for the code to simulate the x87 FPU (including
conversion code, 80-bit float operations being done primarily via integer
math, ...).
granted, there is a 1.2 kloc chunk of code dedicated primarily to
getting/setting registers (various types, sizes, sign vs zero extension,
....).
....
and, in all this, I realize that x86 is very difficult to interpret
efficiently, vs nearly any other bytecode.
most of my bytecodes could be interpreted almost entirely with a big dumb
switch and maybe 1 or 2 statements for each case...
or such...