CPU simulator written in C


/* frank */

I have a homework assignment: write a CPU simulator in C.

I have a set of asm instructions so I have to write a program
that should:
- load .asm file
- view .asm file
- do a step by step simulation
- display registers contents

I tried searching on Google with no success.

Any help (links, tricks/tips...)
would be really appreciated.

Thanks in advance.
 

Emmanuel Delahaye

In 'comp.lang.c', /* frank */ said:
I have to do a homework: make a CPU simulator using C language.

I have a set of asm instructions so I have to write a program
that should:
- load .asm file
- view .asm file
- do a step by step simulation
- display registers contents

It's a design question. There is no particular problem with that. It's just
another interpreter.
I tried to search con google with no success.

Better to rack your brain. The answer is there.

BTW, what is your question about the C language?
 

Richard Bos

/* frank */ said:
I have to do a homework: make a CPU simulator using C language.

I have a set of asm instructions so I have to write a program
that should:
- load .asm file
- view .asm file
- do a step by step simulation
- display registers contents

I tried to search con google with no success.

Your assignment said: "_make_ a CPU simulator". Not "copy a CPU
simulator". There is a very good reason for this. How do you expect to
learn anything if you just hand in someone else's work?

Richard
 

Dan Pop

/* frank */ said:
I have to do a homework: make a CPU simulator using C language.

Well, C is particularly suitable for such an exercise.
I have a set of asm instructions so I have to write a program
that should:
- load .asm file
- view .asm file
- do a step by step simulation
- display registers contents

I tried to search con google with no success.

Any help (links, tricks/tips...)
would be really appreciated.

Apart from being lazy, you're also stupid if you expect any help
without specifying the CPU being emulated and the format of the .asm
files. I hope they contain machine code and not assembly code,
because assembling the code is, by far, the most difficult part of the
exercise.

Anyway, you can't expect any help until you describe your design and
ask very specific questions about its implementation.

Dan
 

Thomas Matthews

/* frank */ said:
I have to do a homework: make a CPU simulator using C language.

I have a set of asm instructions so I have to write a program
that should:
- load .asm file
- view .asm file
- do a step by step simulation
- display registers contents

I tried to search con google with no success.

Any help (links, tricks/tips...)
would be really appreciated.

Thanks in advance.

Ha! I wrote something like this at the University.

I also wrote another one at my current worksite.

Here are some techniques:
1. Allocate variables for all registers.

2. You may need variables for the databus(es) too.

3. Write a simple program that executes a simple
instruction, such as adding two registers.

3.1 This may involve writing a function that simulates
a hardware adder. In my case, I had to put a value
onto a databus, have the adder retrieve from the
databus and repeat for the other argument. Another
function would trigger the actual addition, then
another function puts the sum onto the databus.

4. Once you get the above working, add a new feature
and test it. The paradigm of "Test Driven Development"
says to write the test first and expect the first
invocation to fail, since you haven't written the
code yet.

If you have a windowing operating system, you could
display the register contents and the databus contents
in separate windows.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 

RoSsIaCrIiLoIA

I have to do a homework: make a CPU simulator using C language.

I have a set of asm instructions so I have to write a program
that should:
- load .asm file
- view .asm file
- do a step by step simulation
- display registers contents

I tried to search con google with no success.

Any help (links, tricks/tips...)
would be really appreciated.

Thanks in advance.
I googled "eax ebx ecx edx ax bx" in comp.lang.c and found these two posts:
_______________________
I have repeatedly come across declarations of
the type 'struct REGS' (i think that was what was called,
have seen a few variations) and was wondering if
someone could tell me its contents, and what it is used for.
I only have a very basic shareware compiler (for my PC anyway,
and I have never seen it under Unix) and it has no mention of this
structure.

Could someone enlighten me ?

This is not a Unix item but strictly a PC item as follows:

union REGS {
struct WORDREGS x;
struct BYTEREGS h;
};

struct WORDREGS {
unsigned int ax, bx, cx, dx;
unsigned int si, di, cflag, flags;
};

struct BYTEREGS {
unsigned char al, ah, bl, bh;
unsigned char cl, ch, dl, dh;
};

These structures allow C programs on PCs to access the CPU registers.

AX, BX, CX, DX, SI, DI, CF (carry flag), and FLAGS are all Intel 80x86
registers.
AL/AH, BL/BH, CL/CH, DL/DH are pseudo 8-bit registers which are derived
from the 16-bit registers above:
AL = low half of AX, AH = high half of AX, etc.

The basic registers are 16 bits, while the extended registers
(EAX, EBX, ECX, EDX, etc.) are 32 bits.

In addition you have the segment registers CS, DS, ES, SS and the
pointers SP, BP, plus the index registers SI, DI; on the 80386 and
above you also have debug registers.

In protected mode the segment registers become selector registers.

The segments are as follows:
CS code segment
DS data segment
ES extra segment
SS stack segment

The pointers are:
SP stack pointer
BP base pointer


--
Scott Hopson ([email protected])
____________________________________________

Hi,

I am not really familiar with interrupt calls, especially when done in
a 32-bit OS environment.

I am using the Watcom C compiler, version 11b, to build 32-bit DOS
applications.

I am trying to call interrupt 2FH, function 0DH for detecting the
driveletters of all connected cdroms.

I programmed this under 16bit dos and it works well, but I am not sure
how to adapt it for 32 Bit.

I have enclosed the code for 16 Bit. I would be pleased if anyone can
give advice.

Thanks a lot in advance.

Philipp

{
union REGS regs;
struct SREGS sregs;
char driveLetters [27];

regs.x.ax = 0x150D;
sregs.es = FP_SEG (driveLetters);
regs.x.bx = FP_OFF (driveLetters);
int86x (0x2F, &regs, &regs, &sregs);
}

************************
* Excerpt from <i86.h> *
************************


struct DWORDREGS {
unsigned int eax;
unsigned int ebx;
unsigned int ecx;
unsigned int edx;
unsigned int esi;
unsigned int edi;
unsigned int cflag;
};


struct WORDREGS {
unsigned short ax; __FILLER(_1)
unsigned short bx; __FILLER(_2)
unsigned short cx; __FILLER(_3)
unsigned short dx; __FILLER(_4)
unsigned short si; __FILLER(_5)
unsigned short di; __FILLER(_6)
#if defined(__WINDOWS_386__)
unsigned short cflag;
#else
unsigned int cflag;
#endif
};


struct BYTEREGS {
unsigned char al, ah; __FILLER(_1)
unsigned char bl, bh; __FILLER(_2)
unsigned char cl, ch; __FILLER(_3)
unsigned char dl, dh; __FILLER(_4)
};


union REGS {
#if defined(__386__) && !defined(__WINDOWS_386__)
struct DWORDREGS x;
#else
struct WORDREGS x;
#endif
struct WORDREGS w;
struct BYTEREGS h;
};


struct SREGS {
unsigned short es, cs, ss, ds;
#if defined(__386__)
unsigned short fs, gs;
#endif
};
 

Nick Keighley

/* frank */ said:
I have to do a homework: make a CPU simulator using C language.

I have a set of asm instructions so I have to write a program
that should:
- load .asm file
- view .asm file
- do a step by step simulation
- display registers contents

I tried to search con google with no success.

Any help (links, tricks/tips...)
would be really appreciated.

Thanks in advance.

Didn't you get *any* hints at all on how to tackle this?
Out of the goodness of my heart:-

typedef unsigned char Byte;
typedef unsigned long Register;

Byte memory [1024};

Register rA; /* accumulator */
Register pc; /* program counter */
Register rI; /* instruction register */


Register fetch32 (void)
{
Register f;

f = *pc++;
f = (f << 8) & *pc++;
f = (f << 8) & *pc++;
f = (f << 8) & *pc++;
}


void execute (void)
{
rI = fetch32 (); /* 32-bit fetch */

while (rI != HALT)
{
/* identify the instruction */
/* process the instruction */
}
}

int main (void)
{
/* load memory somehow */
pc = memory;
execute ();
}


this is untested. It hasn't even been compiled.
 

Dan Pop

In said:
The key idea is to map all parts of a CPU on C structures
and routines. For example: the program counter (PC) can
be simply mapped to an int variable.

An unsigned integer type would be a much better choice for the program
counter. Ditto for the other integer registers.

Dan
 

Eric Sosman

Dan said:
An unsigned integer type would be a much better choice for the program
counter. Ditto for the other integer registers.

<OT degree="severe">

I once worked on an actual machine whose program counter
registers (there were two) were signed -- the explanation
given was that the same circuitry implemented all the machine's
registers, so the signedness of the PCs was just a by-product
of parsimonious hardware design. However, only the low-order
bits of the active PC participated in address generation; the
sign bits and high-order bits were simply ignored.

So far, just an amusing peculiarity. But the really odd
thing was that the operation of "increment the program counter"
was implemented as the arithmetic operation "add one to the
program counter" -- so if the PC contained a negative value,
the effect was that the program ran backwards! I got some
diversion out of dreaming up a code sequence that executed
in the usual fashion for a while and then negated the PC and
"backed out" by re-executing the preceding instructions in
reverse order ...

</OT>
 

Dan Pop

In said:
You are searching with the wrong target word. The proper word is emulate,
not simulate.

To simulate is to try to predict how fast some CPU design (probably unbuilt)
will be. To emulate is to produce the effect on computer B that a program
was run on computer A.

You're splitting hairs. Try googling for "8051 simulator".

Dan
 

Case -

Nick said:
/* frank */ said:
I have to do a homework: make a CPU simulator using C language.
Out of the goodness of my heart:-

typedef unsigned char Byte;
typedef unsigned long Register;

Byte memory [1024};

Typo here }
Register rA; /* accumulator */
Register pc; /* program counter */
Register rI; /* instruction register */


Register fetch32 (void)
{
Register f;

f = *pc++;

To do this, pc should at least be a pointer.
f = (f << 8) & *pc++;
f = (f << 8) & *pc++;
f = (f << 8) & *pc++;

Of course you mean: f = (f << 8) | *pc++;
}


void execute (void)
{
rI = fetch32 (); /* 32-bit fetch */

while (rI != HALT)
{
/* identify the instruction */
/* process the instruction */
}
}

int main (void)
{
/* load memory somehow */
pc = memory;

Because memory is an unsigned char array (in other words, it decays to
unsigned char *) and pc is unsigned long, this will not work.
execute ();
}


this is untested. It hasn't even been compiled.

Will not compile, and doesn't compute either ;-)

<OT>
Why not simply use the pc as an index in memory? This
would allow you to write memory[pc]. And doesn't this
perfectly match what a PC does, being an index into a
block of data?
</OT>

Case
 

Scott Moore

/* frank */ said:
Case wrote:



Thanks a lot.

I was searching only for a "starting input" .

I write a lot of these. Here are some hints.

1. I would skip the .asm part unless your class assignment requires
it. It's an unnecessary complication to have to assemble the code
yourself. Better to figure out how to get a "straight binary image"
from a standard assembler and use that. A binary image would contain
only the code, without symbols and other window dressing.

2. Construct C structures or variables that hold the registers and
flags of the target CPU. Arithmetic on these will "overflow" in C
without error (use unsigned types, since only unsigned arithmetic is
guaranteed to wrap). However, you will still need a routine that
works out which flags an operation set. Typically this is done by
examining the result and the operands, and checking whether the
result makes sense. For example, adding two positive numbers and
getting a negative result means an overflow occurred.

3. Although instructions in common instruction sets today can be
quite large, say 32 bits or more, there is typically a much shorter
field within the instruction that determines what type of
instruction is being processed. Your typical "execute" case statement
will extract that field and switch on it. Be prepared for a large
case statement on most current CPUs. This is good, because it means
your emulator will be fast.

Luck !

--
Samiam is Scott A. Moore

Personal web site: http://www.moorecad.com/scott
My electronics engineering consulting site: http://www.moorecad.com
ISO 7185 Standard Pascal web site: http://www.moorecad.com/standardpascal
Classic Basic Games web site: http://www.moorecad.com/classicbasic
The IP Pascal web site, a high performance, highly portable ISO 7185 Pascal
compiler system: http://www.moorecad.com/ippas

Being right is more powerful than large corporations or governments.
The right argument may not be pervasive, but the facts eventually are.
 

Case -

Dan said:
An unsigned integer type would be a much better choice for the program
counter. Ditto for the other integer registers.

I didn't say what kind of int, so technically what you propose
is covered by my statement ;-)

Yes, you're right, the PC is best typed as unsigned. I'm not sure
about the registers. Values in registers are seen as 2-s complement
by instruction in at least some CPU's (e.g., MIPS has pairs of
similar instructions for singed and unsigned register operand).

Case
 

Eric Sosman

Case said:
I didn't say what kind of int, so technically what you propose
is covered by my statement ;-)

Yes, you're right, the PC is best typed as unsigned. I'm not sure
about the registers. Values in registers are seen as 2-s complement
by instruction in at least some CPU's (e.g., MIPS has pairs of
similar instructions for singed and unsigned register operand).

Note that you must issue the occasional no-op when
using instructions of the first type, to give the singed
registers time to cool.
 

RoSsIaCrIiLoIA

For me it is difficult to write a portable and *fast* x86 CPU emulator
in C (it has to execute an OS).
I'm a beginner, but I would 'solve' the problem this way:

___________________________________
#include <stdio.h>
#include <stdint.h> /* uintN_t types (C99, not C89) */

struct r16{
uint8_t rl; /* low byte */
uint8_t rh; /* high byte */
}; /* on x86 the same register serves both signed
and unsigned calculation */


struct r32{
struct r16 ac;
uint16_t sn;
};


/* all global */
struct r32 eax_={0}, ebx_={0}, ecx_={0}, edx_={0};

/* these have static storage duration, so even without ={0} they are
all 0 at the start of the program */

struct r32 esi_, edi_, ebp_, esp_, eip_;
uint16_t cs_, ds_, es_, ss_, fs_, gs_, flags_;

struct r32 *eax= &eax_, *ebx= &ebx_, *ecx= &ecx_, *edx= &edx_;
struct r32 *esi= &esi_, *edi= &edi_, *ebp= &ebp_, *esp= &esp_,
*eip = &eip_;
uint16_t *cs=&cs_, *ds=&ds_, *es=&es_, *ss=&ss_, *fs=&fs_,
*gs=&gs_, *flags=&flags_;


#define ax eax->ac
#define al eax->ac.rl
#define ah eax->ac.rh

#define bx ebx->ac
#define bl ebx->ac.rl
#define bh ebx->ac.rh

#define cx ecx->ac
#define cl ecx->ac.rl
#define ch ecx->ac.rh

#define dx edx->ac
#define dl edx->ac.rl
#define dh edx->ac.rh

#define sp esp->ac
#define bp ebp->ac
#define si esi->ac
#define di edi->ac
#define ip eip->ac
#define U unsigned
#define P printf

void assign(struct r32* a, uint32_t b)
{
/*--------------*/
(*a).sn = (b>>16) & 0xFFFF; /* upper 16 bits */
(*a).ac.rl = b & 0xFF; /* low byte */
(*a).ac.rh = (b>>8) & 0xFF; /* high byte of the low word */
}

void Pr(struct r32* a)
{uint32_t b;
/*----------------*/
b=(*a).sn;
b <<= 16;
b |= ((uint32_t) (*a).ac.rh << 8 ) | (*a).ac.rl;
printf("%lx", (unsigned long) b); /* %lx matches unsigned long */
fflush(stdout);
}

void somma(struct r32* a,struct r32* b)
{uint32_t bb, aa;
/*----------------*/
aa =(*a).sn; aa <<= 16;
aa |= ((uint32_t) (*a).ac.rh << 8 ) | (*a).ac.rl;
bb = b->sn; bb <<= 16;
bb |= ((uint32_t) b->ac.rh << 8 ) | b->ac.rl;
bb += aa;
assign(a, bb);
}

/* it is difficult, but main() has to
1) read an instruction in binary form from a file
2) perform that instruction in the program
*/

int main(void)
{
assign( eax , 0xFEFEFEFE); assign( ebx , 0xFAFAFAFA);
P("eax="); Pr(eax); P(" ebx="); Pr(ebx); P("\n");
assign(ecx, 50000); assign(edx, 512341);
somma(ecx, edx);
P("ecx="); Pr(ecx); P(" edx="); Pr(edx); P("\n");
printf("somma=%lx\n", 50000UL + 512341UL);
return 0;
}

/*
eax=fefefefe ebx=fafafafa
ecx=894a5 edx=7d155
somma=894a5
*/
 

Gordon Burditt

This is not a Unix item but strictly a PC item as follows:
union REGS {
struct WORDREGS x;
struct BYTEREGS h;
};

struct WORDREGS {
unsigned int ax, bx, cx, dx;
unsigned int si, di, cflag, flags;
};

struct BYTEREGS {
unsigned char al, ah, bl, bh;
unsigned char cl, ch, dl, dh;
};

These structures allow C programs on PCs to access the CPU registers.

AX,BX,CX,DX,SI,DI,CF(carry flag),and FLAGS are all Intel 80x86
registers
AL/AH,BL/BH,CL/CH,DL/DH are pseudo 8 bit registers which are derived
from the
AL = Low half of AX, AH = High half of AX, etc.

I have to really wonder about this design for a CPU emulator. Some
of the registers listed above share storage (for example, ax consists
of ah and al concatenated, and eax contains ax). Every time you
change one register (e.g. eax, ax, al, or ah), you have to change
all of them, or keep track of which one is more up to date. This
tends to make things slow and bug-prone if you're not careful.
Assembly code WILL occasionally make use of this fact, for example,
loading ah with 0 to do an unsigned-extension of al into ax may be
the fastest way to accomplish this.

There may also be reasons for representing some of the registers as an
array. For example, the field in a machine instruction that specifies
a register may be consistent enough that isolating that field and using
it as an array index may be useful.
The basic registers are 16 bits while the extended registers are 32
bits
EAX,EBX,ECX,EDX, etc.

In addition you have the segment registers CS,DS,ES,SS and the
pointers
SP,BP plus the index registers SI,DI and on the 80386 and above you
also
have debug registers.

In protected mode the segment registers become selector registers.

There are a number of registers not mentioned above in the *86
architecture (and this isn't complete either). I realize the above
wasn't intended to be a complete list.

- EFLAGS, EIP, EDI, ESI, ESP, EBP
- Segment registers FS and GS
- Floating point registers
- Control registers
- Machine specific registers
- Portions of the "hidden" part of selector registers can become visible
under the right circumstances.
- GDTR, IDTR, Task register, and LDTR
- MMX and XMM registers (which may overlap with others)


Gordon L. Burditt
 

Gordon Burditt

Why not simply use the pc as an index in memory? This
would allow you to write memory[pc]. And doesn't this

This is a great idea for simpler CPUs without memory management
hardware, and especially for CPUs where the maximum addressable
memory of the emulated CPU (e.g. Z80 or 8086) is small (e.g. 64k
bytes or even 1MB) compared to the available RAM of the host CPU
running the emulation.
perfectly match what a PC does, being an index into a
block of data?

Yes, given the absence of memory management. Just Intel segment-register
mappings don't make it THAT hard, but protected-mode memory management
can make it much harder. With memory management, things get
complicated all of a sudden, especially if it's allowed for a
multi-byte integer fetch to straddle memory-management pages, and
where multiple very-different addresses can refer to the same block
of memory.

Gordon L. Burditt
 

Case -

Gordon said:
<OT>
Why not simply use the pc as an index in memory? This
would allow you to write memory[pc]. And doesn't this

This is a great idea for simpler CPUs without memory management
hardware, and especially for CPUs where the maximum addressable
memory of the emulated CPU (e.g. Z80 or 8086) is small (e.g. 64k
bytes or even 1MB) compared to the available RAM of the host CPU
running the emulation.
perfectly match what a PC does, being an index into a
block of data?

Yes, given the absence of memory management. Just Intel segment-register
mappings don't make it THAT hard, but protected-mode memory management
can make it much harder. With memory management, things get
complicated all of a sudden, especially if it's allowed for a
multi-byte integer fetch to straddle memory-management pages, and
where multiple very-different addresses can refer to the same block
of memory.

I didn't think of this in the context of OP's homework
assignment. Thanks for the extra info! CPU (and virtual
machine) design is and remains a very interesting subject.

Case
 

John Cochran

Eric Sosman said:
I once worked on an actual machine whose program counter
registers (there were two) were signed -- the explanation
given was that the same circuitry implemented all the machine's
registers, so the signedness of the PCs was just a by-product
of parsimonious hardware design. However, only the low-order
bits of the active PC participated in address generation; the
sign bits and high-order bits were simply ignored.

So far, just an amusing peculiarity. But the really odd
thing was that the operation of "increment the program counter"
was implemented as the arithmetic operation "add one to the
program counter" -- so if the PC contained a negative value,
the effect was that the program ran backwards! I got some
diversion out of dreaming up a code sequence that executed
in the usual fashion for a while and then negated the PC and
"backed out" by re-executing the preceding instructions in
reverse order ...

Huh?

0x8000 = -32768
0x8000 + 1 = 0x8001 = -32767
0x8001 + 1 = 0x8002 = -32766
....

Show me where adding 1 to a negative binary number will cause the low
order bits to "run backwards" compared to adding 1 to an unsigned binary
number.
 
