Doh! Sorry, Arne, i completely failed to understand there. You're quite
right, of course. And i would imagine that in most applications, reads
of data far outweigh reads of code (once you account for the caches). I
would be very interested to see numbers for that across different kinds
of program, though.
It depends what you mean by 'read'.
If you look at the instruction flow into the CPU, i.e. out of any caches
and into the CPU proper, the instruction flow is considerably larger than
the data flow in almost any architecture.
For the optimised code:
int a, b, c;
if (a > 0)
    c = a + b;
else
    c = -1;
Assembler examples are:
ICL 1900 (24 bit f/l instructions. 24 bit words) c = a + b
LDA 7 A # Read A
LDN 6 0 # load literal zero
BLE 7 L1 # Jump if A <= 0
ADD 7 B # Read and add B
BRN L2
L1 LDN 7 -1 # Set result to -1
L2 STO 7 C # Save the result
Instructions: 7 read
Data: 2 words read, 1 word written
Ratio I:D 7:2
68020 (32 bit MPU and addresses, v/l instructions, 16 bit words)
MOVE.W A,D1 # Read A 5 bytes [1]
BMI.S L1 # Jump if negative 2 bytes
BEQ.S L1 # Jump if zero 2 bytes
ADD.W B,D1 # Read and add B 5 bytes
BRA.S L2 # 2 bytes
L1 MOVE.W #-1,D1 # Set result to -1 5 bytes
L2 MOVE.W D1,C # Save the result 5 bytes
Instructions: 26 bytes read
Data: 4 bytes read, 2 bytes written
Ratio I:D 26:6
[1] I won't swear to these MOVE and ADD instruction lengths (my handbook
doesn't give them and my 68020 isn't running at present), but even if I'm
wrong and they're only 3 bytes, the ratio is still 18:6.
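
For anyone who wants to check the arithmetic, here is a rough tally of the 68020 listing in Python. It is only a sketch: the byte counts are the ones quoted above (including the MOVE/ADD lengths I flagged as uncertain), and the names ROWS and tally are made up for this illustration.

```python
# Each row: (instruction, length is one of the uncertain MOVE/ADD figures?,
#            data bytes read, data bytes written)
# Figures are taken from the listing above, not from a reference manual.
ROWS = [
    ("MOVE.W A,D1",   True,  2, 0),
    ("BMI.S L1",      False, 0, 0),
    ("BEQ.S L1",      False, 0, 0),
    ("ADD.W B,D1",    True,  2, 0),
    ("BRA.S L2",      False, 0, 0),
    ("MOVE.W #-1,D1", True,  0, 0),
    ("MOVE.W D1,C",   True,  0, 2),
]

SHORT_BRANCH_BYTES = 2  # length quoted for the .S branches


def tally(uncertain_len):
    """Return (instruction bytes, data bytes) for an assumed MOVE/ADD length."""
    instr = sum(uncertain_len if uncertain else SHORT_BRANCH_BYTES
                for _, uncertain, _, _ in ROWS)
    data = sum(read + written for _, _, read, written in ROWS)
    return instr, data


if __name__ == "__main__":
    for length in (5, 3):  # the quoted figure and the pessimistic guess
        instr, data = tally(length)
        print(f"{length}-byte MOVE/ADD: I:D = {instr}:{data}")
```

With the quoted 5-byte figure this gives 26:6, and with 3 bytes it gives 18:6, so the instruction flow stays well ahead of the data flow either way.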
You don't have to throw in much in the way of overflow checking, address
arithmetic, etc. to increase the Instruction:Data ratio quite considerably.
Both my examples are of processors with a decently sized register set but
I don't think entirely stack-oriented machines would do much better.
The ICL 2900 had the most sophisticated architecture I've seen (entirely
stack-based, descriptors for all but primitive data types, software-
controlled register length) and averaged 3 instructions per COBOL
sentence vs. the 6+ per sentence of the 1900, but its instruction flow
through the OCP (Order Code Processor) was higher than its data flow and
the hardware was optimised to reflect that fact.
If anybody knows of hardware where the data flow is larger than the
instruction flow and can provide an equivalent example I'd be fascinated
to see it.