Not a good idea. The compiler should be consistent, or the programmer
will be surprised later on when the run-time code suddenly overflows at
65536. If the compiler implements 16-bit ints, that's fine. If it
implements 32-bit ints, that's fine too. But if it is sometimes 32 and
sometimes 16, that's not a good thing.
Ummm, it looked to me like the OP wasn't trying to say anything like
that.
If I understand correctly, the question is about integral literals
appearing in source code -- though it should probably instead be about
integral literals appearing after preprocessing and constant folding.
The question is about the relative rates at which 16-bit literals
appear, compared to 32-bit literals. The point, if I understand
correctly, is not about computing at those widths, but rather about
the instructions involved in encoding the literals into the machine
code.
There are architectures which offer several different ways of loading
the same literal.
For example, there are architectures which have (say) "Add Quick", an
instruction which operates on registers of whatever size is specified
in other parts of the instruction (e.g., ADDQ.W, ADDQ.L) -- but no
matter what the register size, the value being added is restricted to a
particular range, such as -128 to +127. For example, ADDQ.L #16,A3
might be used to increment address register A3 by 16. Such instructions
could hypothetically crop up quite a bit in loops over arrays.
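To make that concrete, here is a C sketch (my example, not the OP's)
of the sort of loop where such small constant increments arise; the C
is real, but which instructions a compiler actually emits for the
increments is of course entirely up to the compiler and target:

    /* Summing a 32-bit array.  Each iteration advances the index by
       an implicit 1, and the index gets scaled by an implicit 4 to
       form a byte offset -- small literals that an "Add Quick"-style
       instruction could encode compactly. */
    #include <stddef.h>
    #include <stdint.h>

    int32_t sum(const int32_t *p, size_t n)
    {
        int32_t total = 0;
        for (size_t i = 0; i < n; i++)
            total += p[i];
        return total;
    }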
The same architecture might have an ADDI.W and an ADDI.L (add a 16-bit
constant to a 16-bit register; add a 16-bit constant to a 32-bit
register) and might have a regular ADD instruction in which
one of the source operand modes was "immediate 32-bit constant".
When these instructions are available, they can result in smaller
machine code, which is a benefit for fitting more into the
cache; they may also end up operating faster because of the
reduced number of cycles required to read the constant from
memory. There may also turn out to be interesting tradeoffs
in such architectures, such as it potentially being faster to
load a byte value in one instruction and sign-extend it to 32
bits in a second instruction, rather than using a single
instruction to load a 32-bit constant -- the dual-instruction
version might, for example, hypothetically pipeline better,
or the "sign extend" instruction might be only two bytes
long, with the "load quick" being 4 bytes long, for
a total of 6 bytes where the architecture would need 8 bytes
(4 for the instruction, 4 for the constant) to load a 32-bit
constant.
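Just to spell out that byte arithmetic, a minimal sketch, assuming
the purely hypothetical encoding sizes above (none of these numbers
correspond to any real instruction set):

    /* Cost, in bytes, of materializing a 32-bit constant under the
       made-up encodings discussed above. */
    #include <stdint.h>
    #include <stdio.h>

    static int load_cost_bytes(int32_t c)
    {
        if (c >= -128 && c <= 127)
            return 4 + 2;  /* 4-byte "load quick" + 2-byte sign-extend = 6 */
        return 4 + 4;      /* 4-byte load instruction + 4-byte constant = 8 */
    }

    int main(void)
    {
        printf("%d\n", load_cost_bytes(16));      /* prints 6 */
        printf("%d\n", load_cost_bytes(100000));  /* prints 8 */
        return 0;
    }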
Suppose one is working on a compiler for an architecture
that offers different instructions for the same basic task,
with some of the variations being shorter or faster for particular
ranges of literal constants. One might then wonder whether it is
worth the effort to put in a bunch of compiler optimization work
to get the absolute best possible memory fit or the absolute best
possible instruction speed. Such work could get particularly messy
if the less-memory variations turn out to take longer, as one would
then have to analyze whether the surrounding code is such that it
is better to optimize for space or for time. In a tight loop that
was likely to stay in cache for a while, the answer could depend
on the exact size of the loop...
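A rough sketch of the kind of selection logic that decision implies
-- the forms and ranges are the hypothetical ones from above, and a
real compiler would fold this into its instruction-selection pass
rather than write it as a standalone function:

    /* Hypothetical: choose an add-immediate form by literal range,
       optionally trading size for speed.  Assume (purely for the
       sake of the example) that profiling showed the smallest form
       to be slower on this imaginary core. */
    #include <stdint.h>

    enum add_form { FORM_ADDQ, FORM_ADDI, FORM_ADD_IMM32 };

    static enum add_form pick_add_form(int32_t imm, int optimize_for_speed)
    {
        if (imm >= -128 && imm <= 127 && !optimize_for_speed)
            return FORM_ADDQ;                  /* smallest encoding */
        if (imm >= INT16_MIN && imm <= INT16_MAX)
            return FORM_ADDI;                  /* 16-bit immediate */
        return FORM_ADD_IMM32;                 /* full 32-bit immediate */
    }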
If this is what the OP is trying to talk about, then I would
make some suggestions:
1) That possibly this kind of study was undertaken back when RISC
theory was being developed;
2) That not just the explicitly coded (or constant-folded) literals
are important, since there can be many implicit literals for
pointer/address manipulation;
3) That it matters not only how often various sizes of literals appear,
but also what kind of circumstances they appear in. There might
be (hypothetically) several 32-bit constants per "for" loop
(initialization, termination test, increment), but those would tend
to get loaded into registers, and the literal that might be
most crucial to the speed of the loop might be (say) the 4 in
"add #4, %eax" where %eax is being used to index arrays. Or,
depending on the architecture and generated code, it might be the 2
in the sequence INC R3; MOV R3, R2; SHL R2, #2; LOAD R4, $1BE00C(R2)
-- i.e., increment, copy, shift left to get a byte offset, load from
that byte offset relative to the constant location of a static
array... (see the sketch after this list).
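For what it's worth, the kind of C source that might (depending
entirely on compiler and target) come out looking like that last
sequence:

    /* Walking a static array by index.  The array's base address and
       the scale factor (a multiply by 4, or equivalently a shift
       left by 2) both become literals in the generated code, even
       though neither appears as an explicit constant in the source. */
    #include <stdint.h>

    #define N 1024
    static int32_t table[N];

    int32_t first_positive(void)
    {
        for (int32_t i = 0; i < N; i++)
            if (table[i] > 0)
                return table[i];
        return -1;
    }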