Natural size: int

  • Thread starter Frederick Gotham

Frederick Gotham

On modern 32-Bit PCs, the following setup is common:

char: 8-Bit
short: 16-Bit
int: 32-Bit
long: 32-Bit

"char" is commonly used to store text characters.
"short" is commonly used to store large arrays of numbers, or perhaps wide
text characters (via wchar_t).
"int" is commonly used to store an integer.
"long" is commonly used to store an integer greater than 65535.

Now that 64-Bit machines are coming in, how should the integer types be
distributed? It makes sense that "int" should be 64-Bit... but what should
be done with "char" and "short"? Would the following be a plausible setup?

char: 8-Bit
short: 16-Bit
int: 64-Bit
long: 64-Bit

Or perhaps should "short" be 32-Bit? Or should "char" become 16-Bit (i.e.
16 == CHAR_BIT)?

Another semi-related question:

If we have a variable which shall store the quantity of elements in an
array, then should we use "size_t"? On a system where "size_t" maps to
"long unsigned" rather than "int unsigned", it would seem to be inefficient
most of the time. "int unsigned" guarantees us at least 65535 array
elements -- what percentage of the time do we have an array any bigger than
that? 2% maybe? Therefore would it not make sense to use unsigned rather
than size_t to store array lengths (or the positive result of subtracting
pointers)?
 

jacob navia

Frederick said:
On modern 32-Bit PCs, the following setup is common:

char: 8-Bit
short: 16-Bit
int: 32-Bit
long: 32-Bit

"char" is commonly used to store text characters.
"short" is commonly used to store large arrays of numbers, or perhaps wide
text characters (via wchar_t).
"int" is commonly used to store an integer.
"long" is commonly used to store an integer greater than 65535.

Now that 64-Bit machines are coming in, how should the integer types be
distributed? It makes sense that "int" should be 64-Bit... but what should
be done with "char" and "short"? Would the following be a plausible setup?

char: 8-Bit
short: 16-Bit
int: 64-Bit
long: 64-Bit

Or perhaps should "short" be 32-Bit? Or should "char" become 16-Bit (i.e.
16 == CHAR_BIT).

For Windows systems, Microsoft decided that with 64-bit machines it will be:
char 8, short 16, int 32, long 32, __int64 64

For Unix systems, gcc decided that:
char 8, short 16, int 32, long 64, long long 64

Another semi-related question:

If we have a variable which shall store the quantity of elements in an
array, then should we use "size_t"? On a system where "size_t" maps to
"long unsigned" rather than "int unsigned", it would seem to be inefficient
most of the time. "int unsigned" guarantees us at least 65535 array
elements -- what percentage of the time do we have an array any bigger than
that? 2% maybe? Therefore would it not make sense to use unsigned rather
than size_t to store array lengths (or the positive result of subtracting
pointers)?

There is no difference on 32-bit machines, since a register will be 32
bits. If you fill only 16 bits, the others are wasted.

If you store the index data in memory, in global variables, or on disk,
where space is more important, you *could* have some space gains by
using a short, or even a char. But beware of alignment issues. The
compiler will align data to 32 bits on most machines, so the gains
could very well be zero.
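
A minimal sketch of that padding effect (the exact layout is
implementation-defined; C99's %zu is assumed):

#include <stdio.h>

struct rec {
    char tag;   /* 1 byte... */
    int  value; /* ...but typically aligned to a 4-byte boundary */
};

int main(void)
{
    /* On most 32-bit ABIs this prints 8, not 5: the compiler
       pads after 'tag', so the smaller member saved nothing. */
    printf("sizeof(struct rec) = %zu\n", sizeof(struct rec));
    return 0;
}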
 

Malcolm

Frederick Gotham said:
On modern 32-Bit PCs, the following setup is common:

char: 8-Bit
short: 16-Bit
int: 32-Bit
long: 32-Bit

"char" is commonly used to store text characters.
"short" is commonly used to store large arrays of numbers, or perhaps wide
text characters (via wchar_t).
"int" is commonly used to store an integer.
"long" is commonly used to store an integer greater than 65535.

Now that 64-Bit machines are coming in, how should the integer types be
distributed? It makes sense that "int" should be 64-Bit... but what should
be done with "char" and "short"? Would the following be a plausible setup?

char: 8-Bit
short: 16-Bit
int: 64-Bit
long: 64-Bit
If you use int you want an integer.
If the manufacturer has kindly provided 64-bit registers, obviously he wants
you to use 64-bit integers.
So it seems pretty obvious what to do.
Or perhaps should "short" be 32-Bit? Or should "char" become 16-Bit (i.e.
16 == CHAR_BIT).

Another semi-related question:

If we have a variable which shall store the quantity of elements in an
array, then should we use "size_t"? On a system where "size_t" maps to
"long unsigned" rather than "int unsigned", it would seem to be inefficient
most of the time. "int unsigned" guarantees us at least 65535 array
elements -- what percentage of the time do we have an array any bigger than
that? 2% maybe? Therefore would it not make sense to use unsigned rather
than size_t to store array lengths (or the positive result of subtracting
pointers)?
size_t was a nice idea - a type to hold a size of an object in memory.
Sadly the implications weren't thought through - if you can't use an int to
index an array, then the machine manufacturer has done something weird and
wonderful with his address bus.

characters for character data
integers for integral data
double precision for floating point numbers.

That's all the world really needs, except byte for chunks of 8-bit data in
the rare cases where memory size matters.
 

Stephen Sprunk

Frederick Gotham said:
On modern 32-Bit PCs, the following setup is common:

char: 8-Bit
short: 16-Bit
int: 32-Bit
long: 32-Bit

Or long might be 64-bit. Or int might be 16-bit. You can find just
about every allowable combination out there in the wild.

....
Now that 64-Bit machines are coming in, how should the integer types
be distributed? It makes sense that "int" should be 64-Bit... but what
should be done with "char" and "short"? Would the following be a
plausible setup?

char: 8-Bit
short: 16-Bit
int: 64-Bit
long: 64-Bit

That's referred to as ILP64, and there are indeed systems out there
like that. However, I32LP64 and IL32LLP64 are arguably more common.
Or perhaps should "short" be 32-Bit? Or should "char" become 16-Bit
(i.e. 16 == CHAR_BIT).

A char should be the smallest addressable unit of memory; if your
system only supports 16-bit (or greater) loads, it may be reasonable
to have CHAR_BIT==16, but expect to have to hack up virtually every
program you try to port. Even Cray and the DEC Alpha had to
synthesize 8-bit loads for the char type, because not doing so was
suicide.
Another semi-related question:

If we have a variable which shall store the quantity of elements in an
array, then should we use "size_t"? On a system where "size_t" maps
to "long unsigned" rather than "int unsigned", it would seem to be
inefficient most of the time.

You assume that shorter ints are somehow more efficient than longer
ints; many modern processors have slow shorts and int is no faster
than long or long long.

Premature optimization is the root of all evil. Avoid the temptation
unless profiling shows it matters and the change actually helps in a
significant way. Then document the heck out of it.
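
If one does want to measure, a crude sketch along these lines is a
starting point (a real profiler is better, and the compiler may
optimize such loops away entirely):

#include <stdio.h>
#include <time.h>

int main(void)
{
    volatile int  a = 0;  /* volatile discourages the optimizer */
    volatile long b = 0;
    clock_t t0, t1;
    long i;

    t0 = clock();
    for (i = 0; i < 100000000L; i++)
        a = a + 1;
    t1 = clock();
    printf("int:  %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (i = 0; i < 100000000L; i++)
        b = b + 1;
    t1 = clock();
    printf("long: %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}
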
"int unsigned" guarantees us at least 65535 array elements -- what
percentage of the time do we have an array any bigger than that?
2% maybe? Therefore would it not make sense to use unsigned
rather than size_t to store array lengths

If you use int (or long), you always have to worry about what happens
if/when you're wrong; use size_t and you can promptly forget about
it.

I've seen all kinds of systems which crashed when one more record was
added, and it was always due to coders who assumed "we'll never have
more than 32767 employees/customers" or some such.
(or the positive result of subtracting pointers)?

The result of subtracting pointers is already ptrdiff_t, so why use
something else? ssize_t is about the only reasonable replacement, but
it's not portable. size_t is fine if you test to make sure the
difference is positive first. Do you really care so much about the
extra two or three letters it takes to use a type that is _guaranteed_
to work that you're willing to accept your program randomly breaking?
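
A minimal sketch of the portable approach just described (C99 assumed
for the %zu format):

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    int arr[100];
    int *first = &arr[0];
    int *last  = &arr[100];         /* one past the end: valid to form */

    ptrdiff_t diff = last - first;  /* pointer subtraction yields ptrdiff_t */
    if (diff >= 0) {
        size_t count = (size_t)diff;  /* safe once known non-negative */
        printf("count = %zu\n", count);
    }
    return 0;
}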

S
 

Skarmander

Malcolm said:
[size and nature of integral types]
size_t was a nice idea - a type to hold a size of an object in memory.
Sadly the implications weren't thought through - if you can't use an int to
index an array, then the machine manufacturer has done something weird and
wonderful with his address bus.
Consider a system with 64-bit pointers and 32-bit ints -- not that
far-fetched, right? On such a system size_t might be a 64-bit type as well.
You can still *use* ints to index an array, but not the huge arrays the
system might allow. (Whether that matters is another thing.)
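
A sketch of that scenario (hypothetical sizes; the allocation may
simply fail, which the code checks for):

#include <stdint.h>
#include <stdlib.h>

int main(void)
{
#if SIZE_MAX > 0xFFFFFFFF           /* only meaningful if size_t > 32 bits */
    size_t n = (size_t)1 << 33;     /* 2^33 elements: more than UINT_MAX */
    char *big = malloc(n);          /* may simply fail; check it */
    if (big != NULL) {
        size_t i;
        for (i = 0; i < n; i++)     /* a 32-bit unsigned index would wrap */
            big[i] = 0;
        free(big);
    }
#endif
    return 0;
}
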
characters for character data

You mean like Unicode?
integers for integral data

25! = 15511210043330985984000000
double precision for floating point numbers.
Which is double of what?
That's all the world really needs, except byte for chunks of 8-bit data in
the rare cases where memory size matters.

That is almost never the reason why bytes are used.

S.
 

Frederick Gotham

jacob navia posted:
For Windows systems, Microsoft decided that with 64-bit machines it will be:
char 8, short 16, int 32, long 32, __int64 64

For Unix systems, gcc decided that:
char 8, short 16, int 32, long 64, long long 64

What's the point in having a 64-Bit system if it's not taken advantage of? It
would be less efficient to use 32-Bit integers on a 64-Bit machine. It would
probably be more efficient to use 32-Bit integers on a 32-Bit machine rather
than on a 64-Bit machine, no?

When people use an "int", they expect it to be the most efficient integer
type.
 

Ben Pfaff

Frederick Gotham said:
What's the point in having a 64-Bit system if it's not taken advantage of? It
would be less efficient to use 32-Bit integers on a 64-Bit machine. It would
probably be more efficient to use 32-Bit integers on a 32-Bit machine rather
than on a 64-Bit machine, no?

This is not necessarily the case. It may be just as efficient to
work with 32- or 64-bit integers on a system with 64-bit
general-purpose registers. Using 32-bit integers can save a good
deal of memory (especially in arrays or structures), so it may
make sense to use a 32-bit `int' on such systems.
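
A rough C99 sketch of the memory difference (the sizes in the comments
assume a typical I32LP64 implementation):

#include <stdio.h>

int main(void)
{
    static int  a[1000000];   /* ~4 MB if int is 32 bits */
    static long b[1000000];   /* ~8 MB if long is 64 bits */

    printf("int array:  %zu bytes\n", sizeof a);
    printf("long array: %zu bytes\n", sizeof b);
    return 0;
}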
 

Keith Thompson

Frederick Gotham said:
On modern 32-Bit PCs, the following setup is common:

char: 8-Bit
short: 16-Bit
int: 32-Bit
long: 32-Bit

"char" is commonly used to store text characters.
"short" is commonly used to store large arrays of numbers, or perhaps wide
text characters (via wchar_t).
"int" is commonly used to store an integer.
"long" is commonly used to store an integer greater than 65535.

Now that 64-Bit machines are coming in, how should the integer types be
distributed? It makes sense that "int" should be 64-Bit... but what should
be done with "char" and "short"? Would the following be a plausible setup?

char: 8-Bit
short: 16-Bit
int: 64-Bit
long: 64-Bit

Or perhaps should "short" be 32-Bit? Or should "char" become 16-Bit (i.e.
16 == CHAR_BIT).

Making int 64 bits leaves a gap in the type system; unless the
implementation has C99 extended integer types, either there's no
16-bit integer type or there's no 32-bit integer type.

A common setup on 64-bit systems is:

char: 8 bits
short: 16 bits
int: 32 bits
long: 64 bits
long long: 64 bits

Of course there are other possibilities.

Most 64-bit systems, I think, can perform 32-bit operations reasonably
efficiently, so there's not much disadvantage in defining int as 32
bits.

Also, unless you're an implementer, you don't have much influence.
Compiler writers get to decide how big the fundamental types are going
to be; as users, most of us just have to deal with whatever they
provide.
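
One way to see what a given implementation chose (a small C99 sketch):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("short     = %zu bits\n", sizeof(short) * CHAR_BIT);
    printf("int       = %zu bits\n", sizeof(int) * CHAR_BIT);
    printf("long      = %zu bits\n", sizeof(long) * CHAR_BIT);
    printf("long long = %zu bits\n", sizeof(long long) * CHAR_BIT);
    return 0;
}
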
Another semi-related question:

If we have a variable which shall store the quantity of elements in an
array, then should we use "size_t"? On a system where "size_t" maps to
"long unsigned" rather than "int unsigned", it would seem to be inefficient
most of the time. "int unsigned" guarantees us at least 65535 array
elements -- what percentage of the time do we have an array any bigger than
that? 2% maybe? Therefore would it not make sense to use unsigned rather
than size_t to store array lengths (or the positive result of subtracting
pointers)?

If you're wondering about percentages like that, then you're
approaching the problem from the wrong perspective.

If you're sure an array can never have more than 65535 elements, go
ahead and use unsigned int to index it. If you're not sure how big it
can be, size_t is a reasonable choice.
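
For example, a minimal sketch of the size_t convention:

#include <stddef.h>

/* size_t can index any object the implementation allows, so no
   assumption about a maximum element count is needed. */
double sum(const double *a, size_t n)
{
    double total = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}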
 

jmcgill

Frederick said:
jacob navia posted:
What's the point in having a 64-Bit system if it's not taken advantage of?

Why do you think it's not taken advantage of?
It would be less efficient to use 32-Bit integers on a 64-Bit machine.

Of course a 64-bit ALU or instruction path might be able to handle two
32-bit integers in a similar number of clock cycles as the 32-bit system
can handle one.

A 64-bit architecture might be able to do things in the instruction
decoding, pipeline, or other datapath considerations that are not
possible in a 32-bit architecture.

Or maybe a 64-bit machine deals with 32-bit integers by sign-extending
into the upper 32 bits. Is that less efficient? Do you have metrics for
your platform?
It would
probably be more efficient to use 32-Bit integers on a 32-Bit machine rather
than on a 64-Bit machine, no?

This is not necessarily true; it's certainly not true in every situation.
When people use an "int", they expect it to be the most efficient integer
type.

When people use an "int" in C, they expect it to be a numeric data type
no larger than a "long".
 

Frederick Gotham

Stephen Sprunk posted:

A char should be the smallest addressable unit of memory; if your
system only supports 16-bit (or greater) loads, it may be reasonable
to have CHAR_BIT==16, but expect to have to hack up virtually every
program you try to port. Even Cray and the DEC Alpha had to
synthesize 8-bit loads for the char type, because not doing so was
suicide.


I don't see why (unless you're reading/writing data to/from disk perhaps?).

You assume that shorter ints are somehow more efficient than longer
ints; many modern processors have slow shorts and int is no faster
than long or long long.


No, I presume that int is faster than long (or they're both as fast as each
other).

I would expect the following:

speed_of(int) >= speed_of(long)

speed_of(int) >= speed_of(short) >= speed_of(char)
 

Stephen Sprunk

Frederick Gotham said:
jacob navia posted:

What's the point in having a 64-Bit system if it's not taken advantage
of? It would be less efficient to use 32-Bit integers on a 64-Bit
machine.

If the programmer needed a 64-bit type, he would have used long long.
If he used int, he doesn't need more than 16 bits (or, given how bad
many coders are, 32 bits).
It would probably be more efficient to use 32-Bit integers on
a 32-Bit machine rather than on a 64-Bit machine, no?

This is not like the opposite case of 64-bit ints being slow on a
32-bit system.

It is not hard to imagine a 64-bit system that executed half-width
operations faster than full-width operations, and 32-bit ints or longs
definitely take up less memory, reducing cache misses and improving
overall memory performance and footprint. The speed hit you get
merely from using 64-bit pointers often makes systems slower in 64-bit
mode than they are in 32-bit mode; force them to 64-bit ints and it'd
get worse.
When people use an "int", they expect it to be the most efficient
integer type.

The _largest_ integer type the system supports may not be the _most
efficient_.

If I'm working with really small numbers, using 16-bit shorts or even
8-bit chars is likely to be much faster than (or at worst, the same
speed as) 64+ bit long longs, even if I've got some whiz-bang 128-bit
processor.

S
 

Eric Sosman

Frederick Gotham wrote on 08/08/06 15:52:
jacob navia posted:

What's the point in having a 64-Bit system if it's not taken advantage of? It

Please define precisely what you mean by "a 64-Bit system."
What specific characteristic or set of characteristics causes
you to classify system X as "64-Bit" and system Y as "48-bit?"

Register width? Memory transaction width? Virtual address
width? Physical address width? What's your chosen criterion,
and why does it dominate the others? "64-bit CPUs" have been
around for a dozen years or so, and have evolved a variety of
different traits; simply saying "64-bit" is not sufficiently
specific.
would be less efficient to use 32-Bit integers on a 64-Bit machine. It would
probably be more efficient to use 32-Bit integers on a 32-Bit machine rather
than on a 64-Bit machine, no?

When people use an "int", they expect it to be the most efficient integer
type.

Please define precisely what you mean by "most efficient,"
or at least by "efficient." Are you concerned about instruction
speed inside the ALU? Access speed between ALU, assorted caches,
RAM, and swap? Total data size, with accompanying effects on
cache misses and page fault rates? Can you formulate a definition
that captures the crucial issues for all applications?
 

Stephen Sprunk

Frederick Gotham said:
Stephen Sprunk posted:

I don't see why (unless you're reading/writing data to/from disk
perhaps?).

No, I presume that int is faster than long (or they're both as fast as
each other).

I would expect the following:

speed_of(int) >= speed_of(long)

True in every implementation I've seen; int is typically chosen to be
the fastest size that the platform supports (of at least 16 bits).
However, there is no guarantee that the designers of a particular
implementation possess common sense.
speed_of(int) >= speed_of(short) >= speed_of(char)

Not always true. shorts are slower than chars even on some common
platforms (e.g. Intel P6-based cores), and chars may be faster than
ints and shorts due to reduced memory pressure.

I would expect ints to be faster than shorts or chars in most
situations, but it's possible that memory effects may make chars
faster for particular tasks.

S
 

Stephen Sprunk

Malcolm said:
....
If you use int you want an integer.

You mean an integer which is not required to hold values outside the
range -32767 to 32767, and which is probably the fastest integer type.
If the manufacturer has kindly provided 64 bit registers, obviously
he wants you to use 64-bit integers.
So it seems pretty obvious what to do.

What the manufacturer wants has nothing to do with what I want or what
the compiler writers want. I want my code to (1) work, and (2) run as
fast as possible. The manufacturer wants to extract money from my
wallet. There is no shortage of cases where 32-bit ints are faster
than 64-bit types on a processor that "kindly provided 64 bit
registers."

If I need an integer with at least 64 bits, I'll use long long; if I
want a fast integer, I'll use int; if I want a fast integer with at
least 32 bits, I'll use long. They may not be the same size, and it's
up to the compiler folks to pick which combination is best for a given
platform.
size_t was a nice idea - a type to hold a size of an object in memory.
Sadly the implications weren't thought through - if you can't use an
int to index an array, then the machine manufacturer has done
something weird and wonderful with his address bus.

And such weird and wonderful things are allowed by the standard,
because they existed prior to its creation and the purpose of ANSI C
was, for the most part, to document what existed and not to create an
ideal language.

You can assume int or long (or long long) is good enough, and in most
cases you'll be right, but that non-portable assumption will
eventually come crashing down -- typically while you're doing a demo
for a customer, or years after the original coder left and you're
stuck with maintaining his crap. Use size_t and you'll never need to
worry about it. Thanks, ANSI.

S
 

J. J. Farrell

Frederick said:
On modern 32-Bit PCs, the following setup is common:

char: 8-Bit
short: 16-Bit
int: 32-Bit
long: 32-Bit

"char" is commonly used to store text characters.
"short" is commonly used to store large arrays of numbers, or perhaps wide
text characters (via wchar_t).
"int" is commonly used to store an integer.
"long" is commonly used to store an integer greater than 65535.

Now that 64-Bit machines are coming in, how should the integer types be
distributed?

Now coming in ?????

Discussions like this were commonplace 10 or 15 years ago when
mainstream 64-bit processors were coming in, and the subject was done
to death then. You'll find lots of interesting discussion of this
subject in the archives of this group and comp.std.c, among other
places.

The main issues to be considered are memory usage and compatibility
with the huge quantities of non-portable code which make assumptions
(often implicit) about the size of objects. These are part of why 'long
long' eventually made it into C99. The questions of speed of integers
are what led to the int_fast types in C99. It's arguable that most of
the core language changes and extensions in C99 were driven by the
64-bit processors which had become widespread since C89 was
standardized.
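
For reference, a minimal C99 sketch of the <stdint.h> families mentioned
above (the widths actually chosen vary by implementation):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int_fast16_t  i = 12345;  /* fastest type of at least 16 bits */
    int_least32_t j = 123456; /* smallest type of at least 32 bits */

    printf("int_fast16_t:  %zu bytes\n", sizeof i);
    printf("int_least32_t: %zu bytes\n", sizeof j);
    return 0;
}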
 

Keith Thompson

Frederick Gotham said:
Stephen Sprunk posted:


I don't see why (unless you're reading/writing data to/from disk perhaps?).

Because, even though C doesn't guarantee CHAR_BIT==8, there are a lot
of advantages to having CHAR_BIT==8. In particular, both Cray and DEC
Alpha systems run Unix-like operating systems; I suspect that
implementing Unix with 64-bit characters would be a nightmare. Even
if it's possible, exchanging data with other Unix-like systems would
be difficult.

For Cray vector systems, the code that uses most of the CPU time is
doing floating-point calculations; if some of the other code happens
to be slow, it doesn't matter a whole lot.
 

Ian Collins

jacob said:
For Windows systems, Microsoft decided that with 64-bit machines it will be:
char 8, short 16, int 32, long 32, __int64 64

For Unix systems, gcc decided that:
char 8, short 16, int 32, long 64, long long 64
gcc decided? I don't think so.

LP64 is by far the most common model on UNIX and UNIX like systems and
the main reason is probably pragmatic - it's the most straightforward
model to port 32 bit applications to.
 

Al Balmer

jacob navia posted:

What's the point in having a 64-Bit system if it's not taken advantage of? It
would be less efficient to use 32-Bit integers on a 64-Bit machine.

Not necessarily.
It would
probably be more efficient to use 32-Bit integers on a 32-Bit machine rather
than on a 64-Bit machine, no?

Not necessarily. The same data path that can carry a 64-bit integer
can carry two 32-bit integers simultaneously.
 

jacob navia

Ian said:
gcc decided? I don't think so.

LP64 is by far the most common model on UNIX and UNIX like systems and
the main reason is probably pragmatic - it's the most straightforward
model to port 32 bit applications to.

Microsoft disagrees... :)

I am not saying that gcc's decision is bad, I am just stating this as
a fact without any value judgement. Gcc is by far the most widely
used compiler under Unix, and they decided LP64, which is probably a good
decision for them.

Microsoft decided otherwise because they have another code base.

And lcc-win32 did not decide anything. Under Windows I compile with
long as 32 bits; under Unix, with long as 64 bits.

I have to follow the lead compiler on each system. By the way, the
lead compiler on an operating system is the compiler that compiled
the operating system: MSVC under Windows, gcc under Linux, etc.
 
