Machines where size of size_t is not equal to size of unsigned int/long

James Harris

AIUI for many CPUs and CPU modes a size_t could be typedef'd to unsigned int
or unsigned long. I wondered where that would not be the case. Anyone know
which CPUs or modes would have a size_t which was not the same size as
unsigned int or unsigned long?

James
 
tim prince

AIUI for many CPUs and CPU modes a size_t could be typedef'd to unsigned int
or unsigned long. I wondered where that would not be the case. Anyone know
which CPUs or modes would have a size_t which was not the same size as
unsigned int or unsigned long?

James
Most of our work nowadays is on AMD64/Intel64 Linux, or the
corresponding Windows x64, where size_t is a 64-bit data type but int
is a 32-bit type. On Windows, long int is also a 32-bit type.
I don't know how those software vendors who vowed long ago not to
support platforms where size_t differs from unsigned int can survive.
 
Jorgen Grahn

It's not a question of "machines"; it's a matter of "implementation".

To be fair, "machine" is often a euphemism for "machine, plus the
tradeoffs made by the ABI and/or compiler vendor".

(But yes, it's useful to point out that there's a distinction.)

/Jorgen
 
Stephen Sprunk

It's not a question of "machines"; it's a matter of "implementation".

If you created a compiler for, let's say, a modern Intel processor
running in 64-bit mode, you'd reasonably use a 64-bit unsigned
integer type for size_t. You'd also likely make unsigned long long a
64-bit unsigned integer.

That's the minimum, and x86-64 has no hardware support for a wider
integer type, so that's the only logical choice.
On the other hand, there is no particular reason why int and long
shouldn't both be 32-bit,

There is disagreement on that even within the x86-64 world: Microsoft
chose IL32LLP64, presumably to make porting from Win32 easier, but the
POSIX world standardized on I32LP64.

While obviously not x86-64, it's notable that most implementations for
Alpha were ILP64, rather than I32LP64. IIRC, Windows NT was ILP32!

S
 
James Harris

Jorgen Grahn said:
To be fair, "machine" is often an euphemism for "machine, plus the
tradeoffs made by the ABI and/or compiler vendor".

(But yes, it's useful to point out that there's a distinction.)

Agreed.

People have pointed out the differences that can be found on x86-64. I
appreciate the info, and that was a case I hadn't thought of, but I was
principally wondering about CPUs still in use today where the size of their
addresses cannot be made to match the size of any of their integer types,
regardless of the implementation.

The only one I can think of is old real-mode x86 using far pointers where an
address is 20 bits but the integers can be only 16-bit or 32-bit.

I suppose the same mismatch might occur where a machine has separate address
and data registers and they have different sizes but would guess they are
not common.

Some machines used words whose sizes were not a power of 2, but I don't know
how they manipulated addresses. Presumably their addresses were often smaller
than their word size, and few or none of those are still in use.

James
 
Noob

Stephen said:
That's the minimum, and x86-64 has no hardware support for a wider
integer type, so that's the only logical choice.

Errr...

x86-64 does have limited support for 128-bit GP integers, in the form
of add-with-carry, widening multiply, and shift right/left double.
(The same way x86 has limited support for 64-bit GP integers.)

Therefore, it would not be unreasonable for an implementation to pick

CHAR_BIT = 8, sizeof(int) = 4, sizeof(long) = 8, sizeof(long long) = 16

and define uint32_t, uint64_t, uint128_t accordingly.

Regards.
 
James Kuyper

On 10/01/2013 07:15 AM, James Harris wrote:
....
People have pointed out the differences that could be found on x86-64. I
appreciate the info and it was one I hadn't thought of but I was principally
wondering about CPUs which are still in use today where the size of their
addresses cannot be made to match the size of any of their integer types
despite the implementation.

If that's what your actual question was about, you asked it very poorly.
It seems to me that uintmax_t would be more relevant to your question
than either unsigned int or unsigned long. On the machines you describe,
size_t would probably be the same as uintmax_t, which might or might not
be bigger than unsigned long, so asking about "not equal" also seems
irrelevant. intptr_t is more relevant to the question you describe.
intptr_t is optional, and on the machines you describe, could not be
supported. So it would be more relevant to ask about "machines where
intptr_t cannot be supported".
The only one I can think of is old real-mode x86 using far pointers where an
address is 20 bits but the integers can be only 16-bit or 32-bit.

I don't understand how that's an example of what you say you're looking
for. It might have required only 20 bits to uniquely specify a byte of
addressable memory, but they were usually accessed as a 16-bit segment
and a 16-bit offset, and could be stored in 32 bits, the same as
unsigned long. With 8-bit bytes, they couldn't have been stored in 20
bits. They could have been stored in 24-bit pointers, but I don't think
that would have worked very well, and I'm not aware of any
implementation that did so (though that could just be ignorance on my part).

A system such as you describe would have to have addresses too big to
fit in uintmax_t. Support for a 64-bit integer type is mandatory, even
if only by software emulation. Therefore, addresses would have to be
larger than that, and the implementor would have to have some good
reason for not implementing an integer type of the same size.
 
Keith Thompson

Noob said:
Errr...

x86-64 does have limited support for 128-bit GP integers, in the form
of add-with-carry, widening multiply, and shift right/left double.
(The same way x86 has limited support for 64-bit GP integers.)

Therefore, it would not be unreasonable for an implementation to pick

CHAR_BIT = 8, sizeof(int) = 4, sizeof(long) = 8, sizeof(long long) = 16

and define uint32_t, uint64_t, uint128_t accordingly.

Support for division is still mandatory. Of course it could be done in
software.

Another reasonable choice would be 64-bit [unsigned] long long and
128-bit intmax_t/int128_t, which would require the use of extended
integer types.
 
glen herrmannsfeldt

Richard Damon said:
On 10/1/13 7:15 AM, James Harris wrote:
(snip)
(snip)

Minor nit. In real-mode x86, while addresses only had 20 bits of
results, far pointers were 32 bits in length (4 bytes).

In real mode on the 8086 and 8088, addresses were 20 bits.
Addresses had many redundant ways of being represented,
as the effective address was (segment << 4) + offset (with
both segment and offset being 16 bits).

On later processors than the 8086 and 8088, the result of the
addition was 21 bits. Because some programs depend on the result
being 20 bits, extra hardware was added to zero A20 in real mode,
but that could be turned off. Turning it off allowed real mode
programs an extra 64K (almost) of memory.

For protected mode 80286, you had a 16 bit segment selector
and 16 bit offset. The selector selected an entry into
a segment descriptor table giving a 24 bit origin and 16 bit
length for each addressable segment.

-- glen
 
Thomas Jahns

On the other hand, there is no particular reason why int and long
shouldn't both be 32 bit, in which case size_t would be unsigned
long long, and have a different size from both unsigned int and
unsigned long.

Actually there is: legacy code from even before C89 is prone to assume a
long can hold a pointer value. Definitely bad practice, but it happened
to work almost universally back then.

Regards, Thomas
 
James Kuyper

Actually there is: legacy code from even before C89 is prone to assume a
long can hold a pointer value. Definitely bad practice, but it happened
to work almost universally back then.

True, but that's not a particularly compelling reason. A policy of
accommodating legacy code that has built-in assumptions about things
left unspecified by the standard would prevent you from ever creating an
implementation significantly different from the ones where those
assumptions were valid. You should not expect to be able to port legacy
code containing such assumptions to new systems; either it must be
forever restricted to the steadily decreasing number of systems matching
all of its assumptions, or you must sooner or later bite the bullet and
remove at least some of those assumptions. You shouldn't use them as an
argument to justify restricting new implementations.
 
James Kuyper

Although it would be a foolish standards body that did something that
broke a non-standard but widely used assumption for no good reason.
Breaking a lot of existing code is generally a bad idea. A lot of the
cruft in the C standard is just that sort of accommodation.

The comment I was responding to was not about a decision to be made by a
standards body, but by an implementation. The assumption Thomas Jahns
mentioned is, in C99 terms, that UINTPTR_MAX <= ULONG_MAX. He mentioned
it in the context of legacy code that pre-dates C89, and therefore C99,
so uintptr_t didn't even exist yet. However, the concept behind
uintptr_t dates back to before C89. The standard allows that assumption
to be true, and it allows it to be false (either because a type larger
than unsigned long is needed, or because no supported integer type is
big enough to meet the requirements for uintptr_t).

It's individual implementors who decide whether or not that should be
true for their implementation. That decision should be made on the basis
of what's good for their intended customers, and sometimes it's better
to break legacy code than to make the accommodations needed to avoid
breaking it. As long as someone needs the legacy code to be compilable,
someone will maintain a compiler that has a mode that will allow it to
be compiled, but that doesn't mean that all compilers need to be able to
do so, nor even that it be the default mode for that compiler.
 
Malcolm McLean

It's individual implementors who decide whether or not that should be
true for their implementation. That decision should be made on the basis
of what's good for their intended customers, and sometimes it's better
to break legacy code than to make the accommodations needed to avoid
breaking it. As long as someone needs the legacy code to be compilable,
someone will maintain a compiler that has a mode that will allow it to
be compiled, but that doesn't mean that all compilers need to be able to
do so, nor even that it be the default mode for that compiler.
A real example of this happening is the MS Windows interface.

Windows are defined by opaque handles, which can be PrivateWindow *s
underneath, but originally they were longs, I suspect an index into a
window table. To have any sort of encapsulation, you need to be able to hang
a pointer off a window. But Microsoft didn't provide a "set user pointer"
function. Instead they provided a "get/set window long", with a USER_DATA
field nicely defined.

So if a void * fitted into a long, you could hang a pointer off a window. It was
wrong, but the only alternative was to specify some sort of memory handle
scheme. Then you wouldn't have encapsulation, because your window widget would
depend on an external malloc/handle wrapper. You could get round this by
having separate malloc wrappers for each class, but then it gets even more
messy, and all to avoid a cast from a long to a void *.

So lots of widgets were built with this scheme. Now you want the code to mix
with new code. There's limited use in having a widget that can't be taken and
dropped into a new program. So just having one mode which defines long as
the same size as void * doesn't help. Of course Microsoft put in a layer of
typedefs, so the function actually takes a LONG. Then they provided a
SetWindowLongPtr() function, which, it turns out, also needs a long. But these
strategies haven't actually worked. They rarely do. Changing a typedef has
too many effects to be a smooth process.

There's no easy answer. The changes needed in most code are pretty trivial:
you've just got to replace the call to get/set the user long with a call to
the latest memory hook. But it still means editing and maintaining two
versions of files.
 
