Machines where size of size_t is not equal to size of unsigned int/long

Discussion in 'C Programming' started by James Harris, Sep 30, 2013.

  1. James Harris

    James Harris Guest

    AIUI for many CPUs and CPU modes a size_t could be typedef'd to unsigned int
    or unsigned long. I wondered where that would not be the case. Anyone know
    which CPUs or modes would have a size_t which was not the same size as
    unsigned int or unsigned long?

    James
     
    James Harris, Sep 30, 2013
    #1

  2. James Harris

    tim prince Guest

    Most of our work nowadays is on AMD64/Intel64 Linux, or
    corresponding Windows X64, where size_t is a 64-bit data type, but int
    is a 32-bit type. On Windows, long int also is a 32-bit type.
    I don't know how those software vendors who vowed long ago not to
    support platforms where size_t differs from unsigned int can survive.
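
The size relationships described above can be checked on any given implementation. The following is a minimal sketch, not anything from the original post; the function names are made up for illustration, and it assumes a C99-or-later library for the `%zu` format.

```c
#include <stddef.h>
#include <stdio.h>

/* Nonzero when size_t and unsigned int occupy the same number of bytes. */
int size_t_matches_uint(void)
{
    return sizeof(size_t) == sizeof(unsigned int);
}

/* Nonzero when size_t and unsigned long occupy the same number of bytes. */
int size_t_matches_ulong(void)
{
    return sizeof(size_t) == sizeof(unsigned long);
}

/* Prints the three sizes in bytes (illustrative helper, hypothetical name). */
void print_size_report(void)
{
    printf("unsigned int : %zu\n", sizeof(unsigned int));
    printf("unsigned long: %zu\n", sizeof(unsigned long));
    printf("size_t       : %zu\n", sizeof(size_t));
}
```

Going by the description above, one would expect `size_t_matches_uint()` to be false on both x86-64 Linux and 64-bit Windows, and `size_t_matches_ulong()` to be false on 64-bit Windows only.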
     
    tim prince, Sep 30, 2013
    #2

  3. James Harris

    Jorgen Grahn Guest

    To be fair, "machine" is often a euphemism for "machine, plus the
    tradeoffs made by the ABI and/or compiler vendor".

    (But yes, it's useful to point out that there's a distinction.)

    /Jorgen
     
    Jorgen Grahn, Sep 30, 2013
    #3
  4. That's the minimum, and x86-64 has no hardware support for a wider
    integer type, so that's the only logical choice.
    There is disagreement on that even within the x86-64 world: Microsoft
    chose IL32LLP64, presumably to make porting from Win32 easier, but the
    POSIX world standardized on I32LP64.

    While obviously not x86-64, it's notable that most implementations for
    Alpha were ILP64, rather than I32LP64. IIRC, Windows NT was ILP32!
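
The data-model names used in this post encode which of int, long, long long and pointers share a width. A rough sketch of that naming scheme follows, assuming 8-bit bytes; the function names are hypothetical and the labels simply follow the spelling used in this thread.

```c
#include <stddef.h>

/* Maps byte sizes of int, long and pointers to a data-model label.
 * Assumes CHAR_BIT == 8; anything unrecognised is reported as "other". */
const char *data_model_name(size_t int_bytes, size_t long_bytes,
                            size_t ptr_bytes)
{
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 4)
        return "ILP32";
    if (int_bytes == 4 && long_bytes == 8 && ptr_bytes == 8)
        return "I32LP64";
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 8)
        return "IL32LLP64";
    if (int_bytes == 8 && long_bytes == 8 && ptr_bytes == 8)
        return "ILP64";
    return "other";
}

/* Label for the implementation this code is compiled with. */
const char *this_data_model(void)
{
    return data_model_name(sizeof(int), sizeof(long), sizeof(void *));
}
```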

    S
     
    Stephen Sprunk, Oct 1, 2013
    #4
  5. James Harris

    James Harris Guest

    Agreed.

    People have pointed out the differences that could be found on x86-64. I
    appreciate the info and it was one I hadn't thought of but I was principally
    wondering about CPUs which are still in use today where the size of their
    addresses cannot be made to match the size of any of their integer types
    despite the implementation.

    The only one I can think of is old real-mode x86 using far pointers where an
    address is 20 bits but the integers can be only 16-bit or 32-bit.

    I suppose the same mismatch might occur where a machine has separate address
    and data registers and they have different sizes but would guess they are
    not common.

    Some machines used words which were not a power of 2 but I don't know how
    they manipulated addresses. Presumably their addresses were often smaller
    than their word size and few or none of those are still in use.

    James
     
    James Harris, Oct 1, 2013
    #5
  6. James Harris

    Noob Guest

    Errr...

    x86-64 does have limited support for 128-bit GP integers, in the form
    of add-with-carry, widening multiply, and shift right/left double.
    (The same way x86 has limited support for 64-bit GP integers.)

    Therefore, it would not be unreasonable for an implementation to pick

    CHAR_BIT = 8, sizeof(int) = 4, sizeof(long) = 8, sizeof(long long) = 16

    and define uint32_t, uint64_t, uint128_t accordingly.
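
To illustrate how add-with-carry lets a 128-bit addition be built from 64-bit pieces, here is a portable sketch. The type and function names are hypothetical, and the carry is computed in plain C rather than with the hardware carry flag, so a compiler may or may not turn it into an actual add-with-carry instruction.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned value held as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_sketch;

/* Adds two 128-bit values modulo 2^128. */
u128_sketch u128_add(u128_sketch a, u128_sketch b)
{
    u128_sketch r;
    uint64_t carry;

    r.lo = a.lo + b.lo;          /* unsigned addition wraps modulo 2^64 */
    carry = (r.lo < a.lo);       /* a wrap occurred iff the sum got smaller */
    r.hi = a.hi + b.hi + carry;  /* propagate the carry into the high half */
    return r;
}
```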

    Regards.
     
    Noob, Oct 1, 2013
    #6
  7. James Harris

    James Kuyper Guest

    On 10/01/2013 07:15 AM, James Harris wrote:
    ....
    If that's what your actual question was about, you asked it very poorly.
    It seems to me that uintmax_t would be more relevant to your question
    than either unsigned int or unsigned long. On the machines you describe,
    size_t would probably be the same as uintmax_t, which might or might not
    be bigger than unsigned long, so asking about "not equal" also seems
    irrelevant. intptr_t is more relevant to the question you describe.
    intptr_t is optional, and on the machines you describe, could not be
    supported. So it would be more relevant to ask about "machines where
    intptr_t cannot be supported".
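
Because the type is optional, portable code can test for it before relying on it. The sketch below assumes the usual convention that `UINTPTR_MAX` is defined in `<stdint.h>` exactly when `uintptr_t` is provided; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if a pointer converted to uintptr_t and back compares equal
 * to the original, 0 if it does not, and -1 if the implementation does
 * not provide uintptr_t at all. */
int pointer_survives_roundtrip(void *p)
{
#ifdef UINTPTR_MAX
    uintptr_t bits = (uintptr_t)p;
    void *back = (void *)bits;
    return back == p;
#else
    (void)p;
    return -1;
#endif
}
```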
    I don't understand how that's an example of what you say you're looking
    for. It might have required only 20 bits to uniquely specify a byte of
    addressable memory, but they were usually accessed as a 16-bit segment
    and a 16-bit offset, and could be stored in 32 bits, the same as
    unsigned long. With 8-bit bytes, they couldn't have been stored in 20
    bits. They could have been stored in 24-bit pointers, but I don't think
    that would have worked very well, and I'm not aware of any
    implementation that did so (though that could just be ignorance on my part).

    A system such as you describe would have to have addresses too big to
    fit in uintmax_t. Support for a 64-bit integer type is mandatory, even
    if only by software emulation. Therefore, addresses would have to be
    larger than that, and the implementor would have to have some good
    reason for not implementing an integer type of the same size.
     
    James Kuyper, Oct 1, 2013
    #7
  8. Support for division is still mandatory. Of course it could be done in
    software.

    Another reasonable choice would be 64-bit [unsigned] long long and
    128-bit intmax_t/int128_t, which would require the use of extended
    integer types.
     
    Keith Thompson, Oct 1, 2013
    #8
  9. In real mode 8086 and 8088 addresses were 20 bits.
    On later processors than the 8086 and 8088, the result of the
    addition was 21 bits. Because some programs depend on the result
    being 20 bits, extra hardware was added to zero A20 in real mode,
    but that could be turned off. Turning it off allowed real mode
    programs an extra 64K (almost) of memory.
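
The arithmetic behind this can be sketched as follows, assuming the usual real-mode rule that the linear address is the segment shifted left by four bits plus the offset. The function names are hypothetical, and the masking only models the effect of forcing A20 to zero.

```c
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset.
 * The sum can need 21 bits (maximum 0x10FFEF). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Same address with bit 20 (A20) forced to zero, so the result
 * wraps around to low memory as on the 8086 and 8088. */
uint32_t real_mode_linear_a20_masked(uint16_t segment, uint16_t offset)
{
    return real_mode_linear(segment, offset) & 0xFFFFFu;
}
```

With segment 0xFFFF, offsets 0x0010 through 0xFFFF reach linear addresses 0x100000 through 0x10FFEF when A20 is not masked, which is the "extra 64K (almost)": 65520 bytes.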

    For protected mode 80286, you had a 16 bit segment selector
    and 16 bit offset. The selector selected an entry into
    a segment descriptor table giving a 24 bit origin and 16 bit
    length for each addressable segment.

    -- glen
     
    glen herrmannsfeldt, Oct 1, 2013
    #9
  10. James Harris

    Thomas Jahns Guest

    Actually there is: legacy code from before even C89 is prone to assume a
    long can hold a pointer value. Definitely bad practice but happened to
    work almost universally back then.

    Regards, Thomas
     
    Thomas Jahns, Oct 8, 2013
    #10
  11. James Harris

    James Kuyper Guest

    True, but that's not a particularly compelling reason. A policy of
    accommodating legacy code that has built-in assumptions about things
    left unspecified by the standard would prevent you from ever creating an
    implementation significantly different from the ones where those
    assumptions were valid. You should not expect to be able to port legacy
    code containing such assumptions to new systems; either it must be
    forever restricted to the steadily decreasing number of systems matching
    all of its assumptions, or you must sooner or later bite the bullet and
    remove at least some of those assumptions. You shouldn't use them as an
    argument to justify restricting new implementations.
     
    James Kuyper, Oct 8, 2013
    #11
  12. James Harris

    James Kuyper Guest

    The comment I was responding to was not about a decision to be made by a
    standards body, but by an implementation. The assumption Thomas Jahns
    mentioned is, in C99 terms, that UINTPTR_MAX <= ULONG_MAX. He mentioned
    it in the context of legacy code that pre-dates C89, and therefore C99,
    so uintptr_t didn't even exist yet. However, the concept behind
    uintptr_t dates back to before C89. The standard allows that assumption
    to be true, and it allows it to be false (either because a type larger
    than unsigned long is needed, or because no supported integer type is
    big enough to meet the requirements for uintptr_t).
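
Code that still relies on this assumption can at least make it explicit. A minimal sketch, with a hypothetical function name, that evaluates the comparison at preprocessing time:

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 when uintptr_t exists and its maximum fits in unsigned long,
 * i.e. when UINTPTR_MAX <= ULONG_MAX holds. Returns 0 otherwise, which
 * covers both a wider uintptr_t and an implementation without one. */
int long_can_hold_pointer(void)
{
#if defined(UINTPTR_MAX) && (UINTPTR_MAX <= ULONG_MAX)
    return 1;
#else
    return 0;
#endif
}
```

The same `#if` condition could instead guard an `#error` directive, so that legacy code fails to build on an implementation where the assumption is false rather than misbehaving at run time.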

    It's individual implementors who decide whether or not that should be
    true for their implementation. That decision should be made on the basis
    of what's good for their intended customers, and sometimes it's better
    to break legacy code than to make the accommodations needed to avoid
    breaking it. As long as someone needs the legacy code to be compilable,
    someone will maintain a compiler that has a mode that will allow it to
    be compiled, but that doesn't mean that all compilers need to be able to
    do so, nor even that it be the default mode for that compiler.
     
    James Kuyper, Oct 9, 2013
    #12
  13. A real example of this happening is the MS Windows interface.

    Windows are defined by opaque handles, which can be PrivateWindow *s
    underneath, but originally they were longs, I suspect an index into a
    window table. To have any sort of encapsulation, you need to be able to hang
    a pointer off a window. But Microsoft didn't provide a "Set user pointer"
    function. Instead they provided a "get/set Window long", with a USER_DATA
    field nicely defined.

    So if a void *fitted into a long, you could hang a pointer off a window. It was
    wrong, but the only alternative was to specify some sort of memory handle
    scheme. Then you wouldn't have encapsulation, because your window widget would
    depend on an external malloc/handle wrapper. You could get round this by
    having separate malloc wrappers for each class, but then it gets even more
    messy, and all to avoid a cast from a long to a void *.

    So lots of widgets were built with this scheme. Now you want the code to mix
    with new code. There's limited use in having a widget that can't be taken and
    dropped into a new program. So just having one mode which defines long as
    the same size as void * doesn't help. Of course Microsoft put in a layer of
    typedefs, so the function actually takes a LONG. Then they provided a
    SetWindowLongPtr() function, which, it turns out, also needs a long. But these
    strategies haven't actually worked. They rarely do. Changing a typedef has
    too many effects to be a smooth process.
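
Why a pointer stops fitting once long stays at 32 bits while pointers grow to 64 can be shown with a small simulation. This is a portable sketch and deliberately not the Win32 API: the 32-bit slot and the function name are hypothetical stand-ins for a per-window user-data field.

```c
#include <stdint.h>

/* Models storing a 64-bit address in a 32-bit slot and reading it back.
 * Returns 1 if the address survives unchanged, 0 if the upper bits
 * were lost. */
int address_survives_32bit_slot(uint64_t address)
{
    uint32_t slot = (uint32_t)address;  /* keeps only the low 32 bits */
    uint64_t back = slot;               /* widening restores no lost bits */
    return back == address;
}
```

Any address below 4 GiB survives; anything above it comes back as a different address, which is why old widget code that casts between a long and a void * breaks in this situation.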

    There's no easy answer. The changes needed to most code are pretty trivial,
    you've just got to replace the call to get/set the user long with a call to
    the latest memory hook. But it still means editing and maintaining two versions
    of files.
     
    Malcolm McLean, Oct 9, 2013
    #13
