Sizes of pointers

Discussion in 'C Programming' started by James Harris (es), Jul 30, 2013.

  1. Am I right that there is no guarantee in C that pointers to different types
    can be stored in one another except that a pointer of any type can be stored
    in a void * and later recovered without loss of info?

    What is the rationale for distinguishing the convertibility of pointers
    based on the type of the object they point to? Are there machines on which a
    pointer to a char would have different *size* from a pointer to a float,
    say, or that a pointer representation might be changed when converted? I can
    imagine that the two addresses may need different alignments (perhaps floats
    needing 4-byte alignment, chars needing only 1-byte alignment). This would
    mean that the rules for the *lower* bits of a pointer could be different.
    This would then be nothing to do with pointer size. The rule about void
    pointers would simply mean that void pointers had no alignment restrictions.
    Is that the reason why void pointers can be used in a special way?

    On some computers it makes sense to distinguish a pointer's size based not
    on the type of the object pointed at but on *where* that object could be
    stored. On old hardware such as 16-bit x86 there were near or far pointers.
    On modern multi-machine clusters it might make sense to allow pointers to
    local memory to be short while pointers to remote memory are longer. On old
    and new hardware, then, it's not the type of the object pointed at but the
    location of the object pointed at which would determine the requirement for
    pointer size.

    Some compilers allow the user to specify that all pointers will be short or
    all pointers will be long - e.g. the awful old memory models. Wouldn't it be
    better for a compiler to choose pointer sizes depending on where the object
    to be referred to might be placed? Basically, all pointers would default to
    long but where the compiler could prove that a given pointer can be used for
    only local references that pointer could be made short.

    James Harris (es), Jul 30, 2013

  2. James Harris (es)

    Kleuske Guest

    I haven't checked the rationale, but I know of at least one platform
    (68HC11) on which a pointer to a function has a different size than a
    pointer to char.

    This is due to the Harvard architecture used, which has separate
    memories (i.e. separate data and address buses) for data and instructions.

    Kleuske, Jul 30, 2013

  3. I used one word-addressed machine where conversion from int * to char *
    involved a 1-bit left shift -- the bottom bit of a char * being used to
    specify which char of the two-byte word was being pointed to. On the
    same machine, function pointers were twice the size of object pointers
    and there was no reasonable conversion between them and pointers to
    object types.

    This was pre ANSI C, and I am sure such machines were well-known to the
    C committee in the late 80s.

    Ben Bacarisse, Jul 30, 2013
  4. James Harris (es)

    James Kuyper Guest

    Yes, though double provides a better example. On systems where
    _Alignof(double) == 8, there are 8 times as many different char
    positions that could be pointed at as there are double positions that
    can be pointed at. That means that a char* needs 3 more bits than a
    double* to identify those locations. Those bits can, depending upon how
    much memory the system can have installed, allow a double* to be stored
    in fewer bytes than a char*.

    There have been real machines where this was an issue. On a typical
    system where that is true, hardware addresses refer to words, with
    multiple bytes per word. Pointers to types whose alignment was a
    multiple of the word size would have small pointers that just contained
    the address of the first word of the object. Pointers to void or to
    types with alignment requirements that are smaller than the word size,
    such as char, have larger pointers, which contain both the address of
    the word, and the byte offset within the word, of the start of the object.

    That's also permitted, but as far as I know, is far less common.

    No, you've overlooked a key possibility: rather than using different
    rules for those lower bits, don't even bother to store them for
    word-aligned types.

    Such a distinction has never been part of standard C.
    James Kuyper, Jul 30, 2013
  5. James Harris (es)

    Noob Guest

    Also IA-64.

    (64 bits for object pointers, 128 bits for function pointers AFAIR)
    Noob, Jul 30, 2013
  6. James Harris (es)

    Eric Sosman Guest

    The guarantees are a little more extensive:

    - Any data pointer can be converted to void* and back

    - Any struct pointer can be converted to any other kind
    of struct pointer and back

    - Any union pointer can be converted to any other kind
    of union pointer and back

    - void*, char*, unsigned char*, and signed char* have the
    same representation

    Exactly, the canonical example being word-addressed machines.
    An int* on such a machine might well hold a word address, while a
    char* might hold a word address along with extra information to
    designate a particular char within the word.

    You're on shaky ground applying terms like "lower" to the bits
    of a pointer. On "flat address space" machines it's easy to confuse
    pointers with numbers, but C does not require any such correspondence.
    The encoding of a pointer's value into its bits is unspecified, and
    if you speak of "the eights' bit" or "the 1024s' bit" you are reading
    more into the encoding than C guarantees. Pointers are "opaque."

    There are two alignments to consider: There's the alignment of
    the pointer variable itself, and the alignment of the thing it points
    at. Since void* can point at char, the smallest and least-aligned
    of all addressable types, the representation of a void* value must
    be able to accommodate every possible alignment; in that sense void*
    has "no alignment restrictions." However, the system might insist
    (for example) that a void* variable be located on a four-byte boundary;
    in that sense void* would have a four-byte alignment requirement.

    That wouldn't work in C's scheme of things. Every int* must
    be the same size as every other int*, have the same encoding, and
    be capable of pointing at all the same places. There's no provision
    for near-flavored and far-flavored and strawberry-flavored int*'s;
    they're all made out of ticky-tacky and they all look just the same.

    Put an int* in a struct along with a few other fields, put
    the declaration in a header file, and compile various C sources
    with that header. In module A the compiler sees that only local
    references are stored, so it allocates a short pointer and makes
    the struct (say) 12 bytes long. In module B the compiler is unable
    to rule out remote pointers, allocates a long pointer, and makes
    the struct 16 bytes long. Both module A and module B call a
    function in module C, passing a pointer to the struct. How does
    module C know which version of the struct it's looking at?

    Variable-length types are possible in some languages -- think
    of integers that grow wider instead of overflowing -- but not in C.
    Eric Sosman, Jul 30, 2013
  7. Good luck proving that. At best, that would be an optimization that
    falls under the as-if rule, so it could never be observed.
    IIRC, object code using different memory models cannot be linked
    together for exactly this reason.

    Stephen Sprunk, Jul 30, 2013
  8. James Harris (es)

    Tim Rentsch Guest

    More generally, any data pointer can be converted to type
    (T *) and back if the alignment of T evenly divides the
    alignment of the original pointer. As a special case,
    if the alignment of T is one, any data pointer may be
    converted to (T*) and back.
    As a practical matter these conversions are likely to work
    on most implementations, but the Standard doesn't guarantee
    that they will.
    There are several other equivalences of representation and alignment

    pointers to compatible types have the same R&A

    pointers to qualified versions of a type have the same R&A

    pointers to structs have the same R&A

    pointers to unions have the same R&A

    Did I miss any?
    Tim Rentsch, Jul 30, 2013
  9. I've used IA-64 systems, though not recently. They had 64-bit function
    pointers, at least for the compiler (probably gcc) I was using.
    Keith Thompson, Jul 30, 2013
  10. Almost. Any pointer to an object (or incomplete) type can be converted
    to void* and back again without loss of information. And any function
    pointer can be converted to another function pointer type and back
    without loss of information.
    Here's a concrete example that's *almost* relevant.

    The Cray T90 was a vector system, optimized for fast floating-point
    operations. It ran Unicos, a Unix operating system, so it needed to
    support things like 8-bit character data; setting CHAR_BIT==64 would
    have been natural, but it wasn't really an option.

    A memory address was 64 bits, and referred to a 64-bit word in memory.
    A byte pointer (char*, void*) consisted of a word pointer with a 3-bit
    offset stored in the high-order 3 bits. This was possible because the
    actual addressing space was much smaller than 64 bits, so the high-order
    bits of a word pointer were otherwise always 0.

    So all pointers were the same size, but pointer arithmetic on byte
    pointers could be complicated and slow. (This was all done in

    If there hadn't been room in the high-order bits, the extra offset
    needed for byte pointers could easily have been stored in a second word,
    making for 64-bit word pointers and 128-bit byte pointers. I wouldn't
    be surprised if some systems had 32-bit word pointers and 64-bit byte
    pointers, though I don't know of any examples.
    There would be only limited opportunities for such an optimization. Any
    pointer value stored in a pointer object would pretty much have to use
    the long representation, unless you have a language extension that lets
    you restrict a pointer to be for local use only (such as the old "near"
    and "far" keywords that we're all glad to be rid of).
    Keith Thompson, Jul 30, 2013
  11. IIRC, the initial implementation used 128-bit pointers, but doing so
    broke a lot of code that assumed it could stuff a function pointer into
    a void pointer. So, they added a layer of indirection: fake 64-bit
    function pointers that point to the real 128-bit function pointers.

    And that's the story of Itanic in a nutshell.

    Stephen Sprunk, Jul 30, 2013
  12. Right, I think POSIX requires function pointers to fit in a void*;
    otherwise dlsym() would break. (A more flexible design would have split
    dlsym() into two functions, one for objects and one for functions.)
    Keith Thompson, Jul 31, 2013
  13. ....
    As you say, "near" and "far" keywords are not C. They might be called
    extensions but that could be taken to imply enhancement. Maybe corruptions
    would be a better term!

    The opportunities to optimise such pointers down to the faster ones would be
    limited but compilers carry out work like this as a matter of course.

    James Harris (es), Jul 31, 2013
  14. James Harris (es)

    Noob Guest

    Doh! You're right. I /always/ remember it wrong. (And the sad
    part is that you've already pointed this out to me in 2006.)

    OK, so the /correct/ explanation is given in
    "Itanium Software Conventions and Runtime Architecture Guide"
    So, sizeof(void *) == sizeof(void(*)()) IIUC.

    Relevant discussion:

    Noob, Jul 31, 2013
  15. James Harris (es)

    James Kuyper Guest

    On 07/31/2013 05:43 AM, James Harris (es) wrote:
    If "extension" implies enforcement to you, then there's no such thing as
    an extension to standard C, since there's no enforcement of any of the
    standard's provisions.
    James Kuyper, Jul 31, 2013
  16. James Harris (es)

    Rosario1903 Guest

    I don't see the reason for distinguishing pointers from unsigned integers
    [and all the concern they generate for the portability of C programs]
    rather than treating a pointer as simply a fixed-size unsigned value,
    for example something like:
    p32u32 *p;

    where p is a pointer that can hold one 32-bit [unsigned] address
    and that points to an array of 32-bit unsigned values.

    That would resolve all the problems I see with undefined behavior
    and with the portability of programs from the pointer "point of view".

    If people need 64-bit pointers to u32:
    p64u32 *p;
    Rosario1903, Jul 31, 2013
  17. You didn't have to use them.
    If you wanted to compile a standard C program to operate on a big data set,
    you just compiled as the "huge" model. It would run slowly, for C, but it
    would run. Often that was the best solution.
    But it was tempting to keep your current objects in near memory and farm
    out the little-used ones to far memory. So you compiled with 16 bit pointers
    as default and fiddled with near and far pointers.
    Malcolm McLean, Aug 1, 2013
  18. James Harris (es)

    Phil Carmody Guest

    That's not what it says. We're given from that
    sizeof(void(*)()) == sizeof(struct descriptor *)
    and, not that it matters, that
    sizeof(struct descriptor) >= sizeof(global_data_pointer) + sizeof(function_address)

    The former line on its own strongly implies that
    sizeof(void(*)()) == sizeof(void*)

    Phil Carmody, Aug 1, 2013
  19. Can you give an example how the DS9000 could make a conversion between
    pointers to structs fail, given that the pointers involved are required
    to have the same representation and alignment (and that the intent of
    that requirement is to allow for interchangeability)?

    Bart v Ingen Schenau
    Bart van Ingen Schenau, Aug 1, 2013
  20. James Harris (es)

    James Kuyper Guest

    Well, the designers of the DS9000 are notorious for ignoring the intent
    of the standard; some have even claimed that they go out of their way to
    violate the intent.

    The way in which the conversions described above fail is considered
    evidence in favor of that claim. They fail for no better reason than the
    fact that the result of each such conversion is incremented by 1 from
    the value you might normally have expected to see. There's no plausible
    reason why the DS9000 should do anything of the sort, but there's
    nothing in the standard to prohibit it.
    James Kuyper, Aug 1, 2013