size of void * is not always equal to size of int *

Discussion in 'C Programming' started by anish kumar, May 2, 2014.

  1. anish kumar

    anish kumar Guest

    Is the size of void * not always equal to the size of int *?
    If not, what is the reasoning?
     
    anish kumar, May 2, 2014
    #1

  2. anish kumar

    James Kuyper Guest

    There's no requirement that they be equal, and that is, for me,
    sufficient reason not to write code that makes the assumption that they
    are equal.

    However, it's also the case that there are some real world systems where
    there's a good reason for them being different. Every real world case
    that I'm familiar with involves hardware where the size of an
    addressable storage unit is too large to be convenient for use as a C
    byte. CHAR_BITS is choose so that N*CHAR_BITS is the number of bits in a
    addressable storage unit. Then if _Alignof(T*) >= N, T* need only
    contain the machine address of the storage unit containing the start of
    the object. However, if _Alignof(T*) < N, then a T* must contain not
    only the machine address, but also the byte offset within the storage
    unit identified by that address. The extra space required to store that
    extra information make result in int(void*) (which must have the same
    representation as 'char*', and must therefore be able to point at
    individual bytes within a storage unit) being larger than sizeof(int*).
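
    A minimal sketch of code that stays clear of the assumption. The function
    names are made up for illustration. It leans only on two guarantees from
    the C standard: void * and char * share a representation, and an object
    pointer converted to void * and back compares equal to the original.

```c
#include <assert.h>

/* void * and char * must have the same representation, hence the same size. */
_Static_assert(sizeof(void *) == sizeof(char *),
               "void * and char * share a representation");

/* Hypothetical helper: converts through void * and back.
   Nothing here assumes sizeof(int *) == sizeof(void *). */
int *round_trip_int_ptr(int *p)
{
    void *vp = p;   /* int * -> void *: the representation may change here */
    return vp;      /* void * -> int *: compares equal to the original p */
}

/* Reports whether this implementation happens to use different sizes. */
int pointer_sizes_differ(void)
{
    return sizeof(void *) != sizeof(int *);
}
```

    On most current desktop implementations pointer_sizes_differ() returns 0,
    but nothing in the code above depends on that.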
     
    James Kuyper, May 2, 2014
    #2

  3. ... CHAR_BIT is chosen so that N*CHAR_BIT ...
    ... _Alignof(T) ...
    ... _Alignof(T) ...
    ... may result in sizeof(void*) ...
    (I hate to be picky, but ... ok, actually I don't hate to be picky.)

    It's also worth noting that it's sometimes possible to store those extra
    offset bits within the pointer itself, if there are some extra bits
    available for the purpose.
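
    A rough sketch of that idea, simulated with plain integers. It assumes a
    hypothetical machine with 32-bit words whose word addresses never need
    more than 30 bits, so the top two bits are free to hold the byte offset.
    All names here are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_SHIFT 30u  /* assumption: the top 2 bits of an address are unused */
#define WORD_MASK    ((UINT32_C(1) << OFFSET_SHIFT) - 1u)

/* Packs a word address and a byte offset (0..3) into one 32-bit value. */
uint32_t pack_byte_ptr(uint32_t word_addr, unsigned offset)
{
    assert(word_addr <= WORD_MASK && offset < 4u);
    return ((uint32_t)offset << OFFSET_SHIFT) | word_addr;
}

/* Recovers the word address from a packed byte pointer. */
uint32_t byte_ptr_word(uint32_t bp)
{
    return bp & WORD_MASK;
}

/* Recovers the byte offset within the word. */
unsigned byte_ptr_offset(uint32_t bp)
{
    return (unsigned)(bp >> OFFSET_SHIFT);
}
```

    With this layout the simulated char * is no wider than the simulated
    int *, at the cost of a mask or shift on each use.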
     
    Keith Thompson, May 2, 2014
    #3
  4. anish kumar

    James Kuyper Guest

    On 05/02/2014 03:37 PM, Keith Thompson wrote:
    .....
    I don't mind being corrected - but I do wish I hadn't needed correction. :-(
    That's the key reason why I said only that the extra information *might*
    require a larger size.
     
    James Kuyper, May 2, 2014
    #4
  5. To fake up 8-bit bytes on a system which only allows 32-bit addressing.
    The char * needs an extra two bits to represent the position within
    the 32 bit word. void *s need to be convertible to and from char *s,
    so of course they also need the extra two bits.
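
    As an illustration only, here is a small simulation of such a machine in
    ordinary C. The type and function names are hypothetical, and the choice
    to number bytes from the least significant end of the word is an
    arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated int *: just a word index into word-addressed memory. */
typedef struct { uint32_t word; } word_ptr;

/* Simulated char * or void *: word index plus a 2-bit byte position. */
typedef struct { uint32_t word; uint8_t byte; } byte_ptr;

/* Loads one 8-bit char. The shift and truncation stand in for the extra
   code a compiler would have to generate for each char access. */
uint8_t load_byte(const uint32_t *memory, byte_ptr p)
{
    assert(p.byte < 4u);
    return (uint8_t)(memory[p.word] >> (8u * p.byte));
}

/* Pointer increment: step the byte position, carrying into the word index. */
byte_ptr next_byte(byte_ptr p)
{
    p.byte = (uint8_t)((p.byte + 1u) % 4u);
    if (p.byte == 0u)
        p.word++;
    return p;
}
```

    Here sizeof(byte_ptr) comes out larger than sizeof(word_ptr), which is the
    same effect as a void * that is wider than an int *.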
     
    Malcolm McLean, May 3, 2014
    #5
  6. anish kumar

    Kaz Kylheku Guest

    The reasoning is that on some machines, instruction-level addresses
    are understood as indexing fairly wide words, like 16 or 32 bits.
    So consecutive addresses like 1, 2, 3 ... step through consecutive words (that
    can hold a C int, perhaps).

    The C language requires byte addressability, so you have only
    two options for this machine:

    1. Define a C byte as the "word" of the machine, so that for instance
    sizeof(short) == 1, and CHAR_BIT is defined as 16; or

    2. Simulate the addressability of smaller data units (like 8 bit chars) by
    changing the representation of char * and void * pointers, and generating
    extra code when char * is dereferenced.

    Option 2 requires extra bits in char * and void * pointers to encode
    the offset of a byte within a word and so it is possible to end up with
    char * and void * which are wider than int *.

    Either option breaks some C code. Some C code assumes that bytes are 8 bits
    wide, and breaks when CHAR_BIT exceeds 8. Some C code assumes that pointers are
    the same size, or even that their bit level representation works the
    same way.
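
    For the first kind of breakage, a short sketch of code that takes the
    byte width from CHAR_BIT in <limits.h> instead of hard-coding 8. The
    function names are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Storage bits in an object of the given size, without hard-coding 8. */
size_t bits_in(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}

/* Counts set bits in an unsigned char without assuming it is 8 bits wide. */
unsigned popcount_uchar(unsigned char c)
{
    unsigned n = 0;
    for (; c != 0; c >>= 1)
        n += c & 1u;
    return n;
}
```

    On a CHAR_BIT == 16 implementation, popcount_uchar(UCHAR_MAX) would give
    16, where a loop fixed at eight iterations would stop at 8.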
     
    Kaz Kylheku, May 3, 2014
    #6
  7. anish kumar

    Ken Brody Guest

    I worked on hardware where the address was 18 bits wide, but a "pointer"
    could require 36 bits. The data "word" was 36 bits wide, and there were
    hardware instructions to do things such as "pull the next 7 bits of data
    from this pointer, put it in this register, and increment the pointer by 7
    bits", where extra data beyond just the address was kept in the other 18
    bits. (Strings, for example, were typically stored with five 7-bit ASCII
    characters per word, with one bit wasted.)
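
    The packing described above can be imitated in portable C, as a sketch
    only. The 36-bit word is held in the low bits of a uint64_t, the function
    names are made up, and the layout (first character in the highest 7 bits,
    lowest bit unused) is an assumption rather than something stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Packs five 7-bit characters into the low 36 bits of a uint64_t.
   Assumed layout: first character in the top 7 bits, lowest bit unused. */
uint64_t pack_word36(const char chars[5])
{
    uint64_t word = 0;
    for (int i = 0; i < 5; i++)
        word |= (uint64_t)(chars[i] & 0x7F) << (29 - 7 * i);
    return word;
}

/* Extracts character number index (0..4) from a packed 36-bit word. */
char unpack_char36(uint64_t word, int index)
{
    assert(index >= 0 && index < 5);
    return (char)((word >> (29 - 7 * index)) & 0x7F);
}
```

    A pointer to one of these characters has to identify both the word and the
    position inside it, which is the extra data beyond the 18-bit address.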

    I never programmed in C on this hardware, but my guess would be that all
    pointers would be 36 bits, simply because it would require less overhead as
    compared to packing data into memory.

    However, I see no reason why such an implementation couldn't exist where the
    address bus was as wide as the data bus, thereby requiring "void*" to be
    wider than what would be necessary, for example, for an "int*".
     
    Ken Brody, May 5, 2014
    #7
  8. Well, the other choice is to add the extra bits to all pointers.

    Not so different from what we have now on 64 bit machines.
    I believe it was D. Knuth commenting about the waste of bits,
    as by far most programs don't need to address more than 4GB
    (or even 2GB with half for the OS).
    Seems strange now, but it was very common in the early days of
    high-level languages that the word size was much larger than the
    machine address size. Well, for scientific machines supporting
    floating point, 36 bit words were popular for many years.

    The Fortran five digit statement labels started on a 36 bit machine
    that did 16 bit signed (sign magnitude) integer arithmetic. Originally
    statement numbers ranged from 1 to 32767, later extended to 99999.

    The index registers on the IBM 36 bit machines, such as the 7090,
    have an address and a count field in 36 bits. As I understand it,
    that is the source of the LISP CAR and CDR operations, which stored
    values in either the Address or Decrement part of the word.
    The DEC PDP-10 uses 18 bit addresses in 36 bit words, but later
    systems extended the address space (I don't know the details
    well, though). The PDP-10 also has byte pointers that can address
    bytes of from 1 to 36 bits, and index through strings of them.
    In addition to byte pointers, the PDP-10 has instructions for
    operations on the upper or lower half of 36 bit words.

    Seems to me that a C compiler for the PDP-10 could use either 9
    or 18 bits for char.
    As well as I know it, for many core memory machines it cost about $1
    per 8 bits for memory. No-one would have imagined a use for 36 bit
    address buses. For the PDP-10, the address is per task, such that
    each program has its own address space. (The physical address might
    be bigger.)

    For IBM S/360, there is one 24 bit address space for everyone.
    (Again, thought big enough at the time, and also for S/370.)
    S/360 uses memory keys to divide into 2K protection blocks.
    (The first use for integrated circuit memory in a production
    machine was the protection keys for the 360/91, using 16 bit
    SRAM chips.)

    For a more modern and more interesting addressing case, the 64 bit
    Alpha only does 32 bit or 64 bit memory cycles, but addresses
    have two extra bits to address bytes. Bytes are inserted or
    extracted in registers using those bits. Otherwise, many
    operations ignore the two (or three) low bits.

    -- glen
     
    glen herrmannsfeldt, May 5, 2014
    #8
  9. anish kumar

    Ken Brody Guest

    FYI - the hardware in question which I used was the DEC KL-10.

    [...]
    When I was in college, the computer department was tossing out their ancient
    computer magazine collection. I, being a geek, looked through them, and
    remember an ad for memory at a breakthrough cost of "less than 50 cents a bit".

    [...]
     
    Ken Brody, May 6, 2014
    #9
  10. (snip, I wrote)
    Over the years, the cores got smaller so that they would switch
    faster, but also it was harder (and more expensive) to get the
    wires through them. As I remember, the good ones were done
    by Japanese women.

    If you wanted cheaper ones, you used the older, larger cores.

    -- glen
     
    glen herrmannsfeldt, May 6, 2014
    #10
  11. anish kumar

    crisdunbar Guest

    Cool, so, way back then, they had 32 bit bytes. =)
     
    crisdunbar, May 6, 2014
    #11
  12. anish kumar

    Lynn McGuire Guest

    Nope, six bit bytes (no lowercase). 36 bit
    words, 60 bit words and then 32 bit words when
    the eight bit bytes got popular.

    32 bit words always sucked for numerical accuracy.

    Lynn
     
    Lynn McGuire, May 6, 2014
    #12
  13. anish kumar

    BartC Guest

    I think he means bits in the sense of one eighth of a dollar.

    So one byte or 8 bits at 50c are $4 (32 'bits', 16 quarters).

    (Although it sounds a little high; 50 cents for one register bit, and 5
    cents for a memory core bit, sounds more along the right lines. But it
    depends on the era too. When I first bought memory, I paid roughly 0.5
    cents a bit (0.3 UK pence) for sram; dram was cheaper.)
     
    BartC, May 6, 2014
    #13
  14. anish kumar

    crisdunbar Guest

    Ding ding ding.

    For clc of course, I should have been thorough and said that if CHAR_BIT is 8, you'd have 32 bit bytes ...
     
    crisdunbar, May 6, 2014
    #14