void * arithmetic

Discussion in 'C Programming' started by Balban, Feb 11, 2010.

  1. Balban

    Balban Guest

    Hi,

    On my compiler (gcc), if I add an integer value to a void pointer the
    integer is interpreted as signed instead of unsigned. Is this expected
    behavior?

    Thanks,

    Bahadir
    Balban, Feb 11, 2010
    #1

  2. Seebs

    Seebs Guest

    On 2010-02-11, Balban <> wrote:
    > On my compiler (gcc), if I add an integer value to a void pointer the
    > integer is interpreted as signed instead of unsigned. Is this expected
    > behavior?


    There is no expected behavior; pointer arithmetic is not defined at all
    for void pointers. :p

    That said, I don't understand how you're making the distinction. Imagine
    that you have a 32-bit integer of some unspecified type, and it has the
    value 0xFFFFFFFF, and you add it to a 32-bit pointer. It is going to do
    the same thing whether it's signed or unsigned.

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Feb 11, 2010
    #2

  3. Seebs <> writes:
    > On 2010-02-11, Balban <> wrote:
    >> On my compiler (gcc), if I add an integer value to a void pointer the
    >> integer is interpreted as signed instead of unsigned. Is this expected
    >> behavior?

    >
    > There is no expected behavior, pointer arithmetic is not defined at all
    > for void pointers. :p
    >
    > That said, I don't understand how you're making the distinction. Imagine
    > that you have a 32-bit integer of some unspecified type, and it has the
    > value 0xFFFFFFFF, and you add it to a 32-bit pointer. It is going to do
    > the same thing whether it's signed or unsigned.


    A signed 32-bit integer cannot have the value 0xFFFFFFFF.

    Do you mean 0xFFFFFFFF to refer to a certain bit pattern rather than a
    value?

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 11, 2010
    #3
  4. Seebs

    Seebs Guest

    On 2010-02-11, Keith Thompson <> wrote:
    > A signed 32-bit integer cannot have the value 0xFFFFFFFF.
    >
    > Do you mean 0xFFFFFFFF to refer to a certain bit pattern rather than a
    > value?


    Er, yeah. I meant "representation", specifically.

    I'm off my brain all this week, I think, came down with a cold and been
    sleeping funny hours.

    -s
    Seebs, Feb 11, 2010
    #4
  5. bartc

    bartc Guest

    "Keith Thompson" <> wrote in message
    news:...
    > Seebs <> writes:
    >> On 2010-02-11, Balban <> wrote:
    >>> On my compiler (gcc), if I add an integer value to a void pointer the
    >>> integer is interpreted as signed instead of unsigned. Is this expected
    >>> behavior?

    >>
    >> There is no expected behavior, pointer arithmetic is not defined at all
    >> for void pointers. :p
    >>
    >> That said, I don't understand how you're making the distinction. Imagine
    >> that you have a 32-bit integer of some unspecified type, and it has the
    >> value 0xFFFFFFFF, and you add it to a 32-bit pointer. It is going to do
    >> the same thing whether it's signed or unsigned.

    >
    > A signed 32-bit integer cannot have the value 0xFFFFFFFF.


    Seems to work though:

    #include <stdio.h>
    #include <limits.h>

    int main(void){
    signed int a=0xFFFFFFFF;

    printf("Bits = %d\n",(sizeof a)*CHAR_BIT);
    printf("A = %X\n",a);
    }

    --
    Bartc
    bartc, Feb 11, 2010
    #5
  6. "bartc" <> writes:
    > "Keith Thompson" <> wrote in message
    > news:...
    >> Seebs <> writes:
    >>> On 2010-02-11, Balban <> wrote:
    >>>> On my compiler (gcc), if I add an integer value to a void pointer the
    >>>> integer is interpreted as signed instead of unsigned. Is this expected
    >>>> behavior?
    >>>
    >>> There is no expected behavior, pointer arithmetic is not defined at all
    >>> for void pointers. :p
    >>>
    >>> That said, I don't understand how you're making the distinction. Imagine
    >>> that you have a 32-bit integer of some unspecified type, and it has the
    >>> value 0xFFFFFFFF, and you add it to a 32-bit pointer. It is going to do
    >>> the same thing whether it's signed or unsigned.

    >>
    >> A signed 32-bit integer cannot have the value 0xFFFFFFFF.

    >
    > Seems to work though:
    >
    > #include <stdio.h>
    > #include <limits.h>
    >
    > int main(void){
    > signed int a=0xFFFFFFFF;
    >
    > printf("Bits = %d\n",(sizeof a)*CHAR_BIT);
    > printf("A = %X\n",a);
    > }


    Depends on what you mean by "work".

    Assuming int is 32 bits on your system, initializing ``a'' with
    the expression 0xFFFFFFFF does not store the value 0xFFFFFFFF
    (equivalently, 4294967295) in ``a''. Instead, it stores the result
    of converting 0xFFFFFFFF from unsigned int to int. That result
    is implementation-defined. (It's probably -1 on your system;
    it is on mine.)

    Then in your printf call, you use a "%X" format, which expects a value
    of type unsigned int, with an argument of type int. There's a special
    rule that says you can get away with this if the value is within the
    range of values representable either as int or as unsigned int, but
    that's not the case here, so strictly speaking I think the behavior is
    undefined. In practice, the printed result is very likely to be what
    you would get by interpreting the representation of the int object
    ``a'' (with whatever value resulted from the conversion) as if it were an
    object of type unsigned int.

    It's hardly surprising that the output is "A = FFFFFFFF",
    but it's certainly not required, and it doesn't indicate that
    you've managed to store the value 0xFFFFFFFF in ``a''. In fact,
    it's simply not possible to do so.

    (Also, you're using "%d" with a size_t argument in the first
    printf call. And let me repeat my plea not to use the name "a" for
    variables in small demo programs; it makes the code more difficult
    to talk about. "x" or "n" would be fine.)
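
    A minimal sketch of the points above, assuming a C99 library (for "%zu").
    The helper names as_unsigned, int_bits and show are hypothetical and not
    from the thread. The idea is to convert explicitly to unsigned int before
    printing with "%X", and to print a size_t with "%zu" rather than "%d".

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: the int-to-unsigned conversion is fully defined
   (it reduces modulo UINT_MAX + 1), unlike the unsigned-to-int direction. */
unsigned int as_unsigned(int n)
{
    return (unsigned int)n;
}

/* Hypothetical helper: width of int's object representation in bits. */
size_t int_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Print both values with format specifiers that match the argument types. */
void show(int n)
{
    printf("Bits = %zu\n", int_bits());   /* size_t needs %zu, not %d */
    printf("n = %d\n", n);                /* the value actually stored */
    printf("n = %X\n", as_unsigned(n));   /* %X expects unsigned int */
}
```

    Calling show(-1) on a system with 32-bit int should print FFFFFFFF on the
    last line. That is the converted value UINT_MAX, not a value that was ever
    stored in n.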

    Keith Thompson, Feb 11, 2010
    #6
  7. ImpalerCore

    ImpalerCore Guest

    On Feb 11, 2:12 pm, Balban <> wrote:
    > Hi,
    >
    > On my compiler (gcc), if I add an integer value to a void pointer the
    > integer is interpreted as signed instead of unsigned. Is this expected
    > behavior?


    As other people have said, addition is not supported by the standard
    using void*. gcc allows you to add to void pointers as an extension,
    treating the size of void as 1, so the arithmetic behaves as it would
    on a char*. I actually used to do it until recently. Now I cast the
    pointer before performing the addition; it plays nicer with -ansi
    -pedantic.

    i.e.

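    #include <stdio.h>
    #include <stdlib.h>

    typedef unsigned char byte;

    /* Allocate size bytes plus a hidden size_t header that records size. */
    void* track_malloc( size_t size )
    {
        void* mem = NULL;
        void* p = NULL;

        p = malloc( size + sizeof( size_t ) );
        if ( p )
        {
            *((size_t*)p) = size;
            /* Cast before the addition: arithmetic on void* is not standard C. */
            mem = (byte*)p + sizeof( size_t );
        }

        return mem;
    }

    /* Step back to the hidden header, report the recorded size, then free. */
    void track_free( void* p )
    {
        void* actual_p = NULL;
        size_t p_size = 0;

        if ( p )
        {
            actual_p = (byte*)p - sizeof( size_t );
            p_size = *((size_t*)actual_p);
            /* Print before free(); %lu with a cast is the C89-safe way to print a size_t. */
            printf( "track_free [p,size] = [%p,%lu]\n", actual_p, (unsigned long)p_size );
            free( actual_p );
        }
    }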
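
    One caveat with a size_t-sized header is that the pointer handed back
    may be less strictly aligned than what malloc returned. A rough sketch of
    the same bookkeeping with a padded header follows. The names
    track_header, tracked_alloc, tracked_size and tracked_free are
    hypothetical, and the union is only an approximation of the strictest
    alignment on a given target (C11 offers max_align_t for that purpose).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header: the union pads the stored size out to the
   alignment of several wide scalar types. This is an assumption about
   what "strict enough" means, not a guarantee. */
union track_header {
    size_t size;
    long double ld;
    void *ptr;
    long l;
};

/* Allocate size bytes preceded by a header recording the size. */
void *tracked_alloc(size_t size)
{
    union track_header *h;

    if (size > (size_t)-1 - sizeof *h)
        return NULL;                 /* size + header would overflow */
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                    /* first byte after the header */
}

/* Recover the recorded size from a pointer returned by tracked_alloc. */
size_t tracked_size(const void *p)
{
    return ((const union track_header *)p - 1)->size;
}

/* Free a block returned by tracked_alloc; NULL is ignored. */
void tracked_free(void *p)
{
    if (p != NULL)
        free((union track_header *)p - 1);
}
```

    Here the pointer arithmetic is done on the header type itself, so no
    byte-sized cast is needed and each step moves by exactly one header.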

    Best regards,
    John D.

    > Thanks,
    >
    > Bahadir
    ImpalerCore, Feb 11, 2010
    #7
  8. gil_johnson

    gil_johnson Guest

    On Feb 11, 1:12 pm, Balban <> wrote:
    > Hi,
    >
    > On my compiler (gcc), if I add an integer value to a void pointer the
    > integer is interpreted as signed instead of unsigned. Is this expected
    > behavior?
    >
    > Thanks,
    >
    > Bahadir


    I'm not an expert, but it seems to be a good idea to me. I can imagine
    that you might calculate a new offset into a data structure, relative
    to the current position, and have it come out negative. It would be
    simpler to add the negative than force the answer to be positive and
    keep track of addition vs subtraction.
    As others have noted, the behavior is not specified by the standard; I
    think this may be an example of "Do the least surprising thing."
    Gil
    gil_johnson, Feb 12, 2010
    #8
  9. Balban <> writes:
    > On my compiler (gcc), if I add an integer value to a void pointer the
    > integer is interpreted as signed instead of unsigned. Is this expected
    > behavior?


    I don't think that's what's happening.

    As has already been mentioned, arithmetic on void* is a gcc-specific
    extension; in standard C, it's a constraint violation, requiring a
    diagnostic.

    But the same thing applies to arithmetic on char*, which is well
    defined by the standard.

    Adding a pointer and an integer (p + i) yields a new pointer value
    that points i elements away from where p points. For example, if p
    points to element 0 of an array, then (p + 3) points to element 3
    of the same array. If p points to element 7 of an array, then (p - 2)
    points to element 5 of the same array.
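
    As a small illustration of that rule, here is a sketch with two
    hypothetical helpers (advance and index_of are not from the thread). The
    offset is a signed ptrdiff_t, so a negative step needs no special casing.

```c
#include <stddef.h>

/* Hypothetical helper: move p by n elements; n may be negative.
   The result is only defined if it stays within the same array
   (or one past its last element). */
int *advance(int *p, ptrdiff_t n)
{
    return p + n;
}

/* Hypothetical helper: index of p within the array starting at base.
   Both pointers must point into the same array. */
ptrdiff_t index_of(const int *base, const int *p)
{
    return p - base;
}
```

    Given int arr[10], advance(arr, 3) is &arr[3] and advance(&arr[7], -2) is
    &arr[5], matching the two cases described above.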

    It would have been helpful if you had shown us an example of what
    you're talking about. But suppose we have:

    char arr[10];
    char *p = arr + 5;
    int i = -1;
    unsigned int u = -1;

    Let's assume a typical system where int and pointers are 32 bits.

    So p points to arr[5]. The expression (p + i) points to arr[4].
    But consider (p + u).

    Since u is unsigned, it can't actually hold the value -1. During
    initialization, that value is implicitly converted from signed
    int to unsigned int, and the value stored in u is 4294967295.
    In theory, then, (p + u) would point to arr[4294967300], which
    obviously doesn't exist. So the behavior is undefined; if you try
    to evaluate (p + u), anything can happen.

    What probably will happen on typical modern systems is that the
    addition will quietly wrap around. Let's assume that pointer values
    are represented as 32-bit addresses that look like unsigned integers
    (nothing like this is required by the standard, but it's a typical
    implementation), and let's say that arr is at address 0x12345678.
    Then p points to address 0x1234567d, and (p + 4294967295) would
    theoretically point to address 0x11234567c. But this would require 33
    bits, and we only have 32-bit addresses. Typically, an overflowing
    addition like this will quietly drop the high-order bit(s) yielding an
    address of 0x1234567c -- which just happens to be the address of
    arr[4].
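
    The wraparound can be modelled without touching any pointers. This is
    only a sketch: wrap_add32 is a hypothetical name, and it imitates a
    32-bit address adder using unsigned integers, whose modulo-2^32
    behavior is well defined (unlike the pointer addition it stands in for).

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit address adder: unsigned arithmetic is
   defined to wrap modulo 2^32, so the carry out of bit 31 is dropped. */
uint32_t wrap_add32(uint32_t addr, uint32_t offset)
{
    return (uint32_t)(addr + offset);
}
```

    With the numbers from this post, wrap_add32(0x1234567d, 0xFFFFFFFF)
    gives 0x1234567c, the same result as subtracting 1.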

    So you initialized u with the value -1, computed (p + u), and
    got the same result you would have gotten for (p + (-1)). But in
    the process, you generated an intermediate result that was out of
    range, resulting in undefined behavior. (This is really the worst
    possible consequence of undefined behavior: having your program
    behave exactly as you expected it to. It means your code is buggy,
    but it's going to be very difficult to find and correct the problem.)

    This kind of thing is very common with 2's-complement systems. The
    2's-complement representation is designed in such a way that addition
    and subtraction don't have to care whether the operands are signed or
    unsigned. But you shouldn't depend on this. The behavior of addition
    and subtraction operations, either on integers or on pointers, is well
    defined only when the mathematical result is within the required
    range. Adding 0xFFFFFFFF to a pointer can appear to work "correctly",
    as if you had really added -1, but it's better to just add a signed
    value -1 in the first place.

    Even if your code never runs on anything other than the system you
    wrote it for, an optimizing compiler may assume that no undefined
    behavior occurs. For example, if you write (p + u), it can assume
    that p is in the range 0 to 5, and perform optimizations that depend
    on that assumption.

    Keith Thompson, Feb 12, 2010
    #9
  10. Ike Naar

    Ike Naar Guest

    In article <>,
    Keith Thompson <> wrote:
    > [snip]
    > char arr[10];
    > char *p = arr + 5;
    > int i = -1;
    > unsigned int u = -1;
    > [snip]
    >Even if your code never runs on anything other than the system you
    >wrote it for, an optimizing compiler may assume that no undefined
    >behavior occurs. For example, if you write (p + u), it can assume
    >that p is in the range 0 to 5, and perform optimizations that depend

          ^
    Is this a mis-typed ``u'' ?
    >on that assumption.
    Ike Naar, Feb 12, 2010
    #10
  11. Balban

    Balban Guest

    On Feb 12, 4:39 am, Keith Thompson <> wrote:
    > Balban <> writes:
    > > On my compiler (gcc), if I add an integer value to a void pointer the
    > > integer is interpreted as signed instead of unsigned. Is this expected
    > > behavior?

    >
    > I don't think that's what's happening.
    >
    > As has already been mentioned, arithmetic on void* is a gcc-specific
    > extension; in standard C, it's a constraint violation, requiring a
    > diagnostic.
    >
    > But the same thing applies to arithmetic on char*, which is well
    > defined by the standard.
    >
    > Adding a pointer and an integer (p + i) yields a new pointer value
    > that points i elements away from where p points.  For example, if p
    > points to the element 0 of an array, then (p + 3) points to element 3
    > of the same array.  If p points to element 7 of an array, then (p - 2)
    > points to element 5 of the same array.
    >
    > It would have been helpful if you had shown us an example of what
    > you're talking about.  But suppose we have:
    >


    Thanks to all who answered. I have the following code which had
    unexpected behavior for me:


    #define PAGER_VIRTUAL_START 0xa039d000

    /*
    * Find the page's offset from virtual start, add it to membank
    * physical start offset
    */
    void *virt_to_phys(void *v)
    {
    return v - PAGER_VIRTUAL_START + membank[0].start;
    }

    membank[0].start is an unsigned long of value 0x100000

    Now if I pass an argument v with a value of 0xa039d000 to this function,
    I get a return value of 0x400000. Since v = 0xa039d000, I expected v and
    PAGER_VIRTUAL_START to cancel out, so that the return value would be the
    value of membank[0].start, which is 0x100000.

    Below is the corrected code.

    /*
    * Find the page's offset from virtual start, add it to membank
    * physical start offset
    */
    void *virt_to_phys(void *v)
    {
    unsigned long vaddr = (unsigned long)v;

    return (void *)(vaddr - PAGER_VIRTUAL_START +
    membank[0].start);
    }

    This one behaves as I expected, returning 0x100000.


    Thanks,

    Bahadir
    Balban, Feb 12, 2010
    #11
  12. Seebs

    Seebs Guest

    On 2010-02-12, Balban <> wrote:
    > Thanks to all who answered. I have the following code which had
    > unexpected behavior for me:


    > #define PAGER_VIRTUAL_START 0xa039d000


    > /*
    > * Find the page's offset from virtual start, add it to membank
    > * physical start offset
    > */
    > void *virt_to_phys(void *v)
    > {
    > return v - PAGER_VIRTUAL_START + membank[0].start;
    > }


    > membank[0].start is an unsigned long of value 0x100000


    Hmm.

    > Now if I pass v argument with a value of 0xa039d000 to this function,
    > I get a return value of 0x400000. Note v = 0xa039d000 means that v and
    > PAGER_VIRTUAL_START would cancel out and return value would be the
    > value of membank[0].start which is 0x100000


    Hmm.

    It does seem so, and indeed, that's the behavior I get from gcc for this
    test program:

    #include <stdio.h>

    #define PVS 0xa039d000
    unsigned long mb0s = 0x100000;

    void *vtp(void *v) {
    return v - PVS + mb0s;
    }

    int
    main(void) {
    printf("%p\n", vtp((void *) PVS));
    return 0;
    }

    This produces 0x100000, as you appeared to expect. I can't see any reason
    for it to yield other values, but so far as I can tell, it's equivalent to
    what you described above.

    > Below is the corrected code.


    This code is probably less robust than you want it to be.

    > void *virt_to_phys(void *v)
    > {
    > unsigned long vaddr = (unsigned long)v;
    >
    > return (void *)(vaddr - PAGER_VIRTUAL_START +
    > membank[0].start);
    > }


    Don't use "unsigned long" -- there are real targets on which unsigned long
    is smaller than a pointer.

    Try:

    void *
    virt_to_phys(void *v)
    {
        unsigned char *u = v;
        return u - (PAGER_VIRTUAL_START - membank[0].start);
    }

    Rationale:

    You have a pair of unsigned long values. Do the arithmetic on those,
    then use the single offset, once, on an object that is of the right type
    to have defined semantics. (Obviously, semantics are not defined in
    general for pointer arithmetic outside the bounds of a C object, but in
    your case I think it's reasonable to assume that you have a good view of
    the nature of the address space.)

    If you want to do arithmetic on addresses, "unsigned char *" is nearly
    always the right type. If you want to do arithmetic on addresses in
    an integer type, see if your target has "intptr_t" (or its unsigned
    counterpart "uintptr_t") defined, and if so, use that. (It's been
    standard since C99, but it's optional and implementation isn't universal;
    it should be in <stdint.h> if it exists, and you can test for it by
    checking whether INTPTR_MAX or UINTPTR_MAX is defined.)
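
    A sketch of the integer-type route, assuming the target provides
    uintptr_t. PAGER_VIRTUAL_START is the value from the post above; the
    phys_start parameter is a hypothetical stand-in for membank[0].start, and
    virt_to_phys_addr is a hypothetical helper name.

```c
#include <stdint.h>

#ifndef UINTPTR_MAX
#error "uintptr_t is optional in C99 and is not available on this target"
#endif

#define PAGER_VIRTUAL_START ((uintptr_t)0xa039d000u)

/* Translate a virtual address to a physical one using integer arithmetic.
   phys_start is a hypothetical stand-in for membank[0].start. */
uintptr_t virt_to_phys_addr(uintptr_t vaddr, uintptr_t phys_start)
{
    return vaddr - PAGER_VIRTUAL_START + phys_start;
}

/* Pointer wrapper; the pointer/integer conversions are
   implementation-defined, which is acceptable for this kind of code. */
void *virt_to_phys(void *v, uintptr_t phys_start)
{
    return (void *)virt_to_phys_addr((uintptr_t)v, phys_start);
}
```

    Passing 0xa039d000 with a phys_start of 0x100000 yields 0x100000, which
    is the result expected in the original question.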

    -s
    Seebs, Feb 12, 2010
    #12
  13. (Ike Naar) writes:
    > In article <>,
    > Keith Thompson <> wrote:
    >> [snip]
    >> char arr[10];
    >> char *p = arr + 5;
    >> int i = -1;
    >> unsigned int u = -1;
    >> [snip]
    >>Even if your code never runs on anything other than the system you
    >>wrote it for, an optimizing compiler may assume that no undefined
    >>behavior occurs. For example, if you write (p + u), it can assume
    >>that p is in the range 0 to 5, and perform optimizations that depend

    >      ^
    > Is this a mis-typed ``u'' ?
    >>on that assumption.


    Yes, thank you.

    Keith Thompson, Feb 12, 2010
    #13
  14. Balban

    Balban Guest

    On Feb 12, 3:38 pm, Seebs <> wrote:
    > On 2010-02-12, Balban <> wrote:
    > This produces 0x100000, as you appeared to expect.  I can't see any reason
    > for it to yield other values, but so far as I can tell, it's equivalent to
    > what you described above.
    >


    It might be that it is a compiler bug then. It is a cross-compiler and
    I suspect the generated assembler is not correct.

    > Don't use "unsigned long" -- there are real targets on which unsigned long
    > is smaller than a pointer.
    >


    This is going into off-topic areas but as far as I know at least in 32
    and 64-bit machines unsigned long always gives the machine's
    addressing size whereas unsigned int would give you the machine word
    i.e. register size.

    But you do have a point in that char * is fairly safe for pointer
    arithmetic.

    Thanks,

    Bahadir
    Balban, Feb 13, 2010
    #14
  15. Seebs

    Seebs Guest

    On 2010-02-13, Balban <> wrote:
    > This is going into off-topic areas but as far as I know at least in 32
    > and 64-bit machines unsigned long always gives the machine's
    > addressing size whereas unsigned int would give you the machine word
    > i.e. register size.


    Not always. There have been machines on which long was 32-bit and pointer
    was 64-bit (64-bit Windows is the best-known example). It's arguably a
    pretty bad choice of sizes, but it's been done -- that's a big part of
    why we have "long long".
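
    Rather than relying on a rule of thumb, the question can be asked of the
    target directly. A minimal sketch, with hypothetical helper names
    (ulong_can_hold_pointer, pointer_bits); comparing sizes is a practical
    heuristic rather than something the standard guarantees.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if unsigned long is at least as wide as a
   data pointer on this target. */
int ulong_can_hold_pointer(void)
{
    return sizeof(unsigned long) >= sizeof(void *);
}

/* Hypothetical helper: width of a data pointer's representation in bits. */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

    On an LP64 system (64-bit long and pointers) the first function should
    return 1; on an LLP64 system (32-bit long, 64-bit pointers) it should
    return 0.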

    > But you do have a point in that char * is fairly safe for pointer
    > arithmetic.


    And, if you really are seeing a compiler bug, this may also work around
    it. :)

    -s
    Seebs, Feb 13, 2010
    #15
  16. On Sat, 13 Feb 2010 14:04:06 -0800 (PST), Balban
    <> wrote:

    >On Feb 12, 3:38 pm, Seebs <> wrote:
    >> On 2010-02-12, Balban <> wrote:
    >> This produces 0x100000, as you appeared to expect.  I can't see any reason
    >> for it to yield other values, but so far as I can tell, it's equivalent to
    >> what you described above.
    >>

    >
    >It might be that it is a compiler bug then. It is a cross-compiler and
    >I suspect the generated assembler is not correct.
    >
    >> Don't use "unsigned long" -- there are real targets on which unsigned long
    >> is smaller than a pointer.
    >>

    >
    >This is going into off-topic areas but as far as I know at least in 32
    >and 64-bit machines unsigned long always gives the machine's
    >addressing size whereas unsigned int would give you the machine word
    >i.e. register size.


    There are many shades of gray. On IBM z-Architecture machines, a word
    is 32 bits while the hardware registers are 64 bits. Furthermore,
    unsigned long is 64 bits whether the addressing mode (which is under
    program control) is 64 or 32 bits. (There is also a 24 bit
    addressing mode for backward compatibility and unsigned long is still
    64 bits.)

    --
    Remove del for email
    Barry Schwarz, Feb 13, 2010
    #16
