void * arithmetic

Balban

Hi,

On my compiler (gcc), if I add an integer value to a void pointer the
integer is interpreted as signed instead of unsigned. Is this expected
behavior?

Thanks,

Bahadir
 
Seebs

On my compiler (gcc), if I add an integer value to a void pointer the
integer is interpreted as signed instead of unsigned. Is this expected
behavior?

There is no expected behavior; pointer arithmetic is not defined at all
for void pointers. :p

That said, I don't understand how you're making the distinction. Imagine
that you have a 32-bit integer of some unspecified type, and it has the
value 0xFFFFFFFF, and you add it to a 32-bit pointer. It is going to do
the same thing whether it's signed or unsigned.

-s
 
Keith Thompson

Seebs said:
There is no expected behavior; pointer arithmetic is not defined at all
for void pointers. :p

That said, I don't understand how you're making the distinction. Imagine
that you have a 32-bit integer of some unspecified type, and it has the
value 0xFFFFFFFF, and you add it to a 32-bit pointer. It is going to do
the same thing whether it's signed or unsigned.

A signed 32-bit integer cannot have the value 0xFFFFFFFF.

Do you mean 0xFFFFFFFF to refer to a certain bit pattern rather than a
value?
 
Seebs

A signed 32-bit integer cannot have the value 0xFFFFFFFF.

Do you mean 0xFFFFFFFF to refer to a certain bit pattern rather than a
value?

Er, yeah. I meant "representation", specifically.

I'm off my brain all this week, I think, came down with a cold and been
sleeping funny hours.

-s
 
bartc

Keith Thompson said:
A signed 32-bit integer cannot have the value 0xFFFFFFFF.

Seems to work though:

#include <stdio.h>
#include <limits.h>

int main(void){
    signed int a=0xFFFFFFFF;

    printf("Bits = %d\n",(sizeof a)*CHAR_BIT);
    printf("A = %X\n",a);
}
 
Keith Thompson

bartc said:
Seems to work though:

#include <stdio.h>
#include <limits.h>

int main(void){
signed int a=0xFFFFFFFF;

printf("Bits = %d\n",(sizeof a)*CHAR_BIT);
printf("A = %X\n",a);
}

Depends on what you mean by "work".

Assuming int is 32 bits on your system, initializing ``a'' with
the expression 0xFFFFFFFF does not store the value 0xFFFFFFFF
(equivalently, 4294967295) in ``a''. Instead, it stores the result
of converting 0xFFFFFFFF from unsigned int to int. That result
is implementation-defined. (It's probably -1 on your system;
it is on mine.)

Then in your printf call, you use a "%X" format, which expects a value
of type unsigned int, with an argument of type int. There's a special
rule that says you can get away with this if the value is within the
range of values representable either as int or as unsigned int, but
that's not the case here, so strictly speaking I think the behavior is
undefined. In practice, the printed result is very likely to be what
you would get by interpreting the representation of the int object
``a'' (with whatever value resulted from the conversion) as if it were
an object of type unsigned int.

It's hardly surprising that the output is "A = FFFFFFFF",
but it's certainly not required, and it doesn't indicate that
you've managed to store the value 0xFFFFFFFF in ``a''. In fact,
it's simply not possible to do so.

(Also, you're using "%d" with a size_t argument in the first
printf call. And let me repeat my plea not to use the name "a" for
variables in small demo programs; it makes the code more difficult
to talk about. "x" or "n" would be fine.)
 
ImpalerCore

Hi,

On my compiler (gcc), if I add an integer value to a void pointer the
integer is interpreted as signed instead of unsigned. Is this expected
behavior?

As other people have said, addition on void* is not supported by the
standard. gcc allows you to add to void pointers by implicitly
treating the pointer as unsigned char* (or maybe char*, I'm not really
sure). I actually used to do it until recently. Now I cast the
pointer before performing the addition; it plays nicer with
-ansi -pedantic.

i.e.

#include <stdio.h>
#include <stdlib.h>

typedef unsigned char byte;

void* track_malloc( size_t size )
{
    void* mem = NULL;
    void* p = NULL;

    p = malloc( size + sizeof( size_t ) );
    if ( p )
    {
        *((size_t*)p) = size;
        mem = (byte*)p + sizeof( size_t );
    }

    return mem;
}

void track_free( void* p )
{
    void* actual_p = NULL;
    size_t p_size = 0;

    if ( p )
    {
        actual_p = (byte*)p - sizeof( size_t );
        p_size = *((size_t*)actual_p);
        /* print before free: the pointer value is indeterminate afterwards */
        printf( "track_free [p,size] = [%p,%lu]\n",
                actual_p, (unsigned long)p_size );
        free( actual_p );
    }
}

Best regards,
John D.
 
gil_johnson

Hi,

On my compiler (gcc), if I add an integer value to a void pointer the
integer is interpreted as signed instead of unsigned. Is this expected
behavior?

Thanks,

Bahadir

I'm not an expert, but it seems to be a good idea to me. I can imagine
that you might calculate a new offset into a data structure, relative
to the current position, and have it come out negative. It would be
simpler to add the negative than force the answer to be positive and
keep track of addition vs subtraction.
As others have noted, the behavior is not specified by the standard;
I think this may be an example of "Do the least surprising thing."
Gil
 
Keith Thompson

Balban said:
On my compiler (gcc), if I add an integer value to a void pointer the
integer is interpreted as signed instead of unsigned. Is this expected
behavior?

I don't think that's what's happening.

As has already been mentioned, arithmetic on void* is a gcc-specific
extension; in standard C, it's a constraint violation, requiring a
diagnostic.

But the same thing applies to arithmetic on char*, which is well
defined by the standard.

Adding a pointer and an integer (p + i) yields a new pointer value
that points i elements away from where p points. For example, if p
points to the element 0 of an array, then (p + 3) points to element 3
of the same array. If p points to element 7 of an array, then (p - 2)
points to element 5 of the same array.

It would have been helpful if you had shown us an example of what
you're talking about. But suppose we have:

char arr[10];
char *p = arr + 5;
int i = -1;
unsigned int u = -1;

Let's assume a typical system where int and pointers are 32 bits.

So p points to arr[5]. The expression (p + i) points to arr[4].
But consider (p + u).

Since u is unsigned, it can't actually hold the value -1. During
initialization, that value is implicitly converted from signed
int to unsigned int, and the value stored in u is 4294967295.
In theory, then, (p + u) would point to arr[4294967300], which
obviously doesn't exist. So the behavior is undefined: if you try
to evaluate (p + u), anything can happen.

What probably will happen on typical modern systems is that the
addition will quietly wrap around. Let's assume that pointer values
are represented as 32-bit addresses that look like unsigned integers
(nothing like this is required by the standard, but it's a typical
implementation), and let's say that arr is at address 0x12345678.
Then p points to address 0x1234567d, and (p + 4294967295) would
theoretically point to address 0x11234567c. But this would require 33
bits, and we only have 32-bit addresses. Typically, an overflowing
addition like this will quietly drop the high-order bit(s) yielding an
address of 0x1234567c -- which just happens to be the address of
arr[4].

So you initialized u with the value -1, computed (p + u), and
got the same result you would have gotten for (p + (-1)). But in
the process, you generated an intermediate result that was out of
range, resulting in undefined behavior. (This is really the worst
possible consequence of undefined behavior: having your program
behave exactly as you expected it to. It means your code is buggy,
but it's going to be very difficult to find and correct the problem.)

This kind of thing is very common with 2's-complement systems. The
2's-complement representation is designed in such a way that addition
and subtraction don't have to care whether the operands are signed or
unsigned. But you shouldn't depend on this. The behavior of addition
and subtraction operations, either on integers or on pointers, is well
defined only when the mathematical result is within the required
range. Adding 0xFFFFFFFF to a pointer can appear to work "correctly",
as if you had really added -1, but it's better to just add a signed
value -1 in the first place.

Even if your code never runs on anything other than the system you
wrote it for, an optimizing compiler may assume that no undefined
behavior occurs. For example, if you write (p + u), it can assume
that p is in the range 0 to 5, and perform optimizations that depend
on that assumption.
 
Ike Naar

[snip]
char arr[10];
char *p = arr + 5;
int i = -1;
unsigned int u = -1;
[snip]
Even if your code never runs on anything other than the system you
wrote it for, an optimizing compiler may assume that no undefined
behavior occurs. For example, if you write (p + u), it can assume
that p is in the range 0 to 5, and perform optimizations that depend
^
Is this a mis-typed ``u'' ?
 
Balban

I don't think that's what's happening.

As has already been mentioned, arithmetic on void* is a gcc-specific
extension; in standard C, it's a constraint violation, requiring a
diagnostic.

But the same thing applies to arithmetic on char*, which is well
defined by the standard.

Adding a pointer and an integer (p + i) yields a new pointer value
that points i elements away from where p points.  For example, if p
points to the element 0 of an array, then (p + 3) points to element 3
of the same array.  If p points to element 7 of an array, then (p - 2)
points to element 5 of the same array.

It would have been helpful if you had shown us an example of what
you're talking about.  But suppose we have:

Thanks to all who answered. I have the following code which had
unexpected behavior for me:


#define PAGER_VIRTUAL_START 0xa039d000

/*
 * Find the page's offset from virtual start, add it to membank
 * physical start offset
 */
void *virt_to_phys(void *v)
{
    return v - PAGER_VIRTUAL_START + membank[0].start;
}

membank[0].start is an unsigned long of value 0x100000

Now if I pass a v argument with a value of 0xa039d000 to this function,
I get a return value of 0x400000. Note that v = 0xa039d000 means v and
PAGER_VIRTUAL_START should cancel out, so the return value would be the
value of membank[0].start, which is 0x100000.

Below is the corrected code.

/*
 * Find the page's offset from virtual start, add it to membank
 * physical start offset
 */
void *virt_to_phys(void *v)
{
    unsigned long vaddr = (unsigned long)v;

    return (void *)(vaddr - PAGER_VIRTUAL_START +
                    membank[0].start);
}

This one behaves as I expected, returning 0x100000.


Thanks,

Bahadir
 
Seebs

Thanks to all who answered. I have the following code which had
unexpected behavior for me:
#define PAGER_VIRTUAL_START 0xa039d000
/*
* Find the page's offset from virtual start, add it to membank
* physical start offset
*/
void *virt_to_phys(void *v)
{
return v - PAGER_VIRTUAL_START + membank[0].start;
}
membank[0].start is an unsigned long of value 0x100000
Hmm.

Now if I pass v argument with a value of 0xa039d000 to this function,
I get a return value of 0x400000. Note v = 0xa039d000 means that v and
PAGER_VIRTUAL_START would cancel out and return value would be the
value of membank[0].start which is 0x100000

Hmm.

It does seem so, and indeed, that's the behavior I get from gcc for this
test program:

#include <stdio.h>

#define PVS 0xa039d000
unsigned long mb0s = 0x100000;

void *vtp(void *v) {
    return v - PVS + mb0s;
}

int
main(void) {
    printf("%p\n", vtp((void *) PVS));
    return 0;
}

This produces 0x100000, as you appeared to expect. I can't see any reason
for it to yield other values, but so far as I can tell, it's equivalent to
what you described above.
Below is the corrected code.

This code is probably less robust than you want it to be.
void *virt_to_phys(void *v)
{
unsigned long vaddr = (unsigned long)v;

return (void *)(vaddr - PAGER_VIRTUAL_START +
membank[0].start);
}

Don't use "unsigned long" -- there are real targets on which unsigned long
is smaller than a pointer.

Try:

void *
virt_to_phys(void *v)
{
    unsigned char *u = v;
    return u - (PAGER_VIRTUAL_START - membank[0].start);
}

Rationale:

You have a pair of unsigned long values. Do the arithmetic on those,
then use the single offset, once, on an object that is of the right type
to have defined semantics. (Obviously, semantics are not defined in
general for pointer arithmetic outside the bounds of a C object, but in
your case I think it's reasonable to assume that you have a good view of
the nature of the address space.)

If you want to do arithmetic on addresses, "unsigned char *" is nearly
always the right type. If you want to do arithmetic on addresses in
an integer type, see if your target has "intptr_t" defined, and if so,
use that. (It's been standard since C99, but implementation isn't universal;
it should be in <stdint.h> if it exists, and I think there's a feature
test macro for it.)

-s
 
Keith Thompson

[snip]
char arr[10];
char *p = arr + 5;
int i = -1;
unsigned int u = -1;
[snip]
Even if your code never runs on anything other than the system you
wrote it for, an optimizing compiler may assume that no undefined
behavior occurs. For example, if you write (p + u), it can assume
that p is in the range 0 to 5, and perform optimizations that depend
^
Is this a mis-typed ``u'' ?
on that assumption.

Yes, thank you.
 
Balban

This produces 0x100000, as you appeared to expect.  I can't see any reason
for it to yield other values, but so far as I can tell, it's equivalent to
what you described above.

It might be that it is a compiler bug then. It is a cross-compiler and
I suspect the generated assembler is not correct.
Don't use "unsigned long" -- there are real targets on which unsigned long
is smaller than a pointer.

This is going into off-topic areas, but as far as I know, at least on
32- and 64-bit machines, unsigned long always gives the machine's
addressing size, whereas unsigned int gives you the machine word,
i.e. register size.

But you do have a point in that char * is fairly safe for pointer
arithmetic.

Thanks,

Bahadir
 
Seebs

This is going into off-topic areas but as far as I know at least in 32
and 64-bit machines unsigned long always gives the machine's
addressing size whereas unsigned int would give you the machine word
i.e. register size.

Not always. There have been machines on which long was 32-bit and pointer
was 64-bit. Not many, perhaps, and it's arguably a pretty bad choice of
sizes, but it's been done -- that's a big part of why we have "long long".
But you do have a point in that char * is fairly safe for pointer
arithmetic.

And, if you really are seeing a compiler bug, this may also work around
it. :)

-s
 
Barry Schwarz

It might be that it is a compiler bug then. It is a cross-compiler and
I suspect the generated assembler is not correct.


This is going into off-topic areas but as far as I know at least in 32
and 64-bit machines unsigned long always gives the machine's
addressing size whereas unsigned int would give you the machine word
i.e. register size.

There are many shades of gray. On IBM z-Architecture machines, a word
is 32 bits while the hardware registers are 64 bits. Furthermore,
unsigned long is 64 bits whether the addressing mode (which is under
program control) is 64 or 32 bits. (There is also a 24 bit
addressing mode for backward compatibility and unsigned long is still
64 bits.)
 
