sizeof (size_t) and sizeof (pointer)

Alex Vinokur · Nov 12, 2007

Does it have to be? :
sizeof (size_t) >= sizeof (pointer)

Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Ron Natalie · Nov 12, 2007

Alex said:
Does it have to be? :
sizeof (size_t) >= sizeof (pointer)

No. size_t only has to be big enough to represent the
maximum number of objects that could be created. There
are implementations where the size of a pointer is bigger
than even the number of chars that could be allocated (i.e.,
not all the bits in the pointer were used to contribute
to the address). It's also not the case that all pointers
need to be the same size.

Alf P. Steinbach · Nov 12, 2007

* Alex Vinokur:

Does it have to be? :
sizeof (size_t) >= sizeof (pointer)

No.

Cheers, & hth.,

- Alf

Juha Nieminen · Nov 12, 2007

Ron said:
It's also not the case that all pointers
need to be the same size.

Is that really so? I thought that it must be possible to cast any
pointer to and from a void*. If there were different-sized pointers then
it could be rather problematic.

Alf P. Steinbach · Nov 12, 2007

* Juha Nieminen:

Is that really so? I thought that it must be possible to cast any
pointer to and from a void*. If there were different-sized pointers then
it could be rather problematic.

There are pointers of different sizes, e.g. member pointers tend to be
larger than others. Then there are function pointers in general, which
for freestanding functions tend to be the same size as data pointers,
but cannot be cast to void*. And that's intentionally "problematic".

Cheers, & hth.,

- Alf

Victor Bazarov · Nov 12, 2007

Juha said:
Is that really so? I thought that it must be possible to cast any
pointer to and from a void*. If there were different-sized pointers
then it could be rather problematic.

What if void* is at least as large as the largest of them? If the
sizes do differ, it would make sense, no?

V

Bo Persson · Nov 12, 2007

Juha Nieminen wrote:
:: Ron Natalie wrote:
::: It's also not the case that all pointers
::: need to be the same size.
::
:: Is that really so? I thought that it must be possible to cast any
:: pointer to and from a void*. If there were different-sized
:: pointers then it could be rather problematic.

You can only reliably cast the pointer back to the original type. So,
as long as void* is among the largest types, other pointers can be
smaller.

Bo Persson

Andrey Tarasevich · Nov 13, 2007

Alex said:
Does it have to be? :
sizeof (size_t) >= sizeof (pointer)
...

Firstly, by 'pointer' you probably mean something like 'void*', since,
say, an object of 'SomeType (SomeClass::*)()' is also a "pointer" (a
pointer-to-member-function) and it most likely will be [a lot] bigger
than a 'size_t' on the same implementation.

Secondly, even the ordinary 'void*' can be bigger than 'size_t'. In
fact, typical DOS/Win16 implementations used to have a 16-bit 'size_t'
and 32-bit 'void*' pointers (depending on the memory model).

Andrey Tarasevich · Nov 13, 2007

Juha said:
Is that really so? I thought that it must be possible to cast any
pointer to and from a void*. If there were different-sized pointers then
it could be rather problematic.

Well, to be C++-pedantic, you can't expect to cast literally _any_ pointer
to 'void*' that way. Only pointers to object types can be cast to
'void*' and back. Having noted that, the round-trip conversion 'T*' ->
'void*' -> 'T*' is indeed guaranteed to preserve the original value of
'T*' pointer, but that only means that value representation of 'void*'
is at least as "precise" (as big) as the value representation of any
'T*' type. Yet various 'T*' can still have different representations
(including different sizes).

Andrey Tarasevich · Nov 13, 2007

Ron said:
size_t only has to be big enough to represent the
maximum number of objects that could be created.

Hmm... By definition, size_t has to be big enough to represent the
number of bytes in a single object. If you prefer to express it in terms
of "number of objects", it should probably sound like "size_t only has
to be big enough to represent the maximum number of [contiguous] bytes
that could be allocated for a single object". Although I don't see the
point in trying to reformulate it like that.

To say that it should represent "maximum number of objects that could be
created" is misleading. Quite the opposite, in general case there's
nothing that prevents one from creating more objects than size_t can count.

James Kanze · Nov 13, 2007

Is that really so? I thought that it must be possible to cast any
pointer to and from a void*. If there were different-sized pointers then
it could be rather problematic.

It is, and your assumption is wrong. You can cast any pointer
to an object to void*, but all you are guaranteed then is that
you can cast it back to the original type without loss of
information. There are also guarantees concerning accessing
objects as arrays of char or unsigned char. All of which,
together, more or less imply that sizeof(void*) == sizeof(char*)
(an explicit requirement in the C standard), and that
sizeof(void*) >= sizeof(T*) for all object types T. I've worked
on systems where char* was larger than int*, and of course,
systems where void (*)() had a different size than char* were
quite frequent once upon a time---you can still find their
descendants in use today. (The same systems often had a size_t
which was smaller than a data pointer. And in some cases, data
pointers or function pointers which were larger than any
integral type.)

You cannot, of course, convert a pointer to function, or any
pointer to member, to a void*; an attempt to do so is an error,
and requires a diagnostic.

Some standards place stricter requirements on the
implementation: Posix does require object pointers and function
pointers to have the same size and representation, for example.

James Kanze · Nov 13, 2007

Does it have to be? :
sizeof (size_t) >= sizeof (pointer)

Formally, there's no relation. Both can be pretty much anything
the implementation wants. Practically, you have the
relationship reversed: I can't think of a reason why
sizeof(size_t) <= sizeof(char*) would ever fail to hold. (Of course, if
by "pointer", you mean any pointer, and not just a pointer to an
object, anything goes. I've used systems where the size of a
function pointer was two bytes, but size_t and data pointers
were four bytes.)

Ron Natalie · Nov 13, 2007

Juha said:
Is that really so? I thought that it must be possible to cast any
pointer to and from a void*. If there were different-sized pointers then
it could be rather problematic.

Correct. void* has to be the same as char* and that will hold anything.
Pointers to larger objects need not be as big. It is certainly not the
case that you can do the following, and I've worked on machines where it
would fail bizarrely:

char char_array[8];
char* charp = char_array + 1;

int* intp = (int*) charp;
char* charp2 = (char*) intp; // need not compare equal to charp

There's no guarantee that int* need represent all the legal char*
values. On some machines it would shift the low order bits of
the pointer off in the conversion.

I did compiler work and ported UNIX to a Denelcor HEP supercomputer
decades ago. This machine encoded the operand size in the low order
bits of the non-character pointer. This can lead to all sorts of
fun if you manage to do something like this:

int x;
union {
    int* ip;
    short* sp;
} carbide;
carbide.ip = &x;

short* sp = carbide.sp;

*sp = 5; // boom... sp has an "int" sized operand representation

I know this because the BSD UNIX kernel did effectively the above all
over internally.

Juha Nieminen · Nov 14, 2007

Bo said:
You can only reliably cast the pointer back to the original type. So,
as long as void* is among the largest types, other pointers can be
smaller.

At least in gcc on a 32-bit Linux system it seems that a method
pointer is 8 bytes long, while a void* is 4 bytes.

I know this is not related to standard C++ per se, but why does a
method pointer need to be larger than a function pointer? I can't think
of any technical reason for this, because a method cannot be called
through a pointer without an object anyways, so any additional info the
function pointer needs would be in that object, wouldn't it?

Joel Yliluoma · Nov 15, 2007

At least in gcc on a 32-bit Linux system it seems that a method
pointer is 8 bytes long, while a void* is 4 bytes.

I know this is not related to standard C++ per se, but why does a
method pointer need to be larger than a function pointer? I can't think
of any technical reason for this, because a method cannot be called
through a pointer without an object anyways, so any additional info the
function pointer needs would be in that object, wouldn't it?

Because a method pointer can point to a virtual function or a non-virtual
function, and when declaring the method pointer, you cannot know where it
will point to.

Say, you have this:

class A;
typedef void (A::*Aptr) ();
Aptr ptrtable[2];

Are the pointers stored in ptrtable virtual or not? You don't know.
You don't even know whether A has virtual functions or not, and thus
whether there is a need to express virtual functions. So you need to
be able to represent both.

In fact, they may be both, virtual and non-virtual:

class A
{
public:
    virtual void afunc() { }
    void bfunc() { }
};

int main()
{
    ptrtable[0] = &A::afunc;
    ptrtable[1] = &A::bfunc;
}

This is valid code.

On 64-bit and 32-bit Linux systems, GCC and ICC implement method
pointers as a pair of pointer-size integers, with the following
semantics:

* If the first value is even, it is the address of a non-virtual
  member function. To follow the pointer, add the second value to the
  address of the instance and jump to the address given by the first
  value, with the adjusted address as the "this" pointer.
* If the first value is odd, it indicates a virtual function. In this
  case the actual function address is acquired as follows:
  1. Add the second value to the address of the instance for which you
     are calling the method, and read a pointer (the vtable pointer)
     from that address.
  2. Add the first value, minus 1, to that pointer, and read a pointer
     from the resulting address.
  3. Jump to the address indicated by that pointer.

With testing I couldn't figure out the situations where the second
value would actually be non-zero, but I trust there are some.
On different platforms, the mechanics behind method pointers can
obviously be different.

Ron Natalie · Nov 15, 2007

Joel said:
Are the pointers stored in ptrtable virtual or not? You don't know.
You don't even know whether A has virtual functions or not, and thus
whether there is need to express virtual functions. So you need to
be able.

Further, in the case of virtual/multiple inheritance it needs to be able to
have the offset to adjust the "this" pointer as well.

If your compiler is ABSOLUTELY standards compliant, all pointers to
member functions need to be the same size (regardless of whether
there are virtual / multiple inheritance). This is because there
is no "void*" like super pointer for pointer-to-member and someone
made the stupid-assed decision that you should thus be able to
cast between pointer-to-member types and back without losing
information.

James Kanze · Nov 15, 2007

Ron Natalie wrote:

Further, in the case of virtual/multiple inheritance it needs
to be able to have the offset to adjust the "this" pointer as
well.

If your compiler is ABSOLUTELY standards compliant, all
pointers to member functions need to be the same size
(regardless of whether there are virtual / multiple
inheritance). This is because there is no "void*" like super
pointer for pointer-to-member and someone made the
stupid-assed decision that you should thus be able to cast
between pointer-to-member types and back without losing
information.

I don't think that that's the only reason. You can have a
pointer to a member of an incomplete type, so the compiler
cannot possibly know whether there are virtual functions,
multiple inheritance, etc. or not. VC++ does the optimizations
you refer to, unless you specify otherwise. With the result
that you cannot reliably pass pointers to member functions as
arguments: something like:

class Toto ;

void
f( Toto* p, void (Toto::*f)() )
{
    (p->*f)() ;
}

will not work.

Because you can have pointers to member of an incomplete type,
all pointers to member functions must have the same
representation.

Ben Rudiak-Gould · Nov 28, 2007

Joel said:
On the 64-bit and 32-bit Linux systems, GCC and ICC implement method
pointers as a pair of two pointer-size integers, with the following
semantics:

[ridiculously convoluted semantics omitted]

I don't know why so many compiler writers implement method pointers in such
a complicated way. The easy way to do it is:

* A method pointer is internally just a function pointer (perhaps with a
different calling convention, like fast-this).

* A call x->*p(args...) just does p(x,args...).

* When taking the address of a method, if it can be represented in the
above format, do so; otherwise, generate a proxy function equivalent to

rtntype T::proxy(args) { return method(args); }

and point to that.

This avoids the need for complicated special-case logic for method pointers;
the proxy function always looks the same, and while the code it generates
may be complicated, the logic for generating it is already implemented. What
these other representations amount to is a gratuitous runtime state-machine
implementation of something that could have been compiled to native code
with less implementation effort and probably greater runtime efficiency. Not
to mention that this representation could be easily standardized as part of
an ABI, and is good for implementing delegates.

-- Ben

Ron Natalie · Nov 29, 2007

Ben said:
* When taking the address of a method, if it can be represented in the
above format, do so; otherwise, generate a proxy function equivalent to

rtntype T::proxy(args) { return method(args); }

How is this any more efficient or less convoluted than storing the method
pointer and a constant to add to the "this" pointer?

Ben Rudiak-Gould · Nov 30, 2007

Ron said:
How is this any more efficient or less convoluted than storing the method
pointer and a constant to add to the "this" pointer?

If Base::f() is virtual there's no method pointer you can store, because if
Derived overrides f() and x is a Derived, (x.*&Base::f)() calls
Derived::f(). (This is slightly odd given that x.Base::f() calls Base::f().
If they'd given that semantics to member pointers, none of this complexity
would exist.) So we need to encode a second case for virtual functions in
there somehow, and test it at each call site where the call might be virtual
(i.e. where Base is an incomplete type or contains some virtual method
compatible with the method pointer's type).

If f() is non-virtual, but implemented in a virtual base class of Base, then
we have a method pointer but no offset. This would need another case, except
that the standard doesn't require implementations to handle it. (It says
that &Base::f is a BasicBase::*, BasicBase being the virtual base that
declares f, and that isn't compatible with Base::*.)

What if f() is virtual and implemented in a virtual base class? On most
implementations, this is simpler than the previous case. We can handle it
like any other virtual function, because the compiler generated a
pointer-adjusting thunk to put into the vtable -- and it did that so the
vtable could be a vector of function pointers instead of a vector of
pointer-plus-this-adjustment-with-special-case-for-virtual-base thingies.
Vtable entries are method pointers, and they're always, to my knowledge,
implemented in just the way I'm suggesting that surface-language method
pointers should be. Almost all of the necessary code is already in the compiler.

This technique is certainly faster for the trivial case, and almost
certainly faster for the general non-virtual case, since the pointers are
half the size and each call requires an indirect jump and an unconditional
direct jump instead of an indirect jump and a conditional jump (with
potential misprediction). I see two problems with it. One is that it's
almost certainly slower for virtual methods (two indirect jumps), but I
think pointers to virtual methods are much rarer than pointers to
non-virtual methods in the wild. The other is that you can't implement
semantics-preserving casts from Base::* to Derived::* (or vice versa) in
nontrivial cases without horrible convolutions. The only sensible way I can
see to do it is to turn the cast into

switch (p) {
case &Base::f: return &Derived::f;
case &Base::g: return &Derived::g;
// ...
}

which is only workable if you have some way of guaranteeing that you haven't
generated duplicate thunks for the same method. This isn't a fatal problem
since the standard doesn't require such casts to work (unless you cast the
pointer back before using it). It's akin to casting from void (*)(Base*) to
void (*)(Derived*), which would be even harder to implement.

-- Ben

accumulate instead of for-loop	4	Apr 3, 2008
Difference between operator and function	7	Oct 29, 2006
sizeof of expression & sizeof of type	15	Jun 10, 2006
Something like push_back() for valarray<T>	1	Jan 18, 2008
#error and BOOST_STATIC_ASSERT	2	Apr 29, 2008
Is boost::lexical_cast<>() always bijective?	4	Mar 3, 2008
operator new() and new[]	13	Apr 2, 2006
new, delete and nothrow	4	Nov 23, 2006
