Byte Address Arithmetic Debate


Frederick Gotham

There is a thread currently active on this newsgroup entitled:

"how to calculate the difference between 2 addresses ?"

The thread deals with calculating the distance, in bytes, between two
memory addresses. Obviously, this can only be done if the addresses refer
to elements or members of the same object (or base objects, etc.).

John Carson and I proposed two separate methods.

I disagree with John's solution, and John disagrees with mine. Therefore,
I'd like to present them both here and see what the audience thinks.

Firstly, we shall start off with a simple POD type:

struct MyPOD {
int a;
double b;
void *c;
short d;
bool e;
int f;
};

Given an object of this type, we shall calculate the distance, in bytes,
between the "b" member and the "e" member.

My own method is as follows:

reinterpret_cast<char const volatile*>(&obj.e)
- reinterpret_cast<char const volatile*>(&obj.b)

John's method is as follows:

reinterpret_cast<long unsigned>(&obj.e)
- reinterpret_cast<long unsigned>(&obj.b);

In defence of my own method:

(1) Any byte address can be accurately stored in a char*.

In attack of John's method:

(1) The Standard doesn't necessitate the existence of an integer type
large enough to accommodate a memory address.
(2) Even if such a type exists, the subtraction need not yield the
correct answer (e.g. if each integer 1 represents half a byte, or a quarter
of a byte).

Of course, seeing as how _I_ started this thread, it may be a little biased
toward my own ends, but I hope we get to the bottom of this objectively.
 

David Harmon

On Sun, 19 Nov 2006 20:05:11 GMT in comp.lang.c++, Frederick Gotham wrote:
Given an object of this type, we shall calculate the distance, in bytes,
between the "b" member and the "e" member.

#include <cstddef>
offsetof(MyPOD, e) - offsetof(MyPOD, b)
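A compilable sketch of David's one-liner (written in C++11, so later than this thread), with MyPOD repeated so it stands alone; the constant name kBToE is made up for illustration. Because offsetof is a constant expression, the distance can be checked at compile time. The lower bound in the static_assert relies on the rule that non-static members declared without an intervening access specifier are allocated at increasing addresses, so b, c and d must all fit before e.

```cpp
#include <cstddef>  // offsetof, std::size_t

// MyPOD from the original post, repeated so this compiles on its own.
struct MyPOD {
    int a;
    double b;
    void *c;
    short d;
    bool e;
    int f;
};

// Both offsets are std::size_t (unsigned). Subtracting in this order is safe
// because e is declared after b and therefore has the larger offset.
constexpr std::size_t kBToE = offsetof(MyPOD, e) - offsetof(MyPOD, b);

// b, c and d occupy distinct storage between the start of b and the start of e.
static_assert(kBToE >= sizeof(double) + sizeof(void *) + sizeof(short),
              "b, c and d must fit between the start of b and the start of e");
```

No MyPOD object is needed; the figure is fixed by the type's layout alone.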
 

Frederick Gotham

David Harmon:
On Sun, 19 Nov 2006 20:05:11 GMT in comp.lang.c++, Frederick Gotham wrote:


#include <cstddef>
offsetof(MyPOD, e) - offsetof(MyPOD, b)


I'll rephrase the question:

Given two memory addresses in the form of pointers -- pointer types which
may be different -- calculate the distance in bytes between them. The
pointers refer to parts of the same object.
 

Salt_Peter

Frederick said:
David Harmon:



I'll rephrase the question:

Given two memory addresses in the form of pointers -- pointer types which
may be different -- calculate the distance in bytes between them. The
pointers refer to parts of the same object.

Not that I'm trying deliberately to be a pain in the attic, but what do
you mean by between them?
That's not the same as offset.

struct test
{
int n;
int i;
};

The distance in bytes between instance.n and instance.i of a test object
would be zero, assuming no padding is involved. Remember: To assume ==
makes an ASS out of U and ME.
 

Frederick Gotham

Salt_Peter:
Not that I'm trying deliberately to be a pain in the attic, but what do
you mean by between them?


Let's say that a certain object is located at memory address 14.

Let's say that another object is located at memory address 18.

The distance between them is 4.

That's not the same as offset.

struct test
{
int n;
int i;
};

The distance in bytes between instance.n and instance.i of a test object
would be zero, assuming no padding is involved.


We're just looking for the number of bytes between two addresses.

Let's say that &obj.n == Memory Byte Address 56
Let's say that &obj.i == Memory Byte Address 60

Therefore, the distance between them is 4 bytes.
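Part of the disagreement is over what "between" means. A small sketch, assuming Salt_Peter's struct test, that computes both readings: Frederick's start-to-start distance (using the char* subtraction debated in this thread) and Salt_Peter's gap, the bytes after the end of n and before the start of i. The function names start_to_start and gap_between are hypothetical.

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t

// Salt_Peter's struct from the thread.
struct test {
    int n;
    int i;
};

// Frederick's reading: how many bytes the start of i lies past the start of n.
std::ptrdiff_t start_to_start(const test &t) {
    return reinterpret_cast<const char *>(&t.i) -
           reinterpret_cast<const char *>(&t.n);
}

// Salt_Peter's reading: the bytes strictly between the end of n and the start
// of i, i.e. any padding. This is 0 when the compiler inserts no padding.
std::ptrdiff_t gap_between(const test &t) {
    std::ptrdiff_t distance = start_to_start(t);
    // i is declared after n and members do not overlap, so i starts at or
    // after the end of n.
    assert(distance >= static_cast<std::ptrdiff_t>(sizeof t.n));
    return distance - static_cast<std::ptrdiff_t>(sizeof t.n);
}
```

With 4-byte ints and no padding, as in the addresses 56 and 60 above, start_to_start gives 4 and gap_between gives 0, so both posters are describing the same layout.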

Remember: To assume == makes an
ASS out of U and ME.

Should I understand that somehow?
 

Greg

Frederick said:
Salt_Peter:



Let's say that a certain object is located at memory address 14.

Let's say that another object is located at memory address 18.

The distance between them is 4.




We're just looking for the number of bytes between two addresses.

Let's say that &obj.n == Memory Byte Address 56
Let's say that &obj.i == Memory Byte Address 60

Therefore, the distance between them is 4 bytes.

There is no guarantee that converting a pointer to an integer value
will produce the logical address of the referenced object. So neither
of the two approaches is certain to be portable. In fact, the only
portable approach available is to use the offsetof macro - either to
calculate the distance between the start of a POD object and one of its
members, or between any two members of the same object:

std::abs( static_cast<std::ptrdiff_t>(offsetof(MyPOD, e))
- static_cast<std::ptrdiff_t>(offsetof(MyPOD, b)) );
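Because offsetof yields std::size_t, an unsigned type, subtracting a larger offset from a smaller one wraps around to a huge value instead of going negative, so an order-independent distance needs a little care. A sketch (C++11) of one helper, offset_distance, a hypothetical name, that compares before subtracting and so never leaves unsigned arithmetic:

```cpp
#include <cstddef>  // offsetof, std::size_t

// MyPOD from the original post.
struct MyPOD {
    int a;
    double b;
    void *c;
    short d;
    bool e;
    int f;
};

// Absolute difference of two offsets: always subtracts the smaller from the
// larger, so the unsigned result cannot wrap.
constexpr std::size_t offset_distance(std::size_t x, std::size_t y) {
    return x > y ? x - y : y - x;
}

// Usable in constant expressions, like offsetof itself.
constexpr std::size_t kDistance =
    offset_distance(offsetof(MyPOD, b), offsetof(MyPOD, e));
```

The argument order no longer matters, which is the property std::abs was meant to supply.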

Greg
 

Frederick Gotham

Greg:
There is no guarantee that converting a pointer to an integer value
will produce the logical address of the referenced object. So neither
of the two approaches is certain to be portable.


My claim is that the char* method is perfect.

#include <cstddef>

template<class A,class B>
std::ptrdiff_t BytesBetween(A const &a,B const &b)
{
return reinterpret_cast<char const volatile*>(&b)
- reinterpret_cast<char const volatile*>(&a);
}

Of course, both "a" and "b" must refer to parts of the same object.
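A usage sketch of Frederick's template, copied unchanged apart from layout, applied to MyPOD. The check compares the result with the offsetof figure; the two agree on mainstream implementations, but whether the Standard guarantees the pointer subtraction itself is exactly what is disputed in this thread. check_bytes_between is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t, offsetof

// Frederick's function template from the thread.
template <class A, class B>
std::ptrdiff_t BytesBetween(A const &a, B const &b) {
    return reinterpret_cast<char const volatile *>(&b) -
           reinterpret_cast<char const volatile *>(&a);
}

// MyPOD from the original post.
struct MyPOD {
    int a;
    double b;
    void *c;
    short d;
    bool e;
    int f;
};

// Measures b-to-e on a live object and compares it with the offsetof figure.
void check_bytes_between() {
    MyPOD obj = MyPOD();
    std::ptrdiff_t measured = BytesBetween(obj.b, obj.e);
    std::ptrdiff_t expected = static_cast<std::ptrdiff_t>(offsetof(MyPOD, e)) -
                              static_cast<std::ptrdiff_t>(offsetof(MyPOD, b));
    assert(measured == expected);
    assert(BytesBetween(obj.e, obj.b) == -measured);  // swapping arguments flips the sign
}
```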
 

John Carson

Frederick Gotham said:
There is a thread currently active on this newsgroup entitled:

"how to calculate the difference between 2 addresses ?"

The thread deals with calculating the distance, in bytes, between two
memory addresses. Obviously, this can only be done if the addresses
refer to elements or members of the same object (or base objects,
etc.).

John Carson and I proposed two separate methods.

I disagree with John's solution, and John disagrees with mine.
Therefore, I'd like to present them both here and see what the
audience thinks.

Just to be clear: I don't claim my approach is more correct than yours. I
think they both involve implementation-defined behavior according to the
Standard. Both will usually work in practice. My preference for converting
to an integer is more of an aesthetic one. The aesthetics may differ
depending on the exact nature of the problem.
Firstly, we shall start off with a simple POD type:

struct MyPOD {
int a;
double b;
void *c;
short d;
bool e;
int f;
};

Given an object of this type, we shall calculate the distance, in
bytes, between the "b" member and the "e" member.

My own method is as follows:

reinterpret_cast<char const volatile*>(&obj.e)
- reinterpret_cast<char const volatile*>(&obj.b)

John's method is as follows:

reinterpret_cast<long unsigned>(&obj.e)
- reinterpret_cast<long unsigned>(&obj.b);

I wish to cast it to a pointer-sized integer. This is not synonymous with
long unsigned. Indeed on Win64, long unsigned is smaller than pointer-sized
(crazy, I know), but a pointer-sized integer nevertheless exists.
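For comparison, a sketch of John's approach using std::uintptr_t, the pointer-sized unsigned integer that C++11 later added to <cstdint> (the C++03 Standard under discussion has no such header). std::uintptr_t is itself optional, which matches Frederick's point (1), and the pointer-to-integer mapping is implementation-defined, so the difference equals the byte distance only where that mapping is the plain byte address. integer_distance is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>  // std::uintptr_t (an optional type, even in C++11)

// MyPOD from the original post.
struct MyPOD {
    int a;
    double b;
    void *c;
    short d;
    bool e;
    int f;
};

// John's integer method: convert both addresses, then subtract the integers.
// Only meaningful where pointer-to-integer conversion yields a flat byte address.
std::uintptr_t integer_distance(const MyPOD &obj) {
    std::uintptr_t b = reinterpret_cast<std::uintptr_t>(&obj.b);
    std::uintptr_t e = reinterpret_cast<std::uintptr_t>(&obj.e);
    // Under such a flat mapping, e (declared after b) yields the larger integer.
    assert(e >= b);
    return e - b;
}
```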
In defence of my own method:

(1) Any byte address can be accurately stored in a char*.

Any pointer can be cast to char*. However, by Section 5.2.10/3:

"The mapping performed by reinterpret_cast is implementation-defined. [Note:
it might, or might not, produce a representation different from the original
value. ]"

This applies equally to my method.
In attack of John's method:

(1) The Standard doesn't necessitate the existence of an integer
type large enough to accommodate a memory address.

True, but not an issue on most platforms.
(2) Even if such a type exists, the subtraction need not yield the
correct answer (e.g. if each integer 1 represents half a byte, or a
quarter of a byte).

If your cast can produce "a representation different from the original
value", I don't see that it offers an advantage. Moreover, Section 5.2.10/4
says that the conversion to an integer value "is intended to be unsurprising
to those who know the addressing structure of the underlying machine", which
provides an assurance of sorts for my preferred approach.

Finally, I point out that the Standard doesn't guarantee an integer type
large enough to store the result of the subtraction (See Section 5.7/6).
Once again, both approaches rely on an implementation-defined feature (or on
the choice of suitable addresses to compare).
 

Greg

Frederick said:
Greg:



My claim is that the char* method is perfect.

#include <cstddef>

template<class A,class B>
std::ptrdiff_t BytesBetween(A const &a,B const &b)
{
return reinterpret_cast<char const volatile*>(&b)
- reinterpret_cast<char const volatile*>(&a);
}

Of course, both "a" and "b" must refer to parts of the same object.

In order to subtract pointer a from pointer b, both a and b must point
to the same kind of object, and the objects they point to must both be
elements of the same array. Since the BytesBetween() function template
observes neither of these requirements, there is no guarantee that its
behavior will be defined.

"Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is
undefined." [§5.7/7]

C++ would not need the offsetof macro if there were another, portable
way to calculate the distance between two members of an object.

Greg
 

Kai-Uwe Bux

Greg said:
C++ would not need the offsetof macro if there were another, portable
way to calculate the distance between two members of an object.

That seems incorrect: the difficulty with the offsetof macro is the need for
compile-time evaluation. That makes it impossible to create an instance of
the struct type and measure offsets of its members. Thus, even if you had a
perfectly fine method of computing distances of members of an object, it
would not help in writing an offsetof macro.
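Kai-Uwe's point is that offsetof must work with no object at hand. A sketch (C++11) of what that buys, reusing MyPOD: the result can size an array or feed a static_assert at compile time, which no measurement taken on a live object at run time could do. The names kOffsetOfE and prefix_bytes are hypothetical.

```cpp
#include <cstddef>  // offsetof, std::size_t

// MyPOD from the original post.
struct MyPOD {
    int a;
    double b;
    void *c;
    short d;
    bool e;
    int f;
};

// No MyPOD object exists anywhere in this snippet, yet the offset is known.
constexpr std::size_t kOffsetOfE = offsetof(MyPOD, e);

// A compile-time constant can be an array bound: room for every byte before e.
unsigned char prefix_bytes[kOffsetOfE];

static_assert(sizeof prefix_bytes == offsetof(MyPOD, e),
              "array bound taken straight from offsetof");
```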


Best

Kai-Uwe Bux
 

David Harmon

On Sun, 19 Nov 2006 20:58:34 GMT in comp.lang.c++, Frederick Gotham wrote:
I'll rephrase the question:

I'll still dodge it.
Eschew undefined behavior.
Cast not thy pointers into the void.
 

Greg

Kai-Uwe Bux said:
That seems incorrect: the difficulty with the offsetof macro is the need for
compile-time evaluation. That makes it impossible to create an instance of
the struct type and measure offsets of its members. Thus, even if you had a
perfectly fine method of computing distances of members of an object, it
would not help in writing an offsetof macro.

Counting the number of bytes from the start of an object to one of its
members is not the only way to express the distance. But since the
requirement in this case is to provide a byte measurement of the
distance, the offsetof macro is the only portable way to obtain that
figure.

Requiring that the offset of a class member be expressed in bytes is of
course a completely artificial constraint - no C++ program would ever
face such a limitation. After all, no program calls offsetof simply to
obtain a number. Instead the number that offsetof returns is useful
only insofar as the program can use that value to gain access to the
specified class member given a pointer to a class object.

In C++, member access through an object pointer is already possible by
applying a member pointer to the object pointer. A member pointer
essentially abstracts the offset of a class member, and hides the
implementation details from the C++ program. So although a C++ program
cannot recover the byte distance of the offset that is stored within a
member pointer - a member pointer is still more useful than the
offsetof macro since a member pointer is not limited to members of POD
classes only.
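A sketch of Greg's alternative: a pointer to member designates a member without exposing its byte offset, and unlike offsetof in C++03 it is not restricted to POD classes. Person, member_of and check_member_pointers are hypothetical names; Person holds a std::string, so it is not a POD.

```cpp
#include <cassert>
#include <string>

// Not a POD: std::string has a non-trivial constructor and destructor.
struct Person {
    std::string name;
    int age;
    int height_cm;
};

// Access whichever int member m designates; the offset stays hidden inside m.
int &member_of(Person &p, int Person::*m) {
    return p.*m;
}

void check_member_pointers() {
    Person p;
    p.name = "Ada";
    p.age = 36;
    p.height_cm = 165;
    int Person::*field = &Person::age;
    assert(member_of(p, field) == 36);
    field = &Person::height_cm;  // same code, different member
    member_of(p, field) = 170;
    assert(p.height_cm == 170);
}
```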

Greg
 

Frederick Gotham

John Carson:
I think they both involve implementation-defined behavior according to
the Standard. Both will usually work in practice.


My own claim is that _my_ code is perfectly fine. I also claim that your
code is not OK, even though I acknowledge it would work on a lot of
systems.

I could imagine a system which doesn't have 8-bit bytes, but which has a
layer between the machine and the C implementation that makes you think
there are 8-bit bytes. Let's say that the machine actually has 4-bit bytes.
When you cast to integer type and subtract, your result might be double
what you thought it would be.

Any pointer can be cast to char*. However, by Section 5.2.10/3:

"The mapping performed by reinterpret_cast is implementation-defined.
[Note: it might, or might not, produce a representation different from
the original value. ]"


There are several exceptions to the whole "reinterpret_cast is a wild
animal" idea. Casting to char* or void* is one of them. Another would be
casting from a POD pointer to a pointer to the first member in the POD.
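The POD-to-first-member case is the one the 2003 Standard spells out in 9.2/17, which John cites further down. A minimal sketch with a hypothetical struct Pair and a hypothetical check function:

```cpp
#include <cassert>

// Hypothetical POD type for the demonstration.
struct Pair {
    int first;
    double second;
};

// A pointer to a POD-struct, converted with reinterpret_cast, points to the
// struct's first member, and the conversion also works in reverse.
void check_first_member() {
    Pair s = Pair();
    int *p = reinterpret_cast<int *>(&s);
    assert(p == &s.first);
    *p = 42;  // writes through to s.first
    assert(s.first == 42);
    assert(reinterpret_cast<Pair *>(p) == &s);
}
```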

True, but not an issue on most platforms.


On every platform though, the char* subtraction will work.

Moreover, Section
5.2.10/4 says that the conversion to an integer value "is intended to be
unsurprising to those who know the addressing structure of the
underlying machine", which provides an assurance of sorts for my
preferred approach.


What if we're working with the 4-bit system disguised as an 8-bit system?

Finally, I point out that the Standard doesn't guarantee an integer type
large enough to store the result of the subtraction (See Section 5.7/6).
Once again, both approaches rely on an implementation-defined feature
(or on the choice of suitable addresses to compare).


Are you sure about that? The purpose of ptrdiff_t is to store the result of
subtracting two pointers. Presumably, if the subtraction of the pointers is
valid, then the type should be able to hold the value.
 

Frederick Gotham

Frederick Gotham:
There are several exceptions to the whole "reinterpret_cast is a wild
animal" idea. Casting to char* or void* is one of them. Another would be
casting from a POD pointer to a pointer to the first member in the POD.


In the past, I've seen people so fearful of reinterpret_cast that they write:

char *p = static_cast<char*>(static_cast<void*>(&obj));

I myself just write:

char *p = (char*)&obj;
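On current compilers the three spellings agree. C++11 made this explicit: for standard-layout types (with the target's alignment no stricter), reinterpret_cast between object pointers is defined as static_cast through void*, and a C-style cast from an object pointer to char* performs that same reinterpret_cast. A sketch with a hypothetical struct Widget and hypothetical function names:

```cpp
#include <cassert>

// Hypothetical standard-layout type for the demonstration.
struct Widget {
    int x;
    double y;
};

// The cautious spelling: go through void* with two static_casts.
char *via_void(Widget &w) { return static_cast<char *>(static_cast<void *>(&w)); }

// The direct spelling.
char *via_reinterpret(Widget &w) { return reinterpret_cast<char *>(&w); }

// Frederick's spelling: a C-style cast, which resolves to reinterpret_cast here.
char *via_c_style(Widget &w) { return (char *)&w; }

void check_spellings_agree() {
    Widget w = Widget();
    assert(via_void(w) == via_reinterpret(w));
    assert(via_reinterpret(w) == via_c_style(w));
}
```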
 

John Carson

Frederick Gotham said:
John Carson:

I could imagine a system which doesn't have 8-bit bytes, but which
has a layer between the machine and the C implementation that makes
you think there are 8-bit bytes. Let's say that the machine actually
has 4-bit bytes. When you cast to integer type and subtract, your
result might be double what you thought it would be.

That would depend on the implementation.
There are several exceptions to the whole "reinterpret_cast is a wild
animal" idea. Casting to char* or void* is one of them. Another would
be casting from a POD pointer to a pointer to the first member in the
POD.

The effect of reinterpret_cast on a POD pointer is specified in the Standard
(section 9.2/17). The others are not as far as I am aware.
On every platform though, the char* subtraction will work.

The char* cast will work. The subtraction isn't guaranteed.
What if we're working with the 4-bit system disguised as an 8-bit
system?

I don't know, but the implementation should say what would happen.
Are you sure about that? The purpose of ptrdiff_t is to store the
result of subtracting two pointers. Presumably, if the subtraction of
the pointers is valid, then the type should be able to hold the value.

I can only go by the Standard, which I have already quoted in the previous
thread. The result of such a subtraction is a signed type and as such has a
maximum absolute value only half the size of the largest value supported by
the corresponding unsigned type. If addresses can have any value covered by
the unsigned type, this creates the possibility of overflow.
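John's overflow concern can be made concrete. A sketch (C++11), assuming the common case where PTRDIFF_MAX is smaller than SIZE_MAX, of a hypothetical guard, difference_fits, that checks whether the difference of two byte indices into one array fits in std::ptrdiff_t before any pointers are subtracted:

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // PTRDIFF_MAX, SIZE_MAX

// True when subtracting byte index j from byte index i of the same array
// yields a value representable as std::ptrdiff_t. Requiring the magnitude to
// be <= PTRDIFF_MAX is conservative, since PTRDIFF_MIN is at most -PTRDIFF_MAX.
constexpr bool difference_fits(std::size_t i, std::size_t j) {
    return (i >= j ? i - j : j - i) <= static_cast<std::size_t>(PTRDIFF_MAX);
}

// Assumption of this sketch: ptrdiff_t has one fewer value bit than size_t,
// so the largest size_t index cannot be reached by a ptrdiff_t difference.
static_assert(!difference_fits(SIZE_MAX, 0),
              "assumed: PTRDIFF_MAX < SIZE_MAX on this platform");
```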
 

Frederick Gotham

John Carson:

(Referring to pointer arithmetic)
The result of such a subtraction is a signed type and
as such has a maximum absolute value only half the size of the largest
value supported by the corresponding unsigned type. If addresses can
have any value covered by the unsigned type, this creates the
possibility of overflow.


I think though that this argument can be countered by a combination of the
following excerpts from the Standard.

3.9/2
For any object (other than a base-class subobject) of POD type T, whether
or not the object holds a valid value of type T, the underlying bytes (1.7)
making up the object can be copied into an array of char or unsigned
char.[36] If the content of the array of char or unsigned char is copied
back into the object, the object shall subsequently hold its original
value.

Therefore, we can do the following:

#include <cstring>

double arr[64] = { ... };

char unsigned buf[sizeof arr];

std::memcpy(buf,arr,sizeof buf);
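A runnable sketch of this step, with concrete values of my own standing in for the elided initializer: the bytes go out to buf and back into the same object, as 3.9/2 permits, and because buf is a genuine array of unsigned char, subtracting pointers into it is ordinary pointer arithmetic. The function name round_trip_through_bytes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t
#include <cstring>  // std::memcpy, std::memset

void round_trip_through_bytes() {
    double arr[4] = {1.5, -2.0, 3.25, 0.0};
    const double expected[4] = {1.5, -2.0, 3.25, 0.0};

    unsigned char buf[sizeof arr];
    std::memcpy(buf, arr, sizeof buf);  // object -> bytes
    std::memset(arr, 0, sizeof arr);    // clobber the original
    std::memcpy(arr, buf, sizeof arr);  // bytes -> the same object

    for (int k = 0; k < 4; ++k)
        assert(arr[k] == expected[k]);

    // buf is an array, so pointers to its elements (and one past the end)
    // may be subtracted.
    std::ptrdiff_t span = (buf + sizeof buf) - buf;
    assert(span == static_cast<std::ptrdiff_t>(sizeof arr));
}
```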

The array object, "buf", is a fully-fledged object type.

Now let's read about ptrdiff_t:

5.7/6
When two pointers to elements of the same array object are subtracted, the
result is the difference of the subscripts of the two array elements. The
type of the result is an implementation-defined signed integral type; this
type shall be the same type that is defined as ptrdiff_t in the <cstddef>
header (18.1). As with any other arithmetic overflow, if the result does
not fit in the space provided, the behavior is undefined. In other words,
if the expressions P and Q point to, respectively, the i-th and j-th
elements of an array object, the expression (P)-(Q) has the value i–j
provided the value fits in an object of type ptrdiff_t.
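The key word in 5.7/6 is "subscripts": subtracting two double* into the same array yields an element count, not a byte count, which is why both methods in this thread convert to a character or integer type first. A sketch of the difference (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t

void check_subscript_vs_bytes() {
    double arr[10] = {};
    // Same-typed pointers: the result is i - j, counted in elements.
    std::ptrdiff_t elements = &arr[7] - &arr[2];
    // Character pointers: the same two addresses, counted in bytes.
    std::ptrdiff_t bytes = reinterpret_cast<const char *>(&arr[7]) -
                           reinterpret_cast<const char *>(&arr[2]);
    assert(elements == 5);
    assert(bytes == 5 * static_cast<std::ptrdiff_t>(sizeof(double)));
}
```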

I'm glad to see we're agreed that the casting to char* is OK. What I find
annoying though is the situation with ptrdiff_t... I'm going to take this
over to comp.std.c++.
 

Steve Pope

Frederick Gotham said:
I'll rephrase the question:
Given two memory addresses in the form of pointers -- pointer types which
may be different -- calculate the distance in bytes between them. The
pointers refer to parts of the same object.

You can't. You can only subtract pointers if they are pointing
to the same type of object, and then only if the pointed-to
objects are elements of the same array of such objects.

And even then, you will not necessarily get the distance in bytes.

Just my opinion.

Steve
 

Frederick Gotham

Steve Pope:
You can't. You can only subtract pointers if they are pointing
to the same type of object, and then only if the pointed-to
objects are elements of the same array of such objects.

And even then, you will not necessarily get the distance in bytes.

Just my opinion.


I don't see why there would be anything wrong with the following:

struct SomePOD {
int a;
char b;
int arr[5];
};

struct Base {
double a;
SomePOD b;
void *c;
};

struct Derived : Base {
double d;
Base e;
};

#include <cstddef>

template<class A,class B>
std::ptrdiff_t BytesBtwn(A const *const p,B const *const q)
{
return (char const volatile*)q - (char const volatile*)p;
}

int main()
{
Derived const volatile obj = Derived();

std::ptrdiff_t const i = BytesBtwn(obj.b.arr+2,&obj.e.b.a);
}
 

Steve Pope

Frederick Gotham said:
Steve Pope:
I don't see why there would be anything wrong with the following:

struct SomePOD {
int a;
char b;
int arr[5];
};

struct Base {
double a;
SomePOD b;
void *c;
};

struct Derived : Base {
double d;
Base e;
};

#include <cstddef>

template<class A,class B>
std::ptrdiff_t BytesBtwn(A const *const p,B const *const q)
{
return (char const volatile*)q - (char const volatile*)p;
}

int main()
{
Derived const volatile obj = Derived();

std::ptrdiff_t const i = BytesBtwn(obj.b.arr+2,&obj.e.b.a);
}

This would not give the difference in bytes on architectures
for which the address of an int is a word address.

(Now, I admit not having seen such an architecture for 20
years or so, but they may still be around.)

Steve
 

Frederick Gotham

Steve Pope:
This would not give the difference in bytes on architectures
for which the address of an int is a word address.


Sorry I don't understand, could you please explain that?
 
