Alignment

Ark Khasin

Keith Thompson wrote:
The alignment of any type must be a factor of its size.
<snip>

struct {short a, b, c;} on my machine (ARM7) is 2 bytes aligned and its
size is 6.
 
Harald van Dĳk

Ark said:
Keith Thompson wrote:

<snip>

struct {short a, b, c;} on my machine (ARM7) is 2 bytes aligned and its
size is 6.

And 2 is a factor of 6, is it not?
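
The same check can be written down in code. This is a minimal sketch, assuming a C11 compiler (`_Alignof` and `_Static_assert` were standardized after this thread was written); the struct tag and the enum names are made up for the illustration.

```c
#include <stddef.h>

/* The struct from the post above; the tag name is invented for this sketch. */
struct three_shorts { short a, b, c; };

/* C11 _Alignof reports the alignment requirement of a type. */
enum {
    THREE_SHORTS_SIZE  = sizeof(struct three_shorts),
    THREE_SHORTS_ALIGN = _Alignof(struct three_shorts)
};

/* Array elements are contiguous, so the size has to be a whole
   multiple of the alignment; otherwise element 1 would be misaligned. */
_Static_assert(sizeof(struct three_shorts) % _Alignof(struct three_shorts) == 0,
               "size must be a multiple of alignment");
```

On the ARM7 described above the two constants would presumably come out as 6 and 2; other implementations may pick different values, but the divisibility check should still pass.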
 
Ark Khasin

Peter said:
Addresses aren't numbers, but each address includes at least one number
(if it didn't, pointer arithmetic would not be possible). In a flat
address space (the usual case today), the address consists only of a
single number. In a segmented address space, it contains (at least) two
numbers: A segment number and a segment offset, where arithmetic is
often only meaningful on the offset. An address can contain other
information (type information, flags, etc.).

Of course you can interpret any collection of bits as a number ...

hp
Not so, IMHO.
A "number" is not a collection of bits but a member of a set on which
common algebraic operations are defined. (And it may not matter how it
is represented).
There are 4 functions defined on addresses in C:
- typecast, and, for a non-void* type (operation: results in an address)
- adding a number (in the common sense of the word) (operation)
- subtracting a number (operation)
- subtracting two addresses (with a non-address result, so it is not
an operation)
I don't see how it makes addresses include a number and what that is
supposed to mean
Again, I generally "know" how computers work, but I don't think this
knowledge is portable to the C model of addresses.
-- Ark
 
Eric Sosman

Ark Khasin wrote On 08/20/07 17:09,:
Not so, IMHO.
A "number" is not a collection of bits but a member of a set on which
common algebraic operations are defined. (And it may not matter how it
is represented).

He didn't say a collection of bits "is" a number, but
that it can be "interpreted as" a number. He's right: there
is a mapping (many mappings, in fact) from collections of
bits to numbers.
There are 4 functions defined on addresses in C:

Lots more than four.
- typecast, and, for a non-void* type (operation: results in an address)
- adding a number (in the common sense of the word) (operation)

If you're arguing that addresses are not numbers, not
even a little bit, then what is the "common sense of the
word" you refer to?
- subtracting a number (operation)
- subtracting two addresses (with a non-address result, so it is not
an operation)

Why is this not an operation? The dot product of two
vectors yields a scalar rather than a vector; does that mean
the dot product is not an operation?
I don't see how it makes addresses include a number and what that is
supposed to mean
Again, I generally "know" how computers work, but I don't think this
knowledge is portable to the C model of addresses.

C is rather non-specific about the nature of addresses,
because there have been many addressing schemes in the past
and there may well be more in the future. By and large, C
specifies only a few properties of addresses but does not
describe the model that underlies them. Any model will do,
if it can support adding integers to pointers (in restricted
ranges), subtracting pointers (suitably related), comparing
pointers for equality, comparing them for order (suitably
related), copying their values from place to place, and
so on.
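
As a rough illustration of that list, here is a sketch of the few operations the language actually promises, each with its restriction noted. The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in elements between two pointers. Defined only when both
   point into (or one past the end of) the same array. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return last - first;
}

/* Steps a pointer forward by n elements. The caller must keep the
   result inside the array or exactly one past its end. */
const int *advance(const int *p, size_t n)
{
    assert(p != NULL);
    return p + n;
}

/* Ordering is likewise defined only for pointers into the same array. */
int comes_before(const int *a, const int *b)
{
    return a < b;
}
```

Nothing here says how the pointer is represented; any addressing model that can honor these three functions within one array is good enough.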

Alignment is one of the few places where C talks about
a property of addresses that C itself doesn't "need." It
describes alignment in terms of a divisibility condition
(hence implying some number-like properties), but doesn't
get into specifics. Why does the Standard tread on this
shaky ground? Because it needs to be able to require that
malloc() return addresses that satisfy whatever alignment
requirements the machine might impose, for example. C's
model of addresses would be perfectly satisfactory if there
were no such thing as alignment -- but since real machines
often *do* impose alignment requirements, the Standard needs
to be able to limit how much trouble they're allowed to make.
C needs to talk about alignment in order to tolerate it.
 
CBFalconer

Perhaps you are unaware how slow a floating-point comparison is
versus bitwise operations on an int?

So what? I would rather have the program do what it was designed
for, than do something else quicker.
 
CBFalconer

Keith said:
Not quite. chars cannot require anything stricter than byte
alignment. Furthermore, your scheme would not allow any legal
alignment for 'struct { char c; short s; }'.

The alignment of any type must be a factor of its size. It's not
required that all alignments are multiples or factors of each other,
but I've never heard of an implementation where they aren't. For
example, if int requires 3-byte alignment and float requires 4-byte
alignment, then a pointer returned by malloc() must point to a chunk
of memory that's at least 12-byte aligned.

I am not pushing "my scheme", just pointing out that the proposed
means of discovering alignment requirements are not necessarily
viable.
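
The 12 in Keith's hypothetical is just the least common multiple of 3 and 4. A small sketch of that arithmetic follows; the function names are invented, and alignments that are not powers of two are purely hypothetical here, as in the quoted example.

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor by Euclid's algorithm. */
static size_t gcd(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Smallest alignment that satisfies both requirements at once:
   the least common multiple of the two. */
size_t combined_alignment(size_t a, size_t b)
{
    assert(a != 0 && b != 0);
    return a / gcd(a, b) * b;
}
```

With the usual power-of-two alignments the result is simply the larger of the two, which is why the question rarely comes up in practice.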
 
pete

CBFalconer said:
I am not pushing "my scheme", just pointing out that the proposed
means of discovering alignment requirements are not necessarily
viable.

Type punning is not what C is all about anyway.

C is designed to allow any object to be considered
as an array of unsigned char (easily).
But aside from that, aiming a pointer to one type,
at an object of another type,
is *supposed to be* tricky business.
 
Keith Thompson

CBFalconer said:
I am not pushing "my scheme", just pointing out that the proposed
means of discovering alignment requirements are not necessarily
viable.

I didn't mean to imply that you're pushing what I called your "scheme"
(I think "hypothetical implementation" would have been a better choice
of words), and I agree with your conclusion. I was just pointing out
a conforming implementation cannot require odd addresses for chars.
The rest of your hypothetical requirements would be perfectly ok, as
long as each type's size is a multiple of its alignment (arrays can't
have gaps).

It might have been useful for the standard to define a macro
ALIGN_MAX, specifying the maximum alignment of any type (typically 4
or 8), and perhaps a typedef for some arbitrary type that requires
that alignment. Most code wouldn't need it, but it would make it
possible to write a portable malloc-like function.
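
For readers arriving later: C11, published after this thread, added something very close to this, namely the type `max_align_t` in `<stddef.h>` together with the `_Alignof` operator. A minimal sketch, assuming a C11 compiler; the macro name `ALIGN_MAX` is taken from the post and is not defined by the Standard.

```c
#include <stddef.h>   /* max_align_t, since C11 */

/* Hypothetical macro echoing the wish above; the Standard itself only
   provides the type max_align_t and the _Alignof operator. */
#define ALIGN_MAX (_Alignof(max_align_t))

/* The strictest fundamental alignment should cover the ordinary types. */
_Static_assert(ALIGN_MAX >= _Alignof(double), "covers double");
_Static_assert(ALIGN_MAX >= _Alignof(void *), "covers object pointers");
```

Note that this covers fundamental alignments only; types declared with an extended alignment may require more.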
 
Chris Torek

However, as others have noted, finding the correct bit is nontrivial.
For full generality you must use also "unsigned char" to access
the correct byte, to avoid tripping over "trap representations" on
weird machines that use ones' complement or some such.

Perhaps you are unaware how slow a floating-point comparison is versus
bitwise operations on an int?

Except, perhaps, if floating-point comparison is *faster*. It is
on a number of architectures, given some reasonable assumptions,
such as "value to be tested is currently in a floating-point
register". To examine a floating-point register using integer
instructions, you must store this register to memory, then load
from memory into an integer register. The transfer through memory
can take many clock cycles, during which you could have performed
several single-cycle (or sub-1-cycle) FPU instructions.

(Even on the x86, you have to transfer from FPU stack to memory
in order to perform integer operations.)

(The need to use memory to copy between FPU and integer registers
is a moderately large problem on some architectures. The original
SPARC calling convention, in particular, passed floating-point
values in the integer registers, at noticeable runtime expense,
and even some annoying increases in code size. This was fixed in
the 64-bit calling conventions, where floating parameters are passed
-- and values returned -- in the FPU registers, while integer values
are sent via integer registers. This makes implementing printf()
and other variable-argument functions a bit trickier, so there is
a tradeoff.)
 
Eric Sosman

Perhaps you are unaware how slow a floating-point comparison is versus
bitwise operations on an int?

Perhaps you are unaware that the presence of floating-
point operands in an expression does not necessarily imply
the use of floating-point hardware to evaluate them?

Perhaps you are unaware that it may take considerable
time to store a `value' already resident in an F-P register
and re-fetch it into the integer ALU?

Perhaps you are unaware that the compilers' optimizers
exploit platform-specific and non-portable knowledge that is
not expressible in C source code?

True story: I once found myself porting a Postscript
engine a PPOE had acquired. The code dealt with a lot of
`float' values, and had apparently at some time in its
history been targeted for a machine where F-P was slow --
maybe simulated in software. In the pre-Standard C of the
time, `float' function arguments were always promoted to
`double' and then demoted to `float' again, and someone had
figured out that this took a lot of unnecessary time in the
F-P simulator. So the functions that would ordinarily have
taken `float' arguments instead looked like

    func(ptr)
    int *ptr; /* it's K&R style, remember? */
    {
        float val = *((float*)ptr);
        ...
    }

... and the calls all looked like

    float x;
    x = 42.0f;
    func (&x);

This may have been an optimization for the F-P-weak
original target, but it was a mighty pessimization for the
F-P-capable machine I was trying to port it to. Practically
every function call involved mutual stalls between the integer
and F-P pipelines, each waiting for an operand the other had
yet to deliver ... Gaaaahhh, what a mess! The code's authors
may have had excellent reason to play this ugly game, and I
didn't curse them for it -- but boy, oh, boy, did I curse and
curse and curse those at my company who decided to buy it.

Write what you mean. Don't be clever until measurement
proves you need to be.
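
To make the contrast concrete, here is a sketch of both styles. It assumes, purely for illustration, that the test being argued about is a sign test; the thread excerpt does not say so. The bit-level version further assumes a 32-bit IEEE 754 `float` whose byte order matches `uint32_t`. Both function names are invented.

```c
#include <stdint.h>
#include <string.h>

/* Says what it means; the compiler picks the instructions. */
int is_negative_plain(float x)
{
    return x < 0.0f;
}

/* The "clever" version. Assumes float is a 32-bit IEEE 754 value with
   the sign in the top bit and the same byte order as uint32_t. */
int is_negative_bits(float x)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof x, "sketch assumes a 32-bit float");
    memcpy(&bits, &x, sizeof bits);   /* copy the bytes; no pointer punning */
    return (bits >> 31) != 0;
}
```

The two are not even equivalent: under IEEE 754 the bit version reports negative zero as negative while the comparison does not, which is one more reason to measure before reaching for it.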
 
Ark Khasin

Eric said:
Ark Khasin wrote On 08/20/07 17:09,:

He didn't say a collection of bits "is" a number, but
that it can be "interpreted as" a number. He's right: there
is a mapping (many mappings, in fact) from collections of
bits to numbers.
Just meant to say that numbers are a set closed w.r.t. "operations" of
addition, subtraction and multiplication
But I agree to be flexible with definitions
Lots more than four.


If you're arguing that addresses are not numbers, not
even a little bit, then what is the "common sense of the
word" you refer to?
I just meant that adding a number to an address yields an address
But I agree to be flexible with definitions
Why is this not an operation? The dot product of two
vectors yields a scalar rather than a vector; does that mean
the dot product is not an operation?
I'd say no: an operation's result is in the same domain as its operand(s).
But I agree to be flexible with definitions :)
C is rather non-specific about the nature of addresses,
because there have been many addressing schemes in the past
and there may well be more in the future. By and large, C
specifies only a few properties of addresses but does not
describe the model that underlies them. Any model will do,
if it can support adding integers to pointers (in restricted
ranges), subtracting pointers (suitably related), comparing
pointers for equality, comparing them for order (suitably
related), copying their values from place to place, and
so on.
Emphatically agreed

Why does the Standard tread on this
shaky ground? Because it needs to be able to require that
malloc() return addresses that satisfy whatever alignment
requirements the machine might impose, for example.
Ain't it good enough to say that malloc returns (on success) an address
good for storing any data type? (With no alignment in scope :)
C's model of addresses would be perfectly satisfactory if there
were no such thing as alignment
Indeed so. If the Standard said, plainly, that an address good for
storing a short is good for storing a char, etc. -- there would be a tad
less confusion and much less fights over definitions.
If I read the Standard correctly, that's precisely what it is saying
while talking about
- "alignment requirements"
- "correctly aligned"
- "inappropriately aligned"
- "member alignment" etc.
Informal (vs. 3.2) talk there about "align to nibble boundaries" only
adds to the confusion.
IMHO, it is safe to treat the word "alignment" in the standard as an
(unfortunate) figure of speech.
-- but since real machines
often *do* impose alignment requirements, the Standard needs
to be able to limit how much trouble they're allowed to make.
C needs to talk about alignment in order to tolerate it.
I am not sure I can understand this argument.

-- Ark
 
Eric Sosman

Ark said:
Eric said:
[...]
Why does the Standard tread on this
shaky ground? Because it needs to be able to require that
malloc() return addresses that satisfy whatever alignment
requirements the machine might impose, for example.
Ain't it good enough to say that malloc returns (on success) an address
good for storing any data type? (With no alignment in scope :)

The Standard might have chosen to say that the allocated
area shall be suitable for storing any data type whose sizeof
is not too large, but what it actually says is a bit stronger.
Since all alignment requirements are satisfied, the address
returned by malloc() can be stored in any kind of pointer
variable, even on machines where pointers to different types
have different "precision." For example,

double *p = malloc(1); /* assume alignof(double) > 1 */

must not "damage" the returned value by doing something like
discarding low-order bits that a `double*' doesn't need. A
subsequent free(p) must work properly, meaning that it must
be possible to reconstruct any discarded bits.

The "you can store it if it's not too big" criterion would
allow malloc(1) to return an unaligned address, and on a machine
where converting that address to a `double*' would change it,
that would not be a Good Thing.
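
A toy model may help picture the lost bits. This sketch simulates a hypothetical word-addressed machine with 4-byte words, where a byte pointer is a byte index and a word pointer is a word index. The types and functions are invented for the illustration and do not describe any real implementation.

```c
#include <stdint.h>

/* Hypothetical machine: 4-byte words, word-addressed. */
enum { BYTES_PER_WORD = 4 };

typedef struct { uint32_t byte_index; } byte_ptr;   /* models char*   */
typedef struct { uint32_t word_index; } word_ptr;   /* models double* */

/* Narrowing conversion: the offset within the word has nowhere to go. */
word_ptr to_word_ptr(byte_ptr p)
{
    word_ptr w = { p.byte_index / BYTES_PER_WORD };
    return w;
}

/* Widening conversion: always yields the first byte of the word. */
byte_ptr to_byte_ptr(word_ptr w)
{
    byte_ptr p = { w.word_index * BYTES_PER_WORD };
    return p;
}

/* 1 if converting to a word pointer and back preserves the address. */
int survives_round_trip(byte_ptr p)
{
    return to_byte_ptr(to_word_ptr(p)).byte_index == p.byte_index;
}
```

Byte index 8 survives the round trip and byte index 9 does not, so on such a machine malloc() has to hand out only addresses of the first kind if free() is to see the same value it returned.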

Perhaps the Standard could have avoided talking about
alignment if it instead imposed an "assignable to any pointer"
requirement. But this isn't the only place the notion crops
up; it also appears as the reason structs might be padded, as
the reason you might not be able to store a double at "just
any" address, and so on. I imagine that the writers would
rather not have mentioned it, but the elephant would still
have been in the drawing room.
Indeed so. If the Standard said, plainly, that an address good for
storing a short is good for storing a char, etc. -- there would be a tad
less confusion and much less fights over definitions.

It *does* say that any object can be accessed as an array
of bytes, hence that any address can store a char. But it doesn't
say that any address suitable for a long is also suitable for a
double, or for a long double, or for a struct foobar -- that would
put the Standard in the position of dictating machine design. The
Standard does a lot of fancy footwork to stay out of the way of
machine designers as much as it can.
I am not sure I can understand this argument.

Think about word-addressed machines that use "decorated"
or "expanded" pointers to refer to sub-word entities like
bytes and shorts, and consider the problems of converting
between pointer types on such machines; maybe it will make
more sense then. Or maybe not, which is probably all right,
too: Unless you are writing a memory manager, you seldom
need to "do things" with alignment, just tolerate its effects.
 
Ark Khasin

Eric said:
Ark said:
Eric said:
[...]
Why does the Standard tread on this
shaky ground? Because it needs to be able to require that
malloc() return addresses that satisfy whatever alignment
requirements the machine might impose, for example.
Ain't it good enough to say that malloc returns (on success) an
address good for storing any data type? (With no alignment in scope :)

The Standard might have chosen to say that the allocated
area shall be suitable for storing any data type whose sizeof
is not too large, but what it actually says is a bit stronger.
Since all alignment requirements are satisfied, the address
returned by malloc() can be stored in any kind of pointer
variable, even on machines where pointers to different types
have different "precision." For example,

double *p = malloc(1); /* assume alignof(double) > 1 */

must not "damage" the returned value by doing something like
discarding low-order bits that a `double*' doesn't need. A
subsequent free(p) must work properly, meaning that it must
be possible to reconstruct any discarded bits.

The "you can store it if it's not too big" criterion would
allow malloc(1) to return an unaligned address, and on a machine
where converting that address to a `double*' would change it,
that would not be a Good Thing.

Perhaps the Standard could have avoided talking about
alignment if it instead imposed an "assignable to any pointer"
requirement. But this isn't the only place the notion crops
up; it also appears as the reason structs might be padded, as
the reason you might not be able to store a double at "just
any" address, and so on. I imagine that the writers would
rather not have mentioned it, but the elephant would still
have been in the drawing room.
Indeed so. If the Standard said, plainly, that an address good for
storing a short is good for storing a char, etc. -- there would be a
tad less confusion and much less fights over definitions.

It *does* say that any object can be accessed as an array
of bytes, hence that any address can store a char. But it doesn't
say that any address suitable for a long is also suitable for a
double, or for a long double, or for a struct foobar -- that would
put the Standard in the position of dictating machine design. The
Standard does a lot of fancy footwork to stay out of the way of
machine designers as much as it can.
I didn't mean that: I meant that a defined hierarchy is all that is
needed, like so:
1. if sizeof(T1) is a multiple of sizeof(T2) then an address good for T1
is good for T2 (for all built-in and all pointer types T1 and T2)
2. an address good for an aggregate type is good for any type of its members
Think about word-addressed machines that use "decorated"
or "expanded" pointers to refer to sub-word entities like
bytes and shorts, and consider the problems of converting
between pointer types on such machines; maybe it will make
more sense then. Or maybe not, which is probably all right,
too: Unless you are writing a memory manager, you seldom
need to "do things" with alignment, just tolerate its effects.
[I think of writing a memory manager as a platform-specific activity,
so the whole discussion would not be applicable.]
If a pointer to an int is 4 bytes and a pointer to a char is 16 bytes
(with WHATEVER meaning of the bits there), and a pointer to a short is 8
bytes, so what?
A "reconstruction" in (char*)int_ptr would just mean obtaining a pointer
to the "first" "byte" of the int storage (whether the machine is
LOGICALLY considered big-, little- or middle-endian).
The address is always 16 bytes then, even though for some types it can
be represented in 4 or 2 bytes.

-- Ark
 
Eric Sosman

Ark Khasin wrote On 08/21/07 13:29,:
I didn't mean that: I meant that a defined hierarchy is all that is
needed, like so:
1. if sizeof(T1) is a multiple of sizeof(T2) then an address good for T1
is good for T2 (for all built-in and all pointer types T1 and T2)
2. an address good for an aggregate type is good for any type of its members

Property (1) holds on the machines I know of (taking
T1 and T2 as "basic" types, not aggregates), but that's no
proof that it holds on all machines past, present, and
future. Remember the days when mass-market CPU's were
integer-only, with an optional extra-cost coprocessor chip
to handle floating-point operations? It's easy to imagine
that `int' and `float' could have the same size on such a
machine, but have different alignment requirements because
they're processed by entirely different silicon, possibly
from different manufacturers. (I don't know whether alignment
requirements actually did differ, but it wouldn't have been
startling to find a difference.)

As far as I can tell, Property (2) *is* required by C.
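
Property (2) can be spot-checked at compile time. A minimal sketch, assuming C11 (`_Alignof`, `_Static_assert`) and no packing pragmas or attributes; it reuses the struct Keith mentioned earlier in the thread, with a tag made up here.

```c
#include <stddef.h>

/* The struct from earlier in the thread; the tag is invented. */
struct char_short { char c; short s; };

/* Each member sits at an offset that is a multiple of its own alignment... */
_Static_assert(offsetof(struct char_short, s) % _Alignof(short) == 0,
               "member s is suitably aligned within the struct");

/* ...and the struct's own alignment is a multiple of the member's, so
   any address good for the struct is good for the member as well. */
_Static_assert(_Alignof(struct char_short) % _Alignof(short) == 0,
               "struct alignment covers its members");
```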
Think about word-addressed machines that use "decorated"
or "expanded" pointers to refer to sub-word entities like
bytes and shorts, and consider the problems of converting
between pointer types on such machines; maybe it will make
more sense then. Or maybe not, which is probably all right,
too: Unless you are writing a memory manager, you seldom
need to "do things" with alignment, just tolerate its effects.

[I think of writing a memory manager as a platform-specific activity,
so the whole discussion would not be applicable.]
If a pointer to an int is 4 bytes and a pointer to a char is 16 bytes
(with WHATEVER meaning of the bits there), and a pointer to a short is 8
bytes, so what?
A "reconstruction" in (char*)int_ptr would just mean obtaining a pointer
to the "first" "byte" of the int storage (whether the machine is
LOGICALLY considered big-, little- or middle-endian).
The address is always 16 bytes then, even though for some types it can
be represented in 4 or 2 bytes.

Right -- which means that malloc() cannot return "just
any" address, but only addresses that will not be changed
when stored in any of the possible data pointer types. If
different pointer types have different representations and
different sets of valid values, malloc() must return values
that are valid for all of them.
 
Keith Thompson

Ark Khasin said:
[I think of writing a memory manager as a platform-specific
activity, so the whole discussion would not be applicable.]
[...]

But if it were possible to determine portably (a) the implementation's
strictest alignment, and (b) a type that requires that alignment, then
you *could* write a memory manager (that allocates chunks of a static
array) in portable C.
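
With C11's `max_align_t` (added after this thread) both (a) and (b) became available. Here is a sketch of the kind of allocator described, under that assumption. The names `pool`, `pool_alloc` and `POOL_SIZE` are invented, and chunks are never freed, to keep the example short.

```c
#include <assert.h>
#include <stddef.h>   /* max_align_t, C11 */

enum { POOL_SIZE = 1024 };   /* arbitrary size for this sketch */

/* The union makes the byte array start at the strictest fundamental alignment. */
static union {
    max_align_t   force_alignment;
    unsigned char bytes[POOL_SIZE];
} pool;

static size_t pool_used;     /* kept at a multiple of _Alignof(max_align_t) */

/* Hypothetical malloc-like function: hands out chunks, never takes them back. */
void *pool_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t rounded;
    void *chunk;

    assert(pool_used % align == 0);
    if (size == 0 || size > POOL_SIZE)
        return NULL;
    rounded = (size + align - 1) / align * align;   /* round up to a multiple of align */
    if (rounded > POOL_SIZE - pool_used)
        return NULL;
    chunk = pool.bytes + pool_used;
    pool_used += rounded;
    return chunk;
}
```

This only settles the alignment side. Whether storing arbitrary types into a declared `unsigned char` array is strictly conforming under the effective-type rules is a separate question the sketch does not address.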
 
CBFalconer

Eric said:
.... snip ...

Think about word-addressed machines that use "decorated"
or "expanded" pointers to refer to sub-word entities like
bytes and shorts, and consider the problems of converting
between pointer types on such machines; maybe it will make
more sense then. Or maybe not, which is probably all right,
too: Unless you are writing a memory manager, you seldom
need to "do things" with alignment, just tolerate its effects.

But that problem does not exist. A void* can be converted to any
type, and any type can be converted to a void* and back, but not to
any other type.
 
Peter J. Holzer

Not so, IMHO.

First, let me state that I probably misunderstood you. I thought you
were arguing that addresses *are* numbers and I tried to explain that
they are not.
A "number" is not a collection of bits but a member of a set on which
common algebraic operations are defined. (And it may not matter how it
is represented).

Yes. And you can *interpret* any collection of bits as such a number (in
fact, you aren't restricted to bits: You can interpret any string of
symbols as a number: See for example Gödel numbers for an interesting
use). Then you can perform arithmetic operations on them and get another
number. Whether representing this number as bits and re-interpreting
these bits as whatever the original collection of bits was yields a
useful result is beside the point. (If you interpret a pointer as a
number and multiply it by three and then re-interpret the result as a
pointer, you generally won't get a valid pointer. Similarly, if you
compute the Gödel number of a mathematical theorem and multiply it by
three, you usually won't get the Gödel number of another mathematical
theorem).
There are 4 functions defined on addresses in C:
- typecast, and, for a non-void* type (operation: results in an address)
- adding a number (in the common sense of the word) (operation)
- subtracting a number (operation)
- subtracting two addresses (with a non-address result, so it is not
an operation)
I don't see how it makes addresses include a number and what that is
supposed to mean

Let's use a concrete example. On the 80286, a (32 bit) pointer consisted
of 4 fields:

* An offset (16 bits)
* A privilege level (2 bits)
* A descriptor table selector (1 bit)
* An index into the descriptor table (13 bits)

As far as the compiler (and functions which care about pointer
internals, such as malloc/free) is concerned, the latter three fields
are not numbers: There is no arithmetic which can be usefully performed
on them.

But the first field, the offset, is an integral number: Pointer
arithmetic is just ordinary integer arithmetic on the offset field, you
can align a pointer by simple arithmetic, etc.

So, on the 80286, a pointer contains a number, but it isn't a number. If
you interpret it as a 32-bit number, the results of many operations
won't be useful.
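
The four fields can be modelled in a few lines. This is a sketch only: it assumes the usual 286 layout as I understand it (selector in the high 16 bits of the pointer, with the privilege level in bits 0-1, the table indicator in bit 2 and the descriptor index in bits 3-15), and every type and function name is invented.

```c
#include <stdint.h>

/* Model of a 32-bit 80286 far pointer: selector in the high 16 bits,
   offset in the low 16. */
typedef uint32_t far_ptr;

uint16_t far_offset(far_ptr p)   { return (uint16_t)(p & 0xFFFFu); }
uint16_t far_selector(far_ptr p) { return (uint16_t)(p >> 16); }

/* Selector fields, none of which takes part in pointer arithmetic. */
unsigned far_rpl(far_ptr p)   { return far_selector(p) & 0x3u; }          /* privilege level */
unsigned far_table(far_ptr p) { return (far_selector(p) >> 2) & 0x1u; }   /* table selector  */
unsigned far_index(far_ptr p) { return (unsigned)far_selector(p) >> 3; }  /* descriptor index */

far_ptr far_make(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 16) | offset;
}

/* Pointer arithmetic touches only the offset, which wraps modulo 2^16
   instead of carrying into the selector. */
far_ptr far_add(far_ptr p, uint16_t n)
{
    return far_make(far_selector(p), (uint16_t)(far_offset(p) + n));
}
```

Adding 0x20 to selector 0x001F, offset 0xFFF0 gives offset 0x0010 under `far_add`, whereas treating the whole thing as a 32-bit integer would carry into the selector and name a different descriptor.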

I don't claim to know all (or even most) architectures on which C
compilers exist, but I'm rather sure that on all of them a pointer
contains a part on which arithmetic can be performed, since otherwise
pointer arithmetic would be rather hard to implement. But as a C
programmer I (normally) don't know or care where that part is or how it
is encoded.

hp
 
Eric Sosman

CBFalconer wrote On 08/21/07 16:57,:
Eric Sosman wrote:

... snip ...



But that problem does not exist. A void* can be converted to any
type, and any type can be converted to a void* and back, but not to
any other type.

Right, but 6.3.2.3p7 only guarantees the round-trip if
alignment requirements are satisfied throughout:

"[...] If the resulting pointer is not correctly
aligned for the pointed-to type, the behavior is
undefined. Otherwise, when converted back again,
the result shall compare equal to the original
pointer. [...]"

Note that the U.B. doesn't require dereferencing the
mis-aligned pointer; it occurs on the attempt to convert
a value that is invalid for the destination pointer type.
The following need not work:

double array[2]; /* assume "alignof(double)" > 1 */
char *cp = (char*) array;
double *dp = (double*)(cp + 1); /* U.B. here */

It's still U.B. even if we use void* to launder it:

double array[2]; /* same assumption */
char *cp = (char*) array;
void *vp = cp + 1;
double *dp = vp; /* U.B. here */

The reason it's undefined -- well, the "reason" is that
the Standard says so, but the practical reason -- is that
`double*' may not have or preserve enough bits to encode
all the information present in an arbitrary `void*'. A
`double*' needs to be able to store enough information to
point at any location where a `double' might live, but is
not required to be able to point at other locations.
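
When a double really does have to live at an arbitrary byte position, the usual way around this is to copy the bytes rather than convert the pointer. A minimal sketch with invented function names:

```c
#include <string.h>

/* Reads a double stored at an arbitrary byte position, without ever
   forming a possibly misaligned double*. */
double load_double(const unsigned char *bytes)
{
    double value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Stores a double at an arbitrary byte position. */
void store_double(unsigned char *bytes, double value)
{
    memcpy(bytes, &value, sizeof value);
}
```

Only `unsigned char*` values are ever formed for the odd address, and those are valid for every byte of every object.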
 
Keith Thompson

Peter J. Holzer said:
Let's use a concrete example. On the 80286, a (32 bit) pointer consisted
of 4 fields:

* An offset (16 bits)
* A privilege level (2 bits)
* A descriptor table selector (1 bit)
* An index into the descriptor table (13 bits)

As far as the compiler (and functions which care about pointer
internals, such as malloc/free) is concerned, the latter three fields
are not numbers: There is no arithmetic which can be usefully performed
on them.

But the first field, the offset, is an integral number: Pointer
arithmetic is just ordinary integer arithmetic on the offset field, you
can align a pointer by simple arithmetic, etc.

So, on the 80286, a pointer contains a number, but it isn't a number. If
you interpret it as a 32-bit number, the results of many operations
won't be useful.

I don't claim to know all (or even most) architectures on which C
compilers exist, but I'm rather sure that on all of them a pointer
contains a part on which arithmetic can be performed, since otherwise
pointer arithmetic would be rather hard to implement. But as a C
programmer I (normally) don't know or care where that part is or how it
is encoded.

C pointers do behave in a number-like fashion in some ways, but the
semantics defined by the C standard do not require pointers to
*contain* numbers. Adding an integer to a pointer, for example, may
involve adding the integer to the integer-like part of the pointer
(which may be an offset field or the entire pointer), but it may not.
In particular, there's no standard way to *extract* a numeric value
from a pointer.
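
The closest the language comes is the optional type `uintptr_t` from `<stdint.h>`. Where it exists, a valid `void*` converted to it and back compares equal to the original, but what the integer in between means is implementation-defined. A sketch, with an invented function name:

```c
#include <stdint.h>

/* Returns 1 if p survives the trip void* -> uintptr_t -> void*.
   The Standard promises this for valid pointers, and nothing at all
   about the value of n itself. */
int round_trips_through_integer(void *p)
{
    uintptr_t n = (uintptr_t)p;
    void *q = (void *)n;
    return q == p;
}
```

So the integer works as an opaque token, which fits the point above: it need not be an address, an offset, or anything arithmetic makes sense on.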

It's likely to be as you describe for all real-world implementations,
but I'll bet the DS9K does something quite tricky (and conforming, of
course) to avoid storing any meaningful numeric data in its pointers.
 
