On alignment (final committee draft for C++0x and n1425 for C1X)

Gennaro Prota

NOTE:

This is multi-posted. However this newsgroup (comp.lang.c++)
is where the discussion is meant to happen.

The message has been posted to comp.std.c++, comp.std.c and
comp.lang.c++, with suitable notices (a different notice for
each group).

Here's why:

the message was originally intended for comp.std.c++ only;
then I noticed that the wording it refers to was basically
copied from a C1X draft, so I cross-posted it to the two
".std." groups. But the comp.std.c++ software auto-rejected
it, on the grounds that this is difficult to handle.

Furthermore, since comp.std.c++ has an unbelievably high latency
these days, the only way I could think of to make the
discussion possible was to set the follow-ups to a low-latency
group. I apologize: it's probably the Usenet hack of the year,
and I'm not proud of it, but I really couldn't think how else
to manage it (if you have better ideas, feel free to tell me).

In any case, beware that the message is geared towards C++,
including the terminology and the references to the standard.
----------------------------------------------------------------


I was reading the Alignment paragraph ([basic.align]) in the FCD
for C++0x and was really, really perplexed.

In particular I couldn't find an answer to this question:

a) is "alignment" a function of the type (over the set of
complete object types [less, perhaps, array types])? Or can two
instances of the same type have different alignments?

(Note that in the question above "complete" refers to types, not
objects (parse it as "complete types that are object types").
Non-complete objects, i.e. sub-objects, do enter the picture.
In particular I was looking for a guarantee that given e.g.

void f() {
    T t ;
}
struct C {
    char c ;
    T t2 ;
} ;

the object t and the subobject t2 in an instance of C would have
the same alignment.)
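
To make the question concrete, here is a minimal sketch of what can
be checked at compile time. It assumes the C++11 spelling of alignof
and uses double as a hypothetical stand-in for T. The first assertion
holds by construction; the other two are what I would expect from any
reasonable implementation, and they are exactly the guarantee the
question is after.

```cpp
#include <cstddef>   // offsetof

typedef double T ;   // hypothetical stand-in for the T used above

struct C {
    char c ;
    T t2 ;
} ;

// alignof takes a type, so a standalone T and the member t2 cannot
// report different values through it.
static_assert( alignof( decltype( C::t2 ) ) == alignof( T ),
               "same type, same alignof" ) ;

// The member sits at an offset that is a multiple of alignof( T ) ...
static_assert( offsetof( C, t2 ) % alignof( T ) == 0,
               "offset of t2 is a multiple of alignof( T )" ) ;

// ... and C is at least as strictly aligned as T, so the address of
// t2 in any C object is a multiple of alignof( T ) too.
static_assert( alignof( C ) % alignof( T ) == 0,
               "alignof( C ) is a multiple of alignof( T )" ) ;
```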

Here are some sentences that I found particularly perplexing:

--

Furthermore, the types char, signed char, and unsigned char
shall have the weakest alignment requirement.

That is? Just 1, no? I was thinking (before reading the
paragraph) that since sizeof( T ) must be a multiple of the
alignment of every object of type T, and since by (a) (if it
holds) the alignment of the type is that of any object, it was
guaranteed that align( char ) == 1.

--

An aligment [sic] is an implementation-defined integer value
representing the number of bytes between successive addresses
at which a given object can be allocated.

Minimum positive number? (Among other things, if one doesn't
make it (existing and) unique I don't even see how one can use
the definite article "the".)

<note>
Note, too, that this definition (or pseudo such) doesn't imply
that the numerical address is a multiple of the alignment:
think e.g. of alignment = 4 and the invented addresses 7, 11,
15 (as opposed to 8, 12, 16).

One might think that talking of addresses as numbers
("multiples of") is problematic in the context of the standard
specification, but note that the above is basically talking
about the difference of two arbitrary pointers, which isn't
defined in general, either.
</note>


And is it a function of the type or not? alignof is applicable
to a type-id and its description says "An alignof expression
yields the alignment requirement of its operand *type*".

(But why "alignment requirement" rather than just "alignment"?)

Also, consider:

char c [[ align( 4 ) ]] ;
static_assert( alignof( c ) == 1, "" ) ; // intentional?

(I think this is OK: the attribute applies to the declaration,
thus to the particular object c, not the type. I'm asking just
because I seem to recall a gcc patch where the author assumed
that alignof worked like their __alignof__. But then, their
__alignof__ may also yield different values for a standalone
double than for a double in a struct, at least on some targets.
Again we are at the "is a function of the type" issue.)
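
For comparison, here is a sketch of the same situation in the syntax
of C++11 as eventually published, where the attribute became the
alignas specifier and alignof accepts only a type-id (alignof applied
to an expression is a compiler extension). The helper names are
hypothetical, and the address check assumes that std::uintptr_t
exists and maps addresses to integers in the obvious way.

```cpp
#include <cassert>   // assert
#include <cstdint>   // std::uintptr_t (optional in the standard, assumed present)

alignas( 4 ) char c = 0 ;   // C++11 spelling of: char c [[ align( 4 ) ]] ;

// Standard alignof accepts only a type-id, so the question has to be
// put to the type of c, which is still plain char.
static_assert( alignof( decltype( c ) ) == 1, "the type of c is char" ) ;

// The stricter request is visible only in the address of the object.
bool c_is_on_4_byte_boundary()
{
    return reinterpret_cast< std::uintptr_t >( &c ) % 4 == 0 ;
}

void check_c()
{
    assert( c_is_on_4_byte_boundary() ) ;
}
```

So the type keeps alignment 1 while this particular object is placed
more strictly, which is the "function of the type" reading.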


--

Alignments are represented as values of the type std::size_t

That is? I thought they *were* numbers. And, at this stage,
alignof hasn't been introduced yet, so what's the point of
bringing in std::size_t? Aren't we talking of integers in the
mathematical sense?

--

A fundamental alignment is represented by an alignment less
than or equal...

An alignment is represented by an alignment?

Guys, please, consider that we need definitions, here, not
novels. If you have to explain what a fundamental alignment *is*
just say "a fundamental alignment is"; or something like "an
alignment is said to be "fundamental" if and only if...". (Note
that there's a "representing the number of bytes" above, too.
Just a little more acceptable than this one.)

In case you are wondering: yes, these things make me angry. They
waste everyone's time and mental energies.

--

Alignments have an order from weaker to stronger or stricter
alignments. Stricter alignments have larger alignment values.
An address that satisfies an alignment requirement also
satisfies any weaker valid alignment requirement.

Again, vagueness. Couldn't you just have said e.g.:

given two alignments a1 and a2 (a1 > 0, a2 > 0):

- a1 is said to be weaker than a2 if and only if a1 is a
proper integer submultiple of a2

- a1 is said to be stronger, or stricter, than a2 if and
only if a2 is weaker than a1

About this matter, I also found the following example in
7.6.2/7:

[Example: An aligned buffer with an alignment requirement of A
and holding N elements of type T other than char, signed char,
or unsigned char can be declared as:

T buffer [[ align(T), align(A) ]] [N];

Specifying align(T) in the attribute-list ensures that the
final requested alignment will not be weaker than alignof(T),
and therefore the program will not be ill-formed. —end example
]

I thought that such a thing would require a minimum alignment
that was the lcm of align( T ) and A.

Hmm, I think I found the key: it's /assumed/ that any valid
alignment is a power of 2 with a non-negative integer exponent;
but where is such a requirement?
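
As far as I can tell, C++11 as published does state that every
alignment value is a non-negative integral power of two, and that
when several alignment specifiers appear the strictest one is used.
Under that reading the example can be sketched as follows; T, A and N
are hypothetical placeholders, and the address check assumes
std::uintptr_t is available.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t (assumed present)

// True for 1, 2, 4, 8, ...
constexpr bool is_power_of_two( std::size_t a )
{
    return a != 0 && ( a & ( a - 1 ) ) == 0 ;
}

typedef int T ;                  // hypothetical element type
constexpr std::size_t A = 16 ;   // hypothetical requested alignment
constexpr std::size_t N = 8 ;    // hypothetical element count

static_assert( is_power_of_two( alignof( T ) ) && is_power_of_two( A ),
               "both alignments are powers of two" ) ;

// C++11 spelling of: T buffer [[ align(T), align(A) ]] [N];
alignas( T ) alignas( A ) T buffer[ N ] ;

// With powers of two, satisfying the larger request satisfies both.
bool buffer_satisfies_both()
{
    const std::uintptr_t addr =
        reinterpret_cast< std::uintptr_t >( &buffer[ 0 ] ) ;
    return addr % alignof( T ) == 0 && addr % A == 0 ;
}

void check_buffer()
{
    assert( buffer_satisfies_both() ) ;
}
```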

--

Valid alignments include only those values returned by an
alignof expression for the fundamental types plus an
additional implementation-defined set of values which may be
empty.

What's the point of this if there's no requirement for the set
to be finite, or to contain PODs only, or to satisfy any
particular property? As I see it, this is just saying that it's
implementation-defined what alignments are valid, and that
alignof shall only yield valid alignments.


A PROPOSED, PROVISIONAL, NEW WORDING
------------------------------------

Here's some provisional wording which I think solves the
problems above. With this in place the paragraph about the
alignment attribute and the alignof operator would only need
minor tweaks.

NOTE: Just because of ASCII limitations, I use "!=" for "not
equal to" and "**" for "raised to".

For each implementation, there exists a mathematical function

align: S -> V

defined on the set S of all and only the complete types that are
object types but not array types. Its codomain V contains only
powers of two with a non-negative integer exponent.

For every t belonging to S, align(t) is the greatest a=2**k,
with k being a non-negative integer, such that

- all addresses at which instances of t can be placed are
exact multiples of a and

- it's possible for the implementation to place some instances
of t at an address which is *not* a multiple of 2a.
[footnote: Thus, for instance, an implementation which
places all instances of t at addresses that are multiples of
8 cannot "lie" and just consider the alignment of the type to
be 4 on the grounds that any multiple of 8 is also a multiple
of 4. --endfootnote]

[NOTE: although there doesn't necessarily exist a way for the
program to check whether an address is a multiple of a given
integer, this is intended to be unsurprising to those who know
the addressing structure of the underlying machine. And when an
integral type Int large enough exists, it is intended that
reinterpret_cast< Int >( address ) % n == 0 has the expected
truth value.]
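
A minimal sketch of the check the note has in mind, assuming that
std::uintptr_t plays the role of Int and that the conversion from
pointer to integer is the "unsurprising" one. The function names are
hypothetical.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t (assumed present)

// reinterpret_cast< Int >( address ) % n == 0, with Int = std::uintptr_t.
bool is_multiple_of( const void * address, std::size_t n )
{
    return reinterpret_cast< std::uintptr_t >( address ) % n == 0 ;
}

// Is p a multiple of the alignment of the pointed-to type?
template< typename U >
bool is_aligned( const U * p )
{
    return is_multiple_of( p, alignof( U ) ) ;
}

void check_addresses()
{
    double d = 0 ;
    assert( is_aligned( &d ) ) ;

    alignas( 8 ) unsigned char raw[ 16 ] = { 0 } ;
    assert( is_multiple_of( raw, 8 ) ) ;
    assert( !is_multiple_of( raw + 1, 8 ) ) ;   // one byte further on
}
```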

Note that, due to the power-of-two requirement, the following
property trivially holds: given two values in V, a1 and a2, a1
is a submultiple of a2 if and only if a1 <= a2; or,
equivalently, if and only if log2(a1) <= log2(a2).

Also, the least common multiple of two alignments is just the
greatest of them.

By definition, an alignment a1 is said to be "stricter" (or
"stronger") than a2 if and only if a2 != a1 and a2 is a
submultiple of a1.

Likewise, by definition, a1 is said to be "weaker" than a2 <=>
a2 is stricter than a1.
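
The definitions and the two remarks above can be checked
mechanically. The following is only a sketch with hypothetical
function names; the last assertion shows why the power-of-two
requirement matters, since without it the lcm and the maximum differ.

```cpp
#include <cstddef>   // std::size_t

// a1 is a submultiple of a2: a2 is an integer multiple of a1.
constexpr bool is_submultiple( std::size_t a1, std::size_t a2 )
{
    return a2 % a1 == 0 ;
}

// "stricter" and "weaker" exactly as defined above.
constexpr bool is_stricter( std::size_t a1, std::size_t a2 )
{
    return a2 != a1 && is_submultiple( a2, a1 ) ;
}

constexpr bool is_weaker( std::size_t a1, std::size_t a2 )
{
    return is_stricter( a2, a1 ) ;
}

constexpr std::size_t gcd_of( std::size_t a, std::size_t b )
{
    return b == 0 ? a : gcd_of( b, a % b ) ;
}

constexpr std::size_t lcm_of( std::size_t a, std::size_t b )
{
    return a / gcd_of( a, b ) * b ;
}

constexpr std::size_t max_of( std::size_t a, std::size_t b )
{
    return a < b ? b : a ;
}

static_assert( is_submultiple( 4, 16 ) && !is_submultiple( 16, 4 ),
               "for powers of two, submultiple means less than or equal" ) ;
static_assert( is_stricter( 16, 4 ) && is_weaker( 4, 16 ) && !is_stricter( 8, 8 ),
               "strict order: no alignment is stricter than itself" ) ;
static_assert( lcm_of( 4, 16 ) == max_of( 4, 16 ),
               "for powers of two, lcm and max coincide" ) ;
static_assert( lcm_of( 4, 6 ) == 12 && max_of( 4, 6 ) == 6,
               "for arbitrary integers they do not" ) ;
```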

Let t0 be a type in the domain of align and arr an array
thereof, with at least two elements: since two consecutive
elements of arr each have an address that is a multiple of
align(t0), their positive difference (i.e. the address of the
later one minus that of the earlier one), which is sizeof(t0),
is a multiple of align(t0), too. That is:

- for any type t in S, align(t) is a submultiple of sizeof(t).

In particular, align( char ) is 1.
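
The array argument can be illustrated with a small sketch (the names
element_stride and Padded are hypothetical): the distance in bytes
between two consecutive elements is sizeof of the element type, tail
padding included, and the alignment divides it.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::ptrdiff_t

// Distance in bytes between two consecutive elements of an array of T0.
template< typename T0 >
std::ptrdiff_t element_stride()
{
    T0 arr[ 2 ] = {} ;
    return reinterpret_cast< const char * >( &arr[ 1 ] )
         - reinterpret_cast< const char * >( &arr[ 0 ] ) ;
}

struct Padded { double d ; char c ; } ;   // typically has tail padding

static_assert( sizeof( Padded ) % alignof( Padded ) == 0,
               "the alignment divides sizeof" ) ;
static_assert( sizeof( char ) == 1 && alignof( char ) == 1,
               "hence align( char ) is 1" ) ;

void check_stride()
{
    assert( element_stride< Padded >()
            == static_cast< std::ptrdiff_t >( sizeof( Padded ) ) ) ;
    assert( element_stride< double >()
            == static_cast< std::ptrdiff_t >( sizeof( double ) ) ) ;
}
```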
 
Gennaro Prota

@speranza.aioe.org:
[...]
Here's some provisional wording which I think solves the
problems above. [...]
For every t belonging to S, align(t) is the greatest a=2**k,
with k being a non-negative integer, such that

- all addresses at which instances of t can be placed are
exact multiples of a and

But on a very common hardware platform (Intel) one can use misaligned data,
and sometimes this comes quite handy, e.g. when processing PKZIP file
headers. I am not sure, but I think all the jumble-mumble in the draft
might be a disguised attempt to make this legal. Or not?

Well, I don't think that this wording makes a difference in this
respect. You can still apply an alignment attribute to an object
declaration (giving to that object an alignment which is
different from the "alignment of the type"), or play with
reinterpret_cast<>.

Did you have something specific in mind which makes the two
definition approaches ("number of bytes between" vs. "address
multiple of") different?
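
As an aside on the file-header case: one way to read a field that
sits at an arbitrary offset without relying on what the hardware
tolerates is to copy its bytes into a properly aligned object. This
is only a sketch with hypothetical function names; the value is read
in host byte order, so a real header parser would still have to deal
with endianness.

```cpp
#include <cassert>   // assert
#include <cstdint>   // std::uint32_t
#include <cstring>   // std::memcpy

// Reads 32 bits from any address, aligned or not, in host byte order.
std::uint32_t load_u32_unaligned( const unsigned char * p )
{
    std::uint32_t v ;
    std::memcpy( &v, p, sizeof v ) ;
    return v ;
}

// Writes 32 bits to any address, aligned or not, in host byte order.
void store_u32_unaligned( unsigned char * p, std::uint32_t v )
{
    std::memcpy( p, &v, sizeof v ) ;
}

void check_round_trip()
{
    unsigned char buf[ 8 ] = { 0 } ;
    store_u32_unaligned( buf + 1, 0x12345678u ) ;   // odd offset on purpose
    assert( load_u32_unaligned( buf + 1 ) == 0x12345678u ) ;
}
```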
 
Larry Evans

On 08/19/10 19:52, Gennaro Prota wrote:
[snip]
Note that, due to the power-of-two requirement, the following
property trivially holds: given two values in V, a1 and a2, a1
is a submultiple of a2 if and only if a1<= a2; or,
equivalently, if and only if log2(a1)<= log2(a2).

Also, the least common multiple of two alignments is just the
greatest of them.

Would not the "extended alignments" mentioned in paragraph 3
on page 3 of:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2165.pdf
require using lcm, as shown here:

http://svn.boost.org/svn/boost/sand...boost/composite_storage/alignment/compose.hpp
?
[snip]

Larry
 
Gennaro Prota

@speranza.aioe.org:
[...]
Here's some provisional wording which I think solves the
problems above.
[...]
For every t belonging to S, align(t) is the greatest a=2**k,
with k being a non-negative integer, such that

- all addresses at which instances of t can be placed are
exact multiples of a and

But on a very common hardware platform (Intel) one can use misaligned
data, and sometimes this comes quite handy, e.g. when processing
PKZIP file headers. I am not sure, but I think all the jumble-mumble
in the draft might be a disguised attempt to make this legal. Or
not?

Well, I don't think that this wording makes a difference in this
respect. You can still apply an alignment attribute to an object
declaration (giving to that object an alignment which is
different from the "alignment of the type"), or play with
reinterpret_cast<>.

Did you have something specific in mind which makes the two
definition approaches ("number of bytes between" vs. "address
multiple of") different?

No, not really. I just thought you are attempting to define the alignment
in terms of hardware ("can be placed ..."), but this loses meaning on the
hardware where any data *can* be placed at any address (like Intel x86).

Ah, I see where you're coming from. It was meant as "can be
placed by the C++ implementation". The current wording uses this
same expression, with --AFAICS-- the same "by the C++
implementation" implication. It's up to the implementation to
define the align function and it may well make it the constant
function whose only value is 1.

Anyway:

something like "all objects to which no /alignment attribute
that says otherwise/ applies *will have* an address which is a
multiple..." probably works better.

Example:

// if align( double ) is 4

void f()
{
    double d ; // address will be a multiple of 4
}

struct [[ align( 2 ) ]] Pod
{
    double m ;
} ;

void g()
{
    Pod p ; // p.m has the same address as p, thus not
            // necessarily a multiple of 4 (the attribute on the
            // declaration of struct Pod indirectly "applies" to,
            // i.e. has an effect on, p.m)
}
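
For what it's worth, in C++11 as published an alignas specifier may
not request an alignment weaker than the one the entity would have
without it, so the struct above would be ill-formed there. The
"attribute on the class affects the member" effect can still be
sketched in the other direction, with a stricter alignment. The value
16 is an assumption (it is commonly supported), and the helper names
are hypothetical.

```cpp
#include <cassert>   // assert
#include <cstdint>   // std::uintptr_t (assumed present)

struct alignas( 16 ) Pod   // stricter than a plain double needs
{
    double m ;
} ;

static_assert( alignof( Pod ) == 16, "the specifier applies to the class" ) ;

// p.m is the first member, so it has the same address as p ...
bool first_member_shares_address( const Pod & p )
{
    return static_cast< const void * >( &p )
        == static_cast< const void * >( &p.m ) ;
}

// ... and therefore inherits the alignment requested for Pod.
bool member_is_on_16_byte_boundary( const Pod & p )
{
    return reinterpret_cast< std::uintptr_t >( &p.m ) % 16 == 0 ;
}

void check_pod()
{
    Pod p = { 1.0 } ;
    assert( first_member_shares_address( p ) ) ;
    assert( member_is_on_16_byte_boundary( p ) ) ;
}
```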
 
Gennaro Prota

On 08/19/10 19:52, Gennaro Prota wrote:
[snip]
Note that, due to the power-of-two requirement, the following
property trivially holds: given two values in V, a1 and a2, a1
is a submultiple of a2 if and only if a1<= a2; or,
equivalently, if and only if log2(a1)<= log2(a2).

Also, the least common multiple of two alignments is just the
greatest of them.
Would not the "extended alignments" mentioned in paragraph 3
on page 3 of:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2165.pdf
require using lcm

Sorry for the late reply. I haven't read the paper but the issue
is purely mathematical: if S is a set of powers of two with a
non-negative integer exponent then lcm( S ) and max( S ) are the
same number.
 
Larry Evans

On 21/08/2010 18.22, Larry Evans wrote: [snip]
On 08/19/10 19:52, Gennaro Prota wrote:
[snip]
Note that, due to the power-of-two requirement, the following
property trivially holds: given two values in V, a1 and a2, a1
is a submultiple of a2 if and only if a1<= a2; or,
equivalently, if and only if log2(a1)<= log2(a2).

Also, the least common multiple of two alignments is just the
greatest of them.
Would not the "extended alignments" mentioned in paragraph 3
on page 3 of:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2165.pdf
require using lcm

Sorry for the late reply. I haven't read the paper but the issue
is purely mathematical: if S is a set of powers of two with a
non-negative integer exponent then lcm( S ) and max( S ) are the
same number.

However, I think I remember reading somewhere that an extended alignment
could be something other than a power of 2, and that
was why lcm would be required. Sorry, I've been looking for
the example for the last few minutes but have been unable to
find it. I did find:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1447.htm

which *proposes* to restrict alignments to power of 2 values;
however, I don't know if that's been accepted.
 
Larry Evans

On 08/25/10 09:28, Larry Evans wrote:
[snip]
However, I think I remember reading somewhere that an extended alignment
could be something other than a power of 2, and that
was why lcm would be required. Sorry, I've been looking for
the example for the last few minutes but have been unable to
find it.

After several more minutes looking, I still couldn't find
any document with examples showing alignments other than powers of 2.

Maybe I just imagined it :(

Sorry for noise.
 
