'Portable' Measurement of Pointer Alignment in C?

Brian Gladman

A lot of low level cryptographic code and some hardware cryptographic
accelerators either fail completely or perform very poorly if their input,
output and/or key storage areas in memory are not aligned on specific memory
boundaries. Moreover, in many situations the cryptographic code does not
itself have control over the memory alignment of its parameters, so the best
it can do is to detect whether these alignments are correct and act
accordingly.

This raises the question of the most appropriate way of determining the
alignment of the memory referenced by a pointer in C. Here I am less
interested in the 'political' correctness of C code than in which methods
are most likely to work in practice on the highest proportion of widely
deployed processors and C compilers.

For example, when 'x' is a pointer of some kind, 'n' is a power of two and
'pint' is a pointer sized integer type, on what proportion of systems will:

#define PTR_OFFSET(x,n) (((pint)(x)) & ((n) - 1))

return the correct memory alignment of 'x' from an 'n' byte boundary? Is:

#define PTR_OFFSET(x,n) (((char*)(x) - (char*)0) & ((n) - 1))

any better (or worse)? Or, on systems that allow for the declaration of
aligned variables, is:

declare_aligned(n) type var;
#define PTR_OFFSET(x,n) (((char*)(x) - (char*)&var) & ((n) - 1))

any better? To what extent is this likely to depend on declaring 'var' in
the same way that 'x' may have been declared (static, dynamic, auto, ...)?
And on what proportion of 'current' systems is a flat memory model used
anyway?
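
(To make the use case concrete, the kind of dispatch I have in mind is
roughly the sketch below, with C99's uintptr_t standing in for 'pint' and
the two encrypt_* routines purely illustrative placeholders.)

#include <stddef.h>
#include <stdint.h>

#define PTR_OFFSET(x,n) (((uintptr_t)(x)) & ((n) - 1))

/* Placeholders for a fast aligned routine and a slower general one. */
void encrypt_blocks_aligned(const unsigned char *in, unsigned char *out, size_t len);
void encrypt_blocks_any(const unsigned char *in, unsigned char *out, size_t len);

void encrypt_blocks(const unsigned char *in, unsigned char *out, size_t len)
{
    /* Take the fast path only when both buffers sit on 16 byte boundaries. */
    if (PTR_OFFSET(in, 16) == 0 && PTR_OFFSET(out, 16) == 0)
        encrypt_blocks_aligned(in, out, len);
    else
        encrypt_blocks_any(in, out, len);
}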

I would much appreciate knowing of any experiences that people here have on
the practical portability of ways to determine the physical alignment of the
memory referenced by a pointer when this pointer is outside the control of
the software in which this alignment has to be determined.

Brian Gladman
 
Eric Sosman

Brian Gladman wrote on 03/26/07 13:09:
A lot of low level cryptographic code and some hardware cryptographic
accelerators either fail completely or perform very poorly if their input,
output and/or key storage areas in memory are not aligned on specific memory
boundaries. Moreover, in many situations the cryptographic code does not
itself have control over the memory alignment of its parameters so the best
it can do is to detect whether these alignments are correct and act accordingly.

This raises the question of the most appropriate way of determining the
alignment of the memory referenced by a pointer in C. Here I am less
interested in the 'political' correctness of C code than in which methods
are most likely to work in practice on the highest proportion of widely
deployed processors and C compilers.

For example, when 'x' is a pointer of some kind, 'n' is a power of two and
'pint' is a pointer sized integer type, on what proportion of systems will:

#define PTR_OFFSET(x,n) (((pint)(x)) & ((n) - 1))

return the correct memory alignment of 'x' from an 'n' byte boundary? Is:

99.44% (keep in mind that 82.6% of quoted statistics are
made up out of thin air).
#define PTR_OFFSET(x,n) (((char*)(x) - (char*)0) & ((n) - 1))

any better (or worse)?

Marginally worse, I think. Note that (char*)0 is (char*)NULL,
which has a faint chance of misbehaving in the subtraction, in ways
that the direct conversion to an integer might not. Even if it's
equally safe and effective, it's longer-winded, brevity-wise speaking.
Or, on systems that allow for the declaration of
aligned variables, is:

declare_aligned(n) type var;
#define PTR_OFFSET(x,n) (((char*)(x) - (char*)&var) & ((n) - 1))

any better?

I don't think so, and it's certainly uglier.
To what extent is this likely to depend on declaring 'var' in
the same way that 'x' may have been declared (static, dynamic, auto, ...)?

I think it unlikely that there'll be a dependency. C's pointers
don't discriminate among the storage classes of the things they
point at; the same char* can point at auto, static, dynamic, and
even read-only data at different times during the execution of the
same program. Furthermore, the underlying platform needs to re-use
the same locations for different purposes in different programs, so
the designers have an incentive to build memories that are "regular."
And on what proportion of 'current' systems is a flat memory model used
anyway?

99.44%, as before (and with the same caveat).

Also, it depends on what you mean by "flat." Imagine a 64-bit
pointer carrying a 56-bit address and an 8-bit "privilege tag" or
"ring identifier" in the high-order positions (when converted to
an integer). Would you call the memory model "flat?" Now move
the 8-bit tag to the low-order positions; still "flat?" The
tests you're attempting will work with one format but get confused
by the other, yet the memory addressing seems pretty much the same
in both ... Perhaps "flatness" isn't really the right question.
I would much appreciate knowing of any experiences that people here have on
the practical portability of ways to determine the physical alignment of the
memory referenced by a pointer when this pointer is outside the control of
the software in which this alignment has to be determined.

A straightforward cast to uintptr_t (if you have it, or to
unsigned long or size_t if you don't) seems at least as likely
to work as any of the fancier alternatives. Keep in mind that
the "practical portability" is already somewhat strained: You are
trying to detect bad alignments because you have sub rosa knowledge
of what's "good" and "bad" and what the consequences are, and that
knowledge is itself non-portable. A little casting and masking is
not going to make things noticeably worse.
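
As a sketch, that straightforward version might read as follows (assuming
uintptr_t is available, with size_t as a best-guess fallback where it is
not):

#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef uintptr_t ptr_uint;
#else
typedef size_t ptr_uint;     /* best available guess on a C89 compiler */
#endif

/* Offset of x from the n-byte boundary at or below it; n must be a
   power of two. */
#define PTR_OFFSET(x,n) ((ptr_uint)(x) & ((ptr_uint)(n) - 1))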

Finally, portability is just one desirable characteristic code
can have. There are others (correctness, speed, debuggability,
time to market, ...) and some may in some cases be more important
than portability. Don't discard portability needlessly, but don't
pursue it beyond reason. ("Reason" is situational, of course.)
 
Brian Gladman

Eric Sosman said:
Brian Gladman wrote On 03/26/07 13:09,:
[snip]
For example, when 'x' is a pointer of some kind, 'n' is a power of two and
'pint' is a pointer sized integer type, on what proportion of systems will:

#define PTR_OFFSET(x,n) (((pint)(x)) & ((n) - 1))

return the correct memory alignment of 'x' from an 'n' byte boundary? Is:

99.44% (keep in mind that 82.6% of quoted statistics are
made up out of thin air).

Such precise figures make me suspicious :)
Marginally worse, I think. Note that (char*)0 is (char*)NULL,
which has a faint chance of misbehaving in the subtraction, in ways
that the direct conversion to an integer might not. Even if it's
equally safe and effective, it's longer-winded, brevity-wise speaking.


I don't think so, and it's certainly uglier.

I was hoping that this might be more reliable than the other two even though
it is certainly uglier.
I think it unlikely that there'll be a dependency. C's pointers
don't discriminate among the storage classes of the things they
point at; the same char* can point at auto, static, dynamic, and
even read-only data at different times during the execution of the
same program. Furthermore, the underlying platform needs to re-use
the same locations for different purposes in different programs, so
the designers have an incentive to build memories that are "regular."


99.44%, as before (and with the same caveat).

Also, it depends on what you mean by "flat." Imagine a 64-bit
pointer carrying a 56-bit address and an 8-bit "privilege tag" or
"ring identifier" in the high-order positions (when converted to
an integer). Would you call the memory model "flat?" Now move
the 8-bit tag to the low-order positions; still "flat?" The
tests you're attempting will work with one format but get confused
by the other, yet the memory addressing seems pretty much the same
in both ... Perhaps "flatness" isn't really the right question.

You are right, I was being too restrictive in using the term 'flat'.

All I really need to know for some 'reasonable' value of N (not less than,
say, 8) is what proportion of systems are such that the least significant N
bits of all pointers to data in memory are the same as the least significant
N bits of the physical memory addresses to which they are mapped.
A straightforward cast to uintptr_t (if you have it, or to
unsigned long or size_t if you don't) seems at least as likely
to work as any of the fancier alternatives. Keep in mind that
the "practical portability" is already somewhat strained: You are
trying to detect bad alignments because you have sub rosa knowledge
of what's "good" and "bad" and what the consequences are, and that
knowledge is itself non-portable. A little casting and masking is
not going to make things noticeably worse.

Finally, portability is just one desirable characteristic code
can have. There are others (correctness, speed, debuggability,
time to market, ...) and some may in some cases be more important
than portability. Don't discard portability needlessly, but don't
pursue it beyond reason. ("Reason" is situational, of course.)

I willingly accept your advice - I ask about this precisely because I am
hoping to achieve a reasonable balance between portability and practical
utility.

I am also asking because those who I have consulted on the alternatives have
provided very different answers on which is preferable (both for portability
and for other reasons). I hence wanted to consult as many people as
possible to see if any consensus might emerge.

Thanks for your input, which I much appreciate.

Brian Gladman
 
Keith Thompson

Brian Gladman said:
A lot of low level cryptographic code and some hardware
cryptographic accelerators either fail completely or perform very
poorly if their input, output and/or key storage areas in memory are
not aligned on specific memory boundaries. Moreover, in many
situations the cryptographic code does not itself have control over
the memory alignment of its parameters so the best it can do is to
detect whether these alignments are correct and act accordingly.

This raises the question of the most appropriate way of
determining the alignment of the memory referenced by a pointer
in C. Here I am less interested in the 'political' correctness of
C code than in which methods are most likely to work in practice
on the highest proportion of widely deployed processors and C
compilers.

For example, when 'x' is a pointer of some kind, 'n' is a power of two
and 'pint' is a pointer sized integer type, on what proportion of
systems will:

#define PTR_OFFSET(x,n) (((pint)(x)) & ((n) - 1))

return the correct memory alignment of 'x' from an 'n' byte boundary?
[snip]

In other words, does examining the low-order bits of (the
representation of) a pointer tell you how it's aligned?

On most systems, this should work, but I have worked on systems where
it wouldn't (Cray vector systems).

Something you might consider is testing whether this works before you
depend on it. For example, declare an array of, say, 8 characters,
take the address of each one, and examine the low-order bits of each
address. On most systems, you should find that the low-order bits
have the values 0, 1, 2, 3, 4, 5, 6, 7 (possibly in some rotated order
such as 4, 5, 6, 7, 0, 1, 2, 3 if the character array isn't 8-byte
aligned). (On a Cray vector system, all 8 pointers would have the
same low-order 3 bits, and your method wouldn't work.)
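
A rough sketch of such a probe (it assumes uintptr_t and only checks that
consecutive byte addresses step by one in their low-order bits; nothing
here is more than the test described above):

#include <stdint.h>

/* Returns nonzero if the low-order bits of converted pointers step by one
   as we walk through consecutive chars, i.e. the masking trick looks
   usable on this implementation. */
static int low_bits_look_sane(void)
{
    static char probe[8];
    unsigned i;

    for (i = 1; i < sizeof probe; i++) {
        uintptr_t prev = (uintptr_t)&probe[i - 1];
        uintptr_t cur  = (uintptr_t)&probe[i];
        if (((prev + 1) & 7) != (cur & 7))
            return 0;   /* a Cray-style word pointer would fail here */
    }
    return 1;
}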

Or you can use some system-specific method, such as
implementation-defined macros, to determine what kind of system you're
on, and use your knowledge of that system to determine how pointers
behave (the common "twisty maze of ifdefs" approach).
 
Stephen Sprunk

Brian Gladman said:
You are right, I was being too restrictive in using the term 'flat'.

All I really need to know for some 'reasonable' value of N (not less
than, say, 8) is what proportion of systems are such that the least
significant N bits of all pointers to data in memory are the same
as the least significant N bits of the physical memory addresses
to which they are mapped.

Example: x86 real mode (e.g. DOS) has N==4, but x86 protected mode (e.g.
Windows, anything UNIXish) has N==12. All the RISCs I know are going to
have N>8 as well. That likely covers the vast majority of your target
market.

In general, N will be reasonable on any system that is "paged", but it will
be unreasonable or even zero on systems that are "segmented". However, I'd
suggest reposting the above question on comp.arch, since those folks are
going to be more familiar with the virtual address-to-physical address
mappings on a variety of strange systems. We only discuss such oddities
here to the extent needed to explain to people why certain things are
unportable (and Crays, AS/400s, and the DS9k cover that need adequately).

Keith's trick may be useful if you can do the sanity test at runtime, but I
expect that will be too expensive for you if you're trying to write
high-performance encryption code.

S
 
Brian Gladman

Keith Thompson said:
[snip]
For example, when 'x' is a pointer of some kind, 'n' is a power of two
and 'pint' is a pointer sized integer type, on what proportion of
systems will:

#define PTR_OFFSET(x,n) (((pint)(x)) & ((n) - 1))

return the correct memory alignment of 'x' from an 'n' byte boundary?
[snip]

In other words, does examining the low-order bits of (the
representation of) a pointer tell you how it's aligned?

On most systems, this should work, but I have worked on systems where
it wouldn't (Cray vector systems).

Something you might consider is testing whether this works before you
depend on it. For example, declare an array of, say, 8 characters,
take the address of each one, and examine the low-order bits of each
address. On most systems, you should find that the low-order bits
have the values 0, 1, 2, 3, 4, 5, 6, 7 (possibly in some rotated order
such as 4, 5, 6, 7, 0, 1, 2, 3 if the character array isn't 8-byte
aligned). (On a Cray vector system, all 8 pointers would have the
same low-order 3 bits, and your method wouldn't work.)

Thanks, that's something worth thinking about.

As a matter of interest, would either of the other methods I mentioned work
on the Cray vector system?
Or you can use some system-specific method, such as
implementation-defined macros, to determine what kind of system you're
on, and use your knowledge of that system to determine how pointers
behave (the common "twisty maze of ifdefs" approach).

I already do plenty of that in working out endianness :)

But I am not sure it would be easy to collect the information needed to fill
out the branches.

Anyway, it is useful to know that the simple technique may be the most
practical approach since I had felt that this was possibly the least
portable of the possibilities that I mentioned.

Thanks for your input.

Brian Gladman
 
christian.bau

Brian Gladman said:
[snip]

I would much appreciate knowing of any experiences that people here have on
the practical portability of ways to determine the physical alignment of the
memory referenced by a pointer when this pointer is outside the control of
the software in which this alignment has to be determined.

Assume you are given a pointer char* p, you know that p-1 to p-15 are
all valid, and you wish to create a pointer q which is equal to one of
p, p-1, p-2 ... p-15 such that q is "16 byte aligned", whatever that
means.

unsigned long delta = ((unsigned long) p) & 0x0f;
char* q = p - delta;

will set q to one of p, p-1, ..., p-15, no matter what the result of
the conversion from char* to unsigned long is. On many machines, the
result will be 16 byte aligned. You can check whether (unsigned long)
q == ((unsigned long) p) - delta. On your usual x86, PowerPC, or ARM
processor that will be the case. If it is not the case, you know your
code is running on a more interesting machine. On the Deathstation
9000, you won't find this happening - but your customers will :)
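
Put into code, that might look like the following sketch (the caller must
still guarantee that p-1 to p-15 are valid, and 16 is assumed to be the
alignment wanted):

#include <stddef.h>

/* Round p down to a 16 byte boundary as in the snippet above, then check
   that integer and pointer arithmetic agree; returns NULL when they do
   not, so the caller can fall back to a slower unaligned path. */
static char *align_down_16(char *p)
{
    unsigned long delta = (unsigned long)p & 0x0f;
    char *q = p - delta;

    if ((unsigned long)q != (unsigned long)p - delta)
        return NULL;            /* the "more interesting" kind of machine */
    return q;
}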
 
Peter Nilsson

Brian Gladman said:
A lot of low level cryptographic code and some hardware
cryptographic accelerators either fail completely or perform
very poorly if their input, output and/or key storage areas in
memory are not aligned on specific memory boundaries. Moreover,
in many situations the cryptographic code does not itself have
control over the memory alignment of its parameters

Why should it?
so the best it can do is to detect whether these alignments are
correct and act accordingly.

Why should it bother?
This raises the question of the most appropriate way of
determining the alignment of the memory referenced by a
pointer in C.

No, it raises the question of how the calling code can supply
aligned data. The answer to that is through malloc.
Here I am less interested in the 'political' correctness of C
code but rather in which methods are most likely to work in
practice on the highest proportion of widely deployed processors
and C compilers.

In other words you don't give a rat's about portable programming. ;)
Your loss really.

You should list what _you_ call 'widely deployed processors'
and ask in the appropriate forums. Note that embedded chips are
more widely deployed than desktop cpus.
<snip> ... on what proportion of systems will: ...
<snip> ... on systems that allow for the declaration of
aligned variables, ...
<snip> ... To what extent is this likely to depend on...
<snip> ... on what proportion of 'current' systems is...
I would much appreciate knowing of any experiences that people
here have on the practical portability

No you're not. You've clearly stated you'd prefer a non-portable
solution.

You're looking for a fix. But it's not a fix to the actual problem,
it's a fix to a poor solution.

Another poor option, but one which would be more portable, is
to copy the input to malloc-ed memory and copy the result back to
the memory pointed to by the output pointer. The intermediary buffer
will be properly aligned and you can use the maximally portable
memcpy function to do the transfers.
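
A rough sketch of that approach (the function and its parameters are made
up purely for illustration; error handling is minimal):

#include <stdlib.h>
#include <string.h>

/* 'cipher_blocks' stands for whatever routine needs the aligned buffer;
   malloc's result is suitably aligned for any object type, which is the
   alignment guarantee portable C actually gives us. */
int encrypt_via_bounce_buffer(void (*cipher_blocks)(unsigned char *, size_t),
                              const unsigned char *in, unsigned char *out,
                              size_t len)
{
    unsigned char *tmp = malloc(len);

    if (tmp == NULL)
        return -1;
    memcpy(tmp, in, len);       /* copy in ...                      */
    cipher_blocks(tmp, len);    /* ... work on the aligned copy ... */
    memcpy(out, tmp, len);      /* ... and copy the result back out */
    free(tmp);
    return 0;
}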
 
CBFalconer

Brian said:
.... snip ...

I am also asking because those who I have consulted on the
alternatives have provided very different answers on which is
preferable (both for portability and for other reasons). I
hence wanted to consult as many people as possible to see if
any consensus might emerge.

Whatever you do, I consider it important that you have run-time
means of detecting that it won't work.

--
Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://www.eskimo.com/~scs/C-faq/top.html> (C-faq)
<http://benpfaff.org/writings/clc/off-topic.html>
<http://anubis.dkuug.dk/jtc1/sc22/wg14/www/docs/n869/> (C99)
<http://www.dinkumware.com/refxc.html> (C-library)
<http://gcc.gnu.org/onlinedocs/> (GNU docs)
<http://clc-wiki.net/wiki/C_community:comp.lang.c:Introduction>
 
Brian Gladman

Peter Nilsson said:
Why should it?

It shouldn't necessarily. Nor did I claim, or even imply, that it should.
Why should it bother?

Because this helps to avoid application failures or preserve an
application's performance in a number of practical situations.
No, it raises the question of how the calling code can supply
aligned data. The answer to that is through malloc.

In some practical situations it is not possible for the calling code to use
malloc. In other cases, even though buffer base addresses have been aligned,
it may not be easy (or even possible) to ensure the alignment of the
addresses passed to the cryptographic code since encryption may be needed on
only the upper part of an allocated buffer and timing constraints may
prevent realignment by copying. It may hence be preferable to accept slower
encryption in such cases provided that aligned cases proceed at full speed.

The most appropriate coding level at which alignment issues should be
handled is dependent on the specific nature of the application in question.
Nevertheless it is certainly true that where alignment is needed (or highly
desirable), this should be specified as a part of the API for the
cryptographic code in question. I take this as read.

But even in such situations I have many real examples of applications that
have failed in unpredictable ways because this requirement has not been met
even though it has been clearly stated. In such cases, the detection of
misaligned addresses can provide for controlled failures in place of the
unpredictable and uncontrolled failures that could otherwise result.
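
(To make that concrete, the kind of check I have in mind is no more than
the sketch below; the 16 byte requirement and the return values are purely
illustrative.)

#include <stddef.h>
#include <stdint.h>

/* Refuse to run, with a defined error code, rather than fail
   unpredictably when a caller has ignored a documented 16 byte
   alignment requirement. */
int cipher_encrypt(const unsigned char *in, unsigned char *out,
                   size_t len, const void *key_schedule)
{
    if (((uintptr_t)in | (uintptr_t)out | (uintptr_t)key_schedule) & 15)
        return -1;              /* controlled, documented failure */

    /* ... the real encryption would go here ... */
    (void)len;
    return 0;
}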

This is known as 'defensive programming' and is an important aspect of the
design of secure information systems. Asking others to use only aligned
pointers in calls and then not checking that they have done so is
_certain_ to result in unpredictable application failures if the
cryptographic code in question is widely deployed. You may be comfortable
in being able to claim that this is someone else's fault, but I am not.
In other words you don't give a rat's about portable programming. ;)
Your loss really.

You should list what _you_ call 'widely deployed processors'
and ask in the appropriate forums. Note that embedded chips are
more widely deployed than desktop cpus.

Which is exactly why I am happy with the way I phrased the question (the
answer is easy for desktop cpus).
No you're not. You've clearly stated you'd prefer a non-portable
solution.

Your claim about my lack of sincerity in asking this question reflects more
on you than it does on me.
You're looking for a fix. But it's not a fix to the actual problem,
it's a fix to a poor solution.

You are wrong. You evidently lack the practical experience needed to fully
understand the issue that I have raised.

Brian Gladman
 
Brian Gladman

christian.bau said:
Assume you are given a pointer char* p, you know that p-1 to p-15 are
all valid, and you wish to create a pointer q which is equal to one of
p, p-1, p-2 ... p-15 such that q is "16 byte aligned", whatever that
means.

unsigned long delta = ((unsigned long) p) & 0x0f;
char* q = p - delta;

Thanks for this suggestion, Christian, I think it is quite close to what I
am currently using:

#define PTR_OFF(x)        ((size_t)(x))
#define ALIGN_OFFSET(x,n) (PTR_OFF(x) & ((n) - 1))
#define ALIGN_FLOOR(x,n)  ((unsigned char*)(x) - ( PTR_OFF(x) & ((n) - 1)))
#define ALIGN_CEIL(x,n)   ((unsigned char*)(x) + (-PTR_OFF(x) & ((n) - 1)))

where 'x' is a pointer and 'n' is the (power of two) alignment that I need.
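
A self-contained illustration of how they behave (just a demo - the macros
are repeated so that it compiles on its own, and the values printed will of
course depend on where 'buf' happens to land):

#include <stddef.h>
#include <stdio.h>

#define PTR_OFF(x)        ((size_t)(x))
#define ALIGN_OFFSET(x,n) (PTR_OFF(x) & ((n) - 1))
#define ALIGN_FLOOR(x,n)  ((unsigned char*)(x) - ( PTR_OFF(x) & ((n) - 1)))
#define ALIGN_CEIL(x,n)   ((unsigned char*)(x) + (-PTR_OFF(x) & ((n) - 1)))

int main(void)
{
    unsigned char buf[64];
    unsigned char *p = buf + 19;    /* somewhere in the middle of the buffer */

    printf("offset from a 16 byte boundary: %u\n",
           (unsigned)ALIGN_OFFSET(p, 16));
    printf("rounded down by %u, up by %u bytes\n",
           (unsigned)(p - ALIGN_FLOOR(p, 16)),
           (unsigned)(ALIGN_CEIL(p, 16) - p));
    return 0;
}
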
will set q to one of p, p-1, ..., p-15, no matter what the result of
the conversion from char* to unsigned long is. On many machines, the
result will be 16 byte aligned. You can check whether (unsigned long)
q == ((unsigned long) p) - delta. On your usual x86, PowerPC, or ARM
processor that will be the case. If it is not the case, you know your
code is running on a more interesting machine. On the Deathstation
9000, you won't find this happening - but your customers will :)

Thanks for this suggestion - I don't do this right now but it is certainly
something I will look at.

When my code encounters the inevitable Deathstation 9000 I suspect I will
have more to worry about than mere pointer alignment :)

Thank you for your helpful comments

Brian Gladman
 
Brian Gladman

Stephen Sprunk said:
Example: x86 real mode (e.g. DOS) has N==4, but x86 protected mode (e.g.
Windows, anything UNIXish) has N==12. All the RISCs I know are going to
have N>8 as well. That likely covers the vast majority of your target
market.

In general, N will be reasonable on any system that is "paged", but it
will be unreasonable or even zero on systems that are "segmented".
However, I'd suggest reposting the above question on comp.arch, since
those folks are going to be more familiar with the virtual
address-to-physical address mappings on a variety of strange systems. We
only discuss such oddities here to the extent needed to explain to people
why certain things are unportable (and Crays, AS/400s, and the DS9k cover
that need adequately).

Keith's trick may be useful if you can do the sanity test at runtime, but
I expect that will be too expensive for you if you're trying to write
high-performance encryption code.

Thanks for your input, Stephen, your assessment is most helpful.

In some cases I need the input, output and key schedule buffers to be
aligned but some manufacturers of cryptographic add-ons have recognised the
problems of insisting on the alignment of encryption and decryption buffers
and have removed this constraint whilst still insisting that the key
schedule buffer is fully aligned.

In this case the key schedule code may be called only infrequently when
compared with the encryption or decryption code, and this would allow more
time to be available for alignment tests (or even correction) without a
major impact on overall system performance. So some of the ideas that have
been suggested may well prove useful in such situations.
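
For the key schedule case, where a little extra time and memory is
acceptable, something along the lines of the sketch below would do (the
sizes and names are made up for illustration):

#include <stdlib.h>
#include <stddef.h>

#define KS_BYTES  240   /* e.g. an AES-256 key schedule - illustrative */
#define KS_ALIGN   16   /* the alignment the accelerator insists on    */

/* Over-allocate, then round the working pointer up to the next KS_ALIGN
   boundary; the raw pointer is kept so the block can be freed later. */
typedef struct {
    void          *raw;       /* pass this to free()          */
    unsigned char *aligned;   /* use this as the key schedule */
} key_schedule_buf;

static int ks_alloc(key_schedule_buf *ks)
{
    unsigned char *p = malloc(KS_BYTES + KS_ALIGN - 1);
    size_t skew;

    if (p == NULL)
        return -1;
    skew        = (size_t)p & (KS_ALIGN - 1);
    ks->raw     = p;
    ks->aligned = p + ((KS_ALIGN - skew) & (KS_ALIGN - 1));
    return 0;
}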

Thanks again for your input.

Brian Gladman
 
