Portable (??) pointer alignment

N

Noob

Hello,

I'm trying to write portable (as much as I can) alignment code.
Basically, I'm writing my own memalign function.

Function prototype:

void *aligned_malloc(size_t boundary, size_t size)

boundary shall be a power of two, and shall not be smaller
than sizeof(void *) otherwise the behavior is undefined.

Here's my first try:

#include <stdlib.h>
#include <stdint.h>
void *aligned_malloc1(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t res = ((uintptr_t)base_ptr + sizeof base_ptr + mask) & ~mask;
((void **)res)[-1] = base_ptr;
return (void *)res;
}

Sources of non-portability:

I used uintptr_t, which is an optional C99 type. If it's not
defined, I suppose there is no guarantee that pointers can be
represented as integers?

I suppose it is possible that sizeof(size_t) < sizeof(uintptr_t)
but I don't think that would cause a problem?

It is allowed to cast from (void *) to uintptr_t (and back) but
I don't think it is allowed to manipulate a uintptr_t, then cast
it to (void *) and expect a valid pointer. I think I can work
around this by computing an offset?

Here's my second try:

#include <stdlib.h>
#include <stdint.h>
void *aligned_malloc2(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t base = (uintptr_t)base_ptr;
uintptr_t addr = (base + sizeof base_ptr + mask) & ~mask;
void *res = (unsigned char *)base_ptr + (addr - base);
((void **)res)[-1] = base_ptr;
return res;
}

My compiler generates slightly different code for the two functions.
(gcc -Wall -Wextra -std=c99 -pedantic -O3 -fomit-frame-pointer)

Is aligned_malloc2 truly more portable than aligned_malloc1?

Do you see ways to improve aligned_malloc2?

Regards.
 
E

Eric Sosman

Hello,

I'm trying to write portable (as much as I can) alignment code.
Basically, I'm writing my own memalign function.

Function prototype:

void *aligned_malloc(size_t boundary, size_t size)

boundary shall be a power of two, and shall not be smaller
than sizeof(void *) otherwise the behavior is undefined.

Side note: You could just round it up if needed.
Here's my first try:

#include<stdlib.h>
#include<stdint.h>
void *aligned_malloc1(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t res = ((uintptr_t)base_ptr + sizeof base_ptr + mask)& ~mask;
((void **)res)[-1] = base_ptr;
return (void *)res;
}

Sources of non-portability:

I used uintptr_t, which is an optional C99 type. If it's not
defined, I suppose there is no guarantee that pointers can be
represented as integers?

Right. Pointer values can be converted to integers, but the
nature of the conversion is at the implementation's whim. The
conversion may lose information, so in that sense the converted
integer may not "represent" the pointer. Similar concerns apply
to converting integers back to pointers again.

Even with a reversible conversion that loses no data, it may
turn out that the integer's bits play different roles than you
expect. Bentley and McIlroy mention certain Cray machines on which

... plausible code such as ((long)a | es) % sizeof(long)
fails because the least significant part of a byte address
occupies the most significant part of a long.
-- "Engineering a Sort Function"
I suppose it is possible that sizeof(size_t)< sizeof(uintptr_t)
but I don't think that would cause a problem?

It is allowed to cast from (void *) to uintptr_t (and back) but
I don't think it is allowed to manipulate a uintptr_t, then cast
it to (void *) and expect a valid pointer. I think I can work
around this by computing an offset?

Right again: The conversions work, but the manipulations are
hazardous. Since you don't know how to discover the alignment of
a value returned by malloc(), I don't see how computing known
offsets from that unknown boundary will be of help.
Here's my second try:

#include<stdlib.h>
#include<stdint.h>
void *aligned_malloc2(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t base = (uintptr_t)base_ptr;
uintptr_t addr = (base + sizeof base_ptr + mask)& ~mask;
void *res = (unsigned char *)base_ptr + (addr - base);
((void **)res)[-1] = base_ptr;
return res;
}

My compiler generates slightly different code for the two functions.
(gcc -Wall -Wextra -std=c99 -pedantic -O3 -fomit-frame-pointer)

Is aligned_malloc2 truly more portable than aligned_malloc1?

Looks like it has all the same issues.
Do you see ways to improve aligned_malloc2?

Instead of fiddling with the low-order bits of a converted
pointer to discover the alignment, Bentley and McIlroy try to
use pointer subtraction. Recast in your framework,

uintmax_t base_offset = (char*)base_ptr - (char*)0;

.... the idea being that `(char*)0' is "infinitely aligned" and
that the pointer subtraction deals with representational issues.
This still isn't completely portable, but has a chance of working
on machines where the bit layout of pointer values is surprising.
 
J

Jens Gustedt

Hello,

Am 04/18/2012 01:38 PM, schrieb Eric Sosman:
Instead of fiddling with the low-order bits of a converted
pointer to discover the alignment, Bentley and McIlroy try to
use pointer subtraction. Recast in your framework,

uintmax_t base_offset = (char*)base_ptr - (char*)0;

... the idea being that `(char*)0' is "infinitely aligned" and
that the pointer subtraction deals with representational issues.
This still isn't completely portable, but has a chance of working
on machines where the bit layout of pointer values is surprising.

I think there is no portable way to do this. Even the definition of
alignment in the C standard is underspecified. The term refers to a
"multiple of a byte address" but since addresses are not numbers, it
is not clear what the multiple of an address would be.

The only thing that one can deduce from the standard is that
"alignment" is a numerical feature of the address of objects, with
certain properties on the allowed numerical values. And that access to
subobjects of such an object can be subject to constraints that can be
expressed with these numerical values.

I think this is exactly the reason why C11 and POSIX have proper
interfaces for that task (posix_memalign and alligned_alloc).

Jens
 
N

Noob

Eric said:
Side note: You could just round it up if needed.

In the grand old tradition of C, it doesn't seem "fair" to
impose unnecessary overhead for correct use of an interface.

An acceptable solution might be to provide
void *aligned_malloc(size_t boundary, size_t size)
{
assert(boundary > sizeof(void *));
assert((boundary & (boundary - 1)) == 0);
return aligned_malloc(boundary, size);
}

Side note: is there a trivial bit-twiddling operation to round
an integer up to the next power of 2?

/me runs off to Google
Depends what one considers "trivial"
http://bits.stephan-brumme.com/roundUpToNextPowerOfTwo.html
http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
Here's my first try:

#include<stdlib.h>
#include<stdint.h>
void *aligned_malloc1(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t res = ((uintptr_t)base_ptr + sizeof base_ptr + mask)& ~mask;
((void **)res)[-1] = base_ptr;
return (void *)res;
}

Sources of non-portability:

I used uintptr_t, which is an optional C99 type. If it's not
defined, I suppose there is no guarantee that pointers can be
represented as integers?

Right. Pointer values can be converted to integers, but the
nature of the conversion is at the implementation's whim. The
conversion may lose information, so in that sense the converted
integer may not "represent" the pointer. Similar concerns apply
to converting integers back to pointers again.

Even with a reversible conversion that loses no data, it may
turn out that the integer's bits play different roles than you
expect. Bentley and McIlroy mention certain Cray machines on which

... plausible code such as ((long)a | es) % sizeof(long)
fails because the least significant part of a byte address
occupies the most significant part of a long.
-- "Engineering a Sort Function"

I've been staring at that quote, but cannot parse it.
I don't think (??) they are hinting at endian-ness issues.
I assume "a" is a (char *)? (because they mention "byte address")
I guess I'm having a hard time "breaking free" from the simplicity
of a linear address space :)
(I suppose I implicitly assumed that if the implementation defines
uintptr_t, then this implies a linear address space. I imagine
my assumption was incorrect?)

Lemme see... this looks interesting (and relevant):
http://c-faq.com/null/machexamp.html

"Some 64-bit Cray machines represent int * in the lower 48 bits of a
word; char * additionally uses some of the upper 16 bits to indicate
a byte address within a word."

Are there C99 compilers for such a beast?
Assuming, for illustration purposes, that bits 48-50 indicate the
byte offset, then given the following code:

extern char *foo;
char *bar = foo + n;

the implementation would have to generate code to extract the
byte offset, add n, handle the potential overflow, then make
a valid pointer out of this mess? (I suppose there was no
hardware support for this kind of operations.)
I suppose it is possible that sizeof(size_t)< sizeof(uintptr_t)
but I don't think that would cause a problem?

It is allowed to cast from (void *) to uintptr_t (and back) but
I don't think it is allowed to manipulate a uintptr_t, then cast
it to (void *) and expect a valid pointer. I think I can work
around this by computing an offset?

Right again: The conversions work, but the manipulations are
hazardous. Since you don't know how to discover the alignment of
a value returned by malloc(), I don't see how computing known
offsets from that unknown boundary will be of help.
Here's my second try:

#include<stdlib.h>
#include<stdint.h>
void *aligned_malloc2(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t base = (uintptr_t)base_ptr;
uintptr_t addr = (base + sizeof base_ptr + mask)& ~mask;
void *res = (unsigned char *)base_ptr + (addr - base);
((void **)res)[-1] = base_ptr;
return res;
}

My compiler generates slightly different code for the two functions.
(gcc -Wall -Wextra -std=c99 -pedantic -O3 -fomit-frame-pointer)

Is aligned_malloc2 truly more portable than aligned_malloc1?

Looks like it has all the same issues.
Do you see ways to improve aligned_malloc2?

Instead of fiddling with the low-order bits of a converted
pointer to discover the alignment, Bentley and McIlroy try to
use pointer subtraction. Recast in your framework,

uintmax_t base_offset = (char*)base_ptr - (char*)0;

... the idea being that `(char*)0' is "infinitely aligned" and
that the pointer subtraction deals with representational issues.
This still isn't completely portable, but has a chance of working
on machines where the bit layout of pointer values is surprising.

I thought pointer subtraction was only defined when both pointers
actually point into the same object? Since NULL cannot point into
an object, such an operation must have undefined behavior?

Regards.
 
J

James Kuyper

Hello,

I'm trying to write portable (as much as I can) alignment code.
Basically, I'm writing my own memalign function.

Function prototype:

void *aligned_malloc(size_t boundary, size_t size)

boundary shall be a power of two, and shall not be smaller
than sizeof(void *) otherwise the behavior is undefined.

Here's my first try:

#include <stdlib.h>
#include <stdint.h>
void *aligned_malloc1(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t res = ((uintptr_t)base_ptr + sizeof base_ptr + mask) & ~mask;
((void **)res)[-1] = base_ptr;
return (void *)res;
}

Sources of non-portability:

Assuming that mathematical operations on the uintptr_t value
corresponding to a pointer tell you anything meaningful about the
alignment of that pointer. There's many systems where this will work as
intended; but the standard doesn't require it to work.
I used uintptr_t, which is an optional C99 type. If it's not
defined, I suppose there is no guarantee that pointers can be
represented as integers?

Correct. As a practical matter, uintptr_t is unlikely to be missing
unless sizeof(void*) > sizeof(uintmax_t).

To cover that possibility, I'd recommend

#ifndef UINTPTR_MAX
#error uintptr_t not supported
#endif
I suppose it is possible that sizeof(size_t) < sizeof(uintptr_t)
but I don't think that would cause a problem?

It shouldn't.
It is allowed to cast from (void *) to uintptr_t (and back) but
I don't think it is allowed to manipulate a uintptr_t, then cast
it to (void *) and expect a valid pointer. ...
Correct.

... I think I can work
around this by computing an offset?

Here's my second try:

#include <stdlib.h>
#include <stdint.h>
void *aligned_malloc2(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t base = (uintptr_t)base_ptr;
uintptr_t addr = (base + sizeof base_ptr + mask) & ~mask;
void *res = (unsigned char *)base_ptr + (addr - base);

Unportable assumption: subtracting uintptr_t values is equivalent to
subtracting the corresponding pointers. Again, this will work on many
real-world systems; it will probably fail on the same ones that your
first one would fail on.
((void **)res)[-1] = base_ptr;
return res;
}

My compiler generates slightly different code for the two functions.
(gcc -Wall -Wextra -std=c99 -pedantic -O3 -fomit-frame-pointer)

Is aligned_malloc2 truly more portable than aligned_malloc1?
No.

Do you see ways to improve aligned_malloc2?

In C2011, there's a new function aligned_alloc() that appears to be
almost identical in function to memalign(). The biggest difference is
that memalign() requires that the alignment requirement be a power of 2,
while aligned_alloc() is more restrictive, requiring that "The value of
alignment shall be a valid alignment supported by the implementation and
the value of size shall be an integral multiple of alignment". This
basically means that you have to use the new _Alignof operator when
determining the values to pass to aligned_alloc().

The hard part, of course, will be finding a C2011 compiler. The man page
on my computer describes memalign() as obsolete, with posix_memalign()
as the preferred alternative.

malloc() is guaranteed to return a pointer to memory that "is suitably
aligned so that it may be assigned to a pointer to any type of object
with a fundamental alignment requirement and then used to access such an
object or an array of such objects in the space allocated (until the
space is explicitly deallocated)". (7.22.3p1)

The key thing to understand about either aligned_alloc() or memalign()
is that they are allowed to allocate memory that is LESS strictly
aligned than that allocated by malloc(). On systems where
aligned_alloc() or memalign() are properly integrated with the malloc()
family, the freedom to allocate less strictly aligned memory allows them
to manage the heap in a less wasteful fashion.

However, if your memalign() substitute is just a wrapper for malloc(),
you gain no such advantage; in fact, since you're saving a copy of the
original pointer at a fixed negative offset from the returned value,
you're actually wasting more space than malloc().

Ignoring for the moment the issue of whether there's any point in
defining such a function, here's a simpler and more portable approach to
implementing it. It relies on the assumption that void* has an alignment
requirement which is a power of 2; this is mandatory for C2011, but
unspecified in earlier versions of the C standard. However, it happens
to be true for (almost?) all real-world implementations of C, so it's
unlikely to be a problem:

void *aligned_malloc1(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

void *res = base_ptr + boundary;
((void **)res)[-1] = base_ptr; // Dangerous
return (void *)res;
}

The line marked "Dangerous" has undefined behavior if boundary <
sizeof(void*), or is not a multiple of _Alignof(void*); but you've
already specified that the behavior is allowed to be undefined in those
cases.
 
J

Jens Gustedt

Am 04/18/2012 03:15 PM, schrieb James Kuyper:
In C2011, there's a new function aligned_alloc() that appears to be
almost identical in function to memalign(). The biggest difference is
that memalign() requires that the alignment requirement be a power of 2,
while aligned_alloc() is more restrictive, requiring that "The value of
alignment shall be a valid alignment supported by the implementation and
the value of size shall be an integral multiple of alignment". This
basically means that you have to use the new _Alignof operator when
determining the values to pass to aligned_alloc().

I have a vague memory of someone proving that under restrictions of
C11 alignment has to be powers of two, anyhow. (But I didn't find
anything else supporting that claim.)
void *aligned_malloc1(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

void *res = base_ptr + boundary;

not well defined, for arithmetic you'd have to cast to a char type
first

Jens
 
J

James Kuyper

On 04/18/2012 10:06 AM, Jens Gustedt wrote:
....
I have a vague memory of someone proving that under restrictions of
C11 alignment has to be powers of two, anyhow. (But I didn't find
anything else supporting that claim.)

"Every valid alignment value shall be a nonnegative integral power of
two." (6.2.8p4)
not well defined, for arithmetic you'd have to cast to a char type
first

I knew I'd forgotten something. Declaring base_ptr to be char * is
sufficient, no explicit conversions are needed. Also, in the final line:
return (void*)res;

the cast is unnecessary; it's left over from the original program, in
which it was necessary.
 
N

Noob

James said:
Hello,

I'm trying to write portable (as much as I can) alignment code.
Basically, I'm writing my own memalign function.

Function prototype:

void *aligned_malloc(size_t boundary, size_t size)

boundary shall be a power of two, and shall not be smaller
than sizeof(void *) otherwise the behavior is undefined.

Here's my first try:

#include <stdlib.h>
#include <stdint.h>
void *aligned_malloc1(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t res = ((uintptr_t)base_ptr + sizeof base_ptr + mask) & ~mask;
((void **)res)[-1] = base_ptr;
return (void *)res;
}

Sources of non-portability:

Assuming that mathematical operations on the uintptr_t value
corresponding to a pointer tell you anything meaningful about the
alignment of that pointer. There's many systems where this will work as
intended; but the standard doesn't require it to work.
I used uintptr_t, which is an optional C99 type. If it's not
defined, I suppose there is no guarantee that pointers can be
represented as integers?

Correct. As a practical matter, uintptr_t is unlikely to be missing
unless sizeof(void*) > sizeof(uintmax_t).

To cover that possibility, I'd recommend

#ifndef UINTPTR_MAX
#error uintptr_t not supported
#endif

If uintptr_t is not supported, then the compiler will barf when
it sees uintptr_t. Are you using #error to produce a cleaner
error report?

[snip]
In C2011, there's a new function aligned_alloc() that appears to be
almost identical in function to memalign(). The biggest difference is
that memalign() requires that the alignment requirement be a power of 2,
while aligned_alloc() is more restrictive, requiring that "The value of
alignment shall be a valid alignment supported by the implementation and
the value of size shall be an integral multiple of alignment". This
basically means that you have to use the new _Alignof operator when
determining the values to pass to aligned_alloc().

The hard part, of course, will be finding a C2011 compiler. The man page
on my computer describes memalign() as obsolete, with posix_memalign()
as the preferred alternative.

malloc() is guaranteed to return a pointer to memory that "is suitably
aligned so that it may be assigned to a pointer to any type of object
with a fundamental alignment requirement and then used to access such an
object or an array of such objects in the space allocated (until the
space is explicitly deallocated)". (7.22.3p1)

The key thing to understand about either aligned_alloc() or memalign()
is that they are allowed to allocate memory that is LESS strictly
aligned than that allocated by malloc(). On systems where
aligned_alloc() or memalign() are properly integrated with the malloc()
family, the freedom to allocate less strictly aligned memory allows them
to manage the heap in a less wasteful fashion.

However, if your memalign() substitute is just a wrapper for malloc(),
you gain no such advantage; in fact, since you're saving a copy of the
original pointer at a fixed negative offset from the returned value,
you're actually wasting more space than malloc().

Even though my platform does provide memalign, I can't use it, because
I need to allocate memory from a specific memory pool.
Ignoring for the moment the issue of whether there's any point in
defining such a function,

(Leaving the realm of standard C) I have to provide an allocation
function which returns 32-byte aligned pointers to uncached memory.
The way this is done on my platform is to allocate a large block
of memory using malloc, get the physical address of that block,
ask the OS to map the physical address to an uncached virtual
address, and then provide this block of uncached memory to the OS
memory manager, which doesn't provide an aligned_malloc :-(
here's a simpler and more portable approach to
implementing it. It relies on the assumption that void* has an alignment
requirement which is a power of 2; this is mandatory for C2011, but
unspecified in earlier versions of the C standard. However, it happens
to be true for (almost?) all real-world implementations of C, so it's
unlikely to be a problem:

void *aligned_malloc1(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

void *res = base_ptr + boundary;
((void **)res)[-1] = base_ptr; // Dangerous
return (void *)res;
}

Your proposed function doesn't work (even with the corrections you
mention in an ulterior message).

On my platform, CHAR_BIT = 8, and malloc returns 8-byte (64-bits)
aligned pointers.

What happens when I request a 32-byte aligned pointer?
If base_ptr mod boundary != 0 (e.g. 0x4008)
then (base_ptr + boundary) mod boundary != 0
The line marked "Dangerous" has undefined behavior if boundary <
sizeof(void*), or is not a multiple of _Alignof(void*); but you've
already specified that the behavior is allowed to be undefined in those
cases.

Regards.
 
B

Barry Schwarz

In the grand old tradition of C, it doesn't seem "fair" to
impose unnecessary overhead for correct use of an interface.

An acceptable solution might be to provide
void *aligned_malloc(size_t boundary, size_t size)
{
assert(boundary > sizeof(void *));
assert((boundary & (boundary - 1)) == 0);
return aligned_malloc(boundary, size);
}

Side note: is there a trivial bit-twiddling operation to round
an integer up to the next power of 2?

In the absence of a more sophisticated response:

if (boundary & boundary-1){
for (i = 1; boundary >> i; i++)
;
boundary = 1U << i;
}

I imagine a check for i < the number of bits in a size_t (or the
initial value of boundary <= SIZE_MAX>>1) should be added for
completeness.
 
J

James Kuyper

James Kuyper wrote: ....

If uintptr_t is not supported, then the compiler will barf when
it sees uintptr_t. Are you using #error to produce a cleaner
error report?

Basically, yes. Also, it documents the dependency in the code itself, so
the error message is recognized as indicating that the problem is a
failure of the implementation to meet the requirements of the code,
rather than a code defect. In the more general case, checking whether
UINTPTR_MAX is #defined allows selection of alternative strategies when
it's not.

....
On my platform, CHAR_BIT = 8, and malloc returns 8-byte (64-bits)
aligned pointers.

What happens when I request a 32-byte aligned pointer?
If base_ptr mod boundary != 0 (e.g. 0x4008)

If malloc() returns 8-byte aligned pointers, then 8 bytes should be
sufficient to meet the alignment requirements of all data types; why do
you need 32-byte alignment? Alternatively, if you legitimately need
32-byte alignment for some data type, then why does malloc() return
pointers with only 8 byte alignment? One way or another, there's
something wrong about this picture.

If for some reason (which you should explain) you need memory aligned
more strictly than is required by any data type supported by the
compiler, the new C2011 function aligned_alloc() won't solve your
problem either, because passing any number other than the alignment
requirement of an actual data type as the first argument of
aligned_alloc() is undefined behavior. posix_memalign(), if it were
available, would be fine, as long as requested alignment were a power of
2. If the requested alignment wasn't a power of 2, the result would be a
return value of NULL and errno set to EINVAL; which is an improvement on
the undefined behavior that aligned_alloc() would have in that case.
 
E

Eric Sosman

In the grand old tradition of C, it doesn't seem "fair" to
impose unnecessary overhead for correct use of an interface.

In another grand old tradition of C, it doesn't seem fair for
a portable interface to require that the caller know and use non-
portable implementation-specific sizes!

void *p = aligned_malloc(8, 64);

vs.

void *q = aligned_malloc(
sizeof(void*) < 8 ? 8 : sizeof(void*), 64);

.... and even then one needs to worry about sizeof(void*)==6 ...
An acceptable solution might be to provide
void *aligned_malloc(size_t boundary, size_t size)
{
assert(boundary> sizeof(void *));
assert((boundary& (boundary - 1)) == 0);
return aligned_malloc(boundary, size);

I hope you've got lots of stack space ...
}

Side note: is there a trivial bit-twiddling operation to round
an integer up to the next power of 2?

I don't know of one, but how "trivial" do you need it to be?
[...] Bentley and McIlroy mention certain Cray machines on which

... plausible code such as ((long)a | es) % sizeof(long)
fails because the least significant part of a byte address
occupies the most significant part of a long.
-- "Engineering a Sort Function"

I've been staring at that quote, but cannot parse it.

Think word-addressed machines. The "natural" form of a pointer
would be an N-bit integer designating a word of (say) 48 bits. To
deal with characters, the implementor has two "natural" choices: Use
a 48-bit `char', or pack six `char' into each addressable word and
inflate the `char*' so it can designate "word thus-and-such (natural
address), field so-and-so within the word." The Crays in question
apparently do this by gluing extra information onto a machine address,
and for reasons of their own they chose to put the extras in high-
order rather than low-order positions.

You may be thinking "Stupid Cray designers!" But if you think
you're smarter than Seymour Cray, perhaps you should nominate yourself
for the IEEE's Seymour Cray Computer Engineering Award.
I don't think (??) they are hinting at endian-ness issues.
I assume "a" is a (char *)? (because they mention "byte address")
I guess I'm having a hard time "breaking free" from the simplicity
of a linear address space :)
(I suppose I implicitly assumed that if the implementation defines
uintptr_t, then this implies a linear address space. I imagine
my assumption was incorrect?)

Yes, it was incorrect. But even in a linear address space your
manipulations can fail (see, e.g., the aforementioned Crays).
Lemme see... this looks interesting (and relevant):
http://c-faq.com/null/machexamp.html

"Some 64-bit Cray machines represent int * in the lower 48 bits of a
word; char * additionally uses some of the upper 16 bits to indicate
a byte address within a word."

Are there C99 compilers for such a beast?

No, obviously not. That's why Bentley and McIlroy didn't write
C that caters to such beasts: It would have been a purely intellectual
exercise on their part, utterly devoid of practical application.

Doh!
Assuming, for illustration purposes, that bits 48-50 indicate the
byte offset, then given the following code:

extern char *foo;
char *bar = foo + n;

the implementation would have to generate code to extract the
byte offset, add n, handle the potential overflow, then make
a valid pointer out of this mess? (I suppose there was no
hardware support for this kind of operations.)

Yes to the first part, "I don't know" to the second. GIYF.
I thought pointer subtraction was only defined when both pointers
actually point into the same object? Since NULL cannot point into
an object, such an operation must have undefined behavior?

As I said, "isn't completely portable."
 
J

James Kuyper

Clarity.

One requires the reader to recall the precedence rules, one does not.

The acronym ITYM (I Think You Mean) implies a difference in meaning; the
proposed alternative had the same meaning as the original. If you're
suggesting a rewrite for improved clarity without change in meaning, you
should say so.
 
J

Joe keane

Side note: is there a trivial bit-twiddling operation to round an
integer up to the next power of 2?

best way i know

int roundup(int old, int *newp)
{
int t;

t = old - 1;
t |= t >> 1;
t |= t >> 2;
t |= t >> 4;
t |= t >> 8;
t |= t >> 16;
/* t |= t >> 32; */
*newp = t + 1;
}

When i was doing this for real, different ranges of size were done
differently anyway (plus i have three-times-power-of-two and
five-times-power-of-two), so lookup tables are the way to go.
 
S

Stephen Sprunk

best way i know

int roundup(int old, int *newp)
{
int t;

t = old - 1;
t |= t >> 1;
t |= t >> 2;
t |= t >> 4;
t |= t >> 8;
t |= t >> 16;
/* t |= t >> 32; */
*newp = t + 1;
}

What's the point of newp, especially since you never return the int you
declared? Why create a temporary variable when "old" is passed by value
and can be locally modified? Shouldn't those ints be unsigned?
Finally, why not let the compiler figure out if the 32-bit shift is
needed on the current implementation?

#include <limits.h>
unsigned int roundup(unsigned int n) {
n--;
n |= n >> 1;
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
#if UINT_MAX > 0xFFFFFFFF
n |= n >> 32;
#endif
n++;
return n;
}

Versions with long or long long are left as an exercise for the reader.

S
 
N

Noob

Eric said:
Noob said:
Here's my second try:

#include<stdlib.h>
#include<stdint.h>
void *aligned_malloc2(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return NULL;

size_t mask = boundary - 1;
uintptr_t base = (uintptr_t)base_ptr;
uintptr_t addr = (base + sizeof base_ptr + mask)& ~mask;
void *res = (unsigned char *)base_ptr + (addr - base);
((void **)res)[-1] = base_ptr;
return res;
}

Do you see ways to improve aligned_malloc2?

Instead of fiddling with the low-order bits of a converted
pointer to discover the alignment, Bentley and McIlroy try to
use pointer subtraction. Recast in your framework,

uintmax_t base_offset = (char*)base_ptr - (char*)0;

... the idea being that `(char*)0' is "infinitely aligned" and
that the pointer subtraction deals with representational issues.
This still isn't completely portable, but has a chance of working
on machines where the bit layout of pointer values is surprising.

If I understand correctly, you're saying that aligned_malloc3
is probably the best we can do, in terms of portability.

#include <stdlib.h>
#include <stdint.h>
void *aligned_malloc3(size_t boundary, size_t size)
{
void *base_ptr = malloc(size + boundary);
if (base_ptr == NULL) return base_ptr;

#define MASK (boundary-1)
uintmax_t base = (char *)base_ptr - (char *)0;
uintmax_t addr = (base + sizeof(void *) + MASK) & ~MASK;
void *res = (char *)0 + addr;
((void **)res)[-1] = base_ptr;
return res;
}
 
N

Noob

James said:
If malloc() returns 8-byte aligned pointers, then 8 bytes should be
sufficient to meet the alignment requirements of all data types; why do
you need 32-byte alignment? Alternatively, if you legitimately need
32-byte alignment for some data type, then why does malloc() return
pointers with only 8 byte alignment? One way or another, there's
something wrong about this picture.

malloc works within the framework of the /abstract/ machine. But there
is /real/ hardware behind this abstraction. (I did state I was leaving
the realm of standard C in the paragraph you snipped.)

To name a few reasons for "stricter" alignment than what malloc provides,
consider aligning on a cache line boundary for performance, or aligning
on a page boundary in virtual memory code, or satisfying the requirements
of the USB protocol in driver code.
If for some reason (which you should explain) you need memory aligned
more strictly than is required by any data type supported by the
compiler, the new C2011 function aligned_alloc() won't solve your
problem either, because passing any number other than the alignment
requirement of an actual data type as the first argument of
aligned_alloc() is undefined behavior. posix_memalign(), if it were
available, would be fine, as long as requested alignment were a power of
2. If the requested alignment wasn't a power of 2, the result would be a
return value of NULL and errno set to EINVAL; which is an improvement on
the undefined behavior that aligned_alloc() would have in that case.

My platform does provide memalign, which works great when I need aligned
pointers in cached address space (bonus: newlib's memalign doesn't waste
space from the alignment constraint like my implementation does), but
the platform designers somehow forgot to provide a "generic" memalign
that would work with any memory partition.
 
N

Noob

Eric said:
In another grand old tradition of C, it doesn't seem fair for
a portable interface to require that the caller know and use non-
portable implementation-specific sizes!

I guess "portable" is somewhat ambiguous. I'm not writing a library
for others to use. I meant "portable" in the sense that I want the
same code to be correct on all the platforms I work with.
(In my code, boundary is not even a parameter, it's just 32.)
I hope you've got lots of stack space ...

I cannot understand why you would say that. An additional function
call within a short-lived call chain has no impact on the stack space
requirement. What am I missing?
Think word-addressed machines. The "natural" form of a pointer
would be an N-bit integer designating a word of (say) 48 bits. To
deal with characters, the implementor has two "natural" choices: Use
a 48-bit `char', or pack six `char' into each addressable word and
inflate the `char*' so it can designate "word thus-and-such (natural
address), field so-and-so within the word." The Crays in question
apparently do this by gluing extra information onto a machine address,
and for reasons of their own they chose to put the extras in high-
order rather than low-order positions.

You may be thinking "Stupid Cray designers!" But if you think
you're smarter than Seymour Cray, perhaps you should nominate yourself
for the IEEE's Seymour Cray Computer Engineering Award.

IMHO, if Seymour Cray designed a new high-performance architecture
today, it would be less "exotic" than that from the past. And that's
a GOOD THING(TM), again IMHO.
No, obviously not. That's why Bentley and McIlroy didn't write
C that caters to such beasts: It would have been a purely intellectual
exercise on their part, utterly devoid of practical application.

Doh!

Sir, my irony detector is at the repair shop this week, and I'm just
not getting what you wrote. Can you spell it out?

Regards.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top