alignment using malloc

digz

I saw the following code and comments in a source file, and I cannot
work out what the author is trying to do in ALIGNED_ALLOC. Can someone
please help me understand why CACHE_LINE_SIZE * 2 is first added to the
argument of malloc, then CACHE_LINE_SIZE - 1 is added to the returned
address, and finally the result is bitwise ANDed with
~(CACHE_LINE_SIZE - 1)? How does this help alignment?


/*
* Allow us to efficiently align and pad structures so that shared fields
* don't cause contention on thread-local or read-only fields.
*/
#define CACHE_LINE_SIZE 64
#define CACHE_PAD(_n) char __pad ## _n [CACHE_LINE_SIZE]
#define ALIGNED_ALLOC(_s) \
((void *)(((unsigned long)malloc((_s)+CACHE_LINE_SIZE*2) + \
CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE-1)))


Thx
Digz
 
Richard

Because you cannot tell malloc (as far as I know) how to align.

You need to ensure that the malloc'ed area is large enough to allow a
NEW value to be used which points somewhere inside the malloc'ed block
but is aligned on a cache-line boundary. This is critical to keep
frequently accessed structures/data from straddling a cache line. I'm
not sure how, if at all, the memory can then be freed.
 
Jack Klein

Ask the person who wrote the code, or a group supporting the compiler
and platform it was written for. It produces undefined behavior in
several ways.

On the other hand, replace the call to malloc() with a call to a
function of your own that returns different unsigned long values each
time it is called, and outputs the value with printf() before
returning it.

Then write a main() function that invokes the macro, and prints the
final result. Compare the value returned by your malloc() replacement
function and the final value after mishandling by the macro.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
 
Barry Schwarz

As bad an idea as this is, here is one more problem. How would you
ever free the allocated memory?


 
christian.bau

What this code tries to do is create a pointer that is aligned to a
byte address that is a multiple of CACHE_LINE_SIZE, which is a power
of two.

Here CACHE_LINE_SIZE is a power of two (two to the sixth power).
Therefore CACHE_LINE_SIZE - 1 is a value with the lowest six bits set
and all other bits zero, and ~(CACHE_LINE_SIZE - 1) is the opposite:
the lowest six bits zero and all other bits set. X &
~(CACHE_LINE_SIZE - 1) equals X with the lowest six bits cleared, so
the result is X rounded down to the nearest multiple of
CACHE_LINE_SIZE.

That result would have the correct alignment, but it might lie before
the area allocated by malloc. By adding CACHE_LINE_SIZE - 1 = 63
first, we get a result that is not smaller than what malloc returned.
However, the caller wanted a pointer to _s bytes, and up to
CACHE_LINE_SIZE - 1 bytes have been skipped. To make sure that _s
bytes are actually available, CACHE_LINE_SIZE - 1 should be added to
the argument of malloc.

There are a few problems with this code. First, it wastes memory by
adding CACHE_LINE_SIZE*2 bytes instead of CACHE_LINE_SIZE - 1. That's
not a serious problem, but the next one is: the author casts a pointer
to unsigned long, modifies it, then casts the result back to a
pointer. This will go horribly wrong if unsigned long is 32 bits and a
pointer is 64 bits, for example. There is also no guarantee
whatsoever that casting to unsigned long, doing arithmetic, and
casting back will give the result wanted. It is much safer to
calculate how many bytes to add, then cast the pointer to char* and
add the number of bytes wanted. The third problem is that after using
the macro, we don't know the value that malloc returned and there is
no way to recover it. Therefore, it is impossible to ever free the
memory that was allocated.

One harmless and two rather serious problems in a very short bit of
code. Oh well.
 
digz

When you say

"It is much safer to calculate how many bytes to add, then cast the
pointer to char* and add the number of bytes wanted."

do you mean something like:

size_t aligned_size = _s + ( CACHE_LINE_SIZE - ( _s % CACHE_LINE_SIZE ) );
aligned_ptr = (char *) malloc( aligned_size );

Thx
Digz
 
