Function Local Static Zero Initialization - When?


Brian Cole

A working draft of the C++ standard I was able to obtain says the
following in section 6.7.4:
The zero-initialization (8.5) of all local objects with static storage
duration (3.7.1) or thread storage duration (3.7.2) is performed
before any other initialization takes place.

First, the only addition for C++0x is the thread storage duration, so
I assume the sentence was the following for previous versions of the
standard:
The zero-initialization (8.5) of all local objects with static storage
duration (3.7.1) is performed before any other initialization takes
place.

The criterion "before any other initialization" is a little ambiguous
here. Does this mean any other initialization inside the function where
the static resides, or any other initialization the entire program may
perform?
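If the guarantee holds as written, it can be observed with a small sketch (`get_pod` and `Probe` are my own illustrative names, not from the thread): a namespace-scope object whose dynamic initialization runs before main calls into the function and finds the local static already zeroed.

```cpp
#include <cassert>

struct Pod { int value; };

Pod* get_pod() {
    static Pod p;   // trivial type, no initializer: zero-initialized
                    // during static initialization, before any dynamic init
    return &p;
}

// Dynamic initialization of a namespace-scope object; it runs before
// main and observes the function-local static already zeroed.
struct Probe {
    Probe() { assert(get_pod()->value == 0); }
};
Probe probe;
```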

Basically, I'm trying to implement something like the following to
allow for thread-safe function local static initialization while
maintaining proper destructor ordering at exit.

template<class T>
struct Once
{
    T *_obj;
    long _once;
    Once()
    {
        while (1)
        {
            long prev = InterlockedCompareExchange(&_once, 1, 0);
            if (0 == prev) // got the lock
                break;
            else if (2 == prev) // The singleton has been initialized.
                return;
            else {
                // Another thread is initializing the singleton: must wait.
                assert(1 == prev);
                sleep(1); // sleep 1 millisecond
            }
        }
        assert(_obj == 0);
        _obj = new T;
        InterlockedExchange(&_once, 2);
    }

    ~Once() { delete _obj; }
    inline T& operator *() { return *_obj; }
    inline T* operator ->() { return _obj; }
    inline operator T* () { return operator ->(); }
};

If I can guarantee that the memory of the object is zero-initialized
during "static initialization", then I can safely use that zero value
to do mutual exclusion in the constructor of the object using atomic
operations. And the following code is then safe during either "dynamic
initialization" or multi-threaded execution.

Foo *GetMeyersSingletonFoo()
{
    static Once<Foo> foo;
    return foo;
}
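For comparison, the same zero-based once protocol can be sketched with C++11's std::atomic, which postdates this thread; `AtomicOnce`, `compare_exchange_strong`, and `yield` here are my substitutions for the Windows primitives, so treat this as an illustrative sketch, not the poster's code:

```cpp
#include <atomic>
#include <thread>

// Sketch only: C++11 rendering of the zero-based once protocol above.
// Zero-initialization of a static AtomicOnce<T> gives _once == 0.
template<class T>
struct AtomicOnce {
    T* _obj;
    std::atomic<long> _once;   // 0 = untouched, 1 = initializing, 2 = ready

    T* get() {
        for (;;) {
            long prev = 0;
            if (_once.compare_exchange_strong(prev, 1)) break; // got the lock
            if (prev == 2) return _obj;    // already initialized
            std::this_thread::yield();     // another thread is initializing
        }
        _obj = new T();                    // value-initialize the singleton
        _once.store(2, std::memory_order_release);
        return _obj;
    }
};
```

The compare_exchange loop replaces the spin-and-sleep of the original; yield() stands in for the millisecond sleep.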

Thanks, I've been trying to tackle this for months now, and I think
I'm finally on the last steps.
 

Chris M. Thomasson

Brian Cole said:
A working draft of the C++ standard I was able to obtain says the
following in section 6.7.4:
The zero-initialization (8.5) of all local objects with static storage
duration (3.7.1) or thread storage duration (3.7.2) is performed
before any other initialization takes place.

First, the only addition for C++0x is the thread storage duration, so
I assume the sentence was the following for previous versions of the
standard:
The zero-initialization (8.5) of all local objects with static storage
duration (3.7.1) is performed before any other initialization takes
place.

The criterion "before any other initialization" is a little ambiguous
here. Does this mean any other initialization inside the function where
the static resides, or any other initialization the entire program may
perform?

Basically, I'm trying to implement something like the following to
allow for thread safe function local static initialization while
maintaining proper destructor ordering atexit.

template<class T>
struct Once
{
T *_obj;
long _once;
Once()
{
while (1)
{
long prev = InterlockedCompareExchange(&_once, 1, 0);

[...]

Classic problem with CAS docs on Windows: does
`InterlockedCompareExchange()' always execute a memory barrier when it
encounters a failed operation? If so, where is it _explicitly_ documented?
Perhaps it is implied somewhere in their documentation. Who knows for sure?
Humm...
 

Marcel Müller

Hi,
Classic problem with CAS docs on Windows: does
`InterlockedCompareExchange()' always execute a memory barrier when it
encounters a failed operation?

no.

It only synchronizes this one word. It is an exact mapping of the x86
LOCK CMPXCHG instruction. No more, no less.
And on other platforms it is emulated somehow.


Marcel
 

James Kanze

A working draft of the C++ standard I was able to obtain says
the following in section 6.7.4:
The zero-initialization (8.5) of all local objects with static
storage duration (3.7.1) or thread storage duration (3.7.2) is
performed before any other initialization takes place.
First, the only addition for C++0x is the thread storage
duration, so I assume the sentence was the following for
previous versions of the standard:
The zero-initialization (8.5) of all local objects with static
storage duration (3.7.1) is performed before any other
initialization takes place.
The criterion "before any other initialization" is a little
ambiguous here. Does this mean any other initialization inside
the function where the static resides, or any other initialization
the entire program may perform?

I don't see any ambiguity. "Before any other initialization"
means "before any other initialization".

Of course, if the compiler can determine that a conformant
program cannot see the difference... I rather suspect that no
implementation actually initializes the thread local storage
before the thread using it is created.
Basically, I'm trying to implement something like the
following to allow for thread safe function local static
initialization while maintaining proper destructor ordering
atexit.
template<class T>
struct Once
{
T *_obj;
long _once;
Once()
{
while (1)
{
long prev = InterlockedCompareExchange(&_once, 1, 0);
if (0 == prev) // got the lock
break;
else if (2 == prev) // The singleton has been initialized.
return _obj;
else {
// Another thread is initializing the singleton: must wait.
assert(1 == prev);
sleep(1); // sleep 1 millisecond

That's one second, not one millisecond. At least on Posix
platforms, and I'm pretty sure Windows as well. (There is no
C++ standard function sleep.)
}
}
assert(_obj == 0);
_obj = new T;
InterlockedExchange(&_once, 2);
return _obj;
}
~Once() { delete _obj; }
inline T& operator *() { return *_obj; }
inline T* operator ->() { return _obj; }
inline operator T* () { return operator ->(); }
};
If I can guarantee that the memory of the object is
zero-initialized during "static initialization",

It will be if the object has static storage duration. Otherwise
not.
 

James Kanze

Chris M. Thomasson wrote:

It only synchronizes this one word. It is an exact mapping of
the x86 LOCK CMPXCHG instruction. No more, no less. And on
other platforms it is emulated somehow.

Doesn't the x86 lock prefix force a memory synchronization? I
was under the impression that it did (but I could easily be
mistaken; I've only recently started to do any significant work
on the platform, and have not yet studied this aspect in
detail).
 

jason.cipriani

A working draft of the C++ standard I was able to obtain says the
following in section 6.7.4:
The zero-initialization (8.5) of all local objects with static storage
duration (3.7.1) or thread storage duration (3.7.2) is performed
before any other initialization takes place.
First, the only addition for C++0x is the thread storage duration, so
I assume the sentence was the following for previous versions of the
standard:
The zero-initialization (8.5) of all local objects with static storage
duration (3.7.1) is performed before any other initialization takes
place.
The criterion "before any other initialization" is a little ambiguous
here. Does this mean any other initialization inside the function where
the static resides, or any other initialization the entire program may
perform?
Basically, I'm trying to implement something like the following to
allow for thread safe function local static initialization while
maintaining proper destructor ordering atexit.
template<class T>
struct Once
{
 T   *_obj;
 long _once;
 Once()
 {
   while (1)
   {
     long prev = InterlockedCompareExchange(&_once, 1, 0);

[...]

Classic problem with CAS docs on Windows: does
`InterlockedCompareExchange()' always execute a memory barrier when it
encounters a failed operation? If so, where is it _explicitly_ documented?
Perhaps it is implied somewhere in their documentation. Who knows for sure?
Humm...

It does. It is explicitly documented here:

http://msdn.microsoft.com/en-us/library/ms683560.aspx

"This function generates a full memory barrier (or fence) to ensure
that memory operations are completed in order."

Starting with Vista (or Server 2003) you can also choose "acquire" vs.
"release" semantics with InterlockedCompareExchangeAcquire/
InterlockedCompareExchangeRelease (and same with the other interlocked
functions):

http://msdn.microsoft.com/en-us/library/ms684122(VS.85).aspx

Jason
 

jason.cipriani

I don't see any ambiguity.  "Before any other initialization"
means "before any other initialization".

Of course, if the compiler can determine that a conformant
program cannot see the difference... I rather suspect that no
implementation actually initializes the thread local storage
before the thread using it is created.




That's one second, not one millisecond.  At least on Posix
platforms, and I'm pretty sure Windows as well.  (There is no
C++ standard function sleep.)

There is no "sleep" on Windows. If he meant "Sleep", then it's 1
millisecond (well, more like 50 or so, realistically, depending on the
platform).
 

Brian Cole

There is no "sleep" on Windows. If he meant "Sleep", then it's 1
millisecond (well, more like 50 or so, realistically, depending on the
platform).

I did mean a millisecond sleep. The code originally called an internal
cross-platform millisecond-sleep function. I changed it to just
"sleep" so that everyone else knew the gist of what was going on
there. In fact, there are probably better things to do instead of
"sleep", exponential back-off and such, but this condition is so
rarely encountered it doesn't seem to warrant anything complex.

Thanks for being meticulous though. :)
 

Brian Cole

I don't see any ambiguity.  "Before any other initialization"
means "before any other initialization".

I guess the ambiguity is in my own mind fueled by the rest of the
paragraph:
"A local object of trivial or literal type (3.9) with static or thread
storage duration initialized with constant-expressions is initialized
before its
block is ï¬rst entered."

Hinting that the zero-initialization could occur after main is invoked
as long as it's before the function is entered. The next sentence only
says the implementation is "permitted" to perform initialization
before main, doesn't seem to require it:
"An implementation is permitted to perform early initialization of
other local objects with static or thread storage duration under the
same conditions that an implementation is permitted to statically
initialize an object with static or thread storage duration in
namespace scope (3.6.2)."

I am willing to accept that any decent compiler implementation would
zero out all the memory defined for function local statics during
"zero-initialization" since that would be cheaper than doing it during
main. Just wanted to be sure. Any idea what standard this guarantee
first appeared in? I deal with some rather old compilers sometimes.
Of course, if the compiler can determine that a conformant
program cannot see the difference... I rather suspect that no
implementation actually initializes the thread local storage
before the thread using it is created.




That's one second, not one millisecond.  At least on Posix
platforms, and I'm pretty sure Windows as well.  (There is no
C++ standard function sleep.)


It will be if the object has static storage duration.  Otherwise
not.

So the next obvious question is whether there is a way I can force
users of the class to always declare it "static", since the
implementation will depend on this condition. Since static is a
storage class specifier and has nothing to do with the type, there is
no fancy typedef trickery I could do to catch the following misuse of
the class:
Foo *GetMeyersSingletonFoo()
{
    Once<Foo> foo;
    return foo;
}

The only hope is that during testing foo would get placed in some
stack memory that wasn't already zero'd out, triggering an assertion
in the constructor. Seeing that memory is often zero'd out for various
reasons, it seems way too easy for this to fall through testing and
only appear in production down the road.

Can any C++ wizards think of a way to catch this at compile or run
time?

Thanks
 

Chris M. Thomasson

It does. It is explicitly documented here:

"This function generates a full memory barrier (or fence) to ensure
that memory operations are completed in order."

You need to read this post from Neill Clift who works/worked on
synchronization issues within the Windows Kernel:


http://groups.google.com/group/comp.programming.threads/msg/c3cdcd25235e5349

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/29ea516c5581240e


He cannot just point to the documentation because it is rather vague. It
says it generates a full barrier, but when? On success for sure, but what
about failure? I am coming from the point of view that atomic RMW
instructions are naked, and one needs to explicitly add memory barriers
exactly where they're needed. Are they saying that the operation is 100%
fully fenced, as in:


word CAS(word* pdest, word cmp, word xchg) {
    MEMBAR #StoreLoad | #StoreStore | #LoadStore | #LoadLoad;
    word cur;
    atomic {
        cur = *pdest;
        if (cur == cmp) {
            *pdest = xchg;
        }
    }
    MEMBAR #StoreLoad | #StoreStore | #LoadStore | #LoadLoad;
    return cur;
}


Or is it like:

word CAS(word* pdest, word cmp, word xchg) {
    MEMBAR #LoadStore | #StoreStore;
    word cur;
    atomic {
        cur = *pdest;
        if (cur == cmp) {
            *pdest = xchg;
        }
    }
    MEMBAR #StoreLoad | #StoreStore;
    return cur;
}


Or perhaps optimized for the failure case...:

word CAS(word* pdest, word cmp, word xchg) {
    MEMBAR #LoadStore | #StoreStore;
    word cur;
    atomic {
        cur = *pdest;
        if (cur == cmp) {
            *pdest = xchg;
            MEMBAR #StoreLoad | #StoreStore;
        }
    }
    return cur;
}



Anyway, according to Neill, it would be a bug if
`InterlockedCompareExchange()' did not execute a full memory barrier on the
failure case.
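For what it's worth, the first (fully fenced) variant above can be rendered with C++11 fences; this is a hypothetical modern sketch, since std::atomic_thread_fence did not exist when this was posted:

```cpp
#include <atomic>

// Fully fenced CAS, returning the previous value as in the sketches above.
long cas_full_fence(std::atomic<long>& dest, long cmp, long xchg) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    long cur = cmp;
    // On failure, compare_exchange_strong loads the current value into cur;
    // on success, cur keeps cmp, which was the previous value either way.
    dest.compare_exchange_strong(cur, xchg, std::memory_order_relaxed,
                                 std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return cur;
}
```

Note that the explicit fences run on both the success and failure paths, which is exactly the property being asked about.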



Starting with Vista (or Server 2003) you can also choose "acquire" vs.
"release" semantics with InterlockedCompareExchangeAcquire/
InterlockedCompareExchangeRelease (and same with the other interlocked
functions):

That's good, but they could get much more fine-grained with respect to
memory barrier support, IMVHO of course...
 

James Kanze

I guess the ambiguity is in my own mind fueled by the rest of
the paragraph:
"A local object of trivial or literal type (3.9) with static
or thread storage duration initialized with
constant-expressions is initialized before its block is first
entered."
Hinting that the zero-initialization could occur after main is
invoked as long as it's before the function is entered.

Because, basically, a conforming program can't tell the
difference. There's no way to access a local object before the
function has been entered.
The next sentence only says the implementation is "permitted"
to perform initialization before main, doesn't seem to require
it:
"An implementation is permitted to perform early
initialization of other local objects with static or thread
storage duration under the same conditions that an
implementation is permitted to statically initialize an object
with static or thread storage duration in namespace scope
(3.6.2)."
I am willing to accept that any decent compiler implementation
would zero out all the memory defined for function local
statics during "zero-initialization" since that would be
cheaper than doing it during main. Just wanted to be sure. Any
idea what standard this guarantee first appeared in? I deal
with some rather old compilers sometimes.

The rules concerning zero initialization and static
initialization of PODs are taken directly from the C standard,
and go back to Kernighan and Ritchie. And although I've heard of
some odd pre-standard C compilers which didn't follow them, I
think you can feel safe with any compiler later than about
1985/1990, and with any C++ compiler. (I've used all of the C++
compilers which were available before 1990.)

[...]
So the next obvious question is if there is a way I can force
users of the class to always declare it "static" since the
implementation will depend on this condition.

Attention: what is required is static storage duration. That
has nothing to do with the keyword static. Defining it at
namespace scope is sufficient.
Since static is a storage class specifier and has nothing to
do with the type there is no fancy typedef trickery I could do
to catch the following misuse of the class:
Foo *GetMeyersSingletonFoo()
{
  Once<Foo> foo;
  return foo;
}
The only hope is that during testing that foo would get placed
in some memory on the stack that wasn't already zero'd out,
triggering an assertion in the constructor. Seeing that memory
is often zero'd out for various reasons it seems way too easy
for this to fall through testing and only appear in production
down the road.
Can any C++ wizards think of a way to catch this at compile or
run time?

You can easily require dynamic allocation, by making the
destructor private, but I don't know off hand of any way of
requiring static lifetime. (On some specific machines, I know
ways of catching the error at runtime: on a Sparc under Solaris
or a PC under Linux, for example, there is a global symbol int
end; any address less than the address of this symbol is in
namespace scope. But that doesn't work on many other systems.)
 

James Kanze

Chris M. Thomasson wrote:
you are right. The coherent caches of Intel CPUs do their job.
I am still unsure whether this was a good idea with respect to
scalability.

There's a lot more than just cache coherency involved. The cost
of the Intel guarantees is negligible as long as all of the
cores are on a single chip. It's very, very costly if they
aren't. The reason why Intel gives this guarantee is that not
giving it would break most Windows software (and Windows is
still the biggest market for x86 processors). The reason why no
one else does is that it doesn't scale, and multi-core using
multiple chips has been state of the art for at least ten years
now.
 

Marcel Müller

James said:
There's a lot more than just cache coherency involved. The cost
of the Intel guarantees is negligible as long as all of the
cores are on a single chip. It's very, very costly if they
aren't.

I know. Synchronization over distance is costly.

Einstein's theory of relativity sets the limit: events that are
farther apart in space than in time (as measured by the speed of
light) cannot be related in any way. For bidirectional
synchronization it is only half of that distance.

So within one 1 GHz clock cycle you can synchronize over a distance
of at most 15 cm. Taking into account the reduced velocity of
propagation in conductors, the fact that the connection is not
straight, and that electronic components are part of the game, this
distance easily shrinks to about 5 cm.
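Marcel's figure checks out as back-of-the-envelope arithmetic (the constants below are just the usual textbook values, not from his post):

```cpp
// One cycle at 1 GHz is 1 ns; light travels roughly 30 cm in that time,
// and a round trip halves the reachable distance to ~15 cm.
constexpr double c_cm_per_ns = 29.98;                       // speed of light
constexpr double cycle_ns    = 1.0;                         // 1 GHz clock
constexpr double one_way_cm  = c_cm_per_ns * cycle_ns / 2;  // ~15 cm
```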


Marcel
 

Brian Cole

I guess the ambiguity is in my own mind fueled by the rest of
the paragraph:
"A local object of trivial or literal type (3.9) with static
or thread storage duration initialized with
constant-expressions is initialized before its block is first
entered."
Hinting that the zero-initialization could occur after main is
invoked as long as it's before the function is entered.

Because, basically, a conforming program can't tell the
difference.  There's no way to access a local object before the
function has been entered.
The next sentence only says the implementation is "permitted"
to perform initialization before main, doesn't seem to require
it:
"An implementation is permitted to perform early
initialization of other local objects with static or thread
storage duration under the same conditions that an
implementation is permitted to statically initialize an object
with static or thread storage duration in namespace scope
(3.6.2)."
I am willing to accept that any decent compiler implementation
would zero out all the memory defined for function local
statics during "zero-initialization" since that would be
cheaper than doing it during main. Just wanted to be sure. Any
idea what standard this guarantee first appeared in? I deal
with some rather old compilers sometimes.

The rules concerning zero initialization and static
initialization of PODs are taken directly from the C standard,
and go back to Kernighan and Ritchie.  And although I've heard of
some odd pre-standard C compilers which didn't follow them, I
think you can feel safe with any compiler later than about
1985/1990, and with any C++ compiler.  (I've used all of the C++
compilers which were available before 1990.)

    [...]
So the next obvious question is if there is a way I can force
users of the class to always declare it "static" since the
implementation will depend on this condition.

Attention: what is required is static storage duration.  That
has nothing to do with the keyword static.  Defining it at
namespace scope is sufficient.


Since static is a storage class specifier and has nothing to
do with the type there is no fancy typedef trickery I could do
to catch the following misuse of the class:
Foo *GetMeyersSingletonFoo()
{
  Once<Foo> foo;
  return foo;
}
The only hope is that during testing that foo would get placed
in some memory on the stack that wasn't already zero'd out,
triggering an assertion in the constructor. Seeing that memory
is often zero'd out for various reasons it seems way too easy
for this to fall through testing and only appear in production
down the road.
Can any C++ wizards think of a way to catch this at compile or
run time?

You can easily require dynamic allocation, by making the
destructor private, but I don't know off hand of any way of
requiring static lifetime.  (On some specific machines, I know
ways of catching the error at runtime: on a Sparc under Solaris
or a PC under Linux, for example, there is a global symbol int
end; any address less than the address of this symbol is in
namespace scope.  But that doesn't work on many other systems.)

Since we always test on Linux it is better than nothing, catching 99%
of the misuses. I can't find any references to the "end" pointer;
could you give me a link or the magic google phrase? And since it is
only an "int", is there a 64-bit safe way to do it? Or is the
assumption that uninitialized data can always fit in the first 2GB of
memory?

Thanks,
Brian
 

James Kanze

On Dec 6, 3:29 am, James Kanze <[email protected]> wrote:

[...]
Since we always test on Linux it is better than nothing,
catching %99 of the misuses. I can't find any references to
the "end" pointer, could you give me a link or the magic
google phrase?

That's a good question. The automatic definition of a symbol
end, at the end of data, is just one of those things that's
always been there, since I started working in Unix. I think it
was documented in the ld documentation in version 7, but I've
not looked since (and I may have just learned it from a
collegue, rather than from the man page). I've never heard of a
Unix system without it, but it is really only useful if the
memory layout puts all of the static data at the lowest possible
addresses (which is the case for Solaris on Sparc and for Linux
on PC). At least under Linux, it seems to be defined as a weak
symbol; if your code has a symbol named end, it won't be double
defined. Sun's documentation (_Linker and Libraries Guide_,
chapter 2) gives a list of automatically created reserved
symbols, including _etext (the end of code), _edata (the end of
initialized data) and _end or _END_; historically C symbols were
prefixed with a _ for the linker (but I don't know if this is
still the case). At any rate, a quick check shows that both end
and _end work, both under Linux and Solaris. The Solaris
documentation concerning _START_ says that "together with _END_,
provides a means of establishing an object's address range."
Which from context, I interpret to mean that all objects with
static lifetime will be in the range _START_..._END_.
Regretfully, a quick test shows that Linux doesn't support
_END_. (I like the idea of using a name in the implementation's
name space for this.)
And since it is only an "int", is there a 64-bit safe way to do
it? Or is the assumption that uninitialized data can always fit in
the first 2GB of memory?

The type is irrelevant; traditionally, I've always seen it
declared int, but that's probably because way back then, int was
the default type; it wouldn't surprise me to see it declared:
extern end ;
in some early C code. The linker does no type checking on
variables, there's no actual object associated with it, and all
you can legally do with the symbol is take its address. So:

if ( p < &end ) {
    // p points to an object with static lifetime...
}

Note too that all of this probably only holds for statically
linked modules. I would not expect it to hold for anything
dynamically linked.
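James's `end` trick can be wrapped in a small sanity check. This is a non-portable sketch for Linux/Solaris-style layouts only: comparing unrelated pointers is formally unspecified, and `end` is a linker convention, not standard C++.

```cpp
extern "C" char end;   // traditional linker-provided symbol: first address
                       // past the program's static data (Linux, Solaris)

// Rough runtime test for static storage duration on these platforms;
// could be asserted in the Once constructor during testing.
bool looks_static(const void* p) {
    return p < static_cast<const void*>(&end);
}
```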
 
