After another two hours of work, I'm now 99% sure it's a bug in GCC
4.4.0 when it optimizes a function with the "stdcall" calling convention.
Below is a piece of the disassembly (dumped from OllyDbg; I don't
know how to debug efficiently with Code::Blocks).
004067EA > CC int3
004067EB . C74424 78 CF07>mov [dword ss:esp+78],7CF
004067F3 . 8B4C24 28 mov ecx,[dword ss:esp+28]
004067F7 . C641 08 00 mov [byte ds:ecx+8],0
004067FB . 8B39 mov edi,[dword ds:ecx]
004067FD . 897C24 2C mov [dword ss:esp+2C],edi
00406801 . 89F8 mov eax,edi
00406803 . 39F9 cmp ecx,edi
00406805 . 0F84 5B010000 je RunTestC.00406966
0040680B . 8D4424 48 lea eax,[dword ss:esp+48]
0040680F . 894424 1C mov [dword ss:esp+1C],eax
00406813 . 90 nop
00406814 > 8B5424 2C mov edx,[dword ss:esp+2C]
00406818 . 8B42 08 mov eax,[dword ds:edx+8]
0040681B . 85C0 test eax,eax
0040681D . 74 10 je short RunTestC.0040682F
0040681F . 8B10 mov edx,[dword ds:eax]
00406821 . 8D4C24 78 lea ecx,[dword ss:esp+78]
00406825 . 894C24 04 mov [dword ss:esp+4],ecx
00406829 . 890424 mov [dword ss:esp],eax
0040682C . FF52 10 call [dword ds:edx+10]
0040682F > 8B7C24 2C mov edi,[dword ss:esp+2C]
00406833 . 8B47 18 mov eax,[dword ds:edi+18]
Note the call at address 0040682C. It is a call to a member function:
void __stdcall callback1(int n) const {
(void)n;
TS_TRACE("callback1");
}
Note also that callback1 ends with the return instruction "retn 8",
which is the normal return instruction for stdcall.
The problem is that nowhere in the disassembly above is there a
normal "push" instruction to pass the arguments.
Instead, the compiler uses the existing stack frame to pass them
(see the stores at addresses 00406825 and 00406829). At address
004067EB, 7CF is the argument 1999 I passed to callback1.
For the stack to be balanced, ESP after the call, at address 0040682F,
should be the same as at address 00406829.
But because of the "retn 8" mentioned above, the stack becomes
unbalanced, and at address 0040682F EDI gets the wrong value.
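To make the arithmetic concrete, here is a small sketch that only models the ESP bookkeeping; it is not the real compiler output. All function names and the starting ESP value are made up for illustration, and I assume 4-byte stack slots and 8 bytes of arguments (the two stores at [esp] and [esp+4]).

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit x86 stack pointer. The names below are
// illustrative only and do not come from GCC or OllyDbg.
constexpr std::uint32_t kWord = 4;  // size of one stack slot in bytes

// "call" pushes the 4-byte return address, so ESP goes down by 4.
constexpr std::uint32_t after_call(std::uint32_t esp) {
    return esp - kWord;
}

// "retn N" pops the return address and then discards N more bytes.
constexpr std::uint32_t after_retn(std::uint32_t esp, std::uint32_t n) {
    return esp + kWord + n;
}

// Caller stores the arguments with "mov [esp+k], ..." (no push),
// so ESP is unchanged right before the call.
constexpr std::uint32_t esp_after_mov_args_call(std::uint32_t esp,
                                                std::uint32_t callee_pop) {
    return after_retn(after_call(esp), callee_pop);
}

// Caller pushes arg_bytes of arguments before the call.
constexpr std::uint32_t esp_after_push_args_call(std::uint32_t esp,
                                                 std::uint32_t arg_bytes,
                                                 std::uint32_t callee_pop) {
    return after_retn(after_call(esp - arg_bytes), callee_pop);
}
```

With an assumed starting ESP of 0x1000, esp_after_mov_args_call(0x1000, 8) gives 0x1008, which is the unbalanced case I saw: after the call, [esp+2C] addresses what was [esp+34] before it. Both esp_after_push_args_call(0x1000, 8, 8) and esp_after_mov_args_call(0x1000, 0) give 0x1000 again, i.e. pushed arguments with "retn 8", or stored arguments with a plain "retn", keep the stack balanced.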
I also examined GCC 4.5.2. In the code it generates for callback1, the
final return instruction is a plain "retn", as if the function were not
stdcall, so there is no problem (of course, the rest of the code also
differs from 4.4.0).
Since this bug has been fixed in newer GCC versions, we can ignore it.
But I'm happy to have spent two "two hours" to learn and verify that:
1, My code is safe and does not have the potential nasty bugs I suspected,
2, I learned a little about how C++ compilers optimize code (the
disassembly is very hard to read).
I think I should read the release notes of past GCC versions when
I have another two hours.