Swap two integers without using temporary variable

R

REH

I seem to recall some 8086 compilers actually recognizing the
classical swap idiom (with the temporary) and generating the
exch instruction to implement it. Modern x86 compilers don't,
however, probably because on more recent x86 processors, xchg
has an implied lock prefix, which acts as a memory fence (which
in turn means that the instruction is considerably slower than
it would be otherwise).
I just ran some quick benchmarks on an Intel based Linux machine
here (using g++ 4.1.0, -O3), using the following "swappers":
struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;
struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;
struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,% \n movl %%eax,%
[a]"
: [a] "+m" (a), "+m" (b) : : "%eax" ) ;
}
} ;


What's the advantage of using struct and operator(), instead of simply:

void swapperX(int& a, int& b) ?


Better chances for optimization with templates like std::foreach, et.
al.

REH
 
J

James Kanze

I should have been more precise: on a 32 bit machine, it is
always a rotation (or it is always a rotation over the low 32
bits).
old C books say that it is shift,is the behavoir changed since
then?

You have old C books that say that the expression "(a << 8) | (a
why not to have a shift (<<) operator along with a rotation(<<<)?

I don't know why C doesn't have a rotation operator; it has
everything else. There are a few algorithms where it would be
useful. (And some of those algorithms, such as DES encryption,
were known to the authors of C. If I'm not mistaken, the Unix
crypt() function uses a rotation.)
 
J

James Kanze

[...]
Not sure what James meant by the <<*always* a rotation>> bit.

The expression that Juha presented.
My guess is that in 1970 the machines for which C was
developed didn't have a rotation instruction, and that's why C
didn't get a rotation operator.

I think they did (although it's been a long, long time since I
looked at the machine instructions for a PDP-11). They didn't
have a machine instruction for pre-increment nor for
post-increment, however, but they're both there.
Whether '<<<' would be the right symbol for it, I am not sure.
Triple-character operators are really only 50% different from
their two-character counterparts and would probably be much
less favorable than, say, '<%' or '>%'.

Agreed. Even more favorable might be keywords, or "pre-defined"
functions: ror and rol, for example.
 
J

James Kanze

Yes. I just sort of assumed that (and that everyone else would,
since it seems so obvious).
I don't know about now, but in the olden days, << always 0
filled and >> was machine dependent.

It depends on the type. As a general rule, I only use shifts
and bitwise operators on unsigned types, so there are no
problems.

Juha's expression is wide-spread, and is the idiomatic way to
express a rotation in C/C++. It would, I believe, be
immediately recognized as a rotation by anyone who has worked on
code where rotations were needed. (It's used in the reference
implementations of MD5 and SHA1, for example.)
 
J

James Kanze

The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;
might not actually allocate any memory for the temporary
value, it may in fact just be an transient register that
may have been allocated anyway.
LD R1, a
LD R2, b
ST a, R2
ST b, R1
where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2
even old x86 machines support some 'exg' opcode that swaps
two registers in just one instruction ,so a good compiler
can handle triple assignment much better than you
mentioned in case both ints have register storage.The xor
trick has less chance of such optimization, but it is a
joyfull solution to a programming delima(I like it more
than add/subtract solution).
I seem to recall some 8086 compilers actually recognizing
the classical swap idiom (with the temporary) and generating
the exch instruction to implement it. Modern x86 compilers
don't, however, probably because on more recent x86
processors, xchg has an implied lock prefix, which acts as a
memory fence (which in turn means that the instruction is
considerably slower than it would be otherwise).
I just ran some quick benchmarks on an Intel based Linux
machine here (using g++ 4.1.0, -O3), using the following
"swappers":
struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;
struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;
struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,% \n movl %%eax,%
[a]"
: [a] "+m" (a), "+m" (b) : : "%eax" ) ;
}
} ;
On this particular machine (not sure of its actual spec's,
which processor or the clock frequency), I got:
ns per machine memory
iter. instr. accesses
SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
SwapperAsm: 84.4 4 4
Tests run on 500 million iterations. In this case, the actual
function was invoked within a virtual member function, and
swapped two member variables. And the last two columns do not
include the standard function prefix or postfix.
A quick glance at the generated assembler showed that none of
the three versions used any local variables.
And as you can see, the cost of the memory fence due to the
implicit lock prefix on the xchgl instruction is very, very
high. The generated code uses one less instruction, and has
one less memory access, but requires roughly 50 times more
time to run.

I wrote that **if both ints have register storage
class.Exchanging to memory **has to lock** the bus (I would be
suprised otherwise)for it is the simplest facility to handle a
semaphor.

There is no such thing as "register storage class" in C++. If
you have to read both int's into a register, then the xchg
instruction is just an extra, unneeded instruction (on most
machines, anyway). And there is no real reason why xchg should
lock the bus, any more that incr or decr or any other
instruction which might require two (or more) memory accesses,
and even less reason why it should implement a memory fence.
You already have separate instructions/prefixes for those
things, which can be used in the cases where they are needed.
(The original 8086, and at least through the 80386, did not lock
the bus, nor does the normal exchange instruction on any other
architecture I've used.) And for most multi-thread algorithms,
xchg isn't enough; you need a cas instruction (or some other
combinations of instructions: g++ uses "lock add" for its atomic
increment, for example---which can't be implemented using xchg).
 
J

James Kanze

James Kanze wrote:

[...]
I just ran some quick benchmarks on an Intel based Linux machine
here (using g++ 4.1.0, -O3), using the following "swappers":
struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;
struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;
struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,% \n movl %%eax,%
[a]"
: [a] "+m" (a), "+m" (b) : : "%eax" ) ;
}
} ;

What's the advantage of using struct and operator(), instead
of simply:
void swapperX(int& a, int& b) ?

It results in different types, to instantiation a template which
actually carries out the benchmarks.

It's true that in this case, I could have just as easily used a
non-type parameter for the template, but it's a pretty strong
habit for me to use a type argument here. (Presumably, the
compiler would have generated the same code in both cases.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,120
Latest member
ShelaWalli
Top