Swap two integers without using temporary variable

REH · Nov 29, 2007

I seem to recall some 8086 compilers actually recognizing the
classical swap idiom (with the temporary) and generating the
exch instruction to implement it. Modern x86 compilers don't,
however, probably because on more recent x86 processors, xchg
has an implied lock prefix, which acts as a memory fence (which
in turn means that the instruction is considerably slower than
it would be otherwise).

Click to expand...

I just ran some quick benchmarks on an Intel based Linux machine
here (using g++ 4.1.0, -O3), using the following "swappers":

Click to expand...

struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;

Click to expand...

struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;

Click to expand...

struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,% \n movl %%eax,%
[a]"
: [a] "+m" (a), "+m" (b) : : "%eax" ) ;
}
} ;

Click to expand...

What's the advantage of using struct and operator(), instead of simply:

void swapperX(int& a, int& b) ?

Better chances for optimization with templates like std::foreach, et.
al.

REH

James Kanze · Nov 30, 2007

I should have been more precise: on a 32 bit machine, it is
always a rotation (or it is always a rotation over the low 32
bits).

old C books say that it is shift,is the behavoir changed since
then?

You have old C books that say that the expression "(a << 8) | (a

why not to have a shift (<<) operator along with a rotation(<<<)?

I don't know why C doesn't have a rotation operator; it has
everything else. There are a few algorithms where it would be
useful. (And some of those algorithms, such as DES encryption,
were known to the authors of C. If I'm not mistaken, the Unix
crypt() function uses a rotation.)

James Kanze · Nov 30, 2007

[...]

Not sure what James meant by the <<*always* a rotation>> bit.

The expression that Juha presented.

My guess is that in 1970 the machines for which C was
developed didn't have a rotation instruction, and that's why C
didn't get a rotation operator.

I think they did (although it's been a long, long time since I
looked at the machine instructions for a PDP-11). They didn't
have a machine instruction for pre-increment nor for
post-increment, however, but they're both there.

Whether '<<<' would be the right symbol for it, I am not sure.
Triple-character operators are really only 50% different from
their two-character counterparts and would probably be much
less favorable than, say, '<%' or '>%'.

Agreed. Even more favorable might be keywords, or "pre-defined"
functions: ror and rol, for example.

James Kanze · Nov 30, 2007

Yes. I just sort of assumed that (and that everyone else would,
since it seems so obvious).

[snip]

Click to expand...

I don't know about now, but in the olden days, << always 0
filled and >> was machine dependent.

It depends on the type. As a general rule, I only use shifts
and bitwise operators on unsigned types, so there are no
problems.

Juha's expression is wide-spread, and is the idiomatic way to
express a rotation in C/C++. It would, I believe, be
immediately recognized as a rotation by anyone who has worked on
code where rotations were needed. (It's used in the reference
implementations of MD5 and SHA1, for example.)

James Kanze · Nov 30, 2007

The people who thing the XOR behavior is a neat idea totally lose the
concept that it's probably the least efficient way to do so. Even
the presumbalby space inefficent:
t = a; a = b; t = b;
might not actually allocate any memory for the temporary
value, it may in fact just be an transient register that
may have been allocated anyway.
LD R1, a
LD R2, b
ST a, R2
ST b, R1
where the XOR debacle
LD R1, a
LD R2, b
XOR R1,R1,R2
XOR R2,R2,R1
XOR R1,R1,R2
ST a, R1
ST b, R2
even old x86 machines support some 'exg' opcode that swaps
two registers in just one instruction ,so a good compiler
can handle triple assignment much better than you
mentioned in case both ints have register storage.The xor
trick has less chance of such optimization, but it is a
joyfull solution to a programming delima(I like it more
than add/subtract solution).

Click to expand...

I seem to recall some 8086 compilers actually recognizing
the classical swap idiom (with the temporary) and generating
the exch instruction to implement it. Modern x86 compilers
don't, however, probably because on more recent x86
processors, xchg has an implied lock prefix, which acts as a
memory fence (which in turn means that the instruction is
considerably slower than it would be otherwise).
I just ran some quick benchmarks on an Intel based Linux
machine here (using g++ 4.1.0, -O3), using the following
"swappers":
struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;
struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;
struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,% \n movl %%eax,%
[a]"
: [a] "+m" (a), "+m" (b) : : "%eax" ) ;
}
} ;
On this particular machine (not sure of its actual spec's,
which processor or the clock frequency), I got:
ns per machine memory
iter. instr. accesses
SwapperClassic: 1.7 5 5
SwapperXor: 2.5 9 7
SwapperAsm: 84.4 4 4
Tests run on 500 million iterations. In this case, the actual
function was invoked within a virtual member function, and
swapped two member variables. And the last two columns do not
include the standard function prefix or postfix.
A quick glance at the generated assembler showed that none of
the three versions used any local variables.
And as you can see, the cost of the memory fence due to the
implicit lock prefix on the xchgl instruction is very, very
high. The generated code uses one less instruction, and has
one less memory access, but requires roughly 50 times more
time to run.

Click to expand...

I wrote that **if both ints have register storage
class.Exchanging to memory **has to lock** the bus (I would be
suprised otherwise)for it is the simplest facility to handle a
semaphor.

Click to expand...

There is no such thing as "register storage class" in C++. If
you have to read both int's into a register, then the xchg
instruction is just an extra, unneeded instruction (on most
machines, anyway). And there is no real reason why xchg should
lock the bus, any more that incr or decr or any other
instruction which might require two (or more) memory accesses,
and even less reason why it should implement a memory fence.
You already have separate instructions/prefixes for those
things, which can be used in the cases where they are needed.
(The original 8086, and at least through the 80386, did not lock
the bus, nor does the normal exchange instruction on any other
architecture I've used.) And for most multi-thread algorithms,
xchg isn't enough; you need a cas instruction (or some other
combinations of instructions: g++ uses "lock add" for its atomic
increment, for example---which can't be implemented using xchg).

James Kanze · Nov 30, 2007

James Kanze wrote:

[...]

I just ran some quick benchmarks on an Intel based Linux machine
here (using g++ 4.1.0, -O3), using the following "swappers":
struct SwapperClassic
{
void operator()( int& a, int& b )
{
int tmp = a ;
a = b ;
b = tmp ;
}
} ;
struct SwapperXor
{
void operator()( int& a, int& b )
{
a ^= b ;
b ^= a ;
a ^= b ;
}
} ;
struct SwapperAsm
{
void operator()( int& a, int& b )
{
asm ( "movl %[a], %%eax\n xchgl %%eax,% \n movl %%eax,%
[a]"
: [a] "+m" (a), "+m" (b) : : "%eax" ) ;
}
} ;

Click to expand...

What's the advantage of using struct and operator(), instead
of simply:

Click to expand...

void swapperX(int& a, int& b) ?

Click to expand...

It results in different types, to instantiation a template which
actually carries out the benchmarks.

It's true that in this case, I could have just as easily used a
non-type parameter for the template, but it's a pretty strong
habit for me to use a type argument here. (Presumably, the
compiler would have generated the same code in both cases.)

swap two integers without using a tmp variable?	50	Jun 22, 2004
Using different iterator types in subclasses without breaking theinheritance mechanism	4	Feb 17, 2012
Where do two linked lists merge?	0	Jan 26, 2008
having "throw" write to a file before actually throwing	13	Mar 3, 2006
Boost::Asio synchronous I/O operation with timeout	0	Apr 14, 2012
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Jan 1, 2008
Incremental build systems, infamake	20	Jul 25, 2011
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Jan 12, 2008

Swap two integers without using temporary variable

REH

James Kanze

James Kanze

James Kanze

James Kanze

James Kanze

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads