Fast Bit Operations Contest

S

Skybuck Flying

Hello,

The objective of this contest is to set and clear a bit at an arbitrary
memory address as fast as possible.

Implement one or multiple prototypes to take part in the contest:

// 8 bit versions
// 1 in 256 bits
procedure SetBit_8bit( BaseAddress : pointer; BitIndex : byte );
function GetBit_8bit( BaseAddress : pointer; BitIndex : byte ) : boolean;
procedure ClearBit_8bit( BaseAddress : pointer; BitIndex : byte );
procedure XorBit_8bit( BaseAddress : pointer; BitIndex : int64 );

// 16 bit versions
// 1 in 2^16 bits
procedure SetBit_16bit( BaseAddress : pointer; BitIndex : word );
function GetBit_16bit( BaseAddress : pointer; BitIndex : word ) : boolean;
procedure ClearBit_16bit( BaseAddress : pointer; BitIndex : word );
procedure XorBit_16bit( BaseAddress : pointer; BitIndex : word );

// 32 bit versions
// 1 in 2^32 bits
procedure SetBit_32bit( BaseAddress : pointer; BitIndex : longword );
function GetBit_32bit( BaseAddress : pointer; BitIndex : longword ) :
boolean;
procedure ClearBit_32bit( BaseAddress : pointer; BitIndex : longword );
procedure XorBit_32bit( BaseAddress : pointer; BitIndex : longword );

// 64 bit versions
// 1 in 2^32 * 8 bits
procedure SetBit_64bit( BaseAddress : pointer; BitIndex : int64 );
function GetBit_64bit( BaseAddress : pointer; BitIndex : int64 ) : boolean;
procedure ClearBit_64bit( BaseAddress : pointer; BitIndex : int64 );
procedure XorBit_64bit( BaseAddress : pointer; BitIndex : int64 );

All implementations will be bug tested, examined, and performance tested.

Code can be in C, Delphi or Assembler.

Code should work on 286 and above.

( Hint: 286 code will run on 386,486,pentium, etc <- backwards compatible)

( Hint: Single 386 BTS instruction is slow )

( Hint: I already have a pretty fast 8 bit implementation )

Processor specific code is to be mentioned and still interesting.

Processor detection code is also interesting.

Results will be posted along the way.

Multiple entries/versions are allowed.

Final results will be posted as well.

Bye,
Skybuck.
 
S

Skybuck Flying

In case MMX instructions can also speed it up here is a possible MMX
prototype:

procedure SetFourBits_8bit(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte );

etc...

Bye,
Skybuck.
 
S

Skybuck Flying

Skybuck Flying said:
In case MMX instructions can also speed it up here is a possible MMX
prototype:

procedure SetFourBits_8bit(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte );

Euhm MMX register 64 bit... means 8 bytes so it could be:

procedure SetFourBits_8bit(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte);
BaseAddress5 : pointer; BitIndex5 : byte;
BaseAddress6 : pointer; BitIndex6 : byte;
BaseAddress7 : pointer; BitIndex7 : byte;
BaseAddress8 : pointer; BitIndex8 : byte );

8 random access bits in memory ;)

Bye,
Skybuck.
 
S

Skybuck Flying

Or just:

procedure SetFourBits_8bit(
BaseAddress : pointer;
BitIndex0 : byte;
BitIndex1 : byte;
BitIndex2 : byte;
BitIndex3 : byte;
BitIndex4 : byte;
BitIndex5 : byte;
BitIndex6 : byte;
BitIndex7 : byte );

If all bit indexes part of the same memory/data structure ;)

Bye,
Skybuck.
 
S

Skybuck Flying

Skybuck Flying said:
Or just:

procedure SetFourBits_8bit(
BaseAddress : pointer;
BitIndex0 : byte;
BitIndex1 : byte;
BitIndex2 : byte;
BitIndex3 : byte;
BitIndex4 : byte;
BitIndex5 : byte;
BitIndex6 : byte;
BitIndex7 : byte );


Hmmm probably not possible with the current setbit algorithm.

MMX/SSE1/2/3 has no divide or shr instruction which works on bytes !?

Only words supported... there isn't even a divide instruction... hmm.

One could use floating points but that sounds slow.

Or one could still try to do 8 at the same time by using multiple mmx
registers and shifting them...

For now I ll try to implement 4 bit indexes

Bye,
Skybuck.
 
S

Skybuck Flying

Hmmm,

Maybe it's still possible to do a 64 bit shift on all bytes at the same
time.

And later and them with a special mask to clear the upper bits of each byte
to zero ;)

So 8 bit indexes might still be possible ;)

Bye,
Skybuck.
 
S

Skybuck Flying

Holyshit the pand instruction is 128 bit...

and mmx registers/technology is extended in xmm registers and technology...
hmmm

mmx <-> xmm could be confusion but ok...

So maybe with SSE2 is even possible to set 16 bit indexes at the same time.

Holyshit ?!

Except maybe for the final writing to memory on independant memory locations
but still the calculation could be performed at the same time ! :)

Gonna try it :)

Bye,
Skybuck.
 
S

Skybuck Flying

Hello,

Today I made a new xmm version which can set 8 indexes at the same time.

Before I actually coded it I estimated the speed based on the latency in the
optimization manual and I estimated it would be slower than the non-xmm
version.

I still decided to test it to be sure... and indeed the benchmark does show
the xmm version to be slower...

I actually wanted to use a loop to make the code 8 times shorther but
unfortunately the extract word instruction does not except a parameter for
the word index/offset... so all code
had to be repeated 8 times !

The overhead is probably in pushing the xmm data into the stack, loading it,
extracting it etc <- extra instructions needed.

Still interesting to see though, maybe somebody knows some handy to tricks
to speed it up a little further.

I shall also compare it to the other benchmark method from Herbert...
haven't done that yet gonna do it right away ;) :) to see the results =D

type
Txmm8Indexes = packed record
mIndex : packed array[0..7] of word;
end;

type
Txmm8IndexesAndMask = packed record
mMask : packed array[0..7] of word;
end;

procedure SetBit_16bit_8indexes_xmm( BaseAddress : pointer; Xmm8Indexes :
Txmm8Indexes );
const
vXmm8IndexesAndMask : Txmm8IndexesAndMask = ( mMask : (7,7,7,7, 7,7,7,7) );
begin
asm
// step 1. Copy BitIndexes into BytePositions (xmm0) and BitPositions
(xmm1)
movdqu xmm0, Xmm8Indexes
movdqu xmm1, xmm0

// step 2. Divide the BitIndexes in BytePositions(xmm0) by 8 (shr 3) to
get the true BytePositions (xmm0)
psrlw xmm0, $3

// step 3. Mod the BitIndexes in BitPositions(xmm1) by 8 (and 7) to get
the true BitPositions(xmm1)
pand xmm1, vXmm8IndexesAndMask

// optimize step: store BaseAddress in edx
mov edx, BaseAddress

// Repeat 8 times:

// BitIndex 0

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 0

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 0

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 1

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 1

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 1

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 2

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 2

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 2

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 3

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 3

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 3

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 4

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 4

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 4

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 5

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 5

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 5

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 6

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 6

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 6

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 7

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 7

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 7

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch
end;
end;

Bye,
Skybuck.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,202
Latest member
MikoOslo

Latest Threads

Top