B
ben-goldberg
I've got some code which was written to make use of some integers as vectors of bits, which deals with only a couple of bits at a time.
I'd like to speed it up, and use whole words at a time... similar to how SIMD coding works.
Here's some pseudocode for how the original bit-at-a-time code works:
outer_loop_start:
advance_lfsr();
old_bits[0] = lfsr_lsb();
have_old[0] = true;
advance_lfsr();
new_bit = lfsr_lsb();
i = 0;
inner_loop_start:
next_bit = new_bit != old_bits;
if( next_bit ) output(new_bit);
new_bit = next_bit;
have_old = false;
i = i + 1;
if( i >= M ) goto outer_loop_start;
if( have_old ) goto inner_loop_start;
have_old = true;
old_bits = new_bit;
goto outer_loop_start;
An LFSR is a Linear Feedback Shift Register, a type of pseudo-random bit generator which produces statistically decent output. This pseudocode takes the bits and hopefully makes them into something cryptographically secure.
But if I can't make it go fast, then it's not worthwhile to try to cryptanalyse it, because there are so many fast, well studied (and believed to be secure) algorithms out there.
So, I want to completely eliminate the inner loop, by keeping an extra unsigned int, and replacing the current inner loop with a few operations which perform, in effect, 32 simultaneous passes through the original inner loop using ands, ors, nots, xors, etc.
What I have so far is:
static unsigned lfsr = 1, have_old = 0, old_bits = 0, new_bits = 0;
void output(unsigned which, unsigned thebits);
void make_some_bits() {
unsigned bit = lfsr & 1;
lfsr = (lfsr >> 1) ^ (0xD0000001u & -(uint32_t)bit);
old_bits += bit;
have_old += 1;
bit = lfsr & 1;
lfsr = (lfsr >> 1) ^ (0xD0000001u & -(uint32_t)bit);
new_bits += bit;
unsigned next_bit = new_bits ^ old_bits;
unsigned output_these = have_old & next_bit;
output( output_these, new_bits );
new_bits = next_bit << 1;
/* Help? */
}
void output(unsigned which, unsigned thebits) {
while( which ) {
int next = which & (which-1);
int lsb = which - next;
printf("%d", (thebits & lsb) ? 1 : 0 );
which = next;
}
}
But I'm not sure what to do next.
I'd like to speed it up, and use whole words at a time... similar to how SIMD coding works.
Here's some pseudocode for how the original bit-at-a-time code works:
outer_loop_start:
advance_lfsr();
old_bits[0] = lfsr_lsb();
have_old[0] = true;
advance_lfsr();
new_bit = lfsr_lsb();
i = 0;
inner_loop_start:
next_bit = new_bit != old_bits;
if( next_bit ) output(new_bit);
new_bit = next_bit;
have_old = false;
i = i + 1;
if( i >= M ) goto outer_loop_start;
if( have_old ) goto inner_loop_start;
have_old = true;
old_bits = new_bit;
goto outer_loop_start;
An LFSR is a Linear Feedback Shift Register, a type of pseudo-random bit generator which produces statistically decent output. This pseudocode takes the bits and hopefully makes them into something cryptographically secure.
But if I can't make it go fast, then it's not worthwhile to try to cryptanalyse it, because there are so many fast, well studied (and believed to be secure) algorithms out there.
So, I want to completely eliminate the inner loop, by keeping an extra unsigned int, and replacing the current inner loop with a few operations which perform, in effect, 32 simultaneous passes through the original inner loop using ands, ors, nots, xors, etc.
What I have so far is:
static unsigned lfsr = 1, have_old = 0, old_bits = 0, new_bits = 0;
void output(unsigned which, unsigned thebits);
void make_some_bits() {
unsigned bit = lfsr & 1;
lfsr = (lfsr >> 1) ^ (0xD0000001u & -(uint32_t)bit);
old_bits += bit;
have_old += 1;
bit = lfsr & 1;
lfsr = (lfsr >> 1) ^ (0xD0000001u & -(uint32_t)bit);
new_bits += bit;
unsigned next_bit = new_bits ^ old_bits;
unsigned output_these = have_old & next_bit;
output( output_these, new_bits );
new_bits = next_bit << 1;
/* Help? */
}
void output(unsigned which, unsigned thebits) {
while( which ) {
int next = which & (which-1);
int lsb = which - next;
printf("%d", (thebits & lsb) ? 1 : 0 );
which = next;
}
}
But I'm not sure what to do next.