simd-style coding

Discussion in 'C Programming' started by ben-goldberg@hotmail.com, Feb 7, 2013.

  1. Guest

    I've got some code which was written to make use of some integers as vectors of bits, which deals with only a couple of bits at a time.

    I'd like to speed it up, and use whole words at a time... similar to how SIMD coding works.

    Here's some pseudocode for how the original bit-at-a-time code works:

    outer_loop_start:

    advance_lfsr();
    old_bits[0] = lfsr_lsb();
    have_old[0] = true;
    advance_lfsr();
    new_bit = lfsr_lsb();
    i = 0;

    inner_loop_start:

    next_bit = new_bit != old_bits;
    if( next_bit ) output(new_bit);
    new_bit = next_bit;
    have_old = false;
    i = i + 1;
    if( i >= M ) goto outer_loop_start;

    if( have_old ) goto inner_loop_start;

    have_old = true;
    old_bits = new_bit;

    goto outer_loop_start;

    An LFSR is a Linear Feedback Shift Register, a type of pseudo-random bit generator which produces statistically decent output. This pseudocode takes the bits and hopefully makes them into something cryptographically secure.

    But if I can't make it go fast, then it's not worthwhile to try to cryptanalyse it, because there are so many fast, well studied (and believed to be secure) algorithms out there.

    So, I want to completely eliminate the inner loop, by keeping an extra unsigned int, and replacing the current inner loop with a few operations which perform, in effect, 32 simultaneous passes through the original inner loop using ands, ors, nots, xors, etc.

    What I have so far is:


    static unsigned lfsr = 1, have_old = 0, old_bits = 0, new_bits = 0;

    void output(unsigned which, unsigned thebits);

    void make_some_bits() {
    unsigned bit = lfsr & 1;
    lfsr = (lfsr >> 1) ^ (0xD0000001u & -(uint32_t)bit);
    old_bits += bit;
    have_old += 1;
    bit = lfsr & 1;
    lfsr = (lfsr >> 1) ^ (0xD0000001u & -(uint32_t)bit);
    new_bits += bit;

    unsigned next_bit = new_bits ^ old_bits;
    unsigned output_these = have_old & next_bit;
    output( output_these, new_bits );
    new_bits = next_bit << 1;

    /* Help? */
    }

    void output(unsigned which, unsigned thebits) {
    while( which ) {
    int next = which & (which-1);
    int lsb = which - next;
    printf("%d", (thebits & lsb) ? 1 : 0 );
    which = next;
    }
    }

    But I'm not sure what to do next.
    , Feb 7, 2013
    #1
    1. Advertising

  2. Shao Miller Guest

    On 2/7/2013 01:02, wrote:
    > I've got some code which was written to make use of some integers as vectors of bits, which deals with only a couple of bits at a time.
    >


    Ok.

    > I'd like to speed it up, and use whole words at a time... similar to how SIMD coding works.
    >


    C doesn't define the speed of operations. If you want extremes of
    speed, you probably don't want to code in C. In C, we can talk about
    minimizing the number of side effects and sequence points, but these
    aren't exactly related to speed.

    > Here's some pseudocode for how the original bit-at-a-time code works:
    >
    > outer_loop_start:
    >
    > advance_lfsr();
    > old_bits[0] = lfsr_lsb();
    > have_old[0] = true;
    > advance_lfsr();
    > new_bit = lfsr_lsb();
    > i = 0;
    >
    > inner_loop_start:
    >
    > next_bit = new_bit != old_bits;
    > if( next_bit ) output(new_bit);
    > new_bit = next_bit;
    > have_old = false;
    > i = i + 1;
    > if( i >= M ) goto outer_loop_start;
    >
    > if( have_old ) goto inner_loop_start;
    >
    > have_old = true;
    > old_bits = new_bit;
    >
    > goto outer_loop_start;
    >
    > An LFSR is a Linear Feedback Shift Register, a type of pseudo-random bit generator which produces statistically decent output. This pseudocode takes the bits and hopefully makes them into something cryptographically secure.
    >
    > But if I can't make it go fast, then it's not worthwhile to try to cryptanalyse it, because there are so many fast, well studied (and believed to be secure) algorithms out there.
    >


    Are you thinking about gathering statistics of the code's output and
    scrutinizing those statistics? Is a mathematical analysis (without C
    code involvement) not possible?

    > So, I want to completely eliminate the inner loop, by keeping an extra unsigned int, and replacing the current inner loop with a few operations which perform, in effect, 32 simultaneous passes through the original inner loop using ands, ors, nots, xors, etc.
    >


    You might be interested in "loop unrolling". Is that what you're
    talking about?

    > What I have so far is:
    >
    >
    > static unsigned lfsr = 1, have_old = 0, old_bits = 0, new_bits = 0;
    >
    > void output(unsigned which, unsigned thebits);
    >
    > void make_some_bits() {
    > unsigned bit = lfsr & 1;
    > lfsr = (lfsr >> 1) ^ (0xD0000001u & -(uint32_t)bit);
    > old_bits += bit;
    > have_old += 1;
    > bit = lfsr & 1;
    > lfsr = (lfsr >> 1) ^ (0xD0000001u & -(uint32_t)bit);
    > new_bits += bit;
    >
    > unsigned next_bit = new_bits ^ old_bits;
    > unsigned output_these = have_old & next_bit;
    > output( output_these, new_bits );
    > new_bits = next_bit << 1;
    >
    > /* Help? */
    > }
    >
    > void output(unsigned which, unsigned thebits) {
    > while( which ) {
    > int next = which & (which-1);
    > int lsb = which - next;
    > printf("%d", (thebits & lsb) ? 1 : 0 );
    > which = next;
    > }
    > }
    >
    > But I'm not sure what to do next.
    >


    Is the goal to rewrite the original pseudo-code in C, or to speed up
    some existing C code, or neither of these, or both?

    --
    - Shao Miller
    --
    "Thank you for the kind words; those are the kind of words I like to hear.

    Cheerily," -- Richard Harter
    Shao Miller, Feb 7, 2013
    #2
    1. Advertising

  3. Shao Miller <> wrote:
    > On 2/7/2013 01:02, wrote:
    >> I've got some code which was written to make use of some integers as
    >> vectors of bits, which deals with only a couple of bits at a time.


    > Ok.


    >> I'd like to speed it up, and use whole words at a time... similar
    >> to how SIMD coding works.


    > C doesn't define the speed of operations. If you want extremes of
    > speed, you probably don't want to code in C. In C, we can talk about
    > minimizing the number of side effects and sequence points, but these
    > aren't exactly related to speed.


    Well, yes, but if you do bit manipulation sizeof(int)*CHAR_BIT bits
    at a time instead of one bit at a time, it is likely faster.

    (snip)

    >> An LFSR is a Linear Feedback Shift Register, a type
    >> of pseudo-random bit generator which produces statistically
    >> decent output. This pseudocode takes the bits and hopefully
    >> makes them into something cryptographically secure.


    >> But if I can't make it go fast, then it's not worthwhile to try
    >> to cryptanalyse it, because there are so many fast, well studied
    >> (and believed to be secure) algorithms out there.


    The LFSRs used for CRC calculation are commonly done in software
    one byte at a time. That is, the whole 32 bit CRC is processed
    with a table lookup, one 32 bit XOR, and a shift. (Not necessarily
    in that order.)

    Here is a CRC calculating routine and table generating routine.
    Though it is more usual to put in a static 256 entry table
    initialized with constants.

    #define POLYNOMIAL 0x04c11db7L
    static unsigned long crc_table[256];
    void gen_crc_table()
    /* generate the table of CRC remainders for all possible bytes */
    { register int i, j; register unsigned long crc_accum;
    for ( i = 0; i < 256; i++ )
    { crc_accum = ( (unsigned long) i << 24 );
    for ( j = 0; j < 8; j++ )
    { if ( crc_accum & 0x80000000L )
    crc_accum =
    ( crc_accum << 1 ) ^ POLYNOMIAL;
    else
    crc_accum =
    ( crc_accum << 1 ); }
    crc_table = crc_accum; }
    return; }

    unsigned long update_crc(unsigned long crc_accum, char *data_blk_ptr,
    int data_blk_size)
    /* update the CRC on the data block one byte at a time */
    { register int i, j;
    for ( j = 0; j < data_blk_size; j++ )
    { i = ( (int) ( crc_accum >> 24) ^ *data_blk_ptr++ ) & 0xff;
    crc_accum = ( crc_accum << 8 ) ^ crc_table; }
    return crc_accum; }

    -- glen
    glen herrmannsfeldt, Feb 7, 2013
    #3
  4. Guest

    On Thursday, February 7, 2013 2:00:31 AM UTC-5, Shao Miller wrote:
    > On 2/7/2013 01:02, wrote:
    >
    > > I've got some code which was written to make use of some integers
    > > as vectors of bits, which deals with only a couple of bits at a time.

    >
    >
    > Ok.
    >
    >
    > > I'd like to speed it up, and use whole words at a
    > > time... similar to how SIMD coding works.

    >
    > C doesn't define the speed of operations. If you want extremes of
    > speed, you probably don't want to code in C. In C, we can talk about
    > minimizing the number of side effects and sequence points, but these
    > aren't exactly related to speed.


    True, C doesn't define how long any particular operation takes.
    But generally, fewer operations takes less time.

    For example, suppose my original code were this:

    for( int i = 0; i < sizeof(unsigned)*CHAR_BIT; ++i ) {
    int bit_from_a = (a >> i) & 1;
    int bit_from_b = (b >> i) & 1;
    unsigned bit_for_c = bit_from_a ^ bit_from_b;
    c = (c & ~(1<<i)) + (bit_for_c << i);
    }

    and I replaced it with this:
    c = a ^ b;

    The latter is unquestionably faster, because the compiler isn't smart
    enough to convert the first into the second.

    > > Here's some pseudocode for how the original bit-at-a-time code works:

    >

    [snip]
    > > An LFSR is a Linear Feedback Shift Register, a type of pseudo-random bit
    > > generator which produces statistically decent output. This pseudocode
    > > takes the bits and hopefully makes them into something cryptographically
    > > secure.

    >
    > >

    >
    > > But if I can't make it go fast, then it's not worthwhile to try to
    > > cryptanalyse it, because there are so many fast, well studied (and believed
    > > to be secure) algorithms out there.

    >
    >
    > Are you thinking about gathering statistics of the code's output and
    > scrutinizing those statistics?


    Oh, I've already done that. I know that it's output is statistically good.

    And while a statistically bad pseudorandom generator makes a bad cipher,
    having a statistically good pseudorandom generator doesn't guarantee
    cryptographic goodness.

    For example, MT19937 produces statistically excellent output,
    and it is very fast, but you can't use it (by itself) as a stream
    cipher, since one can very easily use a known plaintext attack
    compute MT's internal state.

    > Is a mathematical analysis (without C
    > code involvement) not possible?


    Determining whether something is cryptographically secure does not require C code, you're quite right in your implication.

    However, cryptographic analysis is difficult in general, and quite time consuming.

    But even supposing I do analyse the algorithm, and find it to be secure, it's going to be fairly useless if I can't make it run faster than the BBS cipher.

    > > So, I want to completely eliminate the inner loop, by keeping an extra
    > > unsigned int, and replacing the current inner loop with a few operations
    > > which perform, in effect, 32 simultaneous passes through the original inner
    > > loop using ands, ors, nots, xors, etc.

    >
    >
    > You might be interested in "loop unrolling". Is that what you're
    > talking about?


    Simple loop unrolling would convert the inner loop of the algorithm
    into 32 or 64 repetitions of the inside of that loop.

    What I want is to eliminate the inner loop entirely,
    in the same way the example loop I showed earlier was
    converted to the much, much, simpler code, c = a ^ b.


    > > What I have so far is:

    >

    [snip]
    >
    > Is the goal to rewrite the original pseudo-code in C, or to speed up
    > some existing C code, or neither of these, or both?


    I already have some code in C, which does the same as the pseudo-code.

    Here's a working code snippet:
    http://codepad.org/LRZ5fmWR

    But it's still much much much too slow to be a practical cipher.

    I want to turn the entire inner loop into three
    or four xors, ands, etc, with no loop at all, and
    no branching in the code that replaces that loop.

    I'm perfectly willing to have the output bits come
    out in a different order, if that's necessary for the
    code to be sped up.
    , Feb 8, 2013
    #4
  5. wrote:

    > [ actual C question snipped ]
    >
    > An LFSR is a Linear Feedback Shift Register, a type of pseudo-random
    > bit generator which produces statistically decent output. This
    > pseudocode takes the bits and hopefully makes them into something
    > cryptographically secure.


    No! These things are not designed for cryptography.

    For the C question I was going to suggest you look at CRC routines
    on the net, but Glen already did --and provided code. If I remember
    correctly there is also a good discussion in the Wikipedia article
    (which one is for you to search !)

    --
    pa at panix dot com
    Pierre Asselin, Feb 8, 2013
    #5
  6. On Wed, 06 Feb 2013 22:02:20 -0800, ben-goldberg wrote:

    > I've got some code which was written to make use of some integers as
    > vectors of bits, which deals with only a couple of bits at a time.
    >
    > I'd like to speed it up, and use whole words at a time... similar to how
    > SIMD coding works.


    <snip>
    > But I'm not sure what to do next.


    As your algorithm works on a different number of bits each time around
    the outer loop, your best bet for letting it operate on words (or bytes)
    at a time is to use a helper table that stores the values of the masks
    and shifts needed for each iteration to compare the right number of bits
    and to insert the new bit in the right location.

    Bart v Ingen Schenau
    Bart van Ingen Schenau, Feb 8, 2013
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    658
  2. Glen Low
    Replies:
    0
    Views:
    339
    Glen Low
    Feb 14, 2005
  3. calmar
    Replies:
    11
    Views:
    777
    calmar
    Feb 21, 2006
  4. Daniel

    SIMD?

    Daniel, Mar 29, 2006, in forum: C++
    Replies:
    2
    Views:
    470
  5. Bytter

    SIMD powered Python

    Bytter, Jun 23, 2007, in forum: Python
    Replies:
    4
    Views:
    1,298
    Paul Rubin
    Jun 23, 2007
Loading...

Share This Page