Byte array compare - Speed

Discussion in 'C++' started by MrCoder, Jun 6, 2004.

  1. MrCoder

    MrCoder Guest

    Hey guys, my first post on here so I'll just say "Hello everbody!"

    Ok heres my question for you lot.

    Is there a faster way to compare 1 byte array to another?

    This is my current code

    // Check for a match

    match = true;

    for ( int i = 0; i < arraylen; i++)

    {

    if( data != data2 )

    {

    match = false;

    break;

    }

    }



    Thanks in advance guys!!
     
    MrCoder, Jun 6, 2004
    #1
    1. Advertising

  2. MrCoder

    JKop Guest

    MrCoder posted:

    > Hey guys, my first post on here so I'll just say "Hello everbody!"
    >
    > Ok heres my question for you lot.
    >
    > Is there a faster way to compare 1 byte array to another?
    >
    > This is my current code
    >
    > // Check for a match
    >
    > match = true;
    >
    > for ( int i = 0; i < arraylen; i++)
    >
    > {
    >
    > if( data != data2 )
    >
    > {
    >
    > match = false;
    >
    > break;
    >
    > }
    >
    > }
    >
    >
    >
    > Thanks in advance guys!!



    If the array is of a variable length, as in your above example, then I would
    say that Yes, your code is pretty efficent, although maybe there's some
    memory functions I'm unaware of that could do the job better?


    If the array is of constant length, I'd suggest the following:

    Struct ChocolateArray
    {
    int monkey[50];
    };


    int main(void)
    {
    bool match;

    ChocolateArray aa;

    ChocolateArray bb;

    //Mess with the arrays

    if (aa == bb)
    {
    match = true;
    }
    else
    {
    match = false;
    }
    }


    -JKop
     
    JKop, Jun 6, 2004
    #2
    1. Advertising

  3. MrCoder

    ak Guest

    On Sun, 6 Jun 2004 09:23:28 +0000 (UTC), "MrCoder" <> wrote:

    >>Hey guys, my first post on here so I'll just say "Hello everbody!"
    >>
    >>Ok heres my question for you lot.
    >>
    >>Is there a faster way to compare 1 byte array to another?
    >>
    >>This is my current code
    >>
    >>// Check for a match
    >>
    >>match = true;
    >>
    >>for ( int i = 0; i < arraylen; i++)
    >>
    >>{
    >>
    >>if( data != data2 )
    >>
    >>{
    >>
    >>match = false;
    >>
    >>break;
    >>
    >>}
    >>
    >>}
    >>
    >>
    >>
    >>Thanks in advance guys!!
    >>


    memcmp( data, data2, arraylen );
     
    ak, Jun 6, 2004
    #3
  4. MrCoder wrote:
    > Hey guys, my first post on here so I'll just say "Hello everbody!"
    >
    > Ok heres my question for you lot.
    >
    > Is there a faster way to compare 1 byte array to another?
    >
    > This is my current code
    >
    > // Check for a match
    >
    > match = true;
    >
    > for ( int i = 0; i < arraylen; i++)
    >
    > {
    >
    > if( data != data2 )
    >
    > {
    >
    > match = false;
    >
    > break;
    >
    > }
    >
    > }


    On most systems, memcmp may be nicely optimized, it may (on some
    platforms) be inlined by the compiler.

    i.e.

    #include <cstring>
    bool isequal( const char * a, const char * b, int len )
    {
    return ::memcmp( a, b, len ) == 0;
    }

    using gcc: g++ -mtune=i686 -O3 generates:
    memcmp_tst.o: file format elf32-i386

    Disassembly of section .text:

    00000000 <_Z7isequalPKcS0_i>:
    0: 55 push %ebp
    1: 89 e5 mov %esp,%ebp
    3: 83 ec 0c sub $0xc,%esp
    6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
    9: 8b 4d 10 mov 0x10(%ebp),%ecx
    c: 8b 75 08 mov 0x8(%ebp),%esi
    f: 89 7d fc mov %edi,0xfffffffc(%ebp)
    12: 8b 7d 0c mov 0xc(%ebp),%edi
    15: fc cld
    16: 39 c9 cmp %ecx,%ecx
    18: f3 a6 repz cmpsb %es:(%edi),%ds:(%esi)
    1a: 0f 94 c0 sete %al
    1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
    20: 8b 7d fc mov 0xfffffffc(%ebp),%edi
    23: 89 ec mov %ebp,%esp
    25: 0f b6 c0 movzbl %al,%eax
    28: 5d pop %ebp
    29: c3 ret


    (Notice no function calls !)

    However, on some architectures (Risc - i.e. mips, sparc, hppa etc) there
    are a couple of different fast paths and it depends alot on the
    processor, cache and usage patterns.

    On 64 bit cpu, you might be able to take advantage of reading in large
    64 bit chunks at a time. Performance optimizations at this level can get
    very involved but unless the profiling indicates that there is a huge
    proportion of time spent in memcmp, it's just not worth your time and
    effort to optimize it.

    memcmp is probably the best choice for a balance of performance and
    portability.
     
    Gianni Mariani, Jun 6, 2004
    #4
  5. MrCoder

    Siemel Naran Guest

    "Gianni Mariani" <> wrote in message news:c9vfo4
    > MrCoder wrote:


    > > match = true;
    > >
    > > for ( int i = 0; i < arraylen; i++)
    > >
    > > {
    > >
    > > if( data != data2 )
    > >
    > > {
    > >
    > > match = false;
    > >
    > > break;
    > >
    > > }



    > #include <cstring>
    > bool isequal( const char * a, const char * b, int len )
    > {
    > return ::memcmp( a, b, len ) == 0;
    > }
    >
    > using gcc: g++ -mtune=i686 -O3 generates:
    > memcmp_tst.o: file format elf32-i386
    >
    > Disassembly of section .text:
    >
    > 00000000 <_Z7isequalPKcS0_i>:
    > 0: 55 push %ebp
    > 1: 89 e5 mov %esp,%ebp
    > 3: 83 ec 0c sub $0xc,%esp
    > 6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
    > 9: 8b 4d 10 mov 0x10(%ebp),%ecx
    > c: 8b 75 08 mov 0x8(%ebp),%esi
    > f: 89 7d fc mov %edi,0xfffffffc(%ebp)
    > 12: 8b 7d 0c mov 0xc(%ebp),%edi
    > 15: fc cld
    > 16: 39 c9 cmp %ecx,%ecx
    > 18: f3 a6 repz cmpsb %es:(%edi),%ds:(%esi)
    > 1a: 0f 94 c0 sete %al
    > 1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
    > 20: 8b 7d fc mov 0xfffffffc(%ebp),%edi
    > 23: 89 ec mov %ebp,%esp
    > 25: 0f b6 c0 movzbl %al,%eax
    > 28: 5d pop %ebp
    > 29: c3 ret


    How is the memcmp version different from the original version? Is it
    faster? Sorry, I don't know how to read assembly well.


    > memcmp is probably the best choice for a balance of performance and
    > portability.


    An implementation may specialize std::equal(begin, end, begin2) to call
    std::memcmp for POD types. I don't know how many actually do though. My
    implementation (Borland 6) does not.
     
    Siemel Naran, Jun 6, 2004
    #5
  6. MrCoder

    MrCoder Guest

    Thanks for the answers guys!

    Im going to try it using memcmp shortly and see what sort of speed increase
    I get.
     
    MrCoder, Jun 6, 2004
    #6
  7. MrCoder

    Siemel Naran Guest

    "JKop" <> wrote in message news:7FDwc.1527

    > If the array is of constant length, I'd suggest the following:
    >
    > Struct ChocolateArray
    > {
    > int monkey[50];
    > };
    >
    >
    > int main(void)
    > {
    > bool match;
    >
    > ChocolateArray aa;
    >
    > ChocolateArray bb;
    >
    > //Mess with the arrays
    >
    > if (aa == bb)


    There is no compiler defined operator== for structs or classes. There's
    only operator=, copy constructor, and destructor.
     
    Siemel Naran, Jun 6, 2004
    #7
  8. Siemel Naran wrote:
    > "Gianni Mariani" <> wrote in message news:c9vfo4
    >


    >
    >>#include <cstring>
    >>bool isequal( const char * a, const char * b, int len )
    >>{
    >> return ::memcmp( a, b, len ) == 0;
    >>}
    >>
    >>using gcc: g++ -mtune=i686 -O3 generates:
    >>memcmp_tst.o: file format elf32-i386
    >>
    >>Disassembly of section .text:
    >>
    >>00000000 <_Z7isequalPKcS0_i>:
    >> 0: 55 push %ebp
    >> 1: 89 e5 mov %esp,%ebp
    >> 3: 83 ec 0c sub $0xc,%esp
    >> 6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
    >> 9: 8b 4d 10 mov 0x10(%ebp),%ecx
    >> c: 8b 75 08 mov 0x8(%ebp),%esi
    >> f: 89 7d fc mov %edi,0xfffffffc(%ebp)
    >> 12: 8b 7d 0c mov 0xc(%ebp),%edi
    >> 15: fc cld
    >> 16: 39 c9 cmp %ecx,%ecx
    >> 18: f3 a6 repz cmpsb %es:(%edi),%ds:(%esi)
    >> 1a: 0f 94 c0 sete %al
    >> 1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
    >> 20: 8b 7d fc mov 0xfffffffc(%ebp),%edi
    >> 23: 89 ec mov %ebp,%esp
    >> 25: 0f b6 c0 movzbl %al,%eax
    >> 28: 5d pop %ebp
    >> 29: c3 ret

    >
    >
    > How is the memcmp version different from the original version? Is it
    > faster? Sorry, I don't know how to read assembly well.


    The disassembly shows that memcmp functionality is inlined - there is no
    call to the memcmp function. It's replaced with the ia32 "repz" prefix
    to the "cmps" instruction. This is a single instuction that is
    equivalent to the entire for loop in the op's example.

    isequal() equivalent to isequal2() below.

    bool isequal2( const char * a, const char * b, int len )
    {

    while ( len -- )
    {
    if ( * (a++) != * (b++) )
    {
    return false;
    }
    }
    return true;

    }

    memcmp_tst.o: file format elf32-i386

    Disassembly of section .text:

    00000000 <_Z8isequal2PKcS0_i>:
    0: 55 push %ebp
    1: 89 e5 mov %esp,%ebp
    3: 56 push %esi
    4: 53 push %ebx
    5: 8b 75 08 mov 0x8(%ebp),%esi
    8: 8b 5d 0c mov 0xc(%ebp),%ebx
    b: 8b 4d 10 mov 0x10(%ebp),%ecx
    e: eb 0c jmp 1c <_Z8isequal2PKcS0_i+0x1c>
    10: 0f b6 13 movzbl (%ebx),%edx
    13: 43 inc %ebx
    14: 0f b6 06 movzbl (%esi),%eax
    17: 46 inc %esi
    18: 38 d0 cmp %dl,%al
    1a: 75 0f jne 2b <_Z8isequal2PKcS0_i+0x2b>
    1c: 49 dec %ecx
    1d: 83 f9 ff cmp $0xffffffff,%ecx
    20: 75 ee jne 10 <_Z8isequal2PKcS0_i+0x10>
    22: 5b pop %ebx
    23: b8 01 00 00 00 mov $0x1,%eax
    28: 5e pop %esi
    29: 5d pop %ebp
    2a: c3 ret
    2b: 31 c0 xor %eax,%eax
    2d: 5b pop %ebx
    2e: 5e pop %esi
    2f: 5d pop %ebp
    30: c3 ret

    As you can see, it has 2 conditional jumps "jne = jump if not equal".
    Conditional jumps are costly in modern CPU's that do branch prediction
    when the prediction is wrong. If the compiler was really really smart,
    it *might* be able to figure out how to do the optimization to use "repz
    cmp", but I have yet to see one !

    >
    >>memcmp is probably the best choice for a balance of performance and
    >>portability.

    >
    >
    > An implementation may specialize std::equal(begin, end, begin2) to call
    > std::memcmp for POD types. I don't know how many actually do though. My
    > implementation (Borland 6) does not.


    It appears that gcc also does not. Using std::equal

    00000040 <_Z8isequal3PKcS0_i>:
    40: 55 push %ebp
    41: 89 e5 mov %esp,%ebp
    43: 53 push %ebx
    44: 8b 55 08 mov 0x8(%ebp),%edx
    47: 8b 45 10 mov 0x10(%ebp),%eax
    4a: 8b 4d 0c mov 0xc(%ebp),%ecx
    4d: 89 d3 mov %edx,%ebx
    4f: 01 c3 add %eax,%ebx
    51: eb 09 jmp 5c <_Z8isequal3PKcS0_i+0x1c>
    53: 0f b6 01 movzbl (%ecx),%eax
    56: 38 02 cmp %al,(%edx)
    58: 75 11 jne 6b <_Z8isequal3PKcS0_i+0x2b>
    5a: 42 inc %edx
    5b: 41 inc %ecx
    5c: 39 da cmp %ebx,%edx
    5e: 75 f3 jne 53 <_Z8isequal3PKcS0_i+0x13>
    60: 5b pop %ebx
    61: b8 01 00 00 00 mov $0x1,%eax
    66: 0f b6 c0 movzbl %al,%eax
    69: 5d pop %ebp
    6a: c3 ret
    6b: 5b pop %ebx
    6c: 31 c0 xor %eax,%eax
    6e: 0f b6 c0 movzbl %al,%eax
    71: 5d pop %ebp
    72: c3 ret


    Note the 2 jne's - basically the same as the previous loop.
     
    Gianni Mariani, Jun 6, 2004
    #8
  9. MrCoder wrote:
    > Thanks for the answers guys!
    >
    > Im going to try it using memcmp shortly and see what sort of speed
    > increase I get.


    When you know the size, you can also unroll (or partially unroll) the loop.
    This is a semi-generalized implementation:

    #include <functional>
    #include <iterator>

    // identity

    template<class T> struct identity {
    typedef T type;
    };

    // map_integral

    template<class T, T V> struct map_integral {
    static const T value = V;
    };

    template<class T, T V> const T map_integral<T, V>::value;

    // enable_if

    template<bool, class R> struct enable_if { };
    template<class R> struct enable_if<true, R> : identity<R> { };

    // is_class

    template<class T> class is_class {
    private:
    template<class U> static char check(int U::*);
    template<class U> static char (& check(...))[2];
    public:
    static const bool value = sizeof(check<T>(0)) == 1;
    };

    template<class T> const bool is_class<T>::value;

    // is_same

    template<class T, class U> struct is_same : map_integral<bool, false> { };
    template<class T> struct is_same<T, T> : map_integral<bool, true> { };

    // is_pointer_to_function

    template<class> struct is_pointer_to_function
    : map_integral<bool, false> { };

    template<> struct is_pointer_to_function<void*>
    : map_integral<bool, false> { };

    template<class T> struct is_pointer_to_function<T*>
    : is_same<void (T), void (T*)> { };

    // is_reference_to_function

    template<class> struct is_reference_to_function
    : map_integral<bool, false> { };

    template<class T> struct is_reference_to_function<T&>
    : is_same<void (T), void (T*)> { };

    // compare

    namespace detail {

    template<class iterator, class predicate>
    bool compare(
    iterator a1, iterator a2,
    iterator b1, iterator b2,
    predicate pred, std::input_iterator_tag
    ) {
    while (a1 != a2) {
    if (b1 == b2 || !pred(*a1++, *b1++)) {
    return false;
    }
    }
    return b1 == b2;
    }

    template<class iterator, class predicate>
    bool compare(
    iterator a1, iterator a2,
    iterator b1, iterator b2,
    predicate pred, std::random_access_iterator_tag
    ) {
    typedef typename std::iterator_traits<iterator>::difference_type
    difference_type;
    difference_type size(a2 - a1);
    if (size != b2 - b1) {
    return false;
    }
    // unrolling factor == 4
    register difference_type n((size + 3) / 4);
    #define body() \
    if (!pred(*a1++, *b1++)) { \
    return false; \
    } \
    /**/
    switch (size % 4 - !size) {
    case 0:
    do {
    body()
    case 3: body()
    case 2: body()
    case 1: body()
    } while (--n);
    }
    #undef body
    return true;
    }

    } // detail

    template<class iterator, class predicate>
    inline bool compare(
    iterator a1, iterator a2,
    iterator b1, iterator b2,
    predicate pred
    ) {
    return detail::compare(
    a1, a2, b1, b2,
    pred, typename std::iterator_traits<iterator>::iterator_category()
    );
    }

    template<class iterator>
    inline bool compare(iterator a1, iterator a2, iterator b1, iterator b2) {
    return compare(
    a1, a2, b1, b2,
    std::equal_to<typename std::iterator_traits<iterator>::value_type>()
    );
    }

    template<class T, class predicate>
    inline bool compare(const T* a, const T* b, unsigned size, predicate pred) {
    return compare(a, a + size, b, b + size, pred);
    }

    template<class T, unsigned long s1, unsigned long s2, class predicate>
    inline typename enable_if<
    is_class<predicate>::value
    || is_pointer_to_function<predicate>::value
    || is_reference_to_function<predicate>::value,
    bool
    >::type compare(T (& a)[s1], T (& b)[s2], predicate pred) {

    return compare(a, a + s1, b, b + s2, pred);
    }

    template<class T> inline bool compare(const T* a, const T* b, unsigned size) {
    return compare(a, b, size, std::equal_to<const T>());
    }

    template<class T, unsigned long s1, unsigned long s2>
    inline bool compare(T (& a)[s1], T (& b)[s2]) {
    return compare(a, b, std::equal_to<T>());
    }

    Regards,
    Paul Mensonides
     
    Paul Mensonides, Jun 7, 2004
    #9
  10. MrCoder

    Siemel Naran Guest

    "Gianni Mariani" <> wrote in message news:c9vr41
    > Siemel Naran wrote:


    > > How is the memcmp version different from the original version? Is it
    > > faster? Sorry, I don't know how to read assembly well.

    >
    > The disassembly shows that memcmp functionality is inlined - there is no
    > call to the memcmp function. It's replaced with the ia32 "repz" prefix
    > to the "cmps" instruction. This is a single instuction that is
    > equivalent to the entire for loop in the op's example.
    >
    > isequal() equivalent to isequal2() below.
    >
    > bool isequal2( const char * a, const char * b, int len )
    > {
    >
    > while ( len -- )
    > {
    > if ( * (a++) != * (b++) )
    > {
    > return false;
    > }
    > }
    > return true;
    >
    > }
    >
    > memcmp_tst.o: file format elf32-i386
    >
    > Disassembly of section .text:
    >
    > 00000000 <_Z8isequal2PKcS0_i>:
    > 0: 55 push %ebp
    > 1: 89 e5 mov %esp,%ebp
    > 3: 56 push %esi
    > 4: 53 push %ebx
    > 5: 8b 75 08 mov 0x8(%ebp),%esi
    > 8: 8b 5d 0c mov 0xc(%ebp),%ebx
    > b: 8b 4d 10 mov 0x10(%ebp),%ecx
    > e: eb 0c jmp 1c <_Z8isequal2PKcS0_i+0x1c>
    > 10: 0f b6 13 movzbl (%ebx),%edx
    > 13: 43 inc %ebx
    > 14: 0f b6 06 movzbl (%esi),%eax
    > 17: 46 inc %esi
    > 18: 38 d0 cmp %dl,%al
    > 1a: 75 0f jne 2b <_Z8isequal2PKcS0_i+0x2b>
    > 1c: 49 dec %ecx
    > 1d: 83 f9 ff cmp $0xffffffff,%ecx
    > 20: 75 ee jne 10 <_Z8isequal2PKcS0_i+0x10>
    > 22: 5b pop %ebx
    > 23: b8 01 00 00 00 mov $0x1,%eax
    > 28: 5e pop %esi
    > 29: 5d pop %ebp
    > 2a: c3 ret
    > 2b: 31 c0 xor %eax,%eax
    > 2d: 5b pop %ebx
    > 2e: 5e pop %esi
    > 2f: 5d pop %ebp
    > 30: c3 ret
    >
    > As you can see, it has 2 conditional jumps "jne = jump if not equal".
    > Conditional jumps are costly in modern CPU's that do branch prediction
    > when the prediction is wrong. If the compiler was really really smart,
    > it *might* be able to figure out how to do the optimization to use "repz
    > cmp", but I have yet to see one !
    >
    > >
    > >>memcmp is probably the best choice for a balance of performance and
    > >>portability.

    > >
    > >
    > > An implementation may specialize std::equal(begin, end, begin2) to call
    > > std::memcmp for POD types. I don't know how many actually do though.

    My
    > > implementation (Borland 6) does not.

    >
    > It appears that gcc also does not. Using std::equal
    >
    > 00000040 <_Z8isequal3PKcS0_i>:
    > 40: 55 push %ebp
    > 41: 89 e5 mov %esp,%ebp
    > 43: 53 push %ebx
    > 44: 8b 55 08 mov 0x8(%ebp),%edx
    > 47: 8b 45 10 mov 0x10(%ebp),%eax
    > 4a: 8b 4d 0c mov 0xc(%ebp),%ecx
    > 4d: 89 d3 mov %edx,%ebx
    > 4f: 01 c3 add %eax,%ebx
    > 51: eb 09 jmp 5c <_Z8isequal3PKcS0_i+0x1c>
    > 53: 0f b6 01 movzbl (%ecx),%eax
    > 56: 38 02 cmp %al,(%edx)
    > 58: 75 11 jne 6b <_Z8isequal3PKcS0_i+0x2b>
    > 5a: 42 inc %edx
    > 5b: 41 inc %ecx
    > 5c: 39 da cmp %ebx,%edx
    > 5e: 75 f3 jne 53 <_Z8isequal3PKcS0_i+0x13>
    > 60: 5b pop %ebx
    > 61: b8 01 00 00 00 mov $0x1,%eax
    > 66: 0f b6 c0 movzbl %al,%eax
    > 69: 5d pop %ebp
    > 6a: c3 ret
    > 6b: 5b pop %ebx
    > 6c: 31 c0 xor %eax,%eax
    > 6e: 0f b6 c0 movzbl %al,%eax
    > 71: 5d pop %ebp
    > 72: c3 ret
    >
    >
    > Note the 2 jne's - basically the same as the previous loop.


    Thanks for your answers!
     
    Siemel Naran, Jun 7, 2004
    #10
  11. MrCoder wrote:
    > Hey guys, my first post on here so I'll just say "Hello everbody!"
    > Ok heres my question for you lot.
    > Is there a faster way to compare 1 byte array to another?
    > This is my current code
    > // Check for a match
    > match = true;
    > for ( int i = 0; i < arraylen; i++)
    > {
    > if( data != data2 )
    > {
    > match = false;
    > break;
    > }
    > }
    >
    > Thanks in advance guys!!


    First off, search the web and the newgroups for
    "premature optimization". Only optimize when absolutely
    necessary.

    Loop Unrolling (also search the web on this one).
    --------------
    In most CPU architectures, branches (a.k.a. jumps) are
    more expensive (timewise) than data processing instructions.
    Some processors also allow for conditional execution of
    instructions. Anyway, the fundamental issue here is to
    perform more comparisons than branches. Try this:
    bool equal(true);
    for (unsigned int i = 0;
    equal && (i < arraylen);)
    {
    equal = data == data2[i++];
    // Repeat the following line many times.
    equal = equal && (i < array_length)
    && (data == data2[i++]);
    }
    The objective here is to use boolean logic and the
    short-circuiting of the logical AND operator to reduce
    the amount of branching. The more data operations per
    branch is better.


    Library Routines
    ----------------
    As others have said, use library routines. Some
    library routines may be optimized for your platform's
    processor while a simple for loop may not be.


    Customize In Assembly.
    ----------------------
    You can write a custom memory compare function that
    will take advantage of any processor special functions.
    The processor I'm working with has the capability to
    fetch much data from memory into registers with one
    instruction. So for large sized arrays, I would
    use a "block" method that would haul in a lot of
    data into registers, then compare each register pair.


    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c -faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c /faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Jun 7, 2004
    #11
  12. MrCoder

    Siemel Naran Guest

    "Thomas Matthews" <> wrote in
    message

    > Loop Unrolling (also search the web on this one).
    > --------------
    > In most CPU architectures, branches (a.k.a. jumps) are
    > more expensive (timewise) than data processing instructions.
    > Some processors also allow for conditional execution of
    > instructions. Anyway, the fundamental issue here is to
    > perform more comparisons than branches. Try this:
    > bool equal(true);
    > for (unsigned int i = 0;
    > equal && (i < arraylen);)
    > {
    > equal = data == data2[i++];
    > // Repeat the following line many times.
    > equal = equal && (i < array_length)
    > && (data == data2[i++]);
    > }
    > The objective here is to use boolean logic and the
    > short-circuiting of the logical AND operator to reduce
    > the amount of branching. The more data operations per
    > branch is better.


    Have you measured the speed increase?
     
    Siemel Naran, Jun 7, 2004
    #12
  13. Siemel Naran wrote:
    > "Thomas Matthews" <> wrote in
    > message
    >
    >
    >>Loop Unrolling (also search the web on this one).
    >>--------------
    >>In most CPU architectures, branches (a.k.a. jumps) are
    >>more expensive (timewise) than data processing instructions.
    >>Some processors also allow for conditional execution of
    >>instructions. Anyway, the fundamental issue here is to
    >>perform more comparisons than branches. Try this:
    >> bool equal(true);
    >> for (unsigned int i = 0;
    >> equal && (i < arraylen);)
    >> {
    >> equal = data == data2[i++];
    >> // Repeat the following line many times.
    >> equal = equal && (i < array_length)
    >> && (data == data2[i++]);
    >> }
    >>The objective here is to use boolean logic and the
    >>short-circuiting of the logical AND operator to reduce
    >>the amount of branching. The more data operations per
    >>branch is better.

    >
    >
    > Have you measured the speed increase?
    >
    >


    Only on my assembly language version for an embbeded
    system using the ARM processor at 30Mhz.

    I've found that it is faster for larger chunks of data,
    sizes of 32 bytes or more, than using a simple byte
    comparison.

    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c -faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c /faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Jun 7, 2004
    #13
  14. And nobody's mentioned Duff's device yet?


    "MrCoder" <> wrote in message
    news:c9unqg$h76$...
    > Hey guys, my first post on here so I'll just say "Hello everbody!"
    >
    > Ok heres my question for you lot.
    >
    > Is there a faster way to compare 1 byte array to another?
    >
    > This is my current code
    >
    > // Check for a match
    >
    > match = true;
    >
    > for ( int i = 0; i < arraylen; i++)
    >
    > {
    >
    > if( data != data2 )
    >
    > {
    >
    > match = false;
    >
    > break;
    >
    > }
    >
    > }
    >
    >
    >
    > Thanks in advance guys!!
    >
    >
     
    Dave Townsend, Jun 8, 2004
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Bharat Bhushan

    Appending byte[] to another byte[] array

    Bharat Bhushan, Aug 5, 2003, in forum: Java
    Replies:
    15
    Views:
    40,261
    Roedy Green
    Aug 5, 2003
  2. Kirby
    Replies:
    3
    Views:
    653
    Kirby
    Oct 8, 2004
  3. Replies:
    20
    Views:
    9,800
    licebmi
    Sep 7, 2009
  4. Tom McGlynn
    Replies:
    4
    Views:
    860
    Mark Space
    Apr 19, 2008
  5. Patricia Shanahan
    Replies:
    0
    Views:
    392
    Patricia Shanahan
    Apr 17, 2008
Loading...

Share This Page