question on std::search_n

Discussion in 'C++' started by ma740988, Mar 8, 2006.

  1. ma740988

    ma740988 Guest

    I've been perusing for some time now - I suppose some of the algorithms
    in Josuttis. 'Today', I do alot of funny math on data resident within
    memory. As part of this. I find myself using std::search_n alot to
    find a particular value in a range of memory. I'm fairly pleased with
    std::search_n - in that my benchmark ( below ) shows on average an 8
    second performance gain comparable to a for loop. The question.
    Since I haven't quite gotten my head around most of the algorithms, is
    there an alternative to std::search_n that I'm perhaps overlooking?
    Ironically I didn't see a lot of use on the web with regards to
    std::search_n, nonetheless, hopefully my question is not too 'open
    ended'. Source code for bechmark - executed on a PC

    # include <iostream>
    # include <algorithm>
    # include <ctime>

    using namespace std;

    int main()
    {
    int const buffer_len( 0x100000 );
    int *ptr_mem = new int [ buffer_len ];
    std::fill ( ptr_mem, ptr_mem + buffer_len, 0xA5 );

    int *ptr_mem_end = ptr_mem + buffer_len;

    std::clock_t Start, End;
    double Elapsed;
    int count(0);
    int const num_test(1000);

    Start = std::clock();
    for ( int idx(0); idx < num_test; ++idx )
    {
    if ( std::search_n (
    ptr_mem,
    ptr_mem_end,
    buffer_len,
    0xA5 ) == ptr_mem_end )
    {
    ++count;
    break;
    }
    }
    if ( !count )
    {
    End = std::clock();
    Elapsed = static_cast<double>( ( End - Start ) / CLOCKS_PER_SEC );

    std::cout << " Elapsed - std::search_n " << Elapsed << std::endl;
    }

    count = 0;
    Start = std::clock();
    for ( int jdx(0); jdx < num_test; ++jdx )
    {
    for ( int idx(0); idx < buffer_len; ++idx )
    {
    if ( ptr_mem [idx ] != 0xA5 )
    {
    ++count;
    break;
    }
    }
    }
    if ( !count )
    {
    End = std::clock();
    Elapsed = static_cast<double>( ( End - Start ) / CLOCKS_PER_SEC );

    std::cout << " Elapsed - for loop " << Elapsed << std::endl;
    }
    delete [] ptr_mem; // try to clean up your mess

    }

    /*
    Elapsed - std::search_n 30
    Elapsed - for loop 38

    Press any key to continue
    */
     
    ma740988, Mar 8, 2006
    #1
    1. Advertising

  2. ma740988 wrote:
    > Since I haven't quite gotten my head around most of the algorithms, is
    > there an alternative to std::search_n that I'm perhaps overlooking?


    The first question is: what do you intend to do? 'search_n()' locates
    the first consecutive sequence of the specified length matching the
    specified object. There is pretty rare use for this and the example
    you gave is not a situation where I would apply 'search_n()': I would
    apply 'find()' with an inverted predicate to locate the first position
    where the condition does not hold (this is probably what is essentially
    happening inside 'search_n()' in you situation).
    --
    <mailto:> <http://www.dietmar-kuehl.de/>
    <http://www.eai-systems.com> - Efficient Artificial Intelligence
     
    Dietmar Kuehl, Mar 8, 2006
    #2
    1. Advertising

  3. ma740988

    ma740988 Guest

    Dietmar Kuehl wrote:
    > ma740988 wrote:
    > > Since I haven't quite gotten my head around most of the algorithms, is
    > > there an alternative to std::search_n that I'm perhaps overlooking?

    >
    > The first question is: what do you intend to do?

    Dietmar, Lets assume I transmitted (we'll ignore the protocol used -
    just assume you got data somehow ) 4 MiB of data to you. For
    simplicity*, lets also assume that this data is all '0xA5'. Your job
    then is to ensure that ( as part of what I call 'communication check' )
    you received 4 MiB worth of 0xA5's.

    That's the gist in a nutshell.

    Worse case scenario: Within the 4 MiB of data, each 1 MiB boundary
    will have a different value. For instance:

    Value Range
    0xA5 - 0x000000 - 0x100000
    0xA6 - 0x100000 - 0x200000
    0xA7 - 0x200000 - 0x300000
    0xA8 - 0x300000 - 0x400000

    You'll know in advance that the received data will change on 1 MiB
    boundaries. That's because I'll tell you prior to sending the data -
    what 'mode' I'm in.
    The mode is irrelevant though. The important thing surrounds an
    approach that'll walk through the memory finding the specific values.

    The program shown highlights my current approach to the - simple case.
    4 MiB worth of 0xA5's.

    >'search_n()' locates
    > the first consecutive sequence of the specified length matching the
    > specified object. There is pretty rare use for this and the example
    > you gave is not a situation where I would apply 'search_n()': I would
    > apply 'find()' with an inverted predicate to locate the first position
    > where the condition does not hold (this is probably what is essentially
    > happening inside 'search_n()' in you situation).


    An inverted predicated. Sounds like a not1 .. I'll look into a find
    approach. The trouble with alot of these algorithms is the boundary
    for me - today appears muddled. search versus find. We'll they're
    synonymous, but I wonder sometimes if it would have been better to just
    'provide the better of the two' as part of the standard. In any event
    ....
     
    ma740988, Mar 8, 2006
    #3
  4. ma740988 wrote:
    > Dietmar, Lets assume I transmitted (we'll ignore the protocol used -
    > just assume you got data somehow ) 4 MiB of data to you. For
    > simplicity*, lets also assume that this data is all '0xA5'. Your job
    > then is to ensure that ( as part of what I call 'communication check' )
    > you received 4 MiB worth of 0xA5's.


    This would be a pretty stupid transmission and I would certainly
    represent the received data differently, e.g. using a sequence of
    (value, count) pairs. In any case, assuming I got a container with
    the data in the form you described, I would use one of two approaches
    depending on the characteristics of the container:

    I would use 'std::find()' to locate the first object not matching the
    condition and 'std::distance()' to determine the number of matching
    elements:

    if (n != std::distance(c.begin(),
    std::find_if(c.begin(), c.end(),
    std::bind2nd(std::not_equal_to<char>(), 0xA5))))
    ...

    If the call to 'std::distance()' is not bundled with the call to
    'std::find_if()', you would get the position of the first sequence,
    too. Note, however, that the sequence is traversed twice in this
    situation if the container does not provide random access.

    > The trouble with alot of these algorithms is the boundary
    > for me - today appears muddled. search versus find. We'll they're
    > synonymous,


    In some sense 'std::search_n()' is a generalization of 'std::find()':
    'std::find()' is identical to a 'std::search_n()' with a count of '1'.

    > but I wonder sometimes if it would have been better to just
    > 'provide the better of the two' as part of the standard.


    Which one is the better one? The simple one which does not work for
    everybody? ...or the more general one which requires users in need
    of a simple solution to pass extra arguments? Do you really want
    to pass on all those arguments for the most general algorithms when
    you only need a simple algorithm? Also, do you accept the possible
    slowdown due to using a more general algorithm which cannot take
    advantage of specific properties? ... or that the general algorithm
    may be inapplicable because it requires more specific input e.g. a
    forward iterator ('std::search_n()') instead of an input iterator
    ('std::find()')?
    --
    <mailto:> <http://www.dietmar-kuehl.de/>
    <http://www.eai-systems.com> - Efficient Artificial Intelligence
     
    Dietmar Kuehl, Mar 8, 2006
    #4
  5. ma740988

    ma740988 Guest

    > This would be a pretty stupid transmission and I would certainly
    > represent the received data differently, e.g. using a sequence of
    > (value, count) pairs.

    You lost me there. You're told in advance about the sequence in a
    header. You get header/data, header/data - at all times. The header
    tells you what to do with the junk you receive/may have received
    (assuming tranmittal errors).

    In any case, assuming I got a container with
    > the data in the form you described, I would use one of two approaches
    > depending on the characteristics of the container:
    >
    > I would use 'std::find()' to locate the first object not matching the
    > condition and 'std::distance()' to determine the number of matching
    > elements:
    >
    > if (n != std::distance(c.begin(),
    > std::find_if(c.begin(), c.end(),
    > std::bind2nd(std::not_equal_to<char>(), 0xA5))))
    > ...

    Ahh!! That's right!! I was missing the bind2nd part. Actually, I was
    trying out a combination of count_if(find_if ... )

    > If the call to 'std::distance()' is not bundled with the call to
    > 'std::find_if()', you would get the position of the first sequence,
    > too. Note, however, that the sequence is traversed twice in this
    > situation if the container does not provide random access.


    I'm a little ignorant in this area so bear with me. Just so I
    understand this: Assume our container is std::list and given 4
    elements. Pictorially:

    [ begin ][ begin + 1 ][ begin + 2][ begin + 3]

    So find_if will run from begin to 1 past begin + 3. Now why is there a
    need to traverse twice? All the sequence needs to do is run from being
    to 1 past begin + 3 to find a matching value.


    > > The trouble with alot of these algorithms is the boundary
    > > for me - today appears muddled. search versus find. We'll they're
    > > synonymous,

    >
    > In some sense 'std::search_n()' is a generalization of 'std::find()':
    > 'std::find()' is identical to a 'std::search_n()' with a count of '1'.
    >
    > > but I wonder sometimes if it would have been better to just
    > > 'provide the better of the two' as part of the standard.

    >
    > Which one is the better one? The simple one which does not work for
    > everybody? ...or the more general one which requires users in need
    > of a simple solution to pass extra arguments? Do you really want
    > to pass on all those arguments for the most general algorithms when
    > you only need a simple algorithm? Also, do you accept the possible
    > slowdown due to using a more general algorithm which cannot take
    > advantage of specific properties? ... or that the general algorithm
    > may be inapplicable because it requires more specific input e.g. a
    > forward iterator ('std::search_n()') instead of an input iterator
    > ('std::find()')?

    Ok. the devil is in the details, I suppose. :) I need to back up and
    review once again distinction between the input versus forward
    iterators

    Thanks for the help
     
    ma740988, Mar 8, 2006
    #5
  6. ma740988 wrote:

    >> This would be a pretty stupid transmission and I would certainly
    >> represent the received data differently, e.g. using a sequence of
    >> (value, count) pairs.

    > You lost me there. You're told in advance about the sequence in a
    > header. You get header/data, header/data - at all times. The header
    > tells you what to do with the junk you receive/may have received
    > (assuming tranmittal errors).


    All I'm saying is that I would read the input and check immediately
    the data I receive without ever putting it into any form of container
    or sequence (other than the input sequence you might obtain using
    'std::istreambuf_iterator<char>'). Especially if the sequences are
    huge, this would be much more memory conservative.

    >> If the call to 'std::distance()' is not bundled with the call to
    >> 'std::find_if()', you would get the position of the first sequence,
    >> too. Note, however, that the sequence is traversed twice in this
    >> situation if the container does not provide random access.

    >
    > I'm a little ignorant in this area so bear with me. Just so I
    > understand this: Assume our container is std::list and given 4
    > elements. Pictorially:
    >
    > [ begin ][ begin + 1 ][ begin + 2][ begin + 3]
    >
    > So find_if will run from begin to 1 past begin + 3. Now why is there a
    > need to traverse twice? All the sequence needs to do is run from being
    > to 1 past begin + 3 to find a matching value.


    The sequence is traversed once for 'std::find_if()' to determine the
    end and then another time for 'std::distance()' to count the number
    of elements. Duplicate traversal of 1M list nodes could be more
    expensive than bundling the counting with the finding and thus
    requiring only one traversal.

    > Ok. the devil is in the details, I suppose. :)


    I wouldn't call the difference between a forward iterator and an
    input iterator to be much more severe than just a "detail". But
    then, I'm pretty focused on the concepts involved in generic
    programming and thus consider things a big deal which most others
    don't really care about...
    --
    <mailto:> <http://www.dietmar-kuehl.de/>
    <http://www.eai-systems.com> - Efficient Artificial Intelligence
     
    Dietmar Kuehl, Mar 8, 2006
    #6
  7. ma740988

    ma740988 Guest


    > All I'm saying is that I would read the input and check immediately
    > the data I receive without ever putting it into any form of container
    > or sequence (other than the input sequence you might obtain using
    > 'std::istreambuf_iterator<char>'). Especially if the sequences are
    > huge, this would be much more memory conservative.


    I see. That's exactly what I doing though. I know where start and end
    of the data is ( based on what the header told me ). I konw in memory
    where the data resides. From there I just check to ensure that what
    the 'sender' claimed he sent matched what I received.
    My initial post showed me allocating memory and filling it with A5's.
    In my real system, obviously the receiver doesn't need to do all that.
    That said, those steps are omitted. The receiver just _walks_ across
    the memory - validating the data.

    >
    > The sequence is traversed once for 'std::find_if()' to determine the
    > end and then another time for 'std::distance()' to count the number
    > of elements. Duplicate traversal of 1M list nodes could be more
    > expensive than bundling the counting with the finding and thus
    > requiring only one traversal.
    >

    Got it. Thanks alot.
     
    ma740988, Mar 8, 2006
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Chris Jefferson

    Inefficency in search_n

    Chris Jefferson, Aug 9, 2004, in forum: C++
    Replies:
    4
    Views:
    404
    tom_usenet
    Aug 9, 2004
  2. Peter Jansson
    Replies:
    5
    Views:
    6,439
    Ivan Vecerina
    Mar 17, 2005
  3. Vinu
    Replies:
    4
    Views:
    394
    Jim Langston
    Jul 7, 2005
  4. Vinu
    Replies:
    0
    Views:
    376
  5. Geoffrey S. Knauth
    Replies:
    6
    Views:
    1,049
    Earl Purple
    Jan 18, 2006
Loading...

Share This Page