memdiff

Discussion in 'C++' started by paul.anderson@jhuapl.edu, Nov 17, 2006.

  1. Guest

    I need to compare about 25k bytes of data. I'd like to be able to do a
    simple comparison of memory between 2 structures containing intrinsic
    data and find the position where the first difference occurs. Is there
    an easy way to do this? I'd like to use a call like memcmp but that
    only provides <, > or = responses. I thought maybe there was some sort
    of "memdiff" call that would give me the position where the 2
    structures are different. Needless to say, I don't want to write
    overloaded operator== methods that will handle the comparisons - my 25k
    of data is a series of structs of structs that will be a pain to write
    a method for every one.

    Thanks for any insight!
    , Nov 17, 2006
    #1

  2. Phlip Guest

    paul.anderson wrote:

    > I need to compare about 25k bytes of data. I'd like to be able to do a
    > simple comparison of memory between 2 structures containing intrinsic
    > data and find the position where the first difference occurs. Is there
    > an easy way to do this? I'd like to use a call like memcmp but that
    > only provides <, > or = responses. I thought maybe there was some sort
    > of "memdiff" call that would give me the position where the 2
    > structures are different. Needless to say, I don't want to write
    > overloaded operator== methods that will handle the comparisons - my 25k
    > of data is a series of structs of structs that will be a pain to write
    > a method for every one.
    >
    > Thanks for any insight!


    Are all of the structs PODS - Plain Ol' Data Structures?

    If so, you can just point to the first ones with (unsigned?) char *
    pointers, and increment your pointers until you get a miss.
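
    The pointer walk described above can be sketched roughly as follows. The name memdiff is hypothetical (the standard library has no such function), and the sketch assumes a plain byte-wise comparison is acceptable, padding included.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper (not a standard function): returns the offset of the
// first byte at which the two buffers differ, or n if all n bytes match.
std::size_t memdiff(const void* lhs, const void* rhs, std::size_t n)
{
    const unsigned char* a = static_cast<const unsigned char*>(lhs);
    const unsigned char* b = static_cast<const unsigned char*>(rhs);

    // std::mismatch walks both ranges in step and stops at the first
    // pair of elements that compare unequal.
    std::pair<const unsigned char*, const unsigned char*> hit =
        std::mismatch(a, a + n, b);

    return static_cast<std::size_t>(hit.first - a);
}
```

    A call such as memdiff(&s1, &s2, sizeof s1) would then give the byte offset of the first mismatch, which still has to be mapped back to a member by hand.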

    If not... why did you lose track of the difference between all these
    structures? Can you fix the problem upstream without resorting to brute
    force?

    Further, even PODS may contain padding characters, and these might not be
    guaranteed to be comparable.

    I thought that PODS have a default operator== that's guaranteed to work
    acceptably. If so, you could use this, or rely on it to build better
    operator== methods on the containing structures.

    --
    Phlip
    http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
    Phlip, Nov 17, 2006
    #2

  3. Phlip:

    > Are all of the structs PODS - Plain Ol' Data Structures?
    >
    > If so, you can just point to the first ones with (unsigned?) char *
    > pointers, and increment your pointers until you get a miss.



    Depends whether the padding must be equal also. The test could produce a
    false positive if two objects can be equal yet have different padding.
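
    A small sketch of how that can happen. Sample, make_sample and the two comparison helpers are hypothetical names made up for illustration, and whether Sample has any padding at all is up to the implementation.

```cpp
#include <cstring>

// Hypothetical struct for illustration; on many platforms there are padding
// bytes between tag and value, but that is implementation-defined.
struct Sample {
    char tag;
    int  value;
};

// Member-wise equality: padding never takes part.
bool members_equal(const Sample& a, const Sample& b)
{
    return a.tag == b.tag && a.value == b.value;
}

// Raw byte equality: also compares whatever happens to sit in the padding.
bool bytes_equal(const Sample& a, const Sample& b)
{
    return std::memcmp(&a, &b, sizeof(Sample)) == 0;
}

// Builds a Sample whose storage is first filled with `fill`, then has its
// members assigned. Any padding may or may not keep the fill value.
Sample make_sample(unsigned char fill, char tag, int value)
{
    Sample s;
    std::memset(&s, fill, sizeof s);
    s.tag = tag;
    s.value = value;
    return s;
}
```

    With a = make_sample(0x00, 'x', 42) and b = make_sample(0xFF, 'x', 42), members_equal(a, b) is true, while bytes_equal(a, b) may well be false on an implementation that pads Sample. Neither outcome of the byte comparison is guaranteed.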

    --

    Frederick Gotham
    Frederick Gotham, Nov 17, 2006
    #3

  4. > I need to compare about 25k bytes of data.



    size_t const len = 25600;

    char unsigned *const a = new char unsigned[len],
                  *const b = new char unsigned[len];

    /* Alter memory chunk a */

    /* Alter memory chunk b */

    memcmp(a,b,len);


    > I'd like to be able to do a simple comparison of memory between 2
    > structures containing intrinsic data and find the position where the
    > first difference occurs.



    Too unspecific.


    > Is there an easy way to do this?



    Yes, probably, but I don't know your requirements.


    > I'd like to use a call like memcmp but that only provides <, > or =
    > responses. I thought maybe there was some sort of "memdiff" call that
    > would give me the position where the 2 structures are different.
    > Needless to say, I don't want to write overloaded operator== methods
    > that will handle the comparisons - my 25k of data is a series of structs
    > of structs that will be a pain to write a method for every one.



    These struct objects might contain padding. It's possible for two POD
    objects to be identical, yet have padding which is different. Do you
    want these to still compare equal? If so, your only choice is to compare
    them member by member.

    You might start off with an algorithm to tell you if a particular byte is a
    padding byte or not:

    bool IsPaddingByte(MyPOD const &obj,char unsigned const *const p)
    {
        /* p points at one byte inside obj. Needs <cstddef> for offsetof. */

        char unsigned const *const base =
            reinterpret_cast<char unsigned const *>(&obj);

        /* Let's say it has six members */

        char unsigned const *const member_start_addresses[6] = {
            base+offsetof(MyPOD,a),base+offsetof(MyPOD,b),base+offsetof(MyPOD,c),
            base+offsetof(MyPOD,d),base+offsetof(MyPOD,e),base+offsetof(MyPOD,f) };

        char unsigned const *const member_over_addresses[6] = {
            member_start_addresses[0] + sizeof obj.a,
            member_start_addresses[1] + sizeof obj.b,
            member_start_addresses[2] + sizeof obj.c,
            member_start_addresses[3] + sizeof obj.d,
            member_start_addresses[4] + sizeof obj.e,
            member_start_addresses[5] + sizeof obj.f };

        for(unsigned i=0;i!=6;++i)
        {
            /* A byte that lies inside any member's range is not padding */
            if (p >= member_start_addresses[i] && p < member_over_addresses[i])
                return false;
        }

        return true;
    }

    Then you could do as follows:

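    size_t OffsetWhereDiff(MyPOD const &a, MyPOD const &b)
    {
        char unsigned const *const p = reinterpret_cast<char unsigned const *>(&a);
        char unsigned const *const q = reinterpret_cast<char unsigned const *>(&b);

        for (size_t offset = 0; offset != sizeof(MyPOD); ++offset)
        {
            if (p[offset] != q[offset] && !IsPaddingByte(a, p + offset)) return offset;
        }

        return sizeof(MyPOD); /* no difference outside padding */
    }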
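
    For a self-contained variant of the same idea, one option is to describe each member as an (offset, length) pair and scan only those ranges. Record, MemberRange, kRecordMembers and first_member_diff are hypothetical names for this sketch, and the table has to be kept in step with the struct by hand.

```cpp
#include <cstddef>

// Hypothetical POD used only for this sketch.
struct Record {
    char   flag;
    double amount;
    short  code;
};

// One entry per member: where it starts inside Record and how long it is.
struct MemberRange {
    std::size_t offset;
    std::size_t length;
};

const MemberRange kRecordMembers[] = {
    { offsetof(Record, flag),   sizeof(char)   },
    { offsetof(Record, amount), sizeof(double) },
    { offsetof(Record, code),   sizeof(short)  },
};

// Returns the lowest offset of a differing byte that belongs to a member,
// or sizeof(Record) if the objects differ only in padding (or not at all).
std::size_t first_member_diff(const Record& a, const Record& b)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&a);
    const unsigned char* q = reinterpret_cast<const unsigned char*>(&b);
    const std::size_t count = sizeof kRecordMembers / sizeof *kRecordMembers;

    std::size_t best = sizeof(Record);
    for (std::size_t m = 0; m != count; ++m) {
        const MemberRange& r = kRecordMembers[m];
        for (std::size_t i = r.offset; i != r.offset + r.length; ++i) {
            if (p[i] != q[i]) {
                if (i < best) best = i;  // keep the lowest differing offset
                break;                   // later bytes of this member cannot be lower
            }
        }
    }
    return best;
}
```

    Bytes outside the listed ranges are never read, so padding cannot cause a mismatch. This is still a byte comparison of the members, so values that compare equal with == but have different representations would be reported as different.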


    --

    Frederick Gotham
    Frederick Gotham, Nov 17, 2006
    #4
  5. Jim Langston Guest

    <> wrote in message
    news:...
    >I need to compare about 25k bytes of data. I'd like to be able to do a
    > simple comparison of memory between 2 structures containing intrinsic
    > data and find the position where the first difference occurs. Is there
    > an easy way to do this? I'd like to use a call like memcmp but that
    > only provides <, > or = responses. I thought maybe there was some sort
    > of "memdiff" call that would give me the position where the 2
    > structures are different. Needless to say, I don't want to write
    > overloaded operator== methods that will handle the comparisons - my 25k
    > of data is a series of structs of structs that will be a pain to write
    > a method for every one.
    >
    > Thanks for any insight!


    It seems as if you're worried that iterating over 25000 characters will take
    too long. I did a test to see how long it would take.

    #include <iostream>
    #include <ctime>
    #include <string>

    int main()
    {
    const size_t size = 25000;
    const double iterations = 1000.0f;
    char* Array1 = new char[size];
    char* Array2 = new char[size];

    for ( size_t i = 0; i < size; ++i )
    {
    Array1[i] = 0;
    Array2[i] = 0;
    }

    Array2[size - 1] = 127;

    clock_t start = clock();
    size_t position = size;
    for ( int x = 0; x < iterations; ++x )
    {
    for ( size_t i = 0; i < size; ++i )
    {
    if ( Array1[i] != Array2[i] )
    {
    position = i;
    break;
    }
    }
    }
    clock_t end = clock();

    std::cout << "Offset of difference is: " << position << "\n";
    std::cout << "It took me " << static_cast<double>( end - start ) * 1000.0 /
    CLOCKS_PER_SEC / iterations << " milliseconds" << std::endl;

    delete[] Array1;
    delete[] Array2;

    std::string wait;
    std::cin >> wait;
    }

    My output is:
    Offset of difference is: 24999
    It took me 0.094 milliseconds

    I had to do it a few times to get a time because just doing it once told me
    it took 0 milliseconds. And, as you can see, I did a worst case scenario,
    where the very last character was the one that was different. Notice, this
    is 0.094 milliseconds, not 0.094 seconds. So it actually took, what,
    0.000094 of a second. I don't think you need to worry about the time.
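
    The same worst-case measurement could be sketched with std::chrono (C++11 or later), which avoids depending on the resolution of clock(). The names first_difference, TimingResult and time_worst_case are hypothetical, size is assumed to be greater than zero, and the figures will vary with compiler and hardware.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Offset of the first differing byte, or n if the buffers match.
std::size_t first_difference(const unsigned char* a, const unsigned char* b,
                             std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
        if (a[i] != b[i]) return i;
    return n;
}

struct TimingResult {
    std::size_t position;        // offset reported by the last scan
    double      micros_per_run;  // average wall-clock time of one scan
};

// Assumes size > 0 and runs > 0.
TimingResult time_worst_case(std::size_t size, int runs)
{
    std::vector<unsigned char> x(size, 0), y(size, 0);
    y[size - 1] = 127;  // worst case: only the final byte differs

    // Each result is stored to a volatile so it is observably used.
    volatile std::size_t sink = size;

    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int r = 0; r != runs; ++r)
        sink = first_difference(x.data(), y.data(), size);
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;

    TimingResult result = { sink, elapsed.count() / runs };
    return result;
}
```

    Calling time_worst_case(25000, 1000) mirrors the test above. An optimizer may still simplify the repeated scan, so treat the number as a rough indication only.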
    Jim Langston, Nov 17, 2006
    #5
  6. * Jim Langston:
    > is 0.094 milliseconds, not 0.094 seconds. So it actually took, what,
    > 0.000094 of a second. I don't think you need to worry about the time.


    <OT>
    Makes one wonder what's going on while the Windows Start menu mulls
    over the question of whether to pop up or not, for a minute or more.
    Ah, wait! It wasn't made in C++, it was a VB prototype for Windows 95!
    </OT>

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
    Alf P. Steinbach, Nov 18, 2006
    #6
  7. DragonSt0rm Guest

    wrote:

    > I need to compare about 25k bytes of data. I'd like to be able to do a
    > simple comparison of memory between 2 structures containing intrinsic
    > data and find the position where the first difference occurs. Is there
    > an easy way to do this? I'd like to use a call like memcmp but that
    > only provides <, > or = responses. I thought maybe there was some sort
    > of "memdiff" call that would give me the position where the 2
    > structures are different.


    It is a bit more complicated than that. The structures can have different
    padding due to alignment. I mean, 2 identical structures located at
    different addresses may return a difference if you memcmp them, while if
    you call your member to member comparison they will be equal.

    As far as I remember from when I read the standard (years ago :) the only
    things that are guaranteed to compare equal byte by byte are arrays of
    elementary types.

    In practice however, I found that it is actually pretty reliable (I
    encountered issues only when I compared a stack allocated POD to a heap
    allocated object) but since it is not guaranteed by the standard, you never
    know if on your architecture you won't encounter issues.
    So, you may not want to take the risk that 2 identical structures will be
    reported as different.

    > Needless to say, I don't want to write
    > overloaded operator== methods that will handle the comparisons - my 25k
    > of data is a series of structs of structs that will be a pain to write
    > a method for every one.


    Well, unfortunately you may be forced to do that.
    Especially if you have struct in struct and stuff like that, it is too
    risky.
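
    If member-wise comparison does turn out to be necessary, one way to keep it short (assuming C++11 or later is available) is to let std::tie do the work. Inner and Outer are hypothetical stand-ins for the nested structs; this answers only "equal or not", not where the difference is.

```cpp
#include <tuple>

// Hypothetical nested structs standing in for the "structs of structs".
struct Inner {
    int   id;
    short flags;
};

struct Outer {
    char   kind;
    Inner  inner;
    double weight;
};

// std::tie builds a tuple of references; the tuple's operator== then
// compares member by member, so padding never takes part.
inline bool operator==(const Inner& a, const Inner& b)
{
    return std::tie(a.id, a.flags) == std::tie(b.id, b.flags);
}

inline bool operator==(const Outer& a, const Outer& b)
{
    // Comparing the inner members uses the operator== for Inner above.
    return std::tie(a.kind, a.inner, a.weight)
        == std::tie(b.kind, b.inner, b.weight);
}
```

    Each struct then needs one line listing its members rather than a hand-written chain of comparisons.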

    MTM
    DragonSt0rm, Nov 18, 2006
    #7
