memdiff

paul.anderson

I need to compare about 25k bytes of data. I'd like to be able to do a
simple comparison of memory between 2 structures containing intrinsic
data and find the position where the first difference occurs. Is there
an easy way to do this? I'd like to use a call like memcmp but that
only provides <, > or = responses. I thought maybe there was some sort
of "memdiff" call that would give me the position where the 2
structures are different. Needless to say, I don't want to write
overloaded operator== methods that will handle the comparisons - my 25k
of data is a series of structs of structs that will be a pain to write
a method for every one.

Thanks for any insight!
 
Phlip

paul.anderson said:
I need to compare about 25k bytes of data. I'd like to be able to do a
simple comparison of memory between 2 structures containing intrinsic
data and find the position where the first difference occurs. Is there
an easy way to do this? I'd like to use a call like memcmp but that
only provides <, > or = responses. I thought maybe there was some sort
of "memdiff" call that would give me the position where the 2
structures are different. Needless to say, I don't want to write
overloaded operator== methods that will handle the comparisons - my 25k
of data is a series of structs of structs that will be a pain to write
a method for every one.

Thanks for any insight!

Are all of the structs PODS - Plain Ol' Data Structures?

If so, you can just point to the first ones with (unsigned?) char *
pointers, and increment your pointers until you get a miss.
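As a minimal sketch of that pointer walk (the name memdiff and its "return len when nothing differs" convention are assumptions for illustration, not a standard library call):

```cpp
#include <cstddef>

// Hypothetical helper, not part of any standard library.
// Returns the offset of the first byte that differs between the two
// blocks, or len if all len bytes are equal.
std::size_t memdiff(const void* lhs, const void* rhs, std::size_t len)
{
    const unsigned char* p = static_cast<const unsigned char*>(lhs);
    const unsigned char* q = static_cast<const unsigned char*>(rhs);

    for (std::size_t i = 0; i != len; ++i)
    {
        if (p[i] != q[i])
            return i;   // first miss
    }
    return len;         // no difference found
}
```

You would call it as memdiff(&a, &b, sizeof a) on two objects of the same POD type. It compares every byte, padding included.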

If not... why did you lose track of the difference between all these
structures? Can you fix the problem upstream without resorting to brute
force?

Further, even PODS may contain padding characters, and these might not be
guaranteed to be comparable.

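Note that C++ does not give PODS a default operator== (only copy
construction and assignment are generated for you), so there is nothing
built in to rely on. If you go that route, write operator== for the
innermost structs first, then build the operator== methods of the
containing structures on top of them.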
 
Frederick Gotham

Phlip:
Are all of the structs PODS - Plain Ol' Data Structures?

If so, you can just point to the first ones with (unsigned?) char *
pointers, and increment your pointers until you get a miss.


Depends whether the padding must be equal also. The test could produce a
false positive if two objects can be equal yet have different padding.
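A small sketch of how that can happen. The struct Sample and the helper names are made up for illustration, and whether Sample has any padding at all depends on the platform:

```cpp
#include <cstring>

// Hypothetical struct: many compilers insert padding bytes after `tag`
// so that `value` is suitably aligned. This is not guaranteed.
struct Sample {
    char tag;
    int  value;
};

// Member-by-member comparison: padding never takes part.
bool same_members(const Sample& a, const Sample& b)
{
    return a.tag == b.tag && a.value == b.value;
}

// Raw byte comparison: any padding bytes are compared too.
bool same_bytes(const Sample& a, const Sample& b)
{
    return std::memcmp(&a, &b, sizeof(Sample)) == 0;
}

// Fills the whole object with `fill` first, then sets the members,
// so whatever padding exists usually keeps the fill value.
Sample make_sample(unsigned char fill, char tag, int value)
{
    Sample s;
    std::memset(&s, fill, sizeof s);
    s.tag = tag;
    s.value = value;
    return s;
}
```

With a = make_sample(0x00, 'x', 42) and b = make_sample(0xFF, 'x', 42), same_members(a, b) is true, while same_bytes(a, b) is typically false on platforms where Sample contains padding.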
 
Frederick Gotham

paul.anderson said:
I need to compare about 25k bytes of data.


size_t const len = 25600;

char unsigned *const a = new char unsigned[len],
              *const b = new char unsigned[len];

/* Alter memory chunk a */

/* Alter memory chunk b */

memcmp(a,b,len);
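memcmp only reports the ordering of the two blocks. If the position is what you are after, one option is std::mismatch from <algorithm>. A minimal sketch follows; the wrapper name first_mismatch is an assumption for illustration:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical wrapper around std::mismatch.
// Returns the index of the first differing byte, or len if the two
// buffers are identical over len bytes.
std::size_t first_mismatch(const unsigned char* a,
                           const unsigned char* b,
                           std::size_t len)
{
    // std::mismatch returns a pair of iterators pointing at the first
    // position where the two ranges disagree (or at the end of the range).
    return static_cast<std::size_t>(std::mismatch(a, a + len, b).first - a);
}
```

For the two chunks above, first_mismatch(a, b, len) == len would mean they hold the same bytes.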

I'd like to be able to do a simple comparison of memory between 2
structures containing intrinsic data and find the position where the
first difference occurs.


Too unspecific.

Is there an easy way to do this?


Yes, probably, but I don't know your requirements.

I'd like to use a call like memcmp but that only provides <, > or =
responses. I thought maybe there was some sort of "memdiff" call that
would give me the position where the 2 structures are different.
Needless to say, I don't want to write overloaded operator== methods
that will handle the comparisons - my 25k of data is a series of structs
of structs that will be a pain to write a method for every one.


These struct objects might contain padding. It's possible for two POD
objects to be identical, yet have padding which is different. Do you
want these to still compare equal? If so, your only choice is to compare
them member by member.

You might start off with an algorithm to tell you if a particular byte is a
padding byte or not:

#include <cstddef> /* offsetof, size_t */

/* p must point at a byte inside obj */
bool IsPaddingByte(MyPOD const &obj,char unsigned const *const p)
{
    /* Half-open byte range [begin,end) occupied by one member */
    struct NoPaddingRange {
        char unsigned const *begin;
        char unsigned const *end;
    };

    char unsigned const *const base =
        reinterpret_cast<char unsigned const *>(&obj);

    /* Let's say it has six members, a through f */

    NoPaddingRange const ranges[6] = {
        { base+offsetof(MyPOD,a), base+offsetof(MyPOD,a) + sizeof obj.a },
        { base+offsetof(MyPOD,b), base+offsetof(MyPOD,b) + sizeof obj.b },
        { base+offsetof(MyPOD,c), base+offsetof(MyPOD,c) + sizeof obj.c },
        { base+offsetof(MyPOD,d), base+offsetof(MyPOD,d) + sizeof obj.d },
        { base+offsetof(MyPOD,e), base+offsetof(MyPOD,e) + sizeof obj.e },
        { base+offsetof(MyPOD,f), base+offsetof(MyPOD,f) + sizeof obj.f } };

    for(unsigned i=0;i!=sizeof ranges/sizeof*ranges;++i)
    {
        /* A byte inside any member is not padding. Members that are
           themselves structs may still contain padding of their own. */
        if (p >= ranges[i].begin && p < ranges[i].end) return false;
    }

    return true;
}

Then you could do as follows:

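/* T must be a type that IsPaddingByte understands, e.g. MyPOD */
template<class T>
size_t OffsetWhereDiff(T const &a, T const &b)
{
    char unsigned const *const p = reinterpret_cast<char unsigned const *>(&a);
    char unsigned const *const q = reinterpret_cast<char unsigned const *>(&b);

    for (size_t offset = 0; offset != sizeof(T); ++offset)
    {
        if (p[offset] != q[offset] && !IsPaddingByte(a, p + offset)) return offset;
    }

    return sizeof(T); /* no difference outside the padding */
}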
 
Jim Langston

paul.anderson said:
I need to compare about 25k bytes of data. I'd like to be able to do a
simple comparison of memory between 2 structures containing intrinsic
data and find the position where the first difference occurs. Is there
an easy way to do this? I'd like to use a call like memcmp but that
only provides <, > or = responses. I thought maybe there was some sort
of "memdiff" call that would give me the position where the 2
structures are different. Needless to say, I don't want to write
overloaded operator== methods that will handle the comparisons - my 25k
of data is a series of structs of structs that will be a pain to write
a method for every one.

Thanks for any insight!

It seems as if you're worried that iterating over 25000 characters will take
too long. I did a test to see how long it would take.

#include <iostream>
#include <ctime>
#include <string>

int main()
{
    const size_t size = 25000;
    const int iterations = 1000;
    char* Array1 = new char[size];
    char* Array2 = new char[size];

    // Start with two identical, zero-filled buffers
    for ( size_t i = 0; i < size; ++i )
    {
        Array1[i] = 0;
        Array2[i] = 0;
    }

    // Worst case: only the very last byte differs
    Array2[size - 1] = 127;

    clock_t start = clock();
    size_t position = size;
    for ( int x = 0; x < iterations; ++x )
    {
        for ( size_t i = 0; i < size; ++i )
        {
            if ( Array1[i] != Array2[i] )
            {
                position = i;
                break;
            }
        }
    }
    clock_t end = clock();

    // Convert clock ticks to milliseconds, averaged over all iterations
    std::cout << "Offset of difference is: " << position << "\n";
    std::cout << "It took me "
              << static_cast<double>( end - start ) * 1000.0 / CLOCKS_PER_SEC / iterations
              << " milliseconds" << std::endl;

    delete[] Array1;
    delete[] Array2;

    // Keep the console window open until something is typed
    std::string wait;
    std::cin >> wait;
}

My output is:
Offset of difference is: 24999
It took me 0.094 milliseconds

I had to run it many times to get a measurable time, because a single run
reported 0 milliseconds. And, as you can see, I tested the worst case
scenario, where the very last character was the one that was different.
Notice, this is 0.094 milliseconds, not 0.094 seconds. So it actually took,
what, 0.000094 of a second. I don't think you need to worry about the time.
 
Alf P. Steinbach

* Jim Langston:
is 0.094 milliseconds, not 0.094 seconds. So it actually took, what,
0.000094 of a second. I don't think you need to worry about the time.

<OT>
Makes one wonder what's going on while the Windows Start menu mulls
over the question of whether to pop up or not, for a minute or more.
Ah, wait! It wasn't made in C++, it was a VB prototype for Windows 95!
</OT>
 
DragonSt0rm

paul.anderson said:
I need to compare about 25k bytes of data. I'd like to be able to do a
simple comparison of memory between 2 structures containing intrinsic
data and find the position where the first difference occurs. Is there
an easy way to do this? I'd like to use a call like memcmp but that
only provides <, > or = responses. I thought maybe there was some sort
of "memdiff" call that would give me the position where the 2
structures are different.

It is a bit more complicated than that. The structures can contain padding
bytes due to alignment, and those bytes can hold different values. I mean,
2 identical structures located at different addresses may return a
difference if you memcmp them, while if you call your member to member
comparison they will be equal.

As far as I remember from when I read the standard (years ago :) the only
things that are guaranteed to compare equal byte by byte are arrays of
elementary types.

In practice, however, I found that it is actually pretty reliable (I
encountered issues only when comparing a stack allocated POD to a heap
allocated object), but since it is not guaranteed by the standard, you never
know whether you will encounter issues on your architecture.
So, you may not want to take the risk that 2 identical structures will be
reported as different.

Needless to say, I don't want to write
overloaded operator== methods that will handle the comparisons - my 25k
of data is a series of structs of structs that will be a pain to write
a method for every one.

Well, unfortunately you may be forced to do that.
Especially if you have structs in structs and stuff like that, it is too
risky.

MTM
 
