compare 2 files

Siemel Naran

How can I check whether two files are identical? I wrote the following:

bool comparefiles(const std::string& lhs, const std::string& rhs)
{
std::ifstream lhsfile(lhs.c_str());
std::ifstream rhsfile(rhs.c_str());

typedef std::istreambuf_iterator<char> istreambuf_iterator;

return std::equal(
istreambuf_iterator(lhsfile),
istreambuf_iterator(),
istreambuf_iterator(rhsfile)
);
}

But I don't think it will work because: (1) we only compare the first N
chars, where N is the number of chars in lhsfile, so if rhsfile has more
chars the function will return true whenever the first N are equal, which is
incorrect; (2) the standard says that calling operator* on an end-of-stream
iterator is undefined (24.5.3.3), so if lhsfile has more chars then we will at
some point call operator* on the rhsfile iterator when it is at EOF, and the
result is undefined (though I think it should always return EOF).

So what else can we do?

I could use the stat function to check if lhsfile and rhsfile have the same
size, but I want to keep my code ANSI compatible.

So I came up with the following function, which looks very much like strcmp.


bool comparefiles(const std::string& lhs, const std::string& rhs)
{
using namespace std;
const streambuf::int_type eof = streambuf::traits_type::eof();

ifstream lhsfile(lhs.c_str());
ifstream rhsfile(rhs.c_str());

streambuf * lhsbuf = lhsfile.rdbuf();
streambuf * rhsbuf = rhsfile.rdbuf();

char lhschar, rhschar;
while (true)
{
lhschar = lhsbuf->sbumpc();
rhschar = rhsbuf->sbumpc();

if (lhschar == eof && rhschar == eof) return true;
if (lhschar == eof || rhschar == eof) break;
if (lhschar != rhschar) break;
}

cout << "compare \"" << lhs << "\" and \"" << rhs << "\" failed\n";
return false;
}


Any comments?
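
For readers on C++14 or later, both problems with the three-iterator std::equal described above go away with the four-iterator overload, which stops as soon as either range ends and reports equality only if both ended together, so it never dereferences an end-of-stream iterator. The following is only a sketch: the name files_equal_cxx14, the binary open mode, and the choice to treat an unopenable file as "not equal" are assumptions, not part of the original post.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true only if both files open and hold the same bytes.
bool files_equal_cxx14(const std::string& lhs, const std::string& rhs)
{
    // Binary mode (an assumption here) avoids newline translation on some platforms.
    std::ifstream lhsfile(lhs.c_str(), std::ios::binary);
    std::ifstream rhsfile(rhs.c_str(), std::ios::binary);
    if (!lhsfile || !rhsfile) return false;  // assumption: open failure means "not equal"

    typedef std::istreambuf_iterator<char> iter;

    // C++14 overload taking both end iterators: a length mismatch yields false.
    return std::equal(iter(lhsfile), iter(), iter(rhsfile), iter());
}
```

Calling files_equal_cxx14("a.txt", "b.txt") returns false when one file is a strict prefix of the other, whichever argument is the longer one.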
 
Ivan Vecerina

Siemel Naran said:
How to compare if two files are identical? I wrote the following: ....
So I came up with the following function, which looks very much like
strcmp.


bool comparefiles(const std::string& lhs, const std::string& rhs)
{
using namespace std;
const streambuf::int_type eof = streambuf::traits_type::eof();

ifstream lhsfile(lhs.c_str());
ifstream rhsfile(rhs.c_str());

streambuf * lhsbuf = lhsfile.rdbuf();
streambuf * rhsbuf = rhsfile.rdbuf();

Since only the stream buffer interface is used, you can directly
create instances of std::filebuf instead of an ifstream.

char lhschar, rhschar;

These two variables should be of type int_type. char may be unable
to represent eof (or may compare equal to eof when it should not, e.g.
when reading 0xFF on an implementation where char is signed).

while (true)
{
lhschar = lhsbuf->sbumpc();
rhschar = rhsbuf->sbumpc();

if (lhschar == eof && rhschar == eof) return true;
if (lhschar == eof || rhschar == eof) break;
if (lhschar != rhschar) break;
}
or:
do {
lhschar = lhsbuf.sbumpc();
rhschar = rhsbuf.sbumpc();
if( lhschar != rhschar ) return false;
} while( lhschar != eof );
return true;
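
Putting those remarks together, a complete version might look like the sketch below. It assumes std::filebuf objects opened in binary mode, int_type variables, and that a file which cannot be opened counts as a mismatch; the name comparefiles_fixed is made up for illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical name; a sketch of the loop with the suggested corrections applied.
bool comparefiles_fixed(const std::string& lhs, const std::string& rhs)
{
    typedef std::filebuf::traits_type traits;
    typedef std::filebuf::int_type int_type;

    // Use the stream buffers directly; open() returns a null pointer on failure.
    std::filebuf lhsbuf, rhsbuf;
    if (!lhsbuf.open(lhs.c_str(), std::ios::in | std::ios::binary)) return false;
    if (!rhsbuf.open(rhs.c_str(), std::ios::in | std::ios::binary)) return false;

    const int_type eof = traits::eof();
    int_type lhschar, rhschar;  // int_type, so a 0xFF byte is distinct from eof
    do {
        lhschar = lhsbuf.sbumpc();
        rhschar = rhsbuf.sbumpc();
        // Also catches the case where only one file has reached its end.
        if (!traits::eq_int_type(lhschar, rhschar)) return false;
    } while (!traits::eq_int_type(lhschar, eof));
    return true;  // both buffers returned eof at the same position
}
```

Because the values are kept as int_type, a 0xFF byte in the middle of a file no longer ends the comparison early on implementations where char is signed.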


Cheers,
Ivan
 
Howard

I think the first thing I'd do is check if the file sizes are the same. No
need to read through the files looking for differences if they're different
sizes. I'm not familiar with how to check file size, but if that's easy
enough to do, you might want to throw in a check for that equality before
bothering to check the contents. Just a thought...

-Howard
 
Siemel Naran

Howard said:
I think the first thing I'd do is check if the file sizes are the same. No
need to read through the files looking for differences if they're different
sizes. I'm not familiar with how to check file size, but if that's easy
enough to do, you might want to throw in a check for that equality before
bothering to check the contents. Just a thought...

This is the ideal solution, since then I can continue to use std::equal as in
my original code. However, the standard does not provide a way to find the
file size without opening the file and scanning to the last character. Opening
the file, calling file.seekg(0, ios::end) followed by file.tellg() is allowed
to return 0 rather than the actual byte position, though my implementation
does in fact return the file size. There is a function stat, and it's on
Windows and Linux, but it's not ANSI standard (though maybe it should be).
I know that boost also has some way to get the file size, and I imagine the
implementation calls stat on Windows and Linux, etc.
 
Jeff Flinn

Siemel said:
This is the ideal solution, since then I can continue to use std::equal as
in my original code. However, the standard does not provide a way to
find the file size without opening the file and scanning to the last
character. Opening the file, calling file.seekg(0, ios::end) followed
by file.tellg() is allowed to return 0 rather than the actual byte
position, though my implementation does in fact return the file size.
There is a function stat, and it's on Windows and Linux, but it's not
ANSI standard (though maybe it should be). I know that boost also has
some way to get the file size, and

Yes, at:

http://www.boost.org/libs/filesystem/doc/operations.htm#file_size

I imagine the implementation calls stat on Windows and Linux, etc.

Windows: GetFileAttributes()
POSIX: stat()
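
For anyone reading this with a C++17 compiler: the Boost.Filesystem interface was later standardized, so std::filesystem::file_size is available without platform calls. A minimal sketch of the "check the sizes first" idea follows. The name files_equal_sized is hypothetical, failures are treated as "not equal", and the three-iterator std::equal is only safe under the assumption that neither file changes between the size check and the read.

```cpp
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

// Hypothetical helper: compare sizes first, then contents.
bool files_equal_sized(const std::string& lhs, const std::string& rhs)
{
    namespace fs = std::filesystem;

    // The error_code overloads report failure through ec instead of throwing.
    std::error_code ec;
    const std::uintmax_t lhssize = fs::file_size(lhs, ec);
    if (ec) return false;  // assumption: a missing or unreadable file is "not equal"
    const std::uintmax_t rhssize = fs::file_size(rhs, ec);
    if (ec) return false;

    if (lhssize != rhssize) return false;  // different sizes: no need to read anything

    std::ifstream lhsfile(lhs, std::ios::binary);
    std::ifstream rhsfile(rhs, std::ios::binary);
    if (!lhsfile || !rhsfile) return false;

    typedef std::istreambuf_iterator<char> iter;

    // Sizes match, so the second range is as long as the first
    // (assuming the files are not modified while we read them).
    return std::equal(iter(lhsfile), iter(), iter(rhsfile));
}
```

If the files might be written to concurrently, the four-iterator form of std::equal (C++14) is the safer choice for the final step.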

Jeff Flinn
 
