fstream, getline() and failbit

rory

I am reading a binary file and I want to search it for a string. The
only problem is that failbit gets set after only a few calls to
getline() so it never reaches the end of the file where the string is
contained. From reading through posts to this list it seems that
failbit gets set if there is a format error whilst reading. Is it bad
form to read binary data into a char[] array? Is this why my
function below doesn't work?

void ReadBinData()
{
int reads=0;
string data;
char str[1024];
fstream myFile ("test.exe", ios::in | ios::binary);
if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
cout << "error";

while(myFile.getline(str, 1024 ))
{
data = str;
if(data.find("roryrory", 0)!=string::npos)
cout << "found it";
reads++;
}

cout << "\nno of times getline was called = " << reads << endl;

if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
cout << "\nerror, failbit set....";

myFile.close();
}

Rory.
 
Lars Uffmann

rory said:
while(myFile.getline(str, 1024 ))

Try read (str, 1024) instead of getline (can't use "while (read (...))"
then though) - why would you want to read "lines" from a binary (.exe)
file anyways?

However that should not be the cause of your problem. What COULD happen
is that your search expression will get split over 2 different
"getlines" in your case, when 1023 bytes have been read without finding
a newline delimiter. str will then for example contain "roryr\n" in the
last 6 bytes, and on the next call will be filled with "ory" in the
first 3 bytes. You'd never find your search expression.
And the failbit might well be set after "only a few" calls to getline,
if the executable only contains a few newlines and isn't much bigger
than a couple of kilobytes.. So long story short: don't use getline!
Then see if your problem still occurs.

Best Regards,

Lars
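A minimal sketch of the block-reading loop suggested above, for illustration only. The helper name `count_bytes` is hypothetical. Because `read()` sets failbit on the final short block, the loop also checks `gcount()` so that the last partial block is still processed.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: counts the bytes in a file by reading 1024-byte blocks.
std::size_t count_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    char block[1024];
    std::size_t total = 0;

    // read() reports failure on the last, short block, so gcount() is
    // consulted as well to pick up the bytes that were actually read.
    while (in.read(block, sizeof block) || in.gcount() > 0) {
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}
```

Each pass through the loop has `in.gcount()` valid bytes in `block`, which is where a search over the block would go.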
 
rory

Thanks Lars, using read it seems to read the entire file, the filesize
is 3.830 mbs and read is called 3829 times. My next problem is one you
alluded to, reading blocks of data means the string could get chopped
up which is the last thing I want. The idea is that I append a unique
string identifier to a binary file, then I append some text after it.
I then want to search that file for the unique string identifier and
then retrieve the text that follows it. Before writing the unique
string I first write a newline char, that's why I thought I could just
use getline() as it runs until a new line. Valid point however that it
might not always get to a new line. Have you any suggestions for me on
how I might do this? Thanks for the reply,

Rory.
 
Jerry Coffin

My guess is that the failure is due to some value in the file being
interpreted as signaling the end of the file when it's treated as text.
Unix generally treats control-D this way; for Windows it's control-Z.
Regardless, you need to tell your stream not to interpret control
characters that way, by opening it as a binary stream:

std::ifstream file(your_file_name, std::ios::binary);

std::stringstream temp;

// copy the file into a string
temp << file.rdbuf();

// marker for the beginning of your data:
std::string sentinel("\nroryrory");

// find your data (std::string::npos if it doesn't exist)
int data_pos = temp.str().find(sentinel)+sentinel.length();
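One caveat with the last line: if the sentinel is missing, `find` returns `std::string::npos`, and adding the sentinel length to it wraps around, so the result can no longer be compared against `npos`. A minimal sketch that checks before adding follows; the helper names `slurp` and `text_after` are hypothetical.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole file into a string in binary mode.
std::string slurp(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios::binary);
    std::ostringstream temp;
    temp << file.rdbuf();
    return temp.str();
}

// Hypothetical helper: returns everything after the first occurrence of
// `sentinel`, or an empty string when the sentinel is not present.
std::string text_after(const std::string& contents, const std::string& sentinel)
{
    const std::string::size_type pos = contents.find(sentinel);
    if (pos == std::string::npos)
        return std::string();          // check npos BEFORE adding the length
    return contents.substr(pos + sentinel.length());
}
```

Usage would look like `text_after(slurp("test.exe"), "roryrory")`.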
 
Pete Becker

My guess is that the failure is due to some value in the file being
interpreted as signaling the end of the file when it's treated as text.
Unix generally treats control-D this way; for Windows it's control-Z.
Regardless, you need to tell your stream not to interpret control
characters that way, by opening it as a binary stream:

Yes, that's the right way to read a binary file. But having done that,
the runtime library also won't translate the character sequence that
represents a newline into the character '\n'. It's binary data all the
way...
// marker for the beginning of your data:
std::string sentinel("\nroryrory");

That '\n' at the beginning may or may not match some sequence of bytes
that was written to the file. Search for "roryrory" instead.
 
rory

Thanks for the help, I wasn't aware of the stringstream class. It is
working better now but I still have one or two little problems. Here
is the function I use to append the identifier string and subsequent
text to my binary file:

void WriteBinData()
{
ofstream myFile ("test.exe", ios::out | ios::binary |ios::app);
if((myFile.rdstate() & ofstream::failbit | ofstream::badbit)!=0)
myFile.write ("\nroryrory\n", 12);
myFile.write ("bingo was his namo", 18);
if(!myFile) cout << "Error" ;
myFile.close();
}

And here is my new read function:

void ReadBinData()
{
std::ifstream file("test.exe", std::ios::binary);
stringstream temp(" ");
temp << file.rdbuf();
std::string sentinel("roryrory");
char data[1024];
std::string myText = "";
int data_pos = temp.str().find(sentinel)+sentinel.length();
myText = temp.str().substr(data_pos, 50);
cout << myText << endl;
file.close();
}

As you can see I am using the position returned from find in order to
retrieve the rest of my text. The strange thing is it keeps returning
this string:

bingo was hbingo

??? The other thing is that if I leave out the length of the substr
the program starts beeping and spits out lots of rubbish characters
over and over, I couldn't kill it, I had to just press the power
button for a few seconds and reboot. Any ideas on what's going on?
Ideally I would like not to have to pass a length value when using
substr() but I guess the length could be written just after the
identifier if needs be.

Rory.
 
benj

I think this takes care of the '\n' problem as well:

#include <iostream>
#include <fstream>
using namespace std;

int main()
{
ifstream::pos_type size;
char * memblock;

ifstream iFile;
iFile.open("asdf.txt",ios::in|ios::binary|ios::ate);
if(iFile.is_open())
{
size = iFile.tellg(); //get the size of file
memblock = new char[size]; //memblock needs size bytes to hold all data
iFile.seekg(0,ios::beg); //go back to beginning of file
iFile.read(memblock,size); //read whole file in memblock
iFile.close(); //close the file
}
char * pos;
pos = strchr(&memblock[0],'r'); //find the position of the first 'r'
while( strncmp(pos,"roryrory",8) ) //compare 8 char size
{
pos = strchr(pos+1,'a'); // keep looking to the next char
}
return 0;
}
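A note on the approach above: `strchr` and `strncmp` treat the buffer as a NUL-terminated C string, while a binary file usually contains zero bytes and the buffer has no terminator, and the loop looks for 'a' rather than 'r'. A minimal sketch of the same whole-file idea using `std::vector<char>` and `std::search`, which work on explicit ranges; the function name `find_in_file` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the byte offset of `needle` in the file,
// or -1 if the file cannot be read or the needle is not present.
long find_in_file(const std::string& path, const std::string& needle)
{
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in)
        return -1;

    const std::streamoff size = in.tellg();   // opened at the end: this is the size
    if (size <= 0)
        return -1;

    std::vector<char> bytes(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (!in.read(&bytes[0], size))
        return -1;

    // std::search compares ranges, so embedded zero bytes are harmless.
    std::vector<char>::const_iterator it =
        std::search(bytes.begin(), bytes.end(), needle.begin(), needle.end());
    if (it == bytes.end())
        return -1;
    return static_cast<long>(it - bytes.begin());
}
```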
 
rory

That code causes my program to crash. It seems to die at the while
loop, if I place a cout in there it never gets printed and I get a
'program has encountered a problem and needs to close' notice. My
previous version, the one using stringstream was finding the correct
string and returning the right position but when I tried reading from
that position on I get a strange string. I don't know why? Thanks for
replying, I am getting closer to finding the problem but will have to
leave it till tomorrow, it's bedtime!

Rory.
 
kasthurirangan.balaji

This code worked for me.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
std::ifstream ifstr("new",std::ios::binary);
std::stringstream temp;
temp << ifstr.rdbuf();
const std::string sentinel("roryrory");
const std::string::size_type data_pos(temp.str().find(sentinel,
0)+sentinel.length());
const std::string myText(temp.str().substr(data_pos,50));
std::cout << myText << '\n';
}


Thanks,
Balaji.
 
rory

Your code works here too. The only difference I can spot is that you
used const std::string's. I'm delighted that it now works for me but
can someone explain *why* it didn't work using plain old std::strings?
Thanks to everyone who's replied, I can now move forward with my
project.

Rory.
 
James Kanze

[ ... ]
Thanks Lars, using read it seems to read the entire file, the filesize
is 3.830 mbs and read is called 3829 times. My next problem is one you
alluded to, reading blocks of data means the string could get chopped
up which is the last thing I want. The idea is that I append a unique
string identifier to a binary file, then I append some text after it.
I then want to search that file for the unique string identifier and
then retrieve the text that follows it. Before writing the unique
string I first write a newline char, that's why I thought I could just
use getline() as it runs until a new line. Valid point however that it
might not always get to a new line. Have you any suggestions for me on
how I might do this? Thanks for the reply,
My guess is that the failure is due to some value in the file being
interpreted as signaling the end of the file when it's treated as text.

That's one possibility. Another is simply that his buffer isn't
big enough to hold the longest "line". getline() will set the
failbit if it encounters the end of the buffer before it sees a
'\n' character.
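The buffer-size effect can be reproduced without a file. A minimal sketch using an in-memory stream and a deliberately tiny 8-byte buffer; the function name `getline_overflows` is hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true when the first "line" of `input`
// does not fit into an 8-byte buffer, i.e. getline() sets failbit.
bool getline_overflows(const std::string& input)
{
    std::istringstream in(input);
    char buf[8];                 // room for 7 characters plus the terminator
    in.getline(buf, sizeof buf);
    return in.fail();
}
```

With the input "12345678\nrest" the buffer fills before a '\n' is seen, so failbit is set. With "short\nrest" the call succeeds. After such a failure the stream stays in the failed state until `clear()` is called, which is why a `while (getline(...))` loop stops early.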
Unix generally treats control-D this way; for Windows it's control-Z.

Unix never treats control-D this way in a file. Under Unix,
there is absolutely no difference between binary files and text
files.
Regardless, you need to tell your stream not to interpret
control characters that way, by opening it as a binary stream:
std::ifstream file(your_file_name, std::ios::binary);

And of course, he'll also have to write the file in binary mode;
otherwise, some of the output data might be modified.
std::stringstream temp;
// copy the file into a string
temp << file.rdbuf();
// marker for the beginning of your data:
std::string sentinel("\nroryrory");
// find your data (std::string::npos if it doesn't exist)
int data_pos = temp.str().find(sentinel)+sentinel.length();

Depending on the implementation, that might not be such a good
idea; some implementations of stringstream grow the string very
inefficiently (and it will be 3.8 MB). If he can determine the
size of the file beforehand, resizing an std::vector<char> and
reading the entire file into it, then using std::search might be
a good option. (If portability isn't a concern, mmap'ing the
file is likely to be the fastest solution.) Otherwise, a KMP
search is pretty straightforward, and since it never requires
backing up, it avoids the problem of the sequence being split
across two successive buffers. Or if he needs an even faster
algorithm (BM, for example), he can save a block the size of the
sentinel at the start of the buffer, copy the end of the
preceding buffer into it before each read, and start his search
from there.
 
James Kanze

On 2008-01-23 10:25:24 -0500, Jerry Coffin said:
Yes, that's the right way to read a binary file. But having
done that, the runtime library also won't translate the
character sequence that represents a newline into the
character '\n'. It's binary data all the way...
That '\n' at the beginning may or may not match some sequence
of bytes that was written to the file. Search for "roryrory"
instead.

If it's binary data, he'd better have used binary mode when he
wrote it as well. In which case, reading it in binary mode
will return exactly the same bytes he wrote.
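A minimal sketch of that round trip, assuming both the writer and the reader open the file in binary mode. The function name `round_trip` is hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes `bytes` to `path` in binary mode, reads the
// file back in binary mode, and returns what was read.
std::string round_trip(const std::string& path, const std::string& bytes)
{
    {
        std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }   // the stream is closed here, before the file is reopened

    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Newlines, carriage returns, control-Z and zero bytes should all come back unchanged, so a sentinel that starts with '\n' matches only if the writer also used binary mode.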
 
Daniel T.

rory said:
I am reading a binary file and I want to search it for a string. The
only problem is that failbit gets set after only a few calls to
getline() so it never reaches the end of the file where the string is
contained. From reading through posts to this list it seems that
failbit gets set if there is a format error whilst reading. Is it bad
form to read binary data into a char[] array? Is this why my
function below doesn't work?

void ReadBinData()
{
int reads=0;
string data;
char str[1024];
fstream myFile ("test.exe", ios::in | ios::binary);
if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
cout << "error";

while(myFile.getline(str, 1024 ))
{
data = str;
if(data.find("roryrory", 0)!=string::npos)
cout << "found it";
reads++;
}

cout << "\nno of times getline was called = " << reads << endl;

if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
cout << "\nerror, failbit set....";

myFile.close();
}

Is there a particular reason why you can't use a standard algorithm?

void ReadBinData()
{
fstream myFile("test.exe", ios::in | ios::binary);
const char* rory = "roryrory";
search( istream_iterator<char>( myFile ), istream_iterator<char>(),
rory, rory + strlen( rory ) );
if ( myFile )
{
cout << "found it\n";
}
else if ( myFile.eof() )
{
cout << "not found\n";
}
else
cout << "error\n";
myFile.close();
}
 
Pete Becker

If it's binary data, he'd better have used binary mode when he
wrote it as well. In which case, reading it in binary mode
will return exactly the same bytes he wrote.

Yes, certainly: if you assume something that wasn't in the problem
statement, then the solution can be different.
 
Jerry Coffin

[ ... ]
Unix never treats control-D this way in a file. Under Unix,
there is absolutely no difference between binary files and text
files.

Not really true. What C++ sees as a file can be treated by Unix in
either raw or cooked mode. From one viewpoint, that applies only to
devices, not files -- but Unix being Unix, devices are seen as files, so
what's seen as a file can just as easily be a device as not.
And of course, he'll also have to write the file in binary mode;
otherwise, some of the output data might be modified.

Quite true -- but then, if he's writing to a binary file, that would be
a really good idea in any case.

[ ... ]
Depending on the implementation, that might not be such a good
idea; some implementations of stringstream grow the string very
inefficiently (and it will be 3.8 MB).

True -- I was trying to concentrate on the problem he was seeing, and
wrote the rest more to be short than to be the ultimate in efficiency.
If he can determine the
size of the file beforehand, resizing an std::vector<char> and
reading the entire file into it, then using std::search might be
a good option. (If portability isn't a concern, mmap'ing the
file is likely to be the fastest solution.) Otherwise, a KMP
search is pretty straightforward, and since it never requires
backing up, it avoids the problem of the sequence being split
across two successive buffers. Or if he needs an even faster
algorithm (BM, for example), he can save a block the size of the
sentinel at the start of the buffer, copy the end of the
preceding buffer into it before each read, and start his search
from there.

If he really wants an efficient solution, I'd do things a bit
differently in general. Specifically, I'd append the position of the
data to the very end of the file. Seek to the position of the pointer,
read it in, seek to the correct position, and read the real data.

This does have some shortcomings of course. At least in theory, it's not
portable because the implementation is allowed to append an arbitrary
number of zero bytes to the end of a binary file. In reality, it's
unlikely that he cares about porting to the ancient systems (e.g. CP/M)
that actually did this. This idea also only works once -- i.e. if you
append any other data after this block, what you read at the end won't
be a pointer to your data.
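A minimal sketch of the trailing-pointer layout described above. The 8-byte trailer in the host's native byte order and both function names are assumptions made for illustration, not a fixed format. The file is assumed to exist already.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: appends `text`, then an 8-byte trailer holding the
// offset at which `text` starts (native byte order, an assumption).
bool append_with_trailer(const std::string& path, const std::string& text)
{
    std::uint64_t offset = 0;
    {
        std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
        if (!in) return false;
        offset = static_cast<std::uint64_t>(static_cast<std::streamoff>(in.tellg()));
    }
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::app);
    if (!out) return false;
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    char raw[sizeof offset];
    std::memcpy(raw, &offset, sizeof offset);
    out.write(raw, static_cast<std::streamsize>(sizeof raw));
    out.flush();
    return out.good();
}

// Hypothetical helper: reads the trailer, seeks to the stored offset and
// returns the text lying between that offset and the trailer.
bool read_trailer_text(const std::string& path, std::string& text)
{
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return false;
    const std::streamoff size = in.tellg();
    const std::streamoff tail = static_cast<std::streamoff>(sizeof(std::uint64_t));
    if (size < tail) return false;

    std::uint64_t offset = 0;
    char raw[sizeof offset];
    in.seekg(size - tail);
    if (!in.read(raw, static_cast<std::streamsize>(sizeof raw))) return false;
    std::memcpy(&offset, raw, sizeof offset);
    if (offset > static_cast<std::uint64_t>(size - tail)) return false;  // corrupt trailer

    text.resize(static_cast<std::size_t>(size - tail) - static_cast<std::size_t>(offset));
    in.seekg(static_cast<std::streamoff>(offset));
    if (!text.empty() &&
        !in.read(&text[0], static_cast<std::streamsize>(text.size())))
        return false;
    return true;
}
```

Only the trailer and the text are read, so the cost no longer depends on the size of the file. As noted above, the scheme breaks if anything else is appended afterwards.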

Of course, the better solution would be to design a file format that
really accommodates the data you're putting into it instead of
attempting to hack something together on top of an existing format that
apparently doesn't support what's really needed. Unfortunately, this is
sometimes impractical, which may be the case here.
 
James Kanze

[ ... ]
Unix never treats control-D this way in a file. Under Unix,
there is absolutely no difference between binary files and text
files.
Not really true. What C++ sees as a file can be treated by
Unix in either raw or cooked mode. From one viewpoint, that
applies only to devices, not files -- but Unix being Unix,
devices are seen as files, so what's seen as a file can just
as easily be a device as not.

Files under Unix don't have raw or cooked modes. Only tty's do.
And the exact specification of handling control-D in cooked mode
(the normal case) on a tty isn't to generate EOF, in the usual
sense, although if it is at the beginning of a line, it will end
up being treated as an EOF in the C++ library (usually---I've
had some weird cases where it wasn't). It's radically different
from control-Z under Windows, which is recognized by filebuf in
text mode (but not in binary), and really is treated as an end
of file.

Note that the behavior of control-D under Unix is also
independent of the open mode of the file.
[ ... ]
Depending on the implementation, that might not be such a
good idea; some implementations of stringstream grow the
string very inefficiently (and it will be 3.8 MB).
True -- I was trying to concentrate on the problem he was
seeing, and wrote the rest more to be short than to be the
ultimate in efficiency.

I know, but he seemed to pick up on the use of stringstream more
than any of the other details:).
If he really wants an efficient solution, I'd do things a bit
differently in general. Specifically, I'd append the position
of the data to the very end of the file. Seek to the position
of the pointer, read it in, seek to the correct position, and
read the real data.

Or, seeing as how the postfix seems to be fairly small, just
read the last one or two KBytes of the file, and start searching
there. Or design it so that the postfix he's adding always has
a fixed length (even if it means padding in some cases.) Or if
portability isn't an issue, mmap the entire file, then do a BM
search backwards, from the end.

You're entirely right that reading some 3MB to find something
that you know is at the end, if it is there at all, is far from
the best solution.
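A minimal sketch of the "read only the tail" idea, searching backwards with `std::string::rfind` rather than a hand-written BM search. The function name `text_from_tail` and the 2048-byte default are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: looks for the LAST occurrence of `sentinel` within
// the final `tail_bytes` bytes of the file and returns the text after it.
bool text_from_tail(const std::string& path, const std::string& sentinel,
                    std::string& text, std::streamoff tail_bytes = 2048)
{
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return false;

    const std::streamoff size = in.tellg();
    if (size < 0 || tail_bytes < 0) return false;

    // Read at most tail_bytes, or the whole file if it is smaller.
    const std::streamoff want = std::min(size, tail_bytes);
    std::string tail(static_cast<std::size_t>(want), '\0');
    in.seekg(size - want);
    if (want > 0 && !in.read(&tail[0], static_cast<std::streamsize>(want)))
        return false;

    const std::string::size_type pos = tail.rfind(sentinel);
    if (pos == std::string::npos) return false;

    text = tail.substr(pos + sentinel.size());
    return true;
}
```

If the appended text can be longer than the tail window, the window has to grow accordingly, or the fixed-length postfix mentioned above becomes the simpler choice.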
 
