fstream, getline() and failbit

Discussion in 'C++' started by rory, Jan 23, 2008.

  1. rory

    rory Guest

    I am reading a binary file and I want to search it for a string. The
    only problem is that failbit gets set after only a few calls to
    getline() so it never reaches the end of the file where the string is
    contained. From reading through posts to this list it seems that
    failbit gets set if there is a format error whilst reading. Is it bad
    form to read binary data into a char[] array? Is this why my
    function below doesn't work?

    void ReadBinData()
    {
        int reads=0;
        string data;
        char str[1024];
        fstream myFile ("test.exe", ios::in | ios::binary);
        if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
            cout << "error";

        while(myFile.getline(str, 1024 ))
        {
            data = str;
            if(data.find("roryrory", 0)!=string::npos)
                cout << "found it";
            reads++;
        }

        cout << "\nno of times getline was called = " << reads << endl;

        if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
            cout << "\nerror, failbit set....";

        myFile.close();
    }

    Rory.
     
    rory, Jan 23, 2008
    #1

  2. rory

    Lars Uffmann Guest

    rory wrote:
    > while(myFile.getline(str, 1024 ))


    Try read (str, 1024) instead of getline (can't use "while (read (...))"
    then though) - why would you want to read "lines" from a binary (.exe)
    file anyways?

    However that should not be the cause of your problem. What COULD happen
    is that your search expression will get split over 2 different
    "getlines" in your case, when 1023 bytes have been read without finding
    a newline delimiter. str will then for example contain "roryr\n" in the
    last 6 bytes, and on the next call will be filled with "ory" in the
    first 3 bytes. You'd never find your search expression.
    And the failbit might well be set after "only a few" calls to getline,
    if the executable only contains a few newlines and isn't much bigger
    than a couple of kilobytes.. So long story short: don't use getline!
    Then see if your problem still occurs.
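
    To illustrate the failbit point, here is a minimal sketch, assuming
    C++11 and an in-memory stream standing in for the file. The names
    count_getlines and make_sample are made up for this example. When
    getline(char*, n) has stored n-1 characters without meeting a
    newline, it sets failbit, so the loop stops long before end of file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // in-memory streams, handy for trying this out
#include <string>
#include <vector>

// Counts how many getline() calls succeed before the stream stops being good.
// (Hypothetical helper, not part of the original post.)
std::size_t count_getlines(std::istream& in, std::size_t buf_size) {
    assert(buf_size > 0);  // getline needs room for at least the terminator
    std::vector<char> buf(buf_size);
    std::size_t reads = 0;
    while (in.getline(buf.data(), static_cast<std::streamsize>(buf.size())))
        ++reads;
    return reads;
}

// Made-up sample data: a short line, a 2000-byte "line", then another short line.
std::string make_sample() {
    return "short\n" + std::string(2000, 'x') + "\ntail\n";
}
```

    With a 1024-byte buffer the loop ends after the first line, with
    failbit set and eofbit clear, because the second "line" does not fit.
    A 4096-byte buffer gets through all three lines and ends at eof.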

    Best Regards,

    Lars
     
    Lars Uffmann, Jan 23, 2008
    #2

  3. rory

    rory Guest

    On Jan 23, 1:43 pm, Lars Uffmann <> wrote:
    > rory wrote:
    > > while(myFile.getline(str, 1024 ))

    >
    > Try read (str, 1024) instead of getline (can't use "while (read (...))"
    > then though) - why would you want to read "lines" from a binary (.exe)
    > file anyways?
    >
    > However that should not be the cause of your problem. What COULD happen
    > is that your search expression will get split over 2 different
    > "getlines" in your case, when 1023 bytes have been read without finding
    > a newline delimiter. str will then for example contain "roryr\n" in the
    > last 6 bytes, and on the next call will be filled with "ory" in the
    > first 3 bytes. You'd never find your search expression.
    > And the failbit might well be set after "only a few" calls to getline,
    > if the executable only contains a few newlines and isn't much bigger
    > than a couple of kilobytes.. So long story short: don't use getline!
    > Then see if your problem still occurs.
    >
    > Best Regards,
    >
    > Lars


    Thanks Lars, using read it seems to read the entire file, the filesize
    is 3.830 mbs and read is called 3829 times. My next problem is one you
    alluded to, reading blocks of data means the string could get chopped
    up which is the last thing I want. The idea is that I append a unique
    string identifier to a binary file, then I append some text after it.
    I then want to search that file for the unique string identifier and
    then retrieve the text that follows it. Before writing the unique
    string I first write a newline char, that's why I thought I could just
    use getline() as it runs until a new line. Valid point however that it
    might not always get to a new line. Have you any suggestions for me on
    how I might do this? Thanks for the reply,

    Rory.
     
    rory, Jan 23, 2008
    #3
  4. rory

    Jerry Coffin Guest

    In article <ed81c061-791b-4d38-ac59-
    >, says...

    [ ... ]

    > Thanks Lars, using read it seems to read the entire file, the filesize
    > is 3.830 mbs and read is called 3829 times. My next problem is one you
    > alluded to, reading blocks of data means the string could get chopped
    > up which is the last thing I want. The idea is that I append a unique
    > string identifier to a binary file, then I append some text after it.
    > I then want to search that file for the unique string identifier and
    > then retrieve the text that follows it. Before writing the unique
    > string I first write a newline char, that's why I thought I could just
    > use getline() as it runs until a new line. Valid point however that it
    > might not always get to a new line. Have you any suggestions for me on
    > how I might do this? Thanks for the reply,


    My guess is that the failure is due to some value in the file being
    interpreted as signaling the end of the file when it's treated as text.
    Unix generally treats control-D this way; for Windows it's control-Z.
    Regardless, you need to tell your stream not to interpret control
    characters that way, by opening it as a binary stream:

    std::ifstream file(your_file_name, std::ios::binary);

    std::stringstream temp;

    // copy the file into a string
    temp << file.rdbuf();

    // marker for the beginning of your data:
    std::string sentinel("\nroryrory");

    // find your data (std::string::npos if it doesn't exist)
    int data_pos = temp.str().find(sentinel)+sentinel.length();
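
    One caveat with the last line above: if the sentinel is missing,
    find() returns std::string::npos, and adding the length to it wraps
    around to a small index instead of signalling the failure. A minimal
    sketch that checks before using the position, assuming C++11; the
    function name text_after_sentinel is made up for illustration:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Slurps the whole stream and, if `sentinel` occurs, stores everything that
// follows its first occurrence in `text`. Returns false when it is absent.
// (Hypothetical helper; the whole-file copy is kept for brevity, not speed.)
bool text_after_sentinel(std::istream& in, const std::string& sentinel,
                         std::string& text) {
    assert(!sentinel.empty());
    std::ostringstream temp;
    temp << in.rdbuf();                       // copy the stream into a string
    const std::string contents = temp.str();  // one copy, reused below

    const std::string::size_type pos = contents.find(sentinel);
    if (pos == std::string::npos)
        return false;                         // do not do arithmetic on npos

    text = contents.substr(pos + sentinel.size());
    return true;
}
```

    Keeping the position as std::string::size_type rather than int also
    avoids a narrowing conversion on large files.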

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jan 23, 2008
    #4
  5. rory

    Pete Becker Guest

    On 2008-01-23 10:25:24 -0500, Jerry Coffin <> said:

    >
    > My guess is that the failure is due to some value in the file being
    > interpreted as signaling the end of the file when it's treated as text.
    > Unix generally treats control-D this way; for Windows it's control-Z.
    > Regardless, you need to tell your stream not to interpret control
    > characters that way, by opening it as a binary stream:


    Yes, that's the right way to read a binary file. But having done that,
    the runtime library also won't translate the character sequence that
    represents a newline into the character '\n'. It's binary data all the
    way...

    >
    > // marker for the beginning of your data:
    > std::string sentinel("\nroryrory");
    >


    That '\n' at the beginning may or may not match some sequence of bytes
    that was written to the file. Search for "roryrory" instead.

    --
    Pete
    Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
    Standard C++ Library Extensions: a Tutorial and Reference"
    (www.petebecker.com/tr1book)
     
    Pete Becker, Jan 23, 2008
    #5
  6. rory

    rory Guest

    Thanks for the help, I wasn't aware of the stringstream class. It is
    working better now but I still have one or two little problems. Here
    is the function I use to append the identifier string and subsequent
    text to my binary file:

    void WriteBinData()
    {
        ofstream myFile ("test.exe", ios::out | ios::binary |ios::app);
        if((myFile.rdstate() & ofstream::failbit | ofstream::badbit)!=0)
            myFile.write ("\nroryrory\n", 12);
        myFile.write ("bingo was his namo", 18);
        if(!myFile) cout << "Error" ;
        myFile.close();
    }

    And here is my new read function:

    void ReadBinData()
    {
        std::ifstream file("test.exe", std::ios::binary);
        stringstream temp(" ");
        temp << file.rdbuf();
        std::string sentinel("roryrory");
        char data[1024];
        std::string myText = "";
        int data_pos = temp.str().find(sentinel)+sentinel.length();
        myText = temp.str().substr(data_pos, 50);
        cout << myText << endl;
        file.close();
    }

    As you can see I am using the position returned from find in order to
    retrieve the rest of my text. The strange thing is it keeps returning
    this string:

    bingo was hbingo

    ??? The other thing is that if I leave out the length of the substr
    the program starts beeping and spits out lots of rubbish characters
    over and over, I couldn't kill it, I had to just press the power
    button for a few seconds and reboot. Any ideas on what's going on?
    Ideally I would like not to have to pass a length value when using
    substr() but I guess the length could be written just after the
    identifier if needs be.
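
    Two things stand out in the write function: "\nroryrory\n" is 10
    characters long, so asking write() for 12 bytes reads past the end of
    the literal, and hand-counted lengths are easy to get wrong. Here is a
    minimal sketch of the "write the length after the identifier" idea,
    assuming C++11. The function names and the layout (sentinel, decimal
    length, newline, text) are assumptions made up for this example, and
    it assumes the text itself never contains the sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Appends: sentinel, decimal length of text, '\n', then the text itself.
// Sizes come from the strings, so they cannot drift out of sync.
void append_tagged_text(std::ostream& out, const std::string& sentinel,
                        const std::string& text) {
    assert(!sentinel.empty());
    out.write(sentinel.data(), static_cast<std::streamsize>(sentinel.size()));
    out << text.size() << '\n';
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
}

// Looks for the LAST sentinel in `contents` (the block is appended at the
// end) and extracts exactly the number of bytes recorded after it.
bool extract_tagged_text(const std::string& contents,
                         const std::string& sentinel, std::string& text) {
    assert(!sentinel.empty());
    const std::string::size_type pos = contents.rfind(sentinel);
    if (pos == std::string::npos) return false;

    std::istringstream rest(contents.substr(pos + sentinel.size()));
    std::size_t length = 0;
    if (!(rest >> length) || rest.get() != '\n') return false;
    if (length > contents.size()) return false;  // implausible length field

    text.assign(length, '\0');
    if (length == 0) return true;
    rest.read(&text[0], static_cast<std::streamsize>(length));
    return static_cast<std::size_t>(rest.gcount()) == length;
}
```

    Because the reader knows the length, it never has to print
    everything up to the end of the file, which in a binary is likely to
    include control characters such as the terminal bell.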

    Rory.
     
    rory, Jan 23, 2008
    #6
  7. rory

    benj Guest

    I think this takes care of the '\n' problem also

    #include <iostream>
    #include <fstream>
    using namespace std;

    int main()
    {
        ifstream::pos_type size;
        char * memblock;

        ifstream iFile;
        iFile.open("asdf.txt",ios::in|ios::binary|ios::ate);
        if(iFile.is_open())
        {
            size = iFile.tellg(); //get the size of file
            memblock = new char[size]; //memblock needs size byte to hold all data
            iFile.seekg(0,ios::beg); //go back to beginning of file
            iFile.read(memblock,size); //read whole file in memblock
            iFile.close(); //close the file
        }
        char * pos;
        pos = strchr(&memblock[0],'r'); //find the position of the first 'r'
        while( strncmp(pos,"roryrory",8) ) //compare 8 char size
        {
            pos = strchr(pos+1,'a'); // keep looking to the next char
        }
        return 0;
    }
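
    A note on the approach above: strchr and strncmp treat the first zero
    byte as the end of the string, a binary file is full of zero bytes,
    and a buffer filled by read() is not null-terminated. strchr can also
    return a null pointer when nothing is found. A minimal sketch of the
    same "read everything, then search" idea using std::search, which
    works on an explicit range instead; it assumes C++11, and the names
    read_all and find_bytes are made up for this example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads a seekable stream into a vector (hypothetical helper).
std::vector<char> read_all(std::istream& in) {
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();   // -1 if the stream cannot seek
    in.seekg(0, std::ios::beg);

    std::vector<char> bytes(static_cast<std::size_t>(size > 0 ? size : 0));
    if (!bytes.empty()) {
        in.read(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        bytes.resize(static_cast<std::size_t>(in.gcount()));  // keep what arrived
    }
    return bytes;
}

// Returns the offset of the first occurrence of `needle`, or -1.
// Zero bytes in the haystack are ordinary data here.
long long find_bytes(const std::vector<char>& haystack,
                     const std::string& needle) {
    assert(!needle.empty());
    const std::vector<char>::const_iterator it =
        std::search(haystack.begin(), haystack.end(),
                    needle.begin(), needle.end());
    if (it == haystack.end()) return -1;
    return static_cast<long long>(it - haystack.begin());
}
```

    The vector also releases its memory automatically, unlike the raw
    new[] above.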
     
    benj, Jan 23, 2008
    #7
  8. rory

    rory Guest

    That code causes my program to crash. It seems to die at the while
    loop, if I place a cout in there it never gets printed and I get a
    'program has encountered a problem and needs to close' notice. My
    previous version, the one using stringstream was finding the correct
    string and returning the right position but when I tried reading from
    that position on I get a strange string. I don't know why? Thanks for
    replying, I am getting closer to finding the problem but will have to
    leave it till tomorrow, it's bedtime!

    Rory.
     
    rory, Jan 23, 2008
    #8
  9. rory

    Guest

    On Jan 24, 4:55 am, rory <> wrote:
    > That code causes my program to crash. It seems to die at the while
    > loop, if I place a cout in there it never gets printed and I get a
    > 'program has encountered a problem and needs to close' notice. My
    > previous version, the one using stringstream was finding the correct
    > string and returning the right position but when I tried reading from
    > that position on I get a strange string. I don't know why? Thanks for
    > replying, I am getting closer to finding the problem but will have to
    > leave it till tomorrow, it's bedtime!
    >
    > Rory.


    This code worked for me.

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    int main()
    {
        std::ifstream ifstr("new",std::ios::binary);
        std::stringstream temp;
        temp << ifstr.rdbuf();
        const std::string sentinel("roryrory");
        const std::string::size_type data_pos(temp.str().find(sentinel, 0)+sentinel.length());
        const std::string myText(temp.str().substr(data_pos,50));
        std::cout << myText << '\n';
    }


    Thanks,
    Balaji.
     
    , Jan 24, 2008
    #9
  10. rory

    rory Guest

    Your code works here too. The only difference I can spot is that you
    used const std::string's. I'm delighted that it now works for me but
    can someone explain *why* it didn't work using plain old std::strings?
    Thanks to everyone who's replied, I can now move forward with my
    project.

    Rory.
     
    rory, Jan 24, 2008
    #10
  11. rory

    James Kanze Guest

    On Jan 23, 4:25 pm, Jerry Coffin <> wrote:
    > In article <ed81c061-791b-4d38-ac59-
    > >, says...


    > [ ... ]


    > > Thanks Lars, using read it seems to read the entire file, the filesize
    > > is 3.830 mbs and read is called 3829 times. My next problem is one you
    > > alluded to, reading blocks of data means the string could get chopped
    > > up which is the last thing I want. The idea is that I append a unique
    > > string identifier to a binary file, then I append some text after it.
    > > I then want to search that file for the unique string identifier and
    > > then retrieve the text that follows it. Before writing the unique
    > > string I first write a newline char, that's why I thought I could just
    > > use getline() as it runs until a new line. Valid point however that it
    > > might not always get to a new line. Have you any suggestions for me on
    > > how I might do this? Thanks for the reply,


    > My guess is that the failure is due to some value in the file being
    > interpreted as signaling the end of the file when it's treated as text.


    That's one possibility. Another is simply that his buffer isn't
    big enough to hold the longest "line". getline() will set the
    failbit if it encounters the end of the buffer before it sees a
    '\n' character.

    > Unix generally treats control-D this way; for Windows it's control-Z.


    Unix never treats control-D this way in a file. Under Unix,
    there is absolutely no difference between binary files and text
    files.

    > Regardless, you need to tell your stream not to interpret
    > control characters that way, by opening it as a binary stream:


    > std::ifstream file(your_file_name, std::ios::binary);


    And of course, he'll also have to write the file in binary mode;
    otherwise, some of the output data might be modified.

    > std::stringstream temp;


    > // copy the file into a string
    > temp << file.rdbuf();


    > // marker for the beginning of your data:
    > std::string sentinel("\nroryrory");


    > // find your data (std::string::npos if it doesn't exist)
    > int data_pos = temp.str().find(sentinel)+sentinel.length();


    Depending on the implementation, that might not be such a good
    idea; some implementations of stringstream grow the string very
    inefficiently (and it will be 3.8 MB). If he can determine the
    size of the file before hand, resizing an std::vector<char> and
    reading the entire file into it, then using std::search might be
    a good option. (If portability isn't a concern, mmap'ing the
    file is likely to be the fastest solution.) Otherwise, a KMP
    search is pretty straightforward, and since it never requires
    backing up, it avoids the problem of the sequence being split
    across two successive buffers. Or if he needs an even faster
    algorithm (BM, for example), he can save a block the size of the
    sentinel at the start of the buffer, copy the end of the
    preceding buffer into it before each read, and start his search
    from there.
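
    The last suggestion, carrying the end of one buffer over into the
    next, can be sketched as follows. This is a minimal sketch, assuming
    C++11; it uses std::string::find on each window rather than BM, and
    the name find_in_chunks and its parameters are made up for
    illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the byte offset of the first occurrence of `needle` in `in`,
// or -1 if it is absent. Only about chunk_size + needle.size() bytes are
// held in memory at any time.
long long find_in_chunks(std::istream& in, const std::string& needle,
                         std::size_t chunk_size) {
    assert(!needle.empty() && chunk_size > 0);
    const std::size_t keep = needle.size() - 1;  // longest possible partial match
    std::vector<char> chunk(chunk_size);
    std::string window;          // carried-over tail + freshly read chunk
    long long window_offset = 0; // stream offset of window[0]

    for (;;) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;     // nothing more to read
        window.append(chunk.data(), static_cast<std::size_t>(got));

        const std::string::size_type pos = window.find(needle);
        if (pos != std::string::npos)
            return window_offset + static_cast<long long>(pos);

        // Keep only the bytes that could still start a match.
        if (window.size() > keep) {
            const std::size_t drop = window.size() - keep;
            window.erase(0, drop);
            window_offset += static_cast<long long>(drop);
        }
    }
    return -1;
}
```

    Keeping needle.size() - 1 bytes is enough: a match lying entirely
    inside the kept tail would already have been found in the previous
    window, so only matches that straddle the boundary remain possible.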

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jan 24, 2008
    #11
  12. rory

    James Kanze Guest

    On Jan 23, 5:12 pm, Pete Becker <> wrote:
    > On 2008-01-23 10:25:24 -0500, Jerry Coffin <> said:


    > Yes, that's the right way to read a binary file. But having
    > done that, the runtime library also won't translate the
    > character sequence that represents a newline into the
    > character '\n'. It's binary data all the way...


    > > // marker for the beginning of your data:
    > > std::string sentinel("\nroryrory");


    > That '\n' at the beginning may or may not match some sequence
    > of bytes that was written to the file. Search for "roryrory"
    > instead.


    If it's binary data, he'd better have used binary mode when he
    wrote it as well. In which case, reading it in binary mode
    will return exactly the same bytes he wrote.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jan 24, 2008
    #12
  13. rory

    Daniel T. Guest

    rory <> wrote:

    > I am reading a binary file and I want to search it for a string. The
    > only problem is that failbit gets set after only a few calls to
    > getline() so it never reaches the end of the file where the string is
    > contained. From reading through posts to this list it seems that
    > failbit gets set if there is a format error whilst reading. Is it bad
    > form to read binary data into a char[] array? Is this why my
    > function below doesn't work?
    >
    > void ReadBinData()
    > {
    > int reads=0;
    > string data;
    > char str[1024];
    > fstream myFile ("test.exe", ios::in | ios::binary);
    > if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
    > cout << "error";
    >
    > while(myFile.getline(str, 1024 ))
    > {
    > data = str;
    > if(data.find("roryrory", 0)!=string::npos)
    > cout << "found it";
    > reads++;
    > }
    >
    > cout << "\nno of times getline was called = " << reads << endl;
    >
    > if ( (myFile.rdstate() & ifstream::failbit ) != 0 )
    > cout << "\nerror, failbit set....";
    >
    > myFile.close();
    > }


    Is there a particular reason why you can't use a standard algorithm?

    void ReadBinData()
    {
        fstream myFile("test.exe", ios::in | ios::binary);
        const char* rory = "roryrory";
        search( istream_iterator<char>( myFile ), istream_iterator<char>(),
                rory, rory + strlen( rory ) );
        if ( myFile )
        {
            cout << "found it\n";
        }
        else if ( myFile.eof() )
        {
            cout << "not found\n";
        }
        else
            cout << "error\n";
        myFile.close();
    }
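
    Two caveats with the snippet above: istream_iterator<char> reads
    through operator>>, which skips whitespace by default, and it is a
    single-pass input iterator, whereas std::search expects forward
    iterators. A minimal sketch of a single-pass alternative, assuming
    C++11: it swaps std::search for the KMP search James Kanze mentions
    elsewhere in this thread and reads raw bytes through
    istreambuf_iterator. The name kmp_find is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Returns the byte offset of the first occurrence of `pat` in `in`, or -1.
// The stream is read once, front to back, with no backing up.
long long kmp_find(std::istream& in, const std::string& pat) {
    assert(!pat.empty());

    // fail[i] = length of the longest proper prefix of pat[0..i]
    // that is also a suffix of it.
    std::vector<std::size_t> fail(pat.size(), 0);
    for (std::size_t i = 1, k = 0; i < pat.size(); ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }

    std::size_t matched = 0;   // how many pattern bytes currently match
    long long offset = 0;      // offset of the byte being examined
    std::istreambuf_iterator<char> it(in), end;
    for (; it != end; ++it, ++offset) {
        const char c = *it;    // raw byte, whitespace included
        while (matched > 0 && c != pat[matched]) matched = fail[matched - 1];
        if (c == pat[matched]) ++matched;
        if (matched == pat.size())
            return offset + 1 - static_cast<long long>(pat.size());
    }
    return -1;
}
```

    The stream should still be opened with ios::binary so that no
    translation happens underneath the iterator.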
     
    Daniel T., Jan 24, 2008
    #13
  14. rory

    Pete Becker Guest

    On 2008-01-24 06:02:09 -0500, James Kanze <> said:

    > On Jan 23, 5:12 pm, Pete Becker <> wrote:
    >> On 2008-01-23 10:25:24 -0500, Jerry Coffin <> said:

    >
    >> Yes, that's the right way to read a binary file. But having
    >> done that, the runtime library also won't translate the
    >> character sequence that represents a newline into the
    >> character '\n'. It's binary data all the way...

    >
    >>> // marker for the beginning of your data:
    >>> std::string sentinel("\nroryrory");

    >
    >> That '\n' at the beginning may or may not match some sequence
    >> of bytes that was written to the file. Search for "roryrory"
    >> instead.

    >
    > If it's binary data, he'd better have used binary mode when he
    > wrote it as well. In which case, reading it in binary mode
    > will return exactly the same bytes he wrote.


    Yes, certainly: if you assume something that wasn't in the problem
    statement, then the solution can be different.

    --
    Pete
    Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
    Standard C++ Library Extensions: a Tutorial and Reference"
    (www.petebecker.com/tr1book)
     
    Pete Becker, Jan 24, 2008
    #14
  15. rory

    Jerry Coffin Guest

    In article <ccabd59b-9776-486d-b19f-
    >, says...

    [ ... ]

    > > Unix generally treats control-D this way; for Windows it's control-Z.

    >
    > Unix never treats control-D this way in a file. Under Unix,
    > there is absolutely no difference between binary files and text
    > files.


    Not really true. What C++ sees as a file can be treated by Unix in
    either raw or cooked mode. From one viewpoint, that applies only to
    devices, not files -- but Unix being Unix, devices are seen as files, so
    what's seen as a file can just as easily be a device as not.

    > > Regardless, you need to tell your stream not to interpret
    > > control characters that way, by opening it as a binary stream:

    >
    > > std::ifstream file(your_file_name, std::ios::binary);

    >
    > And of course, he'll also have to write the file in binary mode;
    > otherwise, some of the output data might be modified.


    Quite true -- but then, if he's writing to a binary file, that would be
    a really good idea in any case.

    [ ... ]

    > Depending on the implementation, that might not be such a good
    > idea; some implementations of stringstream grow the string very
    > inefficiently (and it will be 3.8 MB).


    True -- I was trying to concentrate on the problem he was seeing, and
    wrote the rest more to be short than to be the ultimate in efficiency.

    > If he can determine the
    > size of the file before hand, resizing an std::vector<char> and
    > reading the entire file into it, then using std::search might be
    > a good option. (If portability isn't a concern, mmap'ing the
    > file is likely to be the fastest solution.) Otherwise, a KMP
    > search is pretty straightforward, and since it never requires
    > backing up, it avoids the problem of the sequence being split
    > across two successive buffers. Or if he needs an even faster
    > algorithm (BM, for example), he can save a block the size of the
    > sentinel at the start of the buffer, copy the end of the
    > preceding buffer into it before each read, and start his search
    > from there.


    If he really wants an efficient solution, I'd do things a bit
    differently in general. Specifically, I'd append the position of the
    data to the very end of the file. Seek to the position of the pointer,
    read it in, seek to the correct position, and read the real data.

    This does have some shortcomings of course. At least in theory, it's not
    portable because the implementation is allowed to append an arbitrary
    number of zero bytes to the end of a binary file. In reality, it's
    unlikely that he cares about porting to the ancient systems (e.g. CP/M)
    that actually did this. This idea also only works once -- i.e. if you
    append any other data after this block, what you read at the end won't
    be a pointer to your data.
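
    A minimal sketch of that idea, assuming C++11 and a seekable stream.
    The 8-byte little-endian trailer and the names append_payload,
    read_payload and kTrailerSize are assumptions made up for this
    example, not an established format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

const std::size_t kTrailerSize = 8;  // assumed: fixed-width offset field

// Appends `payload` at the end of the stream, followed by a trailer that
// records the offset at which the payload starts.
bool append_payload(std::ostream& out, const std::string& payload) {
    assert(out.good());  // precondition for this sketch
    out.seekp(0, std::ios::end);
    const std::streamoff start = out.tellp();
    if (start < 0) return false;

    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    char trailer[kTrailerSize];
    for (std::size_t i = 0; i < kTrailerSize; ++i)  // little-endian bytes
        trailer[i] = static_cast<char>(
            (static_cast<std::uint64_t>(start) >> (8 * i)) & 0xFF);
    out.write(trailer, static_cast<std::streamsize>(kTrailerSize));
    return out.good();
}

// Reads the trailer, seeks to the recorded offset, and reads the payload.
bool read_payload(std::istream& in, std::string& payload) {
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < static_cast<std::streamoff>(kTrailerSize)) return false;
    const std::streamoff payload_end =
        size - static_cast<std::streamoff>(kTrailerSize);

    char trailer[kTrailerSize];
    in.seekg(payload_end);
    if (!in.read(trailer, static_cast<std::streamsize>(kTrailerSize)))
        return false;
    std::uint64_t start = 0;
    for (std::size_t i = 0; i < kTrailerSize; ++i)
        start |= static_cast<std::uint64_t>(
                     static_cast<unsigned char>(trailer[i])) << (8 * i);
    if (start > static_cast<std::uint64_t>(payload_end)) return false;

    payload.assign(
        static_cast<std::size_t>(static_cast<std::uint64_t>(payload_end) - start),
        '\0');
    if (payload.empty()) return true;
    in.seekg(static_cast<std::streamoff>(start));
    return static_cast<bool>(
        in.read(&payload[0], static_cast<std::streamsize>(payload.size())));
}
```

    Only the trailer and the payload are read, however large the rest of
    the file is. As noted above, this works only while the trailer stays
    the very last thing in the file.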

    Of course, the better solution would be to design a file format that
    really accommodates the data you're putting into it instead of
    attempting to hack something together on top of an existing format that
    apparently doesn't support what's really needed. Unfortunately, this is
    sometimes impractical, which may be the case here.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jan 25, 2008
    #15
  16. rory

    James Kanze Guest

    On Jan 25, 3:52 am, Jerry Coffin <> wrote:
    > In article <ccabd59b-9776-486d-b19f-
    > >, says...


    > [ ... ]


    > > > Unix generally treats control-D this way; for Windows it's
    > > > control-Z.


    > > Unix never treats control-D this way in a file. Under Unix,
    > > there is absolutely no difference between binary files and text
    > > files.


    > Not really true. What C++ sees as a file can be treated by
    > Unix in either raw or cooked mode. From one viewpoint, that
    > applies only to devices, not files -- but Unix being Unix,
    > devices are seen as files, so what's seen as a file can just
    > as easily be a device as not.


    Files under Unix don't have raw or cooked modes. Only tty's do.
    And the exact specification of handling control-D in cooked mode
    (the normal case) on a tty isn't to generate EOF, in the usual
    sense, although if it is at the beginning of a line, it will end
    up being treated as an EOF in the C++ library (usually---I've
    had some weird cases where it wasn't). It's radically different
    from control-Z under Windows, which is recognized by filebuf in
    text mode (but not in binary), and really is treated as an end
    of file.

    Note that the behavior of control-D under Unix is also
    independent of the open mode of the file.

    > [ ... ]


    > > Depending on the implementation, that might not be such a
    > > good idea; some implementations of stringstream grow the
    > > string very inefficiently (and it will be 3.8 MB).


    > True -- I was trying to concentrate on the problem he was
    > seeing, and wrote the rest more to be short than to be the
    > ultimate in efficiency.


    I know, but he seemed to pick up on the use of stringstream more
    than any of the other details:).

    > > If he can determine the
    > > size of the file before hand, resizing an std::vector<char> and
    > > reading the entire file into it, then using std::search might be
    > > a good option. (If portability isn't a concern, mmap'ing the
    > > file is likely to be the fastest solution.) Otherwise, a KMP
    > > search is pretty straightforward, and since it never requires
    > > backing up, it avoids the problem of the sequence being split
    > > across two successive buffers. Or if he needs an even faster
    > > algorithm (BM, for example), he can save a block the size of the
    > > sentinel at the start of the buffer, copy the end of the
    > > preceding buffer into it before each read, and start his search
    > > from there.


    > If he really wants an efficient solution, I'd do things a bit
    > differently in general. Specifically, I'd append the position
    > of the data to the very end of the file. Seek to the position
    > of the pointer, read it in, seek to the correct position, and
    > read the real data.


    Or, seeing as how the postfix seems to be fairly small, just
    read the last one or two KBytes of the file, and start searching
    there. Or design it so that the postfix he's adding always has
    a fixed length (even if it means padding in some cases.) Or if
    portability isn't an issue, mmap the entire file, then do a BM
    search backwards, from the end.
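
    The first of these options, reading only the tail of the file, might
    look like the following. This is a minimal sketch, assuming C++11 and
    a seekable stream; the function name and the tail_bytes parameter are
    made up for illustration, and rfind stands in for a backwards BM
    search.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads at most the last `tail_bytes` bytes of `in`, looks for the last
// occurrence of `marker` there, and stores what follows it in `text`.
bool text_after_marker_in_tail(std::istream& in, const std::string& marker,
                               std::streamoff tail_bytes, std::string& text) {
    assert(!marker.empty() && tail_bytes > 0);
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0) return false;               // stream is not seekable

    const std::streamoff start = size > tail_bytes ? size - tail_bytes : 0;
    in.seekg(start);
    std::string tail(static_cast<std::size_t>(size - start), '\0');
    if (!tail.empty() &&
        !in.read(&tail[0], static_cast<std::streamsize>(tail.size())))
        return false;

    const std::string::size_type pos = tail.rfind(marker);  // search from the end
    if (pos == std::string::npos) return false;
    text = tail.substr(pos + marker.size());
    return true;
}
```

    tail_bytes has to be at least as large as the marker plus the
    longest text ever appended, otherwise the marker falls outside the
    part that is read.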

    You're entirely right that reading some 3MB to find something
    that you know is at the end, if it is there at all, is far from
    the best solution.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jan 25, 2008
    #16
