reading binary file

Discussion in 'C++' started by Use*n*x, Dec 8, 2006.

  1. Use*n*x

    Use*n*x Guest

    Hello,

    I have a binary file (image file) and am reading 4-bytes at a time. The
    File size is 63,480,320 bytes. My assumption is that if I loop through
    this file reading 4 bytes at a time, I should loop 15,870,080 times.

    The code is:

    newprogram.cpp
    =============
    #include <iostream>
    #include <fstream>
    using namespace std;

    int main ()
    {
    int counter=0;
    char * memblock;
    memblock = new char [4];

    ifstream file ("179060_mar_05_00_L7.024", ios::in|ios::binary);

    file.seekg (1,ios::beg);
    while (!file.eof())
    {
    file.read(memblock, 4);
    counter++;
    }
    cout << "Number of loops: " << counter << "\n";
    delete[] memblock;
    file.close();

    return 0;
    }

    $> g++ newprogram.cpp -o deletelater
    $> ./deletelater
    Number of loops: 15870080

    (a) Notice the file.seekg (1,ios::beg); statement. Is this correct?
    (b) If I were to use file.seekg (0,ios::beg); statement, the number of
    loops would end up 15870081. Is file.seekg(0,ios::beg) correct? If so,
    could you please help me understand why the loop goes 15870081 times?

    If I were to use a variable: int tempdata;
    and in the loop right after file.read, were to insert: tempdata = (int)
    (*memblock), I would get different results with (a)
    file.seekg(1,ios::beg) and (b) file.seekg(0,ios::beg). Which one would
    be correct?

    Your suggestions will be very helpful. Thank you.

    Use*n*x
     
    Use*n*x, Dec 8, 2006
    #1

  2. Use*n*x

    Micah Cowan Guest

    Your loop appears to be written under a common misconception.

    file.eof() does not return whether or not you are at the end of a file;
    it returns whether or not you've attempted to read past the end of the
    file. Additionally, an istream does not even know whether it's reached
    the end-of-file until it tries to read past the end of the file.

    Without a seek, or with a seek to 0, what happens is that you have
    15870080 successful reads. After those, file.eof() still returns false;
    not because it's not at the end of the file (it is), but because it
    hasn't yet tried to read past the end of the file. The very next
    (15870081st) read will fail (read zero bytes), and /then/ the eof bit
    will be set; but counter will still be incremented.

    The reason why seeking to 1 appears to give the right number of reads,
    is that you are skipping the first byte (in position 0). Then follows
    15870079 successful reads, followed by the 15870080th read that only
    reads the final 3 bytes. It tries to read the fourth byte, and at that
    point encounters the end-of-file, so it sets the bit, and the test
    condition terminates the loop. But you have missed the first byte, and
    the final byte you /think/ you read (at memblock[3]) actually is just a
    duplicate from the read just before the last one.

    Another problem with your loop is that if there were a read /failure/,
    your loop would continue indefinitely, as the eof bit would never get
    set, and the read calls would just keep failing undetected.

    The solution? Make the loop condition simply "while (file)" or "while
    (file.good())", and check the return value from file.read() before
    assuming that it filled your array completely (or at all). Only
    increment the counter if the read was successful (I have no clue how
    you might want to handle a partial read, but you should take note of
    them if they occur).
     
    Micah Cowan, Dec 8, 2006
    #2

  3. Use*n*x

    Use*n*x Guest


    I was testing a little more and found this method to be more reliable
    than using file.eof(). Suggestions and comments are more than welcome.

    #include <iostream>
    #include <fstream>
    using namespace std;

    int main ()
    {
    int counter=0;
    char * memblock;
    memblock = new char [4];

    long begin,end,filesize,i;


    //ifstream file ("179060_mar_05_00_L7.024", ios::in|ios::binary);
    ifstream file ("test", ios::in|ios::binary);
    ofstream dump ("dump", ios::binary);

    // find file size
    begin = file.tellg();
    file.seekg(0,ios::end);
    end = file.tellg();
    filesize = end - begin;

    // reposition
    file.seekg(0,ios::beg);

    // loop
    for (i=0; i<filesize; i=i+4)
    {
    file.read(memblock,4);
    // not quite needed
    // cout<< memblock << ".." << file.tellg() << endl;
    counter++;
    }

    /*file.seekg (0,ios::beg);
    while (!file.eof())
    {
    file.read(memblock, 4);
    //dump << memblock;
    cout << memblock << ".." << file.tellg() << endl;
    counter++;
    }*/
    cout << "Number of loops: " << counter << "\n";
    delete[] memblock;
    file.close();
    dump.close();

    return 0;
    }
     
    Use*n*x, Dec 8, 2006
    #3
  4. Use*n*x

    Use*n*x Guest

    Your explanation makes good sense. Thank you.
     
    Use*n*x, Dec 8, 2006
    #4
  5. This is awfully inefficient.

    Try this:

    #include <iostream>
    #include <fstream>

    using namespace std;

    int main ()
    {

    ifstream file ("179060_mar_05_00_L7.024", ios::in|ios::binary);

    streambuf * pbuf = file.rdbuf();
    int l_blocks[1024];

    streamsize i;

    while (
    i = pbuf->sgetn(
    reinterpret_cast<char*>(l_blocks), sizeof(l_blocks) )
    )
    {
    streamsize num_read = i / sizeof(int);

    for ( streamsize x = 0; x < num_read; ++ x )
    {
    PROCESS_THIS_THING( l_blocks[x] );
    }
    }

    return 0;
    }

    Come to think of it, I have not checked the performance of the C++
    stream library lately so I could be wrong. However, I have found that
    frequent calls can significantly slow down the application, especially
    when you're reading large chunks of data.

    If you don't care about performance, then you can stick with what you
    have. I would lose the new/delete though. Just declare a small array on
    the stack and make sure you never read more bytes than you have allocated.
     
    Gianni Mariani, Dec 8, 2006
    #5
  6. Use*n*x

    Micah Cowan Guest

    <snipped the rest>

    You're much safer using std::streampos from <iosfwd> to store the
    result of file.tellg(), as it could well have a greater width than a
    long, and you might not detect a potential overflow.

    Other than that, you're still a lot better off checking for eof(): if a
    read failure occurs, your code above still won't catch it, and if some
    outside program were to truncate the file before you were through
    reading it, you don't detect that condition either. Also, it is
    possible for the current location to not be able to fit into a
    streampos, or to otherwise fail, in which case tellg() will return
    streampos(streamoff(-1)), and your code won't work as you expect.

    Also: I snipped a section where you use /* ... */ to comment out a
    block of code. While that works in your specific case, it's a habit to
    be avoided in general, as what if that block of code had a /* */
    comment of its own? Those comments don't nest, and you'd have a syntax
    error. It's easier in the long run just to get in the habit of using
    #if 0 instead.
     
    Micah Cowan, Dec 8, 2006
    #6
  7. Use*n*x

    Use*n*x Guest

    Good to know your thoughts. It helped. Thank you.
     
    Use*n*x, Dec 8, 2006
    #7
  8. Use*n*x

    Use*n*x Guest

    I started off using streamsize/streampos, but was not quite confident
    of what I was doing. So switched back to something that was similar to
    a sample code I had on hand.
    Yes, that is what I should do - keep a tab on eof() to handle failure
    in IO.
    Oh yes, I didn't even realize that. Thank you for your valuable inputs.
    I have a long way to go in C++.

    Use*n*x
     
    Use*n*x, Dec 8, 2006
    #8
  9. Use*n*x

    Micah Cowan Guest

    Actually, eof() won't report I/O failure; you should check bool(file)
    or file.good(). good() is false once any of the eof, fail, or bad bits
    is set, while bool(file) is false only for the fail and bad bits.

    It is sometimes useful to call file.exceptions(<some
    std::ios_base::iostate values>), to cause the stream to throw an
    exception upon failure or corruption.
     
    Micah Cowan, Dec 8, 2006
    #9
  10. Use*n*x

    BobR Guest

    Use*n*x wrote in message ...

    Huh?

    int main (){
    // using namespace std;
    int counter=0;
    char *memblock( new char[4] );
    std::ifstream file( "test", std::ios::in | std::ios::binary );

    while( file.read( memblock, 4 ) ){
    counter++;
    }

    std::cout << "Number of loops: " << counter << "\n";
    delete[] memblock;

    if( not file ){
    // note: flags() returns the formatting flags; file.rdstate() holds the error bits
    std::cout<<" file error="<<file.flags()<<std::endl;
    std::cout<<" ios::good="<<file.good()<<std::endl;
    std::cout<<" ios::bad="<<file.bad()<<std::endl;
    std::cout<<" ios::eof="<<file.eof()<<std::endl;
    std::cout<<" ios::fail="<<file.fail()<<std::endl;
    }

    file.clear();
    file.seekg( 0, std::ios::end );
    long long end = file.tellg();
    long long filesize = end / 4;
    std::cout << "long long end = file.tellg(): "<<end<< "\n";
    std::cout << "long long filesize = end / 4: "<<filesize<< "\n";

    file.close();
    return 0;
    }

    // - output -
    // Number of loops: 3
    // file error=4098
    // ios::good=false
    // ios::bad=false
    // ios::eof=true
    // ios::fail=true
    // long long end = file.tellg(): 14
    // long long filesize = end / 4: 3
     
    BobR, Dec 8, 2006
    #10