Binary file I/O

Discussion in 'C++' started by J. Campbell, Jul 25, 2003.

  1. J. Campbell

    J. Campbell Guest

    OK...I'm in the process of learning C++. In my old (non-portable)
    programming days, I made use of binary files a lot...not worrying
    about endian issues. I'm starting to understand why C++ makes it
    difficult to read/write an integer directly as a bit-stream to a file.
    However, I'm at a bit of a loss for how to do the following. So as
    not to obfuscate the issue, I won't show what I've been attempting ;-)

    What I want to do is the following, using the standard IO streams.

    1) open an arbitrary file (file1).
    2) starting with the first byte in (file1), read a chunk of data into
    an array of integers.
    3) manipulate the array, as integer data, and then output the contents
    of the array to another file (file2).
    4) read the next data-chunk from file1 into the array.
    5) goto 3 until end of file.

    If anyone knows of a tutorial that contains concrete examples of this,
    I'd appreciate a pointer to the info. Thanks
    J. Campbell, Jul 25, 2003
    #1

  2. On 24 Jul 2003 16:56:53 -0700, (J. Campbell)
    wrote:

    >OK...I'm in the process of learning C++. In my old (non-portable)
    >programming days, I made use of binary files a lot...not worrying
    >about endian issues. I'm starting to understand why C++ makes it
    >difficult to read/write an integer directly as a bit-stream to a file.
    > However, I'm at a bit of a loss for how to do the following. So as
    >not to obfuscate the issue, I won't show what I've been attempting ;-)
    >
    >What I want to do is the following, using the standard IO streams.


    # include <fstream>
    # include <iostream>
    # include <vector>
    # include <sstream>
    # include <string>
    # include <algorithm>


    >1) open an arbitrary file (file1).


    std::ifstream file1("f.txt");

    >2) starting with the first byte in (file1), read a chunk of data into
    >an array of integers.


    const int CHUNK = 128;

    char buffer[CHUNK];
    file1.read(buffer, CHUNK);

    std::vector<int> data;
    std::copy(buffer, buffer + 128, std::back_inserter(data));

    >3) manipulate the array, as integer data,


    void manipulate(std::vector<int> &v);


    manipulate(data);

    >and then output the contents
    >of the array to another file (file2).


    std::ofstream file2("g.txt");
    std::copy(data.begin(), data.end(),
              std::ostream_iterator<int>(std::cout, "\n"));

    >4) read the next data-chunk from file1 into the array.
    >5) goto 3 until end of file.


    goto 3; :)
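
    Steps 4 and 5 can be sketched as an ordinary loop. This is only a sketch under a few assumptions: both files are opened with std::ios::binary so no newline translation happens, the chunk is held as unsigned char, and process_chunk() and copy_in_chunks() are hypothetical names that do not appear in the posts. The placeholder manipulation just adds 1 to every byte.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the "manipulate" step: adds 1 to every byte
// (unsigned arithmetic, so 0xFF wraps around to 0x00).
void process_chunk(std::vector<unsigned char>& chunk)
{
    for (std::size_t i = 0; i < chunk.size(); ++i)
        chunk[i] = static_cast<unsigned char>(chunk[i] + 1);
}

// Reads in_name in pieces of chunk_size bytes, processes each piece and
// writes it to out_name. Returns the number of bytes written, or -1 if
// chunk_size is 0 or a file could not be opened.
long copy_in_chunks(const std::string& in_name,
                    const std::string& out_name,
                    std::size_t chunk_size)
{
    if (chunk_size == 0)
        return -1;
    std::ifstream in(in_name.c_str(), std::ios::binary);
    std::ofstream out(out_name.c_str(), std::ios::binary);
    if (!in || !out)
        return -1;

    std::vector<unsigned char> chunk;
    long total = 0;
    for (;;) {
        chunk.resize(chunk_size);
        in.read(reinterpret_cast<char*>(&chunk[0]),
                static_cast<std::streamsize>(chunk_size));
        const std::streamsize got = in.gcount();  // last chunk may be short
        if (got <= 0)
            break;                                // nothing left to read
        chunk.resize(static_cast<std::size_t>(got));
        process_chunk(chunk);
        out.write(reinterpret_cast<const char*>(&chunk[0]), got);
        total += static_cast<long>(got);
    }
    return total;
}
```

    The loop tests gcount() rather than eof(), because the final read usually returns fewer bytes than requested and those bytes still have to be processed.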

    >If anyone knows of a tutorial that contains concrete examples of this,
    >I'd appreciate a pointer to the info. Thanks


    The C++ Standard Library by Josuttis.

    Jonathan
    Jonathan Mcdougall, Jul 25, 2003
    #2

  3. On Thu, 24 Jul 2003 20:39:23 -0400, Jonathan Mcdougall
    <> wrote:

    >On 24 Jul 2003 16:56:53 -0700, (J. Campbell)
    >wrote:
    >
    >>OK...I'm in the process of learning C++. In my old (non-portable)
    >>programming days, I made use of binary files a lot...not worrying
    >>about endian issues. I'm starting to understand why C++ makes it
    >>difficult to read/write an integer directly as a bit-stream to a file.
    >> However, I'm at a bit of a loss for how to do the following. So as
    >>not to obfuscate the issue, I won't show what I've been attempting ;-)
    >>
    >>What I want to do is the following, using the standard IO streams.

    >
    ># include <fstream>
    ># include <vector>
    ># include <algorithm>


    Forget these ones :

    ># include <sstream>
    ># include <iostream>
    ># include <string>
    >
    >
    >>1) open an arbitrary file (file1).

    >
    >std::ifstream file1("f.txt");
    >
    >>2) starting with the first byte in (file1), read a chunk of data into
    >>an array of integers.

    >
    >const int CHUNK = 128;
    >
    >char buffer[CHUNK];
    >file1.read(buffer, CHUNK);
    >
    >std::vector<int> data;
    >std::copy(buffer, buffer + 128, std::back_inserter(data));


    std::copy(buffer, buffer + CHUNK, std::back_inserter(data));

    >
    >>3) manipulate the array, as integer data,

    >
    >void manipulate(std::vector<int> &v);
    >
    >
    >manipulate(data);
    >
    >>and then output the contents
    >>of the array to another file (file2).

    >
    >std::ofstream file2("g.txt");
    >std::copy(data.begin(), data.end(),
    > std::ostream_iterator<int>(std::cout, "\n"));


    std::copy(data.begin(), data.end(),
              std::ostream_iterator<int>(file2, "\n"));


    Sorry about that,

    Jonathan
    Jonathan Mcdougall, Jul 25, 2003
    #3
  4. J. Campbell

    J. Campbell Guest

    Thanks Jonathan.

    Your response is most helpful. Now, I need to digest why it works,
    and why it's necessary.

    I want to clarify a few things. Assuming int is 32-bits, then,
    after:
    -----
    const int CHUNK = 128;

    char buffer[CHUNK];
    file1.read(buffer, CHUNK);
    ------
    at this point the char array, "buffer" contains 128 elements of 1-byte
    each, right?

    -----
    std::vector<int> data;
    std::copy(buffer, buffer + 128, std::back_inserter(data));
    -----
    now, the vector named "data" contains 32 elements, each of which is a
    4-byte integer, right?

    How do I know if the bytes that went into the vector integers went in
    head-first or feet-first? in other words, if the first 4 bytes of the
    file were (HEX):
    FF 00 00 00
    will the first int in the vector "data" be FF000000 (dec 4278190080)
    or will it be 000000FF (dec 255)? Or is it machine dependent?

    can I avoid all the "std::" by using "using namespace std;" or is it
    necessary to scope-resolve all the keywords?

    Another thing... Do you think it's better to read chunks of a file as
    I've indicated, or is it better to load the whole file into memory?

    Also, your method leaves 2-duplicates of the data in memory...one as
    the char array, and once as the vector. is this a problem?

    One more thing...I asked a question here recently:

    http://groups.google.com/groups?hl=&rnum=1

    about accessing a char array as an array of int. How is the vector
    method different/safer than the (unsafe & non-portable) method I
    demonstrated in the earlier post.

    thanks again for the help.

    I don't seem to be able to quit typing;-) Sorry to inundate you with
    so many questions...I realize that you may not choose to address them
    all..
    J. Campbell, Jul 25, 2003
    #4
  5. Rolf Magnus

    Rolf Magnus Guest

    Thomas Matthews wrote:

    > To nitpick, the constant should be "unsigned" since a quantity can't
    > be negative. i.e.
    > const unsigned int CHUNK_SIZE = 128;


    I'd disagree. It should be signed, since you might have negative offsets
    when accessing the array elements, and mixing signed and unsigned
    arithmetic can be problematic, and some compilers warn if you do.
    Besides, what would you really gain from making it unsigned?

    >> std::vector<int> data;
    >> std::copy(buffer, buffer + 128, std::back_inserter(data));
    >> -----
    >> now, the vector named "data" contains 32 elements, each of which is a
    >> 4-byte integer, right?

    > A 4-byte _signed_ integer.


    Yes, as int is by default signed.

    >> How do I know if the bytes that went into the vector integers went in
    >> head-first or feet-first? in other words, if the first 4 bytes of
    >> the file were (HEX):
    >> FF 00 00 00
    >> will the first int in the vector "data" be FF000000 (dec 4278190080)
    >> or will it be 000000FF (dec 255)? Or is it machine dependent?

    > It is machine dependent. The topic is called Endianism.


    I've only seen it be called endianness.

    > Try this experiment:
    > const unsigned int endian_test = 0x01020304;
    > unsigned char byte0;
    > unsigned char byte1;
    > unsigned char byte2;
    > unsigned char byte3;
    > unsigned char * ptr = (unsigned char *) &endian_test;
    > byte0 = *ptr++;
    > byte1 = *ptr++;
    > byte2 = *ptr++;
    > byte3 = *ptr++;
    > cout << hex << (unsigned short) byte0 << endl;
    > cout << hex << (unsigned short) byte1 << endl;
    > cout << hex << (unsigned short) byte2 << endl;
    > cout << hex << (unsigned short) byte3 << endl;
    >
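
    The quoted experiment can also be packaged as two small functions. This is a minimal sketch, assuming unsigned int is at least 32 bits wide; is_little_endian() and bytes_in_memory_order() are hypothetical names, and std::memcpy is used in place of the pointer cast so the bytes are copied out rather than aliased.

```cpp
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// True when the least significant byte of an unsigned int is stored
// at the lowest address (little-endian layout).
bool is_little_endian()
{
    const unsigned int probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x04;
}

// Returns the bytes of value in the order they sit in memory,
// as two-digit hex numbers separated by spaces.
std::string bytes_in_memory_order(unsigned int value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);

    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < sizeof value; ++i) {
        if (i != 0)
            out << ' ';
        out << std::setw(2) << static_cast<unsigned int>(bytes[i]);
    }
    return out.str();
}
```

    On a little-endian machine with a 4-byte unsigned int, bytes_in_memory_order(0x01020304u) lists 04 first; on a big-endian one it lists 01 first.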
    >>
    >> can I avoid all the "std::" by using "using namespace std;" or is it
    >> necessary to scope-resolve all the keywords?

    > This is a personal, style, issue. Here are some popular styles:
    > 1. Declare each function and class with a separate "using" statement:
    > using std::cout;
    > using std::vector;
    > 2. Use the global "using" statement:
    > using namespace std;
    > 3. Prefix each function and class with its namespace:
    > std::cout << "hello" << std::endl;
    > There are different opinions on which to use. Use a search engine
    > and search this newsgroup for "namespace" and "using".


    At least, most people seem to agree that it's a bad idea to put
    something like this in a header.
    Btw: you can also put using into functions.
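
    A small illustration of a function-local using declaration. The function greet() is a hypothetical example, not something from the thread; the point is only that the name std::ostringstream becomes usable unqualified inside that one function and nowhere else.

```cpp
#include <sstream>
#include <string>

std::string greet(const std::string& name)
{
    using std::ostringstream;   // visible only inside greet()
    ostringstream out;
    out << "hello " << name;
    return out.str();
}
```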

    >> Another thing... Do you think it's better to read chunks of a file
    >> as I've indicated, or is it better to load the whole file into
    >> memory?

    > If you have the space, read in the whole file; otherwise read it
    > in as chunks. The fewer reads, the faster the execution.


    Not necessarily. If you need maximum speed, you should test it for
    different block sizes.
    Rolf Magnus, Jul 25, 2003
    #5
  6. On 25 Jul 2003 08:38:23 -0700, (J. Campbell)
    wrote:

    >Thanks Jonathan.
    >
    >Your response is most helpful. Now, I need to digest why it works,
    >and why it's necessary.
    >
    >I want to clarify a few things. Assuming int is 32-bits, then,
    >after:


    You can't "assume" this, it depends on the platform. Anyways it does
    not matter in this case.

    >-----
    >const int CHUNK = 128;
    >
    >char buffer[CHUNK];
    >file1.read(buffer, CHUNK);
    >------
    >at this point the char array, "buffer" contains 128 elements of 1-byte
    >each, right?


    Yes.

    >-----
    >std::vector<int> data;
    >std::copy(buffer, buffer + 128, std::back_inserter(data));
    >-----
    >now, the vector named "data" contains 32 elements, each of which is a
    >4-byte integer, right?


    No, 'data' contains 128 elements of type int. Each element has a size
    of sizeof(int), which *could* be 4 bytes.

    data[0]

    contains the value which was in

    buffer[0]

    For example, if the first byte in the file was 65, then buffer[0]
    contains char(65) (which is 'A') and data[0] simply contains 65.
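
    The element-by-element conversion can be checked with a short sketch. widen_bytes() is a hypothetical helper name; it does exactly what the std::copy call above does, so the result has one int per input byte.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Copies count chars into a vector<int>. Each char is converted to an
// int on its own; no bytes are packed together. If char is signed on
// the platform, bytes of 0x80 and above come out as negative ints.
std::vector<int> widen_bytes(const char* buffer, std::size_t count)
{
    std::vector<int> data;
    std::copy(buffer, buffer + count, std::back_inserter(data));
    return data;
}
```

    Calling widen_bytes() on a 4-byte buffer holding 'A', 'B', 'C', 'D' gives a vector of 4 ints, not 1.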

    >can I avoid all the "std::" by using "using namespace std;" or is it
    >necessary to scope-resolve all the keywords?


    Yes, you can, but I personally don't recommend it. I prefer to qualify
    everything, but it is a matter of style (and carefulness).

    >Another thing... Do you think it's better to read chunks of a file as
    >I've indicated, or is it better to load the whole file into memory?


    Depends on the file size and the memory available.

    >Also, your method leaves 2-duplicates of the data in memory...one as
    >the char array, and once as the vector. is this a problem?


    Well you explicitly wanted an array of integers and since there is no
    function which takes an int[], I needed to do a conversion.

    >One more thing...I asked a question here recently:
    >
    >http://groups.google.com/groups?hl=&rnum=1
    >
    >about accessing a char array as an array of int. How is the vector
    >method different/safer than the (unsafe & non-portable) method I
    >demonstrated in the earlier post.


    Variable-length arrays are, afaik, illegal in C++ anyways. Take a
    look at that :

    http://www.btinternet.com/~chrisnewton/pp/contarray.xml


    Jonathan
    Jonathan Mcdougall, Jul 25, 2003
    #6
  7. >> -----
    >> std::vector<int> data;
    >> std::copy(buffer, buffer + 128, std::back_inserter(data));
    >> -----
    >> now, the vector named "data" contains 32 elements, each of which is a
    >> 4-byte integer, right?

    >A 4-byte _signed_ integer.


    I just want to remind you that 'data' contains *128* elements, not 32
    and that the endianness discussion does not apply.

    <snip>

    Jonathan
    Jonathan Mcdougall, Jul 25, 2003
    #7
  8. J. Campbell

    J. Campbell Guest

    Jonathan,

    I just tried out your method, and it leaves me scratching my head.
    After stumbling briefly for lack of the header to define
    back_inserter() and ostream_iterator() (thanks Google and SGI), the
    code compiles fine:
    __________code__________________

    #include <fstream>
    #include <vector>
    #include <iterator>

    using namespace std;

    int main(){
    const int DATACHUNK = 20;
    char buffer[DATACHUNK];

    ifstream filein("shifttest.cpp");
    filein.read(buffer, DATACHUNK);

    vector<int> filedata;
    copy(buffer, buffer + DATACHUNK, back_inserter(filedata));

    ofstream fileout("shifttest.joe");
    copy(filedata.begin(), filedata.end(),
    ostream_iterator<int>(fileout, "\n" ));
    }

    _____end code_________________

    However, when I look at the file out, it contains:

    35
    105
    110
    99
    108
    117
    100
    101
    32
    60
    105
    111
    115
    116
    114
    101
    97
    109
    62
    10

    which is the ASCII representation of the integer representation of the
    ASCII sequence "#include <iostream>"

    which, strangely enough, happens to be the first line of
    "shifttest.cpp" ;-)

    This is really not at all what I am wanting to do. Now my 20 bytes is
    represented by 93 bytes of a rather odd data-type...neither characters
    nor integers, but rather some strange beast that combines the worst of
    both worlds.

    I'm left wondering, in this strange new world of C++ do I need to get
    used to dealing with ASCII representations of numbers for file I/O?
    Or do I need to always break my 4-byte integers into individual bytes
    prior to I/O if I don't want to waste storage space? I suppose this
    would be pretty easy...something like:

    //not tested
    int bytetowrite;
    char holdword[4];

    for(int i = 0; i < 4; i++)
        holdword[i] = (bytetowrite & (255 << (i * 8))) >> (i * 8);
    //holdword now contains, small-byte first, the data from bytetowrite
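
    The same byte-splitting idea can be sketched with unsigned arithmetic, which avoids shifting into the sign bit. This assumes a compiler that provides <cstdint> and a file format that stores the least significant byte first; store_le32() and load_le32() are hypothetical names.

```cpp
#include <cstdint>

// Splits value into 4 bytes, least significant byte first.
void store_le32(std::uint32_t value, unsigned char out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
}

// Rebuilds the value from 4 bytes stored least significant byte first.
std::uint32_t load_le32(const unsigned char in[4])
{
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return value;
}
```

    Because the byte order is fixed by the shifts, a file written this way reads back to the same numbers on any machine, whatever its native byte order.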

    However, this seems a bit tedious, considering that this rigamarole
    doesn't really do anything to the internal data. I feel like there's
    something really basic that I don't *get* about streams... All I
    really want to do is "get at" the data in a file and treat that data
    as numbers typed to the native processor word size...then, manipulate
    the data and write the data out to a second file. Consider, for
    example, that the file consists of a binary bitmap and I want to
    invert it, or rotate it or something.

    Anyway...It's apparent that I have a lot to learn. This C++ is
    tantalizing me...the code is about 10 to 20 x faster than my old
    16-bit compiler...but jeez...what would seem to be a simple
    manipulation can become so frustrating!!! It feels a little like
    typing with my toes.

    Thanks for the help people. It is beginning to make some sense.

    Joe

    J. Campbell, Jul 25, 2003
    #8
  9. J. Campbell

    J. Campbell Guest

    Jonathan Mcdougall <> wrote in message news:<>...
    > >> -----
    > >> std::vector<int> data;
    > >> std::copy(buffer, buffer + 128, std::back_inserter(data));
    > >> -----
    > >> now, the vector named "data" contains 32 elements, each of which is a
    > >> 4-byte integer, right?

    > >A 4-byte _signed_ integer.

    >
    > I just want to remind you that 'data' contains *128* elements, not 32
    > and that the endianness discussion does not apply.
    >
    > <snip>
    >
    > Jonathan


    Jonathan...I now understand what's going on and the endianness
    discussion. My news reader has serious lag, so I may not be current
    with the discussion. However...I understand more after this post.
    When I said I wanted the file bytes represented by integers, I meant
    that I wanted the first sizeof(int) (e.g. 4) bytes of data to
    be put into integerarray[0], the next into integerarray[1]...etc.
    Anyway...thanks for clarifying this.
    J. Campbell, Jul 25, 2003
    #9
  10. >I just tried out your method, and it leaves me scratching my head.
    >After stumbling briefly for lack of the header to define
    >back_inserter() and ostream_iterator() (thanks Google and SGI), the
    >code compiles fine:


    This depends on the implementation. The standard does not specify
    which header must be included by which; <iterator> probably got
    included by <algorithm>, sorry about that.

    >__________code__________________
    >
    >#include <fstream>
    >#include <vector>
    >#include <iterator>
    >
    >using namespace std;
    >
    >int main(){
    > const int DATACHUNK = 20;
    > char buffer[DATACHUNK];
    >
    > ifstream filein("shifttest.cpp");
    > filein.read(buffer, DATACHUNK);
    >
    > vector<int> filedata;
    > copy(buffer, buffer + DATACHUNK, back_inserter(filedata));
    >
    > ofstream fileout("shifttest.joe");
    > copy(filedata.begin(), filedata.end(),
    > ostream_iterator<int>(fileout, "\n" ));
    >}
    >
    >_____end code_________________
    >
    >However, when I look at the file out, it contains:
    >
    >35
    >105
    >110
    >99
    >108
    >117
    >100
    >101
    >32
    >60
    >105
    >111
    >115
    >116
    >114
    >101
    >97
    >109
    >62
    >10
    >
    >which is the ASCII representation of the integer representation of the
    >ASCII sequence "#include <iostream>"
    >which, strangely enough, happens to be the first line of
    >"shifttest.cpp" ;-)


    You asked for binary, that is what I gave you. If you want the ASCII
    representation, just make the ostream_iterator <char>, that's it.

    >This is really not at all what I am wanting to do. Now my 20 bytes is
    >represented by 93 bytes


    93 ?? Why do you say that?

    > of a rather odd data-type...neither characters
    >nor integers, but rather some strange beast that combines the worst of
    >both worlds.


    These numbers you saw are the ASCII value of the characters in the
    file. The thing is, characters and integers are actually the very
    same thing, it's just the output which makes the difference : ints are
    displayed as numbers and chars are displayed as characters, which
    depend on your implementation (but you are probably using ASCII).

    Remember your subject is "Binary file I/O", not "Text file I/O".

    >I'm left wondering, in this strange new world of C++ do I need to get
    >used to dealing with ASCII representations of numbers for file I/O?


    It depends on what you want. In the case of a simple text file
    (remember, *text* is an ambiguous term in programming, everything boils
    down to zeros and ones) , values would be ASCII numbers and text would
    be the representation on the screen (65 would be 'A').

    In the case of a binary file (such as an image), values would be
    simple numbers formatted according to the image's type (jpg, bmp..)
    and text would be... garbage, since these numbers would be printed
    according to the ASCII table (remember when you first started and
    tried to display binary files on screen? Loads of smileys and beeps
    and ascii graphics..).

    >However, this seems a bit tedious, considering that this rigamarole
    >doesn't really do anything to the internal data. I feel like there's
    >something really basic that I don't *get* about streams... All I
    >really want to do is "get at" the data in a file and treat that data
    >as numbers typed to the native processor word size...then, manipulate
    >the data and write the data out to a second file. Consider, for
    >example, that the file consists of a binary bitmap and I want to
    >invert it, or rotate it or something.


    In that case, you would store every byte in a vector of whatever
    (unsigned char would be the best, I think), you skip the header until
    the data, you invert it and store the whole thing in a new file.

    The actual type of the vector (or array, as you wish) does not matter
    except for the memory wasted.
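
    A minimal sketch of that approach, under the assumption that the caller already knows the header length. read_whole_file() and invert_after_header() are hypothetical names, and no real image format is parsed here.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Loads every byte of the file into a vector. Returns an empty vector
// if the file cannot be opened.
std::vector<unsigned char> read_whole_file(const std::string& name)
{
    std::ifstream in(name.c_str(), std::ios::binary);
    std::istreambuf_iterator<char> first(in), last;
    return std::vector<unsigned char>(first, last);
}

// Flips every bit of every byte after the first header_size bytes.
// The header itself is left untouched.
void invert_after_header(std::vector<unsigned char>& bytes,
                         std::size_t header_size)
{
    for (std::size_t i = header_size; i < bytes.size(); ++i)
        bytes[i] = static_cast<unsigned char>(~bytes[i]);
}
```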

    >Anyway...It's apparent that I have a lot to learn. This C++ is
    >tantalizing me...the code is about 10 to 20 x faster than my old
    >16-bit compiler...but jeez...what would seem to be a simple
    >manipulation can become so frustrating!!! It feels a little like
    >typing with my toes.


    Hehe.. and you're still only playing with i/o.


    Jonathan
    Jonathan Mcdougall, Jul 25, 2003
    #10
  11. >> >> -----
    >> >> std::vector<int> data;
    >> >> std::copy(buffer, buffer + 128, std::back_inserter(data));
    >> >> -----
    >> >> now, the vector named "data" contains 32 elements, each of which is a
    >> >> 4-byte integer, right?
    >> >A 4-byte _signed_ integer.

    >>
    >> I just want to remind you that 'data' contains *128* elements, not 32
    >> and that the endianness discussion does not apply.
    >>
    >> <snip>
    >>
    >> Jonathan

    >
    >Jonathan...I now understand what's going on and the endianness
    >discussion. My news reader has serious lag, so I may not be current
    >with the discussion. However...I understand more after this post.
    >When I said I wanted the file bytes represented by integers, I meant
    >that I wanted the first sizeof(int) (e.g. 4) bytes of data to
    >be put into integerarray[0], the next into integerarray[1]...etc.
    >Anyway...thanks for clarifying this.


    Oh, sorry.

    Well the std::copy() is not good in that case, you will have to make a
    loop and to assign the values manually :

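    for (int i = 0; i < CHUNK; i += 4)
    {
        int temp = 0;
        for (int j = 0; j < 4; ++j)
        {
            // go through unsigned char so bytes >= 0x80 are not sign-extended
            temp |= static_cast<unsigned char>(buffer[i + j]) << (8 * (3 - j));
        }

        data.push_back(temp);
    }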

    Something like that?
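
    If the goal is to see the bytes as native-order words, as asked earlier in the thread, one alternative is to copy the raw bytes into the integers' storage with std::memcpy. This is a sketch of that swapped-in technique, not the loop above: it keeps whatever byte order the machine uses, whereas a shift-based loop fixes the order itself. bytes_to_native_ints() is a hypothetical name, and trailing bytes that do not fill a whole word are padded with zero bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Packs raw bytes into unsigned ints, sizeof(unsigned int) bytes per
// element, in the machine's native byte order.
std::vector<unsigned int> bytes_to_native_ints(const std::vector<char>& raw)
{
    const std::size_t word = sizeof(unsigned int);
    // Round up so a partial last word still gets an element (zero-filled).
    std::vector<unsigned int> out((raw.size() + word - 1) / word, 0u);
    if (!raw.empty())
        std::memcpy(&out[0], &raw[0], raw.size());
    return out;
}
```

    The resulting values depend on the machine that runs the code, so a file processed this way is only meaningful on machines with the same int size and byte order.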

    And sorry for the brutal endianness conversation break, I didn't mean
    it.

    Jonathan
    Jonathan Mcdougall, Jul 26, 2003
    #11