Streaming file IO and binary files

masood.iqbal

Hi,

Kindly excuse my novice question. In all the literature on ifstream
that I have seen, nowhere have I read what happens if you try to read
a binary file using the ">>" operator. I ran into two problems
while trying to read a binary file.

1). All whitespace characters were skipped
2). Certain binary files gave a core dump

The problems went away when I used the read() member function on the
input file stream instead. Is this the right way to go about it?

I was able to recreate my problem using the simple sample source below:

Thanks,
Saleem




#include <iostream>
#include <fstream>

using namespace std;


main(int argc, char* argv[])
{
    if(argc < 2)
    {
        cerr << "usage: " << argv[0] << " <input-file>\n";
        return 1;
    }

    ifstream ifs(argv[1], ios::in|ios::binary);

    char ch;
    size_t bytesRead = 0;
    while(ifs)
    {
        ifs >> ch;
        //ifs.read(&ch, 1);
        bytesRead ++;
    }

    cout << "Successfully read " << bytesRead << " bytes\n";
    return 0;
}
 
Ondra Holub

Hi.

The read and write methods are the ones you need, so your guess is
right.

Data are read from and written to binary files as they are, with no
changes. With text files, some conversions may be performed, depending
on the platform. For example, on DOS/Windows the line endings are
translated to CRLF characters on output (and back on input). As far as
I know, other platforms may perform further translations, although I
cannot give an example.

operator>> reads a value in text form, whereas the read method reads a
value in binary form (the same applies to operator<< and the write
method). It is usually better to use the text form, because it is more
portable across platforms.
 
James Kanze

> Kindly excuse my novice question. In all the literature on ifstream
> that I have seen, nowhere have I read what happens if you try to read
> a binary file using the ">>" operator. I ran into two problems
> while trying to read a binary file.

Attention. The ">>" operator means "parse the next characters
in the file into the target type, interpreting them as text".
More generally, the abstraction of istream is that of a
transparent stream of characters (not raw bytes). All binary
does is control the interface with the OS.

> 1). All whitespace characters were skipped

Did you reset the skipws flag? If not, that's what you asked it
to do.
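
To illustrate the skipws point, here is a minimal sketch. The helper name countWithExtractor is made up for this example, and a string stream stands in for the file. With the flag left at its default, ">>" drops white space before each character; once std::noskipws has been applied, every character is extracted.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper: counts the characters extracted with operator>>.
// When keepWhitespace is true, the skipws flag is cleared first, so
// spaces, tabs and newlines are extracted like any other character.
std::size_t countWithExtractor(std::istream& in, bool keepWhitespace)
{
    if (keepWhitespace) {
        in >> std::noskipws;
    }
    std::size_t count = 0;
    char ch;
    while (in >> ch) {   // the test happens after the extraction attempt
        ++count;
    }
    return count;
}
```

For the five-character input "a b\nc", this should count three characters with the default flags and five with skipws cleared.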

> 2). Certain binary files gave a core dump

Then there's a bug in your library. Good code never core dumps,
regardless of input. (Of course, the bug may simply be that you
forgot to replace the new handler, and aren't catching
bad_alloc. If you're reading into a string, for example, and
the input data contains a couple of GB without any white space,
something is going to give.)

> The problems went away when I used the read() member function on the
> input file stream instead. Is this the right way to go about?

It depends what you want to do. Read is good when you know you
have a block of bytes of fixed size, with some special, possibly
non-text, format.

> I was able to recreate my problem using simple sample source as below:
> #include <iostream>
> #include <fstream>
> using namespace std;
> main(int argc, char* argv[])

Just a nit, but "implicit int" was removed from C++ a long, long
time ago. This shouldn't compile without a return type for
main.

> {
>     if(argc < 2)
>     {
>         cerr << "usage: " << argv[0] << " <input-file>\n";
>         return 1;
>     }
>     ifstream ifs(argv[1], ios::in|ios::binary);
>     char ch;
>     size_t bytesRead = 0;
>     while(ifs)
>     {
>         ifs >> ch;
>         //ifs.read(&ch, 1);
>         bytesRead ++;
>     }

If the above loop ever core dumps, you should file a bug report.

If you want to just read characters, I'd use get:

    while ( ifs.get( ch ) ) {
        ++ bytesRead ;
    }

(Note too that your loop also counts one too many. For an empty
file, for example, it will count 1.)
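
The off-by-one can be seen by putting the two loop shapes side by side. This is only a sketch: both function names are invented here, string streams replace the file, and get is used in both loops so that white-space skipping does not blur the comparison.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>

// Loop shape from the original post: the stream state is tested before
// the read, so the final, failed read is counted as well.
std::size_t countTestingBeforeRead(std::istream& in)
{
    char ch;
    std::size_t bytesRead = 0;
    while (in) {
        in.get(ch);      // fails at end of input, but is still counted
        ++bytesRead;
    }
    return bytesRead;
}

// Loop shape suggested above: the result of the read itself controls
// the loop, so only successful reads are counted.
std::size_t countTestingAfterRead(std::istream& in)
{
    char ch;
    std::size_t bytesRead = 0;
    while (in.get(ch)) {
        ++bytesRead;
    }
    return bytesRead;
}
```

On an empty input the first version reports 1 and the second reports 0; on any other input the first is one higher than the second.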

Read is really for buffers which you will later unformat
yourself.
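
As a sketch of what "unformat yourself" might look like: read a fixed-size block with read, then assemble the value from the individual bytes. The function name readUint32BE and the choice of a four-byte, big-endian field are assumptions made for this example, not something taken from the original question.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads exactly four bytes into a buffer and then
// decodes them by hand as a big-endian unsigned 32-bit value.
// Returns false if fewer than four bytes were available.
bool readUint32BE(std::istream& in, std::uint32_t& value)
{
    char buffer[4];
    if (!in.read(buffer, sizeof buffer)) {
        return false;    // short read: failbit is set, value is untouched
    }
    value = 0;
    for (int i = 0; i < 4; ++i) {
        // Go through unsigned char so that bytes above 0x7F do not
        // sign-extend when char is a signed type.
        value = (value << 8) | static_cast<unsigned char>(buffer[i]);
    }
    return true;
}
```

With a real file, the stream would normally be opened with ios::binary so that the bytes reach the buffer unchanged.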
 
James Kanze

On 25 Dec, 08:51, (e-mail address removed) wrote:
> Data are read from/written to binary files as they are. There
> is no change. In text files may be done some conversions.

It's a bit more subtle than that. Especially with the standard
streams, which do code translation using the codecvt facet of
the imbued locale regardless of whether the file is binary or
text.

Text mode only guarantees textual integrity: you're only
guaranteed to read what you've written if what you've written
consisted only of printable characters, and even then, there are
exceptions. (You're not guaranteed to be able to read trailing
white space, for example. And it's not specified what happens
if the last character written wasn't a '\n'.) On the other
hand, you're guaranteed that a '\n' will result in whatever the
system normally uses as a line separator (e.g. the two character
sequence 0x0D, 0x0A under Windows, or a new record on systems
with record oriented files). And that there are no extra
characters at the end. Also, you can only seek in a limited
number of cases.

In binary mode, you'll also get the bytes you
wrote. All of them, not just printable characters. You can
legally write anything, and will reread exactly what you have
written; '\n' will result in one byte being written, with
whatever the encoding of '\n' is on your system. And you can
seek anywhere. But you might read extra 0's that you didn't write
at the end of the file.

Also, on some systems, files written in text mode cannot be read
in binary, and vice versa.

> It depends on platform. For example on DOS/Windows platform it
> translates line endings to CRLF characters (and vice versa).
> As far as I know there may be made more translation on some
> platforms, although I am not able to give any example.

Even on DOS/Windows, a 0x1A in a text input stream is treated as
EOF, and you won't see anything else after it.

> operator >> is used for reading value in text form, however read
> method is for reading of value in binary form (same applies to
> operator>> and write method).

Operator >> formats, as text. Regardless of file mode. Read
extracts char's from the stream, regardless of file mode. I
regularly use >> on files opened in binary mode, and there are
cases where it is reasonable to use read on files opened in text
mode.
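
A small sketch of that distinction, using string streams so that file mode does not enter into it at all. Both helper names are invented for the example. Given the same characters, ">>" parses them as text into the target type, while read just hands back the chars.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: ">>" skips leading white space and parses the
// following digits as a decimal integer.
int parseAsText(const std::string& data)
{
    std::istringstream in(data);
    int value = 0;
    in >> value;
    return value;
}

// Hypothetical helper: read copies up to n chars without interpreting
// them; gcount reports how many were actually extracted.
std::string firstChars(const std::string& data, std::size_t n)
{
    std::istringstream in(data);
    std::string out(n, '\0');
    in.read(&out[0], static_cast<std::streamsize>(n));
    out.resize(static_cast<std::size_t>(in.gcount()));
    return out;
}
```

For the input "  42 rest", parseAsText should give the int 42, while firstChars with n = 4 gives the four characters "  42", leading spaces included.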

> Usually is better to use text form,
> because it is more portable among different platforms.

It depends what you mean by "portable". If you're writing files
to be read on the same system, or reading files that were
written as text on the same system, text mode gives you a larger
degree of source code portability; a new line will always be the
single character '\n', regardless of how it is represented on
the system. If you're writing files that will be read by many
different systems, you should define a "portable" format for
them, which most likely will require that they be accessed in
binary mode.
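
One way such a "portable" format might be sketched: fix the width and byte order of each field in the format definition, and write the bytes one at a time instead of dumping the in-memory representation. The function name writeUint32BE and the four-byte big-endian layout are assumptions chosen for illustration only.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes value as four bytes, most significant
// first, independent of the byte order of the machine running the code.
void writeUint32BE(std::ostream& out, std::uint32_t value)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.put(static_cast<char>((value >> shift) & 0xFF));
    }
}
```

A file written this way would be opened with ios::binary on every system, so that none of the bytes is translated on the way out or on the way back in.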
 
