Binary file IO: Converting imported sequences of chars to desired type


Brian

Shorter development times, less expensive development, greater
reliability...

In sum, lower cost.


Since a message using a text format is generally longer than
one using a binary format, text leaves systems more vulnerable to
network problems caused by storms, cyber attacks, etc.
I won't argue the point about it being easier to use text,
but think it's a little like buying an SUV. If the price of
gas goes way up, many wish they had never bought an SUV.
Using binary might be a way to mitigate the pain caused by
volatile markets/conditions.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
 

Gerhard Fiedler

Brian said:
Since a message using a text format is generally longer than one
using a binary format, text leaves systems more vulnerable to network
problems caused by storms, cyber attacks, etc.
it being easier to use text, but think it's a little like buying an
SUV. If the price of gas goes way up, many wish they had never
bought an SUV. Using binary might be a way to mitigate the pain
caused by volatile markets/conditions.

If you're talking about sending something over a potentially unstable
network connection, simple binary is pretty bad. With text encoding
(could be e.g. base64 encoded binary, or pretty much everything else
that's guaranteed not to use all available symbols), you have a few
symbols left that you can use for stream synchronization. This is in
general much more important than a few bytes more to transmit. This may
even be important when storing data on disk: the chances of recovering
data if there's a problem are much higher if you have sync symbols in the
data stream.
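
(A minimal sketch of that idea, assuming a byte-oriented stream; the hex
encoding, the record layout and every name below are made up for
illustration rather than taken from the thread. Each record is hex-encoded
so its payload can never contain the reserved '\n' sync symbol, and a
reader that hits a garbled record skips to the next '\n' instead of losing
the rest of the stream.)

#include <cctype>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hex-encode one record; the payload then uses only [0-9a-f], so the
// reserved '\n' sync symbol can never appear inside a record.
std::string encode_record(const std::vector<unsigned char>& payload)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < payload.size(); ++i)
    {
        out += digits[payload[i] >> 4];
        out += digits[payload[i] & 0x0f];
    }
    out += '\n';                          // sync marker between records
    return out;
}

// Decode a framed stream.  A corrupted record is dropped, but the reader
// resynchronizes at the next '\n' rather than losing the remainder.
std::size_t decode_stream(std::istream& in,
                          std::vector<std::vector<unsigned char> >& records)
{
    std::string line;
    while (std::getline(in, line))
    {
        std::vector<unsigned char> payload;
        bool ok = (line.size() % 2 == 0);
        for (std::size_t i = 0; ok && i < line.size(); i += 2)
        {
            if (!std::isxdigit(static_cast<unsigned char>(line[i])) ||
                !std::isxdigit(static_cast<unsigned char>(line[i + 1])))
            {
                ok = false;               // garbled record: drop it, keep going
                break;
            }
            payload.push_back(static_cast<unsigned char>(
                std::strtol(line.substr(i, 2).c_str(), 0, 16)));
        }
        if (ok)
            records.push_back(payload);
    }
    return records.size();
}

int main()
{
    std::stringstream wire;
    std::vector<unsigned char> a(3, 0x41), b(2, 0x42);
    wire << encode_record(a) << "n@t hex\n" << encode_record(b);

    std::vector<std::vector<unsigned char> > records;
    std::cout << decode_stream(wire, records)
              << " of 2 good records recovered\n";
    return 0;
}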

There's a point for (simple) binary protocols when all you have is an
8bit microcontroller with 100 bytes of RAM and 1k of Flash. But you
typically don't program these in standard-compliant C++ :)

IMO this has nothing to do with SUVs... more with seat belts, if you
really want an automotive analogy. While they add weight to the vehicle,
and on (very) rare occasions may complicate things if there's a problem,
in most problem cases they can save your face, and more. (Which, back to
programming, may save your job -- and with it the payments for your SUV.
Now here we're back to the SUV :)

Gerhard
 

James Kanze

As long as you keep two factors in mind:
1) The user's time is not yours (the programmer) to waste.
2) The user's storage facilities (disk space, network
bandwidth etc) are not yours (the programmer) to waste.

The user pays for your time. Spending it to do something which
results in a less reliable program, and that he doesn't need, is
irresponsible, and borders on fraud.
Those who want easy, not awfully challenging jobs might be
better off flipping burgers.

Writing the most reliable programs for the lowest cost is
challenging enough without going out of your way to make it
harder. If you're an amateur, doing this for fun, do whatever
amuses you the most. If you're a professional, selling your
services, professional deontology requires providing the best
service possible at the lowest price possible.
 

Brian

If you're talking about sending something over a potentially unstable
network connection, simple binary is pretty bad. With text encoding
(could be e.g. base64 encoded binary, or pretty much everything else
that's guaranteed not to use all available symbols), you have a few
symbols left that you can use for stream synchronization. This is in
general much more important than a few bytes more to transmit. This may
even be important when storing data on disk: the chances of recovering
data if there's a problem are much higher if you have sync symbols in the
data stream.

If it were just a "few bytes more" I wouldn't be saying
anything. Likewise the difference between an SUV and
a fuel efficient vehicle isn't trivial. People wouldn't
be wishing they had never bought an SUV if that were
the case.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
 

Brian

    [...]
So what do text-based formats actually buy you?
Shorter development times, less expensive development, greater
reliability...
In sum, lower cost.
As long as you keep two factors in mind:
1) The user's time is not yours (the programmer) to waste.
2) The user's storage facilities (disk space, network
   bandwidth etc) are not yours (the programmer) to waste.

The user pays for your time.  Spending it to do something which
results in a less reliable program, and that he doesn't need, is
irresponsible, and borders on fraud.
Those who want easy, not awfully challenging jobs might be
better off flipping burgers.

Writing the most reliable programs for the lowest cost is
challenging enough without going out of your way to make it
harder.  If you're an amateur, doing this for fun, do whatever
amuses you the most.  If you're a professional, selling your
services, professional deontology requires providing the best
service possible at the lowest price possible.


I'm interested in binary in this context as an
alternative to text because I believe markets and
conditions are likely to continue to be volatile for
a while. If I had more confidence in various
officials, B.O. (Obama), Putin, Ahmadinejad, etc.,
I'd be less likely to think things are going to be
volatile. I like what Rabbi Michael Healer said
when he met the governor of Texas -- Rick Perry --
a few years ago: "I didn't vote for you and I
don't trust you." I didn't vote for B.O. and I
don't trust him either.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
 

Brian

    [...]
So what do text-based formats actually buy you?
Shorter development times, less expensive development, greater
reliability...
In sum, lower cost.
As long as you keep two factors in mind:
1) The user's time is not yours (the programmer) to waste.
2) The user's storage facilities (disk space, network
   bandwidth etc) are not yours (the programmer) to waste.
The user pays for your time.  Spending it to do something which
results in a less reliable program, and that he doesn't need, is
irresponsible, and borders on fraud.
Writing the most reliable programs for the lowest cost is
challenging enough without going out of your way to make it
harder.  If you're an amateur, doing this for fun, do whatever
amuses you the most.  If you're a professional, selling your
services, professional deontology requires providing the best
service possible at the lowest price possible.

I'm interested in binary in this context as an
alternative to text because I believe markets and
conditions are likely to continue to be volatile for
a while.  

This is interesting --

http://stackoverflow.com/questions/1058051/boost-serialization-performance-text-vs-binary-format

M. Troyer, who I think is still around the Boost list,
considered using binary to be "essential."

http://lists.boost.org/Archives/boost/2002/11/39601.php

I'm not sure if those participating in this thread
come from a scientific application background as Troyer
does.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
 

Gerhard Fiedler

Brian said:
If it were just a "few bytes more" I wouldn't be saying anything.
Likewise the difference between an SUV and a fuel efficient vehicle
isn't trivial. People wouldn't be wishing they had never bought an
SUV if that were the case.

It is longer, but you were talking about unreliable networks. And
resyncing a binary stream is by design very problematic. Since you often
don't know beforehand the length of records (think strings), you have
length information encoded in your binary stream. If one length field is
bad and unrecoverable, pretty much the complete rest of the stream is
unreadable because you're out of sync from that point on. This is also
valid for data on disks.

Now, if you used an encoding with a few unused symbols, you can use
those symbols to add synchronization markers (records, whatever), and
even if a length field is bad, you maybe lost a record but not the whole
remainder of the stream.

On unreliable networks, I take that any day over the size advantage of
raw binary. Of course, this is not about text vs binary, this is about
whether raw binary is the best choice for unreliable networks. It isn't.

If you want both (speed and reliability), you'd create a custom encoding
that leaves only a few symbols unused that you then can use for syncing.
But raw binary is not a good choice over unreliable networks.
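
(A sketch of one such custom encoding; the 0x7E/0x7D values and the XOR
trick are borrowed from HDLC/PPP-style byte stuffing, not from anything in
this thread. Only two byte values are reserved, so the size overhead stays
close to raw binary, yet a reader that loses its place can always
resynchronize on the next 0x7E.)

#include <iostream>
#include <vector>

// Two reserved byte values: 0x7E delimits frames, 0x7D escapes either
// reserved value inside a payload.  Every other value passes through
// untouched, so the encoding overhead is tiny.
static const unsigned char FRAME = 0x7E;
static const unsigned char ESC   = 0x7D;

std::vector<unsigned char> stuff(const std::vector<unsigned char>& payload)
{
    std::vector<unsigned char> out;
    for (std::size_t i = 0; i < payload.size(); ++i)
    {
        unsigned char b = payload[i];
        if (b == FRAME || b == ESC)
        {
            out.push_back(ESC);
            out.push_back(b ^ 0x20);  // transpose the escaped byte
        }
        else
        {
            out.push_back(b);
        }
    }
    out.push_back(FRAME);             // end-of-frame / sync marker
    return out;
}

// Extract one frame starting at pos; a reader that gets confused can
// simply scan forward to the next FRAME byte and start over from there.
std::vector<unsigned char> unstuff(const std::vector<unsigned char>& wire,
                                   std::size_t& pos)
{
    std::vector<unsigned char> payload;
    while (pos < wire.size())
    {
        unsigned char b = wire[pos++];
        if (b == FRAME)
            break;                    // frame complete
        if (b == ESC && pos < wire.size())
            b = wire[pos++] ^ 0x20;   // undo the transposition
        payload.push_back(b);
    }
    return payload;
}

int main()
{
    unsigned char raw[] = { 0x01, 0x7E, 0x7D, 0xFF };
    std::vector<unsigned char> msg(raw, raw + 4);
    std::vector<unsigned char> wire = stuff(msg);

    std::size_t pos = 0;
    std::vector<unsigned char> back = unstuff(wire, pos);
    std::cout << (back == msg ? "round trip ok" : "mismatch") << '\n';
    return 0;
}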

And I still think that this has nothing to do with SUVs. How many people
do you know that are wishing they never had used a text protocol? How
many are there wishing they never had used raw binary over an unreliable
network link?

Gerhard
 

Brian

It is longer, but you were talking about unreliable networks. And
resyncing a binary stream is by design very problematic. Since you often
don't know beforehand the length of records (think strings), you have
length information encoded in your binary stream.
Yes.

If one length field is
bad and unrecoverable, pretty much the complete rest of the stream is
unreadable because you're out of sync from that point on. This is also
valid for data on disks.

I think there are ways to avoid that. Sentinel values are
often used in binary streams. If you get to the end of a
message and don't find the sentinel, you can scan until
you do find it. It's true that you may find a false
positive with binary, but the whole stream isn't lost.
Additionally, the message length can be embedded two times.
If the two lengths match, then an errant sublength within
the message won't cause any trouble to the whole stream,
but it may make it impossible to interpret one message.
If the two message lengths don't match then you have to
do some checking. If you have a max message length, you
check both values against that. If both are less than
that you would have to proceed with caution.
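
(A rough sketch of that layout; the sentinel value, the maximum length and
all the names are invented for illustration. The payload length is stored
twice up front and a sentinel closes the message, so a reader can
cross-check the two lengths against each other, against a maximum, and
against the sentinel position before giving up on more than one message.)

#include <iostream>
#include <string>
#include <vector>

static const unsigned long SENTINEL   = 0xDEADBEEFUL;  // end-of-message marker
static const unsigned long MAX_LENGTH = 1024;          // sanity limit on lengths

void append_u32(std::vector<unsigned char>& out, unsigned long v)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xFF));
}

unsigned long read_u32(const std::vector<unsigned char>& in, std::size_t pos)
{
    unsigned long v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<unsigned long>(in[pos + i]) << (8 * i);
    return v;
}

// Message layout: [length][length again][payload bytes][sentinel]
std::vector<unsigned char> write_message(const std::string& payload)
{
    std::vector<unsigned char> out;
    append_u32(out, payload.size());
    append_u32(out, payload.size());
    out.insert(out.end(), payload.begin(), payload.end());
    append_u32(out, SENTINEL);
    return out;
}

// Returns true and advances pos past one message if everything checks out;
// otherwise the caller can scan forward for the next sentinel.
bool read_message(const std::vector<unsigned char>& in, std::size_t& pos,
                  std::string& payload)
{
    if (pos + 8 > in.size())
        return false;
    unsigned long len1 = read_u32(in, pos);
    unsigned long len2 = read_u32(in, pos + 4);
    if (len1 != len2 || len1 > MAX_LENGTH)
        return false;                                  // suspect header
    if (pos + 8 + len1 + 4 > in.size())
        return false;
    if (read_u32(in, pos + 8 + len1) != SENTINEL)
        return false;                                  // sentinel missing
    payload.assign(in.begin() + pos + 8, in.begin() + pos + 8 + len1);
    pos += 8 + len1 + 4;
    return true;
}

int main()
{
    std::vector<unsigned char> wire = write_message("hello");
    std::size_t pos = 0;
    std::string payload;
    std::cout << (read_message(wire, pos, payload) ? payload
                                                   : std::string("corrupt"))
              << '\n';
    return 0;
}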
Now, if you used an encoding with a few unused symbols, you can use
those symbols to add synchronization markers (records, whatever), and
even if a length field is bad, you maybe lost a record but not the whole
remainder of the stream.

On unreliable networks, I take that any day over the size advantage of
raw binary. Of course, this is not about text vs binary, this is about
whether raw binary is the best choice for unreliable networks. It isn't.

Just saying "it isn't" doesn't convince me.

If you want both (speed and reliability), you'd create a custom encoding
that leaves only a few symbols unused that you then can use for syncing.
But raw binary is not a good choice over unreliable networks.

And I still think that this has nothing to do with SUVs. How many people
do you know that are wishing they never had used a text protocol? How
many are there wishing they never had used raw binary over an unreliable
network link?

I don't know any in either of those two categories.
Some predicted spiking oil prices 10 years ago and
they based their decisions on those predictions.
Something similar may happen with bandwidth prices.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net


I read today of a man who was fired for saying,
"I think homosexuality is bad stuff."
http://www.wnd.com/index.php?fa=PAGE.view&pageId=114779
I agree with him - it is bad stuff.
 

Rune Allnor

    [...]
So what do text-based formats actually buy you?
Shorter development times, less expensive development, greater
reliability...
In sum, lower cost.
As long as you keep two factors in mind:
1) The user's time is not yours (the programmer) to waste.
2) The user's storage facilities (disk space, network
   bandwidth etc) are not yours (the programmer) to waste.
The user pays for your time.  Spending it to do something which
results in a less reliable program, and that he doesn't need, is
irresponsible, and borders on fraud.
Those who want easy, not awfully challenging jobs might be
better off flipping burgers.
Writing the most reliable programs for the lowest cost is
challenging enough without going out of your way to make it
harder.  If you're an amateur, doing this for fun, do whatever
amuses you the most.  If you're a professional, selling your
services, professional deontology requires providing the best
service possible at the lowest price possible.
I'm interested in binary in this context as an
alternative to text because I believe markets and
conditions are likely to continue to be volatile for
a while.  

This is interesting --

http://stackoverflow.com/questions/1058051/boost-serialization-perfor...

M. Troyer, who I think is still around the Boost list,
considered using binary to be "essential."

http://lists.boost.org/Archives/boost/2002/11/39601.php

I'm not sure if those participating in this thread
come from a scientific application background as Troyer
does.

I used to be involved with seismic data processing. About 12
years ago, the company I worked for got the first TByte disk
stack nationwide. Before that time, the guys who went offshore
came back with truckloads of Exabyte tapes. Just loading the
tapes to the disk drives took weeks.

The application I'm working with has to do with bathymetry
map processing. 'Bathymetry' just means 'underwater terrain',
so the end product is a map of the sea floor.

There are huge amounts of data flowing through (I wouldn't
be surprised if present-day 'simple' mapping tasks are comparable
to late '80s seismic processing as far as computational throughput
is concerned), and the job is essentially real-time: a directive
to discontinue present survey activities might be received at any
time (surveying is done from general-purpose vessels), in which
case the vessel and crew need to shut down all activities and
switch focus to whatever assignment is coming up, in a matter
of minutes or hours. At best one might accept a couple of hours'
latency on the processed result after a new batch of survey
data is available, but that's it. Since any survey can go on
for indefinite lengths of time, one needs to be able to process
each data batch faster than it took to measure, or one will
accumulate backlog.

The processing is done in multiple stages, so one just can't
wait for text-based file IO to complete. Those who base their
data flow on text files are not able to complete even the
shortest survey processing within the time it takes to survey
the data - which is the essential aspect of a real-time operation.

Rune
 

Gerhard Fiedler

Brian said:
I don't know any in either of those two categories.

Wasn't it you who wrote "People wouldn't be wishing they had never
bought an SUV if that were the case", while using the analogy of text
format and SUVs? I thought you'd know at least "people" who wished they
had used binary -- if not, how do you get to the analogy in the first
place?
Some predicted spiking oil prices 10 years ago and they based their
decisions on those predictions. Something similar may happen with
bandwidth prices.

Right, may. In general, when programming, I don't base my decisions on
such "predictions". If you take all those predictions made, you get
probably more misses than hits. I tend to try to get more hits than
misses when programming... this is better for the near-term financial
situation, and I can know this without making any shaky predictions :)

Gerhard
 

Jorgen Grahn

Not that much. For (casual, not precision) reading, a few digits are
usually enough, and most people who read this type of output (meant to
be communication between programs) are programmers, hence typically
reasonably fluent in octal and hex.

I disagree there, in two ways:

- I belong to the school that claims protocols should be human-readable,
because, well, it opens them up. They get so much easier to
manipulate, and even talk about. Take HTTP as an example, or SMTP.

- I doubt that programmers are that good with hex. Even if I limit
myself to unsigned int, I can't tell what 0xbabe is. Probably 40000
or so. Or 30000? Who knows? There is a reason decimal is the default
base in pretty much every language I know of ... including assembly
languages.

....
Since what we're talking about is only relevant for huge amounts of
data, doing anything more with that data than just a cursory look at
some numbers (which IMO is fine in octal or hex) generally needs a
program anyway.

But for the text version of the data, that "program" is often a Unix
pipeline involving tools like grep, sort and uniq, or a Perl one-liner
you make up as you go. Or it can be fed directly into gnuplot or
Excel. If the data is binary, you probably simply won't bother.

I think we have been misled a bit here, too. I haven't read the whole
thread, but it started with something like "dump a huge array of
floats to disk, collect it later". If you take the more common case
"take this huge complex data structure and dump it to disk in a
portable format", you have a completely different situation, where the
non-text format isn't that much smaller or faster.

/Jorgen
 

Brian

I disagree there, in two ways:

- I belong to the school that claims protocols should be human-readable,
  because, well, it opens them up.  They get so much easier to
  manipulate, and even talk about.  Take HTTP as an example, or SMTP.

- I doubt that programmers are that good with hex.  Even if I limit
  myself to unsigned int, I can't tell what 0xbabe is.  Probably 40000
  or so. Or 30000?  Who knows?  There is a reason decimal is the default
  base in pretty much every language I know of ... including assembly
  languages.

...


But for the text version of the data, that "program" is often a Unix
pipeline involving tools like grep, sort and uniq, or a Perl one-liner
you make up as you go.  Or it can be fed directly into gnuplot or
Excel. If the data is binary, you probably simply won't bother.

I think we have been misled a bit here, too. I haven't read the whole
thread, but it started with something like "dump a huge array of
floats to disk, collect it later".  If you take the more common case
"take this huge complex data structure and dump it to disk in a
portable format", you have a completely different situation, where the
non-text format isn't that much smaller or faster.


I guess you're saying that the results are closer in some
cases because there's a lot of non-numeric data involved
in those complex data structures. But aren't you ignoring
scientific applications where the majority of the data is
numeric?

Much earlier in the thread, Allnor wrote, "Binary files
are usually about 20%-70% of the size of the text file,
depending on numbers of significant digits and other
formatting text glyphs." I don't think anyone has
directly disagreed with that statement yet.


Brian Wood
Ebenezer Enterprises
www.webEbenezer.net

"How much better is it to get wisdom than gold! and to
get understanding rather to be chosen than silver!"
Proverbs 16:16
 

James Kanze

On Nov 4, 3:47 pm, Jorgen Grahn <[email protected]> wrote:

[...]
I guess you're saying that the results are closer in some
cases because there's a lot of non-numeric data involved in
those complex data structures. But aren't you ignoring
scientific applications where the majority of the data is
numeric?

He spoke of the "more common case". Certainly, most common
cases do include a lot of text data. On the other hand, the
origin of this thread was dumping doubles: purely numeric data.
And while perhaps less common, they do exist, and aren't really
rare either. (I've encountered them once or twice in my career,
and I'm not a numerics specialist.)
Much earlier in the thread, Allnor wrote, "Binary files
are usually about 20%-70% of the size of the text file,
depending on numbers of significant digits and other
formatting text glyphs." I don't think anyone has
directly disagreed with that statement yet.

The original requirement, if I remember correctly, included
rereading the data with no loss of precision. This means 17
digits precision for an IEEE double, with an added sign, decimal
point and four or five characters for the exponent (using
scientific notation). Add a separator, and that's 24 or 25
bytes, rather than 8. So the 20% is off; 33% seems to be the
lower limit. But in a lot of cases, that's a lot; it's
certainly something that has to be considered in some
applications.
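
(That arithmetic is easy to check. A small sketch, assuming an IEEE double
and correctly rounded library conversions; the value and the formatting
choices below are only illustrative.)

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    double original = -12345.678901234567;

    // 17 significant digits: one before the decimal point in scientific
    // notation plus setprecision(16) after it, then a separator.
    std::ostringstream out;
    out << std::scientific << std::setprecision(16) << original << ' ';
    std::string text = out.str();

    double restored = 0.0;
    std::istringstream in(text);
    in >> restored;

    std::cout << "text form   : " << text << "(" << text.size() << " bytes)\n"
              << "binary form : " << sizeof original << " bytes\n"
              << "exact?      : " << (restored == original ? "yes" : "no")
              << '\n';
    return 0;
}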
 

Rune Allnor

On Nov 4, 3:47 pm, Jorgen Grahn <[email protected]> wrote:

    [...]
I guess you're saying that the results are closer in some
cases because there's a lot of non-numeric data involved in
those complex data structures.  But aren't you ignoring
scientific applications where the majority of the data is
numeric?

He spoke of the "more common case".

As I recall, I started by a purely technical question about
binary typecasts. Others started bringing in text formats.
I have only attempted to explain - in vain, it seems - why
text-based numerical formats are a no-go in technical
applications.
 Certainly, most common
cases do include a lot of text data.

I am not talking about 'common' cases. I am talking about heavy-duty
work. Once you are talking about numeric data in the hundreds of
MBytes (regardless of the storage format), any amount of accompanying
text is irrelevant. One page of plain text takes about 2 kbytes.

There was, in fact, an 'improvement' to the ancient SEG-Y seismic
data format,

http://en.wikipedia.org/wiki/SEG_Y

the SEG-2,

http://diwww.epfl.ch/lami/detec/seg2.html

where a lot of the auxiliary (numeric) information was specified
to be stored in text format. I first saw the SEG-2 spec about ten
years ago, but I have never heard that it has actually been used.
The speed losses involved in converting data back and forth between
text and binary would fully explain why SEG-2 has not gained
widespread acceptance among the heavy-duty users.

Rune
 

Brian

On Nov 4, 3:47 pm, Jorgen Grahn <[email protected]> wrote:

    [...]
I guess you're saying that the results are closer in some
cases because there's a lot of non-numeric data involved in
those complex data structures.  But aren't you ignoring
scientific applications where the majority of the data is
numeric?

He spoke of the "more common case".  Certainly, most common
cases do include a lot of text data.  On the other hand, the
origin of this thread was dumping doubles: purely numeric data.
And while perhaps less common, they do exist, and aren't really
rare either.  (I've encountered them once or twice in my career,
and I'm not a numerics specialist.)

I've worked on one scientific application for a little over
six months. I hope to work with/on more scientific projects
in the future.
The original requirement, if I remember correctly, included
rereading the data with no loss of precision.  This means 17
digits precision for an IEEE double, with an added sign, decimal
point and four or five characters for the exponent (using
scientific notation).  Add a separator, and that's 24 or 25
bytes, rather than 8.  So the 20% is off; 33% seems to be the
lower limit.  But in a lot of cases, that's a lot; it's
certainly something that has to be considered in some
applications.

Yes. I brought it up because I wasn't sure if Grahn was
agreeing with something Fiedler said about it being just a few
more bytes. Even if it were 70% I wouldn't describe that as
a minor difference.


Brian Wood
http://www.webEbenezer.net
 

James Kanze

As I recall, I started by a purely technical question about
binary typecasts.

Which, of course, raises the question as to why. They're not
very useful unless you're doing exceptionally low level work.
Others started bringing in text formats.

The original comment was just that---a parenthetical comment.
Text formats have many advantages, WHEN you can use them. It's
also obvious that they have additional overhead---not nearly as
much as you claimed in terms of CPU, but they aren't free
either, neither in CPU time nor in data size.
I have only attempted to explain - in vain, it seems - why
text-based numerical formats are a no-go in technical
applications.

And you blew it by giving exaggerated figures:). Other than
that: they're not a no-go in technical applications. They do
have too much overhead for some applications (not all), and in
such cases, you have to use a binary format. Depending on other
requirements (portability, external requirements, etc.), you may
need a more or less complicated binary format.
I am not talking about 'common' cases. I am talking about
heavy-duty work. Once you are talking about numeric data in
the hundreds of MBytes (regardless of the storage format), any
amount of accompanying text is irrelevant. One page of plain
text takes about 2 kbytes.

Yes. I understand that.

In fact, now that you've mentioned seismic data, I agree that a
text format is probably not going to cut it. I've actually
worked on one project in the field, and I know just how much
floating point data they can generate.
 

Rune Allnor

I'm getting tired with re-iterating this for people who
are not interested in actually evaluating the numbers.

Look for an upcoming post on comp.lang.c++.moderated, where
I distill the problem statement a bit, as well as present
a C++ test to see what kind of timing ratios I am talking about.

Rune
 

Brian Wood

I'm getting tired with re-iterating this for people who
are not interested in actually evaluating the numbers.

Look for an upcoming post on comp.lang.c++.moderated, where
I distill the problem statement a bit, as well as present
a C++ test to see what kind of timing ratios I am talking about.

Rune

I took the liberty of copying your post from clc++m to here
as this newsgroup is faster as far as getting the posts out
there.


Hi all.

A couple of weeks ago I posted a question on comp.lang.c++ about a
technicality of binary file IO. Over the course of the discussion, I
discovered to my amazement - and, quite frankly, horror - that there
seems to be a school of thought that text-based storage formats are
universally preferable to binary formats for reasons of portability
and human readability.

The people who presented such ideas appeared not to appreciate two
details that counter any benefits text-based numerical formats might
offer:

1) Binary files are about 20%-70% of the size of the text files,
   depending on the number of significant digits stored in the text
   files and other formatting text glyphs.
2) Text-formatted numerical data take significantly longer to read
   and write than binary formats.

Timings are difficult to compare, since the exact numbers depend on
buffering strategies, buffer sizes, disk speeds, network bandwidths
and so on.

I have therefore sketched a 'distilled' test (code below) to see what
overheads are involved in formatting numerical data back and forth
between text and binary formats. To eliminate the impact of peripheral
devices, I have used a std::stringstream to store the data. The binary
buffers are represented by vectors, and I have assumed that a memcpy
from the file buffer to the destination memory location is all that is
needed to import the binary format from the file buffer. (If there are
significant run-time overheads associated with moving NATIVE binary
formats to the destination, please let me know.)

The output on my computer is (do note the _different_ numbers of IO
cycles in the two cases!):

Sun Nov 08 19:48:54 2009 : Binary IO cycles started
Sun Nov 08 19:49:00 2009 : 1000 Binary IO cycles completed
Sun Nov 08 19:49:00 2009 : Text-format IO cycles started
Sun Nov 08 19:49:16 2009 : 100 Text-format IO cycles completed

A little bit of math produces *average*, *crude* numbers per
read/write cycle:

Binary: 6 seconds / (1000 * 1e6) read/write cycles = 6e-9 s per r/w cycle
Text:   16 seconds / (100 * 1e6) read/write cycles = 160e-9 s per r/w cycle

which in turn means there is an overhead on the order of
160e-9/6e-9, roughly 27x,
associated with the text format.

Add a little bit of other overhead, e.g. caused by the significantly
larger text file sizes in combination with suboptimal buffering
strategies, and the relative numbers easily hit the triple digits.
Not at all insignificant when one works with large amounts of data
under tight deadlines.

So please: Shoot this demo down! Give it your best, and prove me
and my numbers wrong.

And to the textbook authors who might be lurking: Please include a
chapter on relative binary and text-based IO speeds in your upcoming
editions. Binary file formats might not fit into your overall
philosophies about human readability and universal portability of C++
code, but some of your readers might appreciate being made aware of
such practical details.

Rune

/***************************************************************************/
#include <iostream>
#include <sstream>
#include <string>
#include <time.h>
#include <vector>

int main()
{
    const size_t NumElements = 1000000;
    std::vector<double> SourceBuffer;
    std::vector<double> DestinationBuffer;

    for (size_t n = 0; n < NumElements; ++n)
    {
        SourceBuffer.push_back(n);
        DestinationBuffer.push_back(0);
    }

    time_t rawtime;
    struct tm* timeinfo;

    time(&rawtime);
    timeinfo = localtime(&rawtime);
    std::string message(asctime(timeinfo));
    message.erase(message.size() - 1);

    std::cout << message.c_str() << " : Binary IO cycles started"
              << std::endl;

    // The "binary" case: moving native doubles needs nothing beyond a copy.
    const size_t NumBinaryIOCycles = 1000;
    for (size_t n = 0; n < NumBinaryIOCycles; ++n)
    {
        for (size_t m = 0; m < NumElements; ++m)
        {
            DestinationBuffer[m] = SourceBuffer[m];
        }
    }

    time(&rawtime);
    timeinfo = localtime(&rawtime);
    message = std::string(asctime(timeinfo));
    message.erase(message.size() - 1);

    std::cout << message.c_str() << " : " << NumBinaryIOCycles
              << " Binary IO cycles completed " << std::endl;

    std::stringstream ss;
    const size_t NumTextFormatIOCycles = 100;

    time(&rawtime);
    timeinfo = localtime(&rawtime);
    message = std::string(asctime(timeinfo));
    message.erase(message.size() - 1);

    std::cout << message.c_str() << " : Text-format IO cycles started"
              << std::endl;

    // The text case: format every element, then parse it back.
    for (size_t n = 0; n < NumTextFormatIOCycles; ++n)
    {
        ss.str("");   // start each cycle with an empty buffer
        ss.clear();   // and a stream that is not in an eof/fail state

        for (size_t m = 0; m < NumElements; ++m)
            ss << SourceBuffer[m] << ' ';   // separator so values can be parsed back

        size_t m = 0;
        while (m < NumElements && ss >> DestinationBuffer[m])
        {
            ++m;
        }
    }

    time(&rawtime);
    timeinfo = localtime(&rawtime);
    message = std::string(asctime(timeinfo));
    message.erase(message.size() - 1);

    std::cout << message.c_str() << " : " << NumTextFormatIOCycles
              << " Text-format IO cycles completed " << std::endl;

    return 0;
}


Brian Wood
 

Brian Wood

I took the liberty of copying your post from clc++m to here
as this newsgroup is faster as far as getting the posts out
there.

Hi all.

A couple of weeks ago I posted a question on comp.lang.c++ about a
technicality of binary file IO. Over the course of the discussion, I
discovered to my amazement - and, quite frankly, horror - that there
seems to be a school of thought that text-based storage formats are
universally preferable to binary formats for reasons of portability
and human readability.

That seems to me an inaccurate description of this thread.
Kanze has pointed out the strengths of text formats, but
has also noted that there are times when binary formats
are needed. Who has been saying that text formats are
"universally preferable" to binary formats?


Brian Wood
 

James Kanze

On 8 Nov, 15:27, James Kanze <[email protected]> wrote:
I'm getting tired with re-iterating this for people who
are not interested in actually evaluating the numbers.

I actually did some measurements to check the numbers. Your
numbers were wrong. More to the point, actual numbers will vary
enormously from one implementation to the next.
Look for an upcoming post on comp.lang.c++.moderated,

Not everyone reads that group. Not everyone agrees with its
moderation policy (as currently practiced).
 
