pubsetbuf, filebuf and when to flush

nitroamos · Nov 17, 2005

i'm working on improving the IO for the software project i'm working on
to do two extra things. first, i'm going to add HDF5 functionality, and
second, add the ability to write binary output. the software is
computational science where a lot of stuff is going to be calculated
and written to a file. e.g. could be up to gigabytes of data.

my question has to do with the latter -- and how often i need to call
"flush" on the stream. what I want the software to do is store up
several entries in memory and then dump several megabytes (or some
optimal number) all at once since this is the optimal setting that
makes the most sense to me.

so how does the buffering in an fstream (or a filebuf) work?
specifically, if i just keep doing "write" commands, will it dump the
entire buffer when the buffer gets full? or will it only dump enough of
the buffer to make room for each "write"? can I rely on the standard
c++ libraries written for each machine to choose an optimal buffering
scheme and buffer size? it seems like at least some of these details
are implementation dependent... but i don't know which.

here's a reference:
http://www.cplusplus.com/ref/iostream/filebuf/
but nobody really discusses this issue that I can find any reference
to.

basically i'm wondering if i need to add the following to my code:
1) choose an optimal buffer size somehow
2) create and give it to the filebuf via pubsetbuf
3) figure out how many entries would fill my buffer

then in the process of the calculation
when I know the buffer is full, flush

so my question is, how much can i rely on the standard IO library to
handle this for me? i assume that each company who writes an IO library
for their machine would know the best way to handle it, but i just want
to be sure that none of their choices affect me.

thanks!

Amos.
nitroamos a t y a h o o

John Harrison · Nov 17, 2005

i'm working on improving the IO for the software project i'm working on
to do two extra things. first, i'm going to add HDF5 functionality, and
second, add the ability to write binary output. the software is
computational science where a lot of stuff is going to be calculated
and written to a file. e.g. could be up to gigabytes of data.

my question has to do with the latter -- and how often i need to call
"flush" on the stream. what I want the software to do is store up
several entries in memory and then dump several megabytes (or some
optimal number) all at once since this is the optimal setting that
makes the most sense to me.

so how does the buffering in an fstream (or a filebuf) work?
specifically, if i just keep doing "write" commands, will it dump the
entire buffer when the buffer gets full? or will it only dump enough of
the buffer to make room for each "write"?

Every implementation I've seen does the former.

can I rely on the standard
c++ libraries written for each machine to choose an optimal buffering
scheme and buffer size? it seems like at least some of these details
are implementation dependent... but i don't know which.

No, you can't. IO buffering is really done by the operating system and it
is difficult for portable code (like the C++ library) to take full
advantage. I would start with fstream using the default buffer size,
then try pubsetbuf, but if performance is really critical then you might
have to drop fstream and write your own stream classes which can use the
underlying operating system directly.

here's a reference:
http://www.cplusplus.com/ref/iostream/filebuf/
but nobody really discusses this issue that I can find any reference
to.

basically i'm wondering if i need to add the following to my code:
1) choose an optimal buffer size somehow
2) create and give it to the filebuf via pubsetbuf

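Before you do any IO (and, to be safe, before you open the file, since
some implementations ignore the call once the file is open), call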

file->rdbuf()->pubsetbuf(buffer, buffer_size);
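
A fuller sketch of that call, assuming the entries are doubles written in
binary. The function name write_with_custom_buffer and its parameters are
made up for illustration. What setbuf does with the buffer is
implementation-defined; libstdc++, for one, only honours it while the
filebuf is not yet associated with a file, which is why the sketch installs
the buffer before open().

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Writes `count` doubles to `path` through a caller-supplied stream buffer.
// Returns true if every write and the final close succeeded.
bool write_with_custom_buffer(const char* path, std::size_t count,
                              std::size_t buffer_size)
{
    // Declared before the stream so that it outlives the stream.
    std::vector<char> buffer(buffer_size);

    std::ofstream file;
    // Install the buffer before open(): some implementations ignore
    // pubsetbuf once the filebuf is associated with a file.
    file.rdbuf()->pubsetbuf(buffer.data(),
                            static_cast<std::streamsize>(buffer.size()));
    file.open(path, std::ios::binary | std::ios::trunc);
    if (!file) return false;

    for (std::size_t i = 0; i < count; ++i) {
        const double value = static_cast<double>(i);
        file.write(reinterpret_cast<const char*>(&value), sizeof value);
    }

    file.close();  // writes out whatever is still in the buffer
    return !file.fail();
}
```

For example, write_with_custom_buffer("out.bin", 1000000, 1 << 20) would
try a 1 MB buffer. The right size is something to find by timing on the
target machine, not something this sketch can tell you.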

3) figure out how many entries would fill my buffer

then in the process of the calculation
when I know the buffer is full, flush

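You don't need to flush. The filebuf writes its buffer out whenever it
fills, and closing the stream writes out whatever is left. The only time
to flush yourself is when the data has to be on its way to the file right
now, for example when your application is crashing and you want to make
sure that all output is written before the crash. Flushing after every
entry is going to seriously degrade performance.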
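
To see this for yourself, here is a small sketch (write_entries is a
made-up name, and the entries are assumed to be doubles). Both settings of
flush_each should produce the same file; the only difference is how often
the buffer is pushed out to the operating system, so timing the two
variants on your machine shows what the extra flushes cost.

```cpp
#include <cstddef>
#include <fstream>

// Writes `count` doubles to `path`. When `flush_each` is true the stream is
// flushed after every entry; otherwise the filebuf decides when to write.
// Returns true if every write and the final close succeeded.
bool write_entries(const char* path, std::size_t count, bool flush_each)
{
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file) return false;

    for (std::size_t i = 0; i < count; ++i) {
        const double value = static_cast<double>(i);
        file.write(reinterpret_cast<const char*>(&value), sizeof value);
        if (flush_each) file.flush();  // forces the buffer out every time
    }

    // No explicit flush is needed at the end: close() writes out anything
    // still buffered and sets failbit if that fails.
    file.close();
    return !file.fail();
}
```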

so my question is, how much can i rely on the standard IO library to
handle this for me? i assume that each company who writes an IO library
for their machine would know the best way to handle it, but i just want
to be sure that none of their choices affect me.

As I said, remember that the C++ IO library is likely to be cross
platform. Plus they're likely to be writing for a typical application, not
one that needs to write gigabytes of data.

john
