fstream Buffers

Ian Collins · May 20, 2012

I tried to google "fstream buffer" and came away with great confusion and little
understanding of what I'd need to do to change the code below to use
explicit buffers. I'm hoping to get performance closer to that of my old
processing.
I'd like to use C++ strings and I/O streams if possible, but I wonder
whether these functions can match the older C processing
(fgets, setbuf, etc.) at all. The following seems probable:
1. Sizing and allocating/releasing a string variable for each
logical record will always be slower than using an explicit char
[some_size] for the text being read from the input file.
2. The "<<" operators are probably much slower than "fputs".
Are my assumptions valid? Are there options or function calls I can
use to specify fstream buffers, and how are they used? Please advise.
TIA

//////////////////// Code ////////////////////
string line;
fstream fVar1, fVar2;
fVar1.open("pat12.n00", fstream::in);
if(fVar1.is_open())
{
    fVar2.open("test.txt", fstream::out);
    while(getline(fVar1, line))
    {
        fVar2<< line<< endl;
    }
    fVar1.close(), fVar2.close();
}
else cout<< "Unable to open file";
//////////////////////////////////////////////


You haven't provided enough information (anywhere in the thread) to get
a meaningful answer.

What is the code that runs so much faster?

Is the data format line or record based?


The code above _is_ the code that runs much slower, and the code that
I'm basing it on uses fgets/fputs and setbuf with a size of 4096
characters/bytes. That is the only distinction between the 2 code
fragments, but including the I/O class definitions I wrote to prepare
and handle the fgets/fputs with FILE* and open/close logic is too much
to post here.
Bottom line: the executing code is only comparing FILE*
fgets/fputs/setbuf with the fstream getline/<< code above. The only
difference I can see is that the absence of a "setbuf" capability with
fstreams is an enormous performance hit...or getline/<< is a terrible
way to do text file I/O. 8<{{

You still haven't provided enough information for anyone to validate
your results.

These things are tricky to test. For instance, I have to keep using
new files, or use a file > 16GB, to avoid the OS file cache. So any code
that compares reading the same file will be biased towards the second test.

By the way, on my Solaris box, comparing

while( (fgets( buf, bufSize, from ) ) )
{
    lines++;
    fputs( buf, to );
}

to

while( std::getline(in, line) )
{
    lines++;
    out << line << '\n';
}

shows the iostream version to take a little under twice as long as the C
version. Using a custom streambuf with a 32K buffer narrows this slightly.
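For the "how do I specify an fstream buffer" part of the question, the nearest standard hook to setbuf is rdbuf()->pubsetbuf(). The sketch below is one way to try it, not the code that was timed above: copy_lines_iostream and the 32K default are hypothetical. Whether and when pubsetbuf takes effect on a file stream is implementation-defined, so treat the call as a request and measure on your own toolchain.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: line-by-line copy with caller-supplied buffers.
// Returns the number of lines copied (0 if either file failed to open).
std::size_t copy_lines_iostream(const char* from, const char* to,
                                std::size_t buf_size = 32 * 1024) {
    // Declared before the streams so the buffers outlive them.
    std::vector<char> in_buf(buf_size), out_buf(buf_size);

    std::ifstream in;
    std::ofstream out;
    // Requested before open(); some implementations only honour the
    // call at a different point, or ignore it entirely.
    in.rdbuf()->pubsetbuf(in_buf.data(),
                          static_cast<std::streamsize>(in_buf.size()));
    out.rdbuf()->pubsetbuf(out_buf.data(),
                           static_cast<std::streamsize>(out_buf.size()));
    in.open(from);
    out.open(to);
    if (!in || !out) return 0;

    std::string line;  // reused, so its capacity is allocated once and kept
    std::size_t lines = 0;
    while (std::getline(in, line)) {
        ++lines;
        out << line << '\n';  // '\n' does not flush; std::endl would
    }
    return lines;
}
```

The original loop used endl, which flushes the output stream on every record; writing '\n' instead leaves flushing to the buffer and to close.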

jamin.hanson · May 21, 2012

OK, here is my opening gambit for provoking more focused discussion. I wondered the same thing, and the subject has been raised a couple of times by users of my library (http://www.benhanson.net/lexertl.html). Personally, I've only needed to use read() and gcount() on istream-derived classes, so the following code is based around that assumption:

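#include <cstddef>
#include <cstdio>
#include <streambuf>
#include <string>

// Input-only stream buffer that forwards bulk reads straight to fread().
// Only xsgetn() is overridden, so it serves istream::read()/gcount() only.
template<typename CharT, class Traits>
class basic_fast_filebuf : public std::basic_streambuf<CharT, Traits>
{
public:
    basic_fast_filebuf (const char *filename_) :
        _fp (0)
    {
        _fp = ::fopen(filename_, "r");
    }

    virtual ~basic_fast_filebuf()
    {
        // fopen() may have failed; fclose() must not be passed a null pointer.
        if (_fp) ::fclose(_fp);
        _fp = 0;
    }

protected:
    FILE *_fp;

    virtual std::streamsize xsgetn (CharT *ptr_, std::streamsize count_)
    {
        if (!_fp) return 0;
        return static_cast<std::streamsize>(
            ::fread (ptr_, sizeof(CharT), static_cast<std::size_t>(count_), _fp));
    }
};

typedef basic_fast_filebuf<char, std::char_traits<char> > fast_filebuf;
typedef basic_fast_filebuf<wchar_t, std::char_traits<wchar_t> > wfast_filebuf;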

I used the code like so:

fast_filebuf buf ("Unicode/PropList.txt");
std::istream if_(&buf);

etc.

I haven't tested to see if this really is faster, but presumably it couldn't get a lot simpler than that without foregoing C++ streams entirely.
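One caveat with overriding only xsgetn(): character-at-a-time extraction such as std::getline goes through underflow(), whose default implementation reports end-of-file. A variation that also works with getline could look like the sketch below. The class name stdio_inbuf, its is_open() member and the 32K default are hypothetical, and it uses C++11 syntax; it has not been timed.

```cpp
#include <cstddef>
#include <cstdio>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical input-only streambuf that refills a private buffer with
// fread(). Overriding underflow() is enough for getline, operator>> and
// read() to work, because the default xsgetn() pulls from the get area.
class stdio_inbuf : public std::streambuf {
public:
    explicit stdio_inbuf(const char* filename, std::size_t size = 32 * 1024)
        : fp_(std::fopen(filename, "rb")), buf_(size ? size : 1) {}

    ~stdio_inbuf() override { if (fp_) std::fclose(fp_); }

    stdio_inbuf(const stdio_inbuf&) = delete;
    stdio_inbuf& operator=(const stdio_inbuf&) = delete;

    bool is_open() const { return fp_ != nullptr; }

protected:
    int_type underflow() override {
        if (gptr() < egptr())                    // data still buffered
            return traits_type::to_int_type(*gptr());
        if (!fp_) return traits_type::eof();     // open failed
        const std::size_t n = std::fread(buf_.data(), 1, buf_.size(), fp_);
        if (n == 0) return traits_type::eof();   // end of file or error
        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::FILE* fp_;
    std::vector<char> buf_;
};
```

It is used the same way as the class above: construct the buffer, hand its address to a std::istream, and read from the stream.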

Anyway, does that help at all?

Regards,

Ben

jamin.hanson · May 22, 2012

OK, I timed this and it was very slightly faster compiled in Release using VC++ 2010. On GCC 4.7.0, again a release build, it is slightly slower. I would look at replacing getline() with something similar to your original C code and re-timing.

Regards,

Ben

jamin.hanson · May 27, 2012

If anyone is still following this thread, this looks like the ultimate way to access files that will fit in memory: http://en.wikibooks.org/wiki/Optimi...on_techniques/Input/Output#Memory-mapped_file
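For readers who want to see the shape of the memory-mapped approach, here is a minimal sketch assuming a POSIX system (open, fstat, mmap); it is not taken from the linked page, and count_lines_mmap is a hypothetical name. It only counts newlines, as a stand-in for whatever per-record work is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps the file read-only and counts '\n' bytes.
// Returns -1 if the file cannot be opened, examined or mapped.
long count_lines_mmap(const char* path) {
    const int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return -1; }
    if (st.st_size == 0) { ::close(fd); return 0; }  // mmap rejects length 0

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    const char* data = static_cast<const char*>(p);
    const long lines = static_cast<long>(std::count(data, data + len, '\n'));
    ::munmap(p, len);
    return lines;
}
```

The file contents appear as one contiguous range of bytes, so there is no per-line copy into a string or char array at all; the trade-off is that the code is no longer portable standard C++.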

Regards,

Ben

Luca Risolia · May 27, 2012

I have the following routine that determines the time required to
copy a large text file to another. I'm comparing this code to a class I
wrote (too large to post here) that does text file I/O with explicit
buffer size designation (4096 character buffers). The code below runs
12-13 times slower, although doing exactly the same work.

fVar1.open("pat12.n00", fstream::in);
if(fVar1.is_open())
{
    fVar2.open("test.txt", fstream::out);
    while(getline(fVar1, line))
    {
        fVar2<< line<< endl;
    }
    fVar1.close(), fVar2.close();
}

It's not really clear why you need the string in the above example, but
if you just want to copy large text files efficiently using fstream
buffers, do this:

ifstream in("file1");
ofstream out("file2");
out << in.rdbuf();

It performs better than my system cp, which is written in C:

$ time cp file1 file2

real 1m36.491s
user 0m0.040s
sys 0m4.220s

$ rm file2; time ./fstreamcp

real 1m25.223s
user 0m0.164s
sys 0m5.252s

$ ls -l file*
-rw-rw-r-- 1 luca luca 1073741824 mag 27 17:35 file1
-rw-rw-r-- 1 luca luca 1073741824 mag 27 17:57 file2

fstreamcp has been compiled with g++ -O3.

Luca Risolia · May 28, 2012

How did you flush file1 from the OS file cache before you ran fstreamcp?

With:

$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

Pavel · May 28, 2012

I concur with Scott. cp might have warmed up the cache, and fstreamcp capitalized
on that. Now, to measure the difference in performance of the user-space code, you
probably *want* the cache warmed up (in both cases) to take file system I/O out of
the picture (it's equally page-based at kernel level and
device-dependent-buffer-based at device level anyway).

On the other hand, looking at the "user" time in your results (which covers all the C++
stream code in question and its 'cp' counterpart), you can see that 'cp' uses
less than a quarter of the 'user' time that 'fstreamcp' does.

-Pavel
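A warm-cache comparison along these lines could be set up roughly as follows. This is a sketch only: warm_cache, time_it and the timing struct are hypothetical names, and std::clock() reports processor time for the whole process (user plus system on POSIX systems), so it only approximates the "user" figure printed by time.

```cpp
#include <chrono>
#include <ctime>
#include <fstream>
#include <utility>

struct timing {
    double wall_seconds;  // elapsed real time
    double cpu_seconds;   // processor time charged to this process
};

// Reads the file once and discards the data, so that later runs are
// likely to be served from the OS file cache. A missing file is a no-op.
inline void warm_cache(const char* path) {
    std::ifstream in(path, std::ios::binary);
    char chunk[65536];
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0) {
        // nothing to do with the data
    }
}

// Runs `work` once and reports both wall-clock and processor time.
template <typename F>
timing time_it(F&& work) {
    const auto wall0 = std::chrono::steady_clock::now();
    const std::clock_t cpu0 = std::clock();
    std::forward<F>(work)();
    const std::clock_t cpu1 = std::clock();
    const auto wall1 = std::chrono::steady_clock::now();

    timing t;
    t.wall_seconds = std::chrono::duration<double>(wall1 - wall0).count();
    t.cpu_seconds = static_cast<double>(cpu1 - cpu0) / CLOCKS_PER_SEC;
    return t;
}
```

Calling warm_cache on the input before each timed run puts both candidates on the same footing; comparing cpu_seconds rather than wall_seconds then isolates the cost of the library code from the cost of the disk.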
