fstream Buffers

Ian Collins

I tried to google "fstream buffer" with great confusion and little
understanding of what I'd need to do to change the code below to use
explicit buffers, hoping to get a better performance match to my old
processing.
I'd like to use C++ strings and I/O streams if possible, but I wonder
if the overhead to use these functions can match older C processing
(fgets, setbuf, etc.) at all. The following seems probable:
1. The overhead to size and allocate/release a string variable for each
logical record will always be slower than use of an explicit char
[some_size] for the text string being read from the input file.
2. The "<<" operators are probably much slower than "fputs".
Are my assumptions valid? Are there options or function calls I can
use to specify fstream buffers (and how are they used?)? Please advise.
TIA

//////////////////// Code ////////////////////
string line;
fstream fVar1, fVar2;
fVar1.open("pat12.n00", fstream::in);
if(fVar1.is_open())
{
fVar2.open("test.txt", fstream::out);
while(getline(fVar1, line))
{
fVar2<< line<< endl;
}
fVar1.close(), fVar2.close();
}
else cout<< "Unable to open file";
//////////////////////////////////////////////

You haven't provided enough information (anywhere in the thread) to get
a meaningful answer.

What is the code that runs so much faster?

Is the data format line or record based?

The code above _is_ the code that runs much slower, and the code that
I'm basing it on uses fgets/fputs and setbuf with a size of 4096
characters/bytes. That is the only distinction between the 2 code
fragments, but including the I/O class definitions I wrote to prepare
and handle the fgets/fputs with FILE* and open/close logic is too much
to post here.
Bottom line: the executing code is only comparing FILE*
fgets/fputs/setbuf with the fstream getline/<< code above. The only
difference I can see is that the absence of a "setbuf" capability with
fstreams is an enormous performance hit...or getline/<< is a terrible
way to do text file I/O. 8<{{

You still haven't provided enough information for anyone to validate
your results.

These things are tricky to test. For instance, I have to keep using
new files, or use a file > 16GB to avoid the OS file cache. So any code
that compares reading the same file will be biased to the second test.

By the way, on my Solaris box, comparing

while( (fgets( buf, bufSize, from ) ) )
{
lines++;
fputs( buf, to );
}

to

while( std::getline(in, line) )
{
lines++;
out << line << '\n';
}

shows the iostream version to take a little under twice as long as the C
version. Using a custom streambuf with a 32K buffer narrows this slightly.
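
For anyone wanting to try a larger buffer without writing a custom streambuf, here is a minimal sketch that swaps in a simpler technique: asking the stream's own filebuf to use a caller-supplied buffer through pubsetbuf. The function name copy_lines and the buffer_size parameter are made up for illustration. What pubsetbuf does with a non-null buffer is implementation-defined, and some libraries only honour the request if it is made before the file is opened, so treat this as something to measure rather than a guaranteed win. It also writes '\n' instead of std::endl, because std::endl flushes the stream after every line.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Copies src to dst line by line and returns the number of lines copied
// (0 if either file cannot be opened). copy_lines and buffer_size are
// hypothetical names; 32 KiB mirrors the figure quoted in the post above.
std::size_t copy_lines(const char* src, const char* dst,
                       std::size_t buffer_size = 32 * 1024)
{
    assert(buffer_size > 0);

    // Declared before the streams so the buffers outlive them.
    std::vector<char> in_buf(buffer_size), out_buf(buffer_size);

    std::ifstream in;
    std::ofstream out;

    // Request the buffers before open(). The effect is implementation-defined,
    // and a library is free to ignore the request.
    in.rdbuf()->pubsetbuf(in_buf.data(),
                          static_cast<std::streamsize>(in_buf.size()));
    out.rdbuf()->pubsetbuf(out_buf.data(),
                           static_cast<std::streamsize>(out_buf.size()));

    in.open(src);
    out.open(dst);
    if (!in || !out) return 0;

    std::string line;        // one string object, reused for every record
    std::size_t lines = 0;
    while (std::getline(in, line)) {
        ++lines;
        out << line << '\n'; // '\n' rather than std::endl: no flush per line
    }
    return lines;
}
```

Calling copy_lines("pat12.n00", "test.txt") would copy the file from the original question. Note that a final line without a trailing newline gains one in the output.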
 
jamin.hanson

Bottom line: the executing code is only comparing FILE*
fgets/fputs/setbuf with the fstream getline/<< code above. The only
difference I can see is that the absence of a "setbuf" capability with
fstreams is an enormous performance hit...or getline/<< is a terrible
way to do text file I/O. 8<{{

OK, here is my opening gambit for provoking more focused discussion. I wondered the same thing, and the subject has come up a couple of times from users of my library (http://www.benhanson.net/lexertl.html). Personally, I've only needed to use read() and gcount() on istream-derived classes, so the following code is based around that assumption:

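#include <cstdio>
#include <streambuf>
#include <string>

template<typename CharT, class Traits>
class basic_fast_filebuf : public std::basic_streambuf<CharT, Traits>
{
public:
explicit basic_fast_filebuf (const char *filename_) :
_fp (::fopen(filename_, "r"))
{
}

virtual ~basic_fast_filebuf()
{
// fclose() on a null pointer is undefined, so check that fopen() succeeded.
if (_fp) ::fclose(_fp);
_fp = 0;
}

protected:
FILE *_fp;

// Only bulk reads are overridden, so read()/gcount() work, but getline() and
// operator>> go through underflow() and will see end-of-file immediately.
virtual std::streamsize xsgetn (CharT *ptr_, std::streamsize count_)
{
if (!_fp) return 0;
return static_cast<std::streamsize>(
::fread (ptr_, sizeof(CharT), static_cast<std::size_t>(count_), _fp));
}
};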

typedef basic_fast_filebuf<char, std::char_traits<char> > fast_filebuf;
typedef basic_fast_filebuf<wchar_t, std::char_traits<wchar_t> > wfast_filebuf;

I used the code like so:

fast_filebuf buf ("Unicode/PropList.txt");
std::istream if_(&buf);

etc.

I haven't tested to see if this really is faster, but presumably it couldn't get a lot simpler than that without forgoing C++ streams entirely.

Anyway, does that help at all? :)

Regards,

Ben
 
jamin.hanson

OK, I timed this and it was very slightly faster compiled in Release using VC++ 2010. On GCC 4.7.0, again a release build, it is slightly slower. I would look at replacing getline() with something similar to your original C code and re-timing.
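
One way to try the suggestion above is to drop getline() and move fixed-size blocks with read(), gcount() and write(), which is closer in spirit to a C loop over a char array. This is only a sketch under my own assumptions: copy_chunks and chunk_size are made-up names, the 4096 default echoes the buffer size quoted in the question, and the copy is block-based rather than line-based, so it is not a drop-in replacement if the lines need processing.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Copies src to dst in fixed-size chunks and returns the number of bytes
// copied (0 if either file cannot be opened). copy_chunks and chunk_size are
// hypothetical names used for illustration only.
std::size_t copy_chunks(const char* src, const char* dst,
                        std::size_t chunk_size = 4096)
{
    assert(chunk_size > 0);

    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return 0;

    std::vector<char> chunk(chunk_size);
    std::size_t total = 0;
    for (;;) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        // gcount() reports how many characters the last read() delivered,
        // which is less than chunk_size for the final partial block.
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        out.write(chunk.data(), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

Timing copy_chunks against the getline() loop on the same input should show how much of the gap comes from per-line string handling rather than from the stream buffer itself.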

Regards,

Ben
 
Luca Risolia

I have the following routine that determines the time required to
copy a large text file to another. I'm comparing this code to a class I
wrote (too large to post here) that does text file I/O with explicit
buffer size designation (4096 character buffers). The code below runs
12-13 times slower, although doing exactly the same work.
fVar1.open("pat12.n00", fstream::in);
if(fVar1.is_open())
{
fVar2.open("test.txt", fstream::out);
while(getline(fVar1, line))
{
fVar2<< line<< endl;
}
fVar1.close(), fVar2.close();
}

It's not really clear why you need the string in the above example, but
if you just want to copy large text files efficiently using fstream
buffers, do this:

ifstream in("file1");
ofstream out("file2");
out << in.rdbuf();

It performs better than my system cp, which is written in C:

$ time cp file1 file2

real 1m36.491s
user 0m0.040s
sys 0m4.220s

$ rm file2; time ./fstreamcp

real 1m25.223s
user 0m0.164s
sys 0m5.252s

$ ls -l file*
-rw-rw-r-- 1 luca luca 1073741824 mag 27 17:35 file1
-rw-rw-r-- 1 luca luca 1073741824 mag 27 17:57 file2

fstreamcp has been compiled with g++ -O3.
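
For completeness, the three-line snippet above can be wrapped into a function with basic error checking. This is a sketch rather than the exact fstreamcp source, which was not posted: the name copy_file, the binary open mode and the explicit flush are my own choices. One caveat worth knowing is that operator<< sets failbit on the output stream when no characters at all are inserted, so this version reports failure for an empty source file.

```cpp
#include <cassert>
#include <fstream>

// Copies src to dst through the stream buffers and returns true on success.
// copy_file is a hypothetical name; this is not the posted fstreamcp program.
bool copy_file(const char* src, const char* dst)
{
    assert(src != 0 && dst != 0);

    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;

    // Transfers everything available from in's buffer to out. If nothing is
    // inserted (for example, an empty source file), failbit is set on out.
    out << in.rdbuf();

    // Flush so that a late write error shows up in the stream state.
    out.flush();
    return out.good();
}
```

With this in place, copy_file("file1", "file2") performs the same transfer as the snippet above.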
 
Pavel

I concur with Scott. cp might have warmed up the cache, and fstreamcp capitalized
on that. To measure the difference in performance of the user-space code, you
probably *want* the cache warmed up (in both cases) so that file system I/O drops
out of the comparison (it's equally page-based at kernel level and
device-dependent-buffer-based at device level anyway).

On the other hand, looking at the "user" time in your result (which covers all the
C++ stream code in question and its 'cp' counterpart), you can see that 'cp' takes
less than a quarter of the 'user' time that 'fstreamcp' does (0.040s versus 0.164s).
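
The warm-cache idea can be sketched as a small harness: read the input once to pull it into the OS file cache, then time the code under test. The names drain and time_warm are invented for this example. It measures wall-clock time with std::chrono::steady_clock, not the 'user' figure that the time command reports, and it cannot guarantee that the file actually stays cached (for instance if it is larger than available memory).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>

// Reads the whole file and discards the data; returns the bytes read.
// drain is a hypothetical helper used only to warm the OS file cache.
std::size_t drain(const char* path)
{
    assert(path != 0);
    std::ifstream in(path, std::ios::binary);
    char chunk[4096];
    std::size_t total = 0;
    // The final read() is usually short: it fails, but gcount() is non-zero.
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0)
        total += static_cast<std::size_t>(in.gcount());
    return total;
}

// Warms the cache for path, then runs work() once and returns the elapsed
// wall-clock time in seconds. time_warm is a hypothetical name.
template <class Work>
double time_warm(const char* path, Work work)
{
    drain(path);  // warm-up pass, deliberately not timed
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Passing each copy routine to time_warm as a callable, with the same input file, puts both versions on the same warm-cache footing, so the order in which they run matters less.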

-Pavel
 
