output to a file (stream performance)

Lee

Hi,

I'm a stream virgin and am attempting to output strings to a file. My
approach is to write the strings to a 'stringstream' first and, only when
complete, write the stringstream to the file (ofstream).

The process works fine but appears to be rather slow. For example,
outputting about 2 MB of data takes a couple of minutes (most of the time
appears to be spent writing to the stringstream), and as I'm creating several
hundred files in one run the whole thing can run into hours. Oh, the files
are on my hard drive, so there are no network performance issues.

Is there anything basic I'm doing wrong (see code below), or does anyone have
any suggestions to improve this performance?

Thanks in advance
Lee


stringstream m_outputFileReference;
ofstream m_outputFile;

// initialisation for each file produced
m_outputFile.open(fullPathName, fstream::out);
m_outputFileReference.str("");
m_outputFileReference.clear();

// output data using the format below
m_outputFileReference << CC_NUM_START_OPEN << tempStr << CC_NUM_START_CLOSE;

// when complete, send the stringstream to the file
m_outputFile << m_outputFileReference.str();
m_outputFile.close();
 
benben

Lee said:
[snip]

Why not just:

ofstream m_outputFile(fullPathName, fstream::out);
m_outputFile << CC_NUM_START_OPEN << tempStr << CC_NUM_START_CLOSE;

Regards,
Ben
 
Dietmar Kuehl

Lee said:
My approach is to write the string initially to a 'stringstream' and only
when complete write the stringstream to the file (ofstream).

Why though? You could immediately write to the file.

Lee said:
The process works fine however appears to be rather slow.

It probably depends on the implementation of the stream buffer
underlying the string stream: some implementations stick to the
[original] words in the standard, which mandate that the buffer
in a string stream shall increase by just one position, and/or
they implement the buffer to grow by some fixed amount, e.g.
128 characters (yes, I have seen this in actual code of a
commercial 'basic_stringbuf' implementation). This causes
the string stream to spend most of its time copying
increasingly large chunks of memory around.

The obvious work-around is to avoid this by using a file stream
immediately and just streaming the data there. If this is not
an option for whatever reason, the best bet is to *not* use a
string stream but rather a stream based on a simple handcrafted
stream buffer which extends its internal buffer by some
factor, e.g. doubling the size whenever the buffer runs full.
I think I have posted the corresponding code in the past, but it
is not really hard to create anyway (just something like 20 lines
of code).
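
To make the cost difference between the two growth policies concrete, here is a small sketch that only counts how many bytes would have to be copied while appending data one character at a time. The function names and the cost model (every reallocation copies the whole current content) are assumptions made for illustration; they do not describe any particular library's 'basic_stringbuf'.

```cpp
#include <cassert>
#include <cstddef>

// Bytes copied when appending `total` bytes one at a time to a buffer that
// grows by a fixed `step` every time it runs full (hypothetical cost model:
// each reallocation copies everything stored so far).
unsigned long long copied_fixed_growth(std::size_t total, std::size_t step)
{
    assert(step > 0);
    std::size_t capacity = step;
    unsigned long long copied = 0;
    for (std::size_t size = 0; size < total; ++size) {
        if (size == capacity) {      // buffer full: reallocate and copy
            copied += size;
            capacity += step;
        }
    }
    return copied;
}

// Same model, but the capacity doubles whenever the buffer runs full.
unsigned long long copied_doubling(std::size_t total, std::size_t initial)
{
    assert(initial > 0);
    std::size_t capacity = initial;
    unsigned long long copied = 0;
    for (std::size_t size = 0; size < total; ++size) {
        if (size == capacity) {      // buffer full: reallocate and copy
            copied += size;
            capacity *= 2;
        }
    }
    return copied;
}
```

Under this model, appending 2 MB (2097152 bytes) with a fixed step of 128 copies roughly 17 billion bytes in total, while doubling from 1024 copies about 2 million, i.e. less than the payload itself. That is the quadratic versus linear behaviour described above.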

Lee said:
stringstream m_outputFileReference;

There is no good reason to use a 'std::stringstream' in this
situation anyway: you want to use a 'std::ostringstream'. However,
this change will probably not remove your problem.

Lee said:
ofstream m_outputFile;

// initialisation for each file produced
m_outputFile.open(fullPathName, fstream::out);
m_outputFileReference.str("");
m_outputFileReference.clear();

You don't need to perform the above two operations: after
construction, the string stream is empty and in a 'good()'
state.
 
Lee

I tried this originally but when the files were across a network the
performance was even worse.

Lee
 
benben

Lee said:
I tried this originally but when the files were across a network the
performance was even worse.

Looks like the ofstream buffer isn't generous enough, and so it causes
lots of network traffic, if I am not mistaken.

If portability is not an issue, you may try platform-specific facilities,
which tend to offer more optimization features.

Using a stringstream as a buffer is a very odd solution.

Regards,
Ben
 
Lee

Many thanks for the reply. I tried using the file stream directly but when
the files were over a network the performance was even worse.

Using a handcrafted stream buffer sounds good. I'm not too sure exactly
how, though: can this be achieved by creating a new class derived from
streambuf and utilising the 'setbuf' function to control the buffer size?

thanks
Lee


 
Dietmar Kuehl

Lee said:
Using a handcrafted stream buffer sounds good. I'm not too sure exactly
how, though: can this be achieved by creating a new class derived from
streambuf and utilising the 'setbuf' function to control the buffer size?

'setbuf()' is the wrong tool: it has essentially no useful guarantees.
The only guarantee is that 'setbuf(0, 0)' makes a file stream
unbuffered. You might be able to set a buffer size that suits
your needs for file streams, but this is implementation specific.
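
For completeness, this is roughly what asking a file stream for a larger buffer looks like. It is only a sketch: the function name 'write_with_big_buffer' and the 1 MB default are made up for illustration, and whether the request has any effect is implementation-defined, so it is worth measuring before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Writes `data` to `path` through an ofstream that has been offered a
// larger buffer. The implementation is free to ignore the request.
bool write_with_big_buffer(const std::string& path, const std::string& data,
                           std::size_t buffer_size = 1 << 20)
{
    assert(buffer_size > 0);
    // Declared before the stream so that it outlives the stream's use of it.
    std::vector<char> buffer(buffer_size);

    std::ofstream out;
    // Make the request before opening the file; after I/O has started the
    // call is even less likely to be honoured.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path.c_str(), std::ios::out | std::ios::binary);
    if (!out)
        return false;

    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.close();            // flush while `buffer` is still alive
    return !out.fail();
}
```

A call such as write_with_big_buffer(fullPathName, text) would replace the open/write/close sequence; if it makes no measurable difference on a given platform, the request was most likely ignored.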

To create a useful surrogate for a string stream, you would derive a
class from 'std::streambuf' and essentially just override the
'overflow()' method to install more room. Essentially the code for
the stream buffer would look like this (note: the code is untested,
not even compiled):

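#include <algorithm>  // std::copy
#include <cstddef>    // std::ptrdiff_t
#include <streambuf>
#include <string>     // std::char_traits
#include <utility>    // std::swap

class mystringbuf:
    public std::streambuf   // public, so an ostream can use the buffer
{
public:
    enum { initial = 1024 };
    mystringbuf(): m_buffer(new char[initial])
    { this->setp(this->m_buffer, this->m_buffer + initial); }
    ~mystringbuf() { delete[] this->m_buffer; }

private:
    // the class owns a raw buffer, so copying is disabled
    // (declared but never defined)
    mystringbuf(mystringbuf const&);
    mystringbuf& operator=(mystringbuf const&);

    int_type overflow(int_type c)
    {
        // double the buffer when it is full
        if (this->pptr() == this->epptr())
        {
            std::ptrdiff_t size = this->pptr() - this->pbase();
            char* tmp = new char[2 * size];
            std::copy(this->pbase(), this->pptr(), tmp);
            this->setp(tmp, tmp + 2 * size);
            // setp() resets the put pointer; move it back to the old end
            // (pbump() takes an int)
            this->pbump(static_cast<int>(size));
            std::swap(tmp, this->m_buffer);
            delete[] tmp;   // tmp now refers to the old buffer
        }

        // put the character into the buffer
        if (c != std::char_traits<char>::eof())
        {
            *this->pptr() = std::char_traits<char>::to_char_type(c);
            this->pbump(1);
        }

        // signal success
        return std::char_traits<char>::not_eof(c);
    }
    char* m_buffer;
};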

You would, of course, still need some method to access the buffer
but this can be anything to your liking. I would probably provide
a pair of iterators and/or a pointer to the start plus the current
size (this would be useful to pass it to 'sputn()' of the file
stream's 'rdbuf()').
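
As a rough illustration of the access idea just described, the sketch below adds a start pointer and a size to a doubling stream buffer and hands them to 'sputn()' on the file stream's 'rdbuf()'. It is a variation, not the code above: the names 'growing_buf', 'data()', 'size()' and 'dump_to_file' are invented here, storage is a std::vector<char> rather than a raw array, and C++11 is assumed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <fstream>
#include <ios>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Write-only stream buffer that doubles its storage whenever it runs full.
class growing_buf : public std::streambuf
{
public:
    explicit growing_buf(std::size_t initial = 1024)
        : m_storage(initial ? initial : 1)
    {
        setp(m_storage.data(), m_storage.data() + m_storage.size());
    }
    growing_buf(const growing_buf&) = delete;
    growing_buf& operator=(const growing_buf&) = delete;

    // Access to what has been written so far: start pointer plus size.
    const char* data() const { return pbase(); }
    std::size_t size() const
    { return static_cast<std::size_t>(pptr() - pbase()); }

protected:
    int_type overflow(int_type c) override
    {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);   // nothing to store
        if (pptr() == epptr()) {
            const std::size_t used = size();
            assert(used <= static_cast<std::size_t>(INT_MAX)); // pbump takes int
            m_storage.resize(m_storage.size() * 2);  // may move the data
            setp(m_storage.data(), m_storage.data() + m_storage.size());
            pbump(static_cast<int>(used));           // restore put position
        }
        *pptr() = traits_type::to_char_type(c);
        pbump(1);
        return c;
    }

private:
    std::vector<char> m_storage;
};

// Copies everything collected in `buf` to the file at `path` with a single
// sputn() call on the file stream's buffer.
bool dump_to_file(const growing_buf& buf, const std::string& path)
{
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    if (!out)
        return false;
    const std::streamsize n = static_cast<std::streamsize>(buf.size());
    const std::streamsize written = out.rdbuf()->sputn(buf.data(), n);
    out.flush();
    return written == n && out.good();
}
```

Usage would look like: construct a growing_buf, wrap it with std::ostream os(&buf), stream the data into os with the usual << operators, and call dump_to_file(buf, fullPathName) once the content is complete.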
 
Alex Vinokur

Lee said:
Hi,

I'm a stream virgin and am attempting to output strings to a file. My
approach is to write the string initially to a 'stringstream' and only when
complete write the stringstream to the file (ofstream). [snip]
Is there anything basic I'm doing wrong (see code below) or does anyone have
any suggestions to improve this performance.
[snip]

http://groups.google.com/group/perfo/msg/8273f4d1a05cfbd1 contains
various test suites that measure the comparative performance of "Reading file
into string".
Perhaps it is worth building similar test suites to measure the comparative
performance of "Writing string to file", for instance.
 
