ifstream speed


Frank Neuhaus

Hi,

I have a large file that I am opening with a std::ifstream. This file
contains a number of objects. My classes know how to deserialize from
a std::istream, so right now I am just passing this std::ifstream to
my class constructors and they read themselves from the stream. Those
classes read their members directly from the stream (i.e. they don't read
a number of bytes into a buffer and then extract their data from there
or anything). Unfortunately I have the impression that my current
approach is somewhat slow. I suspect that the ifstream is not
buffering effectively. I would expect it to read a big chunk into
memory, and then have my deserialization basically work right inside
memory. Could this be? What else could be the cause of the slowdown?
How could I make this faster?

Thank you
 

Juha Nieminen

Frank said:
What else could be the cause of the slowdown?
How could I make this faster?

Although I have not measured in many years whether this has changed with
more modern compilers, at least years ago C++ streams were
significantly slower than C streams in most (if not all) compilers.
I'm not exactly sure why this is so.

Two completely different projects I have been involved in saw a very
significant increase in reading and writing speed when the usage of C++
streams was changed to C streams.

I recommend that you write a small test program on your system which
does the same thing with lots of input (or output) data using C++
streams and then C streams, and measure whether there is a significant
difference in speed. If there is, then your actual program might require
a refactoring.
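
A minimal sketch of such a test program, under some assumptions: the data is a flat sequence of 32-bit integers, and the helper names (write_test_file, sum_with_ifstream, sum_with_fread, time_ms) are made up for illustration. Both readers fetch one value per call, which roughly mimics classes that read their members straight from the stream.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Writes `count` 32-bit integers to `path` so both readers see identical input.
void write_test_file(const std::string& path, std::uint32_t count) {
    std::ofstream out(path, std::ios::binary);
    assert(out.is_open());
    for (std::uint32_t i = 0; i < count; ++i)
        out.write(reinterpret_cast<const char*>(&i), sizeof i);
}

// Reads the file one integer at a time through std::ifstream.
std::uint64_t sum_with_ifstream(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::uint64_t sum = 0;
    std::uint32_t v = 0;
    while (in.read(reinterpret_cast<char*>(&v), sizeof v))
        sum += v;
    return sum;
}

// Reads the file one integer at a time through the C stdio API.
std::uint64_t sum_with_fread(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return 0;
    std::uint64_t sum = 0;
    std::uint32_t v = 0;
    while (std::fread(&v, sizeof v, 1, f) == 1)
        sum += v;
    std::fclose(f);
    return sum;
}

// Runs `fn` once and returns the elapsed wall-clock time in milliseconds.
template <class Fn>
double time_ms(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To use it, call write_test_file once with a large count, then pass each reader to time_ms, e.g. time_ms([&] { sum_with_ifstream(path); }). Running each measurement several times and comparing the returned sums helps separate file cache effects from library overhead.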
 

Marcel Müller

Frank said:
I have a large file that I am opening with an std::ifstream. This file
contains a number of objects. My classes know how to deserialize from
a std::istream, so right now, I am just passing this std::ifstream to
my class constructors and they read themselves from the stream [...]
I would expect it to read a big chunk into
memory, and then have my deserialization basically work right inside
memory. Could this be? What else could be the cause of the slowdown?
How could I make this faster?

Well, the iostream classes...

If you are talking about a really large amount of data, the chunks must be
on the order of a few megabytes to hide the awful access times of common
direct-access storage devices. Maybe your file system cache does the
job for you, maybe not. In any case, the default buffers of the I/O
libraries are not that large.

Have you checked whether the deserialization is mostly bound by CPU
or by I/O?
Furthermore, are you using portable I/O? This usually ends up
working byte by byte with many shift operations.
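
To illustrate what "portable I/O" tends to look like, here is a small sketch, assuming a little-endian on-disk format. The function names are hypothetical. The first reader makes one stream call per byte, the second makes one call per value and then applies the same shifts. Which one is faster on a given library is something to measure, not assume.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Portable decode: assembles a little-endian 32-bit value from four bytes,
// independent of the byte order of the host machine.
std::uint32_t decode_u32_le(const unsigned char* p) {
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Byte-by-byte pattern: four separate stream calls for one value.
// The caller should check the stream state afterwards.
std::uint32_t read_u32_bytewise(std::istream& in) {
    unsigned char b[4] = {0, 0, 0, 0};
    for (unsigned char& byte : b) {
        char c = 0;
        in.get(c);
        byte = static_cast<unsigned char>(c);
    }
    return decode_u32_le(b);
}

// Block pattern: a single read call, then the same portable decode.
std::uint32_t read_u32_block(std::istream& in) {
    unsigned char b[4] = {0, 0, 0, 0};
    in.read(reinterpret_cast<char*>(b), sizeof b);
    return decode_u32_le(b);
}
```

Both functions return the same value for the same input, so they can be swapped in a deserializer to check how much of the time goes into per-byte stream calls.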

Some operating systems have a way of specifying sequential access to a
stream. This can significantly improve the cache efficiency and the
throughput. Unfortunately neither C nor C++ has a standard way to set
such flags.

As a start you might tweak the buffer of the underlying filebuf
(member function setbuf, reachable through the public pubsetbuf).
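
A hedged sketch of the setbuf idea: the standard leaves the effect of handing a filebuf a user buffer implementation-defined, and at least some implementations (GNU libstdc++ is one, as far as I know) only honor the request when it is made before the file is opened. The helper name open_with_buffer is hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Asks the filebuf of `in` to use `buffer`, then opens `path` for binary reading.
// `buffer` must stay alive (and must not be resized) for as long as the stream is in use.
bool open_with_buffer(std::ifstream& in,
                      const std::string& path,
                      std::vector<char>& buffer) {
    assert(!in.is_open());  // the request is made before open() on purpose
    in.rdbuf()->pubsetbuf(buffer.data(),
                          static_cast<std::streamsize>(buffer.size()));
    in.open(path, std::ios::binary);
    return in.is_open();
}
```

Typical use would be a std::vector<char> of a few megabytes declared next to a default-constructed std::ifstream, followed by a call to open_with_buffer. Whether a larger buffer changes throughput at all depends on the library and the operating system, so it is worth timing.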


Marcel
 

Frank Neuhaus

Marcel Müller said:
Frank said:
I have a large file that I am opening with an std::ifstream. This file
contains a number of objects. My classes know how to deserialize from
a std::istream, so right now, I am just passing this std::ifstream to
my class constructors and they read themselves from the stream [...]
I would expect it to read a big chunk into
memory, and then have my deserialization basically work right inside
memory. Could this be? What else could be the cause of the slowdown?
How could I make this faster?

Well, the iostream classes...

If you are talking about a really large amount of data, the chunks must be
on the order of a few megabytes to hide the awful access times of common
direct-access storage devices. Maybe your file system cache does the
job for you, maybe not. In any case, the default buffers of the I/O libraries
are not that large.

Have you checked whether the deserialization is mostly bound by CPU
or by I/O?
Furthermore, are you using portable I/O? This usually ends up working
byte by byte with many shift operations.

Hm, it's a bit hard to benchmark in my app, but I strongly believe it was I/O.

Marcel Müller said:
Some operating systems have a way of specifying sequential access to a
stream. This can significantly improve the cache efficiency and the
throughput. Unfortunately neither C nor C++ has a standard way to set such
flags.

As a start you might tweak the buffer of the underlying filebuf (member
function setbuf, reachable through the public pubsetbuf).

I tried that with no success (I changed it to a 5 MB buffer size, but the
performance didn't improve).
I just replaced the stream with a custom class that asynchronously reads
chunks of 10 MB using an additional I/O thread (all with fopen/fread/...).
Now the performance is OK. Note to self: don't use iostream again for large
amounts of data...
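
The custom class itself was not posted. The following is only a rough sketch of the general idea (background chunk reads with fopen/fread), using std::async for the extra I/O thread. The class name PrefetchingReader and its interface are assumptions for illustration, not the code described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <future>
#include <string>
#include <vector>

// Hypothetical reader: while the caller consumes one chunk, the next chunk
// is already being read on a background thread.
class PrefetchingReader {
public:
    PrefetchingReader(const std::string& path, std::size_t chunk_size)
        : file_(std::fopen(path.c_str(), "rb")), chunk_size_(chunk_size) {
        assert(chunk_size_ > 0);
        if (file_) start_prefetch();
    }
    ~PrefetchingReader() {
        if (pending_.valid()) pending_.wait();  // let the background read finish
        if (file_) std::fclose(file_);
    }
    PrefetchingReader(const PrefetchingReader&) = delete;
    PrefetchingReader& operator=(const PrefetchingReader&) = delete;

    bool is_open() const { return file_ != nullptr; }

    // Copies up to `n` bytes into `dst`; returns the number of bytes copied.
    // A return value smaller than `n` means end of file (or a read error).
    std::size_t read(void* dst, std::size_t n) {
        char* out = static_cast<char*>(dst);
        std::size_t done = 0;
        while (done < n) {
            if (pos_ == current_.size() && !next_chunk()) break;
            std::size_t take = std::min(n - done, current_.size() - pos_);
            std::memcpy(out + done, current_.data() + pos_, take);
            pos_ += take;
            done += take;
        }
        return done;
    }

private:
    // Launches a background task that reads one chunk with fread.
    void start_prefetch() {
        pending_ = std::async(std::launch::async, [this] {
            std::vector<char> buf(chunk_size_);
            buf.resize(std::fread(buf.data(), 1, buf.size(), file_));
            return buf;
        });
    }
    // Takes over the chunk read in the background and starts the next read.
    bool next_chunk() {
        if (!pending_.valid()) return false;
        current_ = pending_.get();
        pos_ = 0;
        if (current_.empty()) return false;  // nothing more to read
        start_prefetch();
        return true;
    }

    std::FILE* file_;
    std::size_t chunk_size_;
    std::vector<char> current_;
    std::size_t pos_ = 0;
    std::future<std::vector<char>> pending_;
};
```

A deserializer would then call reader.read(&member, sizeof member) instead of going through an istream. Only one background read is in flight at a time, so the FILE handle is never touched by two threads at once. The chunk size (10 MB in the post above) is a tuning parameter.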

Thanks
Frank
 
