Fastest way to read from a file into a vector<unsigned char>

sshock

Hi all,

I want to read from a file into a vector<unsigned char>. Right now my
code looks like this:

FILE* f = fopen( "datafile", "rb" );
enum { SIZE = 100 };
vector<unsigned char> buf(SIZE);
fread(&buf[0], 1, SIZE, f);

The problem is that the vector's constructor initializes the buffer to
all zeroes. I don't want it to initialize to all zeroes. It is
pointless and a waste of time since I will just be reading in from the
file overtop of it.

So, does anyone know how I could eliminate the initialization of the
vector (without switching to a raw array; I really want a vector here)?
I tried to do some things with reserve(), but they didn't help.

Thanks,
Phillip Hellewell
 
Victor Bazarov

: I want to read from a file into a vector<unsigned char>. Right now my
: code looks like this:
:
: FILE* f = fopen( "datafile", "rb" );
: enum { SIZE = 100 };
: vector<unsigned char> buf(SIZE);
: fread(&buf[0], 1, SIZE, f);
:
: The problem is that the vector's constructor initializes the buffer to
: all zeroes. I don't want it to initialize to all zeroes. It is
: pointless and a waste of time since I will just be reading in from the
: file overtop of it.
:
: So, does anyone know how I could eliminate the initialization of the
: vector (without switching to a raw array; I really want a vector
: here)? I tried to do some things with reserve(), but they didn't help.

You could read it into an array and then initialise your vector with it.

As for "fastest", you'd have to clock it. There is no way to tell unless
a special tool (a profiler) is involved.

V
 
sshock

Surely making a copy of an array is slower than a memset.

I want a faster solution, not a slower one.

Short of implementing my own byte_vector class, does anyone have a
solution to remove the unnecessary initialization of the vector?
 
Victor Bazarov

: Surely making a copy of an array is slower than a memset.
:
: I want a faster solution, not a slower one.
:
: Short of implementing my own byte_vector class, does anyone have a
: solution to remove the unnecessary initialization of the vector?

Initialise it from the stream buffer directly, or from the extractor
iterator (like "istream_iterator" or something).
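
A minimal sketch of the "stream buffer directly" idea, assuming that std::istreambuf_iterator is what is meant here. The helper name read_all is hypothetical and not from the thread. The vector is built from the iterator pair, so no zero-filled buffer is created first.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): reads a whole file into a
// vector without value-initialising the elements beforehand.
std::vector<unsigned char> read_all(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::vector<unsigned char>();  // empty on open failure

    // istreambuf_iterator pulls raw bytes from the stream buffer and,
    // unlike istream_iterator, never skips whitespace.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

These are single-pass input iterators, so the vector cannot know the final size up front and may reallocate as it grows. Whether this beats the zero-fill plus fread() version would have to be measured.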

V
 
Ian Collins

: Surely making a copy of an array is slower than a memset.
:
: I want a faster solution, not a slower one.
:
: Short of implementing my own byte_vector class, does anyone have a
: solution to remove the unnecessary initialization of the vector?

If your system supports memory-mapping a file (mmap), write a simple
input iterator object to iterate over an array of unsigned char. Map
your file, point a pair of iterators at the beginning and end and
construct the vector from the two iterators.

Might be quicker, might not.

Benchmark.
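
A rough sketch of the mapping approach, assuming a POSIX system (open, fstat, mmap). The helper name read_mapped is hypothetical. Plain pointers into the mapping already serve as iterators, so this sketch uses them instead of a custom iterator class.

```cpp
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

#include <cstddef>
#include <vector>

// Hypothetical helper, POSIX only: maps the file read-only and copies it
// into a vector. Returns an empty vector on failure or for an empty file.
std::vector<unsigned char> read_mapped(const char* path)
{
    std::vector<unsigned char> result;

    const int fd = open(path, O_RDONLY);
    if (fd == -1)
        return result;

    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        const std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            const unsigned char* first = static_cast<const unsigned char*>(p);
            // Pointers are random-access iterators, so the vector can
            // allocate once and copy the bytes in a single pass.
            result.assign(first, first + len);
            munmap(p, len);
        }
    }
    close(fd);
    return result;
}
```

The copy into the vector still touches every byte once. If the data only needs to be read, working on the mapping directly would avoid even that.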
 
Ivan Vecerina

: Hi all,
:
: I want to read from a file into a vector<unsigned char>. Right now my
: code looks like this:
:
: FILE* f = fopen( "datafile", "rb" );
: enum { SIZE = 100 };
: vector<unsigned char> buf(SIZE);
: fread(&buf[0], 1, SIZE, f);
:
: The problem is that the vector's constructor initializes the buffer to
: all zeroes. I don't want it to initialize to all zeroes. It is
: pointless and a waste of time since I will just be reading in from the
: file overtop of it.

Believing that initializing the vector will slow things down is
probably a misconception. Writing memory takes very little processor
time, and will be handled in the cache. You won't even have a cache
flush before you overwrite the same memory anyway.
Relative cost, slowest first: disk I/O >> memory I/O >> cache I/O >> processor

: So, does anyone know how I could eliminate the initialization of the
: vector (without switching to a raw array; I really want a vector here)?
: I tried to do some things with reserve(), but they didn't help.

If you need top performance for a large file, using platform-specific
ways to map the file into memory is likely to provide the best results
- as Ian suggested.
Why is a vector needed in the first place?
It is a very safe bet to say that you have much more to win from other
optimizations than what you will gain by skipping the vector init.

Regards,
Ivan
 
Jacek Dziedzic

: Hi all,
:
: I want to read from a file into a vector<unsigned char>. Right now my
: code looks like this:
:
: FILE* f = fopen( "datafile", "rb" );
: enum { SIZE = 100 };
: vector<unsigned char> buf(SIZE);
: fread(&buf[0], 1, SIZE, f);
:
: The problem is that the vector's constructor initializes the buffer to
: all zeroes. I don't want it to initialize to all zeroes. It is
: pointless and a waste of time since I will just be reading in from the
: file overtop of it.

That's premature optimization. Have you actually timed your
code to see which part takes how much time? I bet you haven't,
because then you'd see that it's the disk I/O that takes
over 99% of the time -- trying to optimize away the vector
initialization is pointless.

Unless, of course, you read from a RAM disk of some sort
or some other device that is way faster than the fastest HDDs
around.

: So, does anyone know how I could eliminate the initialization of the
: vector (without switching to a raw array; I really want a vector here)?
: I tried to do some things with reserve(), but they didn't help.

Time your routine, identify the bottleneck. I'm fairly certain
you will notice that the call(s) to fread() take(s) your time.
Others have suggested means to speed that up already.
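
One possible way to do that timing, sketched with std::chrono (C++11). The ReadTiming struct and time_read function are hypothetical names, not from the thread. The sketch times the vector construction and the fread() call separately.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical result type: time spent in each phase, plus bytes read.
struct ReadTiming {
    std::chrono::nanoseconds init;  // constructing (zero-filling) the vector
    std::chrono::nanoseconds read;  // the fread() call
    std::size_t bytes;              // bytes actually read
};

// Hypothetical helper: returns all-zero timings if the file cannot be opened.
ReadTiming time_read(const char* path, std::size_t size)
{
    typedef std::chrono::steady_clock clock;
    ReadTiming t = { std::chrono::nanoseconds(0), std::chrono::nanoseconds(0), 0 };

    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return t;

    const clock::time_point t0 = clock::now();
    std::vector<unsigned char> buf(size);           // phase 1: zero-fill
    const clock::time_point t1 = clock::now();
    if (size > 0)
        t.bytes = std::fread(&buf[0], 1, size, f);  // phase 2: file read
    const clock::time_point t2 = clock::now();
    std::fclose(f);

    t.init = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
    t.read = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1);
    return t;
}
```

A single run is noisy, and a second run on the same file will usually be served from the OS cache, so the two phases should be compared over many repetitions and with optimization enabled.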

HTH,
- J.
 
Markus Schoder

: Hi all,
:
: I want to read from a file into a vector<unsigned char>. Right now my
: code looks like this:
:
: FILE* f = fopen( "datafile", "rb" );
: enum { SIZE = 100 };
: vector<unsigned char> buf(SIZE);
: fread(&buf[0], 1, SIZE, f);
:
: The problem is that the vector's constructor initializes the buffer to
: all zeroes. I don't want it to initialize to all zeroes. It is
: pointless and a waste of time since I will just be reading in from the
: file overtop of it.
:
: So, does anyone know how I could eliminate the initialization of the
: vector (without switching to a raw array; I really want a vector here)?
: I tried to do some things with reserve(), but they didn't help.

What did you try with reserve()? Something like the following?

vector<unsigned char> buf;
buf.reserve(SIZE);
flockfile(f);  // POSIX: lock the stream once so getc_unlocked() is safe
for(size_t i = 0; i < SIZE; ++i)
{
    const int r = getc_unlocked(f);
    if(r == EOF)
        break;
    buf.push_back(static_cast<unsigned char>(r));
}
funlockfile(f);

This avoids the vector initialization. Clearly there are other costs,
however. You would need to test it (with optimization enabled) to see
whether it is faster overall.
 
Richard Herring

Jacek said:
: : Hi all,
: :
: : I want to read from a file into a vector<unsigned char>. Right now my
: : code looks like this:
: :
: : FILE* f = fopen( "datafile", "rb" );
: : enum { SIZE = 100 };
: : vector<unsigned char> buf(SIZE);
: : fread(&buf[0], 1, SIZE, f);
: :
: : The problem is that the vector's constructor initializes the buffer to
: : all zeroes. I don't want it to initialize to all zeroes. It is
: : pointless and a waste of time since I will just be reading in from the
: : file overtop of it.
:
: That's premature optimization. Have you actually timed your
: code to see which part takes how much time? I bet you haven't,
: because then you'd see that it's the disk I/O that takes
: over 99% of the time -- trying to optimize away the vector
: initialization is pointless.
:
: Unless, of course, you read from a RAM disk of some sorts
: or some other device that is way faster than the fastest HDDs
: around.

Or the OS kindly takes care of read-ahead caching of the file access for
you.

I have experienced exactly this problem, and determined by profiling
that, to my surprise, vector initialisation was indeed taking a large
fraction of the time. (Typical size of the data being read was something
like a megabyte.)

I gave up and wrote my own simple "lightweight vector" class - basically
just a pointer and size and capacity counter.
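
For illustration only, a "pointer, size and capacity" buffer along those lines might look like the sketch below. This is not the class from the post. The name byte_buffer and its whole interface are assumptions. The point is that new unsigned char[n] default-initialises, which for unsigned char means the bytes are not zeroed.

```cpp
#include <cstddef>

// Hypothetical minimal byte buffer (not the class described in the post):
// a pointer, a size and a capacity, with no element initialisation.
class byte_buffer {
public:
    explicit byte_buffer(std::size_t capacity)
        : data_(new unsigned char[capacity]),  // bytes left indeterminate
          size_(0), capacity_(capacity) {}
    ~byte_buffer() { delete[] data_; }

    // Non-copyable, to keep ownership of the array simple.
    byte_buffer(const byte_buffer&) = delete;
    byte_buffer& operator=(const byte_buffer&) = delete;

    unsigned char* data() { return data_; }
    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

    // Call after writing n bytes through data(); n is clamped to capacity.
    void set_size(std::size_t n) { size_ = n < capacity_ ? n : capacity_; }

    const unsigned char* begin() const { return data_; }
    const unsigned char* end() const { return data_ + size_; }

private:
    unsigned char* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

With the original example this would be used roughly as byte_buffer buf(SIZE); buf.set_size(fread(buf.data(), 1, buf.capacity(), f)); so that only the bytes actually read are ever exposed through begin() and end().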
 
Paul Dubuc

Victor said:
: Initialise it from the stream buffer directly, or from the extractor
: iterator (like "istream_iterator" or something).

Yes, try something like this:

// Requires <cstdlib>, <fstream>, <iostream>, <iterator> and <vector>.
std::vector<unsigned char> buf;
std::ifstream strm("datafile", std::ios_base::binary);
if (!strm)
{
    std::cerr << "cannot open file" << std::endl;
    std::exit(EXIT_FAILURE);  // exit() requires a status argument
}
strm.unsetf(std::ios_base::skipws);  // keep whitespace bytes
std::istream_iterator<unsigned char> isi(strm), isiEOF;
buf.assign(isi, isiEOF);
if (!strm.eof()) std::cerr << "read error" << std::endl;
 
