ts8807385
Hey guys,
I have a process that discards files based on their file header. This
works fine, but with a lot of files (millions) it's slow. Right now I
open each file and push the first ten bytes into a vector called
'header_bytes': I call fd.get() ten times, incrementing an int and
calling push_back on the vector each time.
I then have a chain of if statements, similar to the code below, for
about 12 common file headers (JPEG, PNG, WAV, RIFF, etc.) that I want
to exclude from further processing:
    if (byte1 == 10 and byte2 == 14 and byte3 == 12)
        return false;
    else if ()
        return false;
    else if ()
        return false;
    else
        //process the file further
        return true;
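To make the description concrete, here is a minimal sketch of the same check written as a table lookup, with one read() call in place of ten get() calls. The names should_process, should_process_file, kExcluded, and kHeaderSize are hypothetical. The three signatures are well-known examples only, not the full list of 12, and treating an unreadable file as "skip" is an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical table of excluded signatures (illustrative, not the full list).
static const std::vector<std::vector<unsigned char>> kExcluded = {
    {0xFF, 0xD8, 0xFF},                             // JPEG
    {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A},  // PNG
    {'R', 'I', 'F', 'F'},                           // RIFF container (WAV, AVI)
};

constexpr std::size_t kHeaderSize = 10;  // same ten bytes as in the post

// Returns false when the header starts with any excluded signature,
// true when the file should be processed further.
bool should_process(const unsigned char* header, std::size_t len) {
    for (const auto& sig : kExcluded) {
        if (sig.size() <= len &&
            std::equal(sig.begin(), sig.end(), header)) {
            return false;
        }
    }
    return true;
}

// Reads the header with a single read() call into a fixed-size buffer.
bool should_process_file(const std::string& path) {
    std::ifstream fd(path, std::ios::binary);
    if (!fd) {
        return false;  // assumption: unreadable files are skipped
    }
    std::array<unsigned char, kHeaderSize> header{};
    fd.read(reinterpret_cast<char*>(header.data()),
            static_cast<std::streamsize>(header.size()));
    const auto got = static_cast<std::size_t>(fd.gcount());  // may be < 10
    return should_process(header.data(), got);
}
```

Whether this changes the run time at millions of files is untested here; the cost of opening each file may dominate.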
As I said, this works fine. When I only have to process a few thousand
files, I'm done quickly. How can I speed it up?
Thanks,
Tom