Jason K
Let me preface this by saying this obviously isn't a C++ *language*
issue per se; rather probably an issue relating to quality of
implementation, unless I'm just misusing iostream...
I wrote a function to count lines in a large file that start with
a particular pattern. The file can contain large amounts of non-text
crap, which may make some lines very long (so using getline() with
a std::string isn't feasible). So I dug around looking at what
else is available in terms of the unformatted io functions and such,
and found istream::ignore, but the performance was crap compared
to an implementation using <cstdio>. (I tried it on both GNU C++
and MSVC++).
I'm kinda new to iostream, so I'm guessing (*hoping* is more like
it, because I'd rather use them for this) that I'm going about this
in the wrong way (and that that is part of why the performance is
so much worse). The other possibility is that compiler vendors
have spent more time making the C-style stdio functions fast, and
only implement iostream to be able to say they support the language
standard. Still another possibility (that I hope isn't the case)
is that iostreams are intrinsically unable to perform as well for
some reason relating to the design (lots of creation/destruction
of sentry objects maybe??).
Before anyone mentions it, this isn't on cin, so the sync_with_stdio
crap isn't the issue...
Anyway, here's the implementation I came up with using iostream.
(Btw, this is coming from memory and I'm not compiling it. So go
easy on the nitpicks if you grok the general idea).
----
#include <cstddef>
#include <fstream>
#include <iterator>
#include <limits>
#include <string>
using namespace std;

// I was mildly surprised there's not something like this
// in <algorithm>, btw.
namespace {
    // Copies at most `limit` elements, stopping early at `end` or at
    // the first element equal to `until` (which is not consumed).
    template<class InputIterator, class OutputIterator, class T>
    void
    copy_n_until(InputIterator in, InputIterator end, OutputIterator out,
                 size_t limit, T until)
    {
        while (limit-- && in != end && *in != until)
            *out++ = *in++;
    }

    int
    count_special_lines(istream &in)
    {
        int cnt = 0;
        do {
            // Ok, I wanted to use istream::get(char_type *, streamsize),
            // but unfortunately it sets failbit if it stores no characters
            // into the output array. Which kinda sucks for me, because
            // adjacent newlines are not a real failure (and how can I tell
            // this apart from a propagated failure on the low-level read()
            // or ReadFile() or whatever system-specific function?).
            // Maybe I'm not understanding something about the state bits...
            //
            // istream::getline also didn't seem so nice because it fails
            // when it stores the max chars you told it it was allowed to
            // read, but I'm deliberately limiting that number so I don't
            // have to read a whole (potentially large) line into core.
            //
            // This iterator version seemed to be the best way to avoid
            // having to screw with failure states that aren't real I/O
            // failures, and its syntax is pretty nice. I'm not so sure
            // about the performance implications, however. I'd note,
            // though, that this certainly isn't the slowest part; just
            // running a loop on the ignore is ridiculously slow...
            //
            string buf;
            istreambuf_iterator<char> it(in), end;
            copy_n_until(it, end, back_inserter(buf), 5, '\n');
            if (buf == "Magic")
                cnt++;
        } while (in.ignore(numeric_limits<streamsize>::max(), '\n'));
        return cnt;
    }
}

int
count_file(const string &file)
{
    // A temporary can't bind to istream&, so use a named stream.
    ifstream in(file.c_str(), ios::binary | ios::in);
    return count_special_lines(in);
}
----
The above approach worked, but it goes way slow for what it is doing
(essentially nothing). Using cstdio runs *many* times faster:
----
#include <cstdio>
#include <cstring>
#include <string>
using namespace std;

int
count_file(const string &file)
{
    FILE *fp;
    if (!(fp = fopen(file.c_str(), "r")))
        return 0;
    int cnt = 0;
    for (;;) {
        // Room for the 5 pattern chars plus the terminating NUL.
        char buf[6];
        if (!fgets(buf, sizeof buf, fp))
            break;
        if (!strcmp(buf, "Magic"))
            cnt++;
        // If we already finished a line, we don't need to
        // skip to the next one.
        if (strrchr(buf, '\n'))
            continue;
        // Skip to the next line.
        int c;
        while ((c = fgetc(fp)) != EOF && c != '\n')
            ;
        if (c == EOF)
            break;
    }
    fclose(fp);
    return cnt;
}
----
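To make the state-bit complaints in the comments of the iostream version concrete, here is a small sketch that runs each call once on an in-memory stream and reports failbit and gcount(). The names ReadOutcome, try_get, try_getline and check_state_bits are made up for illustration, and the 6-byte buffer just mirrors the 5-character pattern above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Outcome of one bounded read: was failbit set, and what did gcount() report?
struct ReadOutcome {
    bool failed;
    std::streamsize count;
};

// Calls istream::get(char*, n) once on `text` (hypothetical helper).
ReadOutcome try_get(const std::string& text)
{
    std::istringstream in(text);
    char buf[6];
    in.get(buf, sizeof buf);
    ReadOutcome r = { in.fail(), in.gcount() };
    return r;
}

// Calls istream::getline(char*, n) once on `text` (hypothetical helper).
ReadOutcome try_getline(const std::string& text)
{
    std::istringstream in(text);
    char buf[6];
    in.getline(buf, sizeof buf);
    ReadOutcome r = { in.fail(), in.gcount() };
    return r;
}

// The three cases from the comments, written as checks.
void check_state_bits()
{
    // get() on an empty line stores nothing, so it sets failbit.
    assert(try_get("\nMagic\n").failed);
    assert(try_get("\nMagic\n").count == 0);

    // getline() on an empty line extracts the newline and does not fail.
    assert(!try_getline("\nMagic\n").failed);
    assert(try_getline("\nMagic\n").count == 1);

    // getline() sets failbit when the buffer fills before a newline is seen.
    assert(try_getline("Magic and more\n").failed);
    assert(try_getline("Magic and more\n").count == 5);
}
```

In the buffer-full case, failbit arrives together with gcount() == n - 1, which looks like one way to tell it apart from other failures before calling clear() and skipping the rest of the line. That is a reading of the behaviour shown here, not something tested against real I/O errors.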
Anyone know how I can rewrite the iostreams version to not suck (if
it is possible)? Or is the iostreams lib (at least, in the GNU and
MS implementations) not really useful for this sort of (very simple)
real world task, if speed is even slightly an issue?
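One direction that may be worth measuring is to go under the istream layer and pull characters straight from the stream buffer with sbumpc(), which skips the sentry construction that each istream member call performs. The sketch below is not benchmarked, and the names count_lines_starting_with and count_in_text are made up for illustration. It assumes a non-empty prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

// Counts lines that begin with `prefix`, reading one character at a time
// from the stream buffer. Lines are never stored, so their length is irrelevant.
int count_lines_starting_with(std::streambuf& sb, const std::string& prefix)
{
    assert(!prefix.empty());  // assumption: an empty prefix is not meaningful here

    typedef std::streambuf::traits_type traits;
    const traits::int_type eof = traits::eof();

    int cnt = 0;
    std::size_t matched = 0;  // prefix characters matched so far on this line
    bool matching = true;     // false once the current line has diverged

    for (traits::int_type c = sb.sbumpc();
         !traits::eq_int_type(c, eof);
         c = sb.sbumpc()) {
        const char ch = traits::to_char_type(c);
        if (ch == '\n') {
            // Start of a new line: reset the matcher.
            matched = 0;
            matching = true;
        } else if (matching && matched < prefix.size()) {
            if (ch == prefix[matched]) {
                if (++matched == prefix.size())
                    ++cnt;  // whole prefix seen; counted once per line
            } else {
                matching = false;
            }
        }
    }
    return cnt;
}

// Convenience wrapper for trying it on an in-memory string (hypothetical).
int count_in_text(const std::string& text)
{
    std::istringstream in(text);
    return count_lines_starting_with(*in.rdbuf(), "Magic");
}
```

For a file, the same function could be called as count_lines_starting_with(*in.rdbuf(), "Magic") on an ifstream opened in binary mode. Whether it gets anywhere near the cstdio version would have to be timed on each implementation.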
Jason K