barcaroller said:
I have a large block of memory. I need to (1) check if it contains only
ASCII characters (including newlines and/or carriage-returns) and, if so,
(2) extract the lines into individual C++ strings.
Currently I loop over the entire block byte by byte, run isascii(byte) on
each byte, and then call getline() (either std::getline or
istream::getline
does the job). This is proving too slow. I'm sure this problem has been
solved using more efficient methods. Any suggestions?
You could use the first pass to store information about where the lines
start and end. Something like:
#include <deque>
#include <string>
#include <utility>

/*
  Appends the lines in [from,to) to the_text, provided all
  characters in the range are ASCII. If not, no lines are
  appended and false is returned.
*/
template < typename ConstCharIter, typename StringSequence >
bool append_lines ( ConstCharIter from,
                    ConstCharIter to,
                    StringSequence & the_text ) {
  // first pass: validate and remember [begin,end) of each line
  typedef std::pair< ConstCharIter, ConstCharIter > line;
  std::deque< line > the_lines;
  ConstCharIter line_beg = from;
  ConstCharIter line_end = line_beg;
  while ( true ) {
    if ( line_end == to ) {
      the_lines.push_back( line( line_beg, line_end ) );
      break;
    }
    if ( *line_end == '\n' ) {
      the_lines.push_back( line( line_beg, line_end ) );
      ++line_end;
      line_beg = line_end;
      continue;
    }
    // portable stand-in for the POSIX isascii()
    if ( static_cast< unsigned char >( *line_end ) > 127 ) {
      return ( false );
    }
    ++ line_end;
  }
  // second pass: only now build the strings
  for ( typename std::deque< line >::const_iterator line_iter = the_lines.begin();
        line_iter != the_lines.end(); ++ line_iter ) {
    // prematurely optimizing away a copy-constructor that might
    // be elided by the implementation anyway:
    //   the_text.push_back
    //     ( std::string( line_iter->first, line_iter->second ) );
    the_text.push_back( std::string() );
    // swap from the temporary: a temporary cannot bind to the
    // non-const reference parameter of swap()
    std::string( line_iter->first, line_iter->second ).swap( the_text.back() );
  }
  return ( true );
}
Note: code not touched by a compiler.
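The function above splits only on '\n', so a block with DOS-style "\r\n" line endings (the question mentions carriage returns) would leave a '\r' at the end of each line. A minimal sketch of the same two-phase idea, assuming the block is a plain const char* buffer with a length and that "\r\n" should count as one line break; split_ascii_lines is a hypothetical name. It records (begin, end) offsets in the first pass and builds strings only once the whole block is known to be ASCII.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: validates buf[0, len) and, if every byte is ASCII,
// appends its lines to out. Assumption: "\r\n" is one break and the '\r'
// is dropped; a lone '\r' is kept as ordinary text.
bool split_ascii_lines(const char* buf, std::size_t len,
                       std::vector<std::string>& out) {
    std::vector<std::pair<std::size_t, std::size_t> > bounds;
    std::size_t beg = 0;
    for (std::size_t i = 0; i < len; ++i) {
        unsigned char c = static_cast<unsigned char>(buf[i]);
        if (c > 127) {
            return false;  // non-ASCII byte: append nothing
        }
        if (c == '\n') {
            std::size_t end = i;
            if (end > beg && buf[end - 1] == '\r') {
                --end;  // strip the '\r' of a "\r\n" pair
            }
            bounds.push_back(std::make_pair(beg, end));
            beg = i + 1;
        }
    }
    if (beg < len) {
        bounds.push_back(std::make_pair(beg, len));  // last line, no '\n'
    }
    // second phase: the block is valid, so build the strings
    out.reserve(out.size() + bounds.size());
    for (std::size_t k = 0; k < bounds.size(); ++k) {
        out.push_back(std::string(buf + bounds[k].first,
                                  buf + bounds[k].second));
    }
    return true;
}
```

Called on "first\r\nsecond\nthird" with length 19, it appends "first", "second" and "third". Unlike append_lines, it does not emit an empty final line when the block ends with '\n'; whether that is wanted depends on the caller.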
Also: if non-ASCII characters are expected to occur only with negligible
probability, you might save time by inserting the lines right away and
rolling back the transaction if you encounter a non-ASCII character.
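The roll-back idea could look roughly like this, assuming the target is a std::vector<std::string>; append_lines_optimistic is a hypothetical name. It remembers the vector's size before starting and shrinks back to it on the first non-ASCII byte, so the common all-ASCII case needs a single pass and no intermediate list of line boundaries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "optimistic" variant: lines are appended as soon as they
// are found; on a non-ASCII byte the vector is shrunk back to its
// original size, discarding everything added by this call.
bool append_lines_optimistic(const char* from, const char* to,
                             std::vector<std::string>& the_text) {
    const std::size_t old_size = the_text.size();  // roll-back point
    const char* line_beg = from;
    for (const char* p = from; p != to; ++p) {
        if (static_cast<unsigned char>(*p) > 127) {
            the_text.resize(old_size);  // roll back the partial result
            return false;
        }
        if (*p == '\n') {
            the_text.push_back(std::string(line_beg, p));
            line_beg = p + 1;
        }
    }
    // like append_lines: always append the final (possibly empty) line
    the_text.push_back(std::string(line_beg, to));
    return true;
}
```

The price is wasted allocation when a non-ASCII byte appears late in a large block, which is why this only pays off when that case is rare.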
Best
Kai-Uwe Bux