memory block -> C++ strings

barcaroller

I have a large block of memory. I need to (1) check if it contains only
ASCII characters (including newlines and/or carriage-returns) and, if so,
(2) extract the lines into individual C++ strings.

Currently I loop over the entire block (byte by byte), run isascii(byte) on
each byte, and then call getline() (either std::getline or istream::getline
does the job). This is proving too slow. I'm sure this problem has been
solved using more efficient methods. Any suggestions?
 
Obnoxious User

For each byte confirmed to be in the Basic Latin (ASCII) range, copy it
directly before testing the next one. Or, if you're familiar with
iterators, return an iterator pair and postpone copying the data; do
you even need std::string?
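
The iterator-pair idea could look roughly like the sketch below. It assumes the block is addressed through plain const char pointers and that lines are separated by '\n' only (a '\r' before the newline is left in place). The names LineRange, find_ascii_lines and materialize are made up for illustration. Nothing is copied until a caller explicitly asks for a std::string.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A line is a half-open range [first, second) pointing into the original block.
typedef std::pair<const char*, const char*> LineRange;

// Hypothetical helper: scans [first, last) once, checking that every byte is
// 7-bit ASCII and recording where each line begins and ends.
// On the first non-ASCII byte it empties `lines` and returns false.
bool find_ascii_lines(const char* first, const char* last,
                      std::vector<LineRange>& lines) {
    assert(first <= last);
    lines.clear();
    const char* line_begin = first;
    for (const char* p = first; p != last; ++p) {
        if (static_cast<unsigned char>(*p) > 0x7F) {
            lines.clear();
            return false;
        }
        if (*p == '\n') {
            lines.push_back(LineRange(line_begin, p));
            line_begin = p + 1;
        }
    }
    if (line_begin != last) {
        // Last line without a terminating newline.
        lines.push_back(LineRange(line_begin, last));
    }
    return true;
}

// Copies a single line, only when a std::string is really needed.
std::string materialize(const LineRange& line) {
    return std::string(line.first, line.second);
}
```

The ranges are only valid while the original block stays alive and unmodified, which is the price of deferring the copy.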
 
Kai-Uwe Bux

You could use the first pass to store information about where the lines
start and end. Something like:

#include <deque>
#include <string>
#include <utility>

/*
  Appends the lines in [from,to) to the_text, provided all
  characters in the range are ASCII. If not, no lines will
  be appended.
*/
template < typename ConstCharIter, typename StringSequence >
bool append_lines ( ConstCharIter from,
                    ConstCharIter to,
                    StringSequence & the_text ) {
  typedef std::pair< ConstCharIter, ConstCharIter > line;
  std::deque< line > the_lines;
  ConstCharIter line_beg = from;
  ConstCharIter line_end = line_beg;
  // first pass: check every character and record the line boundaries
  while ( true ) {
    if ( line_end == to ) {
      the_lines.push_back( line( line_beg, line_end ) );
      break;
    }
    if ( *line_end == '\n' ) {
      the_lines.push_back( line( line_beg, line_end ) );
      ++ line_end;
      line_beg = line_end;
      continue;
    }
    // same test as isascii(), which is POSIX rather than standard C++
    if ( static_cast< unsigned char >( *line_end ) > 0x7F ) {
      return ( false );
    }
    ++ line_end;
  }
  // second pass: copy the recorded lines into the_text
  for ( typename std::deque< line >::const_iterator line_iter = the_lines.begin();
        line_iter != the_lines.end(); ++ line_iter ) {
    // prematurely optimizing away a copy-constructor that might
    // be elided by the implementation anyway:
    // the_text.push_back
    //   ( std::string( line_iter->first, line_iter->second ) );
    the_text.push_back( std::string() );
    // swap is called on the temporary, because a temporary cannot
    // bind to swap's non-const reference parameter
    std::string( line_iter->first, line_iter->second ).swap
      ( the_text.back() );
  }
  return ( true );
}

Note: code not touched by a compiler.


Also: if non-ASCII characters are expected to occur only with negligible
probability, you might be able to save time by inserting the lines right
away and rolling back the transaction if you encounter a non-ASCII character.
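
A minimal sketch of that roll-back variant, assuming the destination is a std::vector<std::string> and the block is reached through const char pointers. The function name append_lines_or_rollback is invented for this example, and only '\n' is treated as a line separator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends the lines of [first, last) to `out` as soon as
// each one is found. If a non-ASCII byte turns up, everything appended by
// this call is removed again and false is returned.
bool append_lines_or_rollback(const char* first, const char* last,
                              std::vector<std::string>& out) {
    assert(first <= last);
    const std::size_t original_size = out.size();  // roll-back point
    const char* line_begin = first;
    for (const char* p = first; p != last; ++p) {
        if (static_cast<unsigned char>(*p) > 0x7F) {
            out.resize(original_size);  // undo the lines added so far
            return false;
        }
        if (*p == '\n') {
            out.push_back(std::string(line_begin, p));
            line_begin = p + 1;
        }
    }
    if (line_begin != last) {
        // Last line without a terminating newline.
        out.push_back(std::string(line_begin, last));
    }
    return true;
}
```

This makes a single pass and keeps no intermediate list of line boundaries, so it should win when the input is almost always clean; a late non-ASCII byte wastes all the copies made up to that point.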


Best

Kai-Uwe Bux
 
