What is the C++ idiom to strip leading and lagging white space from a string?

  • Thread starter Ramon F Herrera

Ramon F Herrera

I have done blank stripping a million times, in C and Perl.

Please bear with me as I am trying to pick up the C++ "way of doing
things".

Thx,

-RFH
 

Francesco

> I have done blank stripping a million times, in C and Perl.
>
> Please bear with me as I am trying to pick up the C++ "way of doing
> things".
>
> Thx,

Just a clarification: what do you mean by "lagging"? Do you mean
"trailing"? Or do you mean multiple whitespace occurrences between
words, i.e. "just    like    this" to become "just like this"?

The standard streams have a manipulator that skips whitespace; search
for "skipws".
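As a rough illustration of the stream approach: skipws is the flag (on by default) that makes formatted extraction with operator>> skip leading whitespace, and the related std::ws manipulator discards whitespace explicitly. The sketch below uses std::ws to drop leading whitespace from a single line; the helper name strip_leading is made up for this example, and the approach does nothing about trailing whitespace.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns `line` without its leading whitespace.
// Whitespace is classified by the stream's locale. Trailing whitespace
// is left untouched, and only the first line of the input is returned.
std::string strip_leading(const std::string& line) {
    std::istringstream in(line);
    in >> std::ws;            // discard leading whitespace
    std::string rest;
    std::getline(in, rest);   // read whatever remains on the line
    return rest;
}
```

Calling strip_leading on a string that is empty or contains only whitespace yields an empty string, because the stream is already at end-of-file when getline runs.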

I could easily write a function that does all of the above, and I
believe you can do the same, much as you have done in C.

You could also use some string functions (take a look at
http://www.cppreference.com/wiki/string/find_first_not_of) to
implement such a trim_lead_double_trail_ws() function.

Best regards,
Francesco
 

Victor Bazarov

Ramon said:
> I have done blank stripping a million times, in C and Perl.
>
> Please bear with me as I am trying to pick up the C++ "way of doing
> things".

Not sure about the idiom, but

char const WS[] = "..."; // whatever you consider WS
str.erase(0, str.find_first_not_of(WS));
str.erase(str.find_last_not_of(WS) + 1);

should do it. If you want your WS to be determined by a function, like
'isspace' or whatnot, you need to write a functor and remove/erase those
for which the functor is true or vice versa, keep those for which the
functor yields false... A bit more convoluted.
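One possible shape for the predicate-based variant, as a sketch only: it assumes C++11 (for std::find_if_not) and uses std::isspace as the whitespace test. The names is_space and trim_in_place are invented for this example and do not come from the thread.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// std::isspace needs a value representable as unsigned char,
// so convert before calling it.
inline bool is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: trims `s` in place using the predicate above.
inline void trim_in_place(std::string& s) {
    // Search backwards for the last non-space character; base() then
    // points one past it, so everything from there to end() is erased.
    s.erase(std::find_if_not(s.rbegin(), s.rend(), is_space).base(),
            s.end());
    // Erase everything before the first non-space character.
    s.erase(s.begin(),
            std::find_if_not(s.begin(), s.end(), is_space));
}
```

For a string made only of whitespace, the reverse search returns rend(), whose base() is begin(), so the first erase empties the string and the second is a no-op.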

V
 

Lucius Sanctimonious

> Just a clarification, what do you mean with
> "lagging"? Do you mean "trailing"?

Exactly.

I realize that I could write a function, but the reason I am moving
from C to C++ is because there are so many of those functions and
classes already.

-Ramon
 

Victor Bazarov

Lucius said:
> Exactly.
>
> I realize that I could write a function, but the reason I am moving
> from C to C++ is because there are so many of those functions and
> classes already.

Just so you don't get disappointed on your way from C to C++, most of
the code C++ programmers write is *functions*. Just happens to be that
way. The solution I proposed isn't a single function call and it would
be better wrapped in a function (or even perhaps a function template
<gasp!>).
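A minimal sketch of what such a function template might look like, wrapping the two erase calls from the earlier reply. The name trimmed and the choice to take the string by value are assumptions made for this example.

```cpp
#include <string>

// Hypothetical helper: returns a trimmed copy of `str`, removing any of
// the characters in the null-terminated set `ws` from both ends.
// Works for any std::basic_string specialization (std::string,
// std::wstring, ...).
template <typename CharT, typename Traits, typename Alloc>
std::basic_string<CharT, Traits, Alloc>
trimmed(std::basic_string<CharT, Traits, Alloc> str, const CharT* ws)
{
    // If nothing but ws is found, find_first_not_of returns npos and
    // erase(0, npos) clears the whole string.
    str.erase(0, str.find_first_not_of(ws));
    // If the string is now empty, find_last_not_of returns npos and
    // npos + 1 wraps around to 0, so erase(0) is a harmless no-op.
    str.erase(str.find_last_not_of(ws) + 1);
    return str;
}
```

Because the character type is deduced from both arguments, the first argument has to be an actual string object, for example trimmed(std::string("  abc  "), " ") or trimmed(std::wstring(L" x "), L" ").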

V
 

Francesco

> Just so you don't get disappointed on your way from C to C++, most of
> the code C++ programmers write is *functions*.  Just happens to be that
> way.  The solution I proposed isn't a single function call and it would
> be better wrapped in a function (or even perhaps a function template
> <gasp!>).

Heck, I can see Victor's reply to Lucius' (???) reply to my reply to
Ramon's post, but I cannot see Lucius (???) reply! Damn, this NNTP
thing is driving me crazy =/

Anyway, this is one of those many functions that Victor spoke about:

-------
#include <iostream>
#include <string>
#include <sstream>

using namespace std;

// Trims leading and trailing spaces and collapses each interior run of
// spaces into a single space.
string trim_ws(const string& s) {
    size_t begin = s.find_first_not_of(' ');
    if (begin == string::npos) {
        return "";  // empty or all spaces
    }
    size_t end = s.find_last_not_of(' ') + 1;
    stringstream ss;
    bool skipit = false;  // true while inside a run of spaces
    for (size_t i = begin; i < end; ++i) {
        char ch = s[i];
        if (ch != ' ') {
            ss << ch;
            skipit = false;
        } else if (!skipit) {
            ss << ch;  // keep only the first space of a run
            skipit = true;
        }
    }
    return ss.str();
}

int main()
{
    string s = "   just    like    this   ";
    cout << "[" << s << "]" << endl;
    cout << "[" << trim_ws(s) << "]" << endl;
    return 0;
}
-------

Just wrote it on the fly for exercise.
You could easily extend it with the "whatever character" idea
mentioned by Victor.
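One way that extension might look, as a sketch: the set of whitespace characters becomes a parameter, and the result is built in a std::string rather than a stringstream. The name collapse_ws and the default set are assumptions for this example. Unlike the function above, each interior run is replaced by a plain space regardless of which whitespace characters it contained.

```cpp
#include <string>

// Hypothetical variant: trims both ends and collapses each interior run
// of characters from `ws` into a single ' '.
std::string collapse_ws(const std::string& s,
                        const std::string& ws = " \t\n") {
    std::string::size_type begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) {
        return std::string();  // empty or nothing but whitespace
    }
    std::string::size_type end = s.find_last_not_of(ws) + 1;

    std::string out;
    out.reserve(end - begin);
    bool in_run = false;  // true while inside a run of whitespace
    for (std::string::size_type i = begin; i < end; ++i) {
        if (ws.find(s[i]) == std::string::npos) {
            out += s[i];
            in_run = false;
        } else if (!in_run) {
            out += ' ';   // emit one space per run
            in_run = true;
        }
    }
    return out;
}
```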

Cheers,
Francesco
 

Ramon F Herrera

> Heck, I can see Victor's reply to Lucius' (???)
> reply to my reply to Ramon's post, but I cannot see
> Lucius (???) reply! Damn, this NNTP
> thing is driving me crazy =/

It is my fault, I posted from a co-worker's account which should be
only used for work, so I removed it.

-Ramon
 

Ramon F Herrera

> boost::trim
>
> http://www.boost.org/doc/libs/1_37_0/boost/algorithm/string/trim.hpp
>
> Markus
>
> -- http://www.markus-raab.org | Problems can never be solved with the
> same way of thinking that created them. -- Albert Einstein


Thanks, Markus! That is exactly what I had in mind when I posted the
question. I happen to be a recent boost convert. Downloaded and
installed it yesterday.

Just to make sure, is this the correct way to add the include?

#include <boost/algorithm/string/trim.hpp>

Regards,

-Ramon
 

Ramon F Herrera

> Just a clarification, what do you mean with "lagging"?
> Do you mean "trailing"?

Blame it on my Economics classes, peppered with "leading indicators"
and "lagging indicators". Those are used to predict recessions and
exit from them. Very pertinent stuff these days! :)

-Ramon
 

Francesco

 > Heck, I can see Victor's reply to Lucius' (???)
 > reply to my reply to Ramon's post, but I cannot see
 > Lucius (???) reply! Damn, this NNTP
 > thing is driving me crazy =/

> It is my fault, I posted from a co-worker's account which should be
> only used for work, so I removed it.

Fine, no problem. I also thought that Google Groups went mad at some
point =>

Francesco
 

Francesco

 > Just a clarification, what do you mean with "lagging"?
 > Do you mean "trailing"?

> Blame it on my Economics classes, peppered with "leading indicators"
> and "lagging indicators". Those are used to predict recessions and
> exit from them. Very pertinent stuff these days!  :)

I have never touched such subjects, so thanks for the explanation.
Chances are I will meet that term again, and now I know what it means.

Cheers,
Francesco
 

Chris M. Thomasson

Ramon F Herrera said:
> I have done blank stripping a million times, in C and Perl.
>
> Please bear with me as I am trying to pick up the C++ "way of doing
> things".

Perhaps something like this:
________________________________________________________________
#include <string>
#include <cctype>


#define xisspace(c) isspace((unsigned char)(c))


size_t
trim_find(std::string const& self,
          size_t* pend)
{
    char const* str = self.c_str();
    char const* start = str;

    while (*start && xisspace(*start))
    {
        start++;
    }

    if (*start)
    {
        char const* end = start;
        char const* pos = start + 1;

        while (*pos)
        {
            if (! xisspace(*pos))
            {
                end = pos;
            }

            ++pos;
        }

        *pend = (end + 1) - start;

        return start - str;
    }

    *pend = 0;

    return 0;
}


std::string
trim(std::string const& self)
{
    size_t end;
    size_t start = trim_find(self, &end);

    return self.substr(start, end);
}


#include <cstdio>


int
main()
{
    std::string str(" xx x x xxx ");

    std::printf("non-trimmed: ->|%s|<-\n", str.c_str());
    std::printf("trimmed: ->|%s|<-\n", trim(str).c_str());

    return 0;
}
________________________________________________________________




lol. ;^)
 

Chris M. Thomasson

Chris M. Thomasson said:
> Perhaps something like this:
> [...]

Since `std::string' can give us the length of the stored string, we can
re-write the `trim_find()' function to take advantage of that fact:
____________________________________________________________
size_t
trim_find(std::string const& self,
          size_t* pend)
{
    char const* str = self.c_str();
    char const* start = str;

    while (*start && xisspace(*start))
    {
        start++;
    }

    if (*start)
    {
        char const* end = str + (self.length() - 1);

        while (end != str && xisspace(*end))
        {
            --end;
        }

        *pend = (end + 1) - start;

        return start - str;
    }

    *pend = 0;

    return 0;
}
____________________________________________________________




This has MUCH better scalability characteristics since it does not need to
scan the entire string in order to trim white spaces. You can have a string
that is 50,000 characters in length, but if it has say, 100 leading white
spaces, and 50 trailing white spaces, the scan is only going to observe 150
characters, NOT all 50,000!

;^)
 

James Kanze

> Not sure about the idiom, but
> char const WS[] = "..."; // whatever you consider WS
> str.erase(0, str.find_first_not_of(WS));
> str.erase(str.find_last_not_of(WS) + 1);
> should do it. If you want your WS to be determined by a
> function, like 'isspace' or whatnot, you need to write a
> functor and remove/erase those for which the functor is true
> or vice versa, keep those for which the functor yields
> false... A bit more convoluted.

Yes and no. To begin with, the problem isn't adequately
specified; I use UTF-8 a lot, and the problem is a lot more
complex if you have to deal with multibyte encodings. And if
you're doing much text manipulation, you'll want to have such
functional objects in your tool kit anyway, at least for single
byte encodings. My own trim function is just:

std::string
leftTrim(
    std::string const&    s,
    SetOfCharacter const& toRemove )
{
    return std::string(
        std::find_if(
            s.begin(), s.end(),
            std::not1( toRemove.contains() ) ),
        s.end() ) ;
}

std::string
rightTrim(
    std::string const&    s,
    SetOfCharacter const& toRemove )
{
    return std::string(
        s.begin(),
        std::find_if(
            s.rbegin(), s.rend(),
            std::not1( toRemove.contains() ) )
        .base() ) ;
}

std::string
trim(
    std::string const&    s,
    SetOfCharacter const& toRemove )
{
    return leftTrim( rightTrim( s, toRemove ), toRemove ) ;
}

My SetOfCharacter class has a function contains() (with no
parameters) which returns the appropriate functional object, and
I have pre-defined instances of SetOfCharacter (actually
CharacterClass---a derived class with some more complex
constructors) for each of the standard isxxx functions (and then
some).
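The SetOfCharacter class itself is not shown in the post. The following is only a guess at a minimal stand-in, to make the shape of the idea concrete: a hypothetical CharSet whose contains() returns a functional object, plus trim functions built on it. It uses std::find_if_not (C++11) in place of std::find_if with std::not1, since std::not1 was removed in C++20; all names here are invented for the example.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for the SetOfCharacter class described above.
class CharSet {
public:
    explicit CharSet(const std::string& members) : members_(members) {}

    // Functional object answering "is c a member of the set?".
    // It refers to the owning CharSet, which must outlive it.
    class Contains {
    public:
        explicit Contains(const std::string& m) : m_(&m) {}
        bool operator()(char c) const {
            return m_->find(c) != std::string::npos;
        }
    private:
        const std::string* m_;
    };

    Contains contains() const { return Contains(members_); }

private:
    std::string members_;
};

std::string left_trim(const std::string& s, const CharSet& toRemove) {
    // Copy from the first character not in the set to the end.
    return std::string(
        std::find_if_not(s.begin(), s.end(), toRemove.contains()),
        s.end());
}

std::string right_trim(const std::string& s, const CharSet& toRemove) {
    // Search backwards; base() points one past the last kept character.
    return std::string(
        s.begin(),
        std::find_if_not(s.rbegin(), s.rend(),
                         toRemove.contains()).base());
}

std::string trim_set(const std::string& s, const CharSet& toRemove) {
    return left_trim(right_trim(s, toRemove), toRemove);
}
```

With this stand-in, trim_set(text, CharSet(" \t")) removes spaces and tabs from both ends of text. A real character-class type would presumably be built from the isxxx functions rather than from an explicit list of members.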

(FWIW: the UTF-8 versions look very similar; I've got some
special iterators, and the UTF-8 version of SetOfCharacter knows
how to handle what dereferencing them returns. So I end up
with:

std::string
leftTrim(
    std::string const&         original,
    BasicSetOfCharacter const& toRemove )
{
    return std::string(
        std::find_if( begin( original ),
                      end( original ),
                      std::not1( toRemove.contains() ) ).begin(),
        original.end() ) ;
}

std::string
rightTrim(
    std::string const&         original,
    BasicSetOfCharacter const& toRemove )
{
    return std::string(
        original.begin(),
        std::find_if( rbegin( original ),
                      rend( original ),
                      std::not1( toRemove.contains() ) )
        .end() ) ;
}

std::string
trim(
    std::string const&         original,
    BasicSetOfCharacter const& toRemove )
{
    return leftTrim( rightTrim( original, toRemove ), toRemove ) ;
}

std::string
trim(
    std::string const& original )
{
    return trim( original, CharacterData::space() ) ;
}

..)
 

Jerry Coffin


[ ... code to normalize spaces in a string elided ]

There are two rather different possibilities here:
1) Remove only leading and trailing spaces.
2) Remove leading and trailing spaces, and convert all other runs of
white space to a single space.

The OP originally asked for the first, but what you've supplied is
really the second. If he wants the first, it can be made rather
simpler -- basically just one call to find_first_not_of, and one to
find_last_not_of:

std::string trim1(std::string const &input) {
    size_t begin = input.find_first_not_of(" \t\v\n");

    if (begin == std::string::npos)
        return "";

    size_t end = input.find_last_not_of(" \t\v\n");
    return std::string(input, begin, end-begin+1);
}

If he really wants to remove leading and trailing spaces, and convert
all intervening runs of white space to a single space apiece, there's
a bit simpler way to do that as well:

std::string trim2(std::string const &input) {
    std::istringstream in(input);
    std::ostringstream ret;

    std::copy(std::istream_iterator<std::string>(in),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(ret, " "));

    return std::string(ret.str(), 0, ret.str().length()-1);
}

The former uses only the characters you specify as being whitespace
(I've used those specified for the "C" locale, at least as closely as
I recall them). The latter uses the current locale to decide what's
white space. You could, of course, imbue the istringstream with a
different locale if you wanted a different definition of white space.
The (likely) trade off is that while trim2 is pretty simple, you may
pay for that in lower speed.
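A self-contained variation on the second approach, offered as a sketch rather than a replacement: it reads whitespace-delimited words with operator>> and joins them with single spaces, so there is no trailing delimiter to strip afterwards. The name normalize_ws is made up for this example; whitespace is still whatever the stream's locale says it is.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: trims both ends and reduces every interior run
// of whitespace to a single space.
std::string normalize_ws(const std::string& input) {
    std::istringstream in(input);
    std::string word;
    std::string result;
    while (in >> word) {          // extraction skips leading whitespace
        if (!result.empty()) {
            result += ' ';        // separator only between words
        }
        result += word;
    }
    return result;
}
```

An empty or all-whitespace input produces no words, so the function returns an empty string without any special casing.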
 

Francesco


> [ ... code to normalize spaces in a string elided ]
>
> There are two rather different possibilities here:
> 1) Remove only leading and trailing spaces.
> 2) Remove leading and trailing spaces, and convert all other runs of
> white space to a single space.
>
> The OP originally asked for the first, but what you've supplied is
> really the second.

I know, I mentioned the two cases in the second post of this thread.
As you could have read from the part you "elided":
> Anyway, this is one of those many functions that Victor spoke about:
> [...code...]
> Just wrote it on the fly for exercise.

In other words, the purpose was not to spoon-feed the OP with exactly
what he needed, but to show an example. In this case, an example of
valid C++ which makes use of the std::string facilities yet handles
chars directly in a loop. The OP is perfectly able to understand my
snippet and modify it at his will, being a C and Perl programmer.

Best regards,
Francesco
 
