What is the C++ idiom to strip leading and lagging white space from a string?

Ramon F Herrera · Sep 8, 2009

I have done blank stripping a million times, in C and Perl.

Please bear with me as I am trying to pick up the C++ "way of doing
things".

Thx,

-RFH

Francesco · Sep 8, 2009

> I have done blank stripping a million times, in C and Perl.
>
> Please bear with me as I am trying to pick up the C++ "way of doing
> things".
>
> Thx,

Just a clarification: what do you mean by "lagging"? Do you mean
"trailing"? Or do you mean multiple whitespace occurrences between
words, i.e. "just   like   this" becoming "just like this"?

STL streams have a manipulator that skips whitespace; search
for "skipws".

I could easily write a function for doing all the above, and I believe
you can do the same, much as you have done in C.

You could also use some string functions (take a look at
http://www.cppreference.com/wiki/string/find_first_not_of) to
implement such a trim_lead_double_trail_ws() function.

Best regards,
Francesco

Victor Bazarov · Sep 8, 2009

Ramon said:
I have done blank stripping a million times, in C and Perl.

Please bear with me as I am trying to pick up the C++ "way of doing
things".

Not sure about the idiom, but

char const WS[] = "..."; // whatever you consider WS
str.erase(0, str.find_first_not_of(WS));
str.erase(str.find_last_not_of(WS) + 1);

should do it. If you want your WS to be determined by a function, like
'isspace' or whatnot, you need to write a functor and remove/erase those
for which the functor is true or vice versa, keep those for which the
functor yields false... A bit more convoluted.
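
For the function-based variant just described, a minimal sketch could look like the following. It assumes C++11 (lambdas postdate this thread), lets std::isspace decide what counts as white space, and uses the made-up names is_ws and trim_in_place purely for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true for characters std::isspace classifies as white space.
// The cast avoids undefined behaviour for negative char values.
inline bool is_ws(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: trims the string in place.
inline void trim_in_place(std::string& str) {
    auto not_ws = [](char c) { return !is_ws(c); };

    // Search from the back for the last non-space character;
    // base() yields the iterator one past it, so erase the tail from there.
    str.erase(std::find_if(str.rbegin(), str.rend(), not_ws).base(), str.end());

    // Search from the front for the first non-space character
    // and erase everything before it.
    str.erase(str.begin(), std::find_if(str.begin(), str.end(), not_ws));
}
```

If the string is empty or consists only of white space, the first erase() empties it and the second does nothing.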

V

Lucius Sanctimonious · Sep 8, 2009

> Just a clarification, what do you mean with
> "lagging"? Do you mean "trailing"?

Exactly.

I realize that I could write a function, but the reason I am moving
from C to C++ is because there are so many of those functions and
classes already.

-Ramon

Victor Bazarov · Sep 8, 2009

Lucius said:
Exactly.

I realize that I could write a function, but the reason I am moving
from C to C++ is because there are so many of those functions and
classes already.

Just so you don't get disappointed on your way from C to C++, most of
the code C++ programmers write is *functions*. Just happens to be that
way. The solution I proposed isn't a single function call and it would
be better wrapped in a function (or even perhaps a function template
<gasp!>).
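
A sketch of that wrapping, built only from the two erase() calls posted earlier in the thread. The name trim_copy and its signature are hypothetical; it is a template over the character type so that the same code could serve std::string and std::wstring.

```cpp
#include <string>

// Hypothetical wrapper: returns a trimmed copy of str.
// ws is a null-terminated list of the characters to treat as white space.
template <typename CharT, typename Traits, typename Alloc>
std::basic_string<CharT, Traits, Alloc>
trim_copy(std::basic_string<CharT, Traits, Alloc> str, const CharT* ws) {
    // If no non-WS character exists, find_first_not_of returns npos
    // and erase(0, npos) clears the whole string.
    str.erase(0, str.find_first_not_of(ws));

    // If nothing is left, find_last_not_of returns npos; npos + 1 wraps
    // around to 0, so erase(0) is a harmless no-op on the empty string.
    str.erase(str.find_last_not_of(ws) + 1);
    return str;
}
```

The string is taken by value so the caller's copy is left untouched; the first argument has to be an actual std::basic_string (not a string literal) for template deduction to work.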

V

Francesco · Sep 8, 2009

> Just so you don't get disappointed on your way from C to C++, most of
> the code C++ programmers write is *functions*. Just happens to be that
> way. The solution I proposed isn't a single function call and it would
> be better wrapped in a function (or even perhaps a function template
> <gasp!>).

Heck, I can see Victor's reply to Lucius' (???) reply to my reply to
Ramon's post, but I cannot see Lucius (???) reply! Damn, this NNTP
thing is driving me crazy =/

Anyway, this is one of those many functions that Victor spoke about:

-------
#include <iostream>
#include <string>
#include <sstream>

using namespace std;

// Strips leading/trailing spaces and collapses inner runs of spaces to one.
string trim_ws(const string& s) {
    size_t begin = s.find_first_not_of(' ');
    if (begin == string::npos) {
        return "";  // empty or all spaces
    }
    size_t end = s.find_last_not_of(' ') + 1;
    stringstream ss;
    bool skipit = false;  // true while inside a run of spaces
    for (size_t i = begin; i < end; ++i) {
        char ch = s[i];
        if (ch != ' ') {
            ss << ch;
            skipit = false;
        } else if (!skipit) {
            ss << ch;  // keep only the first space of a run
            skipit = true;
        }
    }
    return ss.str();
}

int main()
{
    string s = "   just   like   this   ";
    cout << "[" << s << "]" << endl;
    cout << "[" << trim_ws(s) << "]" << endl;
    return 0;
}
-------

Just wrote it on the fly for exercise.
You could easily extend it with the "whatever character" idea
mentioned by Victor.

Cheers,
Francesco

Ramon F Herrera · Sep 8, 2009

> Heck, I can see Victor's reply to Lucius' (???)
> reply to my reply to Ramon's post, but I cannot see
> Lucius (???) reply! Damn, this NNTP
> thing is driving me crazy =/

It is my fault, I posted from a co-worker's account which should be
only used for work, so I removed it.

-Ramon

Ramon F Herrera · Sep 8, 2009

> boost::trim
>
> http://www.boost.org/doc/libs/1_37_0/boost/algorithm/string/trim.hpp
>
> Markus

Thanks, Markus! That is exactly what I had in mind when I posted the
question. I happen to be a recent boost convert. Downloaded and
installed it yesterday.

Just to make sure, is this the correct way to add the include?

#include <boost/algorithm/string/trim.hpp>

Regards,

-Ramon

Ramon F Herrera · Sep 8, 2009

> Just a clarification, what do you mean with "lagging"?
> Do you mean "trailing"?

Blame it on my Economics classes, peppered with "leading indicators"
and "lagging indicators". Those are used to predict recessions and
exit from them. Very pertinent stuff these days!

-Ramon

Francesco · Sep 8, 2009

> Heck, I can see Victor's reply to Lucius' (???)
> reply to my reply to Ramon's post, but I cannot see
> Lucius (???) reply! Damn, this NNTP
> thing is driving me crazy =/

> It is my fault, I posted from a co-worker's account which should be
> only used for work, so I removed it.

Fine, no problem. I also thought that Google Groups went mad at some
point =>

Francesco

Francesco · Sep 8, 2009

> Just a clarification, what do you mean with "lagging"?
> Do you mean "trailing"?

> Blame it on my Economics classes, peppered with "leading indicators"
> and "lagging indicators". Those are used to predict recessions and
> exit from them. Very pertinent stuff these days!

I have never touched on such topics, so thanks for the explanation.
Chances are I will come across that term again, and now I know what it
means.

Cheers,
Francesco

Chris M. Thomasson · Sep 8, 2009

Ramon F Herrera said:
I have done blank stripping a million times, in C and Perl.

Please bear with me as I am trying to pick up the C++ "way of doing
things".

Perhaps something like this:
________________________________________________________________
#include <string>
#include <cctype>

#define xisspace(c) std::isspace((unsigned char)(c))

// Returns the offset of the first non-space character and stores the
// length of the trimmed text in *pend (both 0 if nothing remains).
size_t
trim_find(std::string const& self,
          size_t* pend)
{
    char const* str = self.c_str();
    char const* start = str;

    // Skip leading white space.
    while (*start && xisspace(*start))
    {
        start++;
    }

    if (*start)
    {
        char const* end = start;
        char const* pos = start + 1;

        // Remember the last non-space character seen.
        while (*pos)
        {
            if (! xisspace(*pos))
            {
                end = pos;
            }

            ++pos;
        }

        *pend = (end + 1) - start;

        return start - str;
    }

    *pend = 0;

    return 0;
}

std::string
trim(std::string const& self)
{
    size_t end;
    size_t start = trim_find(self, &end);

    return self.substr(start, end);
}

#include <cstdio>

int
main()
{
    std::string str(" xx x x xxx ");

    std::printf("non-trimmed: ->|%s|<-\n", str.c_str());
    std::printf("trimmed: ->|%s|<-\n", trim(str).c_str());

    return 0;
}
________________________________________________________________

lol. ;^)

Chris M. Thomasson · Sep 8, 2009

Chris M. Thomasson said:
Perhaps something like this:

[...]

Since `std::string' can give us the length of the stored string, we can
re-write the `trim_find()' function to take advantage of that fact:
____________________________________________________________
size_t
trim_find(std::string const& self,
          size_t* pend)
{
    char const* str = self.c_str();
    char const* start = str;

    while (*start && xisspace(*start))
    {
        start++;
    }

    if (*start)
    {
        char const* end = str + (self.length() - 1);

        while (end != str && xisspace(*end))
        {
            --end;
        }

        *pend = (end + 1) - start;

        return start - str;
    }

    *pend = 0;

    return 0;
}
____________________________________________________________

This has MUCH better scalability characteristics since it does not need to
scan the entire string in order to trim white spaces. You can have a string
that is 50,000 characters in length, but if it has say, 100 leading white
spaces, and 50 trailing white spaces, the scan is only going to observe 150
characters, NOT all 50,000!

;^)

James Kanze · Sep 8, 2009

> Not sure about the idiom, but
>
> char const WS[] = "..."; // whatever you consider WS
> str.erase(0, str.find_first_not_of(WS));
> str.erase(str.find_last_not_of(WS) + 1);
>
> should do it. If you want your WS to be determined by a
> function, like 'isspace' or whatnot, you need to write a
> functor and remove/erase those for which the functor is true
> or vice versa, keep those for which the functor yields
> false... A bit more convoluted.

Yes and no. To begin with, the problem isn't adequately
specified; I use UTF-8 a lot, and the problem is a lot more
complex if you have to deal with multibyte encodings. And if
you're doing much text manipulation, you'll want to have such
functional objects in your tool kit anyway, at least for single
byte encodings. My own trim function is just:

std::string
leftTrim(
    std::string const&      s,
    SetOfCharacter const&   toRemove )
{
    return std::string(
        std::find_if(
            s.begin(), s.end(),
            std::not1( toRemove.contains() ) ),
        s.end() ) ;
}

std::string
rightTrim(
    std::string const&      s,
    SetOfCharacter const&   toRemove )
{
    return std::string(
        s.begin(),
        std::find_if(
            s.rbegin(), s.rend(),
            std::not1( toRemove.contains() ) ).base() ) ;
}

std::string
trim(
    std::string const&      s,
    SetOfCharacter const&   toRemove )
{
    return leftTrim( rightTrim( s, toRemove ), toRemove ) ;
}

My SetOfCharacter class has a function contains() (with no
parameters) which returns the appropriate functional object, and
I have pre-defined instances of SetOfCharacter (actually
CharacterClass---a derived class with some more complex
constructors) for each of the standard isxxx functions (and then
some).
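
SetOfCharacter itself is not shown in the post, so the following is only a guess at its general shape: a hypothetical CharSet class whose contains() returns a function object, for single-byte encodings only. It uses std::find_if_not (C++11) where the code above uses std::find_if with std::not1, and every name in it is invented for illustration.

```cpp
#include <algorithm>
#include <bitset>
#include <climits>
#include <string>

// Hypothetical stand-in for the SetOfCharacter class described above.
class CharSet {
public:
    // Builds the set from the characters listed in members.
    explicit CharSet(const std::string& members) {
        for (char c : members)
            bits_.set(static_cast<unsigned char>(c));
    }

    // Function object answering "is c in the set?".
    // It points at the set, so the set must outlive it.
    class Contains {
    public:
        explicit Contains(const CharSet& set) : set_(&set) {}
        bool operator()(char c) const {
            return set_->bits_.test(static_cast<unsigned char>(c));
        }
    private:
        const CharSet* set_;
    };

    Contains contains() const { return Contains(*this); }

private:
    std::bitset<UCHAR_MAX + 1> bits_;  // one bit per possible char value
};

// Copy from the first character that is not in the set to the end.
std::string left_trim(const std::string& s, const CharSet& to_remove) {
    return std::string(
        std::find_if_not(s.begin(), s.end(), to_remove.contains()),
        s.end());
}

// Copy from the start to one past the last character that is not in the set.
std::string right_trim(const std::string& s, const CharSet& to_remove) {
    return std::string(
        s.begin(),
        std::find_if_not(s.rbegin(), s.rend(), to_remove.contains()).base());
}

std::string trim_set(const std::string& s, const CharSet& to_remove) {
    return left_trim(right_trim(s, to_remove), to_remove);
}
```

A set would be built once, e.g. `CharSet ws(" \t\n");`, and then passed to trim_set() wherever trimming is needed.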

(FWIW: the UTF-8 versions look very similar; I've got some
special iterators, and the UTF-8 version of SetOfCharacter knows
how to handle what dereferencing them returns. So I end up
with:

std::string
leftTrim(
    std::string const&          original,
    BasicSetOfCharacter const&  toRemove )
{
    return std::string(
        std::find_if( begin( original ),
                      end( original ),
                      std::not1( toRemove.contains() ) ).begin(),
        original.end() ) ;
}

std::string
rightTrim(
    std::string const&          original,
    BasicSetOfCharacter const&  toRemove )
{
    return std::string(
        original.begin(),
        std::find_if( rbegin( original ),
                      rend( original ),
                      std::not1( toRemove.contains() ) ).end() ) ;
}

std::string
trim(
    std::string const&          original,
    BasicSetOfCharacter const&  toRemove )
{
    return leftTrim( rightTrim( original, toRemove ), toRemove ) ;
}

std::string
trim(
    std::string const&          original )
{
    return trim( original, CharacterData::space() ) ;
}

..)

Jerry Coffin · Sep 9, 2009

[ ... code to normalize spaces in a string elided ]

There are two rather different possibilities here:
1) Remove only leading and trailing spaces.
2) Remove leading and trailing spaces, and convert all other runs of
white space to a single space.

The OP originally asked for the first, but what you've supplied is
really the second. If he wants the first, it can be made rather
simpler -- basically just one call to find_first_not_of, and one to
find_last_not_of:

std::string trim1(std::string const &input) {
    // White space of the "C" locale: space, \t, \n, \v, \f, \r.
    char const ws[] = " \t\n\v\f\r";
    size_t begin = input.find_first_not_of(ws);

    if (begin == std::string::npos)
        return "";

    size_t end = input.find_last_not_of(ws);
    return std::string(input, begin, end-begin+1);
}

If he really wants to remove leading and trailing spaces, and convert
all intervening runs of white space to a single space apiece, there's
a bit simpler way to do that as well:

std::string trim2(std::string const &input) {
    std::istringstream in(input);
    std::ostringstream ret;

    std::copy(std::istream_iterator<std::string>(in),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(ret, " "));

    return std::string(ret.str(), 0, ret.str().length()-1);
}

The former uses only the characters you specify as being whitespace
(I've used those specified for the "C" locale, at least as closely as
I recall them). The latter uses the current locale to decide what's
white space. You could, of course, imbue the istringstream with a
different locale if you wanted a different definition of white space.
The (likely) trade off is that while trim2 is pretty simple, you may
pay for that in lower speed.
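
To illustrate the remark about imbuing a different locale, here is a sketch in which a comma also counts as white space. CommaIsSpace and collapse_ws are hypothetical names, and the plain extraction loop is a stand-in for the iterator-based copy in trim2, not a replacement for it.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: the classic "C" character classification,
// with ',' additionally marked as white space.
class CommaIsSpace : public std::ctype<char> {
public:
    CommaIsSpace() : std::ctype<char>(table()) {}

private:
    static const mask* table() {
        // Copy the classic table once, then flag ',' as space.
        // Repeating the |= on later calls changes nothing.
        static std::vector<mask> t(classic_table(),
                                   classic_table() + table_size);
        t[','] |= space;
        return &t[0];
    }
};

// Hypothetical helper: trims the ends and joins the words with single
// spaces, letting loc decide what counts as white space.
std::string collapse_ws(const std::string& input, const std::locale& loc) {
    std::istringstream in(input);
    in.imbue(loc);  // operator>> now skips whatever loc calls white space

    std::string word, result;
    while (in >> word) {
        if (!result.empty())
            result += ' ';
        result += word;
    }
    return result;
}
```

With `std::locale loc(std::locale::classic(), new CommaIsSpace);` (the locale takes ownership of the facet), collapse_ws("  a,b ,, c  ", loc) should give "a b c", whereas the classic locale leaves the commas inside the words.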

Francesco · Sep 9, 2009

> [ ... code to normalize spaces in a string elided ]
>
> There are two rather different possibilities here:
> 1) Remove only leading and trailing spaces.
> 2) Remove leading and trailing spaces, and convert all other runs of
> white space to a single space.
>
> The OP originally asked for the first, but what you've supplied is
> really the second.

I know, I mentioned the two cases in the second post of this thread.
As you could have read from the part you "elided":

> Anyway, this is one of those many functions that Victor spoke about:
> [...code...]
> Just wrote it on the fly for exercise.

In other words, the purpose was not to spoon-feed the OP with exactly
what he needed, but to show an example. In this case, an example of
valid C++ which makes use of the std::string facilities yet handles
chars directly in a loop. The OP is perfectly able to understand my
snippet and modify it at his will, being a C and Perl programmer.

Best regards,
Francesco
