C++ - how to convert string to uppercase/lowercase

Michal

Hello,
I looked through the ANSI/ISO C++ standard string class and did not find any
member function that would do this. Did I overlook
something, or is it really missing?

regards,
Michal
 
sean_in_raleigh

Hello,
I looked through the ANSI/ISO C++ standard string class and did not find any
member function that would do this. Did I overlook
something, or is it really missing?

Here's the STL-ish way.

//////
#include <cctype>
#include <algorithm>
#include <string>

int my_toupper(int c)
{
  return toupper(c);
}

int
main(int argc, char **argv)
{
  using namespace std;
  string s = "hello world";
  transform(s.begin(), s.end(), s.begin(), my_toupper);
}
////

You have to create my_toupper (or use a cast) due to a
C++ wart.

Sean
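
For reference, the cast mentioned above could look like the sketch below. The helper name upper_with_cast is made up for illustration, and the example assumes plain ASCII input, since passing a negative char value to std::toupper is undefined. Newer standards (C++20) also discourage taking the address of standard library functions, so a small wrapper such as my_toupper is likely the more portable choice.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (name not from the thread). Assumes ASCII-only input.
std::string upper_with_cast(std::string s)
{
    // The cast spells out the exact signature wanted, so the compiler picks
    // the <cctype> function and not the two-argument <locale> template.
    std::transform(s.begin(), s.end(), s.begin(),
                   static_cast<int (*)(int)>(std::toupper));
    return s;
}
```

Calling upper_with_cast("hello world") is expected to yield "HELLO WORLD" in the default "C" locale.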
 
Kai-Uwe Bux

Michal said:
Hello,
I looked through the ANSI/ISO C++ standard string class and did not find any
member function that would do this. Did I overlook
something, or is it really missing?

Maybe something like:

#include <locale>
#include <string>

template < typename Char, typename Traits >
std::basic_string< Char, Traits > &
to_lower ( std::basic_string< Char, Traits > & str,
           std::locale loc = std::locale() ) {
  typedef std::basic_string< Char, Traits > string;
  typedef std::ctype< Char > char_type;
  char_type const * the_type_ptr = &std::use_facet< char_type >( loc );
  // Lowercase each character in place through the locale's ctype facet.
  for ( typename string::size_type i = 0; i < str.size(); ++i ) {
    str[i] = the_type_ptr->tolower( str[i] );
  }
  return ( str );
}

#include <iostream>
#include <ostream>

int main ( void ) {
std::string msg ( "Hello World!" );
to_lower( msg );
std::cout << msg << '\n';
}


Best

Kai-Uwe Bux
 
Noah Roberts

You have to create my_toupper (or use a cast) due to a
C++ wart.

What wart?


#include <cctype>
#include <algorithm>
#include <string>
#include <iostream>

int main()
{
std::string x = "hello world";
transform(x.begin(), x.end(), x.begin(), toupper);
std::cout << x << std::endl;
}


Works fine in VS and G++.
 
sean_in_raleigh

You have to create my_toupper (or use a cast) due to a
C++ wart.

What wart?
[...]
Works fine in VS and G++.

Well, you changed the program I posted, so
presumably you know the one I mean!

That said, you might not consider the
fact that importing std:: into your
main() causes it to break to be a wart,
but I do.

Sean
 
alfps

Michal said:
Hello,
I looked through the ANSI/ISO C++ standard string class and did not find any
member function that would do this. Did I overlook
something, or is it really missing?

Maybe something like:

#include <locale>
#include <string>

template < typename Char, typename Traits >
std::basic_string< Char, Traits > &
to_lower ( std::basic_string< Char, Traits > & str,
           std::locale loc = std::locale() ) {
  typedef std::basic_string< Char, Traits > string;
  typedef std::ctype< Char > char_type;
  char_type const *  the_type_ptr = &std::use_facet< char_type >( loc );
  for ( typename string::size_type i = 0; i < str.size(); ++i ) {
    str[i] = the_type_ptr->tolower( str[i] );
  }
  return ( str );

}


This is indeed the *intended* way for C++ level conversion.

Unfortunately MinGW g++ 3.4.5 for Windows lacks locale support, hence
the above won't uppercase e.g. Norwegian characters.

The practical solution is to use the C library's toupper function
(MinGW g++ uses Microsoft's runtime library which does this
correctly).

Others have posted such code that won't work in general.

It's important to remember to add a setlocale( LC_ALL, "" ) call
(category first, then the locale name).

#include <iostream>
#include <ostream>

int main ( void )

This void is a C-ism, best a-voided in C++.

{
  std::string msg ( "Hello World!" );
  to_lower( msg );
  std::cout << msg << '\n';

}


Cheers & hth.,

- Alf
 
Juha Nieminen

Noah said:
transform(x.begin(), x.end(), x.begin(), toupper);

Does the standard guarantee that toupper() will always be a function
and never a preprocessor macro?
 
Kai-Uwe Bux

alfps said:
Michal said:
Hello,
I looked through the ANSI/ISO C++ standard string class and did not find any
member function that would do this. Did I overlook
something, or is it really missing?

Maybe something like:

#include <locale>
#include <string>

template < typename Char, typename Traits >
std::basic_string< Char, Traits > &
to_lower ( std::basic_string< Char, Traits > & str,
std::locale loc = std::locale() ) {
typedef std::basic_string< Char, Traits > string;
typedef std::ctype< Char > char_type;
char_type const *  the_type_ptr = &std::use_facet< char_type >( loc );
for ( typename string::size_type i = 0; i < str.size(); ++i ) {
str[i] = the_type_ptr->tolower( str[i] );
}
return ( str );

}


This is indeed the *intended* way for C++ level conversion.

Unfortunately MinGW g++ 3.4.5 for Windows lacks locale support, hence
the above won't uppercase e.g. Norwegian characters.

The practical solution is to use the C library's toupper function
(MinGW g++ uses Microsoft's runtime library which does this
correctly).


Doesn't that have other problems (such as not distinguishing Norwegian from
English in programs that deal with input from both languages)?

Others have posted such code that won't work in general.

If code doesn't work "in general" because implementations are deficient, you
are screwed anyway.

It's important to remember to add a setlocale( LC_ALL, "" ) call
(category first, then the locale name).

That's interesting. Where would one need to add that?

This void is a C-ism, best a-voided in C++.

Incorrect: there is no technical reason to avoid "void". I never learned C,
but I happen to like this keyword here to distinguish definitions from call
in a grep-able way. In any case, it's a habit and I am unlikely to change
since it never caused any problems.


Best

Kai-Uwe Bux
 
James Kanze

What wart?

To begin with, the fact (inherited from C) that you can't call
toupper with a char without incurring undefined behavior.

#include <cctype>
#include <algorithm>
#include <string>
#include <iostream>
int main()
{
std::string x = "hello world";
transform(x.begin(), x.end(), x.begin(), toupper);
std::cout << x << std::endl;
}

Works fine in VS and G++.

Like his original code, it has undefined behavior. Unlike his
original code, there's also a very good chance that it won't
compile. There are two problems: the use of toupper is
ambiguous; this is easily resolved by including <ctype.h>
instead of <cctype>, and specifying ::toupper. The second
problem is more subtle: ::toupper takes an int as argument, not
a char, and it has a pre-condition that the value is in the
range 0...UCHAR_MAX (or EOF). If plain char is signed (as is
all too often the case), calling it with a plain char results in
undefined behavior. (There are other problems with this
solution, e.g. ::toupper uses mutable global state, which may
cause problems in a multithreaded environment. But they don't
apply here.)

And of course, there's the more general problem that transform
can't handle case transformations, because there's not a one to
one mapping of lower case to upper case. But that issue affects
any code which attempts to use any of the toupper functions in
the standard library.
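
A minimal sketch of the two fixes described above, assuming a C++11 compiler (lambdas were not available when this thread was written). It includes <ctype.h> so that ::toupper is unambiguous, and it converts each character to unsigned char before the call. The function name upper_via_lambda is invented for the example.

```cpp
#include <ctype.h>   // declares ::toupper in the global namespace
#include <algorithm>
#include <string>

// Hypothetical helper (name not from the thread).
std::string upper_via_lambda(std::string s)
{
    // The lambda takes unsigned char, so every char is converted before the
    // call and ::toupper never receives a negative value.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(::toupper(c));
                   });
    return s;
}
```

This only removes the undefined behavior. It still maps one character to one character, so the limitation in the last paragraph above remains.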
 
James Kanze

Does the standard guarantee that toupper() will always be a
function and never a preprocessor macro?

It guarantees that all of the toupper functions will always be
functions, and not macros. The problem is that there are
several toupper functions in the standard library, including a
template, and when using it as an argument to a function
template, type deduction doesn't work. (This is one case where
using <ctype.h> and ::toupper would be preferable.) The problem
is that the ::toupper in <ctype.h> causes undefined behavior if
invoked with a char as argument if plain char is signed (as it
often is). The problem is that all of the toupper functions in
the standard library assume a one to one mapping of lower to
upper, which simply isn't true; the results of toupper( "aß" )
should be "ASS" (and toupper( "Maße" ) should be "MASZE", to
avoid confusion with "Masse", according to Duden, so you really
need a very, very intelligent function).
 
James Kanze

What does using std::ptr_fun achieve? toupper is still
ambiguous, and there's no way for template type deduction to
work.

The non-wart in question is also easily avoided by qualifying
the name as ::toupper.

Only if you include <ctype.h>, rather than <cctype>. (According
to the current standard, if you include <cctype>, and the
compiler finds a ::toupper, the implementation isn't conforming.
Most aren't in this regard, however.)

And of course, none of this addresses the fact that you can't
call ::toupper with a char without invoking undefined behavior.
The minimum STL solution would use something like:

struct ToUpper
{
char operator()( char ch ) const
{
return toupper( static_cast< unsigned char >( ch ) ) ;
}
} ;

A better solution would use a functional object which contained
a reference to an std::ctype< char >, and called toupper on it.
And while arguably, something like this really belongs in the
standard library, in practice, it doesn't really work either.
Especially if you're using Unicode (UTF-8):).
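
The "better solution" described above might be sketched as follows. The class name CtypeToUpper and the helper upper_with_facet are hypothetical. The sketch stores a pointer to the facet in place of a reference so the object stays assignable, and it keeps a copy of the locale so the facet cannot outlive its owner.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical functor (name and layout are illustrative only).
class CtypeToUpper
{
public:
    explicit CtypeToUpper(std::locale const& loc = std::locale())
        : loc_(loc)
        , ctype_(&std::use_facet<std::ctype<char> >(loc_))
    {
    }

    // ctype<char>::toupper takes a char, so no cast is needed here.
    char operator()(char ch) const { return ctype_->toupper(ch); }

private:
    std::locale loc_;                // keeps the facet alive
    std::ctype<char> const* ctype_;  // points into loc_
};

// Hypothetical convenience wrapper around std::transform.
std::string upper_with_facet(std::string s,
                             std::locale const& loc = std::locale())
{
    std::transform(s.begin(), s.end(), s.begin(), CtypeToUpper(loc));
    return s;
}
```

As noted above, this is still a per-character mapping on char, so it does not help with multi-byte encodings such as UTF-8.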
 
James Kanze

Kai-Uwe Bux wrote:
Maybe something like:
#include <locale>
#include <string>
template < typename Char, typename Traits >
std::basic_string< Char, Traits > &
to_lower ( std::basic_string< Char, Traits > & str,
           std::locale loc = std::locale() ) {
  typedef std::basic_string< Char, Traits > string;
  typedef std::ctype< Char > char_type;
  char_type const * the_type_ptr = &std::use_facet< char_type >( loc );
  for ( typename string::size_type i = 0; i < str.size(); ++i ) {
    str[i] = the_type_ptr->tolower( str[i] );
  }
  return ( str );
}


Why not?

template< typename Char, typename Traits >
std::basic_string< Char, Traits >
toUpper(
std::basic_string< Char, Traits > const&
in,
std::locale loc = std::locale() )
{
std::basic_string< Char, Traits >
result( in ) ;
std::use_facet< std::ctype< Char > >( loc )
.toupper( &result[ 0 ], &result[ 0 ] + result.size() ) ;
return result ;
}

Technically, this isn't guaranteed in the current standard, but
it works with all known implementations, and will be guaranteed
in the next.

Of course, it still doesn't work for something like "er isst und
aß in hohem Maße", which should return "ER ISST UND ASS IN HOHEM
MASZE" (and the tolower equivalent won't get you back to the
original, either).
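
To illustrate why a one-to-one character mapping cannot produce "ASS" from "aß", here is a deliberately tiny sketch. It is an assumption-laden toy, not a real case-conversion routine: it uses UTF-32 strings to sidestep encoding questions, handles only ASCII letters and U+00DF, and always expands ß to "SS" (it does not attempt the "SZ" variant). The function name is hypothetical.

```cpp
#include <string>

// Hypothetical illustration: ASCII letters plus U+00DF (ß) only.
std::u32string upper_with_eszett(std::u32string const& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c == U'\u00DF') {
            out += U"SS";  // one input character becomes two
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - U'a' + U'A');
        } else {
            out += c;      // everything else is copied unchanged
        }
    }
    return out;
}
```

The result can be longer than the input, which is exactly what an in-place std::transform or ctype::toupper over a fixed range cannot express.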
 
Triple-DES

To begin with, the fact (inherited from C) that you can't call
toupper with a char without incurring undefined behavior.

I don't think that's strictly correct. As long as the argument is an
int whose value can be represented as an unsigned char (a value
between 0 and UCHAR_MAX), it should be well-defined.

Like his original code, it has undefined behavior.  Unlike his
original code, there's also a very good chance that it won't
compile.  There are two problems: the use of toupper is
ambiguous; this is easily resolved by including <ctype.h>
instead of <cctype>, and specifying ::toupper.  The second
problem is more subtle: ::toupper takes an int as argument, not
a char, and it has a pre-condition that the value is in the
range 0...UCHAR_MAX (or EOF).  If plain char is signed (as is
all too often the case), calling it with a plain char results in
undefined behavior.  (There are other problems with this
solution, e.g. ::toupper uses mutable global state, which may
cause problems in a multithreaded environment.  But they don't
apply here.)

For the reason above, I think this is well-defined unless any of the
characters in "hello world" has a negative numerical value. Which
makes the example "possibly UB", not "guaranteed UB".
 
James Kanze

[...]
I've found locale support to still be a major portability issue.

Doesn't that have other problems (such as not distinguishing
Norwegian from English in programs that deal with input from
both languages)?

You can change the global C locale as often as you want. Of
course, it's global state, which does introduce some issues in
the case of a multi-threaded application.
If code doesn't work "in general" because implementations are
deficient, you are screwed anyway.

No, the code doesn't work in general because it contains
undefined behavior (at least if Alf is talking about what I
think he's talking about). You can't include <cctype>, and
expect to be able to pass toupper (or std::toupper) to
std::transform, even if it might seem to work in simple cases.
(Any standard header may include <locale>, and once <locale> is
included, type deduction for std::transform fails.) And of
That's interesting. Where would one need to add that?

Everywhere the "C" locale isn't appropriate. A call to
setlocale is the first line in main in most C programs; in C++,
you'd probably want to replace it with:
std::locale::global( std::locale( "" ) ) ;
Except that as Alf says, you can't really count on std::locale
working the way it should.
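
A hedged sketch of that start-up step, wrapped in a hypothetical helper named use_environment_locale. It assumes the goal is to adopt the user's environment locale, which is what the empty name "" requests. Constructing std::locale("") can throw std::runtime_error when the environment names a locale the system does not provide, so the sketch falls back to whatever locale was already in effect.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns false if the environment's locale is unavailable.
bool use_environment_locale()
{
    try {
        // Replaces the C++ global locale. When the new locale has a name,
        // this also calls std::setlocale(LC_ALL, ...) for the C library.
        std::locale::global(std::locale(""));
        return true;
    } catch (std::runtime_error const&) {
        // The previous global locale (initially "C") stays in place.
        return false;
    }
}
```

It would typically be called once at the top of main, before any stream or string conversion relies on the locale.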
Incorrect: there is no technical reason to avoid "void".

Still, it's a C-ism:). There isn't a technical reason, but
there is a stylistic one: it communicates the wrong message.
(It says that you're a C hacker who doesn't really understand
C++. I know that this isn't the case in your case, but that's
the way most people will read it.)
I never learned C, but I happen to like this keyword here to
distinguish definitions from call in a grep-able way.

That's easy:
int
main()
and grep ^main. That's been the rule in every shop I've worked
in, since my earliest days in C. The name of the function in a
function definition is always in column 1; the name of a
function is never in column 1 otherwise (generally as a result
of other formatting rules).
In any case, it's a habit and I am unlikely to change since it
never caused any problems.

What if your employer imposes coding guidelines that ban it:)?
Or your colleagues start treating you badly because they don't
like it:)?
 
Rolf Magnus

Christian said:
James Kanze wrote:


Actually, it's even more complicated. See http://faql.de/eszett.html

Personally, I'd never write "MASZE", I would always write "MASSE" and
consider that the correct form. On the other hand, in Switzerland
"MASSE" would be converted back to "Masse" even if it would be converted
to "Maße" in Germany and Austria.

The problem is that MASSE could be any of those two. It depends on the
context.
Having an intelligent function is not enough, it must also be able to
read your mind and know where you come from :)

Well, the latter should be possible with locales. There are usually different
locales for German (Switzerland) and German (Germany). So if you selected the
right one, the computer knows where you come from.
 
Rolf Magnus

Triple-DES said:
I don't think that's strictly correct.
As long as the argument is an int whose value can be represented as an
unsigned char (a value between 0 and UCHAR_MAX), it should be well-defined.

Yes. As long as you _ensure_ that the function will never be called with a
character outside that range, the behaviour is well-defined.
For the reason above, I think this is well-defined unless any of the
characters in "hello world" has a negative numerical value. Which
makes the example "possibly UB", not "guaranteed UB".

I'm not sure if it's useful at all to talk about "guaranteed UB",
considering that UB means that nothing is guaranteed.
 
Joe Smith

Why not?

template< typename Char, typename Traits >
std::basic_string< Char, Traits >
toUpper(
std::basic_string< Char, Traits > const&
in,
std::locale loc = std::locale() )
{
std::basic_string< Char, Traits >
result( in ) ;
std::use_facet< std::ctype< Char > >( loc )
.toupper( &result[ 0 ], &result[ 0 ] + result.size() ) ;
return result ;
}

Why not use the one-liner (assuming t is a std::string)

std::transform(t.begin(), t.end(), t.begin(),
    std::bind1st(std::mem_fun(&std::ctype<char>::toupper),
                 &std::use_facet<std::ctype<char> >(std::locale())));

That will work as well as any of the others,
assuming that taking the address of the return value of std::use_facet
is not UB. Am I not taking the address of an r-value in that line?

Comeau C++ does not complain about it, in any case.

If that is good, then the following form will also work nicely:

template< typename charT, typename Traits >
std::basic_string< charT, Traits >
toLower(
std::basic_string< charT, Traits > const&
in,
std::locale loc = std::locale() )
{
std::basic_string<charT, Traits> result(in);
std::transform(in.begin(),in.end(),result.begin(),
std::bind1st(std::mem_fun(&std::ctype<charT>::tolower),
&std::use_facet<std::ctype<charT> >(loc)));
return result;
}
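
For readers on newer compilers: std::bind1st and std::mem_fun were deprecated in C++11 and removed in C++17, so the same idea is sketched below with a lambda. The function name to_lower_copy is hypothetical. std::use_facet returns a const reference to a facet owned by the locale, so binding it to a named reference is fine as long as the locale object outlives the call.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical C++11 rewrite of the bind1st/mem_fun version above.
template <typename CharT, typename Traits>
std::basic_string<CharT, Traits>
to_lower_copy(std::basic_string<CharT, Traits> const& in,
              std::locale const& loc = std::locale())
{
    // The facet is owned by loc, which outlives this function call.
    std::ctype<CharT> const& facet = std::use_facet<std::ctype<CharT> >(loc);

    std::basic_string<CharT, Traits> result(in);
    std::transform(result.begin(), result.end(), result.begin(),
                   [&facet](CharT c) { return facet.tolower(c); });
    return result;
}
```

For example, to_lower_copy(std::string("Hello World!")) should give "hello world!" in the classic locale, and the same template accepts std::wstring.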
 
Noah Roberts

James said:
What if your employer imposes coding guidelines that ban it:)?

Code the way they want until I can find a different employer. Such a
shop is way too pedantic to be a comfortable working environment.
Or your colleagues start treating you badly because they don't
like it:)?

Tell them to get lives.

Frankly, I don't concern myself about the opinions of losers that treat
others badly over trivial matters of opinion.
 
James Kanze

James Kanze wrote:
Actually, it's even more complicated. See http://faql.de/eszett.html

I know.
Personally, I'd never write "MASZE", I would always write
"MASSE" and consider that the correct form.

That's pretty much what I'd do as well, as it corresponds to
what I've seen. But according to Duden...
On the other hand, in Switzerland "MASSE" would be converted
back to "Masse" even if it would be converted to "Maße" in
Germany and Austria.

The problem is that whether "MASSE" becomes "Maße" or "Masse"
depends on the meaning of the word in the sentence (outside of
Switzerland, where it would always be "Masse"). (Of course, in
Switzerland, "ä" becomes "AE".)
Having an intelligent function is not enough, it must also be
able to read your mind and know where you come from :)

Locales are meant to take care of the "knowing where you come
from". Knowing whether the "MASSE" signifies measurements or
mass, however, requires a bit more intelligence than is present
in most programs. (Most programs probably wouldn't even be able
to handle ASS->aß vs. ISST->isst.)
 
James Kanze

Code the way they want until I can find a different employer.
Such a shop is way too pedantic to be a comfortable working
environment.

I guess the real question, then, is whether you want to work on
projects that succeed or not. All of the shops I've seen where
projects regularly succeeded do impose fairly strict coding
guidelines. And all of those I've seen that didn't typically
had most projects failing.

Personally (and it really is a question of personality and
taste), I get my greatest satisfaction in seeing my code used in
a successful, working project. Which means that I tend to
prefer shops with strict coding guidelines.
Tell them to get lives.
Frankly, I don't concern myself about the opinions of losers
that treat others badly over trivial matters of opinion.

So you don't care about the quality of your code, or whether
the project you're working on actually produces something that
works. To each his own.
 
