tolower used by transform


qazmlp

I was using the following code to convert a string to
lowercase:

string foo = "Some Mixed Case Text";
transform(foo.begin(), foo.end(), foo.begin(), tolower);

I thought the above code was portable.

But the following page takes a different view:
http://lists.debian.org/debian-gcc/2002/debian-gcc-200204/msg00092.html

Can anybody comment on it?
Also, I would like to know whether the tolower template or the tolower
function will be used in the above code:
transform(foo.begin(), foo.end(), foo.begin(), tolower);
 

Sergei Matusevich


Well that's a damn good question!

The problem is that most implementations of the standard C <ctype.h>
header define functions like toupper/tolower/etc. as macros. To make
your code work with STL algorithms, you have to include the <cctype>
header instead of <ctype.h>. At least on my PC (Debian/gcc 3.3),
<cctype> undefines all the tolower/etc. macros and brings the
::tolower/::toupper/etc. functions into the std namespace, so your
sample will work fine.
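One way to sidestep both the macro question and the "which tolower?" question is to hand transform a small function of your own rather than tolower itself. The sketch below assumes plain char text and the C library's current locale; `lower_char` and `to_lower_in_place` are made-up names. The cast to unsigned char is there because passing a negative char value (other than EOF) to the <cctype> tolower is undefined behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: a plain, non-overloaded, non-template function,
// so passing it to std::transform is unambiguous.
inline char lower_char(char c)
{
    // std::tolower(int) expects a value representable as unsigned char (or EOF).
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Lowercases s in place according to the current C locale.
inline void to_lower_in_place(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(), lower_char);
}
```

With `std::string foo = "Some Mixed Case Text";`, calling `to_lower_in_place(foo)` leaves `foo` as "some mixed case text" in the default "C" locale.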

However, in general it is recommended to drop the old C functions in
favor of the newer standard library functionality. In this particular
case, you may want to use the ctype locale facet, e.g.:

#include <locale>

// ..............

std::locale loc;
char s[] = "Test String";
// sizeof(s) also covers the terminating '\0', which tolower leaves as is
std::use_facet< std::ctype<char> >( loc ).tolower( s, s + sizeof(s) );

Too bad it does not work with std::string; the following code will
not compile:

std::locale loc;
string s = "Test String";
std::use_facet< std::ctype<char> >( loc ).tolower( s.begin(), s.end() );

because std::ctype<char>::tolower() has only two overloads:

char_type tolower(char_type __c) const;
const char_type* tolower(char_type* __lo, const char_type* __hi) const;
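Since the array overload takes plain char pointers, one workaround is to pass it a pointer range into the string's own buffer. This is a sketch that relies on std::string storage being contiguous, which C++11 guarantees (C++03 did not spell this out); `facet_to_lower` is a hypothetical name.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercase a std::string in place through the
// ctype<char> facet's array overload tolower(char*, const char*).
inline void facet_to_lower(std::string& s, const std::locale& loc = std::locale())
{
    if (s.empty())
        return;  // nothing to convert
    const std::ctype<char>& ct = std::use_facet< std::ctype<char> >(loc);
    char* first = &s[0];                  // contiguous storage (C++11 and later)
    ct.tolower(first, first + s.size());  // converts [first, first + size) in place
}
```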

This leads me to the following piece of code:

std::transform( s.begin(), s.end(), s.begin(),
    std::bind1st( std::mem_fun( &std::ctype<char>::tolower ),
                  &std::use_facet< std::ctype<char> >( loc ) ) );

Nice, eh? :)
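For readers on a newer compiler: std::bind1st and std::mem_fun were deprecated in C++11 and removed in C++17. Assuming C++11 or later, the same facet-based idea can be written with a lambda that holds a reference to the facet and calls the single-character overload, so no overload has to be picked by address. `lower_with_facet` is an illustrative name.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Lowercase s with loc's ctype<char> facet; the lambda calls
// the single-character overload tolower(char) const on each element.
inline void lower_with_facet(std::string& s, const std::locale& loc)
{
    const std::ctype<char>& ct = std::use_facet< std::ctype<char> >(loc);
    std::transform(s.begin(), s.end(), s.begin(),
                   [&ct](char c) { return ct.tolower(c); });
}
```

The facet reference stays valid for the duration of the call because `loc` is alive throughout.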

Now it's truly C++, but I am not sure if I want to use such a thing
instead of good old tolower() from <cctype>. Can anyone suggest a
better solution?

PS. The <locale> header also defines a standalone std::tolower()
function that takes a locale as its second parameter, but I don't know
if it can be used in transform, because it is a template/inline
function, i.e. std::bind2nd and std::ptr_fun do not work with it.
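The <locale> function is declared as `template<class charT> charT tolower(charT c, const locale& loc)`. Assuming C++11 or later, a lambda can simply call it and let ordinary overload resolution pick the two-argument template, so no binder or address-taking is involved. `lower_with_locale` is an illustrative name.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Lowercase s with the two-argument std::tolower from <locale>.
// The lambda captures the locale by reference, so no binder has
// to cope with the const std::locale& parameter.
inline void lower_with_locale(std::string& s, const std::locale& loc)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [&loc](char c) { return std::tolower(c, loc); });
}
```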

PPS. It would be REALLY great to hear other opinions on this subject!

Thanks,
Sergei.
 

tom_usenet

> Well that's a damn good question!
>
> The problem is that most implementations of the standard C <ctype.h>
> header define functions like toupper/tolower/etc. as macros. To make
> your code work with STL algorithms, you have to include the <cctype>
> header instead of <ctype.h>. At least on my PC (Debian/gcc 3.3),
> <cctype> undefines all the tolower/etc. macros and brings the
> ::tolower/::toupper/etc. functions into the std namespace, so your
> sample will work fine.

I'm not sure a conforming C++ implementation can provide macro
versions of the <ctype.h> functions. Most versions I have seen use
#ifdef __cplusplus or similar, with inline functions for the C++
version and macros for the C one.

> [...]
> This leads me to the following piece of code:
>
> std::transform( s.begin(), s.end(), s.begin(),
>     std::bind1st( std::mem_fun( &std::ctype<char>::tolower ),
>                   &std::use_facet< std::ctype<char> >( loc ) ) );
>
> Nice, eh? :)

tolower is overloaded so you can't take its address as you are trying
above, since you don't say which overload you want. You'd need
something like:

static_cast<char (std::ctype<char>::*)(char) const>(
    &std::ctype<char>::tolower)
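Tom's cast can be tried out with a C++11 compiler, with std::bind standing in for the bind1st/mem_fun pair that was later removed. A caveat: C++20 makes forming a pointer to a standard library member function unspecified (possibly ill-formed), so this is mostly of historical interest and a lambda is the safer choice today. `lower_via_member_pointer` is an illustrative name.

```cpp
#include <algorithm>
#include <functional>
#include <locale>
#include <string>

inline void lower_via_member_pointer(std::string& s, const std::locale& loc)
{
    typedef std::ctype<char> ctype_t;
    // Pick the tolower(char) const overload out of the overload set.
    char (ctype_t::*lower_fn)(char) const =
        static_cast<char (ctype_t::*)(char) const>(&ctype_t::tolower);

    const ctype_t* ct = &std::use_facet<ctype_t>(loc);
    // std::bind(pmf, object_ptr, _1) calls (ct->*lower_fn)(c) for each element.
    std::transform(s.begin(), s.end(), s.begin(),
                   std::bind(lower_fn, ct, std::placeholders::_1));
}
```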

> Now it's truly C++, but I am not sure if I want to use such a thing
> instead of good old tolower() from <cctype>. Can anyone suggest a
> better solution?

Converting a string to lower case can change the length of the string
in some languages, and a general solution is going to be quite
complicated and involve complex heuristics. In English, though,
in-place modification is of course possible, and it is best to just
write a couple of functions that operate on strings. There are various
implementation possibilities, the simplest being an explicit loop.
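A sketch of the "couple of functions" approach with an explicit loop, assuming plain char text where a one-to-one character mapping is acceptable (as in English). The function names are illustrative.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Lowercase s in place, one character at a time.
inline void make_lower(std::string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
}

// Uppercase s in place.
inline void make_upper(std::string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}

// Copying variant for callers that want to keep the original string.
inline std::string lowered(std::string s)
{
    make_lower(s);
    return s;
}
```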

> PS. The <locale> header also defines a standalone std::tolower()
> function that takes a locale as its second parameter, but I don't know
> if it can be used in transform, because it is a template/inline
> function, i.e. std::bind2nd and std::ptr_fun do not work with it.

Just because it is a template and inline doesn't mean bind2nd won't
work (you just need a cast to choose the correct instantiation).
However, because it takes the locale argument by reference, it won't
work: bind2nd would attempt to form a reference to a reference, which
is currently illegal.
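C++11's std::bind does not hit this problem, because it stores its own copy of each bound argument and passes that copy to the callee. A sketch, assuming C++11 or later; `lower_in` is a hypothetical wrapper with the same shape as std::tolower(charT, const locale&), used so the example does not depend on taking the address of a standard library function.

```cpp
#include <algorithm>
#include <functional>
#include <locale>
#include <string>

// Hypothetical wrapper: takes the locale by const reference, like std::tolower.
inline char lower_in(char c, const std::locale& loc)
{
    return std::tolower(c, loc);
}

inline void lower_with_bind(std::string& s, const std::locale& loc)
{
    // C++03's binder2nd would have to form a reference to the reference
    // type const std::locale&; std::bind keeps a copy of loc instead.
    std::transform(s.begin(), s.end(), s.begin(),
                   std::bind(lower_in, std::placeholders::_1, loc));
}
```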

Tom
 

Sergei Matusevich

Thank you Tom for a great posting!

[...]
>> std::transform( s.begin(), s.end(), s.begin(),
>>     std::bind1st( std::mem_fun( &std::ctype<char>::tolower ),
>>                   &std::use_facet< std::ctype<char> >( loc ) ) );
> tolower is overloaded so you can't take its address as you are trying
> above, since you don't say which overload you want. You'd need
> something like:
>
> static_cast<char (std::ctype<char>::*)(char) const>(
>     &std::ctype<char>::tolower)

I'm not sure about other implementations of the standard library, but
in my gcc 3.3.1, tolower is non-virtual and delegates all its
functionality to the protected virtual do_tolower method. Therefore,
taking the address of the tolower method is absolutely OK; I'm just
not sure about its portability [now, after your posting :))].

[...]
> Converting a string to lower case can change the length of the string
> in some languages, and a general solution is going to be quite
> complicated and involve complex heuristics. In English, though,
> in-place modification is of course possible, and it is best to just
> write a couple of functions that operate on strings. There are various
> implementation possibilities, the simplest being an explicit loop.

Good point! But then, as far as I understand, tolower from the
standard library is not a general solution, because it is a "one char
in, one char out" implementation. On the other hand, I don't know of
any locales that require such sophisticated case conversion
procedures. :)

But the question remains open: what is the best (generic and
portable) way to do toupper/tolower conversion for a std::string (or
std::wstring, or any std::basic_string incarnation) in C++? Is there
one? :))
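One possible answer, under the same "one char in, one char out" limitation discussed above, is a template over the character type that goes through the std::ctype facet of a caller-supplied locale. This sketch assumes C++11 or later and a locale that provides ctype<CharT> (standard locales provide ctype<char> and ctype<wchar_t>); the names `to_lower` and `to_upper` are illustrative.

```cpp
#include <locale>
#include <string>

// Lowercase any std::basic_string in place using loc's ctype<CharT> facet.
template <class CharT, class Traits, class Alloc>
void to_lower(std::basic_string<CharT, Traits, Alloc>& s,
              const std::locale& loc = std::locale())
{
    const std::ctype<CharT>& ct = std::use_facet< std::ctype<CharT> >(loc);
    for (CharT& c : s)
        c = ct.tolower(c);
}

// Uppercase counterpart.
template <class CharT, class Traits, class Alloc>
void to_upper(std::basic_string<CharT, Traits, Alloc>& s,
              const std::locale& loc = std::locale())
{
    const std::ctype<CharT>& ct = std::use_facet< std::ctype<CharT> >(loc);
    for (CharT& c : s)
        c = ct.toupper(c);
}
```

The same call works for `std::string` and `std::wstring`; passing an explicit locale (for example `std::locale("")`, where supported) selects the user's conventions instead of the global locale.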

Thank you,
Sergei.
 