Manipulation of strings: upper/lower case

Pierre

Hello!

I've been looking for a portable means of changing the case of a
string but I've found nothing so far. Does one exist? I guess (and
hope) it does.

Thanks
Pierre
 
Lew Pitcher

Pierre said:
Hello!

I've been looking for a portable means of changing the case of a
string but I've found nothing so far.

Such a function can easily be built from the existing standard C functions.

Pierre said:
Does one exist? I guess (and hope) it does.

If you can't find one, try this...

#include <ctype.h>

void UppercaseString(char *string)
{
    for (; *string; ++string)
        if (islower(*string)) *string = toupper(*string);
}

void LowercaseString(char *string)
{
    for (; *string; ++string)
        if (isupper(*string)) *string = tolower(*string);
}



--
Lew Pitcher

Master Codewright and JOAT-in-training
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
 
infobahn

#include <ctype.h>

void UppercaseString(char *string)
{
    for (; *string; ++string)
        if (islower(*string)) *string = toupper(*string);
}

Caution is necessary here. The behaviours of islower and toupper
are undefined if they are passed a value that is neither EOF nor
representable as an unsigned char. It is good practice, therefore,
to cast *string to unsigned char. (No need to cast it back to
int afterwards, since the normal promotion rules handle that.)
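
A minimal sketch of the cast described above. The name uppercase_string is hypothetical (it is not from the thread), and the assert() on a null pointer is an added assumption about how the function would be called. The islower() test is dropped, and each character is converted to unsigned char before it reaches toupper():

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Uppercase a NUL-terminated string in place and return it.
   The cast to unsigned char keeps the argument of toupper() inside
   the range the Standard requires, even where plain char is signed. */
char *uppercase_string(char *string)
{
    char *p;

    assert(string != NULL);     /* assumption: callers never pass NULL */
    for (p = string; *p != '\0'; ++p)
        *p = (char)toupper((unsigned char)*p);
    return string;
}
```

The result of toupper() is assigned straight back to *p; only the argument needs the cast.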

The islower() call smacks of premature optimisation. :)

<snip>
 
CBFalconer

Pierre said:
I've been looking for a portable means of changing the case of a
string but I've found nothing so far. Does one exist? I guess (and
hope) it does.

Unusual to want to simply change the case, but try something like:

#include <ctype.h>

void flipcase(char *s)
{
    unsigned char ch;

    if (s) /* assuming you want to protect against NULL */
        while (ch = *s) {
            if (isupper(ch) *s = tolower(ch);
            else if (islower(ch) *s = toupper(ch);
            s++;
        }
} /* flipcase, untested */

which allows for the fact that some chars do not have an upper or
lower case to be flipped.
 
Joe Wright

infobahn said:
Lew Pitcher wrote:





Caution is necessary here. The behaviours of islower and toupper
are undefined if they are passed a value that is neither EOF nor
representable as an unsigned char. It is good practice, therefore,
to cast *string to unsigned char. (No need to cast it back to
int afterwards, since the normal promotion rules handle that.)

The islower() call smacks of premature optimisation. :)

<snip>

The islower() call is unnecessary.

char *upper(char *st) {
    char *s = st;
    while ((*s = toupper(*s))) ++s;
    return st;
}

There is no need to cast the argument to toupper() to unsigned char.
We assume that st points to a valid string. All characters of such a
string are within the range 0..CHAR_MAX by definition. CHAR_MAX is
within UCHAR_MAX by definition.

If st points to something not a valid string, and toupper() is
presented with something out of range, (-20 for example) it may
SEGFAULT. And why not? It might tell you where your error is.
 
Mathew Hendry

The islower() call is unnecessary.

char *upper(char *st) {
    char *s = st;
    while ((*s = toupper(*s))) ++s;
    return st;
}

There is no need to cast the argument to toupper() to unsigned char.
We assume that st points to a valid string. All characters of such a
string are within the range 0..CHAR_MAX by definition.

Only if char happens to be unsigned, surely?

-- Mat.
 
Chris Torek

Joe Wright said:
The islower() call is unnecessary.

Indeed.

char *upper(char *st) {
    char *s = st;
    while ((*s = toupper(*s))) ++s;
    return st;
}

There is no need to cast the argument to toupper() to unsigned char.
We assume that st points to a valid string.

And someone whose name is "Pól" has a name that is an "invalid
string"? :)

Joe Wright said:
All characters of such a string are within the range 0..CHAR_MAX
by definition. CHAR_MAX is within UCHAR_MAX by definition.

If you use ISO-Latin-1, and have signed characters -- and both of
these are quite commonly true today -- you *will* have characters
whose value is outside the [0..CHAR_MAX] range. For instance, the
o-with-accent-acute above is 0xf3 or -13.

Joe Wright said:
If st points to something not a valid string, and toupper() is
presented with something out of range, (-20 for example) it may
SEGFAULT. And why not? It might tell you where your error is.

Or it may change the guy's name from Pól (the Celtic form of
the name "Paul") to PzL, which might just annoy him. If he happens
to have a large sword, this could be a bad strategy. :)
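
The arithmetic behind "0xf3 or -13" can be sketched as follows. The function names are hypothetical, and the sketch assumes an 8-bit, two's complement signed char, which is what the figures in this post presuppose:

```c
#include <string.h>

/* Reinterpret one byte as a signed char, the way an implementation
   with a signed plain char sees it. memcpy copies the bit pattern
   without performing an out-of-range conversion. */
int byte_as_signed(unsigned char byte)
{
    signed char sc;

    memcpy(&sc, &byte, 1);
    return sc;                  /* promoted to int, sign preserved */
}

/* The same character after the recommended cast: the result is
   always in the range 0..UCHAR_MAX, whatever the sign of plain char. */
int char_as_unsigned(char c)
{
    return (unsigned char)c;
}
```

Under those assumptions byte_as_signed(0xf3) yields -13, which is not a valid argument for toupper(), while char_as_unsigned('\xf3') yields 243, which is.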
 
Eric Sosman

Joe said:
[...]
Lew Pitcher wrote:

Caution is necessary here. The behaviours of islower and toupper
are undefined if they are passed a value that is neither EOF nor
representable as an unsigned char. It is good practice, therefore,
to cast *string to unsigned char. (No need to cast it back to
int afterwards, since the normal promotion rules handle that.)
[...]

There is no need to cast the argument to toupper() to unsigned char.

Didn't we just do this a week or so ago? Perhaps it's
a candidate for the FAQ; it seems at any rate to be FA.

Joe said:
We assume that st points to a valid string. All characters of such a
string are within the range 0..CHAR_MAX by definition.

No, they are in the range CHAR_MIN through CHAR_MAX.
Since `char' may be a signed type (it's the implementation's
choice), CHAR_MIN can be negative. It's true that all the
characters mandated by the Standard are required to be
non-negative, but the Standard allows the implementation to define
additional characters, too -- and some of these may have
negative codes.

Joe said:
CHAR_MAX is within UCHAR_MAX by definition.

True, but CHAR_MIN can be negative, hence outside the
range of `unsigned char'.

Joe said:
If st points to something not a valid string, and toupper() is presented
with something out of range, (-20 for example) it may SEGFAULT. And why
not? It might tell you where your error is.

Except that the "error" isn't the presence of a -20 in
the string (in one widely-used scheme, -20 is "Latin small
i with grave accent"). The real error is the failure to
use the cast that Lew recommends.
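
As a rough illustration of the range rule discussed here, the following sketch tests whether an int is an acceptable argument for the <ctype.h> functions. The name valid_ctype_arg is hypothetical and is not part of any library:

```c
#include <limits.h>
#include <stdio.h>

/* Nonzero if c is a value the <ctype.h> functions are defined for:
   EOF, or anything representable as unsigned char. */
int valid_ctype_arg(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}
```

On an implementation where EOF is -1, valid_ctype_arg(-20) is false, while valid_ctype_arg((unsigned char)-20) is true: the cast maps the same character into the permitted range.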
 
Jack Klein

Unusual to want to simply change the case, but try something like:

#include <ctype.h>

void flipcase(char *s)
{
unsigned char ch;

if (s) /* assuming you want to protect against NULL */
while (ch = *s) {
if (isupper(ch) *s = tolower(ch);

Completely unnecessary conditional test.
else if (islower(ch) *s = toupper(ch);

Completely unnecessary conditional test.
s++;
}
} /* flipcase, untested */

which allows for the fact that some chars do not have an upper or
lower case to be flipped.

(sigh)

7.4.2.1 The tolower function
Synopsis
1 #include <ctype.h>
int tolower(int c);
Description
2 The tolower function converts an uppercase letter to a corresponding
lowercase letter.
Returns
3 If the argument is a character for which isupper is true and there
are one or more corresponding characters, as specified by the current
locale, for which islower is true, the tolower function returns one of
the corresponding characters (always the same one for any given
locale); otherwise, the argument is returned unchanged.

7.4.2.2 The toupper function
Synopsis
1 #include <ctype.h>
int toupper(int c);
Description
2 The toupper function converts a lowercase letter to a corresponding
uppercase letter.
Returns
3 If the argument is a character for which islower is true and there
are one or more corresponding characters, as specified by the current
locale, for which isupper is true, the toupper function returns one of
the corresponding characters (always the same one for any given
locale); otherwise, the argument is returned unchanged.

So the tests are totally unnecessary.
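
For a one-way conversion, the quoted paragraphs mean the loop can call tolower() unconditionally. A sketch under that reading, using a hypothetical name lowercase_string and including the unsigned char cast recommended elsewhere in this thread:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Lowercase a string in place with no isupper() pre-test: tolower()
   already returns any non-uppercase argument unchanged. */
char *lowercase_string(char *s)
{
    char *p = s;

    assert(s != NULL);          /* assumption: callers never pass NULL */
    while ((*p = (char)tolower((unsigned char)*p)) != '\0')
        ++p;
    return s;
}
```

Digits, punctuation and letters that are already lowercase pass through untouched, because tolower() hands them back as they are.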

But suppose:

char test [] = "Hello" "\xf0" "World";

...then your function causes undefined behavior on an implementation
with CHAR_BIT 8 and signed char, because you will pass an invalid
value to tolower() or toupper().
 
Joe Wright

Eric said:
Joe said:
[...]
Lew Pitcher wrote:

Caution is necessary here. The behaviours of islower and toupper
are undefined if they are passed a value that is neither EOF nor
representable as an unsigned char. It is good practice, therefore,
to cast *string to unsigned char. (No need to cast it back to
int afterwards, since the normal promotion rules handle that.)
[...]


There is no need to cast the argument to toupper() to unsigned char.


Eric said:
Didn't we just do this a week or so ago? Perhaps it's
a candidate for the FAQ; it seems at any rate to be FA.

Yes we did. It remains to be seen whether I can learn enough from
one beating to avoid the next one. :)

Eric said:
No, they are in the range CHAR_MIN through CHAR_MAX.
Since `char' may be a signed type (it's the implementation's
choice), CHAR_MIN can be negative. It's true that all the
characters mandated by the Standard are required to be
non-negative, but the Standard allows the implementation to define
additional characters, too -- and some of these may have
negative codes.

Yes, and I truly missed that until just now. Thank you.

Eric said:
True, but CHAR_MIN can be negative, hence outside the
range of `unsigned char'.

Yes, but I never mentioned CHAR_MIN.

Eric said:
Except that the "error" isn't the presence of a -20 in
the string (in one widely-used scheme, -20 is "Latin small
i with grave accent"). The real error is the failure to
use the cast that Lew recommends.

It didn't occur to me that the value of é (130) was negative as a
signed char (10000010) and when promoted to int would be -126.

I apologize to you and the group for my noise. I'll get it right
next time, I promise. :=)
 
Mysidia

char test [] = "Hello" "\xf0" "World";
...then your function causes undefined behavior on an implementation
with CHAR_BIT 8 and signed char, because you will pass an invalid
value to tolower() or toupper().


But checking islower() or isupper() does not protect against this,
because islower() and isupper() have the same fundamental requirement.
From ISO/IEC 9899:1999 (E):
"The header <ctype.h> declares several functions useful for classifying
and mapping characters. In all cases the argument is an int, the
value of which shall be representable as an unsigned char or shall
equal the value of the macro EOF. If the argument has any other value,
the behavior is undefined."
Where char is signed, '\xf0' has a negative value, so isupper('\xf0')
is just as undefined as toupper('\xf0') is.
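
Since the requirement applies to every <ctype.h> function alike, one option is to put the cast in a single place. The wrapper names below are hypothetical and only sketch the idea:

```c
#include <ctype.h>

/* Hypothetical wrappers that accept a plain char and convert it to
   unsigned char before calling the <ctype.h> function. */
int char_isupper(char c)  { return isupper((unsigned char)c); }
int char_islower(char c)  { return islower((unsigned char)c); }
char char_toupper(char c) { return (char)toupper((unsigned char)c); }
char char_tolower(char c) { return (char)tolower((unsigned char)c); }
```

With these, char_isupper('\xf0') has defined behaviour whether plain char is signed or unsigned. In the default "C" locale it reports false and char_toupper('\xf0') returns the character unchanged.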
 
Joe Wright

Chris said:
Joe Wright said:
The islower() call is unnecessary.

Indeed.


char *upper(char *st) {
    char *s = st;
    while ((*s = toupper(*s))) ++s;
    return st;
}

There is no need to cast the argument to toupper() to unsigned char.
We assume that st points to a valid string.


And someone whose name is "Pól" has a name that is an "invalid
string"? :)

All characters of such a string are within the range 0..CHAR_MAX
by definition. CHAR_MAX is within UCHAR_MAX by definition.


Chris said:
If you use ISO-Latin-1, and have signed characters -- and both of
these are quite commonly true today -- you *will* have characters
whose value is outside the [0..CHAR_MAX] range. For instance, the
o-with-accent-acute above is 0xf3 or -13.

It looks something like ó (162) at my house. 10100010 is -94 but
your point is taken. I didn't consider negative char as valid.

Chris said:
Or it may change the guy's name from Pól (the Celtic form of
the name "Paul") to PzL, which might just annoy him. If he happens
to have a large sword, this could be a bad strategy. :)

I'll try to stay away from that sword. I'm sorry to have muddied the
water. I'll get it wright next time, I promise. :)
 
infobahn

Eric said:
Except that the "error" isn't the presence of a -20 in
the string (in one widely-used scheme, -20 is "Latin small
i with grave accent"). The real error is the failure to
use the cast that Lew recommends.

Ahem. That /Lew/ recommends? Am I invisible all of a sudden?
 
CBFalconer

Jack said:
CBFalconer said:
Unusual to want to simply change the case, but try something like:

#include <ctype.h>

void flipcase(char *s)
{
unsigned char ch;

if (s) /* assuming you want to protect against NULL */
while (ch = *s) {
if (isupper(ch) *s = tolower(ch);

Completely unnecessary conditional test.
else if (islower(ch) *s = toupper(ch);

Completely unnecessary conditional test.
s++;
}
} /* flipcase, untested */

which allows for the fact that some chars do not have an upper or
lower case to be flipped.
.... snip ...

So the tests are totally unnecessary.

But suppose:

char test [] = "Hello" "\xf0" "World";

...then your function causes undefined behavior on an implementation
with CHAR_BIT 8 and signed char, because you will pass an invalid
value to tolower() or toupper().

If you examine my function you will find that isupper/lower and
toupper/lower are always operating on an unsigned char. The tests
are necessary, to decide whether to upshift or downshift, although
the second can probably be eliminated. However that would leave
the action somewhat unclear, as it is no longer obvious that some
characters are never transformed.

While busily charging off in all directions you failed to even read
the verbiage I attached, and missed the fact that the conditional
expressions lacked a closing parenthesis, and thus were syntax
errors.

The function will convert test[] to "hELLO" "\xf0" "wORLD".
 
Eric Sosman

infobahn said:
Ahem. That /Lew/ recommends? Am I invisible all of a sudden?

My apologies; I mistook >>> for >> (or maybe the
other way around) in the attrisnipbutions.
 
Eric Sosman

Jack said:
Unusual to want to simply change the case, but try something like:

#include <ctype.h>

void flipcase(char *s)
{
unsigned char ch;

if (s) /* assuming you want to protect against NULL */
while (ch = *s) {
if (isupper(ch) *s = tolower(ch);
[...]
But suppose:

char test [] = "Hello" "\xf0" "World";

...then your function causes undefined behavior on an implementation
with CHAR_BIT 8 and signed char, because you will pass an invalid
value to tolower() or toupper().

No: The argument is always in the range of `unsigned char'
as required by the Standard. You'll see why this must be so
if you examine the type of the variable `ch' ...
 
Giorgos Keramidas

Unusual to want to simply change the case, but try something like:

#include <ctype.h>

void flipcase(char *s)
{
    unsigned char ch;

    if (s) /* assuming you want to protect against NULL */
        while (ch = *s) {
            if (isupper(ch) *s = tolower(ch);
            else if (islower(ch) *s = toupper(ch);
            s++;
        }
} /* flipcase, untested */

Missing parentheses in both conditionals :-(
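
With the two closing parentheses restored, the function might look like the sketch below. The explicit casts and the comparison against '\0' are additions made here to keep compilers quiet; they are not part of the posted code.

```c
#include <ctype.h>
#include <stddef.h>

/* Swap the case of every letter in s, in place. Characters with no
   upper or lower case counterpart are left alone. */
void flipcase(char *s)
{
    unsigned char ch;

    if (s == NULL)      /* tolerate a null pointer, as the original does */
        return;
    while ((ch = (unsigned char)*s) != '\0') {
        if (isupper(ch))
            *s = (char)tolower(ch);
        else if (islower(ch))
            *s = (char)toupper(ch);
        s++;
    }
}
```

Because ch is an unsigned char, every value handed to the <ctype.h> functions is in range. In the default "C" locale the string "Hello" "\xf0" "World" becomes "hELLO" "\xf0" "wORLD".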
 
Jack Klein


If you examine my function you will find that isupper/lower and
toupper/lower are always operating on an unsigned char. The tests
are necessary, to decide whether to upshift or downshift, although
the second can probably be eliminated. However that would leave
the action somewhat unclear, as it is no longer obvious that some
characters are never transformed.

While busily charging off in all directions you failed to even read
the verbiage I attached, and missed the fact that the conditional
expressions lacked a closing parenthesis, and thus were syntax
errors.

The function will convert test[] to "hELLO" "\xf0" "wORLD".

Sorry, need to have my meds adjusted again, I guess. Please disregard
my previous post.
 
