Manipulation of strings: upper/lower case

Discussion in 'C Programming' started by Pierre, Jan 15, 2005.

  1. Pierre

    Pierre Guest

    Hello!

    I've been looking for a portable means of changing the case of a
    string but I've found nothing so far. Does it exist? I guess (and
    hope) it does.

    Thanks
    Pierre
     
    Pierre, Jan 15, 2005
    #1

  2. Pierre

    BMarsh Guest

    Hi there,

    #include <ctype.h>
    int toupper(int c);

    cheers

    B.
     
    BMarsh, Jan 15, 2005
    #2

  3. Pierre

    Lew Pitcher Guest

    Pierre wrote:
    > Hello!
    >
    > I've been looking for a portable means of changing the case of a
    > string but I've found nothing so far.


    Such a function can easily be built from the existing standard C functions.

    > Does it exist? I guess (and hope) it does.


    If you can't find one, try this...

    #include <ctype.h>

    void UppercaseString(char *string)
    {
        for (; *string; ++string)
            if (islower(*string)) *string = toupper(*string);
    }

    void LowercaseString(char *string)
    {
        for (; *string; ++string)
            if (isupper(*string)) *string = tolower(*string);
    }



    --
    Lew Pitcher

    Master Codewright and JOAT-in-training
    Registered Linux User #112576 (http://counter.li.org/)
    Slackware - Because I know what I'm doing.
     
    Lew Pitcher, Jan 15, 2005
    #3
  4. Pierre

    infobahn Guest

    Lew Pitcher wrote:
    >


    <snip>

    > #include <ctype.h>
    >
    > void UppercaseString(char *string)
    > {
    > for(;*string;++string)
    > if (islower(*string)) *string = toupper(*string);
    > }


    Caution is necessary here. The behaviours of islower and toupper
    are undefined if they are passed a value that is neither EOF nor
    representable as an unsigned char. It is good practice, therefore,
    to cast *string to unsigned char. (No need to cast it back to
    int afterwards, since the normal promotion rules handle that.)

    The islower() call smacks of premature optimisation. :)
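
A minimal sketch of both points above (the cast, and dropping the islower() test), assuming a modifiable, NUL-terminated string. The name `upcase_string` is made up for illustration and does not come from the thread.

```c
#include <ctype.h>

/* Upper-case a NUL-terminated string in place.
   Each char is converted to unsigned char before it reaches toupper(),
   so the argument is always in the range 0..UCHAR_MAX. */
void upcase_string(char *string)
{
    for (; *string; ++string)
        *string = (char)toupper((unsigned char)*string);
}
```

Called on a writable array holding "Hello", this should leave "HELLO" in the array; characters without an uppercase form are left alone by toupper() itself.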

    <snip>
     
    infobahn, Jan 15, 2005
    #4
  5. Pierre

    CBFalconer Guest

    Pierre wrote:
    >
    > I've been looking for a portable means of changing the case of a
    > string but I've found nothing so far. Does it exist? I guess (and
    > hope) it does.


    Unusual to want to simply change the case, but try something like:

    #include <ctype.h>

    void flipcase(char *s)
    {
        unsigned char ch;

        if (s) /* assuming you want to protect against NULL */
            while (ch = *s) {
                if (isupper(ch) *s = tolower(ch);
                else if (islower(ch) *s = toupper(ch);
                s++;
            }
    } /* flipcase, untested */

    which allows for the fact that some chars do not have an upper or
    lower case to be flipped.

    --
    "If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers." - Keith Thompson
     
    CBFalconer, Jan 15, 2005
    #5
  6. Pierre

    Joe Wright Guest

    infobahn wrote:
    > Lew Pitcher wrote:
    >
    >
    > <snip>
    >
    >>#include <ctype.h>
    >>
    >>void UppercaseString(char *string)
    >>{
    >> for(;*string;++string)
    >> if (islower(*string)) *string = toupper(*string);
    >>}

    >
    >
    > Caution is necessary here. The behaviours of islower and toupper
    > are undefined if they are passed a value that is neither EOF nor
    > representable as an unsigned char. It is good practice, therefore,
    > to cast *string to unsigned char. (No need to cast it back to
    > int afterwards, since the normal promotion rules handle that.)
    >
    > The islower() call smacks of premature optimisation. :)
    >
    > <snip>


    The islower() call is unnecessary.

    char *upper(char *st) {
        char *s = st;
        while ((*s = toupper(*s))) ++s;
        return st;
    }

    There is no need to cast the argument to toupper() to unsigned char.
    We assume that st points to a valid string. All characters of such a
    string are within the range 0..CHAR_MAX by definition. CHAR_MAX is
    within UCHAR_MAX by definition.

    If st points to something not a valid string, and toupper() is
    presented with something out of range, (-20 for example) it may
    SEGFAULT. And why not? It might tell you where your error is.

    --
    Joe Wright mailto:
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
     
    Joe Wright, Jan 15, 2005
    #6
  7. Mathew Hendry

    On Sat, 15 Jan 2005 15:05:28 -0500, Joe Wright <>
    wrote:

    >infobahn wrote:
    >
    >>Lew Pitcher wrote:
    >>
    >>>#include <ctype.h>
    >>>
    >>>void UppercaseString(char *string)
    >>>{
    >>> for(;*string;++string)
    >>> if (islower(*string)) *string = toupper(*string);
    >>>}

    >>...
    >> The islower() call smacks of premature optimisation. :)
    >>
    >> <snip>

    >
    >The islower() call is unnecessary.
    >
    >char *upper(char *st) {
    > char *s = st;
    > while ((*s = toupper(*s))) ++s;
    > return st;
    >}
    >
    >There is no need to cast the argument to toupper() to unsigned char.
    >We assume that st points to a valid string. All characters of such a
    >string are within the range 0..CHAR_MAX by definition.


    Only if char happens to be unsigned, surely?

    -- Mat.
     
    Mathew Hendry, Jan 15, 2005
    #7
  8. Pierre

    Chris Torek Guest

    In article <>
    Joe Wright <> wrote:
    >The islower() call is unnecessary.


    Indeed.

    >char *upper(char *st) {
    > char *s = st;
    > while ((*s = toupper(*s))) ++s;
    > return st;
    >}
    >
    >There is no need to cast the argument to toupper() to unsigned char.
    >We assume that st points to a valid string.


    And someone whose name is "Pól" has a name that is an "invalid
    string"? :)

    >All characters of such a string are within the range 0..CHAR_MAX
    >by definition. CHAR_MAX is within UCHAR_MAX by definition.


    If you use ISO-Latin-1, and have signed characters -- and both of
    these are quite commonly true today -- you *will* have characters
    whose value is outside the [0..CHAR_MAX] range. For instance, the
    o-with-accent-acute above is 0xf3 or -13.
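
The two views of that byte can be checked with a short sketch, assuming CHAR_BIT is 8. The helper name `as_ctype_arg` is invented here, and it takes `signed char` explicitly so the result does not depend on whether plain char is signed.

```c
/* Map a possibly negative character value into the 0..UCHAR_MAX range
   that the <ctype.h> functions accept. Conversion to unsigned char is
   well defined: it wraps modulo UCHAR_MAX + 1. */
int as_ctype_arg(signed char c)
{
    return (unsigned char)c;
}
```

With 8-bit chars, -13 wraps to 256 - 13 = 243, which is 0xf3, while non-negative values pass through unchanged.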

    >If st points to something not a valid string, and toupper() is
    >presented with something out of range, (-20 for example) it may
    >SEGFAULT. And why not? It might tell you where your error is.


    Or it may change the guy's name from Pól (the Celtic form of
    the name "Paul") to PzL, which might just annoy him. If he happens
    to have a large sword, this could be a bad strategy. :)
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, Jan 15, 2005
    #8
  9. Pierre

    Eric Sosman Guest

    Joe Wright wrote:
    > [...]
    >> Lew Pitcher wrote:
    >>
    >> Caution is necessary here. The behaviours of islower and toupper
    >> are undefined if they are passed a value that is neither EOF nor
    >> representable as an unsigned char. It is good practice, therefore,
    >> to cast *string to unsigned char. (No need to cast it back to
    >> int afterwards, since the normal promotion rules handle that.)
    >> [...]

    >
    > There is no need to cast the argument to toupper() to unsigned char.


    Didn't we just do this a week or so ago? Perhaps it's
    a candidate for the FAQ; it seems at any rate to be FA.

    > We
    > assume that st points to a valid string. All characters of such a string
    > are within the range 0..CHAR_MAX by definition.


    No, they are in the range CHAR_MIN through CHAR_MAX.
    Since `char' may be a signed type (it's the implementation's
    choice), CHAR_MIN can be negative. It's true that all the
    characters mandated by the Standard are required to be non-
    negative, but the Standard allows the implementation to define
    additional characters, too -- and some of these may have
    negative codes.

    > CHAR_MAX is within
    > UCHAR_MAX by definition.


    True, but CHAR_MIN can be negative, hence outside the
    range of `unsigned char'.

    > If st points to something not a valid string, and toupper() is presented
    > with something out of range, (-20 for example) it may SEGFAULT. And why
    > not? It might tell you where your error is.


    Except that the "error" isn't the presence of a -20 in
    the string (in one widely-used scheme, -20 is "Latin small
    i with grave accent"). The real error is the failure to
    use the cast that Lew recommends.

    --
    Eric Sosman
    lid
     
    Eric Sosman, Jan 15, 2005
    #9
  10. Pierre

    Jack Klein Guest

    On Sat, 15 Jan 2005 19:00:59 GMT, CBFalconer <>
    wrote in comp.lang.c:

    > Pierre wrote:
    > >
    > > I've been looking for a portable means of changing the case of a
    > > string but I've found nothing so far. Does it exist? I guess (and
    > > hope) it does.

    >
    > Unusual to want to simply change the case, but try something like:
    >
    > #include <ctype.h>
    >
    > void flipcase(char *s)
    > {
    > unsigned char ch;
    >
    > if (s) /* assuming you want to protect against NULL */
    > while (ch = *s) {
    > if (isupper(ch) *s = tolower(ch);


    Completely unnecessary conditional test.

    > else if (islower(ch) *s = toupper(ch);


    Completely unnecessary conditional test.

    > s++;
    > }
    > } /* flipcase, untested */
    >
    > which allows for the fact that some chars do not have an upper or
    > lower case to be flipped.


    (sigh)

    7.4.2.1 The tolower function
    Synopsis
    1 #include <ctype.h>
    int tolower(int c);
    Description
    2 The tolower function converts an uppercase letter to a corresponding
    lowercase letter.
    Returns
    3 If the argument is a character for which isupper is true and there
    are one or more corresponding characters, as specified by the current
    locale, for which islower is true, the tolower function returns one of
    the corresponding characters (always the same one for any given
    locale); otherwise, the argument is returned unchanged.

    7.4.2.2 The toupper function
    Synopsis
    1 #include <ctype.h>
    int toupper(int c);
    Description
    2 The toupper function converts a lowercase letter to a corresponding
    uppercase letter.
    Returns
    3 If the argument is a character for which islower is true and there
    are one or more corresponding characters, as specified by the current
    locale, for which isupper is true, the toupper function returns one of
    the corresponding characters (always the same one for any given
    locale); otherwise, the argument is returned unchanged.

    So the tests are totally unnecessary.
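
The "otherwise, the argument is returned unchanged" clause can be checked directly. A small sketch, assuming the default "C" locale; `upcase_char` is an illustrative name, not something from the Standard.

```c
#include <ctype.h>

/* Unconditional conversion: toupper() itself leaves anything that is
   not a lowercase letter untouched, so no islower() guard is needed. */
int upcase_char(char c)
{
    return toupper((unsigned char)c);
}
```

Under that assumption, 'a' maps to 'A', while 'A', digits and spaces come back as they went in.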

    But suppose:

    char test [] = "Hello" "\xf0" "World";

    ...then your function causes undefined behavior on an implementation
    with CHAR_BIT 8 and signed char, because you will pass an invalid
    value to tolower() or toupper().

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
    comp.lang.c++ http://www.parashift.com/c++-faq-lite/
    alt.comp.lang.learn.c-c++
    http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
     
    Jack Klein, Jan 16, 2005
    #10
  11. Pierre

    Joe Wright Guest

    Eric Sosman wrote:
    > Joe Wright wrote:
    >
    >> [...]
    >>
    >>> Lew Pitcher wrote:
    >>>
    >>> Caution is necessary here. The behaviours of islower and toupper
    >>> are undefined if they are passed a value that is neither EOF nor
    >>> representable as an unsigned char. It is good practice, therefore,
    >>> to cast *string to unsigned char. (No need to cast it back to
    >>> int afterwards, since the normal promotion rules handle that.)
    >>> [...]

    >>
    >>
    >> There is no need to cast the argument to toupper() to unsigned char.

    >
    >
    > Didn't we just do this a week or so ago? Perhaps it's
    > a candidate for the FAQ; it seems at any rate to be FA.
    >

    Yes we did. It remains to be seen whether I can learn enough from
    one beating to avoid the next one. :)

    >> We assume that st points to a valid string. All characters of such a
    >> string are within the range 0..CHAR_MAX by definition.

    >
    >
    > No, they are in the range CHAR_MIN through CHAR_MAX.
    > Since `char' may be a signed type (it's the implementation's
    > choice), CHAR_MIN can be negative. It's true that all the
    > characters mandated by the Standard are required to be non-
    > negative, but the Standard allows the implementation to define
    > additional characters, too -- and some of these may have
    > negative codes.
    >

    Yes, and I truly missed that until just now. Thank you.

    >> CHAR_MAX is within UCHAR_MAX by definition.

    >
    >
    > True, but CHAR_MIN can be negative, hence outside the
    > range of `unsigned char'.
    >

    Yes, but I never mentioned CHAR_MIN.

    >> If st points to something not a valid string, and toupper() is
    >> presented with something out of range, (-20 for example) it may
    >> SEGFAULT. And why not? It might tell you where your error is.

    >
    >
    > Except that the "error" isn't the presence of a -20 in
    > the string (in one widely-used scheme, -20 is "Latin small
    > i with grave accent"). The real error is the failure to
    > use the cast that Lew recommends.
    >

    It didn't occur to me that the value of é (130) was negative as a
    signed char (10000010) and when promoted to int would be -126.

    I apologize to you and the group for my noise. I'll get it right
    next time, I promise. :=)

    --
    Joe Wright mailto:
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
     
    Joe Wright, Jan 16, 2005
    #11
  12. Pierre

    Mysidia Guest

    > char test [] = "Hello" "\xf0" "World";
    >
    > ...then your function causes undefined behavior on an implementation
    > with CHAR_BIT 8 and signed char, because you will pass an invalid
    > value to tolower() or toupper().



    But checking islower() or isupper() does not protect from this,
    because islower() and isupper() have the same fundamental requirement..

    From ISO/IEC 9899:1999 (E):

    "The header <ctype.h> declares several functions useful for classifying
    and mapping characters. In all cases the argument is an int, the
    value of which shall be representable as an unsigned char or shall
    equal the value of the macro EOF. If the argument has any other value,
    the behavior is undefined."

    isupper('\xf0') is just as undefined as toupper('\xf0') is when char is signed.
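
One way to keep the classification calls inside that requirement is to do the conversion in a single place. This is only a sketch; `char_is_upper` and `char_is_lower` are hypothetical wrapper names, and the results shown assume the default "C" locale.

```c
#include <ctype.h>

/* Classification wrappers taking plain char: the conversion to
   unsigned char happens here, before isupper()/islower() see the value.
   The result is normalised to 0 or 1. */
int char_is_upper(char c) { return isupper((unsigned char)c) != 0; }
int char_is_lower(char c) { return islower((unsigned char)c) != 0; }
```

Callers can then pass bytes straight from a string, including ones that are negative as a signed char, without invoking undefined behaviour.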
     
    Mysidia, Jan 16, 2005
    #12
  13. Pierre

    Joe Wright Guest

    Chris Torek wrote:
    > In article <>
    > Joe Wright <> wrote:
    >
    >>The islower() call is unnecessary.

    >
    >
    > Indeed.
    >
    >
    >>char *upper(char *st) {
    >> char *s = st;
    >> while ((*s = toupper(*s))) ++s;
    >> return st;
    >>}
    >>
    >>There is no need to cast the argument to toupper() to unsigned char.
    >>We assume that st points to a valid string.

    >
    >
    > And someone whose name is "Pól" has a name that is an "invalid
    > string"? :)
    >
    >
    >>All characters of such a string are within the range 0..CHAR_MAX
    >>by definition. CHAR_MAX is within UCHAR_MAX by definition.

    >
    >
    > If you use ISO-Latin-1, and have signed characters -- and both of
    > these are quite commonly true today -- you *will* have characters
    > whose value is outside the [0..CHAR_MAX] range. For instance, the
    > o-with-accent-acute above is 0xf3 or -13.
    >

    It looks something like ó (162) at my house. 10100010 is -94 but
    your point is taken. I didn't consider negative char as valid.
    >
    >>If st points to something not a valid string, and toupper() is
    >>presented with something out of range, (-20 for example) it may
    >>SEGFAULT. And why not? It might tell you where your error is.

    >
    >
    > Or it may change the guy's name from Pól (the Celtic form of
    > the name "Paul") to PzL, which might just annoy him. If he happens
    > to have a large sword, this could be a bad strategy. :)


    I'll try to stay away from that sword. I'm sorry to have muddied the
    water. I'll get it wright next time, I promise. :)

    --
    Joe Wright mailto:
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
     
    Joe Wright, Jan 16, 2005
    #13
  14. Pierre

    S.Tobias Guest

    Jack Klein <> wrote:
    > On Sat, 15 Jan 2005 19:00:59 GMT, CBFalconer <>
    > wrote in comp.lang.c:


    > > #include <ctype.h>
    > >
    > > void flipcase(char *s)
    > > {
    > > unsigned char ch;
    > >
    > > if (s) /* assuming you want to protect against NULL */
    > > while (ch = *s) {
    > > if (isupper(ch) *s = tolower(ch);


    > Completely unnecessary conditional test.


    > > else if (islower(ch) *s = toupper(ch);


    > Completely unnecessary conditional test.


    Why completely unnecessary? This is a case-*toggling* function, so at
    least one test must remain (note the "else").
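
A sketch of the one-test version, relying on toupper() returning anything that is not a lowercase letter unchanged. The name `togglecase` is hypothetical, and the string is assumed to be non-NULL and modifiable.

```c
#include <ctype.h>

void togglecase(char *s)
{
    for (; *s; ++s) {
        unsigned char ch = (unsigned char)*s;
        /* One test decides the direction; toupper() is a no-op for
           anything that is not a lowercase letter. */
        *s = (char)(isupper(ch) ? tolower(ch) : toupper(ch));
    }
}
```

The trade-off is the one raised in the thread: with the second test gone, it is less obvious to a reader that some characters are never transformed.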

    > > s++;
    > > }
    > > } /* flipcase, untested */
    > >


    --
    Stan Tobias
    mailx `echo LID | sed s/[[:upper:]]//g`
     
    S.Tobias, Jan 16, 2005
    #14
  15. Pierre

    infobahn Guest

    Eric Sosman wrote:
    >
    > Except that the "error" isn't the presence of a -20 in
    > the string (in one widely-used scheme, -20 is "Latin small
    > i with grave accent"). The real error is the failure to
    > use the cast that Lew recommends.


    Ahem. That /Lew/ recommends? Am I invisible all of a sudden?
     
    infobahn, Jan 16, 2005
    #15
  16. Pierre

    CBFalconer Guest

    Jack Klein wrote:
    > CBFalconer <>
    >> Pierre wrote:
    >>>
    >>> I've been looking for a portable means of changing the case of a
    >>> string but I've found nothing so far. Does it exist? I guess (and
    >>> hope) it does.

    >>
    >> Unusual to want to simply change the case, but try something like:
    >>
    >> #include <ctype.h>
    >>
    >> void flipcase(char *s)
    >> {
    >> unsigned char ch;
    >>
    >> if (s) /* assuming you want to protect against NULL */
    >> while (ch = *s) {
    >> if (isupper(ch) *s = tolower(ch);

    >
    > Completely unnecessary conditional test.
    >
    >> else if (islower(ch) *s = toupper(ch);

    >
    > Completely unnecessary conditional test.
    >
    >> s++;
    >> }
    >> } /* flipcase, untested */
    >>
    >> which allows for the fact that some chars do not have an upper or
    >> lower case to be flipped.

    >

    .... snip ...
    >
    > So the tests are totally unnecessary.
    >
    > But suppose:
    >
    > char test [] = "Hello" "\xf0" "World";
    >
    > ...then your function causes undefined behavior on an implementation
    > with CHAR_BIT 8 and signed char, because you will pass an invalid
    > value to tolower() or toupper().


    If you examine my function you will find that isupper/lower and
    toupper/lower are always operating on an unsigned char. The tests
    are necessary, to decide whether to upshift or downshift, although
    the second can probably be eliminated. However that would leave
    the action somewhat unclear, as it is no longer obvious that some
    characters are never transformed.

    While busily charging off in all directions you failed to even read
    the verbiage I attached, and missed the fact that the conditional
    expressions lacked a closing parenthesis, and thus were syntax
    errors.

    The function will convert test[] to "hELLO" "\xf0" "wORLD".
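
For reference, a compilable sketch of the function with the two closing parentheses restored. It is renamed `flipcase_fixed` here only to keep it apart from the posted version, and the behaviour on the 0xf0 byte assumes the default "C" locale, where that byte is neither upper nor lower case.

```c
#include <ctype.h>

void flipcase_fixed(char *s)
{
    unsigned char ch;

    if (s)  /* tolerate a NULL pointer */
        while ((ch = (unsigned char)*s) != 0) {
            /* ch is an unsigned char, so every <ctype.h> call below
               receives a value in the permitted range. */
            if (isupper(ch))      *s = (char)tolower(ch);
            else if (islower(ch)) *s = (char)toupper(ch);
            s++;
        }
}
```

Because the loop variable has type unsigned char, the classification and conversion calls never see a negative argument, whatever the signedness of plain char.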

    --
    "If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers." - Keith Thompson
     
    CBFalconer, Jan 16, 2005
    #16
  17. Pierre

    Eric Sosman Guest

    infobahn wrote:
    > Eric Sosman wrote:
    >
    >> Except that the "error" isn't the presence of a -20 in
    >>the string (in one widely-used scheme, -20 is "Latin small
    >>i with grave accent"). The real error is the failure to
    >>use the cast that Lew recommends.

    >
    >
    > Ahem. That /Lew/ recommends? Am I invisible all of a sudden?


    My apologies; I mistook >>> for >> (or maybe the
    other way around) in the attrisnipbutions.

    --
    Eric Sosman
    lid
     
    Eric Sosman, Jan 16, 2005
    #17
  18. Pierre

    Eric Sosman Guest

    Jack Klein wrote:

    > On Sat, 15 Jan 2005 19:00:59 GMT, CBFalconer <>
    > wrote in comp.lang.c:
    >
    >>Unusual to want to simply change the case, but try something like:
    >>
    >>#include <ctype.h>
    >>
    >>void flipcase(char *s)
    >>{
    >> unsigned char ch;
    >>
    >> if (s) /* assuming you want to protect against NULL */
    >> while (ch = *s) {
    >> if (isupper(ch) *s = tolower(ch);

    > [...]
    > But suppose:
    >
    > char test [] = "Hello" "\xf0" "World";
    >
    > ...then your function causes undefined behavior on an implementation
    > with CHAR_BIT 8 and signed char, because you will pass an invalid
    > value to tolower() or toupper().


    No: The argument is always in the range of `unsigned char'
    as required by the Standard. You'll see why this must be so
    if you examine the type of the variable `ch' ...

    --
    Eric Sosman
    lid
     
    Eric Sosman, Jan 16, 2005
    #18
  19. Giorgos Keramidas

    On 2005-01-15 19:00, CBFalconer wrote:
    > Pierre wrote:
    >> I've been looking for a portable means of changing the case of a
    >> string but I've found nothing so far. Does it exist? I guess (and
    >> hope) it does.

    >
    > Unusual to want to simply change the case, but try something like:
    >
    > #include <ctype.h>
    >
    > void flipcase(char *s)
    > {
    > unsigned char ch;
    >
    > if (s) /* assuming you want to protect against NULL */
    > while (ch = *s) {
    > if (isupper(ch) *s = tolower(ch);
    > else if (islower(ch) *s = toupper(ch);
    > s++;
    > }
    > } /* flipcase, untested */


    Missing parentheses in both conditionals :-(
     
    Giorgos Keramidas, Jan 16, 2005
    #19
  20. Pierre

    Jack Klein Guest

    On Sun, 16 Jan 2005 05:16:37 GMT, CBFalconer <>
    wrote in comp.lang.c:

    > > CBFalconer <>

    >
    > If you examine my function you will find that isupper/lower and
    > toupper/lower are always operating on an unsigned char. The tests
    > are necessary, to decide whether to upshift or downshift, although
    > the second can probably be eliminated. However that would leave
    > the action somewhat unclear, as it is no longer obvious that some
    > characters are never transformed.
    >
    > While busily charging off in all directions you failed to even read
    > the verbiage I attached, and missed the fact that the conditional
    > expressions lacked a closing parenthesis, and thus were syntax
    > errors.
    >
    > The function will convert test[] to "hELLO" "\xf0" "wORLD".


    Sorry, need to have my meds adjusted again, I guess. Please disregard
    my previous post.

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
    comp.lang.c++ http://www.parashift.com/c++-faq-lite/
    alt.comp.lang.learn.c-c++
    http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
     
    Jack Klein, Jan 16, 2005
    #20