PLEASE HELP - odd string sorting related problem

Discussion in 'C Programming' started by cpptutor2000@yahoo.com, Oct 19, 2007.

  1. Guest

    Could some C guru provide some hints on my problem? I am trying to
    sort an array of character strings, where each string contains
    lowercase, uppercase, digits as well as non-alphanumeric characters as
    '-', '(' or '/'. Obviously, standard C functions as 'strcmp' would
    fail in these cases. I can convert all the non-digit characters to
    lowercase, but how do I deal with the non-alphanumeric characters?
    Any hints or suggestions would be greatly helpful. Thanks in advance
    for your help.
    , Oct 19, 2007
    #1

  2. Default User Guest

    wrote:

    > Could some C guru provide some hints on my problem? I am trying to
    > sort an array of character strings, where each string contains
    > lowercase, uppercase, digits as well as non-alphanumeric characters as
    > '-', '(' or '/'. Obviously, standard C functions as 'strcmp' would
    > fail in these cases. I can convert all the non-digit characters to
    > lowercase, but how do I deal with the non-alphanumeric characters?
    > Any hints or suggestions would be greatly helpful. Thanks in advance
    > for your help.



    Devise an algorithm for comparing such strings.

    Implement the algorithm.




    Brian
    Default User, Oct 20, 2007
    #2

  3. pete Guest

    wrote:
    >
    > Could some C guru provide some hints on my problem? I am trying to
    > sort an array of character strings, where each string contains
    > lowercase, uppercase, digits as well as non-alphanumeric characters as
    > '-', '(' or '/'. Obviously, standard C functions as 'strcmp' would
    > fail in these cases. I can convert all the non-digit characters to
    > lowercase, but how do I deal with the non-alphanumeric characters?
    > Any hints or suggestions would be greatly helpful. Thanks in advance
    > for your help.


    You could use a case insensitive variation on strcmp:

    #include <ctype.h>

    /* Case-insensitive strcmp: returns -1, 0 or 1. */
    int str_ccmp(const char *s1, const char *s2)
    {
        for (;;) {
            if (*s1 != *s2) {
                /* Go through unsigned char so tolower() never sees a negative value. */
                const int c1 = tolower((unsigned char)*s1);
                const int c2 = tolower((unsigned char)*s2);

                if (c2 != c1) {
                    return c2 > c1 ? -1 : 1;
                }
            } else {
                if (*s1 == '\0') {
                    return 0;
                }
            }
            ++s1;
            ++s2;
        }
    }
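
    To sort an array of strings with a comparison like str_ccmp above, it can be wrapped in an adapter for qsort(). What follows is only a minimal sketch: the names cmp_ci and sort_strings_ci are made up for illustration, and the array is assumed to hold pointers to NUL-terminated strings.

```c
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive comparison, same idea as str_ccmp above. */
static int str_ccmp(const char *s1, const char *s2)
{
    for (;;) {
        const int c1 = tolower((unsigned char)*s1);
        const int c2 = tolower((unsigned char)*s2);

        if (c1 != c2) {
            return c1 < c2 ? -1 : 1;
        }
        if (c1 == '\0') {
            return 0;   /* both strings ended together */
        }
        ++s1;
        ++s2;
    }
}

/* qsort() passes pointers to the array elements, so each
   argument really points at a (const char *). */
static int cmp_ci(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return str_ccmp(*pa, *pb);
}

/* Hypothetical helper: sort n string pointers in place. */
void sort_strings_ci(const char **v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_ci);
}
```

    Calling sort_strings_ci(names, count) reorders the pointers only; the strings themselves are not copied or modified. Punctuation still compares by its plain character value here.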

    --
    pete
    pete, Oct 20, 2007
    #3
  4. "" <> writes:
    > Could some C guru provide some hints on my problem? I am trying to
    > sort an array of character strings, where each string contains
    > lowercase, uppercase, digits as well as non-alphanumeric characters as
    > '-', '(' or '/'. Obviously, standard C functions as 'strcmp' would
    > fail in these cases.


    What's obvious about that?

    > I can convert all the non-digit characters to
    > lowercase, but how do I deal with the non-alphanumeric characters?


    I don't know. How do you want to deal with non-alphanumeric characters?

    Using strcmp() directly is certainly a valid way to sort strings, but
    you apparently want to map uppercase letters to lowercase before
    comparing them. That still leaves a plethora of ways you might want
    to compare strings that contain things other than letters. We have no
    way of knowing (and C doesn't define) which of those ways is valid.

    You need to decide how you want to do the comparisons. Once you've
    done that, it's likely you'll be able to implement the comparison in C
    yourself. If not, show us what you've tried and we can help you fix
    it.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Oct 20, 2007
    #4
  5. CBFalconer Guest

    "" wrote:
    >
    > Could some C guru provide some hints on my problem? I am trying
    > to sort an array of character strings, where each string contains
    > lowercase, uppercase, digits as well as non-alphanumeric
    > characters as '-', '(' or '/'. Obviously, standard C functions as
    > 'strcmp' would fail in these cases. I can convert all the
    > non-digit characters to lowercase, but how do I deal with the
    > non-alphanumeric characters? Any hints or suggestions would be
    > greatly helpful. Thanks in advance for your help.


    Simply apply strcmp. The strings will be sorted in accordance with
    the native ordering of the default char set. The sort will
    succeed. Whether it is what you want is another matter, and you
    can make whatever substitutions you need to affect that.
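
    As a rough illustration of "simply apply strcmp", here is a sketch of a qsort() adapter around it. The names cmp_native and sort_strings_native are hypothetical. The resulting order depends on the execution character set; in ASCII, for instance, '(' , '-' and '/' sort before digits, digits before uppercase letters, and uppercase before lowercase.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() adapter: each argument points at a (const char *). */
static int cmp_native(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Hypothetical helper: sort n string pointers by strcmp() order. */
void sort_strings_native(const char **v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_native);
}
```

    On an ASCII system this would put "Zebra" ahead of "apple", which is a valid ordering but possibly not the one wanted.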

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>


    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Oct 20, 2007
    #5
  6. >Could some C guru provide some hints on my problem? I am trying to

    It might help if you STATE A PROBLEM.

    >sort an array of character strings, where each string contains
    >lowercase, uppercase, digits as well as non-alphanumeric characters as
    >'-', '(' or '/'. Obviously, standard C functions as 'strcmp' would
    >fail in these cases.


    It is obvious to me that strcmp() would succeed in comparing any C
    strings whatever, as long as you put no requirements on sort order
    that conflict with what strcmp() does (and you didn't say anything
    about sort order at all).

    >I can convert all the non-digit characters to
    >lowercase, but how do I deal with the non-alphanumeric characters?


    It isn't necessary to convert punctuation to lower case in any
    character set I'm aware of.

    It is generally considered a hanging offense to deal characters
    from the bottom of the deck.
    Gordon Burditt, Oct 20, 2007
    #6
  7. On Oct 19, 11:57 pm, "" <>
    wrote:
    > Could some C guru provide some hints on my problem? I am trying to
    > sort an array of character strings, where each string contains
    > lowercase, uppercase, digits as well as non-alphanumeric characters as
    > '-', '(' or '/'. Obviously, standard C functions as 'strcmp' would
    > fail in these cases.


    What do you mean by "fail"? strcmp() will work perfectly well in these
    cases.

    > I can convert all the non-digit characters to
    > lowercase, but how do I deal with the non-alphanumeric characters?


    I've no idea - it depends how you want to deal with them. Why do you
    want to do anything other than use their normal values?

    > Any hints or suggestions would be greatly helpful.


    You need to define exactly what you want to do, then write code to do
    it.
    J. J. Farrell, Oct 20, 2007
    #7
  8. Guest

    On Oct 19, 8:28 pm, (Gordon Burditt) wrote:
    > >Could some C guru provide some hints on my problem? I am trying to

    >
    > It might help if you STATE A PROBLEM.
    >
    > >sort an array of character strings, where each string contains
    > >lowercase, uppercase, digits as well as non-alphanumeric characters as
    > >'-', '(' or '/'. Obviously, standard C functions as 'strcmp' would
    > >fail in these cases.

    >
    > It is obvious to me that strcmp() would succeed in comparing any C
    > strings whatever, as long as you put no requirements on sort order
    > that conflict with what strcmp() does (and you didn't say anything
    > about sort order at all).
    >
    > >I can convert all the non-digit characters to
    > >lowercase, but how do I deal with the non-alphanumeric characters?

    >
    > It isn't necessary to convert punctuation to lower case in any
    > character set I'm aware of.
    >
    > It is generally considered a hanging offense to deal characters
    > from the bottom of the deck.


    As far as I remember from my trusty K & R C text,
    the source code for the strcmp function is:

    int strcmp(const char *s1, const char *s2)
    {
        while (*s1 == *s2++)
            if (*s1++ == 0)
                return (0);
        return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
    }

    Given that, I am trying to find a way of sorting strings for example:
    1. Bungie.net - TCP623
    2. Doom(Id Sofware) - version 1
    Obviously, the non-digit and non-alphanumeric (a -> z, A - > Z) cannot
    be converted to lower case, how do I deal with these special
    characters - straightforward application of 'strcmp' would not provide
    very accurate results.
    , Oct 20, 2007
    #8
  9. "" <> writes:
    [...]
    > As far as I remember from my trusty K & R C text,
    > the source code for the strcmp function is:
    >
    > int strcmp(const char *s1, const char *s2)
    > {
    > while (*s1 == *s2++)
    > if (*s1++ == 0)
    > return (0);
    > return (*(const unsigned char *)s1 - *(const unsigned char *)
    > (s2 - 1));
    > }


    That's one possible definition for strcmp() (I'll assume it's correct,
    but I haven't taken the time to check it). But strcmp() is specified
    in terms of how it behaves. It needn't even be implemented in C.

    > Given that, I am trying to find a way of sorting strings for example:
    > 1. Bungie.net - TCP623
    > 2. Doom(Id Sofware) - version 1
    > Obviously, the non-digit and non-alphanumeric (a -> z, A - > Z) cannot
    > be converted to lower case, how do I deal with these special
    > characters - straightforward application of 'strcmp' would not provide
    > very accurate results.


    What. Are. You. Trying. To. Do. ???.

    What do you mean by "accurate"?

    There is no one definition of how to compare strings. You've told us
    that you want to treat corresponding upper and lower case letters as
    if they were equal ('A' and 'a' equal). You haven't given us a clue
    about how you want to deal with non-alphanumeric characters.

    For example, if you're comparing "foobar" vs. "FooBarBaz", you
    apparently want to map all letters to lowercase, then use strcmp() to
    compare "foobar" vs. "foobarbaz" (result: "foobar" < "FooBarBaz").

    What if you're comparing "foo.bar" vs. "foo:bar"? What result do you
    want? Do you want them to compare equal? Do you want "foo.bar" <
    "foo:bar"? or "foo.bar" > "foo:bar"? Or do you not care as long as
    you get a consistent ordering? Any of those would be equally correct.

    We can't possibly guess how to accomplish your goal if you won't tell
    us what your goal is.
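
    To make one of those choices concrete: the sketch below picks just one policy out of the many possible, namely "ignore every non-alphanumeric character and fold case", so that "foo.bar" and "foo:bar" compare equal. Nothing in the thread says this is the policy the OP wants, and the names skip_punct and cmp_alnum_ci are invented for the example.

```c
#include <ctype.h>

/* Advance past characters that are neither letters nor digits. */
static const char *skip_punct(const char *s)
{
    while (*s != '\0' && !isalnum((unsigned char)*s)) {
        ++s;
    }
    return s;
}

/* Hypothetical policy: compare only letters and digits,
   treating upper and lower case as equal. */
int cmp_alnum_ci(const char *s1, const char *s2)
{
    for (;;) {
        int c1, c2;

        s1 = skip_punct(s1);
        s2 = skip_punct(s2);
        c1 = tolower((unsigned char)*s1);
        c2 = tolower((unsigned char)*s2);

        if (c1 != c2) {
            return c1 < c2 ? -1 : 1;
        }
        if (c1 == '\0') {
            return 0;   /* both strings exhausted */
        }
        ++s1;
        ++s2;
    }
}
```

    Note that spaces are skipped as well under this policy, and that strings differing only in punctuation end up in an unspecified relative order when sorted with qsort().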

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Oct 20, 2007
    #9
  10. santosh Guest

    wrote:

    > On Oct 19, 8:28 pm, (Gordon Burditt) wrote:
    >> >Could some C guru provide some hints on my problem? I am trying to

    >>
    >> It might help if you STATE A PROBLEM.
    >>
    >> >sort an array of character strings, where each string contains
    >> >lowercase, uppercase, digits as well as non-alphanumeric characters
    >> >as '-', '(' or '/'. Obviously, standard C functions as 'strcmp'
    >> >would fail in these cases.

    >>
    >> It is obvious to me that strcmp() would succeed in comparing any C
    >> strings whatever, as long as you put no requirements on sort order
    >> that conflict with what strcmp() does (and you didn't say anything
    >> about sort order at all).
    >>
    >> >I can convert all the non-digit characters to
    >> >lowercase, but how do I deal with the non-alphanumeric characters?

    >>
    >> It isn't necessary to convert punctuation to lower case in any
    >> character set I'm aware of.
    >>
    >> It is generally considered a hanging offense to deal characters
    >> from the bottom of the deck.

    >
    > As far as I remember from my trusty K & R C text,
    > the source code for the strcmp function is:
    >
    > int strcmp(const char *s1, const char *s2)
    > {
    > while (*s1 == *s2++)
    > if (*s1++ == 0)
    > return (0);
    > return (*(const unsigned char *)s1 - *(const unsigned char *)
    > (s2 - 1));
    > }
    >
    > Given that, I am trying to find a way of sorting strings for example:
    > 1. Bungie.net - TCP623
    > 2. Doom(Id Sofware) - version 1
    > Obviously, the non-digit and non-alphanumeric (a -> z, A - > Z) cannot
    > be converted to lower case, how do I deal with these special
    > characters - straightforward application of 'strcmp' would not provide
    > very accurate results.


    Sorting only makes sense when the data are related or are given
    meaningful relatedness by the programmer. Your two strings are only
    very tenuously related without _you_ as the programmer giving
    additional context.

    IOW, you'll have to define a set of rules on how to sort your data and
    write your own sorting function. Of course it can itself use strcmp and
    co., at a lower level for parts of the data.
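
    One way to read "use strcmp at a lower level" is to build a normalized key for each string and let strcmp() compare the keys. The sketch below is an assumption-laden example, not a recommendation: the rule (keep letters and digits, lowercase them), the names make_key and cmp_by_key, and the 128-byte key buffer are all arbitrary choices made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy only letters and digits from src into dst, lowercased.
   Writes at most size - 1 characters plus a terminating NUL. */
size_t make_key(char *dst, size_t size, const char *src)
{
    size_t n = 0;

    if (size == 0) {
        return 0;
    }
    for (; *src != '\0' && n + 1 < size; ++src) {
        if (isalnum((unsigned char)*src)) {
            dst[n++] = (char)tolower((unsigned char)*src);
        }
    }
    dst[n] = '\0';
    return n;
}

/* Compare by key first; fall back to plain strcmp() so that
   strings with equal keys still get a consistent order. */
int cmp_by_key(const char *s1, const char *s2)
{
    char k1[128], k2[128];   /* arbitrary cap; longer keys are truncated */
    int r;

    make_key(k1, sizeof k1, s1);
    make_key(k2, sizeof k2, s2);
    r = strcmp(k1, k2);
    return r != 0 ? r : strcmp(s1, s2);
}
```

    With this rule "Bungie.net - TCP623" sorts before "Doom(Id Sofware) - version 1" because the key "bungienettcp623" precedes "doomidsofwareversion1". Rebuilding keys on every comparison is wasteful for large arrays; computing them once per string would be the obvious refinement.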
    santosh, Oct 20, 2007
    #10
  11. On Oct 20, 9:01 pm, "" <>
    wrote:
    > On Oct 19, 8:28 pm, (Gordon Burditt) wrote:
    >
    > > >Could some C guru provide some hints on my problem? I am trying to

    >
    > > It might help if you STATE A PROBLEM.

    >
    > > >sort an array of character strings, where each string contains
    > > >lowercase, uppercase, digits as well as non-alphanumeric characters as
    > > >'-', '(' or '/'. Obviously, standard C functions as 'strcmp' would
    > > >fail in these cases.

    >
    > > It is obvious to me that strcmp() would succeed in comparing any C
    > > strings whatever, as long as you put no requirements on sort order
    > > that conflict with what strcmp() does (and you didn't say anything
    > > about sort order at all).

    >
    > > >I can convert all the non-digit characters to
    > > >lowercase, but how do I deal with the non-alphanumeric characters?

    >
    > > It isn't necessary to convert punctuation to lower case in any
    > > character set I'm aware of.

    >
    > > It is generally considered a hanging offense to deal characters
    > > from the bottom of the deck.

    >
    > As far as I remember from my trusty K & R C text,
    > the source code for the strcmp function is:
    >
    > int strcmp(const char *s1, const char *s2)
    > {
    > while (*s1 == *s2++)
    > if (*s1++ == 0)
    > return (0);
    > return (*(const unsigned char *)s1 - *(const unsigned char *)
    > (s2 - 1));
    > }
    >
    > Given that,


    I've no idea if that's how K&R implement it, or if what you give above
    implements strcmp. I'll assume we're talking about the strcmp that's
    part of C.

    > I am trying to find a way of sorting strings for example:
    > 1. Bungie.net - TCP623
    > 2. Doom(Id Sofware) - version 1
    > Obviously, the non-digit and non-alphanumeric (a -> z, A - > Z) cannot
    > be converted to lower case,


    Why not? If it's appropriate to do such a conversion for whatever
    comparison algorithm you're trying to implement, it's obvious that
    they can.

    > how do I deal with these special
    > characters


    However you want!

    > - straightforward application of 'strcmp' would not provide
    > very accurate results.


    It would provide entirely accurate results. If the results it provides
    aren't the results you want, you need to define exactly how you want
    the different characters to compare and then write a comparison
    routine to implement your algorithm.
    J. J. Farrell, Oct 21, 2007
    #11
  12. CBFalconer <> writes:
    > "" wrote:
    >> Could some C guru provide some hints on my problem? I am trying
    >> to sort an array of character strings, where each string contains
    >> lowercase, uppercase, digits as well as non-alphanumeric
    >> characters as '-', '(' or '/'. Obviously, standard C functions as
    >> 'strcmp' would fail in these cases. I can convert all the
    >> non-digit characters to lowercase, but how do I deal with the
    >> non-alphanumeric characters? Any hints or suggestions would be
    >> greatly helpful. Thanks in advance for your help.

    >
    > Simply apply strcmp. The strings will be sorted in accordance with
    > the native ordering of the default char set. The sort will
    > succeed. Whether it is what you want is another matter, and you
    > can make whatever substitutions you need to affect that.


    The OP has already made it clear that the ordering provided by
    strcmp() will not meet his requirements, since he wants to map
    uppercase letters to lowercase. Unfortunately, that's *all* he's made
    clear.

    (Strictly speaking, nothing in the C standard prohibits a character
    set with an ordering like 'A', 'a', 'B', 'b', 'C', 'c', ..., but I
    know of no actual character set that does this.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Oct 22, 2007
    #12
  13. CBFalconer Guest

    Keith Thompson wrote:
    >

    .... snip ...
    >
    > (Strictly speaking, nothing in the C standard prohibits a character
    > set with an ordering like 'A', 'a', 'B', 'b', 'C', 'c', ..., but I
    > know of no actual character set that does this.)


    I did that for a cross-reference program 20 years ago. It involved
    two 256 byte conversion tables. One was applied at input, and the
    other on eventual output. Debugging was a bear until I stopped
    using shortcuts. :)
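
    For readers wondering what a table-driven ordering of that kind might look like, here is a simplified sketch. It is not the two-table input/output scheme described above; it uses a single lookup table of collation weights, and the names weight, init_weights and cmp_table are invented. Letters get weights in the order A, a, B, b, ..., and, as an arbitrary choice, every non-letter keeps its character code and therefore sorts before all letters.

```c
#include <limits.h>

static int weight[UCHAR_MAX + 1];
static int weights_ready = 0;

/* Fill the table: non-letters keep their code, letters are
   interleaved as A, a, B, b, ... above every other character. */
static void init_weights(void)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    int i;

    for (i = 0; i <= UCHAR_MAX; ++i) {
        weight[i] = i;
    }
    for (i = 0; i < 26; ++i) {
        weight[(unsigned char)upper[i]] = UCHAR_MAX + 1 + 2 * i;
        weight[(unsigned char)lower[i]] = UCHAR_MAX + 1 + 2 * i + 1;
    }
    weights_ready = 1;
}

/* Compare two strings by table weight instead of raw char value.
   The lazy initialization is not thread-safe; fine for a sketch. */
int cmp_table(const char *s1, const char *s2)
{
    if (!weights_ready) {
        init_weights();
    }
    for (;;) {
        const int w1 = weight[(unsigned char)*s1];
        const int w2 = weight[(unsigned char)*s2];

        if (w1 != w2) {
            return w1 < w2 ? -1 : 1;
        }
        if (*s1 == '\0') {
            return 0;   /* equal weights mean equal chars here */
        }
        ++s1;
        ++s2;
    }
}
```

    Changing the sort policy then means changing the table, not the comparison loop.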

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Oct 23, 2007
    #13