qsort semantics

Discussion in 'C Programming' started by gauss010@gmail.com, Sep 20, 2007.

  1. Guest

    Suppose I have an object A of type char[M][N]. Each A is a buffer
    containing a string, and I want to sort the M strings of A using the
    strcmp function. The description of the qsort function says that I
    must pass a pointer to the first object of the array to be sorted.
    This leads me to write the following:

    char A[M][N];
    int cmp(const void *a, const void *b) {
    int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);
    if (v < 0) return -1;
    else if (v == 0) return 0;
    else return 1;
    }
    void sortA(void) {
    qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
    }

    But I also want to be able to use my comparison function to sort other
    arrays of strings whose second array dimension is of a different size.
    For example, I may want to use it to do the same sort on an object of
    type char[2*M][2*N]. Then I think of rewriting it like this:

    char A[M][N], B[2*M][2*N];
    int cmp(const void *a, const void *b) {
    int v = strcmp(a,b);
    if (v < 0) return -1;
    else if (v == 0) return 0;
    else return 1;
    }
    void sortA(void) {
    qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
    }
    void sortB(void) {
    qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
    }

    But now I'm not strictly following the description of the qsort
    function. I'm now not *really* passing a pointer to the first object
    of the array to be sorted, rather I'm passing a pointer to the first
    byte of the first object of the array to be sorted.

    Imagining how I would implement the qsort function myself, I know that
    this will work fine, but abusing the description of the qsort function
    like this makes me uncomfortable.

    Do you think it's OK to write such code?
     
    , Sep 20, 2007
    #1
    1. Advertising

  2. Army1987 Guest

    On Thu, 20 Sep 2007 00:09:12 -0700, gauss010 wrote:

    > Suppose I have an object A of type char[M][N]. Each A is a buffer
    > containing a string, and I want to sort the M strings of A using the
    > strcmp function. The description of the qsort function says that I
    > must pass a pointer to the first object of the array to be sorted.
    > This leads me to write the following:
    >
    > char A[M][N];
    > int cmp(const void *a, const void *b) {
    > int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);

    strcmp takes pointers to const chars, not to arrays.
    If strcmp has a visible prototype here, you could just
    int cmp(const void *a, const void *b) { return strcmp(a, b); }

    K&R2 uses (int (* )(const void *, const void *))strcmp as an
    argument to qsort, but I think the standard committee has since
    deemed it UB (whereas const void *s are required to have the same
    representations as const char*s, a particularly perverse
    implementation could pass them to functions in different ways...)

    > if (v < 0) return -1;
    > else if (v == 0) return 0;
    > else return 1;

    Why couldn't you just return v?
    > }
    > void sortA(void) {
    > qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
    > }
    >
    > But I also want to be able to use my comparison function to sort other
    > arrays of strings whose second array dimension is of a different size.
    > For example, I may want to use it to do the same sort on an object of
    > type char[2*M][2*N]. Then I think of rewriting it like this:
    >
    > char A[M][N], B[2*M][2*N];
    > int cmp(const void *a, const void *b) {
    > int v = strcmp(a,b);
    > if (v < 0) return -1;
    > else if (v == 0) return 0;
    > else return 1;
    > }
    > void sortA(void) {
    > qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
    > }
    > void sortB(void) {
    > qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
    > }
    >
    > But now I'm not strictly following the description of the qsort
    > function. I'm now not *really* passing a pointer to the first object
    > of the array to be sorted, rather I'm passing a pointer to the first
    > byte of the first object of the array to be sorted.

    You're passing void *, which are essentially pointers to char with
    syntactic salt on them. Of course they point to the first byte.
    The only way for qsort() to know how large the objects they are
    intended to point to are, is through the second and third
    argument.
    --
    Army1987 (Replace "NOSPAM" with "email")
    If you're sending e-mail from a Windows machine, turn off Microsoft's
    stupid “Smart Quotes†feature. This is so you'll avoid sprinkling garbage
    characters through your mail. -- Eric S. Raymond and Rick Moen
     
    Army1987, Sep 20, 2007
    #2
    1. Advertising

  3. user923005 Guest

    Maybe use:
    int compare (const void *aa, const void *bb)
    {
    const char * const *a = aa;
    const char * const *b = bb;
    return strcmp (a, b);
    }

    to compare C strings using qsort.
     
    user923005, Sep 20, 2007
    #3
  4. Army1987 Guest

    On Thu, 20 Sep 2007 12:02:28 -0700, user923005 wrote:

    > Maybe use:
    > int compare (const void *aa, const void *bb)
    > {
    > const char * const *a = aa;
    > const char * const *b = bb;
    > return strcmp (a, b);
    > }
    >
    > to compare C strings using qsort.

    Why?
    strcmp takes two const char *s...

    --
    Army1987 (Replace "NOSPAM" with "email")
    If you're sending e-mail from a Windows machine, turn off Microsoft's
    stupid “Smart Quotes†feature. This is so you'll avoid sprinkling garbage
    characters through your mail. -- Eric S. Raymond and Rick Moen
     
    Army1987, Sep 21, 2007
    #4
  5. On Thu, 20 Sep 2007 12:02:28 -0700, user923005 <>
    wrote:

    >Maybe use:
    >int compare (const void *aa, const void *bb)
    >{
    > const char * const *a = aa;
    > const char * const *b = bb;
    > return strcmp (a, b);
    >}
    >
    >to compare C strings using qsort.


    a and b are both qualified types of char**. strcmp takes slightly
    different qualified types of char*.


    Remove del for email
     
    Barry Schwarz, Sep 21, 2007
    #5
  6. pete Guest

    wrote:
    >
    > Suppose I have an object A of type char[M][N]. Each A is a buffer
    > containing a string, and I want to sort the M strings of A using the
    > strcmp function. The description of the qsort function says that I
    > must pass a pointer to the first object of the array to be sorted.
    > This leads me to write the following:
    >
    > char A[M][N];
    > int cmp(const void *a, const void *b) {
    > int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);
    > if (v < 0) return -1;
    > else if (v == 0) return 0;
    > else return 1;
    > }
    > void sortA(void) {
    > qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
    > }
    >
    > But I also want to be able to use my comparison function to sort other
    > arrays of strings whose second array dimension is of a different size.
    > For example, I may want to use it to do the same sort on an object of
    > type char[2*M][2*N]. Then I think of rewriting it like this:
    >
    > char A[M][N], B[2*M][2*N];
    > int cmp(const void *a, const void *b) {
    > int v = strcmp(a,b);
    > if (v < 0) return -1;
    > else if (v == 0) return 0;
    > else return 1;
    > }
    > void sortA(void) {
    > qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
    > }
    > void sortB(void) {
    > qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
    > }
    >
    > But now I'm not strictly following the description of the qsort
    > function. I'm now not *really* passing a pointer to the first object
    > of the array to be sorted, rather I'm passing a pointer to the first
    > byte of the first object of the array to be sorted.
    >
    > Imagining how I would implement the qsort function myself, I know that
    > this will work fine, but abusing the description of the qsort function
    > like this makes me uncomfortable.
    >
    > Do you think it's OK to write such code?


    /* BEGIN new.c */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NMEMB(A) (sizeof (A) / sizeof *(A))

    int compar(const void *a, const void *b);

    int main(void)
    {
    char array_1[][sizeof "three"] = {
    "zero","one","two","three","four",
    "five","six","seven","eight","nine"
    };
    char array_2[][2] = {
    "9","8","7","6","5","4","3","2","1","0"
    };
    size_t index;

    for (index = 0; index != NMEMB(array_1); ++index) {
    puts(array_1[index]);
    }
    putchar('\n');
    for (index = 0; index != NMEMB(array_2); ++index) {
    puts(array_2[index]);
    }
    puts("After sorting with strcmp:");
    qsort(array_1, NMEMB(array_1), sizeof *array_1, compar);
    qsort(array_2, NMEMB(array_2), sizeof *array_2, compar);
    for (index = 0; index != NMEMB(array_1); ++index) {
    puts(array_1[index]);
    }
    putchar('\n');
    for (index = 0; index != NMEMB(array_2); ++index) {
    puts(array_2[index]);
    }
    return 0;
    }

    int compar(const void *a, const void *b)
    {
    return strcmp(a, b);
    }

    /* END new.c */


    --
    pete
     
    pete, Sep 21, 2007
    #6
  7. user923005 Guest

    On Sep 20, 5:50 pm, Barry Schwarz <> wrote:
    > On Thu, 20 Sep 2007 12:02:28 -0700, user923005 <>
    > wrote:
    >
    > >Maybe use:
    > >int compare (const void *aa, const void *bb)
    > >{
    > > const char * const *a = aa;
    > > const char * const *b = bb;
    > > return strcmp (a, b);
    > >}

    >
    > >to compare C strings using qsort.

    >
    > a and b are both qualified types of char**. strcmp takes slightly
    > different qualified types of char*.


    I can't believe I posted that. Let's call it temporary insanity.
     
    user923005, Sep 21, 2007
    #7
  8. On Thu, 20 Sep 2007 00:09:12 -0700, wrote:

    >Suppose I have an object A of type char[M][N]. Each A is a buffer
    >containing a string, and I want to sort the M strings of A using the
    >strcmp function. The description of the qsort function says that I
    >must pass a pointer to the first object of the array to be sorted.
    >This leads me to write the following:
    >
    > char A[M][N];
    > int cmp(const void *a, const void *b) {
    > int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);


    You cast a to "pointer to an array of N char". You then dereference
    this expression which yields an array of N char. This expression does
    not meet one of the three exceptions and is therefore converted to the
    address of the first char in the array with type pointer to char. This
    is what you want to pass to strcmp but you don't really care about the
    size of the array. You could achieve the same with less typing with a
    simple (char*)a.

    But as you note below, even that is not necessary.

    > if (v < 0) return -1;
    > else if (v == 0) return 0;
    > else return 1;


    This is not necessary. You can just return v since the requirement is
    for negative, 0, or positive but not for exactly -1 or +1.

    > }
    > void sortA(void) {
    > qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
    > }
    >
    >But I also want to be able to use my comparison function to sort other
    >arrays of strings whose second array dimension is of a different size.
    >For example, I may want to use it to do the same sort on an object of
    >type char[2*M][2*N]. Then I think of rewriting it like this:
    >
    > char A[M][N], B[2*M][2*N];
    > int cmp(const void *a, const void *b) {
    > int v = strcmp(a,b);
    > if (v < 0) return -1;
    > else if (v == 0) return 0;
    > else return 1;
    > }
    > void sortA(void) {
    > qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
    > }
    > void sortB(void) {
    > qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
    > }
    >
    >But now I'm not strictly following the description of the qsort
    >function. I'm now not *really* passing a pointer to the first object
    >of the array to be sorted, rather I'm passing a pointer to the first
    >byte of the first object of the array to be sorted.


    Sure you are. Since the qsort prototype will force the first argument
    to be converted to void*, the type of the pointer in the calling
    statement is irrelevant. While the expressions A, &A, A[0], &A[0],
    and &A[0][0] all have different types~, the address they evaluate to
    points to the same byte, namely the first byte of A. There is no way
    for qsort to know which one you actually used in your calling
    statement.

    ~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].

    You could even go one step further and remove the need for sortA and
    sortB by using the calls to qsort in place of the calls to these
    functions (eliminates the need for global variables) or a single call
    to qsort with

    void* ptr;
    size_t count;
    size_t size;
    ptr = $; /* replace $ with A or B as appropriate */
    size = sizeof *$;
    count = sizeof $ / size;
    qsort(ptr, count, size, cmp); /* will work for both A and B */

    >
    >Imagining how I would implement the qsort function myself, I know that
    >this will work fine, but abusing the description of the qsort function
    >like this makes me uncomfortable.


    A void* is a void*, no matter what type the value was converted from.

    >
    >Do you think it's OK to write such code?


    Yes.


    Remove del for email
     
    Barry Schwarz, Sep 21, 2007
    #8
  9. Guest

    On Sep 20, 10:27 pm, Barry Schwarz <> wrote:
    > On Thu, 20 Sep 2007 00:09:12 -0700, wrote:
    > >Suppose I have an object A of type char[M][N]. Each A is a buffer
    > >containing a string, and I want to sort the M strings of A using the
    > >strcmp function. The description of the qsort function says that I
    > >must pass a pointer to the first object of the array to be sorted.
    > >This leads me to write the following:

    >
    > > char A[M][N];
    > > int cmp(const void *a, const void *b) {
    > > int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);

    >
    > You cast a to "pointer to an array of N char". You then dereference
    > this expression which yields an array of N char. This expression does
    > not meet one of the three exceptions and is therefore converted to the
    > address of the first char in the array with type pointer to char. This
    > is what you want to pass to strcmp but you don't really care about the
    > size of the array. You could achieve the same with less typing with a
    > simple (char*)a.
    >
    > But as you note below, even that is not necessary.
    >
    > > if (v < 0) return -1;
    > > else if (v == 0) return 0;
    > > else return 1;

    >
    > This is not necessary. You can just return v since the requirement is
    > for negative, 0, or positive but not for exactly -1 or +1.
    >


    My mistake. I incorrectly recalled that the comparison function needs
    to return either -1, 0, or 1.

    > > }
    > > void sortA(void) {
    > > qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
    > > }

    >
    > >But I also want to be able to use my comparison function to sort other
    > >arrays of strings whose second array dimension is of a different size.
    > >For example, I may want to use it to do the same sort on an object of
    > >type char[2*M][2*N]. Then I think of rewriting it like this:

    >
    > > char A[M][N], B[2*M][2*N];
    > > int cmp(const void *a, const void *b) {
    > > int v = strcmp(a,b);
    > > if (v < 0) return -1;
    > > else if (v == 0) return 0;
    > > else return 1;
    > > }
    > > void sortA(void) {
    > > qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
    > > }
    > > void sortB(void) {
    > > qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
    > > }

    >
    > >But now I'm not strictly following the description of the qsort
    > >function. I'm now not *really* passing a pointer to the first object
    > >of the array to be sorted, rather I'm passing a pointer to the first
    > >byte of the first object of the array to be sorted.

    >
    > Sure you are. Since the qsort prototype will force the first argument
    > to be converted to void*, the type of the pointer in the calling
    > statement is irrelevant. While the expressions A, &A, A[0], &A[0],
    > and &A[0][0] all have different types~, the address they evaluate to
    > points to the same byte, namely the first byte of A. There is no way
    > for qsort to know which one you actually used in your calling
    > statement.
    >


    Ah, this gets at the heart of my problem. I'm unsure when switching
    between pointer types like this, and that's the reason why I used the
    weird cast to char (*)[N] --- treating qsort as a magic black box that
    I should receive pointers from as the same type I passed in seemed
    more correct.

    Suppose I have an array defined as T x[2][2], where T is any type. The
    expression &x then has type pointer-to-array-of-2-array-of-2-T, and
    the expression &x would be said to point to the object named x,
    correct? But is this is only because of the type of &x (&x could be
    considered to point to both x[0] and x[0][0] as well)? Is it always
    OK then, to write expressions such as:

    - (T *)&x or (T *)&x[0] to produce a pointer to x[0][0]?
    - (T *)((char *)&x + sizeof x[0]) to produce a pointer to x[1][0]?
    - any similarly contrived expression to point to a subobject of x?

    These are intuitively correct to me, but I have trouble tracking the
    reasoning through the standard as to why they should behave properly.

    > ~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].
    >


    Nitpick: they don't have the same type, but are equivalent when not
    used as the operand of sizeof nor &.

    [snip]
     
    , Sep 21, 2007
    #9
  10. On Thu, 20 Sep 2007 21:39:09 -0700, wrote:


    snip

    >> >But I also want to be able to use my comparison function to sort other
    >> >arrays of strings whose second array dimension is of a different size.
    >> >For example, I may want to use it to do the same sort on an object of
    >> >type char[2*M][2*N]. Then I think of rewriting it like this:

    >>
    >> > char A[M][N], B[2*M][2*N];
    >> > int cmp(const void *a, const void *b) {
    >> > int v = strcmp(a,b);
    >> > if (v < 0) return -1;
    >> > else if (v == 0) return 0;
    >> > else return 1;
    >> > }
    >> > void sortA(void) {
    >> > qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
    >> > }
    >> > void sortB(void) {
    >> > qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
    >> > }

    >>
    >> >But now I'm not strictly following the description of the qsort
    >> >function. I'm now not *really* passing a pointer to the first object
    >> >of the array to be sorted, rather I'm passing a pointer to the first
    >> >byte of the first object of the array to be sorted.

    >>
    >> Sure you are. Since the qsort prototype will force the first argument
    >> to be converted to void*, the type of the pointer in the calling
    >> statement is irrelevant. While the expressions A, &A, A[0], &A[0],
    >> and &A[0][0] all have different types~, the address they evaluate to
    >> points to the same byte, namely the first byte of A. There is no way
    >> for qsort to know which one you actually used in your calling
    >> statement.
    >>

    >
    >Ah, this gets at the heart of my problem. I'm unsure when switching
    >between pointer types like this, and that's the reason why I used the
    >weird cast to char (*)[N] --- treating qsort as a magic black box that
    >I should receive pointers from as the same type I passed in seemed
    >more correct.


    Since qsort has no idea what type or array you are dealing with, it
    cannot pass that type info on to your compare function. (More
    technically, you passed qsort a void* which removed any type
    information from the argument.) The only thing qsort knows is the
    starting address of the array, the size of each element, and the
    number of elements.

    Your compare function needs to know that each element is a string but
    in this case it does not care what the max size of each string is. It
    is sufficient to know the starting location of the string because that
    is what strcmp needs to know.

    >
    >Suppose I have an array defined as T x[2][2], where T is any type. The
    >expression &x then has type pointer-to-array-of-2-array-of-2-T, and
    >the expression &x would be said to point to the object named x,
    >correct?


    Yes

    >But is this is only because of the type of &x (&x could be
    >considered to point to both x[0] and x[0][0] as well)?


    While the address &x evaluates to is starting location of x which is
    also the starting location of x[0] and also the starting location of
    x[0][0], &x has the wrong type to point to either x[0] or x[0][0].
    (This "nit" is similar to the problem first year physics students have
    in remembering that most values have units as well as magnitude.)

    >Is it always
    >OK then, to write expressions such as:
    >
    >- (T *)&x or (T *)&x[0] to produce a pointer to x[0][0]?


    Probably. But the use of an unnecessarily complex expression with a
    cast is the sign of a coder who couldn't be bothered to get things
    correct in the first place. If you want a pointer to x[0][0], use
    &x[0][0]. On the other hand, if you need the address of the sub-array
    in the context of pointer to element (as with the strings we have been
    discussing) use x.

    Furthermore, consider the situation where x started out as an array of
    floats. Somewhere later in your code you used (float*)&x to point to
    x[0][0]. Then after reading this group for a while you decided it
    would be better to change x to double. You would have to search
    through your code looking for casts to change. If you had coded
    &x[0][0] in the first place, that expression's type would change
    automatically (what I call self adjusting code). If it were used in a
    place where a double* was not appropriate, your compiler would
    generate a diagnostic. And worst of all, if you had missed changing
    the cast, you would be using the address of a double in a place where
    you thought you were using the address of a float which would probably
    lead to undesirable results(tm).

    By the way, the & is superfluous in either of your two expressions.

    >- (T *)((char *)&x + sizeof x[0]) to produce a pointer to x[1][0]?
    >- any similarly contrived expression to point to a subobject of x?
    >
    >These are intuitively correct to me, but I have trouble tracking the
    >reasoning through the standard as to why they should behave properly.


    In any array, sizeof x is a constant and does not depend on i.
    Furthermore, the address of x[i+1] is always sizeof x[n] bytes beyond
    the address of x. You need the cast to char* simply because
    pointer arithmetic includes the implied scaling factor of
    sizeof(object pointed to). For an array x, x+i is guaranteed to point
    to x. For int with size 4, x[1] is obviously 4 bytes beyond x[0]
    so the "pointer expression" x+1 must be equivalent to the "arithmetic
    expression" x + 1*4. Therefore you could replace your expression
    above with (T*)(x+1). But you really shouldn't use either. At this
    point, you should also be able to explain why (T*)(&x+1) is wrong.

    >
    >> ~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].
    >>

    >
    >Nitpick: they don't have the same type, but are equivalent when not
    >used as the operand of sizeof nor &.


    It is stronger than equivalent. The standard says that under these
    conditions (there is a third exception also), an expression of array
    of T is **converted** to an expression of type pointer to T that
    points to element 0 of the array. So A and &A[0] have exactly the
    same type.


    Remove del for email
     
    Barry Schwarz, Sep 21, 2007
    #10
  11. Tor Rustad Guest

    user923005 wrote:

    > I can't believe I posted that. Let's call it temporary insanity.


    I do remember an old regular named Dann Corbit.

    Yup, he knew how how to call qsort()!

    --
    Tor <torust [at] online [dot] no>
     
    Tor Rustad, Sep 21, 2007
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. richard

    Re: qsort and structs and ptrs

    richard, Aug 14, 2003, in forum: C Programming
    Replies:
    0
    Views:
    406
    richard
    Aug 14, 2003
  2. richard

    crashing qsort

    richard, Aug 14, 2003, in forum: C Programming
    Replies:
    34
    Views:
    1,379
  3. Debashish Chakravarty

    [possibly OT] the comparison function in qsort

    Debashish Chakravarty, Nov 23, 2003, in forum: C Programming
    Replies:
    0
    Views:
    343
    Debashish Chakravarty
    Nov 23, 2003
  4. Ingo Brueckl
    Replies:
    5
    Views:
    500
    Keith Thompson
    Dec 19, 2003
  5. Charlie Zender

    Need warning-free qsort() comparison functions

    Charlie Zender, Jan 1, 2004, in forum: C Programming
    Replies:
    21
    Views:
    1,293
Loading...

Share This Page