qsort semantics

Discussion in 'C Programming' started by gauss010@gmail.com, Sep 20, 2007.

1. Guest

Suppose I have an object A of type char[M][N]. Each A is a buffer
containing a string, and I want to sort the M strings of A using the
strcmp function. The description of the qsort function says that I
must pass a pointer to the first object of the array to be sorted.
This leads me to write the following:

char A[M][N];
int cmp(const void *a, const void *b) {
int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
}

But I also want to be able to use my comparison function to sort other
arrays of strings whose second array dimension is of a different size.
For example, I may want to use it to do the same sort on an object of
type char[2*M][2*N]. Then I think of rewriting it like this:

char A[M][N], B[2*M][2*N];
int cmp(const void *a, const void *b) {
int v = strcmp(a,b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
}
void sortB(void) {
qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
}

But now I'm not strictly following the description of the qsort
function. I'm now not *really* passing a pointer to the first object
of the array to be sorted, rather I'm passing a pointer to the first
byte of the first object of the array to be sorted.

Imagining how I would implement the qsort function myself, I know that
this will work fine, but abusing the description of the qsort function
like this makes me uncomfortable.

Do you think it's OK to write such code?

, Sep 20, 2007

2. Army1987Guest

On Thu, 20 Sep 2007 00:09:12 -0700, gauss010 wrote:

> Suppose I have an object A of type char[M][N]. Each A is a buffer
> containing a string, and I want to sort the M strings of A using the
> strcmp function. The description of the qsort function says that I
> must pass a pointer to the first object of the array to be sorted.
> This leads me to write the following:
>
> char A[M][N];
> int cmp(const void *a, const void *b) {
> int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);

strcmp takes pointers to const chars, not to arrays.
If strcmp has a visible prototype here, you could just
int cmp(const void *a, const void *b) { return strcmp(a, b); }

K&R2 uses (int (* )(const void *, const void *))strcmp as an
argument to qsort, but I think the standard committee has since
deemed it UB (whereas const void *s are required to have the same
representations as const char*s, a particularly perverse
implementation could pass them to functions in different ways...)

> if (v < 0) return -1;
> else if (v == 0) return 0;
> else return 1;

Why couldn't you just return v?
> }
> void sortA(void) {
> qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
> }
>
> But I also want to be able to use my comparison function to sort other
> arrays of strings whose second array dimension is of a different size.
> For example, I may want to use it to do the same sort on an object of
> type char[2*M][2*N]. Then I think of rewriting it like this:
>
> char A[M][N], B[2*M][2*N];
> int cmp(const void *a, const void *b) {
> int v = strcmp(a,b);
> if (v < 0) return -1;
> else if (v == 0) return 0;
> else return 1;
> }
> void sortA(void) {
> qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
> }
> void sortB(void) {
> qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
> }
>
> But now I'm not strictly following the description of the qsort
> function. I'm now not *really* passing a pointer to the first object
> of the array to be sorted, rather I'm passing a pointer to the first
> byte of the first object of the array to be sorted.

You're passing void *, which are essentially pointers to char with
syntactic salt on them. Of course they point to the first byte.
The only way for qsort() to know how large the objects they are
intended to point to are, is through the second and third
argument.
--
Army1987 (Replace "NOSPAM" with "email")
If you're sending e-mail from a Windows machine, turn off Microsoft's
stupid â€œSmart Quotesâ€ feature. This is so you'll avoid sprinkling garbage
characters through your mail. -- Eric S. Raymond and Rick Moen

Army1987, Sep 20, 2007

3. user923005Guest

Maybe use:
int compare (const void *aa, const void *bb)
{
const char * const *a = aa;
const char * const *b = bb;
return strcmp (a, b);
}

to compare C strings using qsort.

user923005, Sep 20, 2007
4. Army1987Guest

On Thu, 20 Sep 2007 12:02:28 -0700, user923005 wrote:

> Maybe use:
> int compare (const void *aa, const void *bb)
> {
> const char * const *a = aa;
> const char * const *b = bb;
> return strcmp (a, b);
> }
>
> to compare C strings using qsort.

Why?
strcmp takes two const char *s...

--
Army1987 (Replace "NOSPAM" with "email")
If you're sending e-mail from a Windows machine, turn off Microsoft's
stupid â€œSmart Quotesâ€ feature. This is so you'll avoid sprinkling garbage
characters through your mail. -- Eric S. Raymond and Rick Moen

Army1987, Sep 21, 2007
5. Barry SchwarzGuest

On Thu, 20 Sep 2007 12:02:28 -0700, user923005 <>
wrote:

>Maybe use:
>int compare (const void *aa, const void *bb)
>{
> const char * const *a = aa;
> const char * const *b = bb;
> return strcmp (a, b);
>}
>
>to compare C strings using qsort.

a and b are both qualified types of char**. strcmp takes slightly
different qualified types of char*.

Remove del for email

Barry Schwarz, Sep 21, 2007
6. peteGuest

wrote:
>
> Suppose I have an object A of type char[M][N]. Each A is a buffer
> containing a string, and I want to sort the M strings of A using the
> strcmp function. The description of the qsort function says that I
> must pass a pointer to the first object of the array to be sorted.
> This leads me to write the following:
>
> char A[M][N];
> int cmp(const void *a, const void *b) {
> int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);
> if (v < 0) return -1;
> else if (v == 0) return 0;
> else return 1;
> }
> void sortA(void) {
> qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
> }
>
> But I also want to be able to use my comparison function to sort other
> arrays of strings whose second array dimension is of a different size.
> For example, I may want to use it to do the same sort on an object of
> type char[2*M][2*N]. Then I think of rewriting it like this:
>
> char A[M][N], B[2*M][2*N];
> int cmp(const void *a, const void *b) {
> int v = strcmp(a,b);
> if (v < 0) return -1;
> else if (v == 0) return 0;
> else return 1;
> }
> void sortA(void) {
> qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
> }
> void sortB(void) {
> qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
> }
>
> But now I'm not strictly following the description of the qsort
> function. I'm now not *really* passing a pointer to the first object
> of the array to be sorted, rather I'm passing a pointer to the first
> byte of the first object of the array to be sorted.
>
> Imagining how I would implement the qsort function myself, I know that
> this will work fine, but abusing the description of the qsort function
> like this makes me uncomfortable.
>
> Do you think it's OK to write such code?

/* BEGIN new.c */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NMEMB(A) (sizeof (A) / sizeof *(A))

int compar(const void *a, const void *b);

int main(void)
{
char array_1[][sizeof "three"] = {
"zero","one","two","three","four",
"five","six","seven","eight","nine"
};
char array_2[][2] = {
"9","8","7","6","5","4","3","2","1","0"
};
size_t index;

for (index = 0; index != NMEMB(array_1); ++index) {
puts(array_1[index]);
}
putchar('\n');
for (index = 0; index != NMEMB(array_2); ++index) {
puts(array_2[index]);
}
puts("After sorting with strcmp:");
qsort(array_1, NMEMB(array_1), sizeof *array_1, compar);
qsort(array_2, NMEMB(array_2), sizeof *array_2, compar);
for (index = 0; index != NMEMB(array_1); ++index) {
puts(array_1[index]);
}
putchar('\n');
for (index = 0; index != NMEMB(array_2); ++index) {
puts(array_2[index]);
}
return 0;
}

int compar(const void *a, const void *b)
{
return strcmp(a, b);
}

/* END new.c */

--
pete

pete, Sep 21, 2007
7. user923005Guest

On Sep 20, 5:50 pm, Barry Schwarz <> wrote:
> On Thu, 20 Sep 2007 12:02:28 -0700, user923005 <>
> wrote:
>
> >Maybe use:
> >int compare (const void *aa, const void *bb)
> >{
> > const char * const *a = aa;
> > const char * const *b = bb;
> > return strcmp (a, b);
> >}

>
> >to compare C strings using qsort.

>
> a and b are both qualified types of char**. strcmp takes slightly
> different qualified types of char*.

I can't believe I posted that. Let's call it temporary insanity.

user923005, Sep 21, 2007
8. Barry SchwarzGuest

On Thu, 20 Sep 2007 00:09:12 -0700, wrote:

>Suppose I have an object A of type char[M][N]. Each A is a buffer
>containing a string, and I want to sort the M strings of A using the
>strcmp function. The description of the qsort function says that I
>must pass a pointer to the first object of the array to be sorted.
>This leads me to write the following:
>
> char A[M][N];
> int cmp(const void *a, const void *b) {
> int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);

You cast a to "pointer to an array of N char". You then dereference
this expression which yields an array of N char. This expression does
not meet one of the three exceptions and is therefore converted to the
address of the first char in the array with type pointer to char. This
is what you want to pass to strcmp but you don't really care about the
size of the array. You could achieve the same with less typing with a
simple (char*)a.

But as you note below, even that is not necessary.

> if (v < 0) return -1;
> else if (v == 0) return 0;
> else return 1;

This is not necessary. You can just return v since the requirement is
for negative, 0, or positive but not for exactly -1 or +1.

> }
> void sortA(void) {
> qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
> }
>
>But I also want to be able to use my comparison function to sort other
>arrays of strings whose second array dimension is of a different size.
>For example, I may want to use it to do the same sort on an object of
>type char[2*M][2*N]. Then I think of rewriting it like this:
>
> char A[M][N], B[2*M][2*N];
> int cmp(const void *a, const void *b) {
> int v = strcmp(a,b);
> if (v < 0) return -1;
> else if (v == 0) return 0;
> else return 1;
> }
> void sortA(void) {
> qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
> }
> void sortB(void) {
> qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
> }
>
>But now I'm not strictly following the description of the qsort
>function. I'm now not *really* passing a pointer to the first object
>of the array to be sorted, rather I'm passing a pointer to the first
>byte of the first object of the array to be sorted.

Sure you are. Since the qsort prototype will force the first argument
to be converted to void*, the type of the pointer in the calling
statement is irrelevant. While the expressions A, &A, A[0], &A[0],
and &A[0][0] all have different types~, the address they evaluate to
points to the same byte, namely the first byte of A. There is no way
for qsort to know which one you actually used in your calling
statement.

~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].

You could even go one step further and remove the need for sortA and
sortB by using the calls to qsort in place of the calls to these
functions (eliminates the need for global variables) or a single call
to qsort with

void* ptr;
size_t count;
size_t size;
ptr = \$; /* replace \$ with A or B as appropriate */
size = sizeof *\$;
count = sizeof \$ / size;
qsort(ptr, count, size, cmp); /* will work for both A and B */

>
>Imagining how I would implement the qsort function myself, I know that
>this will work fine, but abusing the description of the qsort function
>like this makes me uncomfortable.

A void* is a void*, no matter what type the value was converted from.

>
>Do you think it's OK to write such code?

Yes.

Remove del for email

Barry Schwarz, Sep 21, 2007
9. Guest

On Sep 20, 10:27 pm, Barry Schwarz <> wrote:
> On Thu, 20 Sep 2007 00:09:12 -0700, wrote:
> >Suppose I have an object A of type char[M][N]. Each A is a buffer
> >containing a string, and I want to sort the M strings of A using the
> >strcmp function. The description of the qsort function says that I
> >must pass a pointer to the first object of the array to be sorted.
> >This leads me to write the following:

>
> > char A[M][N];
> > int cmp(const void *a, const void *b) {
> > int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);

>
> You cast a to "pointer to an array of N char". You then dereference
> this expression which yields an array of N char. This expression does
> not meet one of the three exceptions and is therefore converted to the
> address of the first char in the array with type pointer to char. This
> is what you want to pass to strcmp but you don't really care about the
> size of the array. You could achieve the same with less typing with a
> simple (char*)a.
>
> But as you note below, even that is not necessary.
>
> > if (v < 0) return -1;
> > else if (v == 0) return 0;
> > else return 1;

>
> This is not necessary. You can just return v since the requirement is
> for negative, 0, or positive but not for exactly -1 or +1.
>

My mistake. I incorrectly recalled that the comparison function needs
to return either -1, 0, or 1.

> > }
> > void sortA(void) {
> > qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
> > }

>
> >But I also want to be able to use my comparison function to sort other
> >arrays of strings whose second array dimension is of a different size.
> >For example, I may want to use it to do the same sort on an object of
> >type char[2*M][2*N]. Then I think of rewriting it like this:

>
> > char A[M][N], B[2*M][2*N];
> > int cmp(const void *a, const void *b) {
> > int v = strcmp(a,b);
> > if (v < 0) return -1;
> > else if (v == 0) return 0;
> > else return 1;
> > }
> > void sortA(void) {
> > qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
> > }
> > void sortB(void) {
> > qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
> > }

>
> >But now I'm not strictly following the description of the qsort
> >function. I'm now not *really* passing a pointer to the first object
> >of the array to be sorted, rather I'm passing a pointer to the first
> >byte of the first object of the array to be sorted.

>
> Sure you are. Since the qsort prototype will force the first argument
> to be converted to void*, the type of the pointer in the calling
> statement is irrelevant. While the expressions A, &A, A[0], &A[0],
> and &A[0][0] all have different types~, the address they evaluate to
> points to the same byte, namely the first byte of A. There is no way
> for qsort to know which one you actually used in your calling
> statement.
>

Ah, this gets at the heart of my problem. I'm unsure when switching
between pointer types like this, and that's the reason why I used the
weird cast to char (*)[N] --- treating qsort as a magic black box that
I should receive pointers from as the same type I passed in seemed
more correct.

Suppose I have an array defined as T x[2][2], where T is any type. The
expression &x then has type pointer-to-array-of-2-array-of-2-T, and
the expression &x would be said to point to the object named x,
correct? But is this is only because of the type of &x (&x could be
considered to point to both x[0] and x[0][0] as well)? Is it always
OK then, to write expressions such as:

- (T *)&x or (T *)&x[0] to produce a pointer to x[0][0]?
- (T *)((char *)&x + sizeof x[0]) to produce a pointer to x[1][0]?
- any similarly contrived expression to point to a subobject of x?

These are intuitively correct to me, but I have trouble tracking the
reasoning through the standard as to why they should behave properly.

> ~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].
>

Nitpick: they don't have the same type, but are equivalent when not
used as the operand of sizeof nor &.

[snip]

, Sep 21, 2007
10. Barry SchwarzGuest

On Thu, 20 Sep 2007 21:39:09 -0700, wrote:

snip

>> >But I also want to be able to use my comparison function to sort other
>> >arrays of strings whose second array dimension is of a different size.
>> >For example, I may want to use it to do the same sort on an object of
>> >type char[2*M][2*N]. Then I think of rewriting it like this:

>>
>> > char A[M][N], B[2*M][2*N];
>> > int cmp(const void *a, const void *b) {
>> > int v = strcmp(a,b);
>> > if (v < 0) return -1;
>> > else if (v == 0) return 0;
>> > else return 1;
>> > }
>> > void sortA(void) {
>> > qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
>> > }
>> > void sortB(void) {
>> > qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
>> > }

>>
>> >But now I'm not strictly following the description of the qsort
>> >function. I'm now not *really* passing a pointer to the first object
>> >of the array to be sorted, rather I'm passing a pointer to the first
>> >byte of the first object of the array to be sorted.

>>
>> Sure you are. Since the qsort prototype will force the first argument
>> to be converted to void*, the type of the pointer in the calling
>> statement is irrelevant. While the expressions A, &A, A[0], &A[0],
>> and &A[0][0] all have different types~, the address they evaluate to
>> points to the same byte, namely the first byte of A. There is no way
>> for qsort to know which one you actually used in your calling
>> statement.
>>

>
>Ah, this gets at the heart of my problem. I'm unsure when switching
>between pointer types like this, and that's the reason why I used the
>weird cast to char (*)[N] --- treating qsort as a magic black box that
>I should receive pointers from as the same type I passed in seemed
>more correct.

Since qsort has no idea what type or array you are dealing with, it
cannot pass that type info on to your compare function. (More
technically, you passed qsort a void* which removed any type
information from the argument.) The only thing qsort knows is the
starting address of the array, the size of each element, and the
number of elements.

Your compare function needs to know that each element is a string but
in this case it does not care what the max size of each string is. It
is sufficient to know the starting location of the string because that
is what strcmp needs to know.

>
>Suppose I have an array defined as T x[2][2], where T is any type. The
>expression &x then has type pointer-to-array-of-2-array-of-2-T, and
>the expression &x would be said to point to the object named x,
>correct?

Yes

>But is this is only because of the type of &x (&x could be
>considered to point to both x[0] and x[0][0] as well)?

While the address &x evaluates to is starting location of x which is
also the starting location of x[0] and also the starting location of
x[0][0], &x has the wrong type to point to either x[0] or x[0][0].
(This "nit" is similar to the problem first year physics students have
in remembering that most values have units as well as magnitude.)

>Is it always
>OK then, to write expressions such as:
>
>- (T *)&x or (T *)&x[0] to produce a pointer to x[0][0]?

Probably. But the use of an unnecessarily complex expression with a
cast is the sign of a coder who couldn't be bothered to get things
correct in the first place. If you want a pointer to x[0][0], use
&x[0][0]. On the other hand, if you need the address of the sub-array
in the context of pointer to element (as with the strings we have been
discussing) use x.

Furthermore, consider the situation where x started out as an array of
floats. Somewhere later in your code you used (float*)&x to point to
x[0][0]. Then after reading this group for a while you decided it
would be better to change x to double. You would have to search
through your code looking for casts to change. If you had coded
&x[0][0] in the first place, that expression's type would change
automatically (what I call self adjusting code). If it were used in a
place where a double* was not appropriate, your compiler would
generate a diagnostic. And worst of all, if you had missed changing
the cast, you would be using the address of a double in a place where
you thought you were using the address of a float which would probably
lead to undesirable results(tm).

By the way, the & is superfluous in either of your two expressions.

>- (T *)((char *)&x + sizeof x[0]) to produce a pointer to x[1][0]?
>- any similarly contrived expression to point to a subobject of x?
>
>These are intuitively correct to me, but I have trouble tracking the
>reasoning through the standard as to why they should behave properly.

In any array, sizeof x is a constant and does not depend on i.
Furthermore, the address of x[i+1] is always sizeof x[n] bytes beyond
the address of x. You need the cast to char* simply because
pointer arithmetic includes the implied scaling factor of
sizeof(object pointed to). For an array x, x+i is guaranteed to point
to x. For int with size 4, x[1] is obviously 4 bytes beyond x[0]
so the "pointer expression" x+1 must be equivalent to the "arithmetic
expression" x + 1*4. Therefore you could replace your expression
above with (T*)(x+1). But you really shouldn't use either. At this
point, you should also be able to explain why (T*)(&x+1) is wrong.

>
>> ~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].
>>

>
>Nitpick: they don't have the same type, but are equivalent when not
>used as the operand of sizeof nor &.

It is stronger than equivalent. The standard says that under these
conditions (there is a third exception also), an expression of array
of T is **converted** to an expression of type pointer to T that
points to element 0 of the array. So A and &A[0] have exactly the
same type.

Remove del for email

Barry Schwarz, Sep 21, 2007

user923005 wrote:

> I can't believe I posted that. Let's call it temporary insanity.

I do remember an old regular named Dann Corbit.

Yup, he knew how how to call qsort()!

--
Tor <torust [at] online [dot] no>

Tor Rustad, Sep 21, 2007