qsort semantics

G

gauss010

Suppose I have an object A of type char[M][N]. Each A is a buffer
containing a string, and I want to sort the M strings of A using the
strcmp function. The description of the qsort function says that I
must pass a pointer to the first object of the array to be sorted.
This leads me to write the following:

char A[M][N];
int cmp(const void *a, const void *b) {
int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
}

But I also want to be able to use my comparison function to sort other
arrays of strings whose second array dimension is of a different size.
For example, I may want to use it to do the same sort on an object of
type char[2*M][2*N]. Then I think of rewriting it like this:

char A[M][N], B[2*M][2*N];
int cmp(const void *a, const void *b) {
int v = strcmp(a,b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
}
void sortB(void) {
qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
}

But now I'm not strictly following the description of the qsort
function. I'm now not *really* passing a pointer to the first object
of the array to be sorted, rather I'm passing a pointer to the first
byte of the first object of the array to be sorted.

Imagining how I would implement the qsort function myself, I know that
this will work fine, but abusing the description of the qsort function
like this makes me uncomfortable.

Do you think it's OK to write such code?
 
A

Army1987

Suppose I have an object A of type char[M][N]. Each A is a buffer
containing a string, and I want to sort the M strings of A using the
strcmp function. The description of the qsort function says that I
must pass a pointer to the first object of the array to be sorted.
This leads me to write the following:

char A[M][N];
int cmp(const void *a, const void *b) {
int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);

strcmp takes pointers to const chars, not to arrays.
If strcmp has a visible prototype here, you could just
int cmp(const void *a, const void *b) { return strcmp(a, b); }

K&R2 uses (int (* )(const void *, const void *))strcmp as an
argument to qsort, but I think the standard committee has since
deemed it UB (whereas const void *s are required to have the same
representations as const char*s, a particularly perverse
implementation could pass them to functions in different ways...)
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
Why couldn't you just return v?
}
void sortA(void) {
qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
}

But I also want to be able to use my comparison function to sort other
arrays of strings whose second array dimension is of a different size.
For example, I may want to use it to do the same sort on an object of
type char[2*M][2*N]. Then I think of rewriting it like this:

char A[M][N], B[2*M][2*N];
int cmp(const void *a, const void *b) {
int v = strcmp(a,b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
}
void sortB(void) {
qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
}

But now I'm not strictly following the description of the qsort
function. I'm now not *really* passing a pointer to the first object
of the array to be sorted, rather I'm passing a pointer to the first
byte of the first object of the array to be sorted.
You're passing void *, which are essentially pointers to char with
syntactic salt on them. Of course they point to the first byte.
The only way for qsort() to know how large the objects they are
intended to point to are, is through the second and third
argument.
 
U

user923005

Maybe use:
int compare (const void *aa, const void *bb)
{
const char * const *a = aa;
const char * const *b = bb;
return strcmp (a, b);
}

to compare C strings using qsort.
 
A

Army1987

Maybe use:
int compare (const void *aa, const void *bb)
{
const char * const *a = aa;
const char * const *b = bb;
return strcmp (a, b);
}

to compare C strings using qsort.
Why?
strcmp takes two const char *s...
 
B

Barry Schwarz

Maybe use:
int compare (const void *aa, const void *bb)
{
const char * const *a = aa;
const char * const *b = bb;
return strcmp (a, b);
}

to compare C strings using qsort.

a and b are both qualified types of char**. strcmp takes slightly
different qualified types of char*.


Remove del for email
 
P

pete

Suppose I have an object A of type char[M][N]. Each A is a buffer
containing a string, and I want to sort the M strings of A using the
strcmp function. The description of the qsort function says that I
must pass a pointer to the first object of the array to be sorted.
This leads me to write the following:

char A[M][N];
int cmp(const void *a, const void *b) {
int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
}

But I also want to be able to use my comparison function to sort other
arrays of strings whose second array dimension is of a different size.
For example, I may want to use it to do the same sort on an object of
type char[2*M][2*N]. Then I think of rewriting it like this:

char A[M][N], B[2*M][2*N];
int cmp(const void *a, const void *b) {
int v = strcmp(a,b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
}
void sortB(void) {
qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
}

But now I'm not strictly following the description of the qsort
function. I'm now not *really* passing a pointer to the first object
of the array to be sorted, rather I'm passing a pointer to the first
byte of the first object of the array to be sorted.

Imagining how I would implement the qsort function myself, I know that
this will work fine, but abusing the description of the qsort function
like this makes me uncomfortable.

Do you think it's OK to write such code?


/* BEGIN new.c */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NMEMB(A) (sizeof (A) / sizeof *(A))

int compar(const void *a, const void *b);

int main(void)
{
char array_1[][sizeof "three"] = {
"zero","one","two","three","four",
"five","six","seven","eight","nine"
};
char array_2[][2] = {
"9","8","7","6","5","4","3","2","1","0"
};
size_t index;

for (index = 0; index != NMEMB(array_1); ++index) {
puts(array_1[index]);
}
putchar('\n');
for (index = 0; index != NMEMB(array_2); ++index) {
puts(array_2[index]);
}
puts("After sorting with strcmp:");
qsort(array_1, NMEMB(array_1), sizeof *array_1, compar);
qsort(array_2, NMEMB(array_2), sizeof *array_2, compar);
for (index = 0; index != NMEMB(array_1); ++index) {
puts(array_1[index]);
}
putchar('\n');
for (index = 0; index != NMEMB(array_2); ++index) {
puts(array_2[index]);
}
return 0;
}

int compar(const void *a, const void *b)
{
return strcmp(a, b);
}

/* END new.c */
 
U

user923005

a and b are both qualified types of char**. strcmp takes slightly
different qualified types of char*.

I can't believe I posted that. Let's call it temporary insanity.
 
B

Barry Schwarz

Suppose I have an object A of type char[M][N]. Each A is a buffer
containing a string, and I want to sort the M strings of A using the
strcmp function. The description of the qsort function says that I
must pass a pointer to the first object of the array to be sorted.
This leads me to write the following:

char A[M][N];
int cmp(const void *a, const void *b) {
int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);


You cast a to "pointer to an array of N char". You then dereference
this expression which yields an array of N char. This expression does
not meet one of the three exceptions and is therefore converted to the
address of the first char in the array with type pointer to char. This
is what you want to pass to strcmp but you don't really care about the
size of the array. You could achieve the same with less typing with a
simple (char*)a.

But as you note below, even that is not necessary.
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;

This is not necessary. You can just return v since the requirement is
for negative, 0, or positive but not for exactly -1 or +1.
}
void sortA(void) {
qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
}

But I also want to be able to use my comparison function to sort other
arrays of strings whose second array dimension is of a different size.
For example, I may want to use it to do the same sort on an object of
type char[2*M][2*N]. Then I think of rewriting it like this:

char A[M][N], B[2*M][2*N];
int cmp(const void *a, const void *b) {
int v = strcmp(a,b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
}
void sortB(void) {
qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
}

But now I'm not strictly following the description of the qsort
function. I'm now not *really* passing a pointer to the first object
of the array to be sorted, rather I'm passing a pointer to the first
byte of the first object of the array to be sorted.

Sure you are. Since the qsort prototype will force the first argument
to be converted to void*, the type of the pointer in the calling
statement is irrelevant. While the expressions A, &A, A[0], &A[0],
and &A[0][0] all have different types~, the address they evaluate to
points to the same byte, namely the first byte of A. There is no way
for qsort to know which one you actually used in your calling
statement.

~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].

You could even go one step further and remove the need for sortA and
sortB by using the calls to qsort in place of the calls to these
functions (eliminates the need for global variables) or a single call
to qsort with

void* ptr;
size_t count;
size_t size;
ptr = $; /* replace $ with A or B as appropriate */
size = sizeof *$;
count = sizeof $ / size;
qsort(ptr, count, size, cmp); /* will work for both A and B */
Imagining how I would implement the qsort function myself, I know that
this will work fine, but abusing the description of the qsort function
like this makes me uncomfortable.

A void* is a void*, no matter what type the value was converted from.
Do you think it's OK to write such code?

Yes.


Remove del for email
 
G

gauss010

Suppose I have an object A of type char[M][N]. Each A is a buffer
containing a string, and I want to sort the M strings of A using the
strcmp function. The description of the qsort function says that I
must pass a pointer to the first object of the array to be sorted.
This leads me to write the following:

char A[M][N];
int cmp(const void *a, const void *b) {
int v = strcmp(*(char (*)[N])a, *(char (*)[N])b);

You cast a to "pointer to an array of N char". You then dereference
this expression which yields an array of N char. This expression does
not meet one of the three exceptions and is therefore converted to the
address of the first char in the array with type pointer to char. This
is what you want to pass to strcmp but you don't really care about the
size of the array. You could achieve the same with less typing with a
simple (char*)a.

But as you note below, even that is not necessary.
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;

This is not necessary. You can just return v since the requirement is
for negative, 0, or positive but not for exactly -1 or +1.


My mistake. I incorrectly recalled that the comparison function needs
to return either -1, 0, or 1.
}
void sortA(void) {
qsort(&A[0], sizeof A / sizeof *A, sizeof *A, cmp);
}
But I also want to be able to use my comparison function to sort other
arrays of strings whose second array dimension is of a different size.
For example, I may want to use it to do the same sort on an object of
type char[2*M][2*N]. Then I think of rewriting it like this:
char A[M][N], B[2*M][2*N];
int cmp(const void *a, const void *b) {
int v = strcmp(a,b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
}
void sortB(void) {
qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
}
But now I'm not strictly following the description of the qsort
function. I'm now not *really* passing a pointer to the first object
of the array to be sorted, rather I'm passing a pointer to the first
byte of the first object of the array to be sorted.

Sure you are. Since the qsort prototype will force the first argument
to be converted to void*, the type of the pointer in the calling
statement is irrelevant. While the expressions A, &A, A[0], &A[0],
and &A[0][0] all have different types~, the address they evaluate to
points to the same byte, namely the first byte of A. There is no way
for qsort to know which one you actually used in your calling
statement.

Ah, this gets at the heart of my problem. I'm unsure when switching
between pointer types like this, and that's the reason why I used the
weird cast to char (*)[N] --- treating qsort as a magic black box that
I should receive pointers from as the same type I passed in seemed
more correct.

Suppose I have an array defined as T x[2][2], where T is any type. The
expression &x then has type pointer-to-array-of-2-array-of-2-T, and
the expression &x would be said to point to the object named x,
correct? But is this is only because of the type of &x (&x could be
considered to point to both x[0] and x[0][0] as well)? Is it always
OK then, to write expressions such as:

- (T *)&x or (T *)&x[0] to produce a pointer to x[0][0]?
- (T *)((char *)&x + sizeof x[0]) to produce a pointer to x[1][0]?
- any similarly contrived expression to point to a subobject of x?

These are intuitively correct to me, but I have trouble tracking the
reasoning through the standard as to why they should behave properly.
~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].

Nitpick: they don't have the same type, but are equivalent when not
used as the operand of sizeof nor &.

[snip]
 
B

Barry Schwarz

On Thu, 20 Sep 2007 21:39:09 -0700, (e-mail address removed) wrote:


snip
But I also want to be able to use my comparison function to sort other
arrays of strings whose second array dimension is of a different size.
For example, I may want to use it to do the same sort on an object of
type char[2*M][2*N]. Then I think of rewriting it like this:
char A[M][N], B[2*M][2*N];
int cmp(const void *a, const void *b) {
int v = strcmp(a,b);
if (v < 0) return -1;
else if (v == 0) return 0;
else return 1;
}
void sortA(void) {
qsort(&A[0][0], sizeof A / sizeof *A, sizeof *A, cmp);
}
void sortB(void) {
qsort(&B[0][0], sizeof B / sizeof *B, sizeof *B, cmp);
}
But now I'm not strictly following the description of the qsort
function. I'm now not *really* passing a pointer to the first object
of the array to be sorted, rather I'm passing a pointer to the first
byte of the first object of the array to be sorted.

Sure you are. Since the qsort prototype will force the first argument
to be converted to void*, the type of the pointer in the calling
statement is irrelevant. While the expressions A, &A, A[0], &A[0],
and &A[0][0] all have different types~, the address they evaluate to
points to the same byte, namely the first byte of A. There is no way
for qsort to know which one you actually used in your calling
statement.

Ah, this gets at the heart of my problem. I'm unsure when switching
between pointer types like this, and that's the reason why I used the
weird cast to char (*)[N] --- treating qsort as a magic black box that
I should receive pointers from as the same type I passed in seemed
more correct.

Since qsort has no idea what type or array you are dealing with, it
cannot pass that type info on to your compare function. (More
technically, you passed qsort a void* which removed any type
information from the argument.) The only thing qsort knows is the
starting address of the array, the size of each element, and the
number of elements.

Your compare function needs to know that each element is a string but
in this case it does not care what the max size of each string is. It
is sufficient to know the starting location of the string because that
is what strcmp needs to know.
Suppose I have an array defined as T x[2][2], where T is any type. The
expression &x then has type pointer-to-array-of-2-array-of-2-T, and
the expression &x would be said to point to the object named x,
correct?
Yes

But is this is only because of the type of &x (&x could be
considered to point to both x[0] and x[0][0] as well)?

While the address &x evaluates to is starting location of x which is
also the starting location of x[0] and also the starting location of
x[0][0], &x has the wrong type to point to either x[0] or x[0][0].
(This "nit" is similar to the problem first year physics students have
in remembering that most values have units as well as magnitude.)
Is it always
OK then, to write expressions such as:

- (T *)&x or (T *)&x[0] to produce a pointer to x[0][0]?

Probably. But the use of an unnecessarily complex expression with a
cast is the sign of a coder who couldn't be bothered to get things
correct in the first place. If you want a pointer to x[0][0], use
&x[0][0]. On the other hand, if you need the address of the sub-array
in the context of pointer to element (as with the strings we have been
discussing) use x.

Furthermore, consider the situation where x started out as an array of
floats. Somewhere later in your code you used (float*)&x to point to
x[0][0]. Then after reading this group for a while you decided it
would be better to change x to double. You would have to search
through your code looking for casts to change. If you had coded
&x[0][0] in the first place, that expression's type would change
automatically (what I call self adjusting code). If it were used in a
place where a double* was not appropriate, your compiler would
generate a diagnostic. And worst of all, if you had missed changing
the cast, you would be using the address of a double in a place where
you thought you were using the address of a float which would probably
lead to undesirable results(tm).

By the way, the & is superfluous in either of your two expressions.
- (T *)((char *)&x + sizeof x[0]) to produce a pointer to x[1][0]?
- any similarly contrived expression to point to a subobject of x?

These are intuitively correct to me, but I have trouble tracking the
reasoning through the standard as to why they should behave properly.

In any array, sizeof x is a constant and does not depend on i.
Furthermore, the address of x[i+1] is always sizeof x[n] bytes beyond
the address of x. You need the cast to char* simply because
pointer arithmetic includes the implied scaling factor of
sizeof(object pointed to). For an array x, x+i is guaranteed to point
to x. For int with size 4, x[1] is obviously 4 bytes beyond x[0]
so the "pointer expression" x+1 must be equivalent to the "arithmetic
expression" x + 1*4. Therefore you could replace your expression
above with (T*)(x+1). But you really shouldn't use either. At this
point, you should also be able to explain why (T*)(&x+1) is wrong.
~Yes, I know A and &A[0] have the same type as do A[0] and &A[0][0].

Nitpick: they don't have the same type, but are equivalent when not
used as the operand of sizeof nor &.

It is stronger than equivalent. The standard says that under these
conditions (there is a third exception also), an expression of array
of T is **converted** to an expression of type pointer to T that
points to element 0 of the array. So A and &A[0] have exactly the
same type.


Remove del for email
 
T

Tor Rustad

user923005 said:
I can't believe I posted that. Let's call it temporary insanity.

I do remember an old regular named Dann Corbit.

Yup, he knew how how to call qsort()!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top