"directory order" - K and R 2 exercise 5-16?

G Fernandes

Can someone explain what is meant by "directory order" in the question
for K and R 2 exercise 5-16?

I can't seem to find a solution for this exercise on the main site
where clc goers have posted solutions, so I'm guessing this phrase
might be ambiguous.

In any case, I'm wondering if anyone knows what might be a suitable
definition for this phrase. Thank you.
 
Tor Rustad

G Fernandes said:
Can someone explain what is meant by "directory order" in the
question for K and R 2 exercise 5-16?

What it says is: ignore characters other than letters, numbers
and blanks when sorting.

Just like the UNIX sort; see the -d option in "man sort".
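
As a rough illustration of the rule described above, here is a minimal sketch of a comparison that looks only at letters, digits and blanks. The names dir_significant and dir_strcmp are made up for this example, and treating tab as a blank is an assumption; the exercise does not spell either out.

```c
#include <ctype.h>

/* Hypothetical helper: is c significant in "directory order"?
   Letters, digits and blanks count; everything else is skipped.
   Counting tab as a blank is an assumption of this sketch. */
static int dir_significant(unsigned char c)
{
    return isalnum(c) || c == ' ' || c == '\t';
}

/* Compare s and t looking only at significant characters.
   Returns <0, 0 or >0 in the manner of strcmp. */
int dir_strcmp(const char *s, const char *t)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *q = (const unsigned char *)t;

    for (;;) {
        /* Step over characters that do not take part in the comparison. */
        while (*p != '\0' && !dir_significant(*p)) p++;
        while (*q != '\0' && !dir_significant(*q)) q++;

        /* Stop at the first difference, or when both strings are done. */
        if (*p != *q || *p == '\0')
            return *p - *q;
        p++;
        q++;
    }
}
```

With this sketch, dir_strcmp("O'Connel", "OConnel") returns 0, because the apostrophe never takes part in the comparison.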
 
Keith Thompson

Tor Rustad said:
What it says is: ignore characters other than letters, numbers
and blanks when sorting.

Just like the UNIX sort; see the -d option in "man sort".

My copy of K&R2 is several thousand miles away at the moment, but it
sounds like "dictionary order" rather than "directory order".
 
G Fernandes

Tor said:
What it says is: ignore characters other than letters, numbers
and blanks when sorting.

Just like the UNIX sort; see the -d option in "man sort".

Yes. I understand how this could work if all the input lines were the
same format, like
abcd@$abcd
efgh#&lkjs
ueid!-slkj

but what if you have two input lines where one has an alphanumeric or
blank in a position where the other line has a character that is
neither alphanumeric nor blank?

For example

ab@aaa
abd#gh
a*shdj

How would someone sort that?
 
Joe Wright

G said:
Yes. I understand how this could work if all the input lines were the
same format, like
abcd@$abcd
efgh#&lkjs
ueid!-slkj

but what if you have two input lines where one has an alphanumeric or
blank in a position where the other line has a character that is
neither alphanumeric nor blank?

For example

ab@aaa
abd#gh
a*shdj

How would someone sort that?

Assuming they are ASCII strings, I would use strcmp() to order them. All
of '@', '#' and '*' have values less than any letter (though '@' is
greater than the digits). I suppose they would sort..

a*shdj
ab@aaa
abd#gh
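
To see the plain strcmp ordering in action, the lines can be handed to qsort with a small comparison callback. This is only a sketch: sort_lines and cmp_lines are hypothetical names, and the resulting order depends on the execution character set (ASCII is assumed here).

```c
#include <stdlib.h>
#include <string.h>

/* qsort callback: the array elements are char pointers,
   so each argument is really a pointer to a char pointer. */
static int cmp_lines(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Sort n line pointers in plain strcmp order. */
void sort_lines(const char **lines, size_t n)
{
    qsort(lines, n, sizeof lines[0], cmp_lines);
}
```

Called on an array holding "ab@aaa", "abd#gh" and "a*shdj", this should leave the pointers in the order listed above on an ASCII system.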
 
Luke Wu

G said:
Yes. I understand how this could work if all the input lines were the
same format, like
abcd@$abcd
efgh#&lkjs
ueid!-slkj

but what if you have two input lines where one has an alphanumeric or
blank in a position where the other line has a character that is
neither alphanumeric nor blank?

For example

ab@aaa
abd#gh
a*shdj

How would someone sort that?


Wrap strcmp with something that tests for a flag and acts differently
if d-order is required. Something like this:

#include <string.h>
#include <ctype.h>

int d_strcmp(char *s1, char *s2)
{
    if (dorder) {
        int i = 0;
        while (1) {
            if (s1[i] != s2[i] &&
                ( isalpha(s1[i]) || isspace(s1[i]) || !s1[i] ) &&
                ( isalpha(s2[i]) || isspace(s2[i]) || !s2[i] )
               )
                return s1[i] - s2[i];
            else if (s1[i] == '\0' || s2[i] == '\0')
                return 0;
            i++;
        }
    }
    else return strcmp(s1, s2);
}


dorder can be an external variable (as would be the case in the
function I've shown above) or it can be passed in as an argument

Some people suggest casting arguments of ctype function to unsigned
char, but I don't think you need to worry about that unless your
implementation has weird differences (padding bits) between signed and
unsigned char [these implementations break the standard, AFAIK]
 
Barry Schwarz

Yes. I understand how this could work if all the input lines were the
same format, like
abcd@$abcd
efgh#&lkjs
ueid!-slkj

but what if you have two input lines where one has an alphanumeric or
blank in a position where the other line has a character that is
neither alphanumeric nor blank?

For example

ab@aaa
abd#gh
a*shdj

How would someone sort that?

Unless you are trying to be extra fancy (as in a phone book where you
want O'Connel to come between Occam and Odum), ignore the differences.
If the character appears in the execution set, then by definition it
fits in a char. A char is an integer type. Integer types can be
compared using if or, for arrays of char, strcmp and memcmp. The
results of all three are well defined, even if implementation
dependent. (For example, on an ASCII system, 'A' < 'a'. The opposite
is true on an EBCDIC system.)


<<Remove the del for email>>
 
Eric Sosman

Luke said:
[...]
Some people suggest casting arguments of ctype function to unsigned
char, but I don't think you need to worry about that unless your
implementation has weird differences (padding bits) between signed and
unsigned char [these implementations break the standard, AFAIK]

The reason for the cast has nothing to do with padding
bits, unusual CHAR_BIT values, exotic representations, or
broken implementations. It's because `char' can be a signed
type, and thus can have negative values. Pass a negative
value to a <ctype.h> function and you get undefined behavior
(unless the value just happens to equal EOF, in which case
you get the small consolation of an answer that's well-defined
but quite possibly wrong).

If you like U.B. and/or wrong answers, omit the cast.
Otherwise, ...
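
In practice the cast is a one-liner. A minimal sketch, with count_letters as a made-up helper name (not something from K&R or the standard library):

```c
#include <ctype.h>
#include <stddef.h>

/* Count the letters in s. Each char is converted to unsigned char
   before it reaches isalpha, so a negative char value (possible
   where plain char is signed) is never passed to <ctype.h>. */
size_t count_letters(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}
```

Whether a byte outside the basic character set counts as a letter still depends on the current locale; the cast only guarantees that the call itself is well defined.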
 
Mark McIntyre

Can someone explain what is meant by "directory order" in the question
for K and R 2 exercise 5-16?

The order in which it would appear in a phone directory, probably. Hence
Mc appears in amongst Ma and before Mb....
 
Luke Wu

Luke said:
Wrap strcmp with something that tests for a flag and acts differently
if d-order is required. Something like this:

#include <string.h>
#include <ctype.h>

int d_strcmp(char *s1, char *s2)
{
    if (dorder) {
        int i = 0;
        while (1) {
            if (s1[i] != s2[i] &&
                ( isalpha(s1[i]) || isspace(s1[i]) || !s1[i] ) &&
                ( isalpha(s2[i]) || isspace(s2[i]) || !s2[i] )

                  ^^
those should be isalnum (instead of isalpha)
               )
                return s1[i] - s2[i];
            else if (s1[i] == '\0' || s2[i] == '\0')
                return 0;
            i++;
        }
    }
    else return strcmp(s1, s2);
}


dorder can be an external variable (as would be the case in the
function I've shown above) or it can be passed in as an argument

Some people suggest casting arguments of ctype function to unsigned
char, but I don't think you need to worry about that unless your
implementation has weird differences (padding bits) between signed and
unsigned char [these implementations break the standard, AFAIK]
 
CBFalconer

Luke said:
.... snip ...

Some people suggest casting arguments of ctype function to unsigned
char, but I don't think you need to worry about that unless your
implementation has weird differences (padding bits) between signed
and unsigned char [these implementations break the standard, AFAIK]

Nothing weird needed, just that the native version of char is
signed. Passing any negative value (other than EOF) to the ctype
functions results in undefined behaviour.
 
Arthur J. O'Dwyer

Exactly the way you put above: ABAAA before ABDGH before ASHDJ, and
ignore the funny characters in the middles of words. (This also would
sort O'Connel between Occam and Odoul, as mentioned by another poster.)

Luke said:
#include <string.h>
#include <ctype.h>

int d_strcmp(char *s1, char *s2)
{
    if (dorder) {
        int i = 0;
        while (1) {
            if (s1[i] != s2[i] &&
                ( isalpha(s1[i]) || isspace(s1[i]) || !s1[i] ) &&
                ( isalpha(s2[i]) || isspace(s2[i]) || !s2[i] )
               )
                return s1[i] - s2[i];
            else if (s1[i] == '\0' || s2[i] == '\0')
                return 0;
            i++;
        }
    }
    else return strcmp(s1, s2);
}


This looks really weird; it certainly doesn't seem to do what I inferred
the OP wanted to do, and I'm not sure it does anything reasonable. It
would produce d_strcmp("a", "%")==0, d_strcmp("O'Con","Occam") < 0, and
so on. I think the OP (and K&R) would be happier with

int dict_strcmp(const char *s, const char *t)
{
    int i, j, si, tj;
    for (i=j=0; s[i] && t[j]; ++i, ++j) {
        while (s[i] && !isalpha(s[i])) ++i;
        while (t[j] && !isalpha(t[j])) ++j;
        if (toupper(s[i]) != toupper(t[j])) break;
    }
    si = toupper(s[i]);
    tj = toupper(t[j]);
    return si < tj? -1: si > tj;
}

It's a little messier due to the extra 'toupper's and my insistence on
returning -1, 0, or +1 instead of just negative, 0, or positive. An
exercise for the interested reader: Extend this function to deal more
reasonably with strings containing no alphabetic characters at all;
e.g. to sort "6" before "777". How difficult is it to sort numeric
strings by their decimal values (e.g., "100" after "99")? How difficult
is it to sort "A1 Steak Sauce" as equal to "A-One Steak Sauce," between
"AOL" and "Aorta"? (Interface design problem: In each case, where would
we sort the string "A4 Paper"? Which result is more reasonable? Why?)
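
A note on dict_strcmp above: as far as I can tell, when both strings run out while non-letters are being skipped in the same pass (for example, two strings that both end in punctuation), the loop increment can step past the terminators. The following is a sketch of one way to guard against that, not the original poster's code. The name dict_strcmp_guarded is hypothetical, and the unsigned char casts follow the <ctype.h> advice given elsewhere in this thread.

```c
#include <ctype.h>

/* Letters-only, case-insensitive comparison in the spirit of
   dict_strcmp; returns -1, 0 or +1. */
int dict_strcmp_guarded(const char *s, const char *t)
{
    int si, tj;

    for (;;) {
        /* Skip everything that is not a letter in both strings. */
        while (*s != '\0' && !isalpha((unsigned char)*s)) s++;
        while (*t != '\0' && !isalpha((unsigned char)*t)) t++;

        si = toupper((unsigned char)*s);
        tj = toupper((unsigned char)*t);

        /* Stop on a difference, or when both strings are exhausted,
           before either pointer can move past its terminator. */
        if (si != tj || si == '\0')
            break;
        s++;
        t++;
    }
    return si < tj ? -1 : si > tj;
}
```

Because the skipping happens before the end-of-string test, trailing punctuation is ignored as well: "a%" and "a" compare equal in this version.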

Luke said:
Some people suggest casting arguments of ctype function to unsigned
char, but I don't think you need to worry about that unless your
implementation has weird differences (padding bits) between signed and
unsigned char [these implementations break the standard, AFAIK]

No, padding bits aren't it. You need to worry only if you're planning
to process data containing negative 'char' values. Since both your and
my implementations basically assumed ASCII, I don't think it's worth the
extra opacity in this case. But certainly a line like

k = toupper(getchar());

would be way out of line, as I understand it; we have no guarantee that
the user won't enter negative character values. Whereas we can make
the "no negative values" requirement a precondition of the 'd_strcmp'
function, and put the burden on the client programmer, if we want.

-Arthur
 
CBFalconer

Arthur J. O'Dwyer said:
.... snip ...
extra opacity in this case. But certainly a line like

k = toupper(getchar());

would be way out of line, as I understand it; we have no guarantee
that the user won't enter negative character values. Whereas we

Yes we do. getchar returns, in an int, the unsigned value of an
input char. The only negative value it ever returns is EOF.
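
A small sketch of why this pattern is safe: getchar is equivalent to getc on stdin, and getc behaves like fgetc, which hands back either EOF or the character converted to unsigned char. Either result may be passed to toupper directly. read_upper is a hypothetical helper written for this example.

```c
#include <ctype.h>
#include <stdio.h>

/* Read one character from fp and upper-case it.
   fgetc yields either EOF or a value in the range of
   unsigned char, both of which toupper accepts. */
int read_upper(FILE *fp)
{
    int c = fgetc(fp);

    return c == EOF ? EOF : toupper(c);
}
```

The important detail is that the result stays in an int; storing it in a plain char first would reintroduce the possibility of a negative value.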
 
Arthur J. O'Dwyer

Yes we do. getchar returns, in an int, the unsigned value of an
input char. The only negative value it ever returns is EOF.

Whoops. You're right. Make that

scanf("%c", &k);
k = toupper(k);

then. I think there is no guarantee that 'scanf' will yield only
positive values for 'char'.

-Arthur
 
Kenneth Bull

Arthur said:
Whoops. You're right. Make that

scanf("%c", &k);
k = toupper(k);

then. I think there is no guarantee that 'scanf' will yield only
positive values for 'char'.

Now you're writing about two different things (further evidenced by the
fact that they appear in two separate statements in your code), and
trying needlessly to relate the two to make a point.

The point you are trying to make has very little to do with scanf, and
a lot to do with the type of the variable 'k' (which you have not shown
a declaration for). If 'k' is of type char, and the implementation
makes 'char' behave like signed char, then yes, there is no guarantee
that the value you're passing to toupper will be non-negative for
every valid character. This has 'very little' to do with scanf
(or getchar as you previously claimed).

So if anything, your code -somewhat- reverts back to the exact same
point Eric Sosman was making, without adding any special caveat for
scanf whatsoever.
 
Peter Nilsson

Arthur said:
Whoops. You're right. Make that

scanf("%c", &k);
k = toupper(k);

then. I think there is no guarantee that 'scanf' will yield only
positive values for 'char'.

scanf reads as if by calling fgetc to get a (byte) character. If k
is a signed or plain char, then you are interpreting that read byte
through an lvalue of that type. You are better off interpreting
the byte through an unsigned char lvalue.
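
A sketch of that suggestion, using sscanf so the example does not depend on terminal input. scan_upper is a hypothetical name; the sketch relies on %c accepting a pointer to any character type, which is how I read the standard's "character array" wording.

```c
#include <ctype.h>
#include <stdio.h>

/* Parse one character from text and upper-case it.
   Storing through an unsigned char keeps the value non-negative,
   so it can go straight to toupper.
   Returns EOF if nothing could be read. */
int scan_upper(const char *text)
{
    unsigned char k;

    if (sscanf(text, "%c", &k) != 1)
        return EOF;
    return toupper(k);
}
```

The same shape works with scanf on stdin; only the source of the byte changes.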
 
Arthur J. O'Dwyer

Now you're writing about two different things (further evidenced by the
fact that they appear in two separate statements in your code), and
trying needlessly to relate the two to make a point.

The point you are trying to make has very little to do with scanf, and
a lot to do with the type of the variable 'k' (which you have not shown
a declaration for).

Nope. I surmise that you have not understood the point I'm trying
to make. My point is that you need to verify user input (such as input
that comes from 'scanf'[1]), as opposed to the kind of input a library
function might get from a client program (such as C-style strings being
passed to a sorting function, the original context of my remark).

So if anything, your code -somewhat- reverts back to the exact same
point Eric Sosman was making, without adding any special caveat for
scanf whatsoever.

Huh? Eric basically said, "Not casting results in UB." I disagree;
the cast is /only/ necessary when you're dealing with potentially
unsafe input, and the only way to get unsafe input is from the user,
via 'getchar', 'scanf', or any other <stdio.h> input function.

There's nothing special about 'scanf' that makes it dangerous in
this respect; but, as CBFalconer pointed out, there is something special
about 'getchar' that makes it innocuous in this respect. That's why I
corrected my "dangerous" code --- it hadn't been as dangerous as I had
thought.

-Arthur

[1] - but not, technically speaking, 'getchar', which was what CBFalconer
pointed out, and which was why I corrected my example to use the 'scanf'
input function instead, which AFAIK provides no guarantee of its results'
<ctype.h>-friendliness.
 
Eric Sosman

Arthur said:
Huh? Eric basically said, "Not casting results in UB." I disagree;
the cast is /only/ necessary when you're dealing with potentially
unsafe input, and the only way to get unsafe input is from the user,
via 'getchar', 'scanf', or any other <stdio.h> input function.

char rebuttal[] = "Haben Sie alle Möglichkeiten betrachtet?";
/* "Have you considered all the possibilities?" */

Granted: This cannot appear in a strictly conforming program,
because it uses a character not found in the basic source or
execution sets. "Strictly conforming" programs, though, seem
to be a tiny minority; if you want to write robust code you
should consider the possibility that it might be used outside
the germ-proof bubble.
 
