Implementing strstr

Ben Bacarisse · Mar 26, 2010

Tim Rentsch said:
These comments make sense only if the behavior of _Bool were like that
of the other integer types, but it isn't.

I agree with this but I've snipped the discussion because I wanted to
comment on something else.

Storing into a _Bool will
always store either a 0 or a 1; it simply isn't possible to store any
other value because of how conversion to _Bool is defined.

When compiled with gcc, this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b;
    memcpy(&b, (unsigned char []){2}, sizeof b);
    printf("b=%d\n", b);
    return 0;
}

prints b=2 and I can't see any reason to think that gcc is wrong to do
that. I don't think the program /has/ to print b=2 but I think it is
permitted.

You can argue that this program does not "store a value" in b (it
copies a representation) but it does end up with a value other than
0 or 1 stored in b.

<snip>

Tim Rentsch · Mar 27, 2010

Ben Bacarisse said:
Tim Rentsch said:

These comments make sense only if the behavior of _Bool were like that
of the other integer types, but it isn't.

I agree with this but I've snipped the discussion because I wanted to
comment on something else.

Storing into a _Bool will
always store either a 0 or a 1; it simply isn't possible to store any
other value because of how conversion to _Bool is defined.

When compiled with gcc, this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b;
    memcpy(&b, (unsigned char []){2}, sizeof b);
    printf("b=%d\n", b);
    return 0;
}

prints b=2 and I can't see any reason to think that gcc is wrong to do
that. I don't think the program /has/ to print b=2 but I think it is
permitted.

Yes, I would agree with that; it is permitted because undefined
behavior has occurred.

You can argue that this program does not "store a value" in b (it
copies a representation) but it does end up with a value other than
0 or 1 stored in b.

The program doesn't store a value into a _Bool. It does store a
value into b, and the declared type of b is _Bool, but it doesn't
(at least not in the sense that I meant the phrase) store a value
into a _Bool. I meant for my statement to apply only to
assignment-like contexts (assignment, initialization, function
argument, return statement) with the target type being _Bool.

It is of course possible for a _Bool variable to hold a value
other than 0 or 1, through undefined behavior (or to be more
precise I should say that implementations are allowed to act as
though a _Bool variable holds a value other than 0 or 1, although
they don't have to). But storing into a _Bool, eg, using _Bool
as the left-hand-side type in an assignment, only ever stores
a 0 or 1 (or of course may do anything at all if previous
undefined behavior has occurred).

(I'm sure none of the technical points here are news to you;
I'm responding just to clarify my meaning.)

Dr Malcolm McLean · Mar 27, 2010

These comments make sense only if the behavior of _Bool were like that
of the other integer types, but it isn't.

I agree with this but I've snipped the discussion because I wanted to
comment on something else.

Storing into a _Bool will
always store either a 0 or a 1; it simply isn't possible to store any
other value because of how conversion to _Bool is defined.

When compiled with gcc, this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b;
    memcpy(&b, (unsigned char []){2}, sizeof b);
    printf("b=%d\n", b);
    return 0;
}

prints b=2 and I can't see any reason to think that gcc is wrong to do
that. I don't think the program /has/ to print b=2 but I think it is
permitted.

You can argue that this program does not "store a value" in b (it
copies a representation) but it does end up with a value other than
0 or 1 stored in b.

We need bool pointers in C to get round this weakness in the language.
Each bool pointer can have an offset (these can be stored in the upper
bits in many systems, to avoid inflating pointer size).
This has the advantage that when we declare an array of bools

bool occupiedflags[8];

The processor can allocate only a single octamer. This represents a
87.5% saving in memory.

There's only one snag, which is with the infamous size_t.

memcpy(&flag1, &flag2, sizeof(bool));

should copy a single bool.
So the answer is to make size_t a double. This system has huge
advantages. For instance fwrite(), which takes a size_t, can now write
bitstreams to files, something which was difficult to achieve with the
old definition of size_t.

Ian Collins · Mar 27, 2010

When compiled with gcc, this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b;
    memcpy(&b, (unsigned char []){2}, sizeof b);
    printf("b=%d\n", b);
    return 0;
}

prints b=2 and I can't see any reason to think that gcc is wrong to do
that. I don't think the program /has/ to print b=2 but I think it is
permitted.

You can argue that this program does not "store a value" in b (it
copies a representation) but it does end up with a value other than
0 or 1 stored in b.

We need bool pointers in C to get round this weakness in the language.

What weakness?

Each bool pointer can have an offset (these can be stored in the upper
bits in many systems, to avoid inflating pointer size).
This has the advantage that when we declare an array of bools

What upper bits?

bool occupiedflags[8];

The processor can allocate only a single octamer. This represents a
87.5% saving in memory.

There's only one snag, which is with the infamous size_t.

memcpy(&flag1, &flag2, sizeof(bool));

One is the smallest addressable location. You are trading space for
gross inefficiency.

should copy a single bool.
So the answer is to make size_t a double. This system has huge
advantages. For instance fwrite(), which takes a size_t, can now write
bitstreams to files, something which was difficult to achieve with the
old definition of size_t.

So what happens if you want to write a number of bytes that can't be
represented exactly by a double?

Or was the above a joke? If so, you are 5 (or 6, depending on time
zone) days early.

Ben Bacarisse · Mar 27, 2010

Tim Rentsch said:
Ben Bacarisse said:

Tim Rentsch said:

These comments make sense only if the behavior of _Bool were like that
of the other integer types, but it isn't.

I agree with this but I've snipped the discussion because I wanted to
comment on something else.

Storing into a _Bool will
always store either a 0 or a 1; it simply isn't possible to store any
other value because of how conversion to _Bool is defined.

When compiled with gcc, this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b;
    memcpy(&b, (unsigned char []){2}, sizeof b);
    printf("b=%d\n", b);
    return 0;
}

prints b=2 and I can't see any reason to think that gcc is wrong to do
that. I don't think the program /has/ to print b=2 but I think it is
permitted.

Yes, I would agree with that; it is permitted because undefined
behavior has occurred.

That was not quite the point I was making though the difference is
minimal. I thought that an implementation was permitted to do the
above without it being undefined behaviour. It certainly can be UB,
but does it have to be?

I should not have brought up gcc because that is too specific. gcc
(at least my current version of gcc) gives _Bool a single value bit
so, since gcc is a conforming implementation, the above just shows
that all the other bit settings in a _Bool are trap representations.
This was pointed out by Keith Thompson in a thread earlier this year.

I wanted to suggest that _Bool can have more than one value bit.
That's the point I was unable to refute by looking through the
standard.

The program doesn't store a value into a _Bool. It does store a
value into b, and the declared type of b is _Bool, but it doesn't
(at least not in the sense that I meant the phrase) store a value
into a _Bool. I meant for my statement to apply only to
assignment-like contexts (assignment, initialization, function
argument, return statement) with the target type being _Bool.

It is of course possible for a _Bool variable to hold a value
other than 0 or 1, through undefined behavior (or to be more
precise I should say that implementations are allowed to act as
though a _Bool variable holds a value other than 0 or 1, although
they don't have to).

Is it possible without undefined behaviour?

But storing into a _Bool, eg, using _Bool
as the left-hand-side type in an assignment, only ever stores
a 0 or 1 (or of course may do anything at all if previous
undefined behavior has occurred).

(I'm sure none of the technical points here are news to you;
I'm responding just to clarify my meaning.)

I certainly agree with everything you've said, but I messed up my
point by referring to a specific implementation.

Ben Bacarisse · Mar 27, 2010

<snip>
[Your suggestion to add bit pointers to C does not depend on my example,
so I'll cut it in case there is confusion.]

We need bool pointers in C to get round this weakness in the language.
Each bool pointer can have an offset (these can be stored in the upper
bits in many systems, to avoid inflating pointer size).
This has the advantage that when we declare an array of bools

bool occupiedflags[8];

The processor can allocate only a single octamer. This represents a
87.5% saving in memory.

There's only one snag, which is with the infamous size_t.

memcpy(&flag1, &flag2, sizeof(bool));

There would be other snags. Both char * and void * would have to use
this pointer+offset representation unless you were prepared to
re-write other whole chunks of the language.

should copy a single bool.
So the answer is to make size_t a double. This system has huge
advantages. For instance fwrite(), which takes a size_t, can now write
bitstreams to files, something which was difficult to achieve with the
old definition of size_t.

Did you omit the smiley?

Tim Rentsch · Mar 27, 2010

Ben Bacarisse said:
Tim Rentsch said:

Ben Bacarisse said:

<snip>
These comments make sense only if the behavior of _Bool were like that
of the other integer types, but it isn't.

I agree with this but I've snipped the discussion because I wanted to
comment on something else.

Storing into a _Bool will
always store either a 0 or a 1; it simply isn't possible to store any
other value because of how conversion to _Bool is defined.

When compiled with gcc, this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b;
    memcpy(&b, (unsigned char []){2}, sizeof b);
    printf("b=%d\n", b);
    return 0;
}

prints b=2 and I can't see any reason to think that gcc is wrong to do
that. I don't think the program /has/ to print b=2 but I think it is
permitted.

Yes, I would agree with that; it is permitted because undefined
behavior has occurred.

That was not quite the point I was making though the difference is
minimal. I thought that an implementation was permitted to do the
above without it being undefined behaviour. It certainly can be UB,
but does it have to be?

I should not have brought up gcc because that is too specific. gcc
(at least my current version of gcc) gives _Bool a single value bit
so, since gcc is a conforming implementation, the above just shows
that all the other bit settings in a _Bool are trap representations.
This was pointed out by Keith Thompson in a thread earlier this year.

I wanted to suggest that _Bool can have more than one value bit.
That's the point I was unable to refute by looking through the
standard.

Ahh, now I see the point you're making.

Is it possible without undefined behaviour?

I believe the Standard does not disallow _Bool from holding values
other than 0 or 1. Doing so depends on implementation-defined
behavior (specifically how _Bool is represented), but getting a
value other than 0 or 1 into a _Bool, using memcpy() for example,
need not transgress undefined behavior. (There also is a dependence
on implementation-defined behavior to know what the mapping is between
bits in another type and the bits in _Bool, but again that's only
implementation-defined behavior, not undefined behavior.)

I certainly agree with everything you've said, but I messed up my
point by referring to a specific implementation.

Now that I understand what you're saying, I see the point
of it, and agree with your reading.

blmblm · Mar 27, 2010

spinoza1111 said:
In the replace() program of last month's flame festival, a little
program was trying to get out. Here it is: an implementation of strstr
including a call that returns the offset of the found substring. Two
hours including all comments and dedicatory ode, written for this
occasion.

If I were going to put as much effort into writing comments as you
appear to have done with this program, I would explicitly discuss
the function's parameters and return value -- i.e., I would say
how the return value depends on the function's inputs and also
describe any side effects [*].

You don't appear to have done this, and while based on your
comments I infer that the function returns a pointer to the
first occurrence of *strTarget in *strMaster, I have no idea
what the third parameter is for, and I don't really know what the
function does if the "target" is not a substring of the "master".
I can make a guess about the latter based on the behavior of the
standard-library function with the same name, but that's what it
is -- a guess.

[*] The only case in which I would omit such discussion would be
when the names of the function and/or parameters are so descriptive
as to make discussion superfluous. In that regard, I find the
names used in the man-page documentation of strstr ("needle" and
"haystack") more descriptive if less formal than your "target"
and "master".

Please don't credit me by name if you make changes based on the
above critique, or on the (half-hearted) one that follows the code.

#include <stdlib.h>
#include <stdio.h>

// ***************************************************************
// * *
// * strstr *
// * *
// * This function (strstr) finds a string, probably as fast as *
// * possible without extra memory usage over and above brute *
// * force. *
// * *
// * In searching a Nul terminated string for a substring, there *
// * are logically three possibilities in a left to right *
// * traversal of the master string that (1) looks for the *
// * first character of the target and then (2) matches all the *
// * remaining characters: *
// * *
// * * (Erroneous): on the failure of a partial match, *
// * restart at the first nonmatching character. This is *
// * fast but wrong, since the matching string may *
// * overlap the partial match. *
// * *
// * * (Too slow): on the failure of a partial match, start*
// * one past the first character (of the partial match) *
// * *
// * * (Just right): while matching characters, note the *
// * leftmost character in the searched string, to the *
// * right of the first matched character, that matches *
// * both that character and, of course, the first *
// * character of the target. *
// * *
// * C H A N G E R E C O R D --------------------------------- *
// * DATE PROGRAMMER DESCRIPTION OF CHANGE *
// * -------- ---------- --------------------------------- *
// * 03 18 10 Nilges Version 1.0 *
// * *
// * ----------------------------------------------------------- *
// * *
// * To find a string, oh Muse! I sing, inside another String! *
// * Alternatives to me come in Three, ah, that's the thing: *
// * For the one thing for which the Wise must watch is mayhap, *
// * Partial occurrences that most melancholy, overlap. *
// * The first is base, mechanical, low, and tragicomical: *
// * It's to restart from the previous beginning plus but One *
// * Oh what Mayhem to true Programming is thereby, done! *
// * But the job it will do, as did Hercules, *
// * His Labors for the Goddess cruel in Seneca's tragedies: *
// * Arduously and ignobly like unto the meanest Hind *
// * That knoweth not his Elbow from his Behind. *
// * The second is worse, a boner, a solecism, and a Seebach: *
// * The second restarts at the character that doth match! *
// * Oh muse! Such hellish Sights before me yawn: *
// * But be assur'd, 'tis darkest just before the Dawn. *
// * Shout for Victory, oh Thrace, and smite the Harp, and Grin: *
// * For lo, we start at the leftmost "handle" of the string *
// * When it occureth in *
// * The tragic partial match that hath failed us. *
// * If no such handle exists, then we can restart *
// * At the point of match failure: no, 'tis not a brain fart. *
// * Now we spy our magic bus: *
// * For this is the best Al Gore ithm *
// * That we can hope for in C, a language without Rhyme, or *
// * for that matter, Oh Muse! rhythm. *
// * *
// ***************************************************************

#define TRUTH -1
#define FALSITY 0
#define NULLITY 0

char * strstrWithIndex(char *strMaster,
                       char *strTarget,
                       int *ptrIndex)
{
    char *ptrMaster = NULLITY;
    char *ptrTarget = NULLITY;
    char *ptrHandle = NULLITY;
    int booFound = FALSITY;
    if (!*strMaster || !*strTarget) return 0;
    for (ptrMaster = strMaster; *ptrMaster;)
    {
        for (;
             *ptrMaster && *ptrMaster != *strTarget;
             ptrMaster++);
        ptrTarget = strTarget;
        *ptrIndex = ptrMaster - strMaster;
        ptrHandle = 0;
        for (;
             *ptrTarget
             ?
             (*ptrMaster
              ?
              (*ptrMaster==*ptrTarget ? TRUTH : FALSITY)
              :
              FALSITY)
             :
             (booFound = TRUTH, FALSITY);
             ptrMaster++, ptrTarget++)
        {
            if (ptrHandle = 0
                &&
                ptrMaster > strMaster
                &&
                *ptrMaster == *strTarget)
                ptrHandle = ptrTarget;
        }
        if (booFound) return strMaster + *ptrIndex;
        if (ptrHandle) ptrMaster = ptrHandle + 1;
    }
    *ptrIndex = 0;
    return 0;
}

char * strstr(char *strMaster, char *strTarget)
{
    int ptrIndex = 0;
    return strstrWithIndex(strMaster, strTarget, &ptrIndex);
}

int main(void)
{
    char *ptrIndex1 = NULLITY;
    int intIndex1 = 0;
    printf("strstr Simplified\n\n");
    printf("Expect 0: %d\n", strstr("", ""));
    printf("Expect 0: %d\n", strstr("0123456789", ""));
    printf("Expect 0: %d\n", strstr("", "0"));
    printf("Expect 0: %d\n", strstr("Here", "There"));
    ptrIndex1 = strstrWithIndex("There", "here", &intIndex1);
    printf("Expect 1: %d\n", intIndex1);
    ptrIndex1 = strstrWithIndex("They seek him here",
                                "here",
                                &intIndex1);
    printf("Expect 14: %d\n", intIndex1);
    ptrIndex1 = strstrWithIndex("They seek him there",
                                "here",
                                &intIndex1);
    printf("Expect 15: %d\n", intIndex1);
    ptrIndex1 = strstrWithIndex
        ("The clc regs seek him everywhere",
         "here",
         &intIndex1);
    printf("Expect 28: %d\n", intIndex1);
    printf("Expect 'h': %c\n", *ptrIndex1);
    ptrIndex1 = strstrWithIndex
        ("Is he in Heaven? Or in Hell?",
         "?",
         &intIndex1);
    printf("Expect 15: %d\n", intIndex1);
    printf("Expect '?': %c\n", *ptrIndex1);
    ptrIndex1 = strstrWithIndex
        ("That damn'd elusive Spinoza won't tell!",
         "Spinoza",
         &intIndex1);
    printf("Expect 20: %d\n", intIndex1);
    printf("Expect 'p': %c\n", *(ptrIndex1+1));
    printf("Expect '0': %c\n", *strstr("0123456789", "0"));
    printf("Expect '1': %c\n", *strstr("0123456789", "1"));
    printf("Expect '0': %c\n", *strstr("0123456789", "0"));
    printf("Expect '9': %c\n", *strstr("0123456789", "9"));
    printf("Expect '5': %c\n", *strstr("0123456789", "345") + 2);
    printf("Expect '8': %c\n", *strstr("0123456789", "89"));
    ptrIndex1 = strstrWithIndex("0123456789A89AB",
                                "89AB",
                                &intIndex1);
    printf("Expect 11: %d\n", intIndex1);
    return 0;
}

Still printing the output and asking the human to check it rather
than having the program check itself ....

Well, now that I think about it, I suppose that's not as bad as
it might be, since it would be easy enough -- in my preferred
development environment anyway [*] -- to copy the expected output
into a text file, capture actual output in another text file,
and have the computer compare the two.

[*] For short C programs -- text-based tools under Linux.

blmblm · Mar 27, 2010

[ snip ]

There are any number of ways a person might know the assignments in
a first-year CS class without ever having taken one. Do I really need
to list them?

I find it amusing, by the way, that even though Seebs has never
taken a CS course he on many occasions produces paragraphs that
I could almost have written myself for inclusion in something to
be distributed as part of such a course -- the academic party
line about choosing variable names well, for example, or not
optimizing before you know you need to. I guess it *could*
be coincidence ....

[ snip ]

blmblm · Mar 27, 2010

I did not claim that = in place of == was a bug. In fact it is
obvious that putting == back introduces more bugs. My point was that
you did not know what your code was doing: you reported favourably on
a bug fix that could not possibly have any effect. You can't fix (or
do pretty much anything with) code you don't understand.

For suitable definitions of "understand", I would claim. At a
previous place of employment I developed some skill at fixing
bugs in code I understood just well enough to figure out where it
was going wrong. Not that I'm claiming that ignorance is good,
exactly, only that if complete understanding is not feasible,
it is often useful to be able to proceed from partial understanding.

[ snip ]

blmblm · Mar 27, 2010

[ snip ]

The following code, from Peter's "pseudo root simulator", is submitted
to this discussion group as evidence that he is incompetent, and is
unethically using this newsgroup to get debugging assistance for code
that he will then claim as his own. And because he is incompetent, he
has defamed Herb Schildt and myself in a legally actionable sense.

If I am wrong I will apologize for basing this charge on this evidence
but will not withdraw the charge, since other evidence exists. I am
the best programmer in this newsgroup, but by no means the best C
programmer, so I may have missed something.

Sources: the thread starts at http://groups.google.com.hk/group/comp.lang.c/msg/54dfb34c84373f26?hl=en
("A hunk of not-very-portable code I've written"). This post
references code in github at http://github.com/wrpseudo/pseudo/blob/master/pseudo.c
line 664 at this moment in time.

int
pseudo_server_response(pseudo_msg_t *msg, const char *tag) {
    switch (msg->type) {
    case PSEUDO_MSG_PING:
        msg->result = RESULT_SUCCEED;
        if (opt_l)
            pdb_log_msg(SEVERITY_INFO, msg, tag, "ping");
        return 0;
        break;
    case PSEUDO_MSG_OP:
        return pseudo_op(msg, tag);
        break;
    case PSEUDO_MSG_ACK:
    case PSEUDO_MSG_NAK:
    default:
        pdb_log_msg(SEVERITY_WARN, msg, tag, "invalid message");
        return 1;
    }
}

OK. My understanding of C is that for ACK and NAK, the code will fall
through to "invalid message", and this is not intended.

I am quite curious about whether in fact the fall-through (which as
far as I know will happen) is intentional or an error.

Good style
alone demands a break after the ack and nak cases. The same sort of
error is repeated above the code at github above this code, therefore
this isn't "ADHD". It is, as far as I can see, incompetence which
taken together to an out of order and in the same code, means that
Seebach has not mastered the C programming language as of this year,
and did not know C when he published "C: The Complete Reference", his
defamatory attack on Herb Schildt.

[ snip ]

blmblm · Mar 27, 2010

[ snip ]

Focussing on the hard part at first and leaving the trivialities
for the future is a good strategy, IMHO. Trivial things will show up.

Huh. I usually adopt exactly the opposite strategy -- do the
easy parts first, as a way of warming up, so to speak, or perhaps
building confidence, and then tackle the hard parts. Could this
be something where different people's minds just work differently?
I find Seebs's comment about the hard part being easy and the easy
part being hard -- oh, "incomprehensible" is too strong a word,
but it's not something I can easily imagine. Experience suggests
that our brains don't all work the same in all respects, but now
I'm wondering which of us is the outlier ....

[ snip ]

blmblm · Mar 27, 2010

[ snip ]

[ snip ]

In this particular case, it is possible that Seebach never got around
to deciding and coding what ack and nak should do. A nak sounds like
an error in which you should wait until a maximum. But I do not
understand how an ack could be an error. "hey, I'm here! **** you,
you're an error! Mom!"

I am waiting for him to lie about this situation.

Furthermore, there is a command processor switch statement directly
above the code I'm discussing: it seems to want to process pseudo root
commands. Many commands seem "stubbed", yet strangely, not with a
break to get out of the switch statement. As a result, a mkdir will do
a mknod.

[Parenthetically, one wonders what a pseudo root is for. OK, you want
to make directories and stuff without bothering Linux or being in
protected mode? Or maybe you are a programmer only in virtual reality
and you like simulating what the big boys do? Hell, I love writing
simulators myself. But isn't Linux really, really good at making
directories in reality? Are we reinventing the wheel? Don't get me
wrong, I might be The Greatest Programmer in the World but I know dick
about Linux.]

The code, which is too large to include, starts at line 456 at
http://github.com/wrpseudo/pseudo/blob/master/pseudo.c.

I do not understand why the operations he has supported at this time
have a break but the unsupported operations do not. The apparent error
is consistent, and Peter has told us that he gets trivial matters
wrong.

However, programming isn't high level conceptual handwaving. When I
wrote the compiler for "Build Your Own .Net Language and Compiler" I
wrote all 26K lines by myself. Real programmers don't like
collaboration save for pair programming.

I believe that if I'd submitted this code, Peter would have been on me
like a fly on shit, claiming that I am both incompetent and insane,
once others noticed the error. I believe he's no better than somebody
coming here for homework completion, and possibly worse.

Here's another troubling error, in the function containing the
troubling case statements. A "pseudo_msg_t" named db_header is
declared with no initialization, and then assigned using this code:

if (found_ino && (prefer_ino || !found_path)) {
    db_header = by_ino;
} else if (found_path) {
    db_header = by_path;
}

Hmm...if you have found an ino and you prefer it, or if no path was
found, then you set db_header to by_ino. Otherwise if you have a path
you use that.

[Oh shit, I hope I know how else and if and stuff work. Yup I think
that the above if has sprung a leak.]

OK, perhaps this is an error which "can't happen". However, competent
programmers handle many (but not all) things which can't happen. I
worry in strstr about a null target. I don't worry AT THIS TIME (after
two hours work) about a target or a master that is NULL.

But...Peter says this code has been worked on by him for two months.
It is to me unwise to not put in error checking early so as to "eat
your own dog food" when errors occur.

Nope, I think the code is globally incompetent and was brought here to
be beat into shape by slaves for whom Peter, and Wind River Systems,
have naught but contempt, as indicated by his treatment of dissident,
uppity slaves like me.

I'm quoting this because while I haven't read it all carefully some
of it sounds like possibly-legitimate critique, and I'm curious about
what Seebs has to say about it. (I'm guessing that he's still not
reading your posts directly.)

Is this what Open Source has come to?

I would have given db_header an initial null value.

Ah, but would you? The bug I found in your code -- wasn't that
a matter of not always assigning an initial value to a variable?
Just sayin'.

Seebs · Mar 27, 2010

(Thanks for quoting this, I never see his garbage except when quoted.)

There are any number of ways a person might know the assignments in
a first-year CS class without ever having taken one. Do I really need
to list them?

The most obvious: Most of my college friends did CS. I used to kibitz
and offer advice. Furthermore, having read a number of books on C,
including their exercises, I have the luxury of knowing what kinds of
exercises they contain.

Basically, if I were teaching C to people who had about the cognitive skills
and study abilities I'd expect from first-year college students, I'd
have expected them to be able to do something like that well before the
end of the first semester, and if a substantial fraction couldn't, I'd
think it reflected badly on me as an instructor.

I find it amusing, by the way, that even though Seebs has never
taken a CS course he on many occasions produces paragraphs that
I could almost have written myself for inclusion in something to
be distributed as part of such a course -- the academic party
line about choosing variable names well, for example, or not
optimizing before you know you need to. I guess it *could*
be coincidence ....

Most of it's honestly pretty obvious if you do any programming and
think about it -- or if you read a lot of books on the topic of
software design and development. Reading things like _The Practice
of Programming_, or the _Programming on Purpose_ series, will do a
lot towards covering the material that would otherwise have been
scattered in among the basic language syntax.

Also, I have to say, several friends have gone far above and beyond
in teaching me about programming (hi Mike!), to say nothing of the
excellent advice and handholding I got from comp.lang.c regulars back
in the late 80s and early 90s. I think it's safe to say that people like
Chris Torek and Steve Summit did a lot to teach me about effective use
of C.

As to whether it's worked... I have published my latest project, and
I continue to push updates to the public git tree before running them
through the internal code review process, in the interests of leaving
a public record.

-s

Seebs · Mar 27, 2010

I am quite curious about whether in fact the fall-through (which as
far as I know will happen) is intentional or an error.

It is intended, although I have since started consistently adding
/* FALLTHROUGH */
comments to such cases.

The reason for this is that ACK and NAK are responses that are by
DEFINITION sent only by the server. The client MUST NOT generate either
such message. Therefore, they are an "invalid message", as is any
other message of an unknown type.

Not when multiple cases are intentionally handled by the same code.

But I will go add the /* FALLTHROUGH */ comments to make it more
explicit. I'd say "good catch" if it weren't so stunningly obvious
that Nilges hadn't read, say, the documentation provided with the
project which clearly indicates what ACK and NAK mean and when
they are valid.

-s

Keith Thompson · Mar 27, 2010

Ben Bacarisse said:
Tim Rentsch said:

Ben Bacarisse said:

<snip>
These comments make sense only if the behavior of _Bool were like that
of the other integer types, but it isn't.

I agree with this but I've snipped the discussion because I wanted to
comment on something else.

Storing into a _Bool will
always store either a 0 or a 1; it simply isn't possible to store any
other value because of how conversion to _Bool is defined.

When compiled with gcc, this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    _Bool b;
    memcpy(&b, (unsigned char []){2}, sizeof b);
    printf("b=%d\n", b);
    return 0;
}

prints b=2 and I can't see any reason to think that gcc is wrong to do
that. I don't think the program /has/ to print b=2 but I think it is
permitted.

Yes, I would agree with that; it is permitted because undefined
behavior has occurred.

That was not quite the point I was making though the difference is
minimal. I thought that an implementation was permitted to do the
above without it being undefined behaviour. It certainly can be UB,
but does it have to be?

I should not have brought up gcc because that is too specific. gcc
(at least my current version of gcc) gives _Bool a single value bit
so, since gcc is a conforming implementation, the above just shows
that all the other bit settings in a _Bool are trap representations.
This was pointed out by Keith Thompson in a thread earlier this year.

Hmm. I don't remember making that particular point.

Looking through the gcc documentation ("info gcc"), I don't see any
mention of padding bits. It says that "all bit patterns are ordinary
values" for signed types, but it doesn't say so for unsigned types
(including _Bool).

I wanted to suggest that _Bool can have more than one value bit.
That's the point I was unable to refute by looking through the
standard.

Note that the existence of padding bits doesn't imply the existence
of trap representations. For example, I think any of the following
would be a valid implementation:

1. _Bool has 8 bits, of which 1 is a value bit and the other 7 are
padding bits.

1a. A representation with any padding bit set to 1 is a trap
representation.

1b. Padding bits are ignored; only the single value bit contributes to
the value. For example, after using memset to fill two _Bool objects
with the byte values 0x00 and 0xf0, the "==" operator will report
that their values are equal.

2. _Bool has 8 bits, all of which are value bits.

Case 2 can be treated as 1a depending entirely on the implementation's
documentation; it's just a matter of which behavior the implementation
chooses to leave undefined.

Is it possible without undefined behaviour?

If _Bool has more than 1 value bit, then yes. If _Bool has just
1 value bit, then no. Given a straightforward implementation
(_Bool is 1 byte, conversions perform whatever extra work is needed
to satisfy the standard's requirements, examining the value of a
_Bool object just treats the representation as an unsigned byte),
the implementation may document either 1 or 8 as the number of
value bits.

I think.
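A short sketch of what a program can safely observe about these choices. It uses hypothetical helpers dump_bool() and bool_reps_differ(). Reading any object's bytes through unsigned char is always defined, so this inspects the representation of values produced by conversion without ever creating a possible trap representation. It cannot by itself distinguish cases 1a, 1b and 2, because it never stores any other bit pattern into a _Bool.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Prints the object representation of a _Bool holding v, byte by byte. */
void dump_bool(_Bool v)
{
    unsigned char bytes[sizeof v];

    memcpy(bytes, &v, sizeof v);   /* copy out; never copy bytes in */
    printf("(_Bool)%d:", (int)v);
    for (size_t i = 0; i < sizeof v; i++)
        printf(" %02x", (unsigned)bytes[i]);
    printf("  (%zu bits in total)\n", sizeof v * CHAR_BIT);
}

/* Returns nonzero if 0 and 1 stored in a _Bool have different object
 * representations. Objects with the same representation compare equal,
 * so any conforming implementation must make these differ. */
int bool_reps_differ(void)
{
    _Bool f = 0, t = 1;

    return memcmp(&f, &t, sizeof f) != 0;
}
```

The total bit count that dump_bool reports is sizeof(_Bool) * CHAR_BIT. How many of those bits are value bits rather than padding is exactly what the implementation's documentation has to say.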

Seebs · Mar 27, 2010

Very simple: You cannot ACK or NAK without having been sent a request.
Requests are never sent to the client, therefore a message from the
client must never be an ACK or NAK.

Don't hold your breath.

Apparently Nilges is not aware of the convention of using a switch statement
to express things analogous to:
else if (x == VALUE_ONE || x == VALUE_TWO || x == VALUE_THREE)
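To make the equivalence concrete, here is a sketch with hypothetical constants standing in for VALUE_ONE and friends. Both functions are assumptions made for illustration. Stacked case labels with no statements between them share the code that follows, which is the same test as the || chain.

```c
/* Hypothetical constants standing in for VALUE_ONE etc. */
enum { VALUE_ONE = 1, VALUE_TWO = 2, VALUE_THREE = 3 };

/* Written as an if/else chain with ||. */
int is_special_if(int x)
{
    if (x == VALUE_ONE || x == VALUE_TWO || x == VALUE_THREE)
        return 1;
    else
        return 0;
}

/* The same test as stacked case labels: control enters at whichever
 * label matches and runs the shared statement after the last label. */
int is_special_switch(int x)
{
    switch (x) {
    case VALUE_ONE:
    case VALUE_TWO:
    case VALUE_THREE:
        return 1;
    default:
        return 0;
    }
}
```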

[Parenthetically, one wonders what a pseudo root is for.

Same thing as fakeroot -- to let you create things like package archives
or disk images which reflect specified ownership, permissions, or device
nodes, without requiring root privileges on the machine where you create
them.

That's because you don't understand the semantics of switch().

Pretty much.

What error? There is no possible error. by_ino and by_path are not
pointers, they're objects. They are initialized at the top of the
function. If data of either type have been found, we copy the best
fit into db_header.

The usage, later, is:

if (found_ino || found_path) {
*msg = db_header;
} else {
msg->result = RESULT_FAIL;
}

Which is to say, IF we found either of them, we copy its data in, otherwise,
the operation has failed.
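A minimal sketch of this pattern. Only the names db_header, by_ino, by_path, found_ino, found_path and RESULT_FAIL come from the thread. struct msg, fill_reply() and the "prefer the inode match" rule are hypothetical, since the real "best fit" logic isn't shown. The point is that db_header is read only when one of the flags is set, and in exactly those cases one of the branches has already assigned it.

```c
/* Hypothetical stand-ins for the project's types and result codes. */
enum { RESULT_OK = 0, RESULT_FAIL = 1 };

struct msg {
    int result;
    unsigned long ino;
    char path[64];
};

void fill_reply(struct msg *msg,
                const struct msg *by_ino, int found_ino,
                const struct msg *by_path, int found_path)
{
    struct msg db_header;   /* deliberately left uninitialized */

    /* Assumed selection rule: prefer the inode match. */
    if (found_ino)
        db_header = *by_ino;
    else if (found_path)
        db_header = *by_path;

    /* db_header is read only on the paths where it was assigned. */
    if (found_ino || found_path) {
        *msg = db_header;
    } else {
        msg->result = RESULT_FAIL;
    }
}
```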

I'm quoting this because while I haven't read it all carefully some
of it sounds like possibly-legitimate critique, and I'm curious about
what Seebs has to say about it. (I'm guessing that he's still not
reading your posts directly.)

Good guess.

Ah, but would you? The bug I found in your code -- wasn't that
a matter of not always assigning an initial value to a variable?
Just sayin'.

I see no reason to assign it a null value, when I am guaranteed that
I will always assign it one or another of two values, both of which
are necessarily initialized.

No, there are at least two others. There's fakeroot (which we used to
use) and fakeroot-ng. Both are, for various reasons, unsuitable to
our purposes.

Yes.

/* warning: GNU getopt permutes arguments, which is just plain
* wrong. The + suppresses this annoying behavior, but may not
* be compatible with sane option libraries.
*/

The issue here is as follows:

1. GNU's implementation of getopt() defaults to reordering arguments,
which violates both the POSIX spec and reasonable (IMHO, but also
in the HO of many other perhaps better-qualified people) expectations.
2. This program is necessarily run on systems where the system getopt()
is GNU getopt().
3. To suppress the undesired behavior, I use the GNU getopt() extension
which removes this behavior.
4. Because a future porter might migrate this code to a system wherein
this extension doesn't exist (and isn't needed), I explain what
extension I'm using and why I'm using it.
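A small sketch of the difference the leading '+' makes. It assumes glibc and uses a hypothetical first_operand_index() that parses a fixed, made-up command line. With "+ab:", option processing stops at the first non-option ("file"), so optind is left at 2. Without the '+', GNU getopt() would permute argv, go on to consume "-b x", and leave optind at 4. With glibc, setting POSIXLY_CORRECT in the environment also stops the permutation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Returns optind after parsing a fixed, hypothetical command line,
 * i.e. the index of the first argument not consumed as an option. */
int first_operand_index(void)
{
    char a0[] = "prog", a1[] = "-a", a2[] = "file", a3[] = "-b", a4[] = "x";
    char *argv[] = { a0, a1, a2, a3, a4, NULL };
    int argc = 5;

    opterr = 0;   /* don't print diagnostics for unknown options */
    optind = 1;   /* start scanning at argv[1] */
    while (getopt(argc, argv, "+ab:") != -1)
        continue;  /* the options themselves are ignored here */
    return optind;
}
```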

I stand by the comment and implementation choice at this time.

(Note that getopt() is a POSIX extension, not a standard C feature, so
this is only marginally topical; however, I think the general question of
what to comment on when different implementations respond to a spec
differently is worth talking about.)

There's no default handler because the function's spec is defined very
clearly to never produce values other than the option characters specified,
a question mark, or -1. Since -1 was handled above, ? is the only other
possibility. I suppose I should have a default: case for the possible
case where a new option is added to the option string but not to the switch
statement, but it hasn't yet come up.
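For comparison, here is a sketch of the defensive default: case mentioned above. The option letters, struct opts and parse_opts() are hypothetical. The '?' case handles what getopt() itself reports. The default case can only be reached if a letter is added to the option string without a matching case.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical option settings; the program's real options aren't shown. */
struct opts { int verbose; const char *prefix; };

/* Returns 0 on success, -1 on a usage error or an unhandled option. */
int parse_opts(int argc, char *argv[], struct opts *o)
{
    int c;

    while ((c = getopt(argc, argv, "+vp:")) != -1) {
        switch (c) {
        case 'v':
            o->verbose = 1;
            break;
        case 'p':
            o->prefix = optarg;   /* points into argv */
            break;
        case '?':
            /* unknown option or missing argument; getopt() has already
             * printed a message unless opterr is 0 */
            return -1;
        default:
            /* reachable only if the option string gains a letter
             * that this switch does not handle yet */
            fprintf(stderr, "internal error: unhandled option '%c'\n", c);
            return -1;
        }
    }
    return 0;
}
```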

-s

Seebs · Mar 27, 2010

Huh. I usually adopt exactly the opposite strategy -- do the
easy parts first, as a way of warming up, so to speak, or perhaps
building confidence, and then tackle the hard parts. Could this
be something where different people's minds just work differently?
I find Seebs's comment about the hard part being easy and the easy
part being hard -- oh, "incomprehensible" is too strong a word,
but it's not something I can easily imagine. Experience suggests
that our brains don't all work the same in all respects, but now
I'm wondering which of us is the outlier ....

Me. It's a stereotypical trait of ADHD, especially fairly extreme
ADHD. Basically, if I'm not fully engaged, my mind starts wandering
fast -- and simple tasks aren't very engaging.

-s

Ben Bacarisse · Mar 28, 2010

Keith Thompson said:
Ben Bacarisse said:

Tim Rentsch said:

<snip>
These comments make sense only if the behavior of _Bool were like that
of the other integer types, but it isn't.

I agree with this but I've snipped the discussion because I wanted to
comment on something else.

Storing into a _Bool will
always store either a 0 or a 1; it simply isn't possible to store any
other value because of how conversion to _Bool is defined.

When compiled with gcc, this program:

#include <stdio.h>
#include <string.h>

int main(void)
{
_Bool b;
memcpy(&b, (unsigned char []){2}, sizeof b);
printf("b=%d\n", b);
return 0;
}

prints b=2 and I can't see any reason to think that gcc is wrong to do
that. I don't think the program /has/ to print b=2 but I think it is
permitted.

Yes, I would agree with that; it is permitted because undefined
behavior has occurred.

That was not quite the point I was making though the difference is
minimal. I thought that an implementation was permitted to do the
above without it being undefined behaviour. It certainly can be UB,
but does it have to be?

I should not have brought up gcc because that is too specific. gcc
(at least my current version of gcc) gives _Bool a single value bit
so, since gcc is a conforming implementation, the above just shows
that all the other bit settings in a _Bool are trap representations.
This was pointed out by Keith Thompson in a thread earlier this year.

Hmm. I don't remember making that particular point.

I was thinking of this (Message-ID: <[email protected]>), where you wrote:
Looking through the gcc documentation ("info gcc"), I don't see any
mention of padding bits. It says that "all bit patterns are ordinary
values" for signed types, but it doesn't say so for unsigned types
(including _Bool).

One can detect the number of value bits by attempting to declare a bit
field of a particular width (6.7.2.1 p3: "width" is the number of
value bits as per 6.2.6.2 p6). The wording is that the bit field
width "does not exceed" the width of the base type which could be
taken to mean that one can only find a lower bound on the number of
value bits. That's an unnatural interpretation to me. I think the
meaning is that all widths from zero up to the base type's width are
permitted.
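A sketch illustrating both points. It uses a hypothetical struct flags and store_and_read(). A one-bit _Bool bit-field is valid everywhere, while a declaration such as "struct s { _Bool b : 2; };" requires a diagnostic when _Bool has only one value bit, so whether that compiles is a compile-time probe of the width. Separately, assigning any scalar to a _Bool, including a _Bool bit-field, goes through the conversion in C99 6.3.1.2: 0 stays 0 and every other value becomes 1.

```c
/* A one-bit _Bool bit-field: valid on every implementation, since
 * _Bool always has at least one value bit. Widening this to ": 2"
 * requires a diagnostic if _Bool has exactly one value bit. */
struct flags {
    _Bool ready : 1;
};

/* Stores v through the _Bool conversion and reads it back. */
int store_and_read(int v)
{
    struct flags f;

    f.ready = v;   /* converted: 0 if v == 0, otherwise 1 */
    return f.ready;
}
```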

If that's right, then gcc's _Bool has only one value bit (at least the
last time I tried) so for my example to print b=2, b has to contain a
trap representation. (If it did not hold a trap rep. then the padding
bits would have to be ignored and the value would have to be 0 as you
describe below).

Note that the existence of padding bits doesn't imply the existence
of trap representations. For example, I think any of the following
would be a valid implementation:

1. _Bool has 8 bits, of which 1 is a value bit and the other 7 are
padding bits.

1a. A representation with any padding bit set to 1 is a trap
representation.

1b. Padding bits are ignored; only the single value bit contributes to
the value. For example, after using memset to fill two _Bool objects
with the byte values 0x00 and 0xf0, the "==" operator will report
that their values are equal.

2. _Bool has 8 bits, all of which are value bits.

Case 2 can be treated as 1a depending entirely on the implementation's
documentation; it's just a matter of which behavior the implementation
chooses to leave undefined.

I am not sure I understand this last remark. Do you mean that no
conforming program can tell the difference between 2 and 1a? If so, I
agree, but I think a programmer can tell the difference in that a
diagnostic is required for

struct s { _Bool b : 2; };

in the case of 1a.

<snip>

Keith Thompson · Mar 28, 2010

Ben Bacarisse said:
I was thinking of this: Message-ID: <[email protected]>

I see, my argument was based on bit fields. I had forgotten about
that.

[...]

I am not sure I understand this last remark. Do you mean that no
conforming program can tell the difference between 2 and 1a? If so, I
agree, but I think a programmer can tell the difference in that a
diagnostic is required for

struct s { _Bool b : 2; };

in the case of 1a.

<snip>

I forgot about bit fields. Yes, that is a difference between cases 1a
and 2. (Well, mostly. Implementations are allowed to produce extra
diagnostics whenever they like, so you can't *really* distinguish the
cases based on that, but you can if you assume that this particular
message is not a lie.)
