managed string library

Robert Seacord

Paul,

I did have a look at James Anthill's Vstr implementation and I discuss
it in the strings chapter of my book on "Secure Coding in C and C++"
which happens to be available online at:

http://www.informit.com/articles/article.asp?p=430402&seqNum=8&rl=1

I was not aware of your implementation, so I did not evaluate it.

Our goal in writing the managed string library was to define an API that
could be used to program more securely. To my mind, writing secure code
also encompasses eliminating defects in general so I believe this
library also addresses these concerns. You also commented that "the
API is somewhat cumbersome, which makes inline usage impossible". This
was an intentional decision on our part, as in-line usage typically
prevents/discourages a user from checking the return status of the function.

Performance was not a major objective for the reference implementation
of the API. The idea was to allow library vendors and other interested
parties such as yourself the opportunity to provide more efficient
implementations.

We should have a complete implementation of the API available shortly.
I'll post a further announcement to these news groups when it is ready.

Thanks,
rCs
 
websnarf

Robert said:
I did have a look at James Anthill's Vstr implementation and I discuss
it in the strings chapter of my book on "Secure Coding in C and C++"
which happens to be available online at:

http://www.informit.com/articles/article.asp?p=430402&seqNum=8&rl=1

I was not aware of your implementation, so I did not evaluate it.

In James' own string library comparisons he makes mention of the Better
String Library, and it's about the only one he doesn't eviscerate in his
evaluation. Or you could have just searched for "string library" in
Google (Bstrlib is currently hit #2).

Our goal in writing the managed string library was to define an API that
could be used to program more securely.

I think both James and I had this in mind when we were developing our
respective libraries as well. The main difference being that we both
also decided it was worthwhile to satisfy even more criteria. I was
more focussed on functionality and ease of use or "Software Crisis"
kind of problems, whereas James was clearly more focussed on ultimate
performance in networking or highly IO centric environments.

And I think you *missed* the crucial point about aliasing entirely.
This is a safety/security issue (at least it is as much as NULL
parameter detection is) and yet it appears to be unaddressed in your
library. (I have not evaluated Vstr deeply enough to know whether or
not he solved that issue, since it only compiles with the
gnu-toolchain).

Another security issue is the question of auditing. As you know, prior
to long term real world usage the only way you can have any sort of
assurance about the security of any system is by having security
experts audit the code. The Better String Library highly facilitates
this by its "security statement". This statement declares up front
exactly what Bstrlib's asserted functionality is with respect to
security. In this way an auditor can delineate his/her strategy by 1)
verifying that the library does what it says (by examining the source
code of Bstrlib) and 2) verifying that the facilities claimed are
sufficient to meet the security requirements of the rest of the
program. This delineation is important as it gives a bounded and well
defined way that the auditor can evaluate all this "extra code" that
Bstrlib provides that another developer has not written themselves.

With the managed string library being proposed for the next C
standard, the auditor ends up having to "trust" the compiler if it's a
closed source compiler, as well as trusting your design. But worse
yet, without any security statement, an auditor will have a harder time
knowing what exactly they should expect the managed library to deliver
from the point of view of security.
[...] To my mind, writing secure code
also encompasses eliminating defects in general so I believe this
library also addresses these concerns. You also commented that "the
API is somewhat cumbersome, which makes inline usage impossible". This
was an intentional decision on our part, as in-line usage typically
prevents/discourages a user from checking the return status of the function.

Some might say that it discourages users from using the library at all.
In Bstrlib I introduce the concept of "error propagation". Along with
supporting inline usage, errors are detected and passed through the
calls (error-in produces error-out.) This more closely matches the
"laziness" of users, in that they need not check each call, they only
need to check the last call in the chain of calls.

You might argue that this is just a different way to solve the same
problem, except that Bstrlib's method enjoys two huge advantages: 1) it
vastly simplifies error checking without compromising correctness,
which is important for dealing with unintentional leaks due to exiting
before freeing resources. 2) it allows one to continue to write code
very concisely which in most cases will lead to easier maintenance.
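
To make the error-in/error-out idea concrete, here is a minimal sketch. It does not use Bstrlib's actual API: the type `estr` and the functions `estr_new`, `estr_append`, `estr_ok` and `estr_free` are hypothetical names invented for this illustration. A failure sets a sticky flag, every later call passes the failed object through untouched, and only the end of the chain needs a check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type for illustration; this is not Bstrlib's API. */
typedef struct estr {
    char  *data;    /* always NUL-terminated */
    size_t len;
    int    failed;  /* sticky error flag: once set, later calls do nothing */
} estr;

/* Create an empty string; returns NULL if allocation fails. */
estr *estr_new(void) {
    estr *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->data = malloc(1);
    if (s->data == NULL) { free(s); return NULL; }
    s->data[0] = '\0';
    s->len = 0;
    s->failed = 0;
    return s;
}

/* Append text to s and return s so that calls can be nested.
   Error in produces error out: a NULL or already-failed s is passed through. */
estr *estr_append(estr *s, const char *text) {
    if (s == NULL || s->failed) return s;
    assert(s->data != NULL);               /* invariant set up by estr_new */
    if (text == NULL) { s->failed = 1; return s; }

    size_t n = strlen(text);
    if (n >= SIZE_MAX - s->len) { s->failed = 1; return s; }

    /* Build the result in a fresh buffer so text may alias s->data. */
    char *p = malloc(s->len + n + 1);
    if (p == NULL) { s->failed = 1; return s; }
    memcpy(p, s->data, s->len);
    memcpy(p + s->len, text, n);
    p[s->len + n] = '\0';

    free(s->data);
    s->data = p;
    s->len += n;
    return s;
}

/* One check at the end of a chain is enough. */
int estr_ok(const estr *s) { return s != NULL && !s->failed; }

/* Release a string; NULL is accepted so cleanup needs no checks. */
void estr_free(estr *s) {
    if (s != NULL) { free(s->data); free(s); }
}
```

With this shape a caller can write `estr_append(estr_append(s, a), b)`, test `estr_ok(s)` once, and call `estr_free(s)` on every exit path, which is the leak-avoidance benefit described above.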

By *only* focussing on the problem from the very narrow point of view
of security, you are missing ideas like this. And you consequently
lose the potential to appeal to programmers for other reasons, which is
important if you want people to seriously adopt these new functions.
This is why there is such a thing as cherry flavored cough syrup, or
mint flavored toothpaste.
Performance was not a major objective for the reference implementation
of the API. The idea was to allow library vendors and other interested
parties such as yourself the opportunity to provide more efficient
implementations.

I have no idea how I would improve the efficiency of pervasive
character set filtering over the more obvious alternative of performing
the filtering just at the time that system() is called. The
alternative is different semantically -- but I believe that this
difference is what is called for. Charset filter is not a generally
useful feature *UNLESS* you are calling system (*and* under that
assumption that this is good enough for system call safety.) I.e., I
don't believe I *can* solve the performance problems of the managed
string design.
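
The alternative described above, filtering once at the point where a string is handed to system(), might look like the following sketch. The whitelist and the names `SAFE_CHARS`, `shell_safe` and `checked_system` are assumptions made for illustration; whether any whitelist is good enough for system() safety is exactly the caveat raised above.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative whitelist only; not a vetted security policy. */
static const char SAFE_CHARS[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789 _-./";

/* Return 1 if cmd is non-empty and contains only whitelisted characters. */
int shell_safe(const char *cmd) {
    if (cmd == NULL || *cmd == '\0') return 0;
    /* strspn gives the length of the leading run of allowed characters;
       the string is clean if that run reaches the terminator. */
    return cmd[strspn(cmd, SAFE_CHARS)] == '\0';
}

/* Filter only at the system() boundary. Returns -1 without running
   anything if the command is rejected; otherwise whatever system() returns. */
int checked_system(const char *cmd) {
    if (!shell_safe(cmd)) return -1;
    return system(cmd);
}
```

Strings elsewhere in the program carry no filtering cost in this arrangement; the check runs once per system() call.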

If you ignore performance to this degree, you will immediately create
resistance among developers who will shy away from using "managed
strings" because of some performance penalty they perceive. It sets up
a "false dichotomy" in the minds of developers that safety must come at
the expense of performance. A quick look at either Bstrlib or Vstr
shows that this dichotomy is not true. Both substantially out-perform
the standard C library functions (with Vstr, it's kind of conditional on
the kind of code, but when it wins it usually wins big) and both also
deliver far more safety.
We should have a complete implementation of the API available shortly.
I'll post a further announcement to these news groups when it is ready.

Ok, but I think the major problem is with the design not any
implementation.

[comp.lang.c.moderated removed because posting there appears to delay
posts for weeks]
 
CBFalconer

Robert said:
I did have a look at James Anthill's Vstr implementation and I
discuss it in the strings chapter of my book on "Secure Coding in
C and C++" which happens to be available online at:

http://www.informit.com/articles/article.asp?p=430402&seqNum=8&rl=1

I was not aware of your implementation, so I did not evaluate it.

And, due to the woeful lack of quotation and attribution, nobody
else is aware of it or anything else.

USE maineline address!
 
James Dennett

James said:
Jonathan said:
(e-mail address removed) wrote:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).
What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.
The simpler expectation from the interface is "append
a copy of the string *currently* pointed to by p to p",
i.e., append it to itself.

Other languages that support this via notation such
as s+=s; or s = s+s implement it this way.

If you think of strcat in terms of its implementation

or, in terms of its specification by the standard,

But the point was to think of the *intuitive* meaning of
strcat, not its formally specified meaning. The standard
doesn't capture the intuitive (or, if you prefer, naive)
expectation. Which is fine by me, as I don't expect that
intuition will be sufficient for robust programming.

-- James
 
Douglas A. Gwyn

Robert Seacord said:
was an intentional decision on our part, as in-line usage typically
prevents/discourages a user from checking the return status of the
function.

My point of view is that requiring the programmer to explicitly test
for correctness is not appreciably better than the current situation,
and that usage errors (as opposed to expected "failures" such as
testing for the existence of a file by name) are best handled by
throwing an exception, so that *some* strategy for handling such
errors is *always* in place. With nested exception handlers, this
strategy can be established at the lowest feasible level for an
intelligent recovery procedure, or allowed to default to a higher-
level strategy that provides a "coarser grained" recovery. The
programmer still retains total control (if he wants to exert it), but
the exceptional-case handling does not clutter the main-line logic.

And yes, nested exception handling is certainly possible in
Standard C; there have been several implementations.
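
One way nested handlers can be built in Standard C is with setjmp/longjmp and a stack of jump buffers. The sketch below is one such arrangement, not any of the implementations alluded to above; `handler_stack`, `throw_error`, `checked_div`, `inner`, `outer` and the error codes are names made up for the example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

#define MAX_HANDLERS 16
enum { ERR_DIV_ZERO = 1, ERR_NEGATIVE = 2 };

static jmp_buf handler_stack[MAX_HANDLERS];  /* innermost handler is on top */
static int handler_depth = 0;
static int last_error = 0;

/* Pop the innermost handler and jump to it; abort if none is installed. */
void throw_error(int code) {
    if (handler_depth == 0) abort();
    last_error = code;
    longjmp(handler_stack[--handler_depth], 1);
}

/* Lowest level: usage errors are thrown, not returned. */
int checked_div(int a, int b) {
    if (a < 0) throw_error(ERR_NEGATIVE);
    if (b == 0) throw_error(ERR_DIV_ZERO);
    return a / b;
}

/* Inner handler: recovers from division by zero, rethrows anything else. */
int inner(int a, int b) {
    assert(handler_depth < MAX_HANDLERS);
    int slot = handler_depth++;
    if (setjmp(handler_stack[slot]) == 0) {
        int r = checked_div(a, b);
        handler_depth--;               /* normal exit: uninstall handler */
        return r;
    }
    if (last_error == ERR_DIV_ZERO) return 0;   /* fine-grained recovery */
    throw_error(last_error);                     /* pass to the outer handler */
    return 0;                                    /* not reached */
}

/* Outer handler: coarse-grained recovery for whatever inner did not handle. */
int outer(int a, int b) {
    assert(handler_depth < MAX_HANDLERS);
    int slot = handler_depth++;
    if (setjmp(handler_stack[slot]) == 0) {
        int r = inner(a, b);
        handler_depth--;
        return r;
    }
    return -1;
}
```

The main-line code in `checked_div` and in the `== 0` branches contains no error tests of its own; a division by zero is recovered at the lowest level, and a negative operand falls through to the coarser handler in `outer`.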
 
kuyper

James said:
But the point was to think of the *intuitive* meaning of
strcat, not its formally specified meaning. The standard
doesn't capture the intuitive (or, if you prefer, naive)
expectation. Which is fine by me, as I don't expect that
intuition will be sufficient for robust programming.

I think that was pretty much my point. I didn't have any intuition
about what strcat() would do when I first heard about it. I read its
specification, and expected it to operate as specified. For functions
with more generic names, such as "open", I do have some expectations
that should be met, though they're pretty vague expectations. But
"strcat"? I know it's short for "string catenate", but that's only
obvious after having read the specification. Anybody who would
abbreviate a name that way isn't giving me any intuitive expectations
about what they might have intended by the name. It's not quite as bad
as "grep", but it comes close.
 
Douglas A. Gwyn

James said:
But the point was to think of the *intuitive* meaning of
strcat, not its formally specified meaning.

Since intuition is so subjective, more care is needed.
*String concatenation* is one thing, strcat is another.
The concatenation of "abc" with "efg" is "abcefg", but
when you're talking about the data accessed by strcat
there are also storage locations involved, not some
abstract value space. And it is not at all evident
that there is only one "right" way to handle that
storage in cases where there is overlap between input
and output.

From the point of view of run-time efficiency, if a
strcat-like function were required to produce well-
defined behavior of the kind that some seem to think
is desired, it would have to, for *every* invocation,
perform some additional testing to determine whether
there is overlap, or else it would have to use a
considerably less efficient method all the time. The
trade-off would not be acceptable to many users (who
currently don't have any problem using strcat properly).

We went through this with memcpy, and the result was
to provide a separate "better-defined" function memmove.
Note that the advent of memmove did not cause a mass
exodus away from using memcpy, because many programmers
are able to use both of them as appropriate, and cases
of potential overlap are relatively uncommon.
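
The distinction can be shown in a few lines. memcpy requires that source and destination do not overlap, while memmove is defined even when they do. The helper names `shift_right` and `copy_distinct` are only illustrative.

```c
#include <string.h>

/* Open a gap of `gap` bytes at the front of buf by shifting its first n
   bytes to the right. Source and destination overlap whenever gap < n,
   so memmove is required; memcpy here would be undefined behavior.
   The caller guarantees that buf has room for n + gap bytes. */
void shift_right(char *buf, size_t n, size_t gap) {
    memmove(buf + gap, buf, n);
}

/* Copy n bytes between two separate arrays. The caller guarantees that
   the regions do not overlap, so plain memcpy is appropriate. */
void copy_distinct(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);
}
```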
The standard
doesn't capture the intuitive (or, if you prefer, naive)
expectation. Which is fine by me, as I don't expect that
intuition will be sufficient for robust programming.

Indeed, we hear frequent demands that the C standard
ought to specify things so that programmers don't need
to know the specifications or think about what they're
doing. That approach to programming doesn't get one
very far before getting into trouble.
 
websnarf

Douglas said:
Since intuition is so subjective, more care is needed.
*String concatenation* is one thing, strcat is another.
The concatenation of "abc" with "efg" is "abcefg", but
when you're talking about the data accessed by strcat
there are also storage locations involved, not some
abstract value space. And it is not at all evident
that there is only one "right" way to handle that
storage in cases where there is overlap between input
and output.

So the correct answer is to twist your intuition to match the standard?
People's intuition does not easily map to definitionism such as this.
By creating this alternate explanation, you are directly acting against
people's intuition. And in the end you are working really hard to
defend something that doesn't have a justifiably solid defence.
From the point of view of run-time efficiency, if a
strcat-like function were required to produce well-
defined behavior of the kind that some seem to think
is desired, it would have to, for *every* invocation,
perform some additional testing to determine whether
there is overlap, or else it would have to use a
considerably less efficient method all the time.

This is utterly false on its face. I've given a solution *IN THIS
THREAD* that so obviously would run in equivalent time as the straight
forward typical non-safe method that is currently endorsed.

If such a thing were true, for example, Bstrlib would have a hard time
keeping up with the performance of the standard library (Bstrlib is
aliasing safe). Bstrlib annihilates the standard library on
performance across the board on many platforms and compilers that were
tested. This achievement does not come from nowhere of course -- just
doing a brief survey of the source code of most standard C compilers,
it is actually fairly straightforward to outperform the standard library
on many functions, particularly string functions. I have on my
assembly examples web page a demonstration for hugely accelerating the
implementation of "strlen" over most x86 compilers -- it's *much* faster
than either the expected implementation or nearly all implementations I
have actually encountered (Sun went ahead and implemented code similar
to mine for their Solaris compilers a few years ago). You cannot speak
of performance without speaking to implementation details.

You people who do not understand performance really should probably not
pretend to comment on it as if from a position of authority. In
particular you can't on the one hand claim that you can't characterize
performance of C in principle, then turn around and claim that some
change in specification will change performance, and of course,
ultimately just be plain wrong anyway.
[...] The
trade-off would not be acceptable to many users (who
currently don't have any problem using strcat properly).

Since there is no down side (except slightly increased code footprint)
there would be nothing to object to.
We went through this with memcpy, and the result was
to provide a separate "better-defined" function memmove.

That is because on hardware that existed at the time there was a
measurable difference on that function. On today's hardware, there is
no difference in performance between memmove and memcpy, BTW. This is
not true of strcat, however, which would never be slower.
Note that the advent of memmove did not cause a mass
exodus away from using memcpy, because many programmers
are able to use both of them as appropriate, and cases
of potential overlap are relatively uncommon.

That's only because it was overshadowed by the larger mass exodus
towards using Perl, Python, C++, Java, etc. People who still use C,
mostly buy into C's weaknesses and just live with it for whatever
reasons.

That's a truism that exists in a narrow field of programming languages.
My point is that it exists to a large extent in C (especially its
libraries) for no good reason.
Indeed, we hear frequent demands that the C standard
ought to specify things so that programmers don't need
to know the specifications or think about what they're
doing.

Actually what we often hear is over-generalized hyperbole from the
committee and committee apologists who feel that they don't need to
address any problems with the language.
[...] That approach to programming doesn't get one
very far before getting into trouble.

You, of course, have never attempted to program in the language
"Python".

You have also lost sight, completely, of the whole point of this
thread. The whole idea of secure programming is concerned with dealing
with programmer's inadvertent bugs. My claim, is that if you more
closely align the programming language with people's intuition and
expectation, then the number of bugs and security flaws will naturally
decrease.

TR 24731 misses this point completely, and instead just exposes the
flaws more explicitly. But you can always stuff RSIZE_MAX into the
extra length parameter, and basically gain no more security. Automated
tools can assist you in finding buffer overflow flaws, and potential
buffer overflow flaws based on old legacy code exactly as effectively
as trying to do so while porting to TR 24731. I.e., in real effect,
this proposal will actually do basically nothing. Robert Seacord has
clearly done much better by doing buffer management for you with his
proposed "managed strings" library, however, he has ignored the usage
and intuition impact.
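
The point about stuffing a huge value into the length parameter can be sketched as follows. `bounded_cat` is a hypothetical stand-in written for this example, not the TR 24731 function, and `SIZE_MAX / 2` merely stands in for an over-large bound such as RSIZE_MAX. The check is only as good as the size the caller supplies.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bounded concatenation: dstmax is the size the CALLER claims
   for dst. Returns 0 on success, -1 if the result would not fit.
   dst and src must not overlap. */
int bounded_cat(char *dst, size_t dstmax, const char *src) {
    size_t dlen = strlen(dst);
    size_t slen = strlen(src);
    if (dlen >= dstmax || slen >= dstmax - dlen) return -1;
    memcpy(dst + dlen, src, slen + 1);   /* includes the terminator */
    return 0;
}

/* Honest call:   bounded_cat(buf, sizeof buf, s)   -> an overflow is caught.
   Defeated call: bounded_cat(buf, SIZE_MAX / 2, s) -> the check passes for
   any realistic input, so the function protects no better than strcat. */
```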

C99 has not been widely adopted and it never will be, and it's primarily
because it just doesn't offer anything that people really care about.
People care about performance, safety, scalability, and C99 offers
precious little on any of those fronts, even though in this language there
is no shortage of fertile ground for expanding in all those areas. If
you want c0x to have any impact at all, you have to deliver something
on these fronts that you utterly failed to do with C99. And imho neither
TR 24731, nor managed strings rise to that level which is what I'm
trying to point out.

Robert Seacord and the Microsofties simply do not rise high enough to
meet the challenge, and the ANSI C committee are too blind to allow,
encourage or seek improvements anyways. The problem with these
proposals is that they don't go far enough to address the problem --
the committee reacts by saying that they go too far to solve a problem
that they don't believe exists.
 
CBFalconer

.... snip ...

You have also lost sight, completely, of the whole point of this
thread. The whole idea of secure programming is concerned with
dealing with programmer's inadvertent bugs. My claim, is that if
you more closely align the programming language with people's
intuition and expectation, then the number of bugs and security
flaws will naturally decrease.

And you have lost track of the reasons for having different
languages in the first place. If you want secure programming,
there are quite adequate languages for the purpose, such as Ada and
Pascal. There is no need to destroy C.
 
Bjorn Reese

CBFalconer said:
And you have lost track of the reasons for having different
languages in the first place. If you want secure programming,
there are quite adequate languages for the purpose, such as Ada and
Pascal. There is no need to destroy C.

What makes you think that C will be destroyed if it included better
solutions (that is, solutions with a better cognitive fit for the
majority of users) than TR 24731 and managed strings?
 
Douglas A. Gwyn

So the correct answer is to twist your intuition to match the standard?

No, I'm saying that intuition is a function of experience,
and thus it may vary among individuals. You have heard in
this thread from people who deny that their intuition about
the expected behavior of strcat(a,a) matches yours.

This is utterly false on its face. I've given a solution *IN THIS
THREAD* that so obviously would run in equivalent time as the straight
forward typical non-safe method that is currently endorsed.

On some platforms, compilers implement strcat, strcpy, and
similar functions using string-op microcoded instructions,
which can malfunction pretty badly when the source and
destination objects overlap. Thus to perform according to
your preferred specification, additional testing would be
necessary to detect that possibility and use alternate,
generally slower code when there would be a problem.
That's only because it was overshadowed by the larger mass exodus
towards using Perl, Python, C++, Java, etc. People who still use C,
mostly buy into C's weaknesses and just live with it for whatever
reasons.

No, obviously I was talking about the effect on C programming.
If memmove's "superior" semantics were so attractive, it would
have supplanted memcpy *among C programmers*, but it hasn't.
Indeed, we hear frequent demands that the C standard
ought to specify things so that programmers don't need
to know the specifications or think about what they're
doing.
[...] That approach to programming doesn't get one
very far before getting into trouble.
You, of course, have never attempted to program in the language
"Python".

Actually I have, but I stand by my statement that catering
to ignorance and laziness is not good for software quality.
You have also lost sight, completely, of the whole point of this
thread. The whole idea of secure programming is concerned with dealing
with programmer's inadvertent bugs. My claim, is that if you more
closely align the programming language with people's intuition and
expectation, then the number of bugs and security flaws will naturally
decrease.

Or, you could align the intuition and expectation with reality
and get that same effect. strcat(a,a) is not something that
a reasonable C programmer would think of doing.
TR 24731 misses this point completely, ...

I have my own criticism of that TR, largely on the basis that
it misses the real problem, which is quality control. Any
attempt at a technological solution to thoughtless programming
practice is doomed to fail, or at best to be an incomplete
solution. To the extent that attention is diverted from the
real causes of erroneous programs, it's a bad thing.
the committee reacts by saying that they go too far to solve a problem
that they don't believe exists.

I haven't heard anybody saying that. I have said that these
kinds of technical solutions try to solve the wrong problem.
 
websnarf

Douglas said:
No, I'm saying that intuition is a function of experience,
and thus it may vary among individuals. You have heard in
this thread from people who deny that their intuition about
the expected behavior of strcat(a,a) matches yours.

I have only heard people disagree with this *after* they have been
indoctrinated by the dictates of the C standard. I have not heard of
people (in this thread or otherwise) *unindoctrinated* whose intuition
is different. In this thread, these are just people mischaracterizing
what is meant by the word intuition.

But of course there is no shortage of people in this thread who are
honest. Remember I never *defined* what I thought strcat(p,p) should
do -- the honest people knew what I meant without prompting.
On some platforms, compilers implement strcat, strcpy, and
similar functions using string-op microcoded instructions,

This has nothing to do with it ...
which can malfunction pretty badly when the source and
destination objects overlap. Thus to perform according to
your preferred specification, additional testing would be
necessary to detect that possibility and use alternate,
generally slower code when there would be a problem.

And that is incorrect. Here:

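#include <string.h>

/* Alias-safe strcat: src may point into dst. The caller must ensure
   dst has room for strlen(dst) + strlen(src) + 1 bytes. */
char * safestrcat (char * dst, const char * src) {
    if (*src) {
        char * dend = dst + strlen (dst);
        /* Copy the tail of src to just past dst's terminator first. When
           src aliases dst the source ends at that terminator, so the two
           ranges do not overlap. */
        strcpy (dend + 1, src + 1);
        *dend = *src;   /* then overwrite the terminator with src[0] */
    }
    return dst;
}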

Now tell me this cannot be translated to an optimized solution on any
platform.
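
As a quick check of the function above, the following sketch exercises it with aliased arguments. It repeats safestrcat so the block stands alone; the wrapper `append_to_self` is an assumption of the example. The strcpy reads from src+1 up to the terminator of dst and writes starting one byte past that terminator, so the two ranges cannot overlap when src points into dst, provided the buffer has enough room.

```c
#include <string.h>

/* safestrcat as posted above, repeated so this example is self-contained. */
char *safestrcat(char *dst, const char *src) {
    if (*src) {
        char *dend = dst + strlen(dst);
        strcpy(dend + 1, src + 1);  /* tail of src goes after the old terminator */
        *dend = *src;               /* first character replaces the terminator */
    }
    return dst;
}

/* Append the string in buf to itself.
   buf must hold at least 2 * strlen(buf) + 1 bytes. */
char *append_to_self(char *buf) {
    return safestrcat(buf, buf);
}
```

Starting from a 16-byte buffer holding "abc", `append_to_self` leaves "abcabc" in the buffer, and a call such as `safestrcat(p, p + 4)` with a source in the middle of the destination is handled the same way.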
No, obviously I was talking about the effect on C programming.

So was I. It has *reduced* the number of C programmers. The problem
of memcpy() versus memmove() is minor by comparison, and programmers
left before they had to worry about such things. Some people in this
world carry things to their logical conclusion -- obviously memmove vs
memcpy is merely a single straw on the camel's back. And like any
classic public relations person, you argue for the straw.
If memmove's "superior" semantics were so attractive, it would
have supplanted memcpy *among C programmers*, but it hasn't.

First of all, it's not necessarily superior if 99% of the time you know
ahead of time that the memory is not going to overlap. The C textbooks
do an incredibly shoddy job on this point, and it's much like people
using p++ instead of p+=1. People just do what they are used to.

I didn't claim that memmove was necessarily superior. But
understanding the difference and using it for an analogy for people's
disingenuous claims about how they think intuition served my purposes
(it's obvious from the pathetic responses in this thread that barely
anyone has a clue of how memmove is properly implemented, meaning that
implementation-based intuition is pretty ridiculous in real life.)
Indeed, we hear frequent demands that the C standard
ought to specify things so that programmers don't need
to know the specifications or think about what they're
doing.
[...] That approach to programming doesn't get one
very far before getting into trouble.
You, of course, have never attempted to program in the language
"Python".

Actually I have, but I stand by my statement that catering
to ignorance and laziness is not good for software quality.

So you are aware of how to program in a more serious programming
language, and yet you appreciate nothing of it. Then clearly this gulf
is ideological.

This isn't about this false dichotomy you cling to with so much
ferocious desperation. If a programmer is lazy there is nothing you
can do about it. But if a programmer has a finite amount of energy it
might be worth while to meet them in the middle -- especially when it
doesn't actually cost you anything (outside of your own delusions and
paranoia I mean).
Or, you could align the intuition and expectation with reality
and get that same effect.

That is why you fail.

You can't align people's intuition and expectation to anything. You
just can't do it. You can only get people to lie about it (obvious
reference to 1984); they "learn" that their intuition is wrong. You
will not get the same effect because you can't align people's
intuition. Worse yet, the attempt to do so is ridiculously expensive.
[...] strcat(a,a) is not something that
a reasonable C programmer would think of doing.

But it is something every reasonable programmer might think of doing.
Notice that the only real distinction is the word "C".
I have my own criticism of that TR, largely on the basis that
it misses the real problem, which is quality control.

Did you know that Microsoft has the largest quality control
organization of any software institution by far? They also produce the
most bugs of any software institution I have ever heard of too.

There have been real studies of this problem (I'm thinking of studies
cited by a luminary from Lucent who gave a talk on this from a few
years ago.) Post-development testing and q&a tends to capture some
percentage of bugs, but is hardly the answer; there are just too many
corner cases, and people just don't think about them from the outside.

According to these studies, the best solutions were always the ones
that lived closest to the programmer while they were developing. The
best they found at the time was direct source code peer review (which
explains the rise of the recent "paired programming" paradigm in
"extreme programming".) But this is clearly too expensive and is going
to have a high degree of variability depending on the skill of the
reviewers.

I can attest to this, as the last serious bug I dealt with in Bstrlib
was based on a "memory overflow attack". An ordinary tester just would
not even be able to set up an appropriate test easily (I went back to
*16* bit compilers to set up my testing framework for this.) This is
one of those problems that would have lain dormant waiting to spring
its head as people started transitioning towards 64 bit systems for
standard development. The point, of course, is that nobody ever
reported this bug to me as nobody saw it fail in any test. It required
an insight by me, the developer, while reviewing the code. It was not
technically an independent review of course, except that I came back to
looking at it after a long time away from it (so it was an
approximation of "independent review".)

This leads us to obvious alternatives: 1) changing the programming
language 2) pervasive use of lint or defect detection tools 3)
modifying the language itself through libraries.

The vast majority of programmers have clearly chosen option #1.
Probably because #2 costs money and is not a guarantee, and there
hasn't been much culture for #3. Changing the standard could
address #1 and #3 simultaneously -- if only there were some motivation
to do so.
[...] Any
attempt at a technological solution to thoughtless programming
practice is doomed to fail, or at best to be an incomplete
solution.

That's why of course, they don't make automatic shifting cars. And of
course, that's why they don't put seatbelts in cars either. After all
they are just doomed technologies.
[...] To the extent that attention is diverted from the
real causes of erroneous programs, it's a bad thing.

If you make the real issue disappear through avoiding it at the point
of design, then it isn't a diversion. That is to say, it's possible to
solve some of these problems completely at the level of the programming
language itself. Once you consider this in the light of the
realization that programmers have a finite amount of energy with which
to produce programs, this should make the motivation for making the
programming language less error prone pretty compelling.
I haven't heard anybody saying that. I have said that these
kinds of technical solutions try to solve the wrong problem.

Yeah, well it's easy to say things. Especially when you are not being
called into account for the things you say.
 
Richard Heathfield

(e-mail address removed) said:
I have only heard people disagree with this *after* they have been
indoctrinated by the dictates of the C standard. I have not heard of
people (in this thread or otherwise) *unindoctrinated* whose intuition
is different. In this thread, these are just people mischaracterizing
what is meant by the word intuition.

But of course there is no shortage of people in this thread who are
honest. Remember I never *defined* what I thought strcat(p,p) should
do -- the honest people knew what I meant without prompting.

I haven't a clue what you think strcat(p, p) will do. Do you think that
makes me dishonest? Ah, but clever people know that I'm honest, so if you
think I'm dishonest, that makes you wrong and stupid.

Anyone can flame. Try constructing an argument that isn't offensive and
insulting, and maybe it'll be worth taking the time to read it.

Yeah, well it's easy to say things. Especially when you are not being
called into account for the things you say.

Quite so.
 
Francis Glassborow

Yeah, well it's easy to say things. Especially when you are not being
called into account for the things you say.

There are numerous places where I would debate your statements in the
post from which this is a quote. However, the degree of heat, including
the direct personal attacks that litter your post, leads me to bin the
whole thing. I will confine myself to a single comment:

Intuition is not something that we are born with, it is a term that we
use to refer to our expectations based on prior experience. In that
sense intuition can and certainly is 'educated.' Someone who fails to
adapt their intuition in the light of experience is old beyond their
years.
 
K

kuyper

Douglas A. Gwyn wrote: ....

I have only heard people disagree with this *after* they have been
indoctrinated by the dictates of the C standard. I have not heard of
people (in this thread or otherwise) *unindoctrinated* whose intuition
is different. ...

Well, I believe that you could reasonably argue that anyone who knows
enough about C to be a regular participant in this newsgroup has been
"indoctrinated". That makes it a perfect excuse for ignoring anything
anyone says about the matter that is inconsistent with your own
intuition.
... In this thread, these are just people mischaracterizing
what is meant by the word intuition.

But of course there is no shortage of people in this thread who are
honest. Remember I never *defined* what I thought strcat(p,p) should
do -- the honest people knew what I meant without prompting.

It seems to me that you've just called me dishonest, and implicitly
called me a liar. On what evidence are you basing your accusation that
I was lying when I wrote: "I didn't have any intuition about what
strcat() would do when I first heard about it. I read its
specification, and expected it to operate as specified."?
You can't align people's intuition and expectation to anything. You
just can't do it. You can only get people to lie about it (obvious
reference to 1984); they "learn" that their intuition is wrong. You
will not get the same effect because you can't align people's
intuition. Worse yet, the attempt to do so is ridiculously expensive.

Realigning people's intuition is a routine, everyday event. Your
intuition is not some unalterable platonic ideal that you somehow
become mystically aware of. It is merely the subconscious expectations
you have developed based upon your prior experience, which means that
it's constantly changing as you acquire new experiences. If you have
prior experience with a language where strings are first class objects,
you might reasonably develop an intuition that says that catenating two
strings creates a new string containing a copy of the second string
appended to a copy of the first string. However, without such prior
experience, I don't see any reason why you'd have any particular
expectations about it.

When I first learned C, the three languages I already knew were Fortran
I (sic), APL, and Basic, in that order, with APL as my favorite of the
three. It's been so long since I've done any Fortran I or Basic that I
don't remember how they handled the equivalent of strcat(). Fortran I
was far more primitive than even Fortran IV - I'm not sure it even had
string catenation. APL supports string catenation with precisely the
semantics described above, but it uses the catenate operator ',' to do
it, not a specially named function, so I came to C with no prior
expectations for what a function named "strcat()" would do. If I had
relied upon my APL background, I would have interpreted strcat("hello
", "world!") as a call to a unary function with a single argument
formed by the ',' operator by catenating those two strings, with
redundant parentheses around the argument. It's a good thing I didn't
rely on my intuition for such purposes.
[...] strcat(a,a) is not something that
a reasonable C programmer would think of doing.

But it is something every reasonable programmer might think of doing.
Notice that the only real distinction is the word "C".

By using "but" you imply that you're accepting as true the statement
that you're responding to. The combination of his statement and your
statement implies that there is no overlap between the sets of
"reasonable programmers" and "reasonable C programmers". Since the
latter set is a subset of the former set, having an empty overlap
implies that the second set is empty. Are you a C programmer?
 
B

Ben Pfaff

Richard Heathfield said:
I haven't a clue what you think strcat(p, p) will do. Do you think that
makes me dishonest? Ah, but clever people know that I'm honest, so if you
think I'm dishonest, that makes you wrong and stupid.

I think the meaning of strcat(p,p) is related to the question of
how many instances of "ana" there are in "banana". Both
questions have multiple reasonable answers.
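
The "banana" question can be made concrete. The sketch below is an illustration only: `count_occurrences` and its `allow_overlap` flag are hypothetical names, not anything from the thread or the standard library. It counts matches with `strstr()` and differs only in where the search resumes after each match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of needle in haystack.
 * If allow_overlap is nonzero, the search resumes one character after the
 * start of each match; otherwise it resumes just past the end of the match. */
size_t count_occurrences(const char *haystack, const char *needle,
                         int allow_overlap)
{
    size_t n = strlen(needle);
    size_t count = 0;
    const char *p = haystack;

    assert(n > 0); /* an empty needle has no sensible count */

    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += allow_overlap ? 1 : n;
    }
    return count;
}
```

With overlapping matches allowed, `count_occurrences("banana", "ana", 1)` yields 2; with overlap disallowed, `count_occurrences("banana", "ana", 0)` yields 1. Both are defensible answers, which is the point of the analogy.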
 
R

Richard Heathfield

Ben Pfaff said:
I think the meaning of strcat(p,p) is related to the question of
how many instances of "ana" there are in "banana".

It depends on what kind of banana it is, and what language you speak.
(Curiously, according to M-W the Wolof for "banana" is "banaana".)

It seems that Mr Hsieh's interpretation of strcat(p, p) concerns a different
strcat in a different language, because sure as bananas is banaanas he
isn't talking about strcat in C. Even if he thinks he is.
 
D

Douglas A. Gwyn

Richard said:
It seems that Mr Hsieh's interpretation of strcat(p, p) concerns a different
strcat in a different language, because sure as bananas is banaanas he
isn't talking about strcat in C. Even if he thinks he is.

In fairness, he's arguing what he thinks it *should* be
(according to his notion, which he seems to think is the
only sensible or "honest" one).

One wonders whether he thinks that a general matrix
multiplication function matmul(a,b,c,l,m,n) is stupid
if it doesn't work when invoked as matmul(a,a,a,n,n,n).
The conceptual issues are pretty much the same as for
string concatenation.
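
To illustrate the aliasing issue, here is a minimal sketch. The thread does not give the signature of matmul, so the argument order assumed here (inputs `a` and `b`, output `c`, dimensions `l`, `m`, `n`, row-major storage) and the names `matmul_naive` and `matmul_safe` are hypothetical. The naive version writes each result element straight into the output, so it silently computes the wrong product when the output overlaps an input; the second version pays for a temporary buffer to tolerate that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: c (l x n) = a (l x m) * b (m x n), all row-major.
 * Stores each element directly into c, so if c overlaps a or b, later
 * elements are computed from already-overwritten inputs. */
void matmul_naive(const double *a, const double *b, double *c,
                  size_t l, size_t m, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < l; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < m; k++)
                sum += a[i * m + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Alias-tolerant variant: compute into a temporary, then copy out.
 * Returns 0 on success, -1 if the temporary cannot be allocated. */
int matmul_safe(const double *a, const double *b, double *c,
                size_t l, size_t m, size_t n)
{
    if (l == 0 || n == 0)
        return 0; /* nothing to write */

    double *tmp = malloc(l * n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    matmul_naive(a, b, tmp, l, m, n);
    memcpy(c, tmp, l * n * sizeof *tmp);
    free(tmp);
    return 0;
}
```

For a 2x2 matrix holding 1, 2, 3, 4, calling `matmul_safe(x, x, x, 2, 2, 2)` leaves the true square 7, 10, 15, 22 in `x`, while `matmul_naive(x, x, x, 2, 2, 2)` does not. The design trade-off is the same one under debate for strcat(): supporting overlapping arguments costs an allocation and a copy that non-overlapping callers never need.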
 
D

David Wagner

Richard said:
It seems that Mr Hsieh's interpretation of strcat(p, p) concerns a different
strcat in a different language, because sure as bananas is banaanas he
isn't talking about strcat in C. Even if he thinks he is.

You're missing his point. Once you've been steeped in C lore, of course
you know what strcat() does. Any good C programmer does, once they've
learned the language. But learning the language, in this case, consists
of unlearning your intuition about what it means to concatenate strings.
The natural intuition about what strcat() should do would be that it
concatenates strings ("str" "cat", get it?). And there is a natural
definition of what it means to concatenate strings. Unfortunately,
C's strcat() does not use that natural definition; it does not match
the natural intuition you might have before you were exposed to C.
That's a problem, because it means that learning C requires unlearning
your intuition. Unlearning an old intuition and learning a new one
is twice as hard as learning something that you never had any prior
intuition about. Strcat() is just one example of this kind of phenomenon;
it occurs in many funny places in the C language. The folks here are
so steeped in C that they've probably forgotten what it was like to
originally learn C and be surprised by some of its oddball semantics.
Those oddball semantics were justified in the days of PDP-11 where CPU
performance was more important than making the programmer's life easier.
Today, those design choices are debatable.
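
For comparison, the "natural definition" discussed in this thread (a new string holding a copy of the first string followed by a copy of the second) might be sketched in C as below. `concat_new` is a hypothetical helper, not a standard function and not taken from any library mentioned here. Because it only reads its inputs and writes to fresh storage, passing the same pointer twice is harmless, whereas the C standard leaves strcat() undefined when copying takes place between overlapping objects.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string containing a copy
 * of first followed by a copy of second, or NULL if allocation fails.
 * The caller owns the result and must free() it. The inputs are never
 * modified, so first and second may point to the same string. */
char *concat_new(const char *first, const char *second)
{
    assert(first != NULL && second != NULL);

    size_t n1 = strlen(first);
    size_t n2 = strlen(second);

    if (n2 > SIZE_MAX - n1 - 1)
        return NULL; /* combined length would overflow size_t */

    char *out = malloc(n1 + n2 + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, first, n1);
    memcpy(out + n1, second, n2 + 1); /* also copies the terminating '\0' */
    return out;
}
```

With `char p[] = "xy";`, the call `concat_new(p, p)` returns "xyxy" and leaves `p` unchanged. The price is an allocation and a return value that must be checked and freed, which is the kind of cost the original strcat() design avoided.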
 
D

David Wagner

On what evidence are you basing your accusation that
I was lying when I wrote: "I didn't have any intuition about what
strcat() would do when I first heard about it. I read its
specification, and expected it to operate as specified."?

Are you sure you really had zero intuitions or guesses about what
strcat() does before you read the manual? If I had told you that
strcat()'s semantics were to start up a flight simulator, play the
Yankee Doodle Dandy over the speakers, then erase every 7th file on the
filesystem, you wouldn't be surprised in the least? You wouldn't find
those semantics counterintuitive? If you say so, I'll believe you,
but if so, I doubt that your case is representative of the programmer
population at large. It's only natural to look at the name "strcat",
recognize that it is referring to concatenation of strings (anyone
who has used /bin/cat on Unix should know what "cat" is short for),
make a guess that maybe strcat() concatenates strings -- and then have
your guess be proven wrong. That's the sense in which strcat() doesn't
really match the natural intuition.

Of course, for those rare folks who approach strcat() with absolutely
zero intuitions, guesses, or preconceptions about its semantics before
reading the manual, it doesn't matter what semantics we assign to that
function, as long as we document them. But I would bet that the majority
of programmers start off with some intuitions or guesses about what a
function like strcat() does, just from its name, and if we violate those
intuitions, then that has a cost. It leads to programmer surprise, and
may lead to increased incidence of bugs. We shouldn't incur those kinds
of costs unthinkingly.
 
