managed string library

Guest · Sep 19, 2006

Douglas said:
In fairness, he's arguing what he thinks it *should* be
(according to his notion, which he seems to think is the
only sensible or "honest" one).

One wonders whether he thinks that a general matrix
multiplication function matmul(a,b,c,l,m,n) is stupid
if it doesn't work when invoked as matmul(a,a,a,n,n,n).
The conceptual issues are pretty much the same as for
string concatenation.

Not stupid, but unintuitive. Which may be acceptable if the advantages
are high enough.

Michal Necasek · Sep 19, 2006

David said:
Are you sure you really had zero intuitions or guesses about what
strcat() does before you read the manual?

I thought it was for stringing cats together. The name clearly
suggests that, doesn't it?

It's only natural to look at the name "strcat",
recognize that it is referring to concatenation of strings (anyone
who has used /bin/cat on Unix should know what "cat" is short for),

There are quite a few C programmers who aren't all that familiar with
UNIX... any assumptions about familiarity with 'cat' are likely unfounded.

Michal

Douglas A. Gwyn · Sep 19, 2006

David said:
...It's only natural to look at the name "strcat",
recognize that it is referring to concatenation of strings (anyone
who has used /bin/cat on Unix should know what "cat" is short for), ...

Good example; what should "cat a > a" do?

There is a big difference between abstract values in some
mathematical model space and actual behavior where real
devices have to be used to implement some approximation to
that model. Naturally it is nicer the closer these match,
but in reality there are often choices to be made and
various trade-offs to be evaluated. A computing
professional can reasonably be expected to learn the
documented properties of the components with which he works.

Richard Heathfield · Sep 19, 2006

Michal Necasek said:

There are quite a few C programmers who aren't all that familiar with
UNIX... any assumptions about familiarity with 'cat' are likely unfounded.

Yup. I cut my C teeth in MS-DOS, and then moved on to Windows. I'd probably
been doing C for - oh, maybe eight or nine years before I finally settled
into a decent Linux distro. Slow thinker that I am, I was about to say that
I didn't know what /bin/cat is, but I just worked out that David means
cat(1), which I generally (mis?)use as a clone of MS-DOS's TYPE command.
So, for me, cat(1) means "clueful advanced type"!

David has a point that many of us probably don't remember when we first
learned strcat. But, being slower of thought than most other geniuses, I
was a slave to the docs when learning C - admittedly the Turbo C reference
manual rather than anything authoritative - and so I read up very carefully
on each function before trying to use it. Not always carefully enough,
alas, but that's humans for you.

But in a way, that's neither here nor there. I'm not overly interested in
what neophytes assume about C, except insofar as understanding those
assumptions helps me to help them when they seek such help. I'm certainly
not interested in changing the language for the sake of making C easier to
learn. It's already as simple as it can reasonably be expected to be (less
so since C99, though), and I think Einstein had exactly the right attitude
about simplicity.

No, what I object to is Mr Hsieh's apparent assumption that anyone who
disagrees with him is either stupid or dishonest. Just because /his/
intuition leads him to a particular conclusion, that doesn't mean that
other people who are at least as bright and honest as he is will
necessarily be led to the same conclusion by /their/ intuition.

kuyper · Sep 19, 2006

David said:
Are you sure you really had zero intuitions or guesses about what
strcat() does before you read the manual?

Yes. I find nothing intuitive about that name, it clearly says string
catenation, but those words never did, and still don't, convey a unique
meaning to me.

... It's only natural to look at the name "strcat",
recognize that it is referring to concatenation of strings

Yes, after reading the documentation I realized that this is what
strcat() was an abbreviation for, but it's not something I would
consider obvious without the other str*() functions as analogs. All of
the str*() functions are described on the same man page, so I was
introduced to all of them at the same time.

Even knowing that it does string catenation doesn't resolve the issue.
Does it work on null-terminated strings, counted strings, or strings
delimited by '"' characters? Is it an in-place catenation, or a
catenation that puts the result in a new location - I know of
non-programming uses of the word catenate that could justify either
expectation. If it is an in-place catenation, how does it handle the
possibility of overlap between the input strings? If it did use a new
location, is that location user-provided, static, or dynamically
allocated? Until I read the documentation for strcat(), I didn't know
the answers to any of those questions, and had no particular
expectations. After I read the documentation, I knew, and didn't have
any seriously preconceived expectations that needed to be revised.
If I'd ever used another library before which had a function named
something like strcat(), it would be a different matter, I'd expect
similar behavior to whatever that other library did. But I got all of
my initial expectations for strcat() from the documentation.

Richard Heathfield · Sep 19, 2006

Douglas A. Gwyn said:

Good example; what should "cat a > a" do?

So I was curious...

me@here:~> cat > del.me
Now is the time for all good men to party.
me@here:~> cat del.me > del.me
cat: del.me: input file is output file
me@here:~> cat < del.me > del.me
me@here:~> cat del.me
me@here:~>

I was impressed that cat was able to detect the first collision. I didn't
expect it. So I was half-expecting it to be able to handle the second, too,
and mildly disappointed when it couldn't.

A computing
professional can reasonably be expected to learn the
documented properties of the components with which he works.

You'd've thunk so, wouldn't you? But nowadays, many computing professionals
can't even work out which way up an email is supposed to go. Sigh.

David Wagner · Sep 20, 2006

Richard said:
David has a point that many of us probably don't remember when we first
learned strcat. But, being slower of thought than most other geniuses, I
was a slave to the docs when learning C - admittedly the Turbo C reference
manual rather than anything authoritative - and so I read up very carefully
on each function before trying to use it. Not always carefully enough,
alas, but that's humans for you.

I wonder if part of the issue here is that folks on this newsgroup are
not representative of the programmer population at large. The folks
on this newsgroup are incredibly knowledgeable about C. Most of the
folks who post here can be fairly characterized as C gurus -- you all
are 3 sigmas out. But many C programmers are not C gurus. Anyone can
design a language where it is _possible_ for gurus to program securely.
Anyone can design a language where it is possible for people who are 3
sigmas out and who have memorized the official language specification to
program securely. But the important challenge is to design a language
where most programmers -- say, programmers who are 1 sigma out, but who
aren't necessarily experts on the official C spec -- can build programs
securely. The real challenge is to design languages that maximize the
chances that programs built by ordinary mortals will be secure.

I think there is a temptation to assume that everyone in the world is
like you. There is a temptation to design the language to optimize for
a user audience who looks just like the designers. But that temptation
is dangerous, because the language designers (and the folks who post to
this newsgroup) are not representative of programmers at large.

Too many of the posts here seem to draw a false dichotomy: either you are
a C guru (of the level of folks who post here), or else you are a clueless
neophyte. But in reality, things are not black-and-white. There is
an enormous population of programmers who are not clueless neophytes,
but who are also not experts on the esoterics of the official C99 spec.
Just because they don't have all the corner cases of the spec memorized
doesn't mean that they are idiots. I submit that we should be thinking
about how to design our languages and libraries with that vast majority
of programmers in mind as our intended user audience. Most programmers
are well-intentioned but not infallible. Many programmers are expert
in one area or another, but not necessarily experts on the C spec.

I suggest that we should be thinking about how to design the language
and libraries to minimize the chances that these programmers will
inadvertently introduce security bugs and to maximize the chances that
the code they write will be secure. Among other things, this involves
choosing our APIs (both the semantics of the interfaces, and the names of
the interfaces) to minimize the cognitive burden, to make the semantics
as intuitive as possible, and to make the names help to remind you of
the actual semantics as much as possible. Maybe it's too late to make
good choices here for C. But even if we're stuck with the choices that
were made long ago, we should try to appreciate the costs of those choices
forthrightly -- not downplay or ignore them.

P.S. I'm not necessarily talking about making it easier to learn the
language. I'm talking about making it easier to avoid making mistakes
in the language, and about making it easier to avoid inadvertently
introducing bugs and security holes into your code. Well-chosen libraries
can help with that. Making the names match the semantics can help
with that. Choosing the semantics to be as intuitive as possible can
help with that. This is just a matter of good taste, good design,
and good engineering.

No, what I object to is Mr Hsieh's apparent assumption that anyone who
disagrees with him is either stupid or dishonest. Just because /his/
intuition leads him to a particular conclusion, that doesn't mean that
other people who are at least as bright and honest as he is will
necessarily be led to the same conclusion by /their/ intuition.

Understandable. I agree with you here. But I also think he has some
valid points that maybe people haven't fully appreciated, and so I wanted
to call out those points that do seem like they are worth discussing.

Richard Tobin · Sep 20, 2006

Richard Heathfield said:
me@here:~> cat del.me > del.me
cat: del.me: input file is output file
me@here:~> cat < del.me > del.me

I was impressed that cat was able to detect the first collision. I didn't
expect it. So I was half-expecting it to be able to handle the second, too,
and mildly disappointed when it couldn't.

The second case can make sense, or rather cat can't so easily tell
that it doesn't, because it doesn't open both the files. When cat's
input and output are both provided by the operating system, the source
and destination could be in the same file and yet not overlap.
Suppose that standard input is positioned 100 bytes from the end of a
file that's more than 200 bytes long, and standard output is
positioned at the beginning. I would expect the last 100 bytes to be
copied to the beginning, and this works on the machines I tried it on:

(cp junk junk1; (dd bs=100 count=2 >/dev/null ; cat) <junk1) >junk1

(junk should contain, say, 100 a's, 100 b's, and 100 c's. The cp is
necessary to get something in the file junk1 even though the shell
redirection empties it).

In your first case, cat is always copying from the beginning of the
file, so the source and destination are bound to overlap. I suppose
cat could examine the initial file offsets to determine that your
second case can't work either.
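
A minimal sketch of how a cat-like program might notice the first collision, assuming a POSIX system: compare the device and inode numbers that fstat() reports for the two descriptors. The helper name same_file is hypothetical, and this is not a claim about how any particular cat is implemented. It says nothing about file offsets, so by itself it cannot tell the non-overlapping case above from a real collision.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: true if both descriptors refer to the same file.
   Assumes POSIX fstat(); reports the error and returns false if either
   call fails. */
static bool same_file(int fd_in, int fd_out)
{
    struct stat in, out;

    assert(fd_in >= 0 && fd_out >= 0);
    if (fstat(fd_in, &in) != 0 || fstat(fd_out, &out) != 0) {
        perror("fstat");
        return false;
    }
    /* A file is identified by the pair (device, inode). */
    return in.st_dev == out.st_dev && in.st_ino == out.st_ino;
}
```

Because the check works on descriptors rather than names, it would apply equally to "cat del.me > del.me" and "cat < del.me > del.me"; what differs between the two is who opened the input.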

-- Richard

Michael Mair · Sep 20, 2006

David said:
Are you sure you really had zero intuitions or guesses about what
strcat() does before you read the manual? If I had told you that
strcat()'s semantics were to start up a flight simulator, play the
Yankee Doodle Dandy over the speakers, then erase every 7th file on the
filesystem, you wouldn't be surprised in the least? You wouldn't find
those semantics counterintuitive? If you say so, I'll believe you,
but if so, I doubt that your case is representative of the programmer
population at large. It's only natural to look at the name "strcat",
recognize that it is referring to concatenation of strings (anyone
who has used /bin/cat on Unix should know what "cat" is short for),
make a guess that maybe strcat() concatenates strings -- and then have
your guess be proven wrong. That's the sense in which strcat() doesn't
really match the natural intuition.

Of course, for those rare folks who approach strcat() with absolutely
zero intuitions, guesses, or preconceptions about its semantics before
reading the manual, it doesn't matter what semantics we assign to that
function, as long as we document them. But I would bet that the majority
of programmers start off with some intuitions or guesses about what a
function like strcat() does, just from its name, and if we violate those
intuitions, then that has a cost. It leads to programmer surprise, and
may lead to increased incidence of bugs. We shouldn't incur those kinds
of costs unthinkingly.

In principle, I agree with you.
However, I had not yet had a single English lesson when I first
encountered C, had certainly not heard of Unix, and had heard only
vaguely of MS-DOS.
I read the C book I had and tried to understand the explanations
in the appendix. As I mostly worked on system level, I did not use
much of the C standard library apart from sprintf() and some math
functions...
I was perfectly happy with the semantics I found because the
identifiers meant nothing at all to me.
Nowadays, I am _not_ perfectly happy if I find a thing that uses
the same name for semantics different from all other languages I
know. I accept it, though.

If there were a real need to change the semantics of the standard
library as is, then I'd support making names clear, throwing out
unnecessarily dangerous functions and lifting unnecessary
restrictions.
Adopting one or more sensible libraries as standardised add-ons
to a hosted implementation to provide convenient,
easy-and-safe-to-use standardised ways of doing certain things
would have my approval, too, even though some care would be
necessary to make sure that programmers do not lock themselves
into too small a set of implementations by choosing certain
libraries.

As it is, I do not expect C to evolve any more (at least not
in an accepted way) from "C95 plus most popular parts of C99".

Cheers
Michael

David Wagner · Sep 20, 2006

Douglas said:
There is a big difference between abstract values in some
mathematical model space and actual behavior where real
devices have to be used to implement some approximation to
that model. Naturally it is nicer the closer these match,
but in reality there are often choices to be made and
various trade-offs to be evaluated.

Well, it sounded to me like Mr. Hsieh was trying to argue that
those alleged tradeoffs have been mischaracterized: that the
claims about the disadvantages of nice semantics are inaccurate.

As you say, it would be nicer if the actual semantics of strcat()
matched the natural mathematical semantics. It's nicer, because the
closer the match between actual and expected semantics, the greater the
likelihood that the program will be correct; conversely, the greater the
mismatch, the greater the chance of inadvertent correctness bugs due to
programmer confusion. So, obviously, we would prefer nicer semantics
over less nice semantics, all else being equal.

strcat() uses not-so-nice semantics. As I understand it, the standard
explanation why is because the performance costs of the nice semantics
would be too high -- or so it is claimed, anyway. As I understand
Mr. Hsieh's point, he is disputing that claim. He is arguing that
in fact the performance costs of the nice semantics (compared to the
current un-nice semantics) for strcat() are negligible. He is saying
that his string implementation manages to both use nicer semantics and
get better performance than the current string library. In other words,
he is calling into question the design decisions made in the current
string library.

I think Mr. Hsieh raises some important points. Nitpicks about
the meaning of intuition seem to me to miss the point. Claims that
programmers should "just read the manuals" seem to me to miss the point.
These design choices have consequences, and trying to shift all the
blame to the programmer does not seem like a fully satisfactory response
to me.

(I'm reminded of several plane crashes triggered by human factors flaws
in the pilot's user interface. Plane manufacturers love to blame those
crashes on "pilot error", but sometimes the root cause is that the user
interface was poorly designed and most pilots couldn't have been expected
to get it right. Transferring blame onto the pilot may be the best way to
save the manufacturer money, but these blame transfers aren't always
the best way to make aviation safer.)

Now it may be that, due to legacy considerations, we are stuck with a
sub-optimal string library, and even though better solutions exist, it's
not practically viable to adopt these better solutions. It may be that
there is no hope for improving the C language in this regard. It may be
that, in retrospect, the C language has flaws that weren't recognized
at the time it was designed and that contribute to many security holes
and correctness bugs. All of those things may well be the case. If so,
I think it would be most intellectually honest to admit up front that the
criticisms of the C standard have some validity, even if you believe it
isn't viable to fix those deficiencies at this point.

websnarf · Sep 20, 2006

Well, I believe that you could reasonably argue that anyone who knows
enough about C to be a regular participant in this newsgroup has been
"indoctrinated". That makes it a perfect excuse for ignoring anything
anyone says about the matter that is inconsistent with your own
intuition.

Because we are seeing two clear categories of responses. Those that
just tell the obvious truth, and those that wish to continue to pursue
this ridiculous contrarian position as if aliasing is something
intertwined into people's intuition.

This categorization isn't just some delusion on my part -- I did not
define what I thought strcat(p,p) should do, yet many people figured it
out without any issue at all (it requires nothing more than honesty)
and on the other hand when I pressed the contrarians about what their
alias-based intuition is all about what do I see?

1) Claims that solutions that satisfy "Hsieh's intuition" (which is so
far still not explicitly declared in this thread, BTW) would be
"slower". Utterly false.
2) Claims that there is only one obvious implementation (and it can't
deal with aliasing) -- I gave one two posts ago that's pretty tight
that doesn't suffer any anti-intuition problem, and even satisfies the
current C specification.
3) Claims that "solving the problem" would require either special
detection or copying through auxiliary buffers.

Think about where your head has to be to be making such fallacious
statements. People who are indoctrinated by some ideology that is
false, can usually be exposed through any simple test of credibility
such as this. In Douglas A. Gwyn's case, his denial extended even
*past* the point where this was made clear (a case of unshakable
indoctrination). So I don't think my categorization is unjustified.

It seems to me that you've just called me dishonest, and implicitly
called me a liar. On what evidence are you basing your accusation that
I was lying when I wrote: "I didn't have any intuition about what
strcat() would do when I first heard about it. I read its
specification, and expected it to operate as specified."?

Did your first reading of the specification include a detailed
description about aliasing? Because that's the way I learned strcat
too -- I read the documentation. As I've pointed out either in this
thread or in others, the vast majority of documentation about strcat
*today* omits mention of the aliasing effect (msdn appears to be
updated with this information -- however this documentation is very new
relative to when I learned C.)

The first few lines of any description of strcat tells us: "The strcat
function appends a copy of the string pointed to by src (including the
terminating null character) to the end of the string pointed to by
dst", we scroll down to the example they give, and we quickly form an
idea about what this function does. Now if *they* were diligent and
*you* were diligent, then you can read about how aliasing can screw
you in more documentation, but that clearly comes *after* your initial
understanding.

I.e., your intuition should kick in before you think about aliasing,
which is treated as a big nasty corner case for the whole language (and
thus commonly omitted in the standard documentation).

Realigning people's intuition is a routine, everyday event. Your
intuition is not some unalterable platonic ideal that you somehow
become mystically aware of.

Yeah, well buffer overflows being added into code even to this day is
also an everyday event. I am suggesting there is a link between those
two things.

[...] It is merely the subconscious expectations
you have developed based upon your prior experience, which means that
it's constantly changing as you acquire new experiences. If you have
prior experience with a language where strings are first class objects,
you might reasonably develop an intuition that says that catenating two
strings creates a new string containing a copy of the second string
appended to a copy of the first string.

Actually having to *copy* the string is an implementation based
understanding. People are more likely going to think of strings in
terms of their *contents*. I.e., the contents of the first string,
appended with the second as a whole then is stored in the destination
variable. This is pretty much how I think about it in whatever
language I am using -- including the *first* language I learned.

[...] However, without such prior
experience, I don't see any reason why you'd have any particular
expectations about it.

When I first learned C, the three languages I already knew were Fortran
I (sic), APL, and Basic, in that order, with APL as my favorite of the
three.

For me it was BASIC, Fortran, Assembly, Pascal and Logo. I even *knew*
about the general aliasing problem because of assembly. (But assembly
is clearly in a special category -- you learn about aliasing from the
ground up.)

[...] It's been so long since I've done any Fortran I or Basic that I
don't remember how they handled the equivalent of strcat().

The Fortran language itself bans aliasing of any kind at the source
level. Basic can never have undefined, or truly bizarre behavior from
operations within its own syntax (Basic does not support any kind of
concept of pointers, beyond "peek" and "poke" which are obviously
extensions.)

[...] Fortran I
was far more primitive than even Fortran IV - I'm not sure it even had
string catenation. APL supports string catenation with precisely the
semantics described above, but it uses the catenate operator ',' to do
it, not a specially named function, so I came to C with no prior
expectations for what a function named "strcat()" would do. If I had
relied upon my APL background, I would have interpreted strcat("hello
", "world!") as a call to a unary function with a single argument
formed by the ',' operator by catenating those two strings, with
redundant parentheses around the argument. It's a good thing I didn't
rely on my intuition for such purposes.

Well, I pretty quickly saw the difference between syntax and semantics,
since no two languages I encountered in my early programming days seemed
to be the same. I'm sorry you let that subvert your intuition --
especially since there's no good reason for that to have occurred.

[...] strcat(a,a) is not something that
a reasonable C programmer would think of doing.

But it is something every reasonable programmer might think of doing.
Notice that the only real distinction is the word "C".

By using "but" you imply that you're accepting as true the statement
that you're responding to.

That's because "reasonable" is subjective, so it's pointless to argue
against the point directly.

[...] The combination of his statement and your
statement implies that there is no overlap between the sets of
"reasonable programmers" and "reasonable C programmers". Since the
latter set is a subset of the former set, having an empty overlap
implies that the second set is empty. Are you a C programmer?

So you've never seen a "reductio ad absurdum" argument before? Here:
http://en.wikipedia.org/wiki/Reductio_ad_absurdum

David R Tribble · Sep 20, 2006

kuyper said:
When I first learned C, the three languages I already knew were Fortran
I (sic), APL, and Basic, in that order, with APL as my favorite of the
three. It's been so long since I've done any Fortran I or Basic that I
don't remember how they handled the equivalent of strcat(). Fortran I
was far more primitive than even Fortran IV - I'm not sure it even had
string catenation. APL supports string catenation with precisely the
semantics described above, but it uses the catenate operator ',' to do
it, not a specially named function, so I came to C with no prior
expectations for what a function named "strcat()" would do.

It's been years (decades) since I wrote any BASIC code, but it
does allow string concatenation:
LET A$ = A$ + A$
or, in some variants:
LET A$ = CONCAT$(A$, A$)

Ironically, even though BASIC has its roots in FORTRAN,
the former treats strings more like first-class objects than
the latter. FORTRAN strings are essentially the same as
C strings, i.e., fixed-length arrays of characters. You might
say that FORTRAN strings are to C char[]'s as BASIC strings
are to C++ std::strings.

Another very important point to be made is that C and FORTRAN pass
strings by reference, whereas other languages pass strings by
value. Thus func(a) can affect the contents of 'a' in the former but
not in the latter languages. I.e., strcat(s, s) affects the value of
s, but CONCAT(A$, A$) does not change A$.
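
The contrast can be sketched in C itself. The function concat_new below is hypothetical (it is not part of the standard library); it assumes the caller wants BASIC-style value semantics and is willing to pay for an allocation. Because it never writes to its inputs, concat_new(s, s) is well defined, whereas strcat(s, s) is not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value-style concatenation: returns a newly allocated
   string holding a followed by b, or NULL if allocation fails.
   Neither input is modified. The caller frees the result. */
static char *concat_new(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* also copies b's terminating null */
    return out;
}
```

The standard strcat(), by contrast, writes through its first argument, so the caller's array changes and must already be large enough to hold the result.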

Which is the whole point: you can't use preexisting assumptions
from one language to guide you too deeply in learning a new one.

-drt

websnarf · Sep 20, 2006

Douglas said:
In fairness, he's arguing what he thinks it *should* be
(according to his notion, which he seems to think is the
only sensible or "honest" one).

One wonders whether he thinks

You know I'm still in the room.

[...] that a general matrix
multiplication function matmul(a,b,c,l,m,n) is stupid
if it doesn't work when invoked as matmul(a,a,a,n,n,n).

Actually in this case it's absolutely clear. Matrix multiplication is
of great concern to *non-programmers*. The mountain of steeped
intuition that you would be trying to undo by not supporting aliasing
is just enormous. Mathematics is not going to yield to computer
science (though * as multiplication has crept into publications, this
is clearly superficial) or more specifically bad computer
implementations. For example, in Fortran (the language many science
people start from) they already deal with the issue by disallowing
aliasing at the source level.

And once again, you are not thinking correctly about the performance.
The cost for getting it right is trivially small in comparison to
ignoring the aliasing case. In this case you really do want to detect
and copy -- but the cost of the main core is high enough that that's
just not going to make a difference.
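
A minimal sketch of the detect-and-copy idea, with several assumptions: the thread never defines matmul's argument order, so the signature here (output first, row-major storage) is hypothetical, and only exact aliasing is detected. Partially overlapping arrays are outside what this sketch handles.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature: c (l x n) = a (l x m) * b (m x n), row-major.
   Returns 0 on success, -1 if a temporary could not be allocated.
   Only exact aliasing (c == a or c == b) is detected. */
static int matmul(double *c, const double *a, const double *b,
                  size_t l, size_t m, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    if (l == 0 || n == 0)
        return 0;                      /* nothing to compute */

    int aliased = (c == a || c == b);
    double *out = c;

    if (aliased) {                     /* compute into a temporary */
        out = malloc(l * n * sizeof *out);
        if (out == NULL)
            return -1;
    }
    for (size_t i = 0; i < l; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < m; k++)
                sum += a[i * m + k] * b[k * n + j];
            out[i * n + j] = sum;
        }
    }
    if (aliased) {                     /* copy back, then release */
        memcpy(c, out, l * n * sizeof *out);
        free(out);
    }
    return 0;
}
```

The extra copy touches l*n elements, while the triple loop performs l*m*n multiply-adds, which is the sense in which the aliasing check is cheap relative to the main core.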

The conceptual issues are pretty much the same as for
string concatenation.

Uh -- no, you just wish this for some reason.

You can implement an aliasing safe strcat() *TODAY* even compliant with
the standard as it exists, at no cost at all. And yet people don't
because people accept the shabby state of the C standard for some
reason. It's apathy all around, and for no good reason.
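
One way such a function could look, as a sketch only: the name cat_alias_ok is hypothetical, and this is not necessarily the implementation referred to earlier in the thread. It measures both strings before writing anything and lets memmove deal with overlap. Note that it makes an extra pass over the source to find its length, so it does not by itself demonstrate the "no cost at all" claim.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical alias-tolerant append. Both lengths are measured before
   anything is written, so the result is as if src had been copied
   aside first. The caller must still provide room for
   strlen(dst) + strlen(src) + 1 bytes starting at dst. */
static char *cat_alias_ok(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t dlen = strlen(dst);
    size_t slen = strlen(src);

    memmove(dst + dlen, src, slen);   /* memmove tolerates overlap */
    dst[dlen + slen] = '\0';
    return dst;
}
```

With p holding "abcd" in a large enough array, cat_alias_ok(p, p) leaves "abcdabcd" in p, which matches the p = p + p reading of the call.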

Matrix multiplication is a completely different issue -- that's a
concern for people who start from their mathematical intuition and
*WILL NOT* bend their understanding no matter what you tell them. You could
put it in the standard if you wanted, but people would just balk at it
for sure. In this case people actually care (I mean subject to the
assumption that they care about C for some reason); you just wouldn't
get away with it.

kuyper · Sep 20, 2006

....
1) Claims that solutions that satisfy "Hsieh's intuition" (which is so
far still not explicitly declared in this thread, BTW)

I thought that you yourself had pretty explicitly stated the behavior
that you consider intuitive, in your message containing the header
"Date: 27 Jul 2006 14:00:07 -0700":

"Most people would intuitively think of this as simply replacing the
string with a doubled version of itself -- i.e., it's analogous to the
C++ expression p += p for std::string's (and to be honest, I don't know
if that's legal or not), or just p = p + p in most other programming
languages."

Did your first reading of the specification include a detailed
description about aliasing?

It was 1979, and I no longer have access to the precise text that I was
reading at the time to verify whether or not its description was
complete. However, I do know that I never even considered using
strcat() in the fashion that you've called intuitive, so I suspect that
I must have learned about the restrictions somewhere along the way, and
almost certainly from that text. I don't believe it was as a result of
trial and error learning; I've generally seen little point in using
strcat() - the in-place catenation it performs has seldom been what I
needed. I've generally found sprintf() more convenient for most of the
purposes I might otherwise have used strcat() for.

... Because that's the way I learned strcat
too -- I read the documentation. As I've pointed out either in this
thread or in others, the vast majority of documentation about strcat
*today* omits mention of the aliasing effect (msdn appears to be
updated with this information -- however this documentation is very new
relative to when I learned C.)

I can't speak of "the vast majority of documentation", I only have
access to a small variety of different sources of documentation, but
the documentation I have access to does describe the problem clearly
and accurately.

....

Well, I pretty quickly saw the difference between syntax and semantics,
since no two languages I encountered in my early programming days seemed
to be the same. I'm sorry you let that subvert your intuition --
especially since there's no good reason for that to have occurred.

There was no "subversion", just the accumulation of additional
experiences from which to form my expectations for future experiences.

av · Sep 20, 2006

Re: meaning of strcat(p,p)

char* strcat(char* a, char* b)
{char *p, *h;
if(a==0 || b==0) return 0;
for(p=a; *p; ++p) ;
h=p;
if(a==b) {la:;
for( ; b<h; ) *p++=*b++;
*p=0;
return a;
}
else while( *p++=*b++ )
if(b==a) goto la;
return a;
}

not tested...
0 for error
and "strcat(p,p);" is ok (if p has right mem space)
don't know for strcat(p, p+4); or strcat(p+4, p);

av · Sep 20, 2006

Re: meaning of strcat(p,p)

better this

char* strcat(char* a, char* b)
{char *p, *h;
if(a==0 || b==0) return 0;
for(p=a; *p; ++p) ;
h=p;
if(a==b) {for( ; b<h; ) *p++=*b++;
la:;
*p=0;
return a;
}
else while( *p++=*b++ )
if(b==h) goto la;
return a;
}

not tested...
0 for error
and "strcat(p,p);" is ok (if p has right mem space)
don't know for strcat(p, p+4); or strcat(p+4, p);
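
For what it's worth, here is a restructured sketch of the same stop-at-the-old-terminator idea, with the uncertain calls checked by assert. The names are hypothetical (strcat itself is reserved for the library), and the sketch assumes that any overlapping source ends at the destination's original terminator. A source living in the destination's buffer beyond that terminator is not handled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical rendering of the loop above. h marks a's original
   terminator; copying stops either at b's own terminator or when b
   reaches h, whichever comes first. Returns NULL for a null argument. */
static char *cat_stop_at_old_end(char *a, const char *b)
{
    char *p, *h;

    if (a == NULL || b == NULL)
        return NULL;
    for (p = a; *p != '\0'; ++p)
        ;
    h = p;                              /* original end of a */
    while (b != h && (*p = *b) != '\0') {
        ++p;
        ++b;
    }
    *p = '\0';
    return a;
}

/* The calls the post was unsure about, checked with assert. */
static void check_overlap_cases(void)
{
    char p[32] = "abcdefgh";
    cat_stop_at_old_end(p, p + 4);      /* source is a tail of dest */
    assert(strcmp(p, "abcdefghefgh") == 0);

    char q[32] = "abcdefgh";
    cat_stop_at_old_end(q + 4, q);      /* dest is a tail of source */
    assert(strcmp(q, "abcdefghabcdefgh") == 0);

    char r[32] = "abcd";
    cat_stop_at_old_end(r, r);          /* exact aliasing */
    assert(strcmp(r, "abcdabcd") == 0);
}
```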

av · Sep 20, 2006

Re: meaning of strcat(p,p)

better this

char* strcat(char* a, char* b)
{char *p, *h;
if(a==0 || b==0) return 0;
for(p=a; *p; ++p) ;
h=p;
if(a==b) {for( ; b<h; ) *p++=*b++;
la:;
*p=0;
return a;
}
else while( *p++=*b++ )

if(b==h) {--p; goto la;}

av · Sep 20, 2006

if(b==h) {--p; goto la;}
wrong

Douglas A. Gwyn · Sep 20, 2006

David said:
Another very important point to be made is that C and FORTRAN pass
strings by reference, whereas other languages pass strings by
value. Thus func(a) can affect the contents of 'a' in the former but
not in the latter languages. I.e., strcat(s, s) affects the value of
s, but CONCAT(A$, A$) does not change A$.

Yes, in C programming aliasing (storage) issues are important,
and an abstract-value model is of no help in resolving them.

James Antill · Sep 20, 2006

I haven't a clue what you think strcat(p, p) will do.

I'd assume that Paul is using strcat() to mean "string concatenation".
Where, I'd argue, most people expect an atomic concatenation. E.g.
given:

x = "abcd"
y = x + x
x += x

....I'd argue that most people expect x to have the same value as y.
Stupidly converting that to C, you get:

char x[9];
char y[9];

strcpy(x, "abcd");
sprintf(y, "%s%s", x, x);
strcat(x, x);

....but, of course, the last call is bad.

And, to be fair to Paul, I've seen my share of C code that does something
like:

strcpy(x, x + 1);
sprintf(x, "%s%d", x, foo);

....under similar ignorance of the C API specifications.
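
A sketch of how those two calls could be written without overlapping arguments. Both helper names are hypothetical; the only library calls used are memmove(), strlen() and snprintf(). The first relies on memmove() being defined for overlapping regions, the second on writing only at the existing terminator so the buffer is never read as a %s input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: drop the first character of s in place.
   memmove is defined for overlapping regions; strcpy(s, s + 1) is not. */
static void drop_first(char *s)
{
    assert(s != NULL);
    if (*s != '\0')
        memmove(s, s + 1, strlen(s + 1) + 1);   /* +1 moves the null too */
}

/* Hypothetical helper: append a decimal integer to the string in buf
   (cap bytes in total) without passing buf as both output and input.
   Returns 0 on success, -1 if the result would not fit. */
static int append_int(char *buf, size_t cap, int value)
{
    assert(buf != NULL);

    size_t len = strlen(buf);
    if (len >= cap)
        return -1;
    int n = snprintf(buf + len, cap - len, "%d", value);
    return (n >= 0 && (size_t)n < cap - len) ? 0 : -1;
}
```

When append_int() reports -1 the buffer may hold a truncated number after the original text, so a caller that cares would restore the terminator at the old length.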
