Bounds checking functions

CBFalconer · Feb 27, 2008

Randy said:
CBFalconer wrote
.... snip ...

Not to mention having a name (starting with str) that is not to
be used if not in the standard. Apparently arguing about this
only counts when used by functions that folks don't think should
be part of standard C, because they get flagged over it, but for
other functions, like strlcpy() nobody seems to object.

My release has documentation mentioning that problem, and what to
do to comply.

muntyan · Feb 27, 2008

(e-mail address removed) wrote:

char buf[MAX_PATH];
strcpy(buf, dir);
strcat(buf, file);
The latter seems worse, but the real better alternative is to avoid both,
and for that you shouldn't use that infamous buf[SMART_SIZE] in the first
place. And that's the only case where "safe" functions can really help,
when you can do sizeof(buf).


This scenario accounts for the vast majority of cases where I handle
"strings". 8, 64, 255, 1024; these are magic numbers that litter innumerable
RFCs and other standards. In many cases it's possible to be given input which
doesn't fit the constraint, and often it's OK to reject the input. But, point
being, I size char buffers in structures using these values. Having a
statically sized buffer of 64 or 255 bytes, or even 1024, is usually easier
and faster and, probably, safer than dynamically managing that particular
piece of memory. All things being equal, less code is safer code.

Now, much of the time I already know the size of the input before copying.
But there are often cases where the design has an [intentional] gap, and
you're passed a string without a size, often at the junction where a library
or component exposes itself to code usage which expects a more canonical
string interface--just a pointer. In such instances, strlcpy is priceless.

I guess if you do have a fixed size buffer, then yes, you want to be sure
you don't write more than its size. But I doubt there are many cases where
you can do with a fixed size buffer. For example: MAX_PATH has a bogus
value on at least one major implementation, and so you can't have MAX_PATH
as an upper bound for filename length. Or LINE_MAX - won't work on Windows
with a file which has Unix line endings (and so the *user* will have to
fix it).
But I don't have much experience in different domains, and YMMV.

The utility of strlcpy is tacitly recognized and reflected in the signature
of C99's snprintf.
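
For comparison, a small sketch of how snprintf()'s size parameter and return value can be used for the dir/file case quoted earlier. The helper names `join_path` and `join_path_fixed` are invented for this example; only snprintf(), malloc() and their C99 semantics are assumed.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * join_path: hypothetical helper. Builds "dir/file" in a freshly allocated
 * buffer sized from snprintf()'s return value. Returns NULL on failure;
 * the caller frees the result.
 */
char *join_path(const char *dir, const char *file)
{
    /* C99: with a zero size nothing is written, but the length that
       would have been written (excluding the NUL) is still returned. */
    int need = snprintf(NULL, 0, "%s/%s", dir, file);
    if (need < 0)
        return NULL;

    char *buf = malloc((size_t)need + 1);
    if (buf == NULL)
        return NULL;

    snprintf(buf, (size_t)need + 1, "%s/%s", dir, file);
    return buf;
}

/* Fixed-buffer variant: returns 0 if the result fit, -1 if truncated. */
int join_path_fixed(char *buf, size_t size, const char *dir, const char *file)
{
    int n = snprintf(buf, size, "%s/%s", dir, file);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Like strlcpy(), snprintf() takes the destination size and reports the length it wanted to write, so truncation is detected by comparing the two.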

I agree. Particularly the notion that "bounds checking" is some sort of
exceptional or uncharacteristic quality of general programming hygiene.

These new interfaces certainly don't do bounds checking for you. They merely
alleviate a small part of the burden in some circumstances.

Well... lie or not, it's something to aspire to. That sort of defeatist (as
opposed to pragmatic) attitude can't be helpful.

Um, I believe it's a perfectly pragmatic attitude that it's hard to write
correct code. Programming is hard. It's not defeatist. Defeatist is saying
"I won't write correct code anyway, so I'll be as careless as I like".

Certainly it sounds a bit
presumptive. Knuth and Bernstein haven't written many checks. It goes without
saying that nobody's perfect, though.

Note that I said two things there: "here" and "not hard".
If you quote Knuth saying in comp.lang.c how it's not hard
to write programs which handle strings in C (that is, write
correct code), I'll take my words back.

Yevgen

muntyan · Feb 27, 2008

When did I rant about a Microsoft proposal? A simple link will do.

Then why not just introduce them as an open source library that
provides these wrappers? If they wanted them to be widely adopted and
quickly, this would be out there. Where can this library be
downloaded?

Good one!

jacob navia · Feb 27, 2008

Good one!

lcc-win implemented most of it. I could put this in the public
domain.

The safe string library from Microsoft was open source.

Randy Howard · Feb 27, 2008

lcc-win implemented most of it. I could put this in the public
domain.

The safe string library from Microsoft was open source.

Oh good, the problem is solved then. So why are you complaining about
it?

Micah Cowan · Feb 27, 2008

William said:
Knuth and Bernstein haven't written many checks.

http://en.wikipedia.org/wiki/Knuth_reward_check

....As of March 2005, the total value of the checks signed by Knuth was
over $20,000...

Still, the fact that he's confident enough to offer the cash reward in
the first place is a pretty big deal.

I agree with Yevgen's general point that it is far too difficult to
write correct C programs. Even doing it 80-90% of the time, as most
regulars here can probably manage, is itself a noteworthy accomplishment.

I think that part of being a good programmer, then, is to limit the
opportunities you have to make those mistakes. Set up frameworks to do
all the "good habit" stuff for you, so that you don't have to be
constantly avoiding "bad habit" stuff yourself (if you have to avoid a
mistake 999 times, the 1000th time you may fail to avoid it). This is
why, when it matters, many programs and packages will use their own
string-handling frameworks that do exactly that. The better you
encapsulate/hide away the details of managing buffer sizes, resizing,
concatenation, comparison, etc, the more you can focus on doing other
things.

All that being said, I fail to see how strlcpy() or strcpy_s() help the
matter much. They aren't appreciably easier to use correctly, by which I
mean that they are approximately as prone to "bad habit" problems as
strcpy() is. They certainly don't hide the details of managing buffer
sizes, and you still have that opportunity to mess up on that 1000th
time you use it. And I certainly resent being told otherwise, in the
form of silly linker diagnostics, when I choose to use the more standard
of these all-unsafe facilities.

William Ahern · Feb 27, 2008

Micah Cowan said:
...As of March 2005, the total value of the checks signed by Knuth was
over $20,000...

But how many of those are for MiX and other errors in his books? I meant to
refer to things like TeX, parts of which are written in C.

Still, the fact that he's confident enough to offer the cash reward in
the first place is a pretty big deal.

I agree with Yevgen's general point that it is far too difficult to
write correct C programs. Even doing it 80-90% of the time, as most
regulars here can probably manage, is itself a noteworthy accomplishment.

But I fail to see how that's an argument _against_ including an interface
like strlcpy?

Writing a "hello world" program is harder in C than in Bourne shell, and
harder still in an assembly language.

On the flip side, in a simple "hello world" program string handling doesn't
predominate, and you may very well have persuasive reasons for writing it in
C rather than in Bourne shell.

I think that part of being a good programmer, then, is to limit the
opportunities you have to make those mistakes. Set up frameworks to do
all the "good habit" stuff for you, so that you don't have to be
constantly avoiding "bad habit" stuff yourself (if you have to avoid a
mistake 999 times, the 1000th time you may fail to avoid it). This is
why, when it matters, many programs and packages will use their own
string-handling frameworks that do exactly that. The better you
encapsulate/hide away the details of managing buffer sizes, resizing,
concatenation, comparison, etc, the more you can focus on doing other
things.

I agree. strlcpy(), though, fills in inevitable gaps between the standard
interfaces, traditional string handling, and whatever design or manner of
approaching the issue one takes. Seems to me that's as good a reason as any
to include strlcpy(). On top of the fact, and more to the point, that it
encapsulates the _minimal_ exact code one would normally and rightly employ
in these situations.

All that being said, I fail to see how strlcpy() or strcpy_s() help the
matter much. They aren't appreciably easier to use correctly, by which I
mean that they are approximately as prone to "bad habit" problems as
strcpy() is. They certainly don't hide the details of managing buffer
sizes, and you still have that opportunity to mess up on that 1000th
time you use it.

That's an impossible criterion. No C library, IMO, can hide the details of
buffer (aka memory, aka resource) management in C, and it's not clear to me
that off-by-ones are substantially more of an issue than NULL or dangling
pointers. They can only grease the wheels, so to speak. That is, better
weave the patterns into your code. Encapsulation being one important way to
accomplish that. But there are many levels of encapsulation, and many/most
string libraries force you to too high a level of encapsulation for what it's
worth in many instances; rather than encapsulate they obfuscate.

user923005 · Feb 27, 2008

That should teach you something. You started with C...
The problem, santosh, as I have been saying in ALL these threads for
ages, is that you can be a "good programmer" only 80-90%
of the time.

Since you are human, you will be always limited by the borders of your
circuit, the human circuit. This circuit can do things that computers
can't do, but it CAN'T do the things computers do, since it is a
DIFFERENT kind of circuit.

Specifically, the human circuit is NOT able to NEVER make a mistake,
which is what computers ALWAYS do. They NEVER make "mistakes", they always
do EXACTLY what they are told to do.

This basic fact of software engineering is IGNORED by the "regulars"
here that always boast of their infallible powers.

Have you ever heard of MTBF?
http://www.computerhope.com/jargon/m/mtbf.htm

Disks have MTBF, memory has MTBF, CPUs have MTBF. Each and every
component in the computer that produces calculations eventually
fails. Sometimes the mistakes are caught and corrected (e.g. Reed-
Solomon algorithm). Sometimes the mistakes can be caught but not
repaired. And sometimes the mistakes are not even caught.

And let's not forget the Pentium fiasco:
http://www.ddj.com/184410254

user923005 · Feb 27, 2008

http://en.wikipedia.org/wiki/Knuth_reward_check

...As of March 2005, the total value of the checks signed by Knuth was
over $20,000...

Still, the fact that he's confident enough to offer the cash reward in
the first place is a pretty big deal.

I have a check from Knuth for $2.88 and it is one of my most prized
possessions. I have it framed, and it sits over my desk at work.

I agree with Yevgen's general point that it is far too difficult to
write correct C programs. Even doing it 80-90% of the time, as most
regulars here can probably manage, is itself a noteworthy accomplishment.

I think that part of being a good programmer, then, is to limit the
opportunities you have to make those mistakes. Set up frameworks to do
all the "good habit" stuff for you, so that you don't have to be
constantly avoiding "bad habit" stuff yourself (if you have to avoid a
mistake 999 times, the 1000th time you may fail to avoid it). This is
why, when it matters, many programs and packages will use their own
string-handling frameworks that do exactly that. The better you
encapsulate/hide away the details of managing buffer sizes, resizing,
concatenation, comparison, etc, the more you can focus on doing other
things.

I think that things that could be done to make C safer are probably a
good idea in the long run. Who doesn't want to remove gets() from C?

All that being said, I fail to see how strlcpy() or strcpy_s() help the
matter much. They aren't appreciably easier to use correctly, by which I
mean that they are approximately as prone to "bad habit" problems as
strcpy() is. They certainly don't hide the details of managing buffer
sizes, and you still have that opportunity to mess up on that 1000th
time you use it. And I certainly resent being told otherwise, in the
form of silly linker diagnostics, when I choose to use the more standard
of these all-unsafe facilities.

I think that software reuse is one of the better ways to reduce
defects. That is because:
1. The product is probably debugged fairly well in the first place if
you are reusing it.
2. Using a tool in a variety of settings tends to increase the
robustness because it gets tested even more thoroughly.

In C, the primary method of reuse is the library.

Gordon Burditt · Feb 27, 2008

After seeing the Secure version I/O functions thread, it occurred to me
that maybe not everyone agrees with the almost universal adage that I
have heard. I have always been told that using things like strlcpy and
other explicitly bounded functions was better than using the
non-bounded versions like strcpy.

strlcpy() is not a solution to all the world's problems. A mandate
to use it instead of strcpy() will likely result in some lazy
programmer doing something like:

#define strcpy(d,s) strlcpy(d,s,strlen(s))

which just keeps all the problems of using strcpy().
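
To make the point concrete, here is a hedged sketch of what such a macro does. `sketch_strlcpy` is a hypothetical stand-in with strlcpy()-like semantics (strlcpy itself is not standard C), and `LAZY_COPY` and `lazy_demo` are names invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in with strlcpy()-like semantics; an assumption, not a real API. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* The "lazy" replacement from the post: the bound comes from the source,
   so the size of the destination is never consulted. */
#define LAZY_COPY(d, s) sketch_strlcpy((d), (s), strlen(s))

/* Copies "hello" via the lazy macro; out must hold at least 16 bytes. */
size_t lazy_demo(char *out)
{
    return LAZY_COPY(out, "hello");
}
```

With a bound of strlen(s), the copy still writes strlen(s) bytes however small the destination is, and it also drops the last character of every string. The bound only helps when it is derived from the destination, e.g. sizeof buf.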

And if you've already length-checked your input (along with whatever
else is required, like checking it against a regular expression,
removing extraneous blanks, verifying a valid area code, spell-checking
it, rejecting submissions with bad 4-letter words (like "RIAA" or
"Gore"), etc.), using strlcpy() is just redundant and likely
inefficient.

Micah Cowan · Feb 27, 2008

William said:
But how many of those are for MiX and other errors in his books? I meant to
refer to things like TeX, parts of which are written in C.

Are they? I don't think he wrote any of TeX at all in C. He wrote it all
in Pascal (or, to be more accurate, he wrote it in WEB, which compiles
to Pascal).

The resulting Pascal, these days, is generally fed to a program that
compiles _that_ to C. Parts of TeTex may have been written in CWEB,
maybe, but not by him.

(I didn't mean any of that to imply that it's safer _because_ he wrote
it in Pascal rather than C, I was just disputing whether it was the case.)

Writing a "hello world" program is harder in C than in Bourne shell, and
harder still in an assembly language.

On the flip side, in a simple "hello world" program string handling doesn't
predominate, and you may very well have persuasive reasons for writing it in
C rather than in Bourne shell.

I don't think anyone was talking about how much work it might be to code
in C. What we were talking about was how hard it is to code _safely_ in
C. It's an entirely different question. I don't think a "Hello world"
program's safety is appreciably harder to achieve in C than it is in sh.
More complex programs are a different issue.

And of course there are reasons for choosing C over other implementation
platforms (if that weren't the case, would I be a C programmer?).

I agree. strlcpy(), though, fills in inevitable gaps between the standard
interfaces, traditional string handling, and whatever design or manner of
approaching the issue one takes. Seems to me that's as good a reason as any
to include strlcpy(). On top of the fact, and more to the point, that it
encapsulates the _minimal_ exact code one would normally and rightly employ
in these situations.

Well, no, it doesn't. strcpy() plus a buffer check does. strlcpy() adds
one more thing: copying what it can of src to dst, regardless of whether
there was enough space for all of it, or whether that's what was wanted.

This has _never_ been what I want (usually, like Yevgen, I want to
allocate more space). I can't say it will never _be_ what I want, and I
know it's sometimes what others (apparently, including yourself) have
needed. Constrained by output limits is a legitimate case. Constrained
by input limits, IMO isn't a good one ("be liberal in what you accept").

Even with your example of RFC limits, most such limits are within the
context of mechanisms that provide ways to represent entities that do
not match those constraints. For instance, if I need to force arbitrary
text files to meet the constraints of RFC 2822, I may be using a fixed
line-buffer size, but I'm sure as hell not using strlcpy() to meet that
constraint. I'd be using quoted-printable or somesuch, instead.

And even if I'm writing an old-style tarfile with fixed block sizes and
a maximum filename length, I'd _still_ probably want to ensure I
generate a unique filename, rather than blindly truncating it.

In short, I rarely want to truncate, and when I _do_, I rarely want to
do it naively (as strlcat() will do).

I'm not against its inclusion, I just think its utility has been _way_
overblown.

And none of this has anything to do with the OP's actual question, which
was whether he'd been misled when people told him to always use
strlcpy() in preference to strcpy(). To which the answer, hopefully
obvious by now, is _yes_, he was misled. The utility of strcpy() is
_far_ more general than that of strlcpy().

And, while strlcpy() may be better than strcpy() for those limited
situations where you want a naive truncation (and don't mind its limited
portability), I don't see any basis for the claim that strlcpy() is
_safer_ than strcpy() (which, after all, is the basis for the claim that
you should always use it in preference to strcpy()). It is precisely as
easy to remember to use strlcpy() instead of strcpy(), as it is to
remember to check the buffer size before you strcpy() (the latter,
though, still gives you more options about what to do after the check
fails).
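
A minimal sketch of the "check, then strcpy()" approach described here, assuming the caller would rather allocate than truncate. The helper name `copy_or_grow` and its `allocated` out-parameter are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * copy_or_grow: hypothetical helper. Copies src into the caller's fixed
 * buffer when it fits; otherwise allocates exactly enough space instead of
 * truncating. Sets *allocated so the caller knows whether to free().
 * Returns NULL only if allocation fails.
 */
char *copy_or_grow(char *fixed, size_t fixed_size, const char *src,
                   int *allocated)
{
    size_t need = strlen(src) + 1;      /* include the terminating NUL */

    *allocated = 0;
    if (need <= fixed_size) {
        strcpy(fixed, src);             /* safe: size was checked above */
        return fixed;
    }

    char *heap = malloc(need);
    if (heap == NULL)
        return NULL;
    memcpy(heap, src, need);
    *allocated = 1;
    return heap;
}
```

Nothing is copied until the decision is made, which is the extra option the check-first style leaves open.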

That's an impossible criterion. No C library, IMO, can hide the details of
buffer (aka memory, aka resource) management in C

struct allocator {
    void *(*a_malloc)(void *, size_t);
    void *(*a_realloc)(void *, void *, size_t);
    void (*a_free)(void *, void *);
    void *data; /* allocator state, presumably passed as the first argument */
};

struct str *str_new(struct allocator *);
struct str *str_cat(struct allocator *, struct str *, struct str *);
void str_del(struct allocator *, struct str **);

.... etc, etc. I imagine there'd actually be versions of these same
functions that don't take the initial allocator, and just use a default one.

IMO, C++'s string classes (and many others in the standard C++ library)
handle the allocation problem in a quite general and elegant manner.
Surely a C library could emulate something similar, even if the syntax
were somewhat clunkier?
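
One way such an interface might be fleshed out, purely as a sketch. The layout of `struct str`, the `default_allocator` object and the `str_append` function are assumptions made for illustration; they are not taken from any existing library, and the outline above does not specify them.

```c
#include <stdlib.h>
#include <string.h>

struct allocator {
    void *(*a_malloc)(void *, size_t);
    void *(*a_realloc)(void *, void *, size_t);
    void  (*a_free)(void *, void *);
    void  *data;                 /* passed back as the first argument */
};

/* Default allocator: forwards to the C library and ignores data. */
static void *def_malloc(void *d, size_t n)           { (void)d; return malloc(n); }
static void *def_realloc(void *d, void *p, size_t n) { (void)d; return realloc(p, n); }
static void  def_free(void *d, void *p)              { (void)d; free(p); }

struct allocator default_allocator = { def_malloc, def_realloc, def_free, NULL };

struct str {
    char  *buf;   /* always NUL-terminated */
    size_t len;   /* length excluding the NUL */
};

/* Creates an empty string, or returns NULL on allocation failure. */
struct str *str_new(struct allocator *a)
{
    struct str *s = a->a_malloc(a->data, sizeof *s);
    if (s == NULL)
        return NULL;
    s->buf = a->a_malloc(a->data, 1);
    if (s->buf == NULL) {
        a->a_free(a->data, s);
        return NULL;
    }
    s->buf[0] = '\0';
    s->len = 0;
    return s;
}

/* Appends the C string src (which must not point into s) to s, growing
   the buffer as needed. Returns 0 on success, -1 on allocation failure. */
int str_append(struct allocator *a, struct str *s, const char *src)
{
    size_t add = strlen(src);
    char *p = a->a_realloc(a->data, s->buf, s->len + add + 1);
    if (p == NULL)
        return -1;               /* s is left unchanged */
    memcpy(p + s->len, src, add + 1);
    s->buf = p;
    s->len += add;
    return 0;
}

/* Frees *sp and sets it to NULL so the caller's pointer cannot dangle. */
void str_del(struct allocator *a, struct str **sp)
{
    if (*sp != NULL) {
        a->a_free(a->data, (*sp)->buf);
        a->a_free(a->data, *sp);
        *sp = NULL;
    }
}
```

The caller never computes a buffer size, and str_del() taking a pointer to the pointer is what lets it clear the caller's copy.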

and it's not clear to me
that off-by-ones are substantially more of an issue than NULL or dangling
pointers.

Both of which can be solved fairly gracefully (to the degree they can be
solved in C) by a library with an interface such as the one I've
outlined. And off-by-ones are a pretty small subset of buffer-size
violations. Forgetting to check, using the size variable for the wrong
buffer, forgetting to initialize the size variable, are all common
mistakes. Most of these can also be solved by a general library; none of
them are solved by using strlcpy() (except "forgetting to check", but as
already mentioned, this isn't a solution, it's an indirection. Instead
of forgetting to check buffer size, it becomes forgetting to use strlcpy()).

They can only grease the wheels, so to speak. That is, better
weave the patterns into your code. Encapsulation being one important way to
accomplish that. But there are many levels of encapsulation, and many/most
string libraries force you to too high a level of encapsulation for what it's
worth in many instances; rather than encapsulate they obfuscate.

No argument there.

And I'm not saying that such a library should ever be part of the C
standard (though it might not be terrible, if done as carefully as C++
has done); what I _am_ saying, is that it would go a long way towards
solving the general issue with bounds checking, whereas strlcpy() is
only claimed to do so.

CBFalconer · Feb 28, 2008

Micah said:
.... snip ...

In short, I rarely want to truncate, and when I _do_, I rarely
want to do it naively (as strlcat() will do).

I'm not against its inclusion, I just think its utility has
been _way_ overblown.

And what is the better result? If you want to remove the leading
part of the output string, you can do that. If you want to provide
a larger buffer, you can do that. If you want to truncate, you
already did that.

CBFalconer · Feb 28, 2008

Gordon said:
.... snip ...

strlcpy() is not a solution to all the world's problems. A
mandate to use it instead of strcpy() will likely result in some
lazy programmer doing something like:

#define strcpy(d,s) strlcpy(d,s,strlen(s))

which just keeps all the problems of using strcpy().

Please avoid stripping attributions.

No, the lazy programmer can't do that. strcpy returns d. strlcpy
returns the string length strlcpy attempted to create. If it is
less than the strlen(s) parameter above, all is well. Otherwise
you know exactly how big a string object is needed. Same for
strlcat.

Micah Cowan · Feb 28, 2008

CBFalconer said:
Micah Cowan wrote:
... snip ...

(I had actually meant strlcpy(), there, but the same applies to strlcat().)

And what is the better result? If you want to remove the leading
part of the output string, you can do that.

Which is no different from what already exists.

If you want to provide
a larger buffer, you can do that.

Which is no different from what already exists. Except that, with
strlcpy(), I've wasted time copying to a buffer I didn't want to copy
into in the first place.

If you want to truncate, you
already did that.

As I've already indicated, and you quoted, I have never _wanted_ to do
that, and of the times I might want to do that, it's likely I may not
wish to do it the way strlcpy() does.

"If I don't want to truncate, I..." oh. Well I'm SOL.

Look, what do I care if you like it, find it useful, and want to use it?
I don't, and I think I've made more than a strong enough case to
justify arguing in my previous post that insisting I should is awfully
presumptuous.

And if insisting that I find it useful is presumptuous, then demanding
that I use it in all cases in preference over strcpy() (which, recall,
is the topic of this discussion) is hopelessly deluded.

Richard Heathfield · Feb 28, 2008

Micah Cowan said:

http://en.wikipedia.org/wiki/Knuth_reward_check

...As of March 2005, the total value of the checks signed by Knuth was
over $20,000...

My own cheque is for $2.56. If that's average (some are for more, and some
for less), then that's about 8000 cheques over, what, forty years? Around
200 mistakes a year - fewer than one per day. If we count that as an error
metric, then, I'd call it "very few". Who else amongst us can make so few
mistakes?

Still, the fact that he's confident enough to offer the cash reward in
the first place is a pretty big deal.
Right!

I agree with Yevgen's general point that it is far too difficult to
write correct C programs. Even doing it 80-90% of the time, as most
regulars here can probably manage, is itself a noteworthy accomplishment.

It's probably truer to say that it's difficult to write correct programs,
no matter what the language. C is pretty simple as languages go.

<snip>

Nick Keighley · Feb 28, 2008

I think that things that could be done to make C safer are probably a
good idea in the long run. Who doesn't want to remove gets() from C?

As I don't use gets() I don't care.

me too

I think that software reuse is one of the better ways to reduce
defects.
http://en.wikipedia.org/wiki/Ariane_5_Flight_501

That is because:
1. The product is probably debugged fairly well in the first place if
you are reusing it.
2. Using a tool in a variety of settings tends to increase the
robustness because it gets tested even more thoroughly.

In C, the primary method of reuse is the library.

yes, great. All apple pie and motherhood. What does this have to
do with strlcpy() or strcpy_s()?

--
Nick Keighley

"To every complex problem there is a simple solution... and it is
wrong."
-- Turski

CBFalconer · Feb 28, 2008

Micah said:
(I had actually meant strlcpy(), there, but the same applies to
strlcat().)

Which is no different from what already exists.

Yes it is different. strcpy and strcat will write to unowned
memory, and blow something else up. strncpy will not always
terminate the string correctly, and will always waste time zero
filling. Also their return value is different. Note that strlcpy
and strlcat return the size of string they attempted to create, not
a char*, and that you decide about success by comparing that return
with the length you told them they had available.
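
The strncpy() behaviour mentioned here can be seen in a short sketch using only standard C. The function names are invented for the example.

```c
#include <string.h>

/*
 * strncpy_is_terminated: copies src into an 8-byte buffer with strncpy()
 * and reports whether the result is NUL-terminated (1) or not (0).
 * out must point to at least 8 bytes.
 */
int strncpy_is_terminated(char out[8], const char *src)
{
    memset(out, 'x', 8);          /* make untouched bytes visible */
    strncpy(out, src, 8);

    /* Terminated only if a NUL appears somewhere in the 8 bytes. */
    return memchr(out, '\0', 8) != NULL;
}

/* The usual manual fix: copy size - 1 bytes and terminate explicitly. */
void strncpy_terminated(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```

A short source gets the rest of the buffer zero-filled, while a source of 8 or more bytes leaves no terminator at all.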

Which is no different from what already exists. Except that,
with strlcpy(), I've wasted time copying to a buffer I didn't
want to copy into in the first place.

As I've already indicated, and you quoted, I have never _wanted_
to do that, and of the times I might want to do that, it's
likely I may not wish to do it the way strlcpy() does.

"If I don't want to truncate, I..." oh. Well I'm SOL.

Not if you use strlcpy/cat. Then you find out things didn't fit,
and the size needed to allow them to fit.

Look, what do I care if you like it, find it useful, and want to
use it? I don't, and I think I've made more than a strong enough
case to justify arguing in my previous post that insisting I
should is awfully presumptuous.

Nobody is insisting on anything. Your posts indicate that you have
missed some of the advantages available. I am trying to fill in
those gaps.

Randy Howard · Feb 28, 2008

Micah Cowan said:

My own cheque is for $2.56. If that's average (some are for more, and some
for less), then that's about 8000 cheques over, what, forty years? Around
200 mistakes a year - fewer than one per day. If we count that as an error
metric, then, I'd call it "very few". Who else amongst us can make so few
mistakes?

Right!

It's also a very low expense over all those years for hiring a large
number of very dedicated reviewers. That, or he wanted to make sure
people paid attention for that reason alone. Either way, it was and
remains highly effective.

Richard Bos · Feb 28, 2008

CBFalconer said:
And what is the better result?

I don't know, but I bet it can be achieved by using the perfectly
Standard strncat().

Richard
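
For reference, one common way to get a bounded, always-terminated copy out of standard strncat() looks roughly like this. The wrapper name `copy_truncating` is hypothetical.

```c
#include <string.h>

/*
 * copy_truncating: bounded copy using only standard C. Starting from an
 * empty string, strncat() appends at most size - 1 bytes of src and always
 * writes a terminating NUL. Returns 1 if src was truncated, else 0.
 */
int copy_truncating(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return 1;                 /* no room even for the NUL */
    dst[0] = '\0';
    strncat(dst, src, size - 1);
    return strlen(src) >= size;
}
```

Unlike strlcpy(), strncat() returns the destination pointer, so detecting truncation takes the extra strlen() call.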

Micah Cowan · Feb 28, 2008

Randy said:
It's also a very low expense over all those years for hiring a large
number of very dedicated reviewers. That, or he wanted to make sure
people paid attention for that reason alone. Either way, it was and
remains highly effective.

All the lower, considering the majority of them are never cashed.
Heathfield, certainly, was not dumb enough to cash his.

That'd be akin to selling an olympic medal, only one that's worth a mere
$0x100.
