managed string library

Robert Seacord

The SEI has published CMU/SEI-2006-TR-006 "Specifications for Managed
Strings" and released a "proof-of-concept" implementation of the managed
string library.

The specification, source code for the library, and other resources
related to managed strings are available for download from the CERT web
site at:

http://www.cert.org/secure-coding/managedstring.html

The following is a brief summary of the managed string library:

The managed string library was developed in response to the need for a
string library that can improve the quality and security of newly
developed C-language programs while eliminating obstacles to widespread
adoption and possible standardization. As the name implies, the managed
string library is based on a dynamic approach; memory is allocated and
reallocated as required. This approach eliminates the possibility of
unbounded copies, null-termination errors, and truncation by ensuring
that there is always adequate space available for the resulting string
(including the terminating null character). The one exception is if
memory is exhausted; that is treated as an error condition. In this way,
the managed string library accomplishes the goal of indicating either
success or failure. The managed string library also protects against
improper data sanitization by (optionally) ensuring that all characters
in a string belong to a predefined set of "safe" characters.
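To make the dynamic approach described above concrete, here is a minimal sketch of a growable string in plain C. The `dstr` type and the `dstr_init`/`dstr_append` names are hypothetical and chosen only for illustration; this is not the managed string library's API. It assumes the appended source does not point into the destination buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration only; NOT the managed string API. */
typedef struct {
    char  *data;  /* null-terminated buffer */
    size_t len;   /* length, excluding the terminator */
    size_t cap;   /* bytes allocated, including the terminator */
} dstr;

/* Initialize to the empty string. Returns 0 on success, -1 if out of memory. */
int dstr_init(dstr *s)
{
    assert(s != NULL);
    s->data = malloc(1);
    if (s->data == NULL)
        return -1;
    s->data[0] = '\0';
    s->len = 0;
    s->cap = 1;
    return 0;
}

/* Append a C string, growing the buffer as needed so the result is never
 * truncated. Returns 0 on success; returns -1 and leaves *s unchanged if
 * memory is exhausted. Assumption: src does not point into s->data. */
int dstr_append(dstr *s, const char *src)
{
    assert(s != NULL && s->data != NULL && src != NULL);
    size_t n = strlen(src);
    if (n >= SIZE_MAX - s->len)
        return -1;                      /* size computation would overflow */
    size_t need = s->len + n + 1;       /* +1 for the terminating null */
    if (need > s->cap) {
        size_t newcap = s->cap;
        while (newcap < need)
            newcap = (newcap > SIZE_MAX / 2) ? need : newcap * 2;
        char *p = realloc(s->data, newcap);
        if (p == NULL)
            return -1;                  /* original buffer is still valid */
        s->data = p;
        s->cap = newcap;
    }
    memcpy(s->data + s->len, src, n + 1);  /* copies the terminator too */
    s->len += n;
    return 0;
}
```

The only failure a caller has to handle is allocation failure, which matches the "success or failure" goal in the summary. The caller releases the buffer with `free(s.data)`.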

rCs

--
Robert C. Seacord
Senior Vulnerability Analyst
CERT/CC

Work: 412-268-7608
FAX: 412-268-6989
 
websnarf

Robert said:
The SEI has published CMU/SEI-2006-TR-006 "Specifications for Managed
Strings" and released a "proof-of-concept" implementation of the managed
string library.

The specification, source code for the library, and other resources
related to managed strings are available for download from the CERT web
site at:

http://www.cert.org/secure-coding/managedstring.html

The following is a brief summary of the managed string library:

The managed string library was developed in response to the need for a
string library that can improve the quality and security of newly
developed C-language programs while eliminating obstacles to widespread
adoption and possible standardization.

I'm wondering whether or not you compared it to other available
libraries such as my own ( http://bstring.sf.net/ ) or James Antill's
( http://www.and.org/vstr/ ) before engaging in this effort?

I understand the need for this from a purely security-focused
standpoint, but your effort looks like it was born out of a very direct
and narrow approach that takes absolutely nothing else into account.
Besides being slow, the API is somewhat cumbersome, which makes inline
usage impossible. The whole charset filtering thing is not
multithreading friendly and, IMHO, is a poorly focused solution to the
system() problem.
Instead of hiding all the functionality in the managed string, why not
instead have a "filterstring()" function, or better yet, have a
"safesystem()" function? I have a more complete discussion in the
Bstrlib documentation (
http://bstring.cvs.sourceforge.net/*checkout*/bstring/tree/bstrlib.txt?pathrev=HEAD
, search for "Managed String Library").

Beyond just security concerns, there is 1) The "Software Crisis"
concern. This is the concern that writing software in a scalable
manner (i.e., writing millions of bug-free lines of code) is difficult
to do. 2) Performance. C's string operations often take an additional
O(n) penalty for having to implicitly call strlen(), or the
equivalent, redundantly. 3) The Clib's poor base functionality (no
insert/delete, split, replace, substring functions), additionally
hampered by its inability to deal with aliasing (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).

Obviously, I think Bstrlib fares well by such criteria. My point,
though, is that the "Managed String Library" really does not do very
well at all. I think this is important because I think it will be hard
to compel developers to use Managed Strings (or TR 24731) if it
continues to propagate *other* weaknesses of the C std library while
only attempting to solve a very narrow problem.

I.e., I believe Bstrlib to be a better *SECURITY* solution than Managed
Strings (or TR 24731, or pretty much anything else short of a change in
language altogether) because it is more *compelling* to programmers to
use for *OTHER* reasons. I.e., it is not security gained by some
additional effort expended solely on security, but rather it's a
better way of dealing with strings overall which happens
to supply a good security system as a bonus. Bstrlib has been put
through its paces by other programmers who use it, regardless of their
reason (safety/security is certainly not the only reason). With
Managed Strings, it appears that you are starting from scratch.

Also unlike any other solution, the Bstrlib webpage also includes a
public statement on security, which gives a whole set of security
assertions that auditors can test the library against (
http://bstring.cvs.sourceforge.net/*checkout*/bstring/tree/security.txt?pathrev=HEAD
) How do Managed Strings compare in this regard? For example, in
terms of password input/manipulation, can your proposal make guarantees
that copies of string content be kept out of the heap (from a free or
realloc) outside of programmer control?
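To illustrate the kind of guarantee being asked about, here is a minimal sketch of wiping a buffer before its memory goes back to the heap. The names `wipe_buffer` and `wipe_and_free` are hypothetical and belong to neither library. Writing through a volatile pointer is a common way to discourage the compiler from dropping the stores, but it is not an absolute guarantee on every implementation, and it does nothing about copies that realloc() may already have left behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Overwrite n bytes at p with zeros. The volatile qualifier is intended
 * to keep the compiler from optimizing the stores away as dead writes. */
void wipe_buffer(void *p, size_t n)
{
    assert(p != NULL || n == 0);
    volatile unsigned char *v = p;
    while (n > 0) {
        n--;
        v[n] = 0;
    }
}

/* Wipe a heap buffer of n bytes, then release it. Hypothetical helper:
 * the caller must track n, since free() does not know the contents. */
void wipe_and_free(char *s, size_t n)
{
    if (s != NULL) {
        wipe_buffer(s, n);
        free(s);
    }
}
```

A string library that reallocates internally would have to apply the same wipe to the old block on every growth step for the password case to hold, which is why the question concerns the library rather than the caller.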

Since I have not submitted Bstrlib to the ANSI C committee, perhaps you
see this as a moot point. But what I would suggest to you is that you
at least *study* my library first, and see what ideas from it you
should incorporate into your proposal.

While I see the merit of your intentions, I don't see your effort as
quite there yet. Especially in light of open source alternatives such
as my library. At the very least, I think you should take libraries
such as mine as a yardstick by which to compare yours in developing it.
 
Jonathan Leffler

[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).

What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.
 
websnarf

Jonathan said:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).

What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.

Most people would intuitively think of this as simply replacing the
string with a doubled version of itself -- i.e., it's analogous to the
C++ expression p += p for std::string's (and to be honest, I don't know
if that's legal or not), or just p = p + p in most other programming
languages.

You only *know* that this is not the case, because you know that strcat
is implemented as some variation of { d += strlen(d); while (*d++ =
*s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
(d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
variation is going to be faster. This is not intuition -- it's just a
technical calculation.
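Written out as complete functions, the two variations look roughly like this. The names `cat_walk` and `cat_lengths` are made up for this sketch; neither is claimed to be how any particular C library implements strcat().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pointer-walking form. If s aliases d, the loop overwrites the very
 * terminator it is scanning for and runs off the end of the buffer. */
char *cat_walk(char *d, const char *s)
{
    assert(d != NULL && s != NULL);
    char *ret = d;
    d += strlen(d);
    while ((*d++ = *s++) != '\0')
        ;
    return ret;
}

/* Length-first form. Both lengths are measured before any byte is
 * written, and memmove() tolerates overlap, so cat_lengths(p, p)
 * doubles the string provided the buffer is large enough. */
char *cat_lengths(char *d, const char *s)
{
    assert(d != NULL && s != NULL);
    size_t ld = strlen(d), ls = strlen(s);
    memmove(d + ld, s, ls);
    d[ld + ls] = '\0';
    return d;
}
```

With `char buf[16] = "abc";`, calling `cat_lengths(buf, buf)` leaves "abcabc" in buf, while `cat_walk(buf, buf)` never finds a terminator to stop at.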
 
Andrew Poelstra

Jonathan said:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).

What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.

No, knowledge that C strings are null-terminated (which any C programmer
needs to know) suggests that intuitively. Either you calculate strlen(),
add a counter variable, and `for' your way through the string, or you
eliminate the counter and superfluous call to strlen(), and code it
efficiently.

It's more intuitive to use the more efficient, less code-intensive, and
easier-to-read version.
Most people would intuitively think of this as simply replacing the
string with a doubled version of itself -- i.e., it's analogous to the
C++ expression p += p for std::string's (and to be honest, I don't know
if that's legal or not), or just p = p + p in most other programming
languages.

"Most people" are not C programmers; if you know enough to use strcat(),
you should have an understanding of how C strings work. (And indeed, I've
never seen a C textbook that introduced strcat() prior to introducing
C-style strings, although I've heard of some pretty terrible textbooks on
this group that I was fortunate enough to avoid!)
You only *know* that this is not the case, because you know that strcat
is implemented as some variation of { d += strlen(d); while (*d++ =
*s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
(d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
variation is going to be faster. This is not intuition -- it's just a
technical calculation.

IMHO, _intuitively_, there is no other way to implement strcat().
 
Keith Thompson

Jonathan said:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).

What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.
[...]

In this case, intuition is not necessary. If you read the standard's
description of strcat(), you'll see:

... If copying takes place between objects that overlap, the
behavior is undefined.

Any decent description of strcat() (in a man page or text book, for
example) should have similar wording; if it doesn't, that's the fault
of the author of the documentation.

If you attempt to use strcat(), or any other function, without reading
a description of how it's supposed to work, you can't reasonably
expect any particular behavior.
 
James Dennett

Andrew said:
Jonathan said:
(e-mail address removed) wrote:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).
What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.
That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.

No, knowledge that C strings are null-terminated (which any C programmer
needs to know) suggests that intuitively. Either you calculate strlen(),
add a counter variable, and `for' your way through the string, or you
eliminate the counter and superfluous call to strlen(), and code it
efficiently.

It's more intuitive to use the more efficient, less code-intensive, and
easier-to-read version.
[...]

IMHO, _intuitively_, there is no other way to implement strcat().

The argument from an implementor's perspective shows how
easily we forget what was intuitive before we came to
think in terms of how to implement string functionality
in C.

Some can't even think of what strcat intuitively means
(append one string to another) without considering how
it's most sensibly implemented in terms of pointer
operations.

-- James
 
websnarf

Keith said:
Jonathan said:
(e-mail address removed) wrote:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).
What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.
[...]

In this case, intuition is not necessary.

Just *SAYING* this is the ultimate indictment of the C language. If
the language doesn't match your intuition, then it just takes that
much more effort to program in it.
[...] If you read the standard's
description of strcat(), you'll see:

... If copying takes place between objects that overlap, the
behavior is undefined.

Any decent description of strcat() (in a man page

The latest cygwin man page makes no mention of this and WATCOM C/C++'s
documentation omits this.
[...] or text book, for
example) should have similar wording; if it doesn't, that's the fault
of the author of the documentation.

Here's the first hit on google:

http://www.cplusplus.com/ref/cstring/strcat.html

and the second:

http://www.mkssoftware.com/docs/man3/strcat.3.asp

Here's the wikipedia entry as of 07/28/2006:

http://en.wikipedia.org/wiki/Strcat

and here's the OpenBSD documentation that it links to:

http://www.openbsd.org/cgi-bin/man.cgi?query=strcat

So I guess none of that counts as "decent documentation".
If you attempt to use strcat(), or any other function, without reading
a description of how it's supposed to work, you can't reasonably
expect any particular behavior.

Right. That's because C is a "throwback to a forgotten era" kind of
language. Compare this to languages like Lua, Ruby and Python, where
your first guess as to how something works after seeing one example of
it has a 99% chance of being correct, and a 99% chance that you don't
even have a candidate second guess in mind.
 
websnarf

Andrew said:
Jonathan said:
(e-mail address removed) wrote:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).

What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.

No, knowledge that C strings are null-terminated (which any C programmer
needs to know) suggests that intuitively. Either you calculate strlen(),
add a counter variable, and `for' your way through the string, or you
eliminate the counter and superfluous call to strlen(), and code it
efficiently.

Right -- you are either with us, or you are with the terrorists.
It's more intuitive to use the more efficient, less code-intensive, and
easier-to-read version.

There is a *third option*. Skip the first character which overwrites
the '\0', do the append starting from src+1, then go back and do the
'\0' overwrite at the end. This extra work adds at most O(1) to the
execution time. Voila, like magic you have an aliasing safe strcat().
Ain't the "as-if" rule wonderful?
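A minimal sketch of that third option, under the assumption that the destination buffer is large enough for the result. The name `cat_deferred` is invented here, and the sketch is only exercised for the cases where src is the destination string itself or a suffix of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to d, deferring the write over d's terminator until last.
 * While the loop runs, d's terminator is still in place, so a source
 * that aliases d still ends where it originally ended. */
char *cat_deferred(char *d, const char *s)
{
    assert(d != NULL && s != NULL);
    char *end = d + strlen(d);      /* points at d's terminator */
    char first = s[0];              /* the byte that will replace it */
    if (first == '\0')
        return d;                   /* empty source: nothing to append */
    char *out = end + 1;
    const char *in = s + 1;
    while ((*out++ = *in++) != '\0')
        ;                           /* copies the new terminator too */
    *end = first;                   /* now overwrite the old terminator */
    return d;
}
```

The extra work is one saved character and one extra store, independent of the string length, which is the O(1) cost mentioned above.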

But this is just coding jujitsu, and has nothing to do with intuition.
"Most people" are not C programmers; if you know enough to use strcat(),
you should have an understanding of how C strings work.

Yes, but this "knowledge" is simply bent by force into shape by what
the standard tells you. I.e., it's working *against* your intuition.
Which was kind of my point.
You only *know* that this is not the case, because you know that strcat
is implemented as some variation of { d += strlen(d); while (*d++ =
*s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
(d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
variation is going to be faster. This is not intuition -- it's just a
technical calculation.

IMHO, _intuitively_, there is no other way to implement strcat().

That is why you fail.

Tell me this. How does *your* intuition tell you how memmove() is
implemented? Keep in mind that this guy actually is completely
aliasing safe.
 
SuperKoko

Jonathan said:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).

What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.

Intuitive or not, this was not obvious to a beginner in C89, but it
should be obvious in C99, even to a beginner:
char* strcat (char * restrict, const char * restrict);

Thanks to "restrict", the function is better documented.

Andrew Poelstra:
IMHO, _intuitively_, there is no other way to implement strcat().
But there are other ways to implement it....
Borland C++ 5.0 and Digital Mars Compiler use alternative
implementations (and they behave weirdly too, but in another way).
 
Al Balmer

Keith said:
Jonathan Leffler wrote:
(e-mail address removed) wrote:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).
What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.
[...]

In this case, intuition is not necessary.

Just *SAYING* this is the ultimate indictment of the C language. If
the language doesn't match your intuition, then it just takes that
much more effort to program in it.
[...] If you read the standard's
description of strcat(), you'll see:

... If copying takes place between objects that overlap, the
behavior is undefined.

Any decent description of strcat() (in a man page

The latest cygwin man page makes no mention of this and WATCOM C/C++'s
documentation omits this.

HP-UX man page:
"Character movement is performed differently in different
implementations, so moves involving overlapping source and destination
strings may yield surprises."
 
Andrew Poelstra

Right -- you are either with us, or you are with the terrorists.

;-)


There is a *third option*. Skip the first character which overwrites
the '\0', do the append starting from src+1, then go back and do the
'\0' overwrite at the end. This extra work adds at most O(1) to the
execution time. Voila, like magic you have an aliasing safe strcat().
Ain't the "as-if" rule wonderful?

If you start from src+1, you still have to store the original value of
src, or use a counter variable. My points on code-intensivity and ease
of reading still stand.
But this is just coding jujitsu, and has nothing to do with intuition.

It's clever, I admit. However, jumping through hoops to manage odd
inputs which could be fixed with an
assert (dest > src + strlen(src) + 1);
doesn't bode well to me.
Yes, but this "knowledge" is simply bent by force into shape by what
the standard tells you. I.e., it's working *against* your intuition.
Which was kind of my point.

There are many reasons why I don't qualify as a "normal person" or
"average programmer"; my intuition agrees with that of strcat().
You only *know* that this is not the case, because you know that strcat
is implemented as some variation of { d += strlen(d); while (*d++ =
*s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
(d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
variation is going to be faster. This is not intuition -- it's just a
technical calculation.

IMHO, _intuitively_, there is no other way to implement strcat().

That is why you fail.

I'm trying to avoid a flamewar.
Tell me this. How does *your* intuition tell you how memmove() is
implemented? Keep in mind that this guy actually is completely
aliasing safe.

I suspect that memmove() memcpy()'s the data to a safe place and
memcpy()'s it back to the dest. This adds an intermediate step
which prevents problems with src and dest overlapping.

That doesn't sound particularly efficient to me, though; I assume
that compiler/library writers have found much better ways to code
it.
 
Andrew Poelstra

Andrew Poelstra:
But there are other ways to implement it....
Borland C++ 5.0 and Digital Mars Compiler use alternative
implementations (and they behave weirdly too, but in another way).

Allow me to rephrase that:
IMHO, there is no other _intuitive_ way to implement strcat().
 
David R Tribble

That is why you fail.

Tell me this. How does *your* intuition tell you how memmove() is
implemented? Keep in mind that this guy actually is completely
aliasing safe.

....because it's specified that way. And memcpy() is not specified
that way. strcpy() is not specified that way, either.

Are you arguing for ignoring the function specifications?

-drt
 
David R Tribble

Andrew said:
I suspect that memmove() memcpy()'s the data to a safe place and
memcpy()'s it back to the dest. This adds an intermediate step
which prevents problems with src and dest overlapping.

That doesn't sound particularly efficient to me, though; I assume
that compiler/library writers have found much better ways to code it.

The "typical" implementation of memmove() does a range check on
its arguments to determine whether the areas overlap, and if so, does
a copy in reverse (high to low) direction. Such an algorithm is
efficient on CPUs with native bidirectional loop (REP) instructions.
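A rough sketch of that range-check-then-pick-a-direction idea follows. The function name `move_bytes` is invented, and the address comparison goes through `uintptr_t`, which assumes a flat address space; a real library is free to compare pointers directly or to use machine-specific block instructions instead of byte loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overlap-tolerant byte copy in the style described above. */
void *move_bytes(void *dst, const void *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)d, sa = (uintptr_t)s;

    if (da > sa && da - sa < n) {
        /* dst starts inside the source range: copy high to low so no
         * source byte is overwritten before it has been read. */
        while (n > 0) {
            n--;
            d[n] = s[n];
        }
    } else {
        /* Disjoint, or dst below src: a low-to-high copy is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

No temporary buffer is needed; the only cost over a plain forward copy is the range check at the top.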

-drt
 
kuyper

Keith Thompson wrote: ....
[...] If you read the standard's
description of strcat(), you'll see:

... If copying takes place between objects that overlap, the
behavior is undefined.

Any decent description of strcat() (in a man page

The latest cygwin man page makes no mention of this and WATCOM C/C++'s
documentation omits this.
[...] or text book, for
example) should have similar wording; if it doesn't, that's the fault
of the author of the documentation.

Here's the first hit on google:

http://www.cplusplus.com/ref/cstring/strcat.html

and the second:

http://www.mkssoftware.com/docs/man3/strcat.3.asp

Here's the wikipedia entry as of 07/28/2006:

http://en.wikipedia.org/wiki/Strcat

and here's the OpenBSD documentation that it links to:

http://www.openbsd.org/cgi-bin/man.cgi?query=strcat

So I guess none of that counts as "decent documentation".

Correct. You disagree?

The man pages on the machines I use most often are much better:

Linux: "The strings may not overlap, and the dest string must have
enough space for the result."

Irix: "If overflow of s1 occurs, or copying takes place when s1 and s2
overlap, the behavior is undefined."
 
Richard Tobin

Andrew Poelstra said:
Allow me to rephrase that:
IMHO, there is no other _intuitive_ way to implement strcat().

Languages are supposed to be intuitive for users, not implementers.

-- Richard
 
Keith Thompson

Keith said:
Jonathan Leffler wrote:
(e-mail address removed) wrote:
[...] (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).
What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.
[...]

In this case, intuition is not necessary.

Just *SAYING* this is the ultimate indictment of the C language. If
the language doesn't match your intuition, then it just takes that
much more effort to program in it.

News flash: C is not the most intuitive and beginner-friendly language
ever invented. Does this come as a surprise to you?

In a language where strings are first-class objects, and you can pass
them around as values, use them as operands in expressions, and so
forth, I'd expect something called "strcat" to behave in some
reasonable intuitive manner. I'd still need to see the declaration to
know how to use it, but it would probably be safe to assume that
something like

s1 = strcat(s2, s3)

would do the obvious thing.

C is not like that. Strings are not a data type, they're a data
format, "a contiguous sequence of characters terminated by and
including the first null character", subject to all of C's
complications regarding arrays and pointers. If you think you can
guess, with 99% certainty, how strcat() is going to behave based on
that, you're likely to be disappointed.
[...] If you read the standard's
description of strcat(), you'll see:

... If copying takes place between objects that overlap, the
behavior is undefined.

Any decent description of strcat() (in a man page

The latest cygwin man page makes no mention of this and WATCOM C/C++'s
documentation omits this.

The Cygwin man page doesn't mention this, but it's not intended to be
complete:

strcat is part of the libc library. The full documentation for
libc is maintained as a Texinfo manual. If info and libc are
properly installed at your site, the command

info libc

will give you access to the complete manual.

I'm not convinced that's a good idea, but it's explicitly acknowledged
with a reference to the complete documentation.

"info libc" doesn't work for me under Cygwin (I don't know why, but
the reason is clearly irrelevant), but on another system the section
on strcat clearly says:

This function has undefined results if the strings overlap.

I don't know about Watcom.
[...] or text book, for
example) should have similar wording; if it doesn't, that's the fault
of the author of the documentation.

Here's the first hit on google:

http://www.cplusplus.com/ref/cstring/strcat.html

and the second:

http://www.mkssoftware.com/docs/man3/strcat.3.asp

Here's the wikipedia entry as of 07/28/2006:

http://en.wikipedia.org/wiki/Strcat

and here's the OpenBSD documentation that it links to:

http://www.openbsd.org/cgi-bin/man.cgi?query=strcat

So I guess none of that counts as "decent documentation".

I agree. I don't know what cplusplus.com is, and I'm not too
surprised by an error like this in Wikipedia (possibly someone here
will correct it soon). I am surprised that the OpenBSD documentation
doesn't mention this. That's a problem -- but not a problem with C
itself.
Right. That's because C is a "throwback to a forgotten era" kind of
language. Compare this to languages like Lua, Ruby and Python, where
your first guess as to how something works after seeing one example of
it has a 99% chance of being correct, and a 99% chance that you don't
even have a candidate second guess in mind.

Then by all means feel free to go and use those languages. Nobody
here will stop you.

I won't comment on the "throwback to a forgotten era" remark, but
apart from that I think your statement is pretty much factually
correct. You cannot generally look at the name of a C function, or
even a one-sentence description, and infer how it's going to behave in
all circumstances. I don't recall anyone claiming that you could.
 
Keith Thompson

Andrew Poelstra wrote: [...]
No, knowledge that C strings are null-terminated (which any C programmer
needs to know) suggests that intuitively. Either you calculate strlen(),
add a counter variable, and `for' your way through the string, or you
eliminate the counter and superfluous call to strlen(), and code it
efficiently.

Right -- you are either with us, or you are with the terrorists.

First you wrote "Yeah, sieg heil!" in a recent thread in comp.lang.c,
and now you bring terrorists into a discussion of C strings.

I'll say it again:

Shove it, Paul.
 
Keith Thompson

Andrew Poelstra said:
It's clever, I admit. However, jumping through hoops to manage odd
inputs which could be fixed with an
assert (dest > src + strlen(src) + 1);
doesn't bode well to me.

That assert doesn't bode well. It invokes undefined behavior if dest
and src don't point into the same object (or just past the end of it).
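One way to write such a check without relational comparison of unrelated pointers is to convert to integers first. This is only a sketch: `ranges_overlap` and `checked_strcat` are hypothetical names, `uintptr_t` is an optional type, and the result of the conversion is implementation-defined, so the test is meaningful only on platforms with a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if [a, a+an) and [b, b+bn) share at least one byte.
 * Assumption: pointer-to-integer conversion preserves address order. */
int ranges_overlap(const void *a, size_t an, const void *b, size_t bn)
{
    if (an == 0 || bn == 0)
        return 0;
    uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
    return ua < ub ? (ub - ua < an) : (ua - ub < bn);
}

/* strcat() wrapper that asserts its no-overlap precondition.
 * strcat reads src[0..ls] and writes dest[ld..ld+ls]. */
char *checked_strcat(char *dest, const char *src)
{
    size_t ld = strlen(dest), ls = strlen(src);
    assert(!ranges_overlap(dest + ld, ls + 1, src, ls + 1));
    return strcat(dest, src);
}
```

Unlike the one-sided `dest > src + strlen(src) + 1` test, this also accepts a destination that lies entirely below the source.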
 
