strlcpy and strlcat

dj3vande

I wrote these for a hobby project (I wanted to use them, but needed to
be able to build it on systems that don't have them), and it's probably
worth letting CLC rip them to shreds before I call them done.

They implement, modulo bugs, the behavior of the BSD strlcat and
strlcpy functions. Interestingly, the system I'm actually posting from
(SunOS 5.8) fails one of the tests at the bottom when it uses the
versions in the system library.

They're intended to be in the common subset of C90 and C99. Comments
on correctness and clarity are welcome; comments on style will be
tolerated.


dave


strl.h:
--------
#ifndef H_STRL
#define H_STRL

/*Implementation of BSD strlcat and strlcpy, for systems that don't have them.
Written by Dave Vandervies, December 2007.
Placed in the public domain; attribution is appreciated.
*/

#include <stddef.h> /*for size_t, so the header is self-contained*/

#ifdef __cplusplus
extern "C" { /*make C++ compilers play nicely with the linker*/
#endif

#ifndef HAS_STRLFUNCS

/*strlcpy copies a string from src to dest, creating a string at most
maxlen bytes long (including the '\0' terminator).
Returns the length of the string that would be created without
truncation, excluding the '\0' terminator. (So if the return value
is >= maxlen, the result was truncated.)
*/
size_t my_strlcpy(char *dest,const char *src,size_t maxlen);

/*strlcat appends the contents of src to dest, creating a string at
most maxlen bytes long (including the '\0' terminator).
If dest is already at least maxlen bytes long, its contents
are not changed.
Returns the length of the string that would be created without
truncation, excluding the '\0' terminator, or maxlen+strlen(src)
if no '\0' is found within maxlen bytes of *dest. (So if the
return value is >= maxlen, the result was truncated.)
*/
size_t my_strlcat(char *dest,const char *src,size_t maxlen);

#ifndef CLC_PEDANTIC
#undef strlcpy
#define strlcpy my_strlcpy
#undef strlcat
#define strlcat my_strlcat
#endif /*CLC_PEDANTIC*/

#else /*HAS_STRLFUNCS*/
#include <string.h>
#endif /*HAS_STRLFUNCS*/

#ifdef __cplusplus
} /*close extern "C"*/
#endif

#endif /*H_STRL #include guard*/
--------


strl.c:
--------
#include <assert.h>
#include <string.h>

#include "strl.h"

/*Implementation of BSD strlcat and strlcpy, for systems that don't have them.
Written by Dave Vandervies, December 2007.
Placed in the public domain; attribution is appreciated.
*/

#ifndef HAS_STRLFUNCS

size_t my_strlcpy(char *dest,const char *src,size_t maxlen)
{
size_t len,needed;

#ifdef PARANOID
assert(dest!=NULL);
assert(src!=NULL);
#endif

len=needed=strlen(src);
if(maxlen == 0)
return needed; /*no room even for the '\0'; write nothing*/
if(len >= maxlen)
len=maxlen-1;

memcpy(dest,src,len);
dest[len]='\0';

return needed;
}

size_t my_strlcat(char *dest,const char *src,size_t maxlen)
{
size_t src_len,dst_len;
size_t len,needed;

#ifdef PARANOID
assert(dest!=NULL);
assert(src!=NULL);
#endif

src_len=strlen(src);
/*Be paranoid about dest being a properly terminated string*/
{
char *end=memchr(dest,'\0',maxlen);
if(!end)
return maxlen+src_len;
dst_len=end-dest;
}

len=needed=dst_len+src_len+1;
if(len >= maxlen)
len=maxlen-1;

memcpy(dest+dst_len,src,len-dst_len);
dest[len]='\0';

return needed-1;
}

#endif /*!HAS_STRLFUNCS*/

#ifdef UNIT_TEST

#include <stdio.h>

/*
dj3vande@goofy:~/clc (0) $ gcc -W -Wall -ansi -pedantic -O -DUNIT_TEST -ostrl strl.c
dj3vande@goofy:~/clc (0) $ ./strl
strlcpy with truncation: Expect `hel'/5: `hel'/5
strlcat with truncation: Expect `help!'/9: `help!'/9
strlcpy without truncation: Expect `help!'/5: `help!'/5
strlcat without truncation: Expect `help!help!'/10: `help!help!'/10
strlcat with maxlen<strlen(dest): Expect `help!help!'/9: `help!help!'/9
dj3vande@goofy:~/clc (0) $
*/

int main(void)
{
char buf1[256],buf2[256];
unsigned long ret;

#ifdef HAS_STRLFUNCS
#define my_strlcpy strlcpy
#define my_strlcat strlcat
printf("Using system library versions\n");
#endif

ret=my_strlcpy(buf1,"hello",4);
printf("strlcpy with truncation: Expect `hel'/5: `%s'/%lu\n",buf1,ret);

ret=my_strlcat(buf1,"p!!!!!",6);
printf("strlcat with truncation: Expect `help!'/9: `%s'/%lu\n",buf1,ret);

ret=my_strlcpy(buf2,buf1,sizeof buf2);
printf("strlcpy without truncation: Expect `help!'/5: `%s'/%lu\n",buf2,ret);

ret=my_strlcat(buf2,buf1,sizeof buf2);
printf("strlcat without truncation: Expect `help!help!'/10: `%s'/%lu\n",buf2,ret);

ret=my_strlcat(buf2,buf1,4);
printf("strlcat with maxlen<strlen(dest): Expect `help!help!'/9: `%s'/%lu\n",buf2,ret);

return 0;
}

#endif /*UNIT_TEST*/
--------
 
CBFalconer

I wrote these for a hobby project (I wanted to use them, but
needed to be able to build it on systems that don't have them),
and it's probably worth letting CLC rip them to shreds before I
call them done.

They implement, modulo bugs, the behavior of the BSD strlcat and
strlcpy functions. Interestingly, the system I'm actually
posting from (SunOS 5.8) fails one of the tests at the bottom
when it uses the versions in the system library.

They're intended to be in the common subset of C90 and C99.
Comments on correctness and clarity are welcome; comments on
style will be tolerated.

Take a look at:

<http://cbfalconer.home.att.net/download/strlcpy.zip>

They are written to be compact and avoid any further use of the
standard library. This improves their usefulness where memory is
tight. I notice yours uses calls to strlen.
 
dj3vande

CBFalconer said:
Take a look at:

<http://cbfalconer.home.att.net/download/strlcpy.zip>

They are written to be compact and avoid any further use of the
standard library. This improves their usefulness where memory is
tight.

That's an entirely different environment than I was writing for; I was
targeting an environment where optimizers are aggressive, resources are
relatively cheap, and programmers' cognitive energy is the most
important thing to optimize.
(I would expect a good optimizer to generate code for mine that will be
slower but not by enough to show up outside of performance-critical
areas; a dumber or nonexistent optimizer will almost certainly generate
function calls that are at least as expensive as the code that they
replace on reasonable-length strings even if the library calls run
faster than the inline code.)

Interestingly, the OpenBSD implementation[1] also avoids calling the
rest of the standard library (both are completely self-contained), and
your implementation has the same difference from the BSD implementation
as the SunOS library (strlcat handles strlen(dest) > maxlen differently
- the BSD implementation is (and documents being) more paranoid about
dest not being properly terminated).

I notice yours uses calls to strlen.

Also memchr and memcpy, which don't get inlined by GCC on x86 (strlen
does).



(Over 24 hours and only one response? I need to get my sigmonster set
up on this account, then at least Richard H will read my posts.)


dave

[1] <http://www.openbsd.org/cgi-bin/cvsw....c?rev=1.13&content-type=text/x-cvsweb-markup>
<http://www.openbsd.org/cgi-bin/cvsw....c?rev=1.11&content-type=text/x-cvsweb-markup>
 
Tor Rustad

(e-mail address removed) wrote:

[...]
(Over 24 hours and only one response? I need to get my sigmonster set
up on this account, then at least Richard H will read my posts.)

I have seen your request. The main reason for not checking this more
deeply has been primarily that those strl* interfaces have, IMO, a
design weakness, which I eliminated in my own implementation.

I think you should put more effort into your test function, perhaps even
provide some self test function with external linkage, at least use
EXIT_FAILURE in case one test case fails. Also, watching the output from
successful tests, can be tiresome in a big project.
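
A minimal sketch of the kind of self test suggested here: a function with external linkage that stays silent on success and reports only failures. The names strl_selftest, check, sketch_strlcpy and sketch_strlcat are hypothetical; the two sketch_ functions are stand-ins with BSD semantics so the block compiles on its own, and in the project they would be the my_strlcpy and my_strlcat from strl.c. The test cases are the five from the original post.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins with BSD semantics so the sketch compiles on its own; in the
   project these would be my_strlcpy/my_strlcat from strl.c. */
static size_t sketch_strlcpy(char *dest, const char *src, size_t maxlen)
{
    size_t src_len = strlen(src);

    if (maxlen != 0) {
        size_t n = (src_len < maxlen) ? src_len : maxlen - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return src_len;
}

static size_t sketch_strlcat(char *dest, const char *src, size_t maxlen)
{
    size_t src_len = strlen(src);
    const char *end = memchr(dest, '\0', maxlen);
    size_t dst_len, n;

    if (end == NULL)
        return maxlen + src_len;      /* dest unterminated within maxlen */
    dst_len = (size_t)(end - dest);
    n = maxlen - dst_len - 1;         /* room left before the terminator */
    if (src_len < n)
        n = src_len;
    memcpy(dest + dst_len, src, n);
    dest[dst_len + n] = '\0';
    return dst_len + src_len;
}

/* Returns 0 if the case passed; otherwise reports on stderr and returns 1. */
static int check(const char *what, const char *got, size_t got_ret,
                 const char *want, size_t want_ret)
{
    if (strcmp(got, want) == 0 && got_ret == want_ret)
        return 0;
    fprintf(stderr, "FAIL %s: expected `%s'/%lu, got `%s'/%lu\n",
            what, want, (unsigned long)want_ret, got, (unsigned long)got_ret);
    return 1;
}

/* External linkage, silent on success; returns the number of failed cases. */
int strl_selftest(void)
{
    char buf1[256], buf2[256];
    int failures = 0;
    size_t ret;

    ret = sketch_strlcpy(buf1, "hello", 4);
    failures += check("strlcpy with truncation", buf1, ret, "hel", 5);

    ret = sketch_strlcat(buf1, "p!!!!!", 6);
    failures += check("strlcat with truncation", buf1, ret, "help!", 9);

    ret = sketch_strlcpy(buf2, buf1, sizeof buf2);
    failures += check("strlcpy without truncation", buf2, ret, "help!", 5);

    ret = sketch_strlcat(buf2, buf1, sizeof buf2);
    failures += check("strlcat without truncation", buf2, ret, "help!help!", 10);

    ret = sketch_strlcat(buf2, buf1, 4);
    failures += check("strlcat with maxlen<strlen(dest)", buf2, ret, "help!help!", 9);

    return failures;
}
```

A test driver's main could then be as small as `return strl_selftest() == 0 ? EXIT_SUCCESS : EXIT_FAILURE;` (with <stdlib.h> included), which gives a build script something to act on without anyone reading the output.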

I would remove PARANOID, using assert() isn't paranoid. :) The
CLC_PEDANTIC is not needed, we do know these functions invade the
reserved C name space, but the C committee wouldn't use these names for
something different.

The usage of #ifdef's should be minimized in source, and primarily used
in header files instead. Because of all these macros, the code became
harder to read than it should have been.

I will post another followup, if I get time to write a test function
tomorrow.
 
CBFalconer

That's an entirely different environment than I was writing for;
I was targeting an environment where optimizers are aggressive,
resources are relatively cheap, and programmers' cognitive energy
is the most important thing to optimize.

The environmental capability is simply an added feature. It also
avoids non-productive time spent executing calls and returns. Note
that the code is pure standard C.

.... snip ...
Interestingly, the OpenBSD implementation[1] also avoids calling
the rest of the standard library (both are completely
self-contained), and your implementation has the same difference
from the BSD implementation as the SunOS library (strlcat handles
strlen(dest) > maxlen differently - the BSD implementation is
(and documents being) more paranoid about dest not being properly
terminated).

Please explain more fully. I don't believe my coding can ever
leave an improperly terminated string. Please tell me what you
find objectionable (or missing) in the test results (copy
following).

Testing lgh = stringop(dest, source, sz)

dest         source           opn  sz  lgh  result
====         ======           ===  ==  ===  ======
""           "string1"        cpy  10    7  "string1"
""           "string1"        cpy   5    7  "stri"
""           "string1"        cpy   1    7  ""
"string1"    "string1"        cat  10   14  "string1st"
"string1st"  "x "             cpy  10    2  "x "
"x "         "string1"        cat  10    9  "x string1"
"x string1"  "x "             cpy  10    2  "x "
"x "         "string1"        cat   0    9  "x "
"x "         "string1"        cpy   0    7  "x "
"x "         "longer string"  cat   0   15  "x "
"x "         "(NULL)"         cpy  10    0  ""
""           "x "             cpy  10    2  "x "
"x "         "(NULL)"         cat  10    2  "x "
 
dj3vande

(e-mail address removed) wrote:
Interestingly, the OpenBSD implementation[1] also avoids calling
the rest of the standard library (both are completely
self-contained), and your implementation has the same difference
from the BSD implementation as the SunOS library (strlcat handles
strlen(dest) > maxlen differently - the BSD implementation is
(and documents being) more paranoid about dest not being properly
terminated).

Please explain more fully.

If the dest argument to strlcat does not in fact point to a correctly
terminated string, the BSD implementation will stop looking for a '\0'
after maxlen bytes. This avoids walking through large amounts of
memory (only to read - it wouldn't be written in any case) when it's
given bad input.
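
The bounded scan described above can be sketched as follows. The helper names bounded_len and strlcat_result_len are hypothetical, chosen only for illustration; the point is that the search for '\0' never reads more than maxlen bytes of dest, and the value reported is the bounded length of dest plus strlen(src), whether or not a terminator was found.

```c
#include <string.h>

/* Length of s, looking at no more than maxlen bytes.
   Returns maxlen when no '\0' is found in that window. */
static size_t bounded_len(const char *s, size_t maxlen)
{
    const char *end = memchr(s, '\0', maxlen);
    return end ? (size_t)(end - s) : maxlen;
}

/* The value a BSD-style strlcat reports: the (bounded) length of dest
   plus the full length of src. When dest has no terminator within
   maxlen bytes this is maxlen + strlen(src), and nothing is written. */
static size_t strlcat_result_len(const char *dest, const char *src,
                                 size_t maxlen)
{
    return bounded_len(dest, maxlen) + strlen(src);
}
```

An implementation that calls strlen(dest) instead keeps reading past maxlen until it happens to find a zero byte, which is the difference under discussion.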

I don't believe my coding can ever
leave an improperly terminated string.

If the inputs are well-formed neither implementation will ever create
an improperly terminated string.


dave
 
dj3vande

If the inputs are well-formed neither implementation will ever create
an improperly terminated string.

On second thought, the conditional is irrelevant there; neither
implementation will ever change the contents of memory UNLESS the
inputs are well-formed, in which case the new contents of the
destination buffer will be a properly terminated string.


dave
(needs coffee, or sleep, or both)
 
Richard Heathfield

(e-mail address removed) said:

(Over 24 hours and only one response? I need to get my sigmonster set
up on this account, then at least Richard H will read my posts.)

You know me too well, Dave.
 
CBFalconer

CBFalconer said:
(e-mail address removed) wrote:
Interestingly, the OpenBSD implementation[1] also avoids calling
the rest of the standard library (both are completely
self-contained), and your implementation has the same difference
from the BSD implementation as the SunOS library (strlcat handles
strlen(dest) > maxlen differently - the BSD implementation is
(and documents being) more paranoid about dest not being properly
terminated).

Please explain more fully.

If the dest argument to strlcat does not in fact point to a correctly
terminated string, the BSD implementation will stop looking for a '\0'
after maxlen bytes. This avoids walking through large amounts of
memory (only to read - it wouldn't be written in any case) when it's
given bad input.

I would argue that my technique is better. It will normally cause
an immediate fault during the call, which should leave traces as to
the cause, and be repairable. IIRC I did this deliberately. Note
that a NULL description of src is considered an empty string.
 
dj3vande

(e-mail address removed) wrote:

[...]
(Over 24 hours and only one response? I need to get my sigmonster set
up on this account, then at least Richard H will read my posts.)

I have seen your request. The main reason for not checking this more
deeply has been primarily that those strl* interfaces have, IMO, a
design weakness, which I eliminated in my own implementation.

Out of curiosity, what is that design weakness, and how did you fix
it?

I think you should put more effort into your test function,

Probably. The one I have was intended as a quick sanity check to make
sure nothing was obviously wrong, not an exhaustive test of all the
boundary cases.
(I tend to rely, sometimes too much, on careful design and desk-checks,
and use code tests mostly to make sure I haven't missed something
obvious rather than to try everything that could go wrong.)

I would remove PARANOID, using assert() isn't paranoid. :)

I'm used to using PARANOID to control consistency checks that can get
expensive. (I try to remember to build with them turned on when I
write them to make sure they're correct, but otherwise they don't get
activated unless I'm trying to debug something.)
In this case using it to turn off the asserts is probably overdoing it;
an assert isn't nearly as expensive as, say, walking through a binary
tree to make sure it's ordered the way I expect it to be.

The CLC_PEDANTIC is not needed, we do know these functions invade the
reserved C name space, but the C committee wouldn't use these names for
something different.

I actually put that in when I first wrote it, and was kind of surprised
to see it when I looked over the code after I decided to post it. :)


dave
 
dj3vande

(e-mail address removed) wrote:

I would argue that my technique is better. It will normally cause
an immediate fault during the call, which should leave traces as to
the cause, and be repairable.

Only if it hits unreadable memory and causes a read trap before it
finds a zero byte. I'd expect it to find a zero byte and give
completely bogus results more often than it would cause a trap.

I don't think either is obviously better in general. If you're using
them to make it easier to write safe code (the original motivation for
adding them in OpenBSD, if I'm not mistaken) it makes sense to give
slightly wrong results for incorrect inputs in return for the added
safety, but if you're just treating them as string operations that do
what strncpy and strncat look like they should do, getting the
"expected" result from strlcat when the destination string is longer
than the maxlen argument is probably worth the cost of getting
completely wrong results or maybe a memory protection trap when the
destination string isn't properly terminated.


dave
 
CBFalconer

Only if it hits unreadable memory and causes a read trap before
it finds a zero byte. I'd expect it to find a zero byte and give
completely bogus results more often than it would cause a trap.

True. The 'read onward' has at least a chance of blowing on a bad
call. However, the 'stop after maxlen' just gives the 'destination
too small' response, and no sign of a basic error. The natural
reaction is to increase the destination size, and try again. Still
fails (maybe). At any rate, C programmers should be used to having
dire things happen when failing to pass strings to things expecting
strings.
 
Tor Rustad

(e-mail address removed) wrote:

[...]
(Over 24 hours and only one response? I need to get my sigmonster set
up on this account, then at least Richard H will read my posts.)

I have seen your request. The main reason for not checking this more
deeply has been primarily that those strl* interfaces have, IMO, a
design weakness, which I eliminated in my own implementation.

Out of curiosity, what is that design weakness, and how did you fix
it?

Many will not check the return value for truncation. There are cases
where this is not a bug, but in most cases it will be, and strl* may
hurt matters by hiding it.

In the (rare) cases where truncation is ok, I prefer

n = sprintf(dst, "%.*s", max, src);

or

n = snprintf(dst, max, "%s", src);

while my

m = strlcpy(dst, src, max);

will trap via assert on truncation. I also consider it a bug to
silently reallocate the 'dst' buffer without having some max limit on
it. So recovery code will get somewhat complex. The strl* functions are
not really designed for dynamic buffer management anyway, so why return
recovery info if no recovery is done?
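
For the case where truncation is acceptable, here is a minimal sketch of the snprintf form with the truncation made visible to the caller. The function name copy_truncating is hypothetical. Two things to keep in mind: snprintf is C99, so this is outside the C90/C99 common subset the original code aims for, and the two forms quoted above are not interchangeable. The precision passed for "%.*s" must be an int and limits the characters written before the terminator, so dst needs max+1 bytes, while the size passed to snprintf includes the terminator.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst (max bytes, including the '\0') with snprintf.
   Returns 0 if everything fit, 1 if the copy was truncated,
   and -1 if snprintf reported an error. */
static int copy_truncating(char *dst, size_t max, const char *src)
{
    int n = snprintf(dst, max, "%s", src);

    if (n < 0)
        return -1;
    /* n is the length the full result would have had, excluding '\0' */
    return (size_t)n >= max;
}
```

A caller that really does not care about truncation can ignore the result; one that does care has the same information strlcpy's return value would give.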

As I see it, on truncation, either (1) there is a bug in the code, or
(2) we are under attack (e.g. DoS).

In case (1), I want this to be fixed during development, hence place an
assert() close to the problem. If the bug isn't fixed, I want the code
to enter fail safe mode.

How to handle (2) is rather hard. To really address the issue, you
typically need to analyze the system/architecture at a higher level. The
best thing one can do at a low level is to enter fail safe mode.

I also prefer that the return value of the strl* functions is the length
of the destination buffer after completion. In case dynamic memory
handling is of interest, I consider it better to use a different set of
functions.

Finally, the size parameter should have been placed in the middle (like
snprintf has). So, IMO a better API design would be

len_dst = my_strlcpy(dst, max_dst, src);
/* where len_dst < max_dst */
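
A rough sketch of what such an interface might look like, based only on the description in this post: size parameter in the middle, assert on truncation, and a return value that is the actual length of the destination string. The name checked_strcpy is hypothetical, and this is not the poster's implementation. What "fail safe mode" means is not spelled out in the thread; the sketch assumes that, with NDEBUG defined, it means truncating and still terminating.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dst, which holds max_dst bytes including the '\0'.
   Returns the length of the string now in dst, always < max_dst
   (or 0 if max_dst is 0, in which case nothing is written). */
static size_t checked_strcpy(char *dst, size_t max_dst, const char *src)
{
    size_t len;

    assert(dst != NULL);
    assert(src != NULL);
    assert(max_dst > 0);

    len = strlen(src);
    assert(len < max_dst);      /* truncation is treated as a bug */

    /* Assumed fallback when asserts are compiled out with NDEBUG:
       truncate, but always leave a terminated string behind. */
    if (max_dst == 0)
        return 0;
    if (len >= max_dst)
        len = max_dst - 1;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With this shape the caller gets a length it can use directly, at the price of losing the "how much room would I have needed" information that the BSD return value carries.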

I'm used to using PARANOID to control consistency checks that can get
expensive. (I try to remember to build with them turned on when I
write them to make sure they're correct, but otherwise they don't get
activated unless I'm trying to debug something.)
In this case using it to turn off the asserts is probably overdoing it;
an assert isn't nearly as expensive as, say, walking through a binary
tree to make sure it's ordered the way I expect it to be.

PARANOID looked redundant, since with NDEBUG defined, those checks will
not be done, so can't you simply do e.g.

1. gcc <CFLAGS> -g ...
2. gcc <CFLAGS> -g -DNDEBUG ...

and finally

3. gcc <CFLAGS> ...

instead?
 
CBFalconer

Tor said:
(e-mail address removed) wrote:
.... snip ...

Many will not check the return value for truncation. There are
cases where this is not a bug, but most of the cases it will be,
and strl* may hurt matters by hiding it.

I gather you consider failing to check the return value is OK and
that writing on unowned (or nonexistent) memory is preferable to
truncation?
 
Tor Rustad

CBFalconer said:
I gather you consider failing to check the return value is OK and
that writing on unowned (or nonexistent) memory is preferable to
truncation?

Very odd conclusion, what did you think *fail safe* meant?
 
CBFalconer

Tor said:
Very odd conclusion, what did you think *fail safe* meant?

I came to this conclusion when you recommend not checking the
returned value, which means you cannot detect an undersized
buffer. Either you are using strlcpy and the result is truncated,
or you are using strcpy (or equivalent) and the result is
overwriting. I am not forgiving failure to check the returned
value.

If I am way off please elucidate. Of course strlcpy can't
auto-expand the destination, since it has to operate into arbitrary
buffers.
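
The check being argued for here is small. A minimal sketch, assuming BSD strlcpy semantics: the names copy_checked and sketch_strlcpy are hypothetical, and sketch_strlcpy is only a stand-in so the block compiles on systems without strlcpy.

```c
#include <string.h>

/* Stand-in with BSD strlcpy semantics (returns strlen(src)). */
static size_t sketch_strlcpy(char *dest, const char *src, size_t maxlen)
{
    size_t src_len = strlen(src);

    if (maxlen != 0) {
        size_t n = (src_len < maxlen) ? src_len : maxlen - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return src_len;
}

/* Returns 0 if src fit into dst, -1 if it did not.
   On -1, dst holds a truncated but properly terminated copy
   (unless dst_size is 0, in which case dst is untouched). */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    /* A return value >= the buffer size means truncation happened. */
    return sketch_strlcpy(dst, src, dst_size) >= dst_size ? -1 : 0;
}
```

The caller decides what an undersized buffer means: report an error, retry with a larger buffer it manages itself, or accept the truncated string.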
 
