strncpy and 'n'

Malcolm McLean

If you know that the end of the string will be determined either by
the end of a fixed-length field, or by a terminating null character,
use strncmp(). If you want to check the entire length of the fixed length
field, regardless of null terminators, memcmp() would do. I don't think
that there's sufficient need for a function whose behavior falls
between those two extremes, to make it a standard library function.

The main issue is usually that reading chars byte by byte is slow,
while reading words is fast.

So if fields are guaranteed to be memory aligned, and a whole number
of words, a comparison will be certainly four times and often many
more times as fast as a byte by byte compare of unaligned strings of
arbitrary length. Aligned fields of whole word size are quite easy to
achieve at a low level, but difficult to specify in ANSI standard C.
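
A rough sketch of the word-at-a-time idea, assuming fields whose length is a whole number of 32-bit words. The helper name field_equal_words and the choice of uint32_t are illustrative and not from the post. Each word is loaded with memcpy so the code stays within standard C even when alignment is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare two fixed-length fields for equality,
 * one 32-bit word at a time. nwords is the field length in words.
 * memcpy is used for each load so the code remains valid standard C
 * even if the buffers are not suitably aligned; compilers commonly
 * reduce it to a single word load when they are. */
static int field_equal_words(const void *a, const void *b, size_t nwords)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    for (size_t i = 0; i < nwords; i++) {
        uint32_t wa, wb;

        memcpy(&wa, pa + i * sizeof wa, sizeof wa);
        memcpy(&wb, pb + i * sizeof wb, sizeof wb);
        if (wa != wb)
            return 0;   /* fields differ */
    }
    return 1;           /* all words matched */
}
```

This only answers "equal or not". An ordering comparable to memcmp() would additionally depend on byte order, so it is left out of the sketch.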

Malcolm's website
http://www.malcolmmclean.site11.com/www
 
Joe keane

There's rarely *any* reason to use strncpy(). It's not a "safer"
version of strcpy(); it's a quite different function.

It is a safer version of 'strcpy'. There is the issue of what to do if
the full copy can't be done, but that's program logic and the library
function can't read people's minds. Even if the program just calls
'abort', that's a huge improvement.
 
Keith Thompson

It is a safer version of 'strcpy'. There is the issue of what to do if
the full copy can't be done, but that's program logic and the library
function can't read people's minds. Even if the program just calls
'abort', that's a huge improvement.

Did you not read my description of what strncpy actually does, or do you
disagree with it?

strncat is a "safer" version of strcat. It takes an extra argument
"n" that specifies the maximum number of characters to be copied. If
the source is longer than n characters, it appends just n characters.
It properly zero-terminates the destination in all cases.

strncpy *looks* like it should be to strcpy as strncat is to strcat,
but it isn't. If the source string is shorter than n characters,
it will pad the destination with multiple null characters, something
that strcpy never does. If the source string is longer than n
characters, it will leave the destination unterminated (i.e.,
not a string).

If it had been defined something like this:

char *better_strncpy(char *dest, const char *src, size_t n) {
    dest[0] = '\0';
    return strncat(dest, src, n);
}

then it would be reasonable to call it a "safer" version of strcpy.

(It's possible I have an off-by-one error in the above code;
I haven't taken the time to check.)
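
The two strncpy behaviours described above (null padding for a short source, no terminator for a long one) can be observed with a small sketch like the following. The function names and buffer sizes are made up for illustration.

```c
#include <string.h>

/* Returns 1 if the n-byte array buf contains a null character,
 * i.e. holds a properly terminated string; 0 otherwise. */
static int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Short source: strncpy copies "hi" and then pads the rest of the
 * 8-byte destination with null characters. Returns the number of
 * null bytes found in dest afterwards. */
static size_t demo_padding(void)
{
    char dest[8];
    size_t zeros = 0;

    memset(dest, 'x', sizeof dest);
    strncpy(dest, "hi", sizeof dest);
    for (size_t i = 0; i < sizeof dest; i++)
        if (dest[i] == '\0')
            zeros++;
    return zeros;
}

/* Long source: strncpy fills all 4 bytes and writes no terminator.
 * Returns 1 if dest ended up terminated, 0 if not. */
static int demo_no_terminator(void)
{
    char dest[4];

    strncpy(dest, "abcdef", sizeof dest);
    return is_terminated(dest, sizeof dest);
}
```

Here demo_padding() reports 6 null bytes (8 bytes minus the 2 copied characters), and demo_no_terminator() reports that the destination is not a string.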
 
Joe keane

strncpy *looks* like it should be to strcpy as strncat is to strcat,
but it isn't.

Well there is a number of options.

a) should it make sure there is a zero terminator
b) maybe you want to clear everything that isn't copied
c) does it handle overlapping copies

Do i think it's slightly stupid that they don't match? A little bit.
But one can imagine that good choices were made, and that consistency is
sometimes negative.

Why does 'puts' add a newline, and 'fputs' doesn't?
Why does 'fgets' take a size argument, and 'gets' doesn't?
Why can i use the same format for float/double in 'fprintf', but not in
'fscanf'?
Why is it that multiply long and long gives long, but short and short
gives int?
Why can i use a bitfield in a struct, but not as a local variable?
Why is it that 'int' means signed, but 'char' can be either one?
 
Keith Thompson

Well there is a number of options.

a) should it make sure there is a zero terminator
b) maybe you want to clear everything that isn't copied
c) does it handle overlapping copies

Do i think it's slightly stupid that they don't match? A little bit.
But one can imagine that good choices were made, and that consistency is
sometimes negative.

The point is that strncpy is a very different function from strcpy.
It is not intended to work with a *string* in the target array;
it works with a specialized data structure (used to store file
names in very early Unix systems).

Why does 'puts' add a newline, and 'fputs' doesn't?
Why does 'fgets' take a size argument, and 'gets' doesn't?
Why can i use the same format for float/double in 'fprintf', but not in
'fscanf'?
Why is it that multiply long and long gives long, but short and short
gives int?
Why can i use a bitfield in a struct, but not as a local variable?
Why is it that 'int' means signed, but 'char' can be either one?

There are answers for most of these questions. For others,
it's certainly true that the C standard library is not entirely
consistent. But I don't think any of them are particularly relevant
to strncpy.

I think you're understating the differences between strcpy and
strncpy. The strncpy function is radically different from strcpy,
and there are very few legitimate uses for it. On the other hand,
the deceptive name has led many C programmers to use it incorrectly,
and I strongly suspect that it's used incorrectly far more often
than it's used correctly.
 
James Kuyper

On 02/20/2012 04:25 PM, Joe keane wrote:
....
Why can i use the same format for float/double in 'fprintf', but not in
'fscanf'?

Because float gets promoted to double when passed to fprintf(), while
the concept of "promoted type" doesn't even apply to the pointer
arguments passed to fscanf(). float* and double* are incompatible types,
which means they must be treated differently, and fscanf() needs to know
about that fact.

Why is it that multiply long and long gives long, but short and short
gives int?

Integer types that are smaller than 'int' get promoted to 'int' on the
principle that 'int' should be the natural integer type for a given
platform. Integer types smaller than 'int' should be used only when
needed to save space - types larger than 'int' should be used only when
needed to represent large numbers.

....
Why is it that 'int' means signed, but 'char' can be either one?

Because the normal character type on many of the machines that C was
first ported to was signed, while on many others it was unsigned,
whereas the normal integer type was (almost?) always signed.
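
The promotion rules described in this post can be checked with a short sketch. It assumes a C11 compiler (for _Generic); the macro and function names are invented for the example.

```c
#include <stdio.h>

/* Name the type of an expression (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x), \
    short: "short", int: "int", long: "long", default: "other")

/* short * short: both operands are promoted to int first. */
static const char *short_product_type(void)
{
    short a = 3, b = 4;
    return TYPE_NAME(a * b);
}

/* long * long: no promotion needed, the result stays long. */
static const char *long_product_type(void)
{
    long a = 3, b = 4;
    return TYPE_NAME(a * b);
}

/* printf family: a float argument is promoted to double when passed
 * through the variable argument list, so "%f" serves both. */
static int print_both(char *out, size_t n, float f, double d)
{
    return snprintf(out, n, "%.1f %.1f", f, d);
}

/* scanf family: the arguments are pointers, which are not promoted,
 * so float* needs "%f" and double* needs "%lf". */
static int scan_both(const char *text, float *f, double *d)
{
    return sscanf(text, "%f %lf", f, d);
}
```

short_product_type() yields "int" and long_product_type() yields "long", matching the explanation above.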
 
Shao Miller

It is a safer version of 'strcpy'. There is the issue of what to do if
the full copy can't be done, but that's program logic and the library
function can't read people's minds. Even if the program just calls
'abort', that's a huge improvement.

Did you not read my description of what strncpy actually does, or do you
disagree with it?

strncat is a "safer" version of strcat. It takes an extra argument
"n" that specifies the maximum number of characters to be copied. If
the source is longer than n characters, it appends just n characters.
It properly zero-terminates the destination in all cases.

strncpy *looks* like it should be to strcpy as strncat is to strcat,
but it isn't. If the source string is shorter than n characters,
it will pad the destination with multiple null characters, something
that strcpy never does. If the source string is longer than n
characters, it will leave the destination unterminated (i.e.,
not a string).

If it had been defined something like this:

char *better_strncpy(char *dest, const char *src, size_t n) {
    dest[0] = '\0';
    return strncat(dest, src, n);
}

then it would be reasonable to call it a "safer" version of strcpy.

(It's possible I have an off-by-one error in the above code;
I haven't taken the time to check.)

My impression is that in:

char foo[] = "foo";
char bar[3] = "bar";
char baz[10] = "baz";

each of these could be roughly equivalent to:

char foo[sizeof "foo"];
char bar[3];
char baz[10];

strncat(foo, "foo", sizeof "foo");
strncat(bar, "bar", 3);
strncat(foo, "baz", 10);

And that in:

char blah[40] = { 0 };

this is roughly equivalent to:

char blah[40];

memset(blah, 0, 40);


And that in:

int blee[2][2] = { { 13, 42 } };

this is roughly equivalent to:

static const int hidden_blee_initializer[2][2] =
{ { 13, 42 }, { 0, 0 } };
int blee[2][2];

memcpy(blee, hidden_blee_initializer, sizeof blee);

Where each of these standard functions might be highly optimized and
where an implementation might actually choose to implement the
initializations using just the same logic.
 
Rich Webb

Joe keane said:
It is a safer version of 'strcpy'. There is the issue of what to do if
the full copy can't be done, but that's program logic and the library
function can't read people's minds. Even if the program just calls
'abort', that's a huge improvement.

Did you not read my description of what strncpy actually does, or do you
disagree with it?

strncat is a "safer" version of strcat. It takes an extra argument
"n" that specifies the maximum number of characters to be copied. If
the source is longer than n characters, it appends just n characters.
It properly zero-terminates the destination in all cases.

strncpy *looks* like it should be to strcpy as strncat is to strcat,
but it isn't. If the source string is shorter than n characters,
it will pad the destination with multiple null characters, something
that strcpy never does. If the source string is longer than n
characters, it will leave the destination unterminated (i.e.,
not a string).

If it had been defined something like this:

char *better_strncpy(char *dest, const char *src, size_t n) {
    dest[0] = '\0';
    return strncat(dest, src, n);
}

then it would be reasonable to call it a "safer" version of strcpy.

(It's possible I have an off-by-one error in the above code;
I haven't taken the time to check.)

I'm surprised that the construct

strncpy(dest, source, BUF_LEN)[BUF_LEN - 1] = '\0';

hasn't been mentioned. It seems to be a reasonably compact way of
dealing with uncontrolled input.
 
Keith Thompson

Rich Webb said:
I'm surprised that the construct

strncpy(dest, source, BUF_LEN)[BUF_LEN - 1] = '\0';

hasn't been mentioned. It seems to be a reasonably compact way of
dealing with uncontrolled input.

Yes, that should work. (I've never seen anyone actually use that
idiom; have you?)

Note that if dest is, say, 1000 bytes, and strlen(source)==3,
then it will write 997 null characters into dest, when 1 would do.
If the source string is from user or file input, that's probably
not going to be significant.

This avoids that problem:

dest[0] = '\0';
strncat(dest, source, BUF_LEN - 1);

Another problem is that it *silently* truncates overly long input.
(That might be just what you want.)

The problems of strncpy can be worked around if you're aware of them,
but I'm skeptical that it's worth the effort.
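
For comparison, here are the two forms from this exchange wrapped as functions. BUF_LEN comes from the thread; the value 16 and the function names are assumptions for the sketch.

```c
#include <string.h>

enum { BUF_LEN = 16 };  /* illustrative size */

/* The strncpy idiom: copy, then force a terminator into the last slot.
 * strncpy also pads any unused tail of dest with null characters. */
static void copy_strncpy(char dest[BUF_LEN], const char *source)
{
    strncpy(dest, source, BUF_LEN)[BUF_LEN - 1] = '\0';
}

/* The strncat form: writes at most BUF_LEN - 1 characters plus one
 * terminator, and leaves the rest of dest untouched. */
static void copy_strncat(char dest[BUF_LEN], const char *source)
{
    dest[0] = '\0';
    strncat(dest, source, BUF_LEN - 1);
}
```

Both produce the same string, and both truncate silently; they differ only in what happens to the bytes after the terminator.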
 
Shao Miller

There's rarely *any* reason to use strncpy(). It's not a "safer"
version of strcpy(); it's a quite different function.

It is a safer version of 'strcpy'. There is the issue of what to do if
the full copy can't be done, but that's program logic and the library
function can't read people's minds. Even if the program just calls
'abort', that's a huge improvement.

Did you not read my description of what strncpy actually does, or do you
disagree with it?

strncat is a "safer" version of strcat. It takes an extra argument
"n" that specifies the maximum number of characters to be copied. If
the source is longer than n characters, it appends just n characters.
It properly zero-terminates the destination in all cases.

strncpy *looks* like it should be to strcpy as strncat is to strcat,
but it isn't. If the source string is shorter than n characters,
it will pad the destination with multiple null characters, something
that strcpy never does. If the source string is longer than n
characters, it will leave the destination unterminated (i.e.,
not a string).

If it had been defined something like this:

char *better_strncpy(char *dest, const char *src, size_t n) {
    dest[0] = '\0';
    return strncat(dest, src, n);
}

then it would be reasonable to call it a "safer" version of strcpy.

(It's possible I have an off-by-one error in the above code;
I haven't taken the time to check.)

I'm surprised that the construct

strncpy(dest, source, BUF_LEN)[BUF_LEN - 1] = '\0';

hasn't been mentioned. It seems to be a reasonably compact way of
dealing with uncontrolled input.

Or maybe:

strncpy(dest, source, BUF_LEN - 1)[BUF_LEN - 1] = '\0';
 
Rich Webb

Rich Webb said:
I'm surprised that the construct

strncpy(dest, source, BUF_LEN)[BUF_LEN - 1] = '\0';

hasn't been mentioned. It seems to be a reasonably compact way of
dealing with uncontrolled input.

Yes, that should work. (I've never seen anyone actually use that
idiom; have you?)

Ermmm, me, actually. Where I use it constantly is in parsing data inputs
that can have real-world issues, such as a noise burst that over-filled
a field or a separator character between two fields that got dropped. I
deal with the correctness of the buffer subsequently but first I need to
ensure that the buffer is safely filled and terminated so that I can
look at it.

There will typically be separate checks for overall format (e.g.,
framing words, number of fields) and a checksum/CRC/hash but some day
the planets will be in a bad alignment where everything else is okay
and, as the saying goes, in a field where "foo" was expected
"supercalifragilisticexpialidocious" was read.

Note that if dest is, say, 1000 bytes, and strlen(source)==3,
then it will write 997 null characters into dest, when 1 would do.
If the source string is from user or file input, that's probably
not going to be significant.

Which is fine, although the difference is usually on the order of an
input length of eight into a buffer with room for, say, twelve.

This avoids that problem:

dest[0] = '\0';
strncat(dest, source, BUF_LEN - 1);

True. A more complete solution would probably be a strlen() test
followed by truncation with \0 and an application diagnostic if the
input is too long. But I'm often dealing with embedded systems where
it's just a black box that can not fail from a data line hiccup.

Another problem is that it *silently* truncates overly long input.
(That might be just what you want.)

Pretty much, yes.
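
The "more complete solution" mentioned above, a length test followed by truncation and an application diagnostic, might look roughly like this. The name copy_field and its return convention are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dest (capacity dest_size bytes,
 * which must be at least 1). The result is always terminated.
 * Returns 0 if src fit, or 1 if it had to be truncated, so the
 * caller can emit whatever diagnostic the application wants. */
static int copy_field(char *dest, size_t dest_size, const char *src)
{
    size_t len = strlen(src);
    int truncated = 0;

    if (len >= dest_size) {
        len = dest_size - 1;    /* keep one byte for the terminator */
        truncated = 1;
    }
    memcpy(dest, src, len);
    dest[len] = '\0';
    return truncated;
}
```

Note that strlen() assumes src is itself a terminated string. For raw input that may lack a terminator, a bounded search such as memchr() over the known input length would be needed instead.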
 
Joe keane

If it had been defined something like this:

char *better_strncpy(char *dest, const char *src, size_t n) {
    dest[0] = '\0';
    return strncat(dest, src, n);
}

then it would be reasonable to call it a "safer" version of strcpy.

Well there you go. You pretty much solved your complaint. If you are
catenating several strings, it's almost more natural to zero out at the
beginning, then use 'strcat' from there.

You can understand why you might want to zero out the whole array at the
start; from there 'strcat' is fine.
 
James Kuyper

If it had been defined something like this:

char *better_strncpy(char *dest, const char *src, size_t n) {
    dest[0] = '\0';
    return strncat(dest, src, n);
}

then it would be reasonable to call it a "safer" version of strcpy.

Well there you go. You pretty much solved your complaint.

I don't see how the ability to define such a function changes any of the
issues he complained about - there's still a function named strncpy() in
the standard library, and its name still creates well-justified but false
expectations in newbies about what it might do. Careful reading of the
standard or good documentation will correct those mistaken expectations
- but that shouldn't have been necessary.

I'm not saying that newbies shouldn't need to read the documentation, or
that the standard should be written so that reading it is unnecessary.
I'm merely saying that good naming conventions can make it easier to
guess what the functions do, and easier to remember what they do once
you have learned what it is. Either the behavior should have matched
those expectations, or the name should have been changed to make them
not well-justified.
 
Keith Thompson

If it had been defined something like this:

char *better_strncpy(char *dest, const char *src, size_t n) {
    dest[0] = '\0';
    return strncat(dest, src, n);
}

then it would be reasonable to call it a "safer" version of strcpy.

Well there you go. You pretty much solved your complaint. If you are
catenating several strings, it's almost more natural to zero out at the
beginning, then use 'strcat' from there.

Not really. My complaint is that strncpy() is in the standard library
with the name "strncpy", and that too many programmers use it
incorrectly.

You can understand why you might want to zero out the whole array at the
start; from there 'strcat' is fine.

Note that multiple strcat() calls can be quite inefficient, since
each one has to scan the destination to find the terminating '\0'
before it appends the new data.
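
One common way around the repeated scan is to carry a pointer to the current end of the string from one append to the next. The helper below is a hypothetical sketch of that approach, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src at position end inside a buffer.
 * limit points one past the last byte of the buffer, and end must be
 * strictly less than limit. Copies as much of src as fits, always
 * terminates, and returns the new end so the next call does not have
 * to rescan the destination. */
static char *append_at(char *end, const char *limit, const char *src)
{
    size_t room = (size_t)(limit - end) - 1;   /* keep one byte for '\0' */
    size_t len = strlen(src);

    if (len > room)
        len = room;
    memcpy(end, src, len);
    end[len] = '\0';
    return end + len;
}
```

A caller would start with end pointing at the buffer's first byte and feed each returned pointer into the next call, so the total work is proportional to the number of characters appended.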
 
Malcolm McLean

Not really.  My complaint is that strncpy() is in the standard library
with the name "strncpy", and that too many programmers use it
incorrectly.

What's worse is that often the wrong use won't be detected.

strncpy() appears to be a safe strcpy() if the buffer length is never
exceeded. Since normally the buffer will be larger than any string you
expect, this often won't be tested. Who's going to pass a string of
more than FILE_MAX to a program?

Then even if it is tested, there's a reasonable chance that the
character immediately following the buffer is a byte of value zero. So
it might well appear to a casual tester that the function has worked as
expected - he might not notice the extra character.
 
Joe keane

My complaint is that strncpy() is in the standard library
with the name "strncpy", and that too many programmers use it
incorrectly.

I think i agree on naming. For example, 'strncpyz' zeros out the
buffer, 'strncpyu' can leave it without a terminator (the default is
less surprising), 'strncpyzu' does both. The zero seems harmless, at
worst it runs slower, at best it runs faster.

But 'strcpy' doesn't give us much guidance here.

It *can't* zero out the buffer, because it doesn't know the buffer size.
It *can't* decide to leave out the terminator, because it doesn't know
the buffer size. It can't avoid trashing your memory, for same reason.
 
Keith Thompson

I think i agree on naming. For example, 'strncpyz' zeros out the
buffer, 'strncpyu' can leave it without a terminator (the default is
less surprising), 'strncpyzu' does both. The zero seems harmless, at
worst it runs slower, at best it runs faster.

But 'strcpy' doesn't give us much guidance here.

It *can't* zero out the buffer, because it doesn't know the buffer size.

It doesn't bother to zero out the buffer, because that's rarely a useful
thing to do. If you want to zero a buffer, use memset().

It *can't* decide to leave out the terminator, because it doesn't know
the buffer size.

It doesn't leave out the terminator because it *needs* to store the
terminator in order for the destination to be a valid string.

It can't avoid trashing your memory, for same reason.

It's up to the caller to avoid trashing memory by ensuring that the
destination is big enough to hold the data to be copied into it.
(Admittedly strcpy() doesn't make this easy.)
 
Jorgen Grahn

The point is that strncpy is a very different function from strcpy.
It is not intended to work with a *string* in the target array;
it works with a specialized data structure (used to store file
names in very early Unix systems).

Malcolm McLean wrote something similar upthread. Do you have any
references for this?

It would explain the function's weird semantics, but I haven't seen
anything before which says this is its background. (There's also
wcsncpy() for wchar_t -- that one is certainly newer, and useless in
the data structures you mention.)

....
I think you're understating the differences between strcpy and
strncpy. The strncpy function is radically different from strcpy,
and there are very few legitimate uses for it. On the other hand,
the deceptive name has led many C programmers to use it incorrectly,
and I strongly suspect that it's used incorrectly far more often
than it's used correctly.

The static analysis tool we use at work screams bloody murder every
time I use strcpy() and tells me to use strncpy() instead. Argh ...

/Jorgen
 
Shao Miller

Shao said:
My impression is that in:

char foo[] = "foo";
char bar[3] = "bar";
char baz[10] = "baz";

each of these could be roughly equivalent to:

char foo[sizeof "foo"];
char bar[3];
char baz[10];

strncat(foo, "foo", sizeof "foo");
strncat(bar, "bar", 3);
strncat(foo, "baz", 10);

ITYM strncat(baz, "baz", 10);

Right. Except...
foo, bar and baz
would each have to be initialized with a null byte
in order to properly use strncat on them like that.

But, even if bar[3] had static duration
and so was initialized with a null byte,
strncat(bar, "bar", 3);
writes 4 characters and overruns the array.


n1570
7.24.3.2 The strncat function
Synopsis
1 #include<string.h>
char *strncat(char * restrict s1,
const char * restrict s2, size_t n);
Description
2 The strncat function appends not more than n characters
(a null character and characters that follow it are not appended)
from the array pointed to by s2 to the end of the string
pointed to by s1.
The initial character of s2 overwrites the null character
at the end of s1.
A terminating null character is always appended to the result.

Somehow, I completely was confusing 'strncpy' with 'strncat', here. :(
Please substitute 'strncpy' in place of 'strncat' in my post. :( Of
course, that makes it irrelevant to Keith's immediately-preceding post.

Thank you for the correction, pete!
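
The overrun pete points out follows from the quoted description: strncat appends up to n characters and then always a terminator. A small sketch of the difference, with made-up function names and with the strncat destination sized at 4 bytes so that nothing overruns:

```c
#include <string.h>

/* strncat appends up to n characters and then a terminator, so the
 * destination needs room for strlen(dest) + n + 1 bytes.
 * Here: 0 + 3 + 1 = 4 bytes, one more than bar[3] would provide. */
static void init_with_strncat(char dest[4])
{
    dest[0] = '\0';
    strncat(dest, "bar", 3);
}

/* With strncpy, as in the corrected post: exactly 3 bytes are written
 * and, as with char bar[3] = "bar", no terminator is stored. */
static void init_with_strncpy(char dest[3])
{
    strncpy(dest, "bar", 3);
}
```

Only the strncpy form matches what char bar[3] = "bar"; does, which is consistent with the correction above.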
 
Philip Lantz

Jorgen said:
Malcolm McLean wrote something similar upthread. Do you have any
references for this?

It would explain the function's weird semantics, but I haven't seen
anything before which says this is its background. (There's also
wcsncpy() for wchar_t -- that one is certainly newer, and useless in
the data structures you mention.)

I've never seen a reference for that either, but I guessed that this was
its purpose when I first learned the semantics of the function, oh, about
30 years ago, and I've assumed that ever since. Fixed-length Unix file
names are the only place I know of where a string buffer had to be
padded out to its full length with nulls.

The static analysis tool we use at work screams bloody murder every
time I use strcpy() and tells me to use strncpy() instead. Argh ...

Aargh! Well, presumably a non-broken fix will shut it up just as well.
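
As an illustration of the fixed-length, null-padded name field discussed above, here is a hedged sketch of a record where strncpy's semantics fit. The struct layout, the 14-byte width and all names are assumptions made for the example and are not taken from any reference.

```c
#include <stdio.h>
#include <string.h>

enum { NAME_LEN = 14 };   /* illustrative field width */

/* Hypothetical directory-entry-like record. name is NOT a string:
 * it is null-padded, and a name of exactly NAME_LEN characters has
 * no terminator at all. */
struct entry {
    unsigned short ino;
    char name[NAME_LEN];
};

/* strncpy pads short names with nulls and truncates long ones,
 * which is exactly what this kind of field wants. */
static void entry_set_name(struct entry *e, const char *name)
{
    strncpy(e->name, name, NAME_LEN);
}

/* Because of the padding, two fields hold the same name exactly
 * when all NAME_LEN bytes match, so memcmp is sufficient. */
static int entry_same_name(const struct entry *a, const struct entry *b)
{
    return memcmp(a->name, b->name, NAME_LEN) == 0;
}

/* Format the field as a string, using a precision to bound the read. */
static int entry_format_name(const struct entry *e, char *out, size_t n)
{
    return snprintf(out, n, "%.*s", NAME_LEN, e->name);
}
```

The padding is what lets entry_same_name() ignore whatever was in the field before, and the precision in "%.*s" is what keeps the formatting step from reading past an unterminated name.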
 
