Bug/Gross InEfficiency in HeathField's fgetline program

Keith Thompson · Nov 7, 2007

Charlie Gordon said:
Hence my advice: just don't use it.

I disagree.

A string function deals with a certain data structure, called a
"string" by the C standard, consisting of "a contiguous sequence of
characters terminated by and including the first null character" (C99
7.1.1p1).

strncpy() deals with a different data structure, one that has no name
that I'm aware of, consisting of an array of N characters of which the
first M are significant, and the remaining N-M characters are all set
to '\0', where M may be equal to N. Such a data structure happens to
contain a "string" *unless* M==N.

This latter data structure is not very commonly used, but as we've
seen it is used sometimes. If you happen to need it, strncpy() is
probably just the thing (though it would have been easy enough to roll
your own function for the purpose).
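
As a rough illustration of the data structure described above, here is a minimal sketch of a hand-rolled function that fills an N-character field the way strncpy() does. The names fill_field and field_holds_string are hypothetical and do not come from the thread or from any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-rolled equivalent of strncpy(): copy at most n
   characters from src, then pad the rest of the n-byte field with '\0'.
   If src has n or more characters before its terminator, the field is
   completely filled and contains no null character (the M == N case). */
char *fill_field(char *field, const char *src, size_t n)
{
    size_t i = 0;

    /* Copy the M significant characters. */
    while (i < n && src[i] != '\0') {
        field[i] = src[i];
        i++;
    }
    /* Set the remaining N - M characters to '\0'. */
    while (i < n) {
        field[i] = '\0';
        i++;
    }
    return field;
}

/* Hypothetical helper: returns 1 if the n-byte field contains a string
   (M < N), or 0 if it is completely filled (M == N). */
int field_holds_string(const char *field, size_t n)
{
    return memchr(field, '\0', n) != NULL;
}
```

Filling an 8-character field from "abc" leaves five trailing null characters, so the field holds a string; filling it from "abcdefgh" leaves no null character at all, so it does not.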

The facts that this relatively obscure data structure is supported in
the standard, that there's only one standard function that supports
it, and that that function has a name that misleadingly implies that
it's a string function, are all historical accidents. I don't recall
anyone claiming that the standard C library is a model of coherent
design.

RoS · Nov 7, 2007

On 06 Nov 2007 09:43:14 +0100, Jean-Marc Bourguet wrote:

My impression was that there are regulatory texts that specify exactly how
those should be computed, including the rounding rules.

BTW, I doubt very much that they have any relationship with the rounding
rules of any implementation of floating point, even decimal FP. And we
know the problems caused by double rounding.

I don't know them...
Can you please give one simple example of it?

Malcolm McLean · Nov 7, 2007

Tor Rustad said:
We call this a transaction, which typically is done with double-entry
bookkeeping.

The only way to convert dollars into pounds is for the bank / the Americans
here to offer to swap their dollars with someone who has pounds. Like me,
except that as a private individual I've no real use for dollars.
You cannot pulp dollars and do a print run of the equivalent amount in
pounds.

It follows that any valuation of, say, a British company's dollar holdings
in pounds is a notional value. Based on market prices, they are saying what
they expect to be able to trade those dollars for. It doesn't differ in that
respect from their valuation of head office, or unsold inventory. Again,
there are rules, but these are best guess amounts that may not actually be
realised.

This has been explained to you in a previous post. Whilst a programmer
cannot be expected to have much understanding of the financial world, it is
not a very edifying sight to see you criticise Richard Heathfield so
vociferously, after you've been told once what the situation is.

Dik T. Winter · Nov 7, 2007

> On 06 Nov 2007 09:43:14 +0100, Jean-Marc Bourguet wrote: ....
>
> I don't know them...
> Can you please give one simple example of it?

If we use bankers rule as I understand it (if the first digit rounded
away is 0 to 4, we round down, otherwise we round up), starting with
0.3476 first to two decimals after the point and then to one decimal
we get first 0.35 and from that we get 0.4. Doing it directly we get 0.3.

And I understand that financial calculation and rounding rules are very
precise in the US.
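
The double-rounding example above can be checked with a small sketch. It assumes the rule exactly as stated in the post (a discarded leading digit of 5 or more rounds up) and works on scaled integers, so 0.3476 is represented as 3476 ten-thousandths; this avoids any binary floating-point effects. The function name round_half_up is hypothetical.

```c
/* Hypothetical helper: divide a non-negative scaled integer by divisor,
   rounding up when the first discarded digit is 5 or more. */
long round_half_up(long value, long divisor)
{
    return (value + divisor / 2) / divisor;
}
```

With 3476 ten-thousandths, round_half_up(3476, 100) gives 35 hundredths, and rounding that again with a divisor of 10 gives 4 tenths, whereas round_half_up(3476, 1000) gives 3 tenths directly.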

Antoninus Twink · Nov 8, 2007

Wow, just checking in on this thread again, and it seems to have taken
on a life of its own. It seems like a lifetime ago that it started with
some constructive criticism I made of some of Richard Heathfield's code,
which he took with singularly bad grace.

Charlie Gordon said:

...if it ever existed...

Right. An unpublished proof, as far as the mathematics community is
concerned, might as well not exist. I have an interesting demonstration of
this fact, but this Usenet article is too large to contain it.

There are really two meanings of "theorem" being discussed.
Mathematicians use "theorem" all the time as a slightly loose word
denoting a statement of which an explicit proof is known (i.e. has been
written down completely, or experts believe they could write down
completely if forced to).

However, in meta-mathematics (a.k.a. logic), "theorem" has a precise
definition: in a logical system (i.e. a formal grammar plus a set of
axioms plus laws of deduction), a theorem is a sentence that can be
obtained by applying the axioms and laws of deduction a finite number of
times. So a sentence phi is a theorem if a proof exists (abstractly);
it's no more or less a theorem if no one's actually written down the
proof.

We certainly want our logical systems to be sound (i.e. any theorems we
prove should be true); ideally, we'd also like them to be complete (i.e.
we should be able to prove everything that's true). Simple logical
systems like propositional calculus and first-order logic are both sound
and complete; famously, second-order logic (needed to formulate modern
mathematics) was shown by Goedel not to be complete.

Richard Heathfield · Nov 9, 2007

Antoninus Twink said:

Wow, just checking in on this thread again, and it seems to have taken
on a life of its own.

Yes, it moved on to discuss real issues.

It seems like a lifetime ago that it started with
some constructive criticism I made of some of Richard Heathfield's code,
which he took with singularly bad grace.

Had the criticism been valid, it would have been welcome. A couple of other
recent threads have demonstrated this adequately. But your criticism was
broken and misguided, as the earlier part of this thread shows. Next time,
think and test before posting.

Charlie Gordon · Nov 12, 2007

Keith Thompson said:
It's a clue that the person who wrote that description of strncpy()
got it wrong. A "string" in C is null terminated by definition; if
it's not null terminated, it's not a string. C99 7.1.1p1: "A string
is a contiguous sequence of characters terminated by and including the
first null character."

The portion you quoted also doesn't say that null characters are
appended if the source array is a string shorter than n characters;
perhaps you just didn't include that part.

The existence of documentation that incorrectly describes the
strncpy() function doesn't prove that it's a "string function". Out
of curiosity, where did you get that description? If it's from a
current system, you might consider submitting a bug report.

The Linux man page for strncpy contains that language. Shame on them.

You see what makes it so childishly simple is the "n" bit in the
name. It doesn't take a genius to figure out that n means
something. Possibly the number of characters to copy? Surely not!

[...]

If I didn't know about the strncpy() function, and didn't have access
to the documentation, I'd probably assume that it's a safer version of
strcpy() that lets you specify the size of the target array. A call
like
strncpy(dest, source, n);
would (hypothetically) behave like strcpy(dest, source), except that
it wouldn't copy more than n characters in the dest array. If
strlen(source)+1 <= n, it would behave just like strcpy(dest, source).
Otherwise, it would copy the first n-1 characters of source into dest,
and set dest[n-1] to '\0'. It would not copy additional '\0'
characters into the dest array. In any case, as long as source points
to a valid string and dest points to an array of at least n
characters, dest would point to a valid string after the call.

That's exactly what BSD's strlcpy does, and I agree with you that it's what
an unsuspecting programmer would expect.
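
The hypothetical behaviour described above can be sketched as follows. The name copy_bounded is an assumption for illustration, not a standard function; returning strlen(source) mirrors what strlcpy does and is not part of the quoted description.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the "expected" bounded copy: store at most n
   bytes in dest, terminator included, and never pad with extra '\0'
   characters. When n > 0, dest always holds a valid string afterwards.
   Returns strlen(source) so the caller can detect truncation. */
size_t copy_bounded(char *dest, const char *source, size_t n)
{
    size_t len = strlen(source);

    if (n > 0) {
        /* Copy everything that fits while leaving room for the '\0'. */
        size_t count = (len < n - 1) ? len : n - 1;
        memcpy(dest, source, count);
        dest[count] = '\0';
    }
    return len;
}
```

Copying "hello" into a 4-byte array yields "hel" and a return value of 5, which is at least n and therefore signals truncation.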

In other words, this:
strncpy(dest, source, n);
would be equivalent to this:
dest[0] = '\0';
strncat(dest, source, n);

NO! strncat is misleading too, but in a different way: it copies no more
than n characters to the end of dest, and appends a '\0' terminator. Thus
strlcpy(dest, source, n) would be equivalent to

if (n > 0) {
    dest[0] = '\0';
    strncat(dest, source, n - 1);
}
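
To make the difference between the two strncat formulations concrete, here is a small sketch that wraps each one in a function. The function names are hypothetical; the bodies are the two snippets from the posts.

```c
#include <stddef.h>
#include <string.h>

/* First formulation: strncat appends up to n characters AND a
   terminator, so this may store up to n + 1 bytes in dest. */
void copy_strncat_n(char *dest, const char *source, size_t n)
{
    dest[0] = '\0';
    strncat(dest, source, n);
}

/* Corrected formulation: at most n - 1 characters plus the terminator,
   so no more than n bytes are stored in dest. */
void copy_strncat_n_minus_1(char *dest, const char *source, size_t n)
{
    if (n > 0) {
        dest[0] = '\0';
        strncat(dest, source, n - 1);
    }
}
```

With n equal to 3 and a source of "abcdef", the first version stores "abc" plus a terminator (four bytes), while the second stores "ab" plus a terminator (three bytes).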

There are several other functions with an added 'n' in their names
that behave in this manner: strncat, snprintf, vsnprintf.

Not strncat. But [v]snprintf do, as well as other functions in the C
library: fgets (with size passed as an int for obscure reasons), and some
that do not truncate the destination: strftime, [v]swprintf, wcsxfrm,
wcsftime...

The behavior of strncpy() is *not obvious*. I completely agree that
programmers should not attempt to use it (or any other function)
without understanding how it works (though a beginning programmer
doesn't need to understand the details of format strings to write
``printf("hello, world\n");''). I don't mind having strncpy() in the
standard library, but I wish it had a different name, and I wish that
there were a strncpy() function that behaves as I've described above.
But it's far too late to make such a change.

I agree, except that I do mind strncpy being in the Standard library, and
I don't think it is too late to try to deprecate it or at least discourage
its use.

Some questions:

What *exactly* do you mean by the phrase "string function"?

Pretty much yours.

Does strncpy() meet your definition?

Not really: it fits my definition of broken, ill-fated, useless,
to-be-deprecated function.

Do you use strncpy() in your own code?

I most certainly don't! I have a #define in our in-house include files to
prevent any use of this function and a few others such as gets.

Keith Thompson · Nov 12, 2007

Charlie Gordon said:
The Linux man page for strncpy contains that language. Shame on them.

Some Linux systems have that wording, but at least one (RHEL 3) has:

The strncpy() function is similar, except that not more than n
bytes of src are copied. Thus, if there is no null byte among the
first n bytes of src, the result will not be null-terminated.

[snip]

In other words, this:
strncpy(dest, source, n);
would be equivalent to this:
dest[0] = '\0';
strncat(dest, source, n);

NO! strncat is misleading too, but in a different way: it copies no more
than n characters to the end of dest, and appends a '\0' terminator. Thus
strlcpy(dest, source, n) would be equivalent to

I think you mean "strncpy(dest, source, n)".

if (n > 0) {
    dest[0] = '\0';
    strncat(dest, source, n - 1);
}

You're right.

[snip]

Pretty much yours.

Do you mean that you don't know (or much care) what the phrase "string
function" means? Because that's my definition.

Not really: it fits my definition of broken, ill-fated, useless,
to-be-deprecated function.

I most certainly don't! I have a #define in our in-house include files to
prevent any use of this function and a few others such as gets.

The above questions were actually meant to be directed at Richard Riley.

Flash Gordon · Nov 12, 2007

Tor Rustad wrote, On 07/11/07 01:00:

Flash said:

Tor Rustad wrote, On 06/11/07 11:38:
[...]

This means that the only way to convert money from one currency
to another is to move it from one account to another, and

We call this a transaction, which typically is done with double-entry
bookkeeping.

Indeed. We talk about transactions as well.

"I'm struggling to imagine any real-world double-entry-relevant
calculation that /cannot/ be done exactly."
-RH

If you consider the rounding to be part of the conversion equation
rather than something that is done afterwards then the calculation *is*
done exactly.

The whole point is that this calculation uses rounding.

It is part of the calculation, so the calculation overall is done exactly.

Do UK accounts
have two decimal places?
Yes.

Nothing you said invalidated my statement here...

I think we have a scoping problem. I consider the equation (restated) to be
100 => 50
101 => 50
102 => 51

You consider it to be
101 => 50.5
With a rounding done after.
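
The first view, where rounding is part of the conversion itself, can be sketched as an exact integer function. This is only an illustration under stated assumptions: amounts are non-negative and held in minor units, the rate is a fraction num/den (one half reproduces the table above), and ties round to the even result, which is one rule consistent with 101 => 50. The thread does not say which rule applies, and the function name is hypothetical.

```c
/* Hypothetical helper: convert a non-negative amount in minor units at
   a rate of num/den. Rounding is built into the mapping (ties go to the
   even result), so every step is exact integer arithmetic. */
long convert_half_even(long amount, long num, long den)
{
    long product = amount * num;
    long quotient = product / den;
    long remainder = product % den;

    /* Round up when past the halfway point, or exactly halfway with an
       odd quotient. */
    if (2 * remainder > den ||
        (2 * remainder == den && quotient % 2 != 0)) {
        quotient++;
    }
    return quotient;
}
```

Seen this way, the function maps 100, 101 and 102 to 50, 50 and 51 with no intermediate value such as 50.5 ever being stored.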

I have been trying to leave this thread for some time...

Well, on the application my brother was involved in they did use
floating point variables because no other variable type could cope with
the range. They just accepted that there would be errors and ensured
that they were consistent. Personally I think that was the wrong
decision, but...

Anyway, some details from an insider are very interesting. In particular,
a related problem, how UK banks can implement SEPA w.r.t. double-entry
bookkeeping and risk management.

That I don't know.

In this case, the transactions will be in euro, while the EU payment
scheme rulebook says nothing about possible currency conversion, or the
related risks for the banks.

They probably use whatever method they used before the euro.

Neither did I. ;-)

OK, we all agree Richard makes mistakes. Whether we are talking
about the same Richard is, of course, another matter ;-)

Richard Heathfield · Nov 12, 2007

Flash Gordon said:

Tor Rustad wrote, On 07/11/07 01:00:

OK, we all agree Richard makes mistakes.

Strictly speaking, we all agree that I don't believe he doesn't make
mistakes. That does not mean we all agree that I *do* make mistakes. Maybe
we do, and maybe we don't, but we haven't said so yet.

Flash Gordon · Nov 12, 2007

Richard Heathfield wrote, On 12/11/07 21:29:

Flash Gordon said:

Strictly speaking, we all agree that I don't believe he doesn't make
mistakes.

OK, can we all agree that I make mistakes?

That does not mean we all agree that I *do* make mistakes. Maybe
we do, and maybe we don't, but we haven't said so yet.

Well, I believe that Richard makes mistakes, but I'm not going to state
which Richard I am referring to yet.

Peter Nilsson · Nov 12, 2007

Keith Thompson said:
I disagree.

A string function deals with a certain data structure, called
a "string" by the C standard, consisting of "a contiguous
sequence of characters terminated by and including the first
null character" (C99 7.1.1p1).

strncpy() deals with a different data structure,

But still caters for strings.

one that has no name that I'm aware of, consisting of an
array of N characters of which the first M are significant,
and the remaining N-M characters are all set to '\0',
where M may be equal to N. Such a data structure happens
to contain a "string" *unless* M==N.

In the old days, this is what I called a record field, though
in some cases a space was used for padding rather than 0.
Fixed width fields, and checksums too apparently, are a thing
of the past.

The facts that this relatively obscure data structure is
supported in the standard, that there's only one standard
function that supports it,

Huh? fread, fwrite, scanf, printf, memcpy, memcmp... to name
a few. And that's obviously excluding other constructs like
zero initialisation.

and that that function has a name that misleadingly implies
that it's a string function, are all historical accidents.

Merely historical in my opinion. I don't believe they were
an accident at the time. In those days, people were using
fixed width fields, and they were squeezing as much out of
limited memory as possible.

The function _can_ be used to copy prefixes of strings to
a string. And it _will_ terminate the destination with a
sufficiently large enough buffer. If you're going to say
that it's not a string function because you can't guarantee
the source or destination will be strings, then you may as
well say that strcpy is not a string function!

I don't recall anyone claiming that the standard C library
is a model of coherent design.

Although apparently single byte character constants being
int and not single byte characters is perfectly coherent!

James Kuyper · Nov 12, 2007

Peter said:
Huh? fread, fwrite, scanf, printf, memcpy, memcmp... to name

Neither fread() nor scanf() stops reading at a null character, nor does
either one fill the rest of a character array with null characters after
having read one. Neither fwrite() nor memcpy() stops writing at a null
character, nor does either one write extra null characters from that
point onward. memcmp() doesn't pay any special attention to null
characters. printf() can be forced to stop writing a string at a fixed
number of bytes, but the corresponding argument is still required to be
a null-terminated string, which is not guaranteed for this data
structure. Only strncpy() does these things, which makes it a very
anomalous function.

....

The function _can_ be used to copy prefixes of strings to
a string. And it _will_ terminate the destination with a
sufficiently large enough buffer. If you're going to say
that it's not a string function because you can't guarantee
the source or destination will be strings, then you may as
well say that strcpy is not a string function!

strcpy() requires that its input be a string, and guarantees, when all
of its requirements have been met, that its output is also a string.

Although apparently single byte character constants being
int and not single byte characters is perfectly coherent!

Well, the standard C language isn't exactly a model of coherent design
either; it has too much history to cope with to allow that.

William Hughes · Nov 13, 2007

The function _can_ be used to copy prefixes of strings to
a string. And it _will_ terminate the destination with a
sufficiently large enough buffer. If you're going to say
that it's not a string function because you can't guarantee
the source or destination will be strings, then you may as
well say that strcpy is not a string function!

Hardly. If you use strcpy correctly, then you *can* guarantee that
the source and destination will be strings. (If you do not
use it correctly you can guarantee exactly nothing; don't stop the
presses.) If you use strncpy correctly then you cannot guarantee that
the source and destination will be strings.

- William Hughes

Richard Heathfield · Nov 13, 2007

Flash Gordon said:

Richard Heathfield wrote, On 12/11/07 21:29:

OK, can we all agree that I make mistakes?

No, I think you're mistaken about that.

Tor Rustad · Nov 13, 2007

Richard said:
Flash Gordon said:

Strictly speaking, we all agree that I don't believe he doesn't make
mistakes. That does not mean we all agree that I *do* make mistakes. Maybe
we do, and maybe we don't, but we haven't said so yet.

Not quite. When we agreed that Richard didn't believe he is
always correct, it followed that we believed Richard will be wrong at
some point in space-time. When this event has occurred, Richard must
have been wrong. Either Richard was right about being wrong, or Richard
was wrong in believing he was wrong.

OTOH, what we all believe now might have changed, so that requires a
whole new discussion. That concludes my views on the matter...

user923005 · Nov 13, 2007

Not quite. When we agreed that Richard didn't believe he is
always correct, it followed that we believed Richard will be wrong at
some point in space-time. When this event has occurred, Richard must
have been wrong. Either Richard was right about being wrong, or Richard
was wrong in believing he was wrong.

OTOH, what we all believe now might have changed, so that requires a
whole new discussion. That concludes my views on the matter...

I remember clearly when Richard was wrong. It was when he thought
that he was mistaken. He also cut himself once, shaving with Occam's
razor. Seriously, Richard Heathfield is one of the frequent posters
who knows a lot about the C language. I have learned things from
Richard's posts on many occasions. He has a very
wry and dry sense of British humor that sometimes goes over everyone's
head, including mine.

I suspect that the topicality has had a wee bit of drift here.
