strtok and strtok_r

jacob · Sep 16, 2007

You have the right to misread anything.

I did not misread, I quoted exactly what the committee answered.

You have a responsibility, as
a self-proclaimed expert, to think with something other than your
gonads.

You address none of the technical points I raised. You do not explain
why the committee specifies an obviously too small buffer and fails
to provide for upper limits. But you throw "thinking with your
gonads" into the discussion to provoke an emotional atmosphere and
take people away from the technical discussion... since you have
absolutely NO technical arguments to propose.

The response you quoted clearly encompasses a variety
of nicer behaviors than buffer overflow, but you neglect to take
them in.

Absolutely not since I cited them. What I do not accept is that
the committee explicitly says that they would rather have an
IMPLICIT UB instead of adjusting the size of the buffer or
providing an upper limit explicitly.

The committee *accepts* that buffer overflow can occur in a
conforming implementation. The same is true of:

int a[10];
a[300] = 4;

Great! Since in C anything goes (see above) let's make things
worse. Let's put that in the standard then!

The committee provides code for very few functions. For unknown
reasons then, they decided to put code into the standard text
that contains a clear buffer overflow problem.

And they persist in their error. Changing that 26 to a number
based on sizeof(INT_MAX) is beyond them, even if there is a clear
proof of how the calculation is done.

Why?

I do not know. In general, Mr Plauger is somebody who
has written software of good quality and his book about the
C library has been a good inspiration for me. For this reason
his attitude now is even more incomprehensible.

jacob
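
For illustration, here is a minimal sketch of how the buffer could be sized from the width of the integer type instead of being fixed at 26. It is not the committee's code: the name asctime_bounded, the macros LL_CHARS and ASCTIME_MAX, and the choice to return NULL on out-of-range input are all assumptions made for this example. The format string is modelled on the sample asctime code in the standard, with the year widened to long long so that 1900 + tm_year cannot overflow (assuming long long is wider than int).

```c
#include <limits.h>
#include <stdio.h>
#include <time.h>

/* Generous bound on the characters needed to print any long long in
   decimal: about one digit per three bits, plus room for a sign. */
#define LL_CHARS (sizeof(long long) * CHAR_BIT / 3 + 2)

/* Worst case for "%.3s %.3s%3d %.2d:%.2d:%.2d %lld\n":
   two 3-letter names and a space, five numeric fields, four
   separators, the newline and the terminating null. */
#define ASCTIME_MAX (3 + 1 + 3 + 5 * LL_CHARS + 4 + 1 + 1)

/* Hypothetical bounded variant of asctime. Writes into the caller's
   buffer and returns NULL instead of overflowing. */
char *asctime_bounded(const struct tm *tm, char *buf, size_t size)
{
    static const char wday_name[7][4] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    int n;

    /* The name tables are indexed directly, so check those fields. */
    if (tm->tm_wday < 0 || tm->tm_wday > 6 ||
        tm->tm_mon < 0 || tm->tm_mon > 11)
        return NULL;

    n = snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %lld\n",
                 wday_name[tm->tm_wday], mon_name[tm->tm_mon],
                 tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
                 1900LL + tm->tm_year);

    /* snprintf reports the length it wanted; reject truncated output. */
    if (n < 0 || (size_t)n >= size)
        return NULL;
    return buf;
}
```

With a 26-byte buffer this accepts any four-digit year and refuses year 10000; with a buffer of ASCTIME_MAX bytes no combination of field values should be able to truncate.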

CBFalconer · Sep 16, 2007

Joachim said:
It claimed to be POSIX compliant. Prototype is indeed
size_t strlen(const char*s);
Its man page says it returns (size_t)-1 and sets errno in case
of an error. Apparently strlen(NULL) isn't regarded an error, as
it segfaults then...

That is NOT -1. The cast of -1 to a size_t is exactly SIZE_MAX.
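
A small sketch of that conversion rule, for anyone who wants to check it on their own implementation. The function name minus_one_is_size_max is made up for this example; SIZE_MAX comes from <stdint.h> (C99).

```c
#include <stddef.h>
#include <stdint.h>

/* Converting -1 to an unsigned type wraps modulo (maximum value + 1),
   so the result is the largest value the type can hold. Returns 1 if
   that holds for size_t here, which the conversion rules require. */
int minus_one_is_size_max(void)
{
    return (size_t)-1 == SIZE_MAX;
}
```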

Tor Rustad · Sep 16, 2007

Richard said:
Joe Wright said:

Sorry, Joe - I was guilty of truncated exegesis. What I meant was this:
that strcpy must keep going until it hits a null terminator, and it
doesn't know in advance where that null terminator will be found, so it
must test every character. So, although it isn't measuring the string
as such, that's only because it doesn't bother to write down how long
the string is. It's still ploughing through the string, character by
character. But we've already done that with our strlen call. By using
memcpy, we can take advantage of the fact that the string has already
been measured - memcpy can use any number of platform-specific tricks
for copying multiple bytes at a time. Therefore, if the length of the
string to be copied is known in advance, it is (likely to be) more
efficient to use memcpy than strcpy.

Well, some 5 years ago, I made a similar comment on your code Richard,
which was using strcpy() at the time. We had a rather "long" argument
about it, and in the end, I tried to make my point by measuring memcpy()
vs strcpy() performance.

IIRC, the result of those tests was rather humiliating for me, as your
strcpy() performed excellently!

Is there a reason to believe that strcpy() has become more CPU
bound in recent years? If not, I don't think you will have much success
in measuring an improvement by using memcpy().

Making good measurements on this is a challenge. We don't want to
measure L1 cache performance only.

Richard Heathfield · Sep 16, 2007

Tor Rustad said:

Richard Heathfield wrote:

Well, some 5 years ago, I made a similar comment on your code Richard,
which was using strcpy() at the time. We had a rather "long" argument
about it, and in the end, I tried to make my point by measuring
memcpy() vs strcpy() performance.

IIRC, the result of those tests was rather humiliating for me, as
your strcpy() performed excellently!

Whoops!

But really, I don't remember that at all. Sorry. I do,
however, recall that I used to use strcpy in those circumstances, and
now I use memcpy. Have I measured the difference? No, not really. I
care about performance enough not to want to throw it away willy-nilly,
but other than that I'm not really fussed. I try to focus more on
readability, correctness, and makessenseness. I guess the memcpy
argument just made sense to me (eventually!).

Making good measurements on this is a challenge. We don't want to
measure L1 cache performance only.

I don't think it's that hard, actually. Write a program that can either
do a hundred million memcpying dupstrs or a hundred million strcpying
dupstrs, the choice being easily selectable by the user, and copies
data built from a predetermined PRNG (a hundred million strings of
varying lengths and contents), and records the results. Reboot machine.
Run program with Option A. Reboot machine. Run program with Option B.
Compare results.

Caches become irrelevant under these circumstances, I think, since any
cache benefit that one option gets will be cancelled by the fact that
the other option gets it too.
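
A rough sketch of the harness described above, under several assumptions: the names dupstr_memcpy, dupstr_strcpy, lcg_next and run_trial are invented for this example, the PRNG is a simple linear congruential generator chosen only so that both options see identical data, and timing uses clock(), which measures CPU time rather than wall time. The timed region also includes generating each string, which costs the same for both options.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Duplicate s by measuring it once and copying with memcpy. */
static char *dupstr_memcpy(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Duplicate s by measuring it, then letting strcpy scan it again. */
static char *dupstr_strcpy(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Small linear congruential generator; the same seed gives the same
   sequence of strings on every run. */
static unsigned long lcg_next(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return *state >> 16;
}

/* Copies `count` pseudo-random strings with the chosen dupstr.
   Stores the CPU time used in *elapsed and returns a checksum so the
   copies cannot simply be optimised away. */
unsigned long run_trial(int use_memcpy, unsigned long count,
                        unsigned long seed, double *elapsed)
{
    char src[256];
    unsigned long state = seed, sum = 0, i;
    clock_t start = clock();

    for (i = 0; i < count; i++) {
        size_t len = lcg_next(&state) % (sizeof src - 1);
        size_t j;
        char *copy;

        for (j = 0; j < len; j++)
            src[j] = (char)('a' + lcg_next(&state) % 26);
        src[len] = '\0';

        copy = use_memcpy ? dupstr_memcpy(src) : dupstr_strcpy(src);
        if (copy != NULL) {
            sum += (unsigned char)copy[len / 2] + len;
            free(copy);
        }
    }
    *elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

Calling run_trial(1, ...) and run_trial(0, ...) with the same seed in separate runs gives the Option A / Option B comparison; matching checksums confirm both options copied the same data.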

Keith Thompson · Sep 16, 2007

Sam Harris said:
Yeah, whatever. I'm a coder at a Fortune 500 company, I think I can just
about write a strdup function that works more than adequately on any
machine I'd ever want to run it on.

You haven't demonstrated it so far. Frankly, when I read your
implementation upthread I assumed it was a joke. You call realloc()
once for each character; why not compute the length and call malloc()
just once?
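
For reference, a minimal sketch of the single-allocation approach. The name dupstr is chosen here to avoid clashing with any strdup the implementation may already provide; it is not taken from the code posted upthread.

```c
#include <stdlib.h>
#include <string.h>

/* Measure the string once, allocate once, copy once.
   Returns NULL if the allocation fails; the caller frees the result. */
char *dupstr(const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminating null */
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```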

Keith Thompson · Sep 16, 2007

jacob navia said:
A buffer overflow happens when a fixed size memory area is defined
but a program writes PAST the fixed size buffer. This is a buffer
overflow.

Now, the standard specifies a buffer length of 26 for the buffer of
asctime.

[...]

Yes, calling asctime() with certain arguments can result in a buffer
overflow.

Calling strcpy() with certain arguments can result in a buffer
overflow. Likewise for sprintf(), sscanf(), memcpy(), memmove(),
strcat(), etc. In all these cases, the arguments passed are under the
program's control; the problem can reliably be avoided by checking the
arguments before invoking the function.

I happen to agree that asctime() should be defined to use a larger
buffer, one big enough so that the buffer won't overflow for any
possible arguments. But the problem is so easy to avoid that it's
hardly a fatal flaw in the language -- and it can't overflow if you
give it an argument corresponding to the current time (at least not
for the next 8000 years or so). It's certainly not nearly as
dangerous as gets().

I generally wouldn't use asctime() anyway. The format it uses isn't
my favorite (I prefer YYYY-MM-DD for dates), and the trailing '\n' is
more trouble than it's worth. In real code, I'd use strftime()
instead, which is more flexible and doesn't have asctime()'s problems.
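
A short sketch of the strftime() alternative, assuming a "YYYY-MM-DD HH:MM:SS" layout is wanted. The wrapper name format_iso is made up for this example; strftime() itself takes the buffer size and returns 0 when the result does not fit, so the caller can detect the problem instead of overflowing.

```c
#include <stddef.h>
#include <time.h>

/* Formats *tm as "YYYY-MM-DD HH:MM:SS" with no trailing newline.
   Returns the number of characters written (excluding the null),
   or 0 if the buffer is too small. */
size_t format_iso(const struct tm *tm, char *buf, size_t size)
{
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm);
}
```

A 20-byte buffer is enough for any four-digit year; with a smaller buffer the function returns 0 and the buffer contents should not be used.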

Charlie Gordon · Sep 16, 2007

Keith Thompson said:
You haven't demonstrated it so far. Frankly, when I read your
implementation upthread I assumed it was a joke. You call realloc()
once for each character; why not compute the length and call malloc()
just once?

Frankly, I too thought the repeated calls to realloc was some sort of joke
from a forum regular trying to come up with the most inefficient yet correct
implementation and was surprised to find the small klutzy details I pointed
out.

If you are actually proud of the code you posted, and consider that a good
example of what you are paid for by a large corporation, shame on you! You
have some serious progress to make to reach 'decent' status. So far you
qualify for 'best of the worst'. I guess being the best is what prompts
your arrogance, but rest assured everyone here can come up with an even
worse proposal, one you would not even understand.

No matter how efficient and powerful the hardware guys make their products,
there will be software bums to destroy these gains, and managers to come up
with lame excuses and marketers to ship lousy crap. Sturgeon was so right!

Willem · Sep 16, 2007

Sam wrote:
) Yeah, whatever. I'm a coder at a Fortune 500 company, I think I can just
) about write a strdup function that works more than adequately on any
) machine I'd ever want to run it on.

The reason people become coders at Fortune 500 companies is not because
they are any good at coding, but it's because they are good at quickly
churning out lots of code that passes acceptance tests. The more you
care about actual code quality, the less quantity you can churn out,
and the less productive you seem to managers. Managers, of course,
are the kind of person who gets other people to do their work for
them, so they wouldn't know code quality if it bit them in their
faces. So that nicely sums up how much weight your remark has.
Of course, you might be joking (put more precisely: Trolling).

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

CBFalconer · Sep 16, 2007

Keith said:
.... snip ...

I generally wouldn't use asctime() anyway. The format it uses isn't
my favorite (I prefer YYYY-MM-DD for dates), and the trailing '\n' is
more trouble than it's worth. In real code, I'd use strftime()
instead, which is more flexible and doesn't have asctime()'s problems.

Point of order: That isn't just a preference, it is adhering to
ISO date format specification.

Richard Heathfield · Sep 16, 2007

Willem said:

Sam wrote:
) Yeah, whatever. I'm a coder at a Fortune 500 company, I think I can just
) about write a strdup function that works more than adequately on any
) machine I'd ever want to run it on.

The reason people become coders at Fortune 500 companies is not because
they are any good at coding, but it's because they are good at quickly
churning out lots of code that passes acceptance tests. The more you
care about actual code quality, the less quantity you can churn out,
and the less productive you seem to managers. Managers, of course,
are the kind of person who gets other people to do their work for
them, so they wouldn't know code quality if it bit them in their
faces. So that nicely sums up how much weight your remark has.
Of course, you might be joking (put more precisely: Trolling).

It is a great shame that computer programming has become such a
commoditised task, with individual excellence being suppressed
by artificial deadlines and ludicrous budgets, so that nobody
who actually cares about the quality of the source code they
produce is able to spend the necessary time on it to get it
right, if they wish to compete in the market-place against
those who are perfectly content to churn out any old junk
as long as it can pass a badly-designed UAT. Our society
gets the bugs it deserves, by failing to insist on only
the highest quality code. This may explain the current
trend towards bozo-friendly languages that eschew all
pretence of high performance in favour of protecting
the programmer against his own silly mistakes. This
is why we are saddled with Gates's Law ("the speed
of software halves every eighteen months"). If we
insisted on programmers knowing their subject as
we insist on brain surgeons knowing theirs, the
whole of our society would have better, faster
software; software on which it could rely. To
buck the market, though, is becoming far too
expensive, and so it is unlikely that we'll
ever get high quality software, unless the
market manages to find a way to encourage
quality across the industry, rather than
punishing those companies that have the
courage and integrity to turn out high
quality programs, albeit at a greater
initial cost and therefore at prices
that seem unattractive to the naive
software purchaser. If we are able
to discover such a way, the whole
of society will be better off as
a result. It will not be simple
to accomplish such a change in
the current market-place, but
if we do not do so, then the
software we continue to use
day by day will remain, as
now, broken by mis-design
and an embarrassing wart
on an advanced society.

Joe Wright · Sep 16, 2007

Keith said:
Since returning (size_t)-1 is non-standard behavior (though it's
allowed), I'm not likely to check for it.

I don't want to check strlen for error. SIZE_MAX may well be valid.
Passing in a NULL should be a NOP in my view.

size_t strlen(const char *s) {
    size_t r = 0;
    if (s) while (*s++) ++r;
    return r;
}

Joachim Schmitz · Sep 16, 2007

Joe Wright said:
I don't want to check strlen for error. SIZE_MAX may well be valid.
Passing in a NULL should be a NOP in my view.

size_t strlen(const char *s) {
    size_t r = 0;
    if (s) while (*s++) ++r;
    return r;
}

Well, that version of strlen doesn't distinguish between a NULL and an empty
string. This would:
size_t strlen(const char *s) {
    size_t r = 0;
    if (!s) return (size_t)-1;  /* NULL: report an error value, not 0 */
    while (*s++) ++r;
    return r;
}

And neither is a NOP if being passed a NULL...

Bye, Jojo

Walter Roberson · Sep 16, 2007

Joe Wright said:
I don't want to check strlen for error. SIZE_MAX may well be valid.
Passing in a NULL should be a NOP in my view.

size_t strlen(const char *s) {
    size_t r = 0;
    if (s) while (*s++) ++r;
    return r;
}

Then how will you distinguish between the string containing just
the terminating nul, and the null pointer?? strlen() is often
used to determine array indices; you don't want to be indexing
the NULL pointer (for one thing, the result of the indexing
might get you to a readable or writable memory location -- and yes,
there are real systems on which virtual addresses near 0 are
accessible.)

jacob navia · Sep 16, 2007

CBFalconer said:
Keith Thompson wrote:
... snip ...

Point of order: That isn't just a preference, it is adhering to
ISO date format specification.

My point is that, exactly. The *specification* is flawed in the sense
that it doesn't specify a maximum range of the input, but prescribes a
maximum length for the buffer in the example code given as illustration!

No error returns are ever specified. The proposed correction by Mr
Cleaver said to fill the overflowing fields with the character '*'...

Not even that was allowed.

jacob

Tor Rustad · Sep 16, 2007

Richard said:
Tor Rustad said:

Whoops! But really, I don't remember that at all. Sorry.

No worries, this might even have been 6-7 years ago.

I do,
however, recall that I used to use strcpy in those circumstances, and
now I use memcpy. Have I measured the difference? No, not really. I
care about performance enough not to want to throw it away willy-nilly,
but other than that I'm not really fussed.

My argument back then was similar to yours now. However, to my big
surprise, we had to put this into the micro-optimization category,
after measurements on quite a number of different compilers and platforms.

I try to focus more on readability, correctness, and makessenseness.

Me too.

I guess the memcpy argument just made sense to me (eventually!).

I don't think it's that hard, actually. Write a program that can either
do a hundred million memcpying dupstrs or a hundred million strcpying
dupstrs, the choice being easily selectable by the user, and copies
data built from a predetermined PRNG (a hundred million strings of
varying lengths and contents), and records the results. Reboot machine.
Run program with Option A. Reboot machine. Run program with Option B.
Compare results.

Caches become irrelevant under these circumstances, I think, since any
cache benefit that one option gets will be cancelled by the fact that
the other option gets it too.

Making good measurements is usually a challenge; I have rarely seen
measurement code without some serious flaws or defects. I can't remember
the quality of the benchmark we used years ago, but I would expect it to
be a better starting point than writing a new one from scratch.

Peter J. Holzer · Sep 16, 2007

I don't want to check strlen for error. SIZE_MAX may well be valid.

I don't think it should be. If strlen(s) was SIZE_MAX, then the total
size of s (including the terminating NUL) would be SIZE_MAX+1, which
isn't representable in a size_t. So that should not be possible.

Passing in a NULL should be a NOP in my view.

I think it's a bug which should result in a segfault.

hp

Keith Thompson · Sep 16, 2007

Richard Heathfield said:
Willem said:
It is a great shame that computer programming has become such a
commoditised task, with individual excellence being suppressed
by artificial deadlines and ludicrous budgets, so that nobody
who actually cares about the quality of the source code they
produce is able to spend the necessary time on it to get it
right, if they wish to compete in the market-place against
those who are perfectly content to churn out any old junk
as long as it can pass a badly-designed UAT. Our society
gets the bugs it deserves, by failing to insist on only
the highest quality code. This may explain the current
trend towards bozo-friendly languages that eschew all
pretence of high performance in favour of protecting
the programmer against his own silly mistakes. This
is why we are saddled with Gates's Law ("the speed
of software halves every eighteen months"). If we
insisted on programmers knowing their subject as
we insist on brain surgeons knowing theirs, the
whole of our society would have better, faster
software; software on which it could rely. To
buck the market, though, is becoming far too
expensive, and so it is unlikely that we'll
ever get high quality software, unless the
market manages to find a way to encourage
quality across the industry, rather than
punishing those companies that have the
courage and integrity to turn out high
quality programs, albeit at a greater
initial cost and therefore at prices
that seem unattractive to the naive
software purchaser. If we are able
to discover such a way, the whole
of society will be better off as
a result. It will not be simple
to accomplish such a change in
the current market-place, but
if we do not do so, then the
software we continue to use
day by day will remain, as
now, broken by mis-design
and an embarrassing wart
on an advanced society.

Let me just mention that
your demonstration of a
really impressive form
of ASCII graphics and
of unorthodox layout
in a Usenet posting
makes it difficult
to reply worthily
to your missive.
With every line
my choice of a
vocabulary is
narrower and
words leave
me behind.
Still, we
continue
a shape
beyond
sense
into
the
OT
..

Mark McIntyre · Sep 16, 2007

On Sat, 15 Sep 2007 16:23:38 -0700, in comp.lang.c ,

In general, Mr Plauger is somebody who
has written software of good quality and his book about the
C library has been a good inspiration for me. For this reason
his attitude now is even more incomprehensible.

Perhaps it has something to do with your attitude of extreme pomposity
and your continual rudeness to various posters here which brings out
the worst in others?
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Mark McIntyre · Sep 16, 2007

CBFalconer wrote:

My point is that, exactly.

Er, no - the point you go on to make below has *nothing* to do with
CBF's posting.

(of some proposed solution to asctime overflowing)

Not even that was allowed.

From which you apparently draw the conclusion that the ISO committee
were a shower of jobsworths, idiots and fools. Does that really seem
likely to you?
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Keith Thompson · Sep 16, 2007

jacob navia said:
My point is that, exactly. The *specification* is flawed in the sense
that it doesn't specify a maximum range of the input, but prescribes a
maximum length for the buffer in the example code given as illustration!

No error returns are ever specified. The proposed correction by Mr
Cleaver said to fill the overflowing fields with the character '*'...

Not even that was allowed.

Of course it's allowed. Anything is allowed for undefined behavior.
