Why doesn't strrstr() exist?

Lawrence Kirby · Aug 31, 2005

ptrdiff_t is supposed to be defined as a type wide enough to
accommodate *any* possible result of a valid subtraction of
pointers to objects. If an implementation doesn't *have* a
suitable integer type, that is a deficiency.

The standard disagrees with you. 6.5.6p9 says:

"When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object; the
result is the difference of the subscripts of the two array elements. The
size of the result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header. If the
result is not representable in an object of that type, the behavior is
undefined. ..."

It states very clearly in the last sentence that the result of pointer
subtraction need not be representable as a ptrdiff_t. In such a case you
get undefined behaviour i.e. it is the program that is at fault, not the
implementation.
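On Lawrence's reading, avoiding the undefined case is the program's job. For a char array, a difference can only fall outside ptrdiff_t if the array has more than PTRDIFF_MAX elements. A conservative program can therefore refuse the subtraction in that situation. The sketch below assumes the caller knows the array's size; `checked_char_diff` is a hypothetical helper, not a standard function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores p - base in *out and returns 1 only when
   the array is small enough that every valid difference fits in
   ptrdiff_t; otherwise returns 0 without subtracting.
   Assumes p and base point into (or one past the end of) the same char
   array of obj_size elements, as 6.5.6p9 requires anyway. */
static int checked_char_diff(const char *p, const char *base,
                             size_t obj_size, ptrdiff_t *out)
{
    /* Any difference inside an array of at most PTRDIFF_MAX elements
       lies in [-PTRDIFF_MAX, PTRDIFF_MAX], so it is representable. */
    if (obj_size > (size_t)PTRDIFF_MAX)
        return 0;
    *out = p - base;
    return 1;
}
```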

Lawrence

Bart van Ingen Schenau · Aug 31, 2005

Douglas said:
Judging by some of the reported bugs, one wonders.

Just a few weeks ago, there was an IAVA for a bug in
Kerberos v5 that was essentially of the form
if (!try_something) {
    error_flag = CODE;
    free(buffer);
}
free(buffer);
How that could have passed even a casual code review
is a mystery.
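The quoted pattern is a double free. On the failure path, buffer is released inside the if and then again after it, which is undefined behaviour. Here is a minimal corrected shape, assuming try_something is a function and CODE an integer error code (both are hypothetical stand-ins). It keeps exactly one free(). An alternative is to set buffer = NULL after the first free(), because free(NULL) does nothing.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the names in the quoted snippet. */
enum { CODE = 1 };
static int try_something(void) { return 0; }   /* simulate a failure */

static int process(void)
{
    int error_flag = 0;
    char *buffer = malloc(64);

    if (buffer == NULL)
        return -1;
    if (!try_something())
        error_flag = CODE;   /* record the failure; do not free here */
    free(buffer);            /* the single release, reached on every path */
    return error_flag;
}
```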

I recently had to find out why two of our build environments were
behaving slightly differently from each other.
It turned out that a largely unrelated component, which is only included
in one of the environments, did something along the lines of
    EnterCriticalSection();
    if (condition)
        return;
    ExitCriticalSection();
and due to the architecture of our system, this affected quite a number
of processes.
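The early return leaves the critical section held. A common C remedy is a single exit path, often written with goto. The sketch below uses hypothetical `enter_critical_section`/`exit_critical_section` stand-ins that only count nesting depth, so an unbalanced path is visible. They are not the Win32 API.

```c
/* Hypothetical stand-ins for the lock calls in the snippet; they only
   track nesting depth. Not the Win32 API. */
static int lock_depth = 0;
static void enter_critical_section(void) { lock_depth++; }
static void exit_critical_section(void)  { lock_depth--; }

/* Single-exit version of the snippet: the early-out jumps to the
   cleanup label instead of returning with the lock still held. */
static int update_shared_state(int condition)
{
    int result;

    enter_critical_section();
    if (condition) {
        result = -1;            /* the case that used to return early */
        goto out;
    }
    /* ... work on the shared state ... */
    result = 1;
out:
    exit_critical_section();    /* reached on every path */
    return result;
}
```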

The simple answer is that the C standard is a specification
document, not a programming tutorial. Such a warning
properly belongs in the Rationale Document, not in the spec.

I think that marking gets() as deprecated should be warning enough for
people to understand that the function must not be used for new code or
code that is being prepared for a new compiler version.

Bart v Ingen Schenau

Tim Rentsch · Aug 31, 2005

Lawrence Kirby said:
The standard disagrees with you. 6.5.6p9 says:

"When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object; the
result is the difference of the subscripts of the two array elements. The
size of the result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header. If the
result is not representable in an object of that type, the behavior is
undefined. ..."

It states very clearly in the last sentence that the result of pointer
subtraction need not be representable as a ptrdiff_t. In such a case you
get undefined behaviour i.e. it is the program that is at fault, not the
implementation.

Actually it doesn't say that the result need not be representable;
only that *if* the result is not representable then something else
is true. The cited paragraph doesn't say that the "if" clause can
be satisfied. We might infer that it can, but the text doesn't say
that it can.

I've been unable to find any statement one way or the other about
whether ptrdiff_t must accommodate all pointer subtractions done on
valid arrays. Consider for example the following. Suppose
PTRDIFF_MAX == INT_MAX, and SIZE_MAX == UINT_MAX; then

char *p0 = malloc( 1 + (size_t)PTRDIFF_MAX );
char *p1 = p0 + PTRDIFF_MAX;
char *p2 = p1 + 1;
size_t s = p2 - p0;

If undefined behavior prevents the last assignment from working,
what are we to conclude? That size_t is the wrong type? That
SIZE_MAX has the wrong value? That malloc should have failed
rather than delivering a too-large object? That ptrdiff_t is
the wrong type? Or that all standard requirements were met,
and the problem is one of low quality of implementation?

I believe, in the absence of an explicit statement to the contrary,
the limits on ptrdiff_t and PTRDIFF_MAX allow the possibility that
a difference of valid pointers to a valid array object might not be
representable as a value of type ptrdiff_t. I also think the
language would be improved if there were a requirement that a
difference of valid pointers to a valid array object must always be
representable as a value of type ptrdiff_t.

However, whether the language does or does not, or should or should
not, have such a requirement, the *standard* would be improved if it
included an explicit statement about whether this requirement must
be met.
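Code cannot settle what the standard requires. A program can, however, avoid ever forming the ptrdiff_t result in Tim's example. One way is to keep positions as size_t subscripts and subtract those instead of pointers. Subscripts into an object obtained from malloc always fit in size_t, and unsigned subtraction never has undefined behaviour. `span_between` below is a hypothetical example function, sketched under those assumptions.

```c
#include <stddef.h>

/* Returns the distance from the first 'opener' to the first following
   'closer' in buf[0..n), or (size_t)-1 if either is missing.
   Positions are size_t subscripts, and the result is i2 - i0, an
   unsigned subtraction; no pointer difference (and so no ptrdiff_t)
   is ever formed. Hypothetical example, not a library function. */
static size_t span_between(const char *buf, size_t n,
                           char opener, char closer)
{
    size_t i0 = 0, i2;

    while (i0 < n && buf[i0] != opener)
        i0++;
    if (i0 == n)
        return (size_t)-1;
    i2 = i0 + 1;
    while (i2 < n && buf[i2] != closer)
        i2++;
    if (i2 == n)
        return (size_t)-1;
    return i2 - i0;
}
```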

Randy Howard · Aug 31, 2005

Douglas A. Gwyn wrote:
The simple answer is that the C standard is a specification
document, not a programming tutorial. Such a warning
properly belongs in the Rationale Document, not in the spec.

Okay, where can I obtain the Rationale Document that warns
programmers not to use gets()?

Wojtek Lerch · Aug 31, 2005

Randy said:
Okay, where can I obtain the Rationale Document that warns
programmers not to use gets()?

http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf

Keith Thompson · Aug 31, 2005

Chris Hills said:
(e-mail address removed) writes [...]

For a nuclear reactor, I would also include the requirement that they
use a safer programming language like Ada.


As the studies have shown, language choice has minimal impact on
errors.

Personally I would be
shocked to know that *ANY* nuclear reactor control mechanism was
written in C. Maybe a low level I/O driver library, that was
thoroughly vetted (because you probably can't do that in Ada), but
that's it.


Which destroys your argument! Use Ada because it is safe, but the
interface between Ada and the hardware is C. So, effectively, C
controls the reactor.

Just to correct the misinformation, there's no reason a low level I/O
driver library couldn't be written in Ada. The language was designed
for embedded systems. Ada can do all the unsafe low-level stuff C can
do; it just isn't the default.

Keith Thompson · Aug 31, 2005

Wojtek Lerch said:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf

Which says, in 7.19.7.7:

Because gets does not check for buffer overrun, it is generally
unsafe to use when its input is not under the programmer's
control. This has caused some to question whether it should
appear in the Standard at all. The Committee decided that gets
was useful and convenient in those special circumstances when the
programmer does have adequate control over the input, and as
longstanding existing practice, it needed a standard
specification. In general, however, the preferred function is
fgets (see 7.19.7.2).

Personally, I think the Committee blew it on this one. I've never
heard of a real-world case where a program's input is under
sufficiently tight control that gets() can be used safely. On the
other hand, I have seen numerous calls to gets() in programs that are
expected to receive interactive input. As far as I know, the
"longstanding existing practice" cited in the Rationale is the
*unsafe* use of gets(), not the hypothetical safe use.

In the unlikely event that I were implementing a system where I had
that kind of control over a program's stdin, I'd still use fgets() so
I could do some error checking, at least in the context of unit
testing. Even in such a scenario, I'd be far more likely to read from
a source other than stdin, where gets() can't be used anyway.
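This is the kind of error checking Keith means. Unlike gets(), fgets() lets the caller tell a complete line from a truncated one, and end-of-file from a read error. It also works on any stream, not just stdin. The sketch below is an illustration: `read_line` and the `line_status` names are hypothetical, and size is assumed to fit in an int because fgets() takes an int count.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for one call to read_line(). */
enum line_status { LINE_OK, LINE_TRUNCATED, LINE_EOF, LINE_ERROR };

/* Reads one line from in into buf (1 < size <= INT_MAX assumed) and
   reports what happened: information gets() never gives the caller. */
static enum line_status read_line(FILE *in, char *buf, size_t size)
{
    int c;

    if (fgets(buf, (int)size, in) == NULL)
        return ferror(in) ? LINE_ERROR : LINE_EOF;
    if (strchr(buf, '\n') != NULL)
        return LINE_OK;            /* whole line, newline included */

    /* No newline stored: either the buffer filled up, or this is the
       last line of the input and it has no trailing newline. */
    c = getc(in);
    if (c == EOF)
        return ferror(in) ? LINE_ERROR : LINE_OK;
    ungetc(c, in);                 /* leave the rest for the next call */
    return LINE_TRUNCATED;
}
```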

I just found 13 calls to gets() in the source code for a large
software package implemented in C (which I prefer not to identify).
They were all in small test programs, not in production code, and they
all used buffers large enough that an interactive user is not likely
to overflow them -- but that's no excuse for writing unsafe code.

I'd be interested in seeing any real-world counterexamples.

Randy Howard · Aug 31, 2005

Wojtek Lerch wrote:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf

Indeed. It doesn't exactly make the point very clearly, or
pointedly. Somehow "generally unsafe" doesn't seem strong
enough to me.

Anonymous 7843 · Aug 31, 2005

Personally, I think the Committee blew it on this one. I've never
heard of a real-world case where a program's input is under
sufficiently tight control that gets() can be used safely. On the
other hand, I have seen numerous calls to gets() in programs that are
expected to receive interactive input. As far as I know, the
"longstanding existing practice" cited in the Rationale is the
*unsafe* use of gets(), not the hypothetical safe use.

I think gets() could be made safer with some minor changes.

Add something like "#define GETSMAX n" to stdio.h, where n is an
implementation-defined constant with a guaranteed minimum. Then,
redefine gets() such that it is guaranteed to never put more than
GETSMAX-1 characters plus the trailing \0 into the buffer. Additional
characters in the input will be thrown away.

Code that uses gets() could then be made "safe" by making sure the
buffer passed in has at least GETSMAX characters available.
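The GETSMAX proposal can be prototyped as an ordinary function. A program cannot redefine gets() itself, because standard library names are reserved. The value 128 for GETSMAX and the name `gets_bounded_from` are assumptions for illustration; this is a sketch of the proposed semantics, not an existing interface.

```c
#include <stdio.h>

/* Hypothetical constant from the proposal; 128 is an assumed value. */
#define GETSMAX 128

/* Sketch of the proposed semantics: as with gets(), the newline is
   consumed and not stored, but at most GETSMAX-1 characters plus '\0'
   are written to buf, and the rest of an over-long line is discarded.
   buf must hold at least GETSMAX chars. A stdin version would simply
   pass stdin as in. */
static char *gets_bounded_from(FILE *in, char *buf)
{
    size_t n = 0;
    int c;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (n < (size_t)GETSMAX - 1)
            buf[n++] = (char)c;     /* store while there is room */
        /* otherwise drop the character, as the proposal describes */
    }
    /* Mirror gets(): NULL on a read error, or at end of file when
       nothing was read. */
    if (c == EOF && (n == 0 || ferror(in)))
        return NULL;
    buf[n] = '\0';
    return buf;
}
```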

An interesting alternative to this would be to provide a function or
variable that can be set by the programmer at run time to alter gets()
max length behavior, something like setgetssize(size_t n). This would
allow an existing program filled with declarations like "char
inbuf[80]" to be fixable with one line.

Of course, nothing is stopping programmers from writing their own
line-oriented input function with exactly the interface they like. For
something on the order of gets() or fgets(), it wouldn't take very long.
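A home-grown line reader can also drop the fixed limit altogether by growing its buffer as needed. The sketch below follows the spirit of the ggets()/fgetline-style replacements mentioned later in the thread, but it is not their actual code. `read_line_alloc` is a hypothetical name, and the caller is assumed to free() each result.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length from in, strips the newline, and returns
   a malloc'd string the caller must free. Returns NULL at end of input
   (nothing read), on a read error, or if memory runs out. */
static char *read_line_alloc(FILE *in)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {                /* keep room for '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && (len == 0 || ferror(in))) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```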

Keith Thompson · Aug 31, 2005

I think gets() could be made safer with some minor changes.

I disagree.

Add something like "#define GETSMAX n" to stdio.h, where n is an
implementation-defined constant with a guaranteed minimum. Then,
redefine gets() such that it is guaranteed to never put more than
GETSMAX-1 characters plus the trailing \0 into the buffer. Additional
characters in the input will be thrown away.

Code that uses gets() could then be made "safe" by making sure the
buffer passed in has at least GETSMAX characters available.

Assuming such a change is made in the next version of the standard, or
widely implemented as an extension, code that uses the new safe gets()
will inevitably be recompiled on implementations that provide the old
unsafe version.

The solution is to eradicate gets(), not to fix it.

To paraphrase Dennis Ritchie's comments on the proposed "noalias"
keyword (and not to imply that he does or doesn't agree with me on
gets()):

gets() must go. This is non-negotiable.
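Within a single code base, one practical way to enforce "gets() must go" is to turn any use of it into a compile error. GCC and Clang provide `#pragma GCC poison` for this. It is a compiler extension, not standard C, so other compilers may ignore it or warn. (C11 later removed gets() from the standard library altogether.) `echo_first_line` is a hypothetical example of the bounded call that takes gets()'s place.

```c
#include <stdio.h>

/* GCC/Clang extension, not standard C: from here on, any appearance of
   the identifier gets in this translation unit is a hard error. Put it
   after the standard headers, which may still declare the function. */
#pragma GCC poison gets

/* Hypothetical example of the bounded replacement call. */
static int echo_first_line(FILE *in, FILE *out)
{
    char line[256];

    if (fgets(line, sizeof line, in) == NULL)
        return 0;
    fputs(line, out);
    return 1;
}
```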

Wojtek Lerch · Aug 31, 2005

Randy Howard said:
Wojtek Lerch wrote

Indeed. It doesn't exactly make the point very clearly, or
pointedly. Somehow "generally unsafe" doesn't seem strong
enough to me.

Sure. Whatever. I don't think a lot of programmers learn C from the
Standard or the Rationale anyway. It should be the job of teachers and
handbooks to make sure that beginners realize that it's not a good idea to
use gets(), or to divide by zero, or to cause integer overflow.

On the other hand, I don't think it would be unreasonable for the Standard
to officially declare gets() as obsolescent in the "Future library
directions" chapter.

Chris Hills · Sep 1, 2005

Keith Thompson said:
Which says, in 7.19.7.7:

Because gets does not check for buffer overrun, it is generally
unsafe to use when its input is not under the programmer's
control. This has caused some to question whether it should
appear in the Standard at all. The Committee decided that gets
was useful and convenient in those special circumstances when the
programmer does have adequate control over the input, and as
longstanding existing practice, it needed a standard
specification. In general, however, the preferred function is
fgets (see 7.19.7.2).

Personally, I think the Committee blew it on this one. I've never
heard of a real-world case where a program's input is under
sufficiently tight control that gets() can be used safely. On the
other hand, I have seen numerous calls to gets() in programs that are
expected to receive interactive input. As far as I know, the
"longstanding existing practice" cited in the Rationale is the
*unsafe* use of gets(), not the hypothetical safe use.

In the unlikely event that I were implementing a system where I had
that kind of control over a program's stdin, I'd still use fgets() so
I could do some error checking, at least in the context of unit
testing. Even in such a scenario, I'd be far more likely to read from
a source other than stdin, where gets() can't be used anyway.

I just found 13 calls to gets() in the source code for a large
software package implemented in C (which I prefer not to identify).
They were all in small test programs, not in production code, and they
all used buffers large enough that an interactive user is not likely
to overflow them -- but that's no excuse for writing unsafe code.

I'd be interested in seeing any real-world counterexamples.

I think this was one of the reasons why, in the very early days, some of
the more security-conscious computer networks that could be accessed
externally would send a stream of several Kbytes of characters back at
anyone who did not get the password right on the first attempt, thus
overflowing any buffers.

Back in the days when I had a better power-to-weight ratio and a 1200
full-duplex modem was FAST, I found a few like that. No, it was not parity
errors or the wrong baud rate. They looked different. In fact, it was a
different world back then.

Anonymous 7843 · Sep 1, 2005

Assuming such a change is made in the next version of the standard, or
widely implemented as an extension, code that uses the new safe gets()
will inevitably be recompiled on implementations that provide the old
unsafe version.

Making a change in a new C standard is supposed to fix
implementations adhering to old standards? That's a mighty
high wall to climb, for any proposed change.

Aside from that, if the "new code" used GETSMAX or
setgetsbuflen() it would actually fail to compile on an old
implementation.

gets() must go. This is non-negotiable.

The situation isn't quite the same. noalias was a new
proposal with no existing code in jeopardy. gets() is
used widely in quick-n-dirty contexts like unit tests.

Ideally, gets() *would* go (see, I secretly agree with you,
please don't tell anyone) and there would be a replacement
that comes to a nice compromise between the simplicity of
using gets() and the let's-not-incite-undefined-behavior
aspect of fgets().

Something like getstr(char *, size_t) with the truncation
of long lines.

Keith Thompson · Sep 1, 2005

Making a change in a new C standard is supposed to fix
implementations adhering to old standards? That's a mighty
high wall to climb, for any proposed change.

No, of course a change in a new standard won't fix old
implementations. That was my point.

Aside from that, if the "new code" used GETSMAX or
setgetsbuflen() it would actually fail to compile on an old
implementation.

Sure, but it would make it more difficult to detect code that uses
gets() incorrectly. Given the current standard, that's basically any
code that uses gets().

The situation isn't quite the same. noalias was a new
proposal with no existing code in jeopardy. gets() is
used widely in quick-n-dirty contexts like unit tests.

Yes, and it shouldn't be.

Ideally, gets() *would* go (see, I secretly agree with you,
please don't tell anyone) and there would be a replacement
that comes to a nice compromise between the simplicity of
using gets() and the let's-not-incite-undefined-behavior
aspect of fgets().

Something like getstr(char *, size_t) with the truncation
of long lines.

You're proposing a new variant of fgets() that doesn't specify the
input file (and therefore always uses stdin), and that strips the
trailing '\n'. I would have no objection to that. But with or
without this new function, gets() should not be used, and ideally
should not be standardized or implemented.
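Keith's description of the proposed getstr() maps almost directly onto a wrapper around fgets(). The name getstr is the one floated in the thread. `getstr_from` is a hypothetical helper added so the logic can run on any stream. Following the truncation suggestion, the sketch discards the remainder of an over-long line. It assumes size is between 1 and INT_MAX.

```c
#include <stdio.h>
#include <string.h>

/* fgets() semantics plus two changes: the trailing '\n' is removed, and
   the rest of an over-long line is read and thrown away (truncation).
   Returns buf, or NULL at end of file or on error, as fgets() does. */
static char *getstr_from(FILE *in, char *buf, size_t size)
{
    char *nl;

    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';                  /* strip the newline, like gets() */
    } else {
        int c;                       /* line too long: drop the remainder */
        while ((c = getc(in)) != EOF && c != '\n')
            ;
    }
    return buf;
}

/* The proposed interface: always reads from stdin. */
static char *getstr(char *buf, size_t size)
{
    return getstr_from(stdin, buf, size);
}
```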

Randy Howard · Sep 1, 2005

Anonymous 7843 wrote:
Making a change in a new C standard is supposed to fix
implementations adhering to old standards? That's a mighty
high wall to climb, for any proposed change.

Yes. A far better use of spam would be to send out notices to
everyone with an email account on the dangers of gets() instead
of trying to convince them to order prescription medicine
online.

Aside from that, if the "new code" used GETSMAX or
setgetsbuflen() it would actually fail to compile on an old
implementation.

There are so many better alternatives available, I see no reason
to reuse the same name for different behavior. It's not like
they are running out of names for functions. Since str-whatever
is reserved already, strget might make a nice solution, and an
implementation similar to what various folks have proposed in
the past, such as Heathfield's 'fgetline' (IIRC), or ggets()
from CBF, etc. There certainly wouldn't be any harm in /adding/
a new replacement that can be used safely, and deprecating
gets() entirely. Reusing gets() would just confuse even more
people. The world does not need more confused newbies; they are
already in abundant supply and replicate faster than they
disappear.

The situation isn't quite the same. noalias was a new
proposal with no existing code in jeopardy. gets() is
used widely in quick-n-dirty contexts like unit tests.

I can't think of a single example of gets() being used in a
piece of code worth worrying about. If it is used widely in
quick-n-dirty contexts, then it isn't a problem. It's trivial
to fix quick-n-dirty programs if you get bitten by it.

The bigger packages using it /need/ to break as early as
possible, before they spread into broader use and cause more
problems.

Something like getstr(char *, size_t) with the truncation
of long lines.

There are lots of options, and that may be part of the problem.
Too many choices. Fortunately, all of them are better than the
currently standardized gets().

Randy Howard · Sep 1, 2005

Wojtek Lerch wrote:
Sure. Whatever. I don't think a lot of programmers learn C from the
Standard or the Rationale anyway.

Unfortunately, some of them don't listen to anything not nailed
down though. The typical freshly-minted know-it-all response is
"Who are you to tell me not to use it? The ISO C standards body
put it in there for a reason. If it was bad, it wouldn't be in
an international standard. duh."

It should be the job of teachers and handbooks to make sure that
beginners realize that it's not a good idea to use gets(), or to
divide by zero, or to cause integer overflow.

You'd think so. Judging by the number of college students today
that ask questions about basic problems with floating point
error propagation and avoidance, I am not enthusiastic about the
odds. That was a freshman-year course back in the day. We
weren't going to school to learn how to specify em units in a
style sheet; we were supposed to be learning about using
computers for something useful, like solving engineering
problems.

On the other hand, I don't think it would be unreasonable for the
Standard to officially declare gets() as obsolescent in the "Future
library directions" chapter.

If ISO expects anyone to take C0x seriously, then they have to
do something about this sort of thing, including gets() and
perhaps some strong words at least about some of the other
string function suspects as well. If gets() stays in unadorned,
it'll be pathetic.

Randy Howard · Sep 1, 2005

Keith Thompson wrote:
Which says, in 7.19.7.7:

Because gets does not check for buffer overrun, it is generally
unsafe to use when its input is not under the programmer's
control. This has caused some to question whether it should
appear in the Standard at all. The Committee decided that gets
was useful and convenient in those special circumstances when the
programmer does have adequate control over the input, and as
longstanding existing practice, it needed a standard
specification. In general, however, the preferred function is
fgets (see 7.19.7.2).

Personally, I think the Committee blew it on this one.

I think that is an almost universal opinion, apart from those
that were sitting on it at the time. They're outnumbered about
10000:1 from what I can tell. Every time a buffer overrun gets
reported, or another "Shellcoder's Handbook" gets published, the
odds get even worse.

Having a few people sitting around doing the three-monkeys trick
doesn't change it.

As far as I know, the
"longstanding existing practice" cited in the Rationale is the
*unsafe* use of gets(), not the hypothetical safe use.

Exactamundo. I'd love to see a single example of a widely used
program implemented with gets() that can be demonstrated as
safe, due to the programmer having 'adequate control'. Can
anyone point to one that is in the wild?

I just found 13 calls to gets() in the source code for a large
software package implemented in C (which I prefer not to identify).

I wonder what it would take to get SourceForge to scan every
line of C or C++ source looking for it and putting out a 'bad
apples' list on their home page. A basic service for the good
of humanity. Embargoing downloads for projects until they are
expunged would be even better. Time to wake up...
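A scan like the one Randy imagines could start as crude token matching. The helper below counts occurrences of the identifier gets that are followed by optional blanks and '('. It rejects longer names such as fgets or my_gets. `count_gets_calls` is a hypothetical helper. It does not understand comments, string literals, or macros, so its count is a hint rather than a verdict.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts tokens in src that look like calls to gets: the identifier
   gets, not preceded or followed by identifier characters, then
   optional spaces/tabs and '('. Heuristic only. */
static size_t count_gets_calls(const char *src)
{
    size_t count = 0;
    const char *p;

    for (p = src; *p != '\0'; p++) {
        const char *q;

        /* short-circuit stops before reading past a '\0' */
        if (p[0] != 'g' || p[1] != 'e' || p[2] != 't' || p[3] != 's')
            continue;
        /* reject matches inside a longer identifier such as fgets */
        if (p != src && (isalnum((unsigned char)p[-1]) || p[-1] == '_'))
            continue;
        q = p + 4;
        while (*q == ' ' || *q == '\t')
            q++;
        if (*q == '(')              /* also rejects getstr(, gets_x( */
            count++;
    }
    return count;
}
```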

They were all in small test programs, not in production code, and they
all used buffers large enough that an interactive user is not likely
to overflow them -- but that's no excuse for writing unsafe code.

One particular email package that is widely claimed to be
extremely secure and well-written includes a dozen or more
instances of void main() in its various components. The author
couldn't care less, despite having been made aware of it.
That's a lesser evil in the grand scheme of things, but when
people refuse to change things, even when they know they are
wrong, you know it isn't going to be easy.

Wojtek Lerch · Sep 1, 2005

Randy Howard said:
Wojtek Lerch wrote

Unfortunately, some of them don't listen to anything not nailed
down though. The typical freshly-minted know-it-all response is
"Who are you to tell me not to use it? The ISO C standards body
put it in there for a reason. If it was bad, it wouldn't be in
an international standard. duh."

Well, *then* you can point them to the Rationale and explain what it means
by "generally unsafe". You could even try to explain why it was
standardized even though it was known to be unsafe, and why a lot of people
disagree with that decision. A good teacher can take advantage of this kind
of stuff.

Anyway, think of all the unsafe things they'll have to learn not to do
before they become competent programmers. Pretty much all of them are more
difficult to avoid than gets().

You'd think so.

I haven't said anything about how well I think they're doing their job. I'm
sure there are a lot of bad teachers and bad handbooks around. But I doubt
banning gets() would make it significantly easier for their victims to
become competent programmers.

Dennis Ritchie · Sep 1, 2005

About my attitude to gets(), this was dredged from google.
Conversations repeat; there are about 78 things in
this "The fate of gets" thread.

Dennis Ritchie Nov 9 1999, 8:00

Newsgroups: comp.std.c
From: Dennis Ritchie   Date: 1999/11/09
Subject: Re: The fate of gets

On the other hand, we removed it from our library about a week
after the Internet worm. Of course, some couldn't afford
to do that.

Dennis

Richard Bos · Sep 1, 2005

Randy Howard said:
Wojtek Lerch wrote

Unfortunately, some of them don't listen to anything not nailed
down though. The typical freshly-minted know-it-all response is
"Who are you to tell me not to use it? The ISO C standards body
put it in there for a reason. If it was bad, it wouldn't be in
an international standard. duh."

I think you over-estimate the average VB-level programmer. s/ISO C
standards body/Great and Sacred Microsoft/ and s/an international
standard/MSVC++###/ would be more realistic.

Richard
