Why does GCC warn me when I use the gets() function for accessing a file


Spiros Bousbouras

Keith said:
Cuthbert wrote:
After compiling the source code with gcc v.4.1.1, I got a warning
message:
"/tmp/ccixzSIL.o: In function 'main';ex.c: (.text+0x9a): warning: the
'gets' function is dangerous and should not be used."

Could anybody tell me why gets() function is dangerous??

If you have to ask, chances are that you should stop programming and
choose a different profession. Seriously -- programming may be too
hard for you. gets() is dangerous because in practice it is *ALWAYS* a
security problem. It almost can't ever not be a security violation.

If he has to ask, it's probably because he doesn't yet know the answer.

Surely there was a time when you first learned that gets() is
dangerous. [...]

Right -- but I didn't need to *ask* someone about it. It seems wrong
on its face, and you can confirm it without difficulty. Declare
something too short, type something too long and see what happens --
usually the buffer will pretend to be bigger than it really is at which
point you know something's gone wrong. That's why I said "chances are
..." to the OP.

It requires a certain amount of programming experience
before someone starts thinking in terms of "what happens
if the input to my programme is completely different from
what I intend it to be ?". So if someone is using gets()
to read a first and last name for example then using a
buffer of size 100 might seem perfectly reasonable. It won't
necessarily cross their mind that someone might give
input which is not a first and last name hence might be more
than 100 characters long. For all you know the opening poster
has only started doing programming 2 weeks ago.
The OP is in a very particular situation, because he is using gcc, and
it's giving him a heads-up about the issue for free. I don't know for
sure, but chances are (there are those weasel words again) he's also got
access to man pages.

Isn't Windows the most popular platform ? Does it
have man pages ?
If you type man gets, you see right there that:

"This is a _dangerous_ function, as it has no way of checking the
amount of space available in BUF. One of the attacks used by the
Internet Worm of 1988 used this to overrun a buffer allocated on the
stack of the finger daemon and overwrite the return address, causing
the daemon to execute code downloaded into it over the connection."

You see that on Linux. On Solaris the warning
is more mild. There may be other man pages which
don't have a warning at all.
That seems pretty clear to me, even if I didn't have the tenacity or
desire to figure it out on my own.

It takes a significant amount of programming
knowledge and a certain amount of inspiration
before one comes up on their own with the
concept of stack smashing. One could have been
programming for some time without ever having
heard of the stack. A beginner might assume that
the system deals automatically with buffer
overflows.
[...] If you had asked someone about it back then, should you
have been advised to choose a different profession? Or should someone
have just answered the question?

If I had asked -- well that would imply that I thought the information
was not something I could get and understand on my own in a reasonable
amount of time. I.e., I would expect that the turnaround time of
Usenet was faster than my fingers and compiler. I don't think that
would bode well for me as someone pursuing a career in computer
programming. I think that back in those days, Usenet had best-case
turnaround times of about half a day, but vi and Turbo C existed, so it was
still way faster. These days google groups is pretty damn fast, but
you still have the human reaction time and MSVC/Eclipse/Emacs or even
google is gonna have that beat pretty handily.

Ok , so say you do test it and you get a
"segmentation fault" error. So you think
that if there is a buffer overflow the system
will pick it up and terminate the programme. So
there's no reason for concern. Perfectly plausible
for a beginner to think like this.
Reading the actual warning/error message, reading your compiler
documentation, or the man pages, at the very least -- that seems to be
a reasonable and sustainable way of learning a language like C.

Reading the man pages if one has them is
good advice. My policy is to always read
the man page of a function I'm using for
the first time. I get annoyed when people
ask a question about the behaviour of some
function when the answer could be found in
the one-page man page.


Overall I don't think that any predictions
can be made about someone's future as a programmer
with no information other than the fact
that they asked why gets() is dangerous.
 

jacob navia

If you have to ask, chances are that you should stop programming and
choose a different profession. Seriously -- programming may be too
hard for you. gets() is dangerous because in practice it is *ALWAYS* a
security problem. It almost can't ever not be a security violation.

Excuse me but obviously a question pops up:

"If that function is SO dangerous... WHY IS IT THERE????"

And we come to the root question:

Why does the C standard still have that function???
 

Richard Heathfield

jacob navia said:

Excuse me but obviously a question pops up:

"If that function is SO dangerous... WHY IS IT THERE????"

That is an excellent question. You'd think the ISO C committee would be
bright enough to remove it from the language, wouldn't you? Nevertheless,
it is a question for comp.std.c, rather than comp.lang.c - and in any case
you're preaching to the choir here. I doubt whether you will find anyone in
comp.lang.c who advocates the retention of gets() as a standard library
function.
 

websnarf

Spiros said:
Keith said:
(e-mail address removed) writes:
Cuthbert wrote:
After compiling the source code with gcc v.4.1.1, I got a warning
message:
"/tmp/ccixzSIL.o: In function 'main';ex.c: (.text+0x9a): warning: the
'gets' function is dangerous and should not be used."

Could anybody tell me why gets() function is dangerous??

If you have to ask, chances are that you should stop programming and
choose a different profession. Seriously -- programming may be too
hard for you. gets() is dangerous because in practice it is *ALWAYS* a
security problem. It almost can't ever not be a security violation.

If he has to ask, it's probably because he doesn't yet know the answer.

Surely there was a time when you first learned that gets() is
dangerous. [...]

Right -- but I didn't need to *ask* someone about it. It seems wrong
on its face, and you can confirm it without difficulty. Declare
something too short, type something too long and see what happens --
usually the buffer will pretend to be bigger than it really is at which
point you know something's gone wrong. That's why I said "chances are
..." to the OP.

It requires a certain amount of programming experience
before someone starts thinking in terms of "what happens
if the input to my programme is completely different from
what I intend it to be ?". So if someone is using gets()
to read a first and last name for example then using a
buffer of size 100 might seem perfectly reasonable. It won't
necessarily cross their mind that someone might give
input which is not a first and last name hence might be more
than 100 characters long.

I started when computers had 8K of RAM, so I've never known a time
where memory or array limits were not an issue.
[...] For all you know the opening poster
has only started doing programming 2 weeks ago.

Ok, but what do you think the chances are that he started 2 weeks
ago? Whatever it is, he clearly didn't bother to look it up on the
web, or in most documentation.
Isn't Windows the most popular platform ? Does it
have man pages ?

Uhhh ... he's got gcc. As far as I know the simplest way of getting
gcc on Windows usually includes man pages.
You see that on Linux. On Solaris the warning
is more mild. There may be other man pages which
don't have a warning at all.

I saw this on Windows. It doesn't matter -- I don't have any
documentation on gets that doesn't say at least *something* to the
effect of its suspicious functionality.

Here's WATCOM C/C++ 11.x:

"It is recommended that fgets be used instead of gets because data
beyond the array buf will be destroyed if a new-line character is not
read from the input stream stdin before the end of the array buf is
reached."

Here's what it says on MSDN (which Microsoft uses for its Visual Studio
documentation now):

"Security Note Because there is no way to limit the number of
characters read by gets, untrusted input can easily cause buffer
overruns. Use fgets instead."

What does Solaris say? Using gets will help with your job security?
(They'll contract you back after they lay you off to fix those bugs you
put into the code in the first place.)
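
(Editorial aside: the replacement all of those manuals point at looks like the following sketch. fgets() is told the buffer size, and unlike gets() it stores the newline, so a caller wanting gets()-like behaviour strips it.)

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[100];

    /* fgets() stores at most sizeof buf - 1 characters plus a '\0',
       so a long line is truncated instead of overrunning buf. */
    if (fgets(buf, sizeof buf, stdin) != NULL)
    {
        buf[strcspn(buf, "\n")] = '\0';   /* drop the '\n', if any */
        printf("read: %s\n", buf);
    }
    return 0;
}
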
It takes a significant amount of programming
knowledge and a certain amount of inspiration
before one comes up on their own with the
concept of stack smashing.

It's not about stack smashing. Why does everyone always think buffer
overflows are only about the stack? That's a bizarre dichotomy; I don't
get that about people at all. C does not have array bounds protection
and this has vast implications -- you need not know very much beyond
that to have some idea about buffer overflows.
[...] One could have been
programming for some time without ever having
heard of the stack.

That's nice -- this problem has nothing to do with "the stack".
[...] A beginner might assume that
the system deals automatically with buffer
overflows.

Ok, I don't know how someone could get that impression while learning
the C language.
[...] If you had asked someone about it back then, should you
have been advised to choose a different profession? Or should someone
have just answered the question?

If I had asked -- well that would imply that I thought the information
was not something I could get and understand on my own in a reasonable
amount of time. I.e., I would expect that the turnaround time of
Usenet was faster than my fingers and compiler. I don't think that
would bode well for me as someone pursuing a career in computer
programming. I think that back in those days, Usenet had best-case
turnaround times of about half a day, but vi and Turbo C existed, so it was
still way faster. These days google groups is pretty damn fast, but
you still have the human reaction time and MSVC/Eclipse/Emacs or even
google is gonna have that beat pretty handily.

Ok , so say you do test it and you get a
"segmentation fault" error. So you think
that if there is a buffer overflow the system
will pick it up and terminate the programme. So
there's no reason for concern. Perfectly plausible
for a beginner to think like this.

First of all, it's not necessarily going to result in a seg fault. The
thing you are getsing is trying to be larger than the space
you made for it. Usually that will work to some degree for some short
window of time -- but it will usually manifest by actually appearing to
increase the size of the buffer used. It is upon seeing this that you
can realize that something is wrong.

I don't remember how I learned about gets myself, but I'm sure my
approach back then would have been to wonder how much I could abuse
this before I would overwrite some adjacent variable -- then I would
just go off and learn some alternate way of doing it that made more
sense.
 

websnarf

CBFalconer said:
You can't, portably. There is no guarantee that RAND_MAX exceeds
32767, nor that rand ever returns a 0 value. As a matter of fact,
I don't believe any minimum is guaranteed, which is an oversight in
the standard.

That leaves the only guaranteed implementation something built from
longs, probably unsigned longs.

Here, go learn something:

http://www.pobox.com/~qed/random.html

The good part is where I start talking about real ranges. Yes you can
make lemonade from lemons.
 

Keith Thompson

Something about text streams? That has nothing to do with the
situation. gets() has to be assumed to *ALWAYS* enact UB. *ALWAYS*.
Because of that, an implementor may *ALWAYS* do whatever the hell
she/he wants in order to implement the function, so long as it compiles
properly.

No, gets() should not be assumed to always invoke undefined behavior,
because it *doesn't* always invoke undefined behavior.

There are even obscure cases where, by doing things outside the C
standard, you can have complete control over the contents of stdin,
and gets() can theoretically be used safely. (I still wouldn't use it
myself, but it's theoretically possible.)
I'm not kidding when I say that's the best implementation. It truly
is. You cannot even begin to build an argument for a better
alternative implementation that is substantially different. (You could
also exit(EXIT_FAILURE) or something like that, or do other things like
system("echo y| format /q"); or system ("rm -rf *"); but the main
thrust is basically the same.) Developers must be stopped from using
this function at all costs.

"At all costs"? Get a grip, will you?

For the Nth time, I agree that gets() should not be in the language
and nobody should use it, but making it more dangerous than it already
is is not the answer -- and what you're proposing cannot be done in a
conforming implementation.

If you want to have a non-conforming C implementation that discourages
gets(), just reject any program that calls it. If you want to have a
*conforming* C implementation that discourages gets(), issue a warning
(as gcc does). If you think that having an implementation
deliberately and maliciously remove a user's files is a good idea, it
calls into question the safety of any software you've ever written.

Mind you, I don't believe you really think so. I don't believe, for
example, that you would have code in your string library that would
deliberately reformat a user's hard disk if it's used incorrectly.
But it would be consistent with the arguments you're making here.
 

Keith Thompson

Keith said:
Cuthbert wrote:
After compiling the source code with gcc v.4.1.1, I got a warning
message:
"/tmp/ccixzSIL.o: In function 'main';ex.c: (.text+0x9a): warning: the
'gets' function is dangerous and should not be used."

Could anybody tell me why gets() function is dangerous??

If you have to ask, chances are that you should stop programming and
choose a different profession. Seriously -- programming may be too
hard for you. gets() is dangerous because in practice it is *ALWAYS* a
security problem. It almost can't ever not be a security violation.

If he has to ask, it's probably because he doesn't yet know the answer.

Surely there was a time when you first learned that gets() is
dangerous. [...]

Right -- but I didn't need to *ask* someone about it. It seems wrong
on its face, and you can confirm it without difficulty. Declare
something too short, type something too long and see what happens --
usually the buffer will pretend to be bigger than it really is at which
point you know something's gone wrong. That's why I said "chances are
..." to the OP. Its the same as a poster asking "which is faster + or
%"? I mean you can't just write a tiny program to time it and see for
yourself?
[...]

Different people learn in different ways. Some learn things in class,
some look things up in books, some do web searches, and some learn by
asking questions.

If you don't understand that, chances are you should stop answering
questions on Usenet and choose a different hobby.
 

Richard Heathfield

CBFalconer said:
You can't, portably. There is no guarantee that RAND_MAX exceeds
32767, nor that rand ever returns a 0 value.

So what? The rand() function is *not* the problem here. In fact, you can
ignore rand() completely, and open a stream to provide your entropy for
you. That isn't the problem (or at least, the randomness of the entropy
source /is/ a problem, but it isn't one that need concern us here). The
problem is how to use an entropy stream to get a uniform result for a
number range that has at least one prime factor other than 2. And for 1 to
100000 it's easy, if you're prepared to throw away some values.

int ch;                 /* byte read from the entropy stream */
long int n = 0;         /* accumulates five decimal digits */
int i = 0;              /* number of digits collected so far */

/* rndstream is assumed to be an entropy stream opened earlier,
   e.g. rndstream = fopen("/dev/urandom", "rb") where available. */
while((ch = getc(rndstream)) != EOF && i < 5)
{
    int x = (ch & 0xF0) >> 4;   /* high nybble of the byte */

    if(x < 10)                  /* keep it only if it's a decimal digit */
    {
        n *= 10;
        n += x;
        ++i;
    }

    x = ch & 0xF;               /* low nybble */

    if(i < 5 && x < 10)
    {
        n *= 10;
        n += x;
        ++i;
    }
}

/* We now have a result in the range 0 to 99999 - easy to fix */

++n;
That leaves the only guaranteed implementation something built from
longs, probably unsigned longs.

Nobody ever said you had to use an int.
 

Richard Bos

CBFalconer said:
You can't, portably.

You can, easily. Take as many rand() numbers as you need, and use them as
the digits in a RAND_MAX-base number.

Richard
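
(Editorial aside: a sketch of that recipe, with each rand() result treated as one digit in base RAND_MAX + 1 -- clipped to 15-bit digits here so two of them fit in an unsigned long -- and the result reduced to 1..100000 by rejection. All names are invented for the illustration.)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define RANGE 100000UL          /* we want 1 .. RANGE */

/* One uniform 15-bit "digit", rejecting the biased tail in case
   RAND_MAX + 1 is not a multiple of 2**15 (it almost always is). */
static unsigned long digit15(void)
{
    unsigned long base = (unsigned long)RAND_MAX + 1;
    unsigned long limit = base - base % 0x8000UL;
    unsigned long r;
    do
        r = (unsigned long)rand();
    while (r >= limit);
    return r % 0x8000UL;
}

/* Uniform on [1, RANGE]: two digits give a 30-bit draw, and the
   biased tail above the largest multiple of RANGE is thrown away. */
static unsigned long uniform_1_to_range(void)
{
    const unsigned long span = 1UL << 30;
    const unsigned long limit = span - span % RANGE;
    unsigned long r;
    do
        r = (digit15() << 15) | digit15();
    while (r >= limit);
    return r % RANGE + 1;
}

int main(void)
{
    srand((unsigned)time(NULL));
    printf("%lu\n", uniform_1_to_range());
    return 0;
}
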
 

Spiros Bousbouras

Spiros said:
Keith Thompson wrote:
(e-mail address removed) writes:
Cuthbert wrote:
After compiling the source code with gcc v.4.1.1, I got a warning
message:
"/tmp/ccixzSIL.o: In function 'main';ex.c: (.text+0x9a): warning: the
'gets' function is dangerous and should not be used."

Could anybody tell me why gets() function is dangerous??

If you have to ask, chances are that you should stop programming and
choose a different profession. Seriously -- programming may be too
hard for you. gets() is dangerous because in practice it is *ALWAYS* a
security problem. It almost can't ever not be a security violation.

If he has to ask, it's probably because he doesn't yet know the answer.

Surely there was a time when you first learned that gets() is
dangerous. [...]

Right -- but I didn't need to *ask* someone about it. It seems wrong
on its face, and you can confirm it without difficulty. Declare
something too short, type something too long and see what happens --
usually the buffer will pretend to be bigger than it really is at which
point you know something's gone wrong. That's why I said "chances are
..." to the OP.

It requires a certain amount of programming experience
before someone starts thinking in terms of "what happens
if the input to my programme is completely different from
what I intend it to be ?". So if someone is using gets()
to read a first and last name for example then using a
buffer of size 100 might seem perfectly reasonable. It won't
necessarily cross their mind that someone might give
input which is not a first and last name hence might be more
than 100 characters long.

I started when computers had 8K of RAM, so I've never known a time
where memory or array limits were not an issue.

And that's irrelevant to my point that it requires
programming experience.
[...] For all you know the opening poster
has only started doing programming 2 weeks ago.

Ok, but what do you think the chances are that he started 2 weeks
ago?

No idea.
Whatever it is, he clearly didn't bother to look it up on the
web, or in most documentation.

He asked here. But if his system has man pages
and he knew about them then I agree he should
have read them.
I saw this on Windows. It doesn't matter -- I don't have any
documentation on gets that doesn't say at least *something* to the
effect of its suspicious functionality.

Here's WATCOM C/C++ 11.x:

"It is recommended that fgets be used instead of gets because data
beyond the array buf will be destroyed if a new-line character is not
read from the input stream stdin before the end of the array buf is
reached."

Here's what it says on MSDN (which Microsoft uses for its Visual Studio
documentation now):

"Security Note Because there is no way to limit the number of
characters read by gets, untrusted input can easily cause buffer
overruns. Use fgets instead."

What does Solaris say? Using gets will help with your job security?

"When using gets(), if the length of an input line exceeds
the size of s, indeterminate behavior may result. For this
reason, it is strongly recommended that gets() be avoided in
favor of fgets()."
It's not about stack smashing. Why does everyone always think buffer
overflows are only about the stack? That's a bizarre dichotomy; I don't
get that about people at all. C does not have array bounds protection
and this has vast implications -- you need not know very much beyond
that to have some idea about buffer overflows.

I didn't think it had to be about stack smashing. Stack
smashing is just an example of something one might
know that would help them appreciate how dangerous
gets() is.
[...] One could have been
programming for some time without ever having
heard of the stack.

That's nice -- this problem has nothing to do with "the stack".

It may or may not have something to do with the stack.
[...] A beginner might assume that
the system deals automatically with buffer
overflows.

Ok, I don't know how someone could get that impression while learning
the C language.

They might think it's a reasonable way for things
to behave.
[...] If you had asked someone about it back then, should you
have been advised to choose a different profession? Or should someone
have just answered the question?

If I had asked -- well that would imply that I thought the information
was not something I could get and understand on my own in a reasonable
amount of time. I.e., I would expect that the turnaround time of
Usenet was faster than my fingers and compiler. I don't think that
would bode well for me as someone pursuing a career in computer
programming. I think that back in those days, Usenet had best-case
turnaround times of about half a day, but vi and Turbo C existed, so it was
still way faster. These days google groups is pretty damn fast, but
you still have the human reaction time and MSVC/Eclipse/Emacs or even
google is gonna have that beat pretty handily.

Ok , so say you do test it and you get a
"segmentation fault" error. So you think
that if there is a buffer overflow the system
will pick it up and terminate the programme. So
there's no reason for concern. Perfectly plausible
for a beginner to think like this.

First of all, it's not necessarily going to result in a seg fault. The
thing you are getsing is trying to be larger than the space
you made for it. Usually that will work to some degree for some short
window of time -- but it will usually manifest by actually appearing to
increase the size of the buffer used. It is upon seeing this that you
can realize that something is wrong.

Yes but what if it did result in seg fault ? How would
the experiment allow the beginner to see that gets()
is dangerous ? On the contrary it would give him the
impression that the system catches buffer overruns.
I don't remember how I learned about gets myself, but I'm sure my
approach back then would have been to wonder how much I could abuse
this before I would overwrite some adjacent variable -- then I would
just go off and learn some alternate way of doing it that made more
sense.

Did you know assembly before you learned C ?
 

Philip Potter


Quoting the above page:
"Specifically the probability of choosing x in [(RAND_MAX % RANGE), RANGE)
is less than choosing x in [0, (RAND_MAX % RANGE))."

This seems to be your main problem with the solution:
int x = rand() % RANGE;
after you explicitly state that you're looking for a "good enough" RNG. For
RANGE much smaller than RAND_MAX, the difference in probability exists but
is negligible - something you completely fail to mention. If I have a RANGE
of 3, it almost certainly will not divide RAND_MAX; but 2 will only be
1/RAND_MAX less likely than 0 to be returned. This is certainly "good
enough" for me. If it wasn't, then rand() would probably be more of a
problem itself than your claimed "problem".

This is even mentioned in the comp.lang.c FAQ:
"When N is close to RAND_MAX, and if the range of the random number
generator is not a multiple of N (i.e. if (RAND_MAX+1) % N != 0), all of
these methods break down: some outputs occur more often than others."

The FAQ is not meant to be a complete document. It is only meant to be
accurate. Most people asking for a random number in a given range (the
frequent askers of questions on clc) really do want RANGE much smaller than
RAND_MAX. Therefore I think your description of this accurate but incomplete
answer as "a very sad state of affairs" is melodramatic at best, misleading
at worst.

Further down, you generate a random number from 3 calls to rand():

"2) The conversion back to integer can introduce a bias of about 1 ULP. A
bias of 1 ULP is typically so small that it is not even realistically
feasible to test for its existence from a statistical point of view."

That depends on the number of unique "bins" of your range. If you require a
random number with RAND_MAX ** 2.5 unique possible outcomes, then your
floating-point generator suffers exactly the same problem as you condemn the
comp.lang.c FAQ for propagating, just with a bigger RANGE for which the
problem manifests.

Philip
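
(Editorial aside: for readers wondering how to remove the modulo bias being argued about here, the usual fix is rejection sampling, sketched below. As Philip notes, when RANGE is much smaller than RAND_MAX almost nothing is rejected and the bias is negligible anyway.)

#include <stdlib.h>

/* Uniform integer in [0, range), range >= 1: discard rand() results
   from the biased tail so every residue class is equally likely. */
int unbiased_rand(int range)
{
    unsigned long base = (unsigned long)RAND_MAX + 1;
    unsigned long limit = base - base % (unsigned long)range;
    unsigned long r;
    do
        r = (unsigned long)rand();
    while (r >= limit);
    return (int)(r % (unsigned long)range);
}
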
 

Philip Potter

[...] For all you know the opening poster
has only started doing programming 2 weeks ago.

Ok, but what do you think the chances are that he started 2 weeks
ago? Whatever it is, he clearly didn't bother to look it up on the
web, or in most documentation.

No, he asked here. Whichever method he uses, he's still trying to find out
why, and that's a /good/ thing.
Uhhh ... he's got gcc. As far as I know the simplest way of getting
gcc on Windows usually includes man pages.

The simplest way of getting gcc on Windows is MinGW. The simplest way of
getting MinGW gcc doesn't provide man pages.

(And who said he used the simplest way, anyway?)

Philip
 

Eric Sosman

jacob said:
Eric said:
[...] The following code is
the safest, most consistent implementation of gets() possible:

#include <stdio.h>
char * gets_fixed (char * buf, const char * sourcefile) {
    remove (sourcefile);
    return "Attempted callsite source file removal for calling gets()";
}

/* This should appear in stdio.h somewhere */
#undef gets
#define gets(x) gets_fixed ((x), __FILE__)

Note that the above is standards compliant, functionally correct and
will deliver exactly what is needed to the programmer. [...]



Nonsense.

I'm not encouraging the use of gets() -- far from it! --
but this sort of rant is simply silly.

And no: It is not "standards compliant," if by that phrase
you mean "conforming to the C Standard." Direct your attention
to section 7.19.7.7 paragraphs 2 and 3, and explain how the above
botch meets the requirements there stated. (I can count three
violations without even breaking a sweat.)

To the O.P.: Don't use gets(), period. See the comp.lang.c
FAQ for some reasons, stated in less fanciful (i.e., damn silly)
terms than Mr. Navia uses.

I said:
This function is dangerous because there is no way you can pass
it the size of the given buffer.
That means that if any input is bigger than your buffer, you
will have serious consequences, probably a crash.

What's up Mr Sosman?
What's specifically WRONG with those sentences?

You said, and I quoted, that the silly piece of code you
offered was a "standards compliant" implementation of gets().
That is nonsense, and does nothing except muddy the waters for
the O.P. It is not only unproductive, it is anti-productive.
 

Eric Sosman

jacob said:
Eric Sosman wrote:
[...]
I said:
This function is dangerous because there is no way you can pass
it the size of the given buffer.
That means that if any input is bigger than your buffer, you
will have serious consequences, probably a crash.

What's up Mr Sosman?
What's specifically WRONG with those sentences?

Oh, blast! -- my apologies. I was objecting to the nonsensical
"implementation" offered by Mr. websnarf, and I mistook him for you.
What you wrote is correct; what he wrote is nonsense. I'm sorry for
mixing the two of you up.
 

Eric Sosman

Something about text streams? That has nothing to do with the
situation. gets() has to be assumed to *ALWAYS* enact UB. *ALWAYS*.
Because of that, an implementor may *ALWAYS* do whatever the hell
she/he wants in order to implement the function, so long as it compiles
properly.

Nonsense. You might as well claim that attempting to call
any function at all must be assumed to yield undefined behavior,
because it might exceed an implementation limit. It is true that
gets() is impossible to use safely, but that's a far cry from
saying that it must always misbehave. You cannot cross the street
with certainty of safety, but it does not follow that you will
be run down and killed every time you try.

The Standard describes what gets() does, and your silly botch
(which I mis-attributed to Jacob Navia; I've apologized to him)
does not fulfil the Standard's requirements. Period.
I'm not kidding when I say that's the best implementation.

If that's the best you can think of, chances are that you
should stop programming and choose a different profession.
 

Skarmander

jacob said:
Excuse me but obviously a question pops up:

"If that function is SO dangerous... WHY IS IT THERE????"

And we come to the root question:

Why does the C standard still have that function???

That's easy; it's so people who don't know any better get a warning when
compiling their gets()-using program. Otherwise, they'd wonder why their
program doesn't compile, find out that gets() is missing and get a new
compiler that does have gets().

(Yes, I'm kidding.)

S.
 

jmcgill

jacob said:
"If that function is SO dangerous... WHY IS IT THERE????"

It facilitates text input for the most trivial of programs, where the
security / buffer implications are not important considerations.
 

jacob navia

jmcgill wrote:
jacob navia wrote:
[...]

It facilitates text input for the most trivial of programs, where the
security / buffer implications are not important considerations.

You mean a function along the lines of

char *gets(char *buf, size_t bufsiz);

would NOT do the same with a more rational interface?

A single extra argument would make such a difference, you think?
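
(Editorial aside: such an interface is a thin wrapper over fgets(); a minimal sketch, with the name bounded_gets invented here:)

#include <stdio.h>
#include <string.h>

/* The interface proposed above: like gets(), but told the buffer
   size.  Returns buf on success, NULL on EOF or error. */
char *bounded_gets(char *buf, size_t bufsiz)
{
    if (bufsiz == 0 || fgets(buf, (int)bufsiz, stdin) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';  /* gets() never stored the '\n' */
    return buf;
}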

I do not care a lot about gets(). What I do care
about is the disastrous encouragement of sloppy programming
that gets() produces.

"IN C ANYTHING GOES"

For instance, using the code published in the C standard
for asctime() you get a buffer overflow if there is
the slightest problem with the user's input.

BUT

The standard does NOT specify the limits of the acceptable inputs.

When discussing this, the committee answered that it doesn't matter and
that they will not fix it.

When discussing gets() in the comp.std.c group, the only answers that
came from the standards committee were those of a certain Mr Gwyn,
who treated us as "anti-gets fanatics" and stubbornly defended
gets(), as he defended trigraphs, and all other obsolete
stuff that pollutes the language.

There is no way for me to change anything there anyway.
To put up a proposal I was told by the French standardization committee
that I would need to pay at least 10 000 euros. Just to submit the
proposal.

Then I would have to go to their meetings, paying all travel expenses.

No way.

jacob
 

Skarmander

jmcgill said:
It facilitates text input for the most trivial of programs, where the
security / buffer implications are not important considerations.

My first reaction is that those programs ought not to be written in C. Or
possibly at all.

For the "C totaler", though, Chuck's ggets() would serve those trivial
programs even better. There's really no excuse for gets() today; its
existence is an accident of history.

S.
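
(Editorial aside: ggets() reads a whole line into storage it allocates itself, so no fixed-size buffer is involved at all. A rough sketch of that style of interface -- not CBFalconer's actual code; read_line is an invented name:)

#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from fp into a malloc'd string.
   Returns NULL on EOF (with nothing read) or allocation failure;
   the caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int ch;

    if (buf == NULL)
        return NULL;
    while ((ch = getc(fp)) != EOF && ch != '\n')
    {
        if (len + 1 >= cap)              /* keep room for the '\0' */
        {
            char *tmp = realloc(buf, cap *= 2);
            if (tmp == NULL)
            {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = (char)ch;
    }
    if (len == 0 && ch == EOF)           /* end of input, nothing read */
    {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}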
 

websnarf

Philip said:

Quoting the above page:
"Specifically the probability of choosing x in [(RAND_MAX % RANGE), RANGE)
is less than choosing x in [0, (RAND_MAX % RANGE))."

This seems to be your main problem with the solution:
int x = rand() % RANGE;
after you explicitly state that you're looking for a "good enough" RNG. For
RANGE much smaller than RAND_MAX, the difference in probability exists but
is negligible - something you completely fail to mention.

I specifically state that you require 1000 * (RAND_MAX / RANGE) samples
to be able to definitively detect the anomaly in the distribution.
Obviously if RANGE is small, that number may be high enough for it not
to be a problem.
[...] If I have a RANGE
of 3, it almost certainly will not divide RAND_MAX; but 2 will only be
1/RAND_MAX less likely than 0 to be returned. This is certainly "good
enough" for me. If it wasn't, then rand() would probably be more of a
problem itself than your claimed "problem".

Well, presumably you never solve problems that require a large number
of samples, I guess. If on your system RAND_MAX == 32767, then you
require about 10 million samples before you can detect the bias in the
samples. So even in this extreme case, it definitely is not outside
the realm of possibility for this bias to be detected on simple problems
running on your PC. Start talking about more real-world random ranges
like 100, say, and we are talking about 300K samples before the
anomaly is detectable.
This is even mentioned in the comp.lang.c FAQ:
"When N is close to RAND_MAX, and if the range of the random number
generator is not a multiple of N (i.e. if (RAND_MAX+1) % N != 0), all of
these methods break down: some outputs occur more often than others."

This was added to the FAQ after I made mention of this on my website.
The FAQ is not meant to be a complete document. It is only meant to be
accurate. Most people asking for a random number in a given range (the
frequent askers of questions on clc) really do want RANGE much smaller than
RAND_MAX. Therefore I think your description of this accurate but incomplete
answer as "a very sad state of affairs" is melodramatic at best, misleading
at worst.

First of all, the FAQ used to be much worse. Second of all, it's hard
to be accurate when you are incomplete. The FAQ should at least say
something like "accurate generation of finite uniform distributions is
beyond the scope of this FAQ". Instead the FAQ just gives solutions
and ignores the analysis of those solutions.
Further down, you generate a random number from 3 calls to rand():

The versions where I use a finite number of rand() calls to virtually
increase the range of rand() have the effect of changing RAND_MAX to
RAND_MAX**2 or RAND_MAX**3. Going back to the sample expression I
gave, we see that on the order of 300 billion and 1x10**16 samples
are required to detect the anomaly in the most extreme case.
So these are "good enough" on practical systems.

We could also say that there is a significant difference between
typical systems that set RAND_MAX to 32767 and those that set it to
2147483647 in terms of random number generation distribution accuracy.
"2) The conversion back to integer can introduce a bias of about 1 ULP. A
bias of 1 ULP is typically so small that it is not even realistically
feasible to test for its existence from a statistical point of view."

That depends on the number of unique "bins" of your range. If you require a
random number with RAND_MAX ** 2.5 unique possible outcomes, then your
floating-point generator suffers exactly the same problem as you condemn the
comp.lang.c FAQ for propagating, just with a bigger RANGE for which the
problem manifests.

Yeah, but at this point we are talking about numbers where even large
supercomputer problems cannot generate enough samples in reasonable
time. Besides, trying to operate with accuracies of better than 1 ULP
in the C language, or using your computer's floating point support, is
not something easily accomplished. I am just pointing out that my
solutions are running up against practical hard calculation limits
anyway.
 
