why the usage of gets() is dangerous.

RoS · Nov 23, 2007

On Thu, 22 Nov 2007 08:57:22 +0100, RoS wrote:

if the compiler keeps an array of elements
typedef struct {char* where; size_t size;} arrayelement;
that point to each memory object returned from malloc or
allocated on the stack,
it is possible to write a routine that says whether an address is allowed
to be read or written; this means it is possible to write a safe gets(),
in the sense of a gets() that does not overflow the input array

int isok(char* a)
{arrayelement *p;
 /* whereitis: the table of known objects, terminated by where==0 */
 for(p=whereitis; p->where!=0; ++p)
    {if(a>=p->where && a<p->where+p->size)
        return 1;
    }
 return 0;
}

size_t sizefromhere(char* a)
{arrayelement *p;
 for(p=whereitis; p->where!=0; ++p)
    {if(a>=p->where && a<p->where+p->size)
        return p->size-(size_t)(a-p->where);
    }
 return 0;
}

so gets could be something like

yes, if its argument is stack memory, gets should always return 0
(why waste cycles and memory space for automatic objects?)

if the argument is malloc heap memory (which in my case has the list of
available blocks and their sizes), it could be something like the code below

char* gets(char* buf)
{char *p;
 int c;
 size_t limit, h;
 p=buf;
 limit=sizefromhere(buf);
 if(limit==0) return 0;
 h=0;
l0:;
 c=getchar();
 if(c==EOF)
   {if(ferror(stdin)) goto l1;
    p[h]=0;
    return buf;
   }
 if(c=='\n')
   {if(h+2>limit) goto l1; /* need room for '\n' and the final 0 */
    p[h]='\n';
    p[h+1]=0;
    return buf;
   }
 if(h+1>=limit) /* keep p[h] free for the final 0 */
   {l1:;
    p[h]=0;
    return 0;
   }
 p[h]=c; ++h; goto l0;
}
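The post leaves open how the table of objects gets filled. As a rough sketch only, here is one way a malloc wrapper could record each block so that a lookup like sizefromhere() has something to search. The names tracked_malloc, size_from_here, table and MAX_TRACKED are made up for illustration, the capacity is arbitrary, there is no matching free, and addresses are compared as uintptr_t values, which assumes a flat address space.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_TRACKED 64          /* arbitrary capacity for this sketch */

typedef struct { char *where; size_t size; } arrayelement;

static arrayelement table[MAX_TRACKED];   /* hypothetical registry */
static size_t ntracked;

/* Hypothetical malloc wrapper: allocate a block and remember it. */
void *tracked_malloc(size_t size)
{
    void *p;

    if (size == 0 || ntracked == MAX_TRACKED)
        return NULL;
    p = malloc(size);
    if (p != NULL) {
        table[ntracked].where = p;
        table[ntracked].size = size;
        ++ntracked;
    }
    return p;
}

/* Bytes available from a to the end of its tracked block, or 0 if a
   does not point into any tracked block. */
size_t size_from_here(const void *a)
{
    uintptr_t addr = (uintptr_t)a;
    size_t i;

    for (i = 0; i < ntracked; ++i) {
        uintptr_t start = (uintptr_t)table[i].where;

        if (addr >= start && addr - start < table[i].size)
            return table[i].size - (size_t)(addr - start);
    }
    return 0;
}
```

With a 16-byte block p from tracked_malloc, size_from_here(p + 4) reports 12 bytes left, while an automatic array that was never registered reports 0, which matches the "return 0 for stack memory" rule in the post.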

James Kuyper · Nov 23, 2007

CBFalconer wrote:
....

I maintain that 'fat' pointers are incompatible with C in the first
place. Remember that pointers can point to structures, arrays,
single objects, etc. Other pointers can be created by performing
arithmetic on pointers (effectively indexing). The purpose of
'fat' pointers is to allow complete range checking at any time
(correct me if this is wrong). It only takes one example of
failure to prevent it all.

Let us assume we have "int foo[10];" declared. Now this is an
array, and entirely usable within its scope. However, to pass it
elsewhere, we have to convert it to "int *foobar" and lose the 10
dimension. What happens if we try to retain that dimension? Well,
maybe we need to pass out a single entity in the foo array, by
"foobar[2]". There is no proscription against the destined routine
regaining the original pointer via "foobarptr - 2". Yet the passed
foobar necessarily specified a valid array length of 1, allowing
only indexing by 0. -2 seems not to fit!

Could you convert that description into C code? There's no step in that
process which clearly sounds like it should be a problem for
range-checked pointers, but that's partly because I'm not entirely clear
about what you're saying.
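One possible reading of the quoted description in plain C, offered as a guess at what was meant and not as CBFalconer's own code. The function names regain_start and first_element_via_third are hypothetical.

```c
/* Hypothetical callee: given a pointer to element 2 of an array,
   step back two elements to regain the address of element 0. */
int *regain_start(int *foobarptr)
{
    return foobarptr - 2;
}

/* Hypothetical caller: passes the address of a single element. */
int first_element_via_third(void)
{
    int foo[10] = { 42 };          /* foo[0] is 42, the rest are 0 */
    int *foobar = &foo[2];

    return *regain_start(foobar);  /* reads foo[0] */
}
```

This is well-defined C, because foobar - 2 still points into foo. The open question in the thread is what bounds a range-checked pointer to foo[2] should carry: the single element, or the whole array.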

Dik T. Winter · Nov 23, 2007

> CBFalconer wrote, On 23/11/07 03:15: ....
>
> I disagree. They are not easy to implement, but there are ways around
> all of the problems if you work hard enough at it.

I think a way would be to have a fat pointer consist of three parts.
The first part would be the traditional pointer, the second part a
pointer to the first element of the memory block, and the third part
the size of the memory block.
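A minimal sketch of such a three-part pointer for int objects, under the assumption that the size is counted in elements. The type and function names (fat_int_ptr, fat_make, fat_can_move, fat_can_deref) are invented for illustration and come from no real implementation.

```c
#include <stddef.h>

/* Three-part fat pointer; field names are illustrative only. */
typedef struct {
    int   *ptr;    /* the traditional pointer */
    int   *base;   /* first element of the memory block */
    size_t size;   /* number of elements in the block */
} fat_int_ptr;

/* Build a fat pointer to element index (index <= size) of a block. */
fat_int_ptr fat_make(int *base, size_t size, size_t index)
{
    fat_int_ptr p;

    p.ptr = base + index;
    p.base = base;
    p.size = size;
    return p;
}

/* Nonzero if moving p by delta elements stays inside the block or
   lands exactly one past its end, as C pointer arithmetic allows. */
int fat_can_move(fat_int_ptr p, ptrdiff_t delta)
{
    ptrdiff_t pos = p.ptr - p.base;

    return delta >= -pos && delta <= (ptrdiff_t)p.size - pos;
}

/* Nonzero if p points at an element that may be read or written. */
int fat_can_deref(fat_int_ptr p)
{
    return p.ptr >= p.base && (size_t)(p.ptr - p.base) < p.size;
}
```

For "int foo[10];", a fat pointer made with fat_make(foo, 10, 2) may move back by 2 but not by 3, and forward by 8 (one past the end) but not by 9, so the "foobarptr - 2" case from earlier in the thread passes the check.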

CBFalconer · Nov 23, 2007

santosh said:
.... snip ...

I am not knowledgeable enough with C to say whether fat pointers
break its rules sufficiently severely to rule out their inclusion,
but from what I know, I can't see how it would be non-permissible.

Obviously it would require a lot of behind-the-scenes compiler
magic, and is likely to severely degrade performance, but it ought
to be, from what I know, possible. Of course I'm likely to be
proved wrong in a few minutes by an expert here.

I don't think you need too much experience to note the troubles.
Just try creating some sub-system compilers, that handle all sorts
of pointers. I think you will rapidly find that the backup
information required makes things impossible very rapidly. Build
your own structures for keeping track of pointers, and remember to
update them and the handling code whenever you consider another
input version.

The basic problem is that a C pointer is not restricted.

CBFalconer · Nov 23, 2007

Flash said:
CBFalconer wrote, On 23/11/07 03:15:
.... snip ...

Let us assume we have "int foo[10];" declared. Now this is an
array, and entirely usable within its scope. However, to pass it
elsewhere, we have to convert it to "int *foobar" and lose the 10
dimension. What happens if we try to retain that dimension? Well,
maybe we need to pass out a single entity in the foo array, by
"foobar[2]". There is no proscription against the destined routine
regaining the original pointer via "foobarptr - 2". Yet the passed
foobar necessarily specified a valid array length of 1, allowing
only indexing by 0. -2 seems not to fit!

That one is easy to deal with. The fat pointer includes the
information that it points to element 2 of a 10 element
array, so it knows that you can go back 2 elements.

For _that_ pointer. However, a routine can be called from various
places, with various original objects, and each has to pass all the
data. Then dynamic code has to decode it. I maintain this is not
practical.

A sweeping statement like that does nothing for your argument.

I disagree. The first requirement is some sweeping statements.
Further investigation may prove or disprove them. But it only
takes one impossibility to make the whole attack impracticable.

CBFalconer · Nov 23, 2007

James said:
CBFalconer wrote:
.... snip ...

Let us assume we have "int foo[10];" declared. Now this is an
array, and entirely usable within its scope. However, to pass it
elsewhere, we have to convert it to "int *foobar" and lose the 10
dimension. What happens if we try to retain that dimension? Well,
maybe we need to pass out a single entity in the foo array, by
"foobar[2]". There is no proscription against the destined routine
regaining the original pointer via "foobarptr - 2". Yet the passed
foobar necessarily specified a valid array length of 1, allowing
only indexing by 0. -2 seems not to fit!

Could you convert that description into C code? There's no step in
that process which clearly sounds like it should be a problem for
range-checked pointers, but that's partly because I'm not entirely
clear about what you're saying.

Just consider the complications. At present a C pointer is a
simple thing. It rarely needs more than an integer displacement,
but sometimes includes a segment identifier. The fact that a zero
sized object is illegal is very handy. But you can pass pointers
one past the highest item. This is a pointer to a zero sized
object, but has nothing to do with the pointer typing. You can
also pass pointers to an item past the base of the array, or within
a structure. Now negative indices rear their heads. There are
lots of rules, and everywhere I look I see added complications. I
have not tried to work them out, but have simply concluded that the
whole operation is impossible. I may be wrong, but I doubt it.

Note that this is for C pointers. Other languages have better
pointer restrictions, and it is all quite feasible.

CBFalconer · Nov 23, 2007

Chris said:
.... snip ...

In general, the simplest method is to have all "fat pointers"
represented as a triple: <currentvalue, base, limit>. A pointer
is "valid" if its current-value is at least as big as its base
and no bigger than its limit:

Just picking on one thing, on the basis that it has to work everywhere
or it is useless: if a pointer is obtained from malloc and then copied,
and the original is later passed to realloc, the copy may be totally
invalid. The above won't pick this up.

(If every use of a pointer is accompanied by a grand sweep of
possibilities, things may be possible. However the resultant code
would be totally useless and spend all its time sweeping.)
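To make the objection concrete, here is a small sketch of the triple and its validity test. To keep the example free of undefined behaviour, the reallocation is only simulated: two static arrays stand in for the block before and after a realloc that moves it. All names (fat_ptr, fat_from, fat_valid, simulated_realloc, old_block, new_block) are hypothetical.

```c
#include <stddef.h>

/* Fat pointer as the triple <currentvalue, base, limit>. */
typedef struct { char *cur; char *base; char *limit; } fat_ptr;

/* Valid if base <= cur <= limit, as in the quoted definition. */
int fat_valid(fat_ptr p)
{
    return p.cur >= p.base && p.cur <= p.limit;
}

static char old_block[8];    /* stands in for the block before realloc */
static char new_block[16];   /* stands in for the block realloc returns */

/* Fat pointer to the start of a block of size bytes. */
fat_ptr fat_from(char *block, size_t size)
{
    fat_ptr p;

    p.cur = block;
    p.base = block;
    p.limit = block + size;
    return p;
}

/* Simulated moving realloc: only *p is retargeted at new_block.
   Copies made from the old value of *p are not updated. */
void simulated_realloc(fat_ptr *p)
{
    *p = fat_from(new_block, sizeof new_block);
}
```

If a copy is taken before simulated_realloc runs on the original, fat_valid still accepts the copy afterwards, because the copy's triple is self-consistent even though it describes the old block. Catching that case needs information stored outside the pointer itself, which is the extra bookkeeping being objected to.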

I have made my share of mistakes in code-generation, and I want
things simple and consistent.

Flash Gordon · Nov 24, 2007

CBFalconer wrote, On 23/11/07 22:32:

Flash said:

CBFalconer wrote, On 23/11/07 03:15:
... snip ...

Let us assume we have "int foo[10];" declared. Now this is an
array, and entirely usable within its scope. However, to pass it
elsewhere, we have to convert it to "int *foobar" and lose the 10
dimension. What happens if we try to retain that dimension? Well,
maybe we need to pass out a single entity in the foo array, by
"foobar[2]". There is no proscription against the destined routine
regaining the original pointer via "foobarptr - 2". Yet the passed
foobar necessarily specified a valid array length of 1, allowing
only indexing by 0. -2 seems not to fit!

That one is easy to deal with. The fat pointer includes the
information that it points to element 2 of a 10 element
array, so it knows that you can go back 2 elements.

For _that_ pointer.

No, for *any* pointer. The definition of fat pointers is that they
contain extra information, and that is the extra information required by
a fat pointer to achieve this purpose.

However, a routine can be called from various
places, with various original objects, and each has to pass all the
data.

Yes, that is what happens with fat pointers. You are passing around more
than just the address.

Then dynamic code has to decode it.

Unless you find a processor that can do it for you. With more concern
about security processors might start doing this.

I maintain this is not
practical.

That is a different issue. I would say that a proper implementation
that does this (which is not easy to do because there are issues) is
very useful for exactly the same reason that valgrind is useful. It
allows developers to detect errors.

I disagree. The first requirement is some sweeping statements.
Further investigation may prove or disprove them. But it only
takes one impossibility to make the whole attack impracticable.

So provide such an impossibility; then you will have proved your position.

Any number of examples we give of things which will work does not prove
everything will work. You only have to give one example of something
that does not work. Therefore your sweeping statement above does nothing
to improve your argument.

Note that the arguments I gave as to why this kind of fat pointer is
difficult (not impossible) raised specific problems, not vague
statements with no justification.

Anyway, char pointers are easy to deal with, you just include the
relevant information (including original type information) as part of
your fat char pointer. There, a slightly less sweeping statement, if you
think it impossible give an example of what is impossible about it.

James Kuyper · Nov 24, 2007

CBFalconer wrote:
....

Just consider the complications. At present a C pointer is a
simple thing. It rarely needs more than an integer displacement,
but sometimes includes a segment identifier. The fact that a zero
sized object is illegal is very handy. But you can pass pointers
one past the highest item. This is a pointer to a zero sized
object, but has nothing to do with the pointer typing. You can
also pass pointers to an item past the base of the array, or within
a structure. Now negative indices rear their heads. There are
lots of rules, and everywhere I look I see added complications. I
have not tried to work them out, but have simply concluded that the
whole operation is impossible. I may be wrong, but I doubt it.

I have thought it through. I've worked out conceptually how such things
could be done, though I've never implemented it, so my concepts may be
wrong. I've not yet run into a situation that couldn't be handled in
some appropriate fashion. Keep in mind that correctly checking bounds is
not necessary, merely permitted; if the implementation has to miss some
opportunities to detect bounds violations, in order to remain
conforming, that's perfectly acceptable, just so long as it never
falsely detects a bounds violation.

However, the full explanation is long, and (I hope) unnecessary. That's
why I asked for a specific example. Give me such an example, and I'll
try to explain how it could be dealt with, or concede that it can't be.
Also, quite frankly, it's virtually impossible to write up something
that long and complicated without making several mistakes, and I prefer
to avoid publishing something like that on a public forum known for the
viciousness with which mistakes are attacked.

You said in your response to Flash Gordon, "I maintain this is not
practical." You are, of course, correct. I don't know of any efficient
way of implementing fat pointers. They are intended to help test for
code correctness; the performance cost of using them would be high, and
would only rarely be acceptable in production code. I say "would be",
because I believe that actual implementations with fat pointers are
rare; possibly non-existent.

santosh · Nov 24, 2007

James Kuyper said:
CBFalconer wrote:

You said in your response to Flash Gordon, "I maintain this is not
practical." You are, of course, correct. I don't know of any efficient
way of implementing fat pointers. They are intended to help test for
code correctness; the performance cost of using them would be high,
and would only rarely be acceptable in production code. I say "would
be", because I believe that actual implementations with fat pointers
are rare; possibly non-existent.

Apparently not non-existent, if the following page is to be believed.

<http://www.springerlink.com/content/qhb2pnhvr9w96nfc/>

Also, Cyclone is a well-known language based on C, which also implements
fat pointers.

CBFalconer · Nov 24, 2007

Flash said:
CBFalconer wrote:
.... snip ...

So provide such an impossibility; then you will have proved your
position.

So here is another. Imagine a routine to upshift a string. One
routine receives a char, and answers with 'this is lower case'.
Another receives a char, and answers by replacing it with the upper
case equivalent. Both are passed pointers.

Assuming the original data is a string, the calling routine will
pass something like:

p = &(s[3]); or p = s + 3; (p is parameter)

For the reading routine, there is no harm in allowing reads from (p
- n), where n can be 0 through 3. For the writing routine, this is
not allowable. How do we separate the actions? Further, the
optional read/write may be in one routine, so the multiple access
allowance must be passed in the parameter to everything. After
all, that access may be passed on to another routine.

Yes, these objections are fairly hazy, and I am not willing to try
and work them all out (especially since I am convinced they can't
be so worked out). The fact that a further access requires further
supervision data (rather than a revision of existing supervision
data) means unlimited storage requirements.

CBFalconer · Nov 24, 2007

santosh said:
Apparently not non-existent, if the following page is to be believed.

<http://www.springerlink.com/content/qhb2pnhvr9w96nfc/>

Also, Cyclone is a well-known language based on C, which also implements
fat pointers.

However this is about C, not Cyclone. I have no difficulty
implementing proper range checking in Pascal, for example. Having
done so, I have a fair idea of the evil difficulties of adapting to
C pointers.

jacob navia · Nov 24, 2007

James Kuyper wrote:

[snip]

Also, quite frankly, it's virtually impossible to write up something
that long and complicated without making several mistakes, and I prefer
to avoid publishing something like that on a public forum known for the
viciousness with which mistakes are attacked.

I think this is the big problem with this forum, and I have been
trying to fight this for over 2 years now. The only solution is to
ignore the vicious attacks. Keep in mind that those people are
just unable to propose anything positive and their only way to
express themselves is through those attacks.

It suffices to ignore them, and discuss normally.

For instance the rounding thread, it was full of
those. I ignore them, and I think my opinion came through
at the end.

Richard Heathfield · Nov 24, 2007

James Kuyper said:

Also, quite frankly, it's virtually impossible to write up something
that long and complicated without making several mistakes, and I prefer
to avoid publishing something like that on a public forum known for the
viciousness with which mistakes are attacked.

On the whole, mistakes in clc articles are identified and corrected, but
not attacked. I have made a great many mistakes in clc articles over the
last few years, but I have never been attacked for them (except by trolls,
and I don't count those). I have also corrected a great many mistakes in
clc articles over the last few years, but I've never attacked a mistake.

What I do sometimes attack is the attitude that mistakes don't matter, that
correctness is unimportant, and that "it works on *my* machine" is
sufficient. And I'm not alone in this. The clc people I respect most are
those who can accept (correct) corrections with good grace - and I don't
think it's a coincidence that these are the very people who know C so
well, and hand out such good advice, for free, year in year out.

$)CHarald van D)&k · Nov 24, 2007

So here is another. Imagine a routine to upshift a string. One routine
receives a char, and answers with 'this is lower case'. Another receives
a char, and answers by replacing it with the upper case equivalent.
Both are passed pointers.

Assuming the original data is a string, the calling routine will pass
something like:

p = &(s[3]); or p = s + 3; (p is parameter)

For the reading routine, there is no harm in allowing reads from (p -
n), where n can be 0 through 3. For the writing routine, this is not
allowable.

Why not? Is it because the behaviour would be undefined, or is it because
the function's actions would be different from its description? If the
former, I'm not seeing it, so could you please explain? If the latter, as
long as it's valid C, there's no reason why an implementation would or
should complain about it.

RoS · Nov 24, 2007

On Thu, 22 Nov 2007 08:57:22 +0100, RoS wrote:

if the compiler keeps an array of elements
typedef struct {char* where; size_t size;} arrayelement;
that point to each memory object returned from malloc or
allocated on the stack,
it is possible to write a routine that says whether an address is allowed
to be read or written; this means it is possible to write a safe gets(),
in the sense of a gets() that does not overflow the input array

int isok(char* a)
{arrayelement *p;
 /* whereitis: the table of known objects, terminated by where==0 */
 for(p=whereitis; p->where!=0; ++p)
    {if(a>=p->where && a<p->where+p->size)
        return 1;
    }
 return 0;
}

size_t sizefromhere(char* a)
{arrayelement *p;
 for(p=whereitis; p->where!=0; ++p)
    {if(a>=p->where && a<p->where+p->size)
        return p->size-(size_t)(a-p->where);
    }
 return 0;
}

so gets could be something like

yes, if its argument is stack memory, gets should always return 0
(why waste cycles and memory space for debug automatic objects?)

if the argument is malloc heap memory (which in my case has the list of
available blocks and their sizes), it could be something like the code below

char* gets(char* buf)
{char *p;
 int c;
 size_t limit, h;
 p=buf;
 limit=sizefromhere(buf);
 if(limit==0) return 0;
 h=0;
l0:;
 c=getchar();
 if(c==EOF)
   {if(ferror(stdin)) goto l1;
    p[h]=0;
    return buf;
   }
 if(c=='\n')
   {if(h+2>limit) goto l1; /* need room for '\n' and the final 0 */
    p[h]='\n';
    p[h+1]=0;
    return buf;
   }
 if(h+1>=limit) /* keep p[h] free for the final 0 */
   {l1:;
    p[h]=0;
    return 0;
   }
 p[h]=c; ++h; goto l0;
}

Flash Gordon · Nov 24, 2007

CBFalconer wrote, On 24/11/07 15:24:

However this is about C, not Cyclone. I have no difficulty
implementing proper range checking in Pascal, for example. Having
done so, I have a fair idea of the evil difficulties of adapting to
C pointers.

Read the post again and then read the article. One sentence in
particular from the article is, "This paper describes a memory-safe
implementation of the full ANSI C language."

The reference to Cyclone was a reference to something else related and
potentially interesting, not a reference to the article. Hence the word
"Also" in the post.

Flash Gordon · Nov 24, 2007

jacob navia wrote, On 24/11/07 16:38:

James Kuyper wrote:

[snip]

Also, quite frankly, it's virtually impossible to write up something
that long and complicated without making several mistakes, and I
prefer to avoid publishing something like that on a public forum known
for the viciousness with which mistakes are attacked.

I think this is the big problem with this forum, and I have been
trying to fight this for over 2 years now. The only solution is to
ignore the vicious attacks.

So we should ignore you? You frequently attack people.

Keep in mind that those people are
just unable to propose anything positive and their only way to
express themselves is through those attacks.

Think about the fact that you frequently attack.

It suffices to ignore them, and discuss normally.

So why do you attack and ascribe to people motives and opinions that
they do not have?

For instance the rounding thread, it was full of
those. I ignore them, and I think my opinion came through
at the end.

Your opinion was clear. It was not sensible to offer code that did not
meet the stated requirements without telling the OP that it did not and
why not, and I would not say that you won the argument either since
there is no independent arbiter.

Flash Gordon · Nov 24, 2007

Harald van Dĳk wrote, On 24/11/07 17:29:

So here is another. Imagine a routine to upshift a string. One routine
receives a char, and answers with 'this is lower case'. Another receives
a char, and answers by replacing it with the upper case equivalent.
Both are passed pointers.

Assuming the original data is a string, the calling routine will pass
something like:

p = &(s[3]); or p = s + 3; (p is parameter)

For the reading routine, there is no harm in allowing reads from (p -
n), where n can be 0 through 3. For the writing routine, this is not
allowable.

Why not? Is it because the behaviour would be undefined, or is it because
the function's actions would be different from its description? If the
former, I'm not seeing it, so could you please explain? If the latter, as
long as it's valid C, there's no reason why an implementation would or
should complain about it.

I agree. Further, if the routine that is not allowed to write has the
parameter declared as a pointer to const the compiler will complain
about it.
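A short sketch of that point, using the upshift example from the quoted post. The function names is_lower_at, upshift_at and upshift are hypothetical; islower and toupper are the standard <ctype.h> functions.

```c
#include <ctype.h>

/* Reading routine: the const qualifier tells the compiler that this
   function must not write through p. */
int is_lower_at(const char *p)
{
    return islower((unsigned char)*p) != 0;
}

/* Writing routine: takes a plain char *, so it may modify *p. */
void upshift_at(char *p)
{
    *p = (char)toupper((unsigned char)*p);
}

/* Upshift a whole string using the two routines above. */
void upshift(char *s)
{
    for (; *s != '\0'; ++s)
        if (is_lower_at(s))
            upshift_at(s);
}
```

An assignment such as "*p = 'x';" inside is_lower_at would violate a constraint, so a conforming compiler has to issue a diagnostic at compile time. The read/write distinction is therefore carried by the pointer's type and needs no extra run-time data.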

Someone else has posted a link to an article about a fat pointer
implementation of C thus providing strong evidence that such an
implementation is possible.

My position is that it is difficult rather than impossible.

Kenny McCormack · Nov 24, 2007

Richard Heathfield said:

On the whole, mistakes in clc articles are identified and corrected, but
not attacked. I have made a great many mistakes in clc articles over the
last few years, but I have never been attacked for them (except by trolls,
and I don't count those).

That's because you are an accepted "regular" and most defer to you.
Or, to put it another way, your definition of "troll" is:

One who has the balls to attack you.

(Therefore, it is definitionally true that the only posters who attack
you are "trolls").

I have also corrected a great many mistakes in
clc articles over the last few years, but I've never attacked a mistake.

Liar. Two words: Jacob Navia.
