Malcolm's new book

Richard

Keith Thompson said:
It doesn't exist, and it's entirely possible that it never will.

I should take a closer look at Richard Heathfield's fgetline(); I
think it already does exactly what I'm suggesting ggets should be
extended to do. But it's a bit more complex to use.

http://www.securityfocus.com/infocus/1412

An informative article.

Might be interesting to people still intrigued by the never ending
gets() debates and home made alternative comparisons.

The libsafe library has moved, but most people use fgets()
anyway. Its new place is here:

http://www.research.avayalabs.com/g...er=LabsProjectDetails&View=LabsProjectDetails

or

http://tinyurl.com/y2b88a

My own take is that "safe" routines such as this ggets should not be
used unless you know exactly what you are doing. Stuff like this can
hide a lot of errors. A good C programmer programs within certain
parameters and limits and this constant worrying about never ending
input streams is hardly ever an issue in the hundreds of programs I have
dealt with. Yes, I must be careful. Yes, I must know the lengths being
passed around. No, I rarely provide the ability for a "never ending" input
to be passed to certain streams. It has always been, and will continue to
be, relatively trivial to read input based on a fixed buffer using
fgets or the equivalent and deal with it at the time. Stuff like this
ggets is no more than a shallow attempt to enforce sloppiness IMO - it
reminds me of C++ and the template libraries:-;
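
To make the fixed-buffer approach concrete, here is a minimal sketch of reading with fgets() and dealing with an over-long line on the spot. The helper name read_line_fixed and its return convention (0 for a complete line, 1 when excess characters had to be thrown away, EOF when nothing could be read) are assumptions made for illustration, not anything posted in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf with fgets().
 * Returns 0 for a complete line, 1 if the line did not fit and the
 * excess was discarded, EOF if nothing could be read. */
int read_line_fixed(FILE *fp, char *buf, size_t size)
{
    assert(fp != NULL && buf != NULL && size >= 2);

    if (fgets(buf, (int)size, fp) == NULL)
        return EOF;                     /* end of file or read error */

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';            /* complete line: strip newline */
        return 0;
    }
    if (feof(fp))
        return 0;                       /* last line had no newline */

    /* The buffer filled up before the newline: skip the rest of the
     * line and remember whether anything was actually thrown away. */
    int ch, lost = 0;
    while ((ch = getc(fp)) != EOF && ch != '\n')
        lost = 1;
    return lost;
}
```

Whether discarding the excess is the right policy depends on the program; the point is only that the caller can detect the condition and decide.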
 
CBFalconer

Joe said:
It is fairly simple code. Show us what ktgets() looks like.

Just take the ggets source, change the name, and add a size_t
parameter. It only comes into play when expanding the storage.
Now you have some more problems, including:

1. How to signal that this is a truncated line.
2. What to do with terminal '\n's.
3. How to train the users.

None of which I want to have anything to do with.
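
For illustration only, a size-limited reader along the lines described might look like the sketch below. It is not the ggets source: the name getline_capped, the CAP_* codes and the choices made for the listed problems (truncation reported through the return value, the newline consumed but not stored, the unread remainder left in the stream) are all assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { CAP_OK = 0, CAP_TRUNCATED = 1, CAP_NOMEM = 2 };

/* Hypothetical: reads at most maxlen characters of one line into a
 * freshly allocated buffer stored in *line (caller frees it).
 * Returns CAP_OK, CAP_TRUNCATED, CAP_NOMEM, or EOF if no data is left. */
int getline_capped(char **line, size_t maxlen, FILE *fp)
{
    assert(line != NULL && fp != NULL);

    size_t cap = 16, len = 0;
    int ch, status = CAP_OK;
    char *buf = malloc(cap);

    *line = NULL;
    if (buf == NULL)
        return CAP_NOMEM;

    for (;;) {
        ch = getc(fp);
        if (ch == EOF || ch == '\n')
            break;
        if (len == maxlen) {            /* limit hit: leave the rest unread */
            ungetc(ch, fp);
            status = CAP_TRUNCATED;
            break;
        }
        if (len + 1 == cap) {           /* keep one byte for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                ungetc(ch, fp);
                status = CAP_NOMEM;     /* what was read so far is kept */
                break;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {
        free(buf);
        return EOF;
    }
    buf[len] = '\0';
    *line = buf;
    return status;
}
```

With this convention a caller that gets CAP_TRUNCATED can either call again to fetch the next piece of the same line or skip ahead to the newline. The third problem on the list, training the users, is not something code can solve.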
 
Richard Heathfield

Keith Thompson said:

I should take a closer look at Richard Heathfield's fgetline(); I
think it already does exactly what I'm suggesting ggets should be
extended to do. But it's a bit more complex to use.

As Einstein said, "as simple as possible - but no simpler". It contains
no new type definitions (i.e. doesn't attempt to be a "string class",
so to speak), and has only as many parameters as enable you to control
its behaviour effectively.

int fgetline(char **line,
size_t *size,
size_t maxrecsize,
FILE *fp,
unsigned int flags);

Let's deal with the flags first. There is in fact only one flag,
FGDATA_REDUCE (or 1, for short!), which tells fgetline to realloc the
final buffer back down to the minimum necessary for holding the string
- and that's it. If you don't want this behaviour, pass 0.

The line parameter points to a pointer to char, and size points to a
size_t. These work together: they are (pointers to) buffer start point
and buffer size respectively.

The maxrecsize parameter does what you'd expect - it stops fgetline from
reading too long a line (Denial of Memory attack).

fp is obvious.

The return value is 0 on success, 1 on EOF, negative value for any other
error (and THAT is a bad design). It should return EOF on EOF! And I
should perhaps define macros to describe other results, rather than
return magic numbers.

Apart from that, though, it's a reasonable function. It contains only
two parameters more than fgets, and each of these is for a reason that
is obvious on reflection. The parameters are in the same order as
fgets, except for maxrecsize (which is easy to remember because it
comes right after size) and flags (which goes on the end).

I reserve the right to change the return codes to something sensible
when I get a minute. :)
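
The effect of the FGDATA_REDUCE flag, as described above, can be pictured with a small helper that trims a line buffer down to exactly what the string needs. This is not code from fgetline: shrink_to_fit is a hypothetical name, and the sketch only assumes the line/size pairing explained in the post (a pointer to the buffer start and a pointer to its current size).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: shrinks *line to strlen(*line) + 1 bytes and
 * updates *size to match. Returns 0 on success, -1 if realloc fails
 * (in which case the original buffer and size are left untouched). */
int shrink_to_fit(char **line, size_t *size)
{
    assert(line != NULL && *line != NULL && size != NULL);

    size_t needed = strlen(*line) + 1;  /* string plus terminator */
    if (needed < *size) {
        char *tmp = realloc(*line, needed);
        if (tmp == NULL)
            return -1;
        *line = tmp;
        *size = needed;
    }
    return 0;
}
```

Skipping this step (passing 0 for the flags, in fgetline's terms) makes sense when the same buffer is about to be reused for the next line, since a larger buffer saves later reallocations.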
 
Malcolm McLean

Richard said:
My own take is that "safe" routines such as this ggets should not be
used unless you know exactly what you are doing. Stuff like this can
hide a lot of errors. A good C programmer programs within certain
parameters and limits and this constant worrying about never ending
input streams is hardly ever an issue in the hundreds of programs I have
dealt with. Yes, I must be careful. Yes, I must know the lengths being
passed around. No, I rarely provide the ability for a "never ending" input
to be passed to certain streams. It has always been, and will continue to
be, relatively trivial to read input based on a fixed buffer using
fgets or the equivalent and deal with it at the time. Stuff like this
ggets is no more than a shallow attempt to enforce sloppiness IMO - it
reminds me of C++ and the template libraries:-;
In practice fgets() is too hard for the average programmer to use correctly,
as has time after time been demonstrated here. I caused outrage by
suggesting that most programs would be safer if they replaced fgets() with
gets(). However I was right.

The real answer is not to use gets() but something like ggets(). If passed a
maliciously long line it will clog up the machine's memory, but it will not
cause the program to calculate a wrong answer. So I'd say it is acceptable
for many security applications. Remember that if the enemy can pass
arbitrary data to your program he can often tie up the processor or memory
anyway.

However it probably does need some way of excluding maliciously large inputs,
given the limitations of most OSes. Which means some change to the
interface. It is not an easy problem, which was partly why "readline" is
deliberately defective, to get the reader to think about how he can improve
the function for his own circumstances.
 
Keith Thompson

Malcolm McLean said:
In practice fgets() is too hard for the average programmer to use
correctly, as has time after time been demonstrated here. I caused
outrage by suggesting that most programs would be safer if they
replaced fgets() with gets(). However I was right.

I don't remember you making that absurd claim. If you did, you were
wrong.
The real answer is not to use gets() but something like ggets(). If
passed a maliciously long line it will clog up the machine's memory,
but it will not cause the program to calculate a wrong answer. So I'd
say it is acceptable for many security applications. Remember that if
the enemy can pass arbitrary data to your program he can often tie up
the processor or memory anyway.

Did you mean "unacceptable"? I certainly hope so.
However it probably does need some way of excluding maliciously large
inputs, given the limitations of most OSes. Which means some change to
the interface. It is not an easy problem, which was partly why
"readline" is deliberately defective, to get the reader to think about
how he can improve the function for his own circumstances.

If you're concerned about memory allocation for extremely long lines,
there are line-reading routines that handle that. Richard
Heathfield's fgetline() is one example.

This kind of problem can show up in the real world. Try something as
simple as "less -S < /dev/zero" on a Unix-like system and see what it
does to your CPU and memory usage. Be prepared to kill the process
from another window (you probably won't be able to interrupt it from
the keyboard), and don't try this on a shared system.

"less" is a popular freeware file viewer; it's primarily intended for
text files, but it can handle binary files. "/dev/zero" is a
pseudo-file that looks like an endless stream of null characters.
"less -S" causes long lines to be truncated for display rather than
wrapped. "less -S < /dev/zero" attempts to read and store input data
until it finds a line terminator; since there is none, it just keeps
reading and storing null characters.

I don't know how "less" reads lines (it's written in C, but I haven't
examined the source). "less -S < /dev/zero" is admittedly a contrived
example, but I've run into similar problems accidentally. I don't
believe "less" uses ggets(), but the observed behavior is similar to
what it would be if it did.
 
Richard

In practice fgets() is too hard for the average programmer to use

You do make a lot of sweeping statements.

If a C programmer can't use fgets then there is something wrong and he
shouldn't be coding in C. Simple. Sure, he might not use it 100%
efficiently but that is NO reason to plug in some half arsed equivalent
like ggets with all the baggage it comes with.

We are talking very basic pointer, C strings, malloc, realloc and
dealloc type stuff here. Hiding behind third party libraries can often
bring in more problems than it can solve.

I remember someone here (possibly the ggets author) suggesting that
something like

"*d++=*s++" is confusing and "misusing" the language. My take would be
that if you, as a programmer, can't understand that then you have no
place being a C programmer until that reads like an ABC for first time
readers... Similar for fgets().
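
For readers who have not met the idiom, the usual home of *d++ = *s++ is a hand-written string copy, sketched below. copy_string is a made-up name for this example; in real code strcpy() does the same job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: copies src, including its terminating '\0',
 * into dst and returns dst. dst must be large enough to hold it. */
char *copy_string(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    char *d = dst;
    /* Copy one character, advance both pointers, and stop once the
     * character just copied was the terminator. */
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}
```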
correctly, as has time after time been demonstrated here. I caused
outrage by suggesting that most programs would be safer if they
replaced fgets() with gets(). However I was right.

What? Only if you are on an OS which behaves in a defined way when you
get buffer overrun. Ridiculous statement.
The real answer is not to use gets() but something like ggets(). If
passed a maliciously long line it will clog up the machine's memory,
but it will not cause the program to calculate a wrong answer. So I'd
say it is acceptable for many security applications. Remember that if
the enemy can pass arbitrary data to your program he can often tie up
the processor or memory anyway.

Complete garbage.

Often the way they get in is to take the piss out of some numpty who has
used gets on the locked door algorithms ....
However it probably does need some way of excluding maliciously large
inputs, given the limitations of most OSes. Which means some change to
the interface. It is not an easy problem, which was partly why
"readline" is deliberately defective, to get the reader to think about
how he can improve the function for his own circumstances.

I am at a loss here. You seem to have gone completely mad :)
 
Richard Heathfield

Malcolm McLean said:

In practice fgets() is too hard for the average programmer to use
correctly, as has time after time been demonstrated here.

No, it isn't. What is often demonstrated here is *ignorance* of the
proper way to use fgets. Ignorance is curable (although not in all
cases, it appears).
I caused
outrage by suggesting that most programs would be safer if they
replaced fgets() with gets(). However I was right.

No, your suggestion is incorrect. No program is safe if it calls gets.
That is not to say that all programs calling fgets are safe, of course
- but replacing an fgets call with a gets call is just plain stupid.
The real answer is not to use gets() but something like ggets().

No, it isn't, for reasons which were pointed out when ggets first
appeared five years ago and which have been reiterated at various times
ever since.

Malcolm, I really really wish you'd stop talking such junk all the time.
 
Flash Gordon

Malcolm McLean wrote, On 29/08/07 19:24:
In practice fgets() is too hard for the average programmer to use
correctly, as has time after time been demonstrated here.

Many people manage to use fgets correctly.
I caused
outrage by suggesting that most programs would be safer if they replaced
fgets() with gets().

Since gets cannot be used safely, and often the incorrect use of fgets
will be safer than any use of gets, you were wrong. In fact, you are
still wrong.
However I was right.

Asserting that in the face of the majority of people here claiming
otherwise without very strong supporting evidence does not do you any
favours.
The real answer is not to use gets() but something like ggets(). If
passed a maliciously long line it will clog up the machine's memory, but
it will not cause the program to calculate a wrong answer.

Or on some systems cause random processes to terminate.
So I'd say it
is acceptable for many security applications.

I can see it being acceptable for ones which are not public facing, such
as generating a security key for use in some other application. However,
it would be completely unacceptable on a web server for receiving
incoming requests.
Remember that if the enemy
can pass arbitrary data to your program he can often tie up the
processor or memory anyway.

However, with a proper design you can limit how much they can tie up.
However it probably does need some way of excluding maliciously large
inputs, given the limitations of most OSes.

Oh, so after all that you think that Keith is right! Why didn't you just
say so?
Which means some change to
the interface. It is not an easy problem, which was partly why
"readline" is deliberately defective, to get the reader to think about
how he can improve the function for his own circumstances.

You missed one major reason why it is not suitable for a lot of use. On
memory exhaustion it throws away the probably large amount of input it
has received. Personally I would consider that completely unacceptable
for a lot of uses.
 
Malcolm McLean

Flash Gordon said:
You missed one major reason why it is not suitable for a lot of use. On
memory exhaustion it throws away the probably large amount of input it has
received. Personally I would consider that completely unacceptable for a
lot of uses.
It would be better if ggets() took some action against the half-read data
currently in the stream, to prevent it being called again and possibly
returning a wrong result.
The last thing you want is wrong but reasonable-seeming results, which is
what partially-read lines are quite likely to generate.
 
Malcolm McLean

Richard said:
I am at a loss here. You seem to have gone completely mad :)
The OS ought to tell a process how much memory it can "reasonably" have. It
is very bad design to allow something to gobble up huge portions of swap
space, slowing every other part of the system to a crawl, except in the
relatively unusual circumstances where the process does genuinely need all
of the system's resources.
So the strategy of asking malloc() for memory until failure ought to be the
best one. The reality is that it is not. Most systems will happily hand out
memory that cannot be afforded, and almost certainly isn't really wanted,
allowing exploiters to tie up system resources.
However there isn't an easy answer, which is why we are having this sub-thread.

readline() doesn't provide the answer either. It doesn't try. It provides
something that is good enough for casual use, and then documents that it has
weaknesses, because the idea of the chapter is to get the reader into the
way of writing functions. If he can improve on readline() then the chapter
has done its job.
Note that an improvement might be better for a program-specific function,
but worse for a general one. For instance a function that writes an error
message to stderr, and then terminates, might be exactly what is wanted in
the context of a particular program, though it is no good for a
general-purpose function.
 
Richard

Malcolm McLean said:
The OS ought to tell a process how much memory it can "reasonably"
have. It is very bad design to allow something to gobble up huge
portions of swap space, slowing every other part of the system to a
crawl, except in the relatively unusual circumstances where the
process does genuinely need all of the system's resources.

There might be some truth in this - but this in no way supports your use
of gets().
So the strategy of asking malloc() for memory until failure ought to
be the best one. The reality is that it is not. Most systems will
happily hand out memory that cannot be afforded, and almost
certainly isn't really wanted, allowing exploiters to tie up system
resources.
However there isn't an easy answer, which is why we are having this
sub-thread.

There is. Use fgets and limit the amount of mallocing you do based on
known system behaviour. Using something like ggets is more dangerous,
quite apart from the fact that it's just not worth using because of its lack of
flexibility. See Keith Thompson's suggestions for additions which should
have been in there from day one IMO.
 
Flash Gordon

Malcolm McLean wrote, On 29/08/07 21:20:
It would be better if ggets() took some action against the half-read
data currently in the stream,

Quite possibly, but I was not commenting on ggets; I was commenting on
your readline function, as was made clear by the preceding text that you
snipped. The text, written by you and quoted by me, was:
| Which means some change to the interface. It is not an easy problem,
| which was partly why "readline" is deliberately defective, to get the
| reader to think about how he can improve the function for his own
| circumstances.

Trying to deflect comments about your work by making it appear they were
about code from someone else will not help your case.
to prevent it being called again and
possibly returning a wrong result.
The last thing you want is wrong but reasonable-seeming results, which
is what partially-read lines are quite likely to generate.

Or the last thing you want might be throwing away valuable input which
is what your function does, and a weakness that you did not document.
Pointing out weaknesses in other people's code has no effect on the
weaknesses in yours.

Note that I have not checked what Chuck's ggets function does; that is
for Chuck to comment on or not as he chooses.
 
Malcolm McLean

Flash Gordon said:
Malcolm McLean wrote, On 29/08/07 21:20:

Quite possibly, but I was not commenting on ggets; I was commenting on your
readline function, as was made clear by the preceding text that you snipped.
The text, written by you and quoted by me, was:
| Which means some change to the interface. It is not an easy problem,
| which was partly why "readline" is deliberately defective, to get the
| reader to think about how he can improve the function for his own
| circumstances.

Trying to deflect comments about your work by making it appear they were
about code from someone else will not help your case.


Or the last thing you want might be throwing away valuable input which is
what your function does, and a weakness that you did not document.
Pointing out weaknesses in other people's code has no effect on the
weaknesses in yours.
readline() doesn't distinguish between EOF and out of memory. So it is only
really acceptable if either you can pick up a half-read file some other way,
or if your machine never runs out of memory.
That was mentioned.

In practice the machine won't run out of memory on a legitimate input.
However the weakness was deliberately left in.
 
Flash Gordon

Malcolm McLean wrote, On 29/08/07 23:55:
readline() doesn't distinguish between EOF and out of memory.

I know.
So it is
only really acceptable if either you can pick up a half-read file
some other way,

Which is not possible in all situations, since a stream might not be a file.
or if your machine never runs out of memory.
That was mentioned.

You only mentioned part of the problem, namely that you cannot
distinguish between the two conditions. You do not mention the problem
of losing the input that has been received.
In practice the machine won't run out of memory on a legitimate input.
However the weakness was deliberately left in.

So mention all of the weaknesses, not just a subset.
 
CBFalconer

Flash said:
Malcolm McLean wrote, On 29/08/07 19:24: .... snip ...


You missed one major reason why it is not suitable for a lot of
use. On memory exhaustion it throws away the probably large amount
of input it has received. Personally I would consider that
completely unacceptable for a lot of uses.

You also missed that ggets is immune to this type of data loss, in
that it returns a filled buffer with an error signal. A new call
to ggets will continue the line, provided there is memory available
at that time. I don't know about other routines.

<http://cbfalconer.home.att.net/download/ggets.zip>
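
The behaviour described here (hand back what was read, flag the error, let a later call pick up the rest of the line) can be sketched as follows. This is not the ggets source, which is in the zip file above. The names read_line_keep, realloc_fn and the RL_* codes are assumptions made for the demonstration, as is stingy_realloc, a stand-in that simulates an exhausted heap by refusing any request above 8 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef void *(*realloc_fn)(void *, size_t);

enum { RL_OK = 0, RL_NOMEM = 1 };

/* Stand-in for an exhausted heap: refuses any request above 8 bytes. */
void *stingy_realloc(void *p, size_t n)
{
    return n > 8 ? NULL : realloc(p, n);
}

/* Hypothetical: reads one line into a buffer obtained through grow().
 * Returns RL_OK, EOF when no data is left, or RL_NOMEM. On RL_NOMEM,
 * *line holds the part read so far (or NULL if even the first
 * allocation failed) and the rest of the line stays in the stream. */
int read_line_keep(char **line, FILE *fp, realloc_fn grow)
{
    assert(line != NULL && fp != NULL && grow != NULL);

    size_t cap = 8, len = 0;
    int ch;
    char *buf = grow(NULL, cap);

    *line = NULL;
    if (buf == NULL)
        return RL_NOMEM;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 == cap) {           /* keep one byte for the '\0' */
            char *tmp = grow(buf, cap * 2);
            if (tmp == NULL) {
                ungetc(ch, fp);         /* do not lose this character */
                buf[len] = '\0';
                *line = buf;            /* partial data is returned */
                return RL_NOMEM;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {
        free(buf);
        return EOF;
    }
    buf[len] = '\0';
    *line = buf;
    return RL_OK;
}
```

Passing the allocator in as a parameter is only there so that the failure path can be exercised; an ordinary caller would pass realloc.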
 
Ben Bacarisse

Malcolm McLean said:
In practice fgets() is too hard for the average programmer to use
correctly, as has time after time been demonstrated here. I caused
outrage by suggesting that most programs would be safer if they
replaced fgets() with gets(). However I was right.

Let me guess. You were alone in this belief on that occasion as well?
 
Malcolm McLean

Ben Bacarisse said:
Let me guess. You were alone in this belief on that occasion as well?
Yes. About two years later Steve Summit edited the FAQ to point out that
fgets() is problematic if lines overflow the buffer length, which was the
point I was making all along. Never said I was right.
 
CBFalconer

Malcolm said:
Yes. About two years later Steve Summit edited the FAQ to point out that
fgets() is problematic if lines overflow the buffer length, which was the
point I was making all along. Never said I was right.

It handles that. The complications arise when trying to interface
to its handling.
 
Richard

Malcolm McLean said:
Yes. About two years later Steve Summit edited the FAQ to point out
that fgets() is problematic if lines overflow the buffer length, which
was the point I was making all along. Never said I was right.

No. fgets is not problematic. fgets behaves in a defined non problematic
way.
 
Richard Heathfield

Malcolm McLean said:

Malcolm: to do a Galileo, it isn't enough to be alone in your opinions.
You have to be right, too.
About two years later Steve Summit edited to FAQ to point out
that fgets() is problematic if lines overflow the buffer length, which
was the point I was making all along.

fgets protects its buffer against overflow.

The only FAQ comment I can find that's even remotely relevant is "(If
long lines are a real possibility, their proper handling must be
carefully considered.)" - which is certainly true but says nothing
about overflowing buffer length.

Never said I was right.

Hold that thought.
 
