Malcolm's new book


Richard Bos

Keith Thompson said:
It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

True enough, but under those circumstances _any_ code can crash, even
code that doesn't call malloc() at all. I don't think you can get around
that by any amount of ISO C coding, and possibly not by any code at
all unless you have root access to that system. I wouldn't blame that on
ggets.

Richard
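On systems that overcommit in the way Keith describes, one partial hedge is to touch every page of a fresh allocation immediately, so that a lazy-allocation failure surfaces next to the malloc call rather than at some arbitrary later write. This is a sketch, not a guarantee: the OS may still kill the process during the touching loop, and the 4096-byte page size is an assumption.

```c
#include <stdlib.h>

/* Touch every page of a fresh allocation so that, on a lazily
 * allocating system, the commit happens here rather than at some
 * arbitrary later write.  The 4096-byte page size is an assumption;
 * real code would query the system for it. */
void *eager_malloc(size_t n)
{
    unsigned char *p = malloc(n);

    if (p != NULL && n > 0) {
        for (size_t i = 0; i < n; i += 4096)
            p[i] = 0;           /* one write per assumed page */
        p[n - 1] = 0;           /* and the final byte */
    }
    return p;
}
```

This narrows the window but cannot close it; the only real fix is system-specific, such as disabling overcommit.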
 

Ed Jensen

Flash Gordon said:
Not necessarily. It is common for the OS to provide zeroed memory, so
calloc will not have to write to it if the memory has been freshly
obtained from the OS, which is when there is a potential problem.

That's interesting. Would malloc() followed by memset() (to manually
zero the entire contents of the allocated memory) resolve the problem?

I assume you would need some OS-specific code to catch a failed
memset() (assuming the "real" allocation happens during the write, and
you'd need to catch the failure somehow).
Also, calloc does not resize blocks, and although it is not obvious from
the quoted material, the original discussion was about growing buffers
using realloc.

That's also interesting. I've never tried to realloc() memory
allocated with calloc() before, and thus never gave it any thought.
 

Keith Thompson

True enough, but under those circumstances _any_ code can crash, even
code that doesn't call malloc() at all. I don't think you can get around
that by any amount of ISO C coding, and possibly not by any code at
all unless you have root access to that system. I wouldn't blame that on
ggets.

That's a good point.

But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.
 

Richard Heathfield

Keith Thompson said:

But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.

It has been suggested to him already on several occasions. If he
rejected all the other suggestions, I don't reckon he's about to change
his mind. It's a shame, however, since it renders ggets unrecommendable
as a serious routine for use in production code.
 

Flash Gordon

Ed Jensen wrote, On 23/08/07 15:10:
That's interesting.

I just realised that what I wrote could be misinterpreted. It is common
for the OS to provide zeroed memory, but the reason has nothing to do
with calloc; it is to prevent you getting possibly sensitive data from
some other program. Given that this occurs, calloc could be written as:
  IF memory available in free list THEN
      zero memory and return pointer to it
  ELSE
      get memory from OS and return pointer to it without zeroing it
Would malloc() followed by memset() (to manually
zero the entire contents of the allocated memory) resolve the problem?

Depends. The compiler could be clever enough to replace that with a call
to calloc.
I assume you would need some OS-specific code to catch a failed
memset() (assuming the "real" allocation happens during the write, and
you'd need to catch the failure somehow).

Since you have to go the system-specific route anyway, you might as well
find a system-specific way to disable lazy allocation.
That's also interesting. I've never tried to realloc() memory
allocated with calloc() before, and thus never gave it any thought.

You can, although any extra memory will obviously not be zeroed.

The original discussion was about a buffer allocated with malloc then
grown with realloc (no zeroing of memory involved). Someone suggested
using calloc to sidestep the lazy allocation problem.
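To make Flash's point concrete: realloc preserves the old contents, but the grown tail holds indeterminate values, even when the block originally came from calloc. A minimal sketch of zeroing the new tail by hand (grow_zeroed is an illustrative name, not a standard function):

```c
#include <stdlib.h>
#include <string.h>

/* Grow a buffer with realloc and zero the newly added tail, since
 * realloc does not extend calloc's zero guarantee to the new bytes.
 * On failure, returns NULL and the caller still owns buf. */
char *grow_zeroed(char *buf, size_t oldsize, size_t newsize)
{
    char *bigger = realloc(buf, newsize);

    if (bigger == NULL)
        return NULL;
    if (newsize > oldsize)
        memset(bigger + oldsize, 0, newsize - oldsize);
    return bigger;
}
```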
 

Malcolm McLean

Keith Thompson said:
But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.
The problem is you are asking for something inherently very difficult: to
accept lines of arbitrary size, but reject "maliciously long lines". If
you've got some sort of model of the input you expect then you can maybe
discriminate, but this is highly advanced AI programming, not low-level code
for an input function.
 

Flash Gordon

Malcolm McLean wrote, On 23/08/07 20:37:
The problem is you are asking for something inherently very difficult:
to accept lines of arbitrary size, but reject "maliciously long lines".
If you've got some sort of model of the input you expect then you can
maybe discriminate, but this is highly advanced AI programming, not
low-level code for an input function.

Keith's point is that if the user of the library function could specify
a maximum size (possibly 0 meaning unlimited) then the user of the
library function could decide on some suitable upper bound.
 

Malcolm McLean

Flash Gordon said:
Keith's point is that if the user of the library function could specify a
maximum size (possibly 0 meaning unlimited) then the user of the library
function could decide on some suitable upper bound.
-1 for unlimited. Demands for zero-length objects should be honoured. But
then the parameter cannot be a size_t :)
 

santosh

Malcolm said:
The problem is you are asking for something inherently very difficult: to
accept lines of arbitrary size, but reject "maliciously long lines". If
you've got some sort of model of the input you expect then you can maybe
discriminate, but this is highly advanced AI programming, not low-level
code for an input function.

I don't see what's AI-like about this at all. The approximate expected line
length is something that the caller should know. A generic, reusable
routine like ggets/getline cannot know about this. However, what it can do
is accept a parameter specifying an upper limit on the amount of memory to
attempt to allocate, or the number of characters to attempt to read. If you
*really* want unlimited size, you could signal your intention by passing a
special value like zero. This would allow the get line function to be
tailored at each invocation, depending on what the application deems
reasonable at that point.
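A sketch of the kind of interface santosh describes: grow the buffer with realloc, but stop once a caller-supplied limit is reached, with 0 meaning unlimited. get_line_limited and its growth constants (112 initial bytes, 128-byte increments, echoing the figures quoted earlier in the thread) are illustrative, not ggets itself.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line from fp into a malloc'd buffer, storing at most
 * `limit` characters (limit == 0 means unlimited).  Returns the
 * NUL-terminated line, which the caller must free, or NULL on
 * EOF-with-no-input or allocation failure. */
char *get_line_limited(FILE *fp, size_t limit)
{
    size_t cap = 112, len = 0;
    char *buf = malloc(cap);
    int ch = EOF;

    if (buf == NULL)
        return NULL;
    while ((limit == 0 || len < limit) &&
           (ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {           /* grow in 128-byte steps */
            char *tmp = realloc(buf, cap += 128);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = (char)ch;
    }
    if (len == 0 && ch == EOF) {        /* nothing read at all */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

When the limit is hit, the rest of the line is left unread in the stream; a production version would need a policy for that (discard, report, or resume).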
 

CBFalconer

Keith said:
.... snip ...

That's a good point.

But I can reduce the risk of that kind of crash by limiting the
amount of memory I allocate to some "reasonable" size; for example,
I might want to handle very long lines, but reject lines longer than
a megabyte. Even if malloc and realloc misbehave, that could still
be a useful feature. ggets, in its current form, doesn't let me
do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.

I think I already gave my reasons for disagreeing.
 

Keith Thompson

Flash Gordon said:
Malcolm McLean wrote, On 23/08/07 20:37:

Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the user of
the library function could decide on some suitable upper bound.

Exactly. It's not difficult at all.
 

Malcolm McLean

santosh said:
Malcolm McLean wrote:
I don't see what's AI-like about this at all. The approximate expected
line length is something that the caller should know. A generic, reusable
routine like ggets/getline cannot know about this. However, what it can do
is accept a parameter specifying an upper limit on the amount of memory to
attempt to allocate, or the number of characters to attempt to read. If you
*really* want unlimited size, you could signal your intention by passing a
special value like zero. This would allow the get line function to be
tailored at each invocation, depending on what the application deems
reasonable at that point.
Let's say the input is English-language sentences. Up to about 2000
characters is no problem. Above that, it could be malicious or it could be
a legitimate sentence.
By Markov-modelling English text I can filter out a lot of garbage-type
inputs. That leaves legitimate long sentences and malicious ones composed
with the Markov model or a similar one. So we can do a semantic check -
certain verbs can take only certain subjects and objects, for instance. A
few violations such as "she shot the bolt" we can ignore, but lots, like "a
rabbit shot dark dreams furiously", we can reject. Eventually we accept only
genuine English sentences, unless the attacker is really very good indeed.
 

Flash Gordon

Malcolm McLean wrote, On 23/08/07 21:44:
-1 for unlimited. Demands for zero-length objects should be honoured.
But then the parameter cannot be a size_t :)

No, I said 0 for unlimited because that is exactly what I meant. Asking
for at most 0 bytes of input is not sensible IMHO. There is also a
long-standing tradition (I'm not specifically referring to C here) of
using a 0 limit to mean unlimited. Also it allows you to use the correct
type and pass in any valid size.

There are reasons why doing a malloc(0) and getting back a pointer where
no memory has been allocated can be useful, which is probably why some
implementations did it.
 

santosh

Malcolm said:
Let's say the input is English-language sentences. Up to about 2000
characters is no problem. Above that, it could be malicious or it could be
a legitimate sentence.
By Markov-modelling English text I can filter out a lot of garbage-type
inputs. That leaves legitimate long sentences and malicious ones composed
with the Markov model or a similar one. So we can do a semantic check -
certain verbs can take only certain subjects and objects, for instance. A
few violations such as "she shot the bolt" we can ignore, but lots, like
"a rabbit shot dark dreams furiously", we can reject. Eventually we accept
only genuine English sentences, unless the attacker is really very good
indeed.

Right, but the point was that the actual input function should provide the
means to retrieve lines of any length, including unlimited. The ggets
routine earlier in the thread provides no mechanism to tell it to stop at a
particular value, presumably determined in the caller, by sophisticated
Markov modelling or just common sense.
 

Chris Torek

Flash Gordon said:
No, I said 0 for unlimited because that is exactly what I meant. Asking
for at most 0 bytes of input is not sensible IMHO.

Indeed. However, Malcolm McLean's irrational fear of "size_t" aside,
passing -1 would work perfectly: (size_t)-1 is SIZE_MAX, which is
the largest possible value. If the size of the input line exceeds
SIZE_MAX, we have a paradox. :)
There is also a long-standing tradition (I'm not specifically
referring to C here) of using a 0 limit to mean unlimited.

This tends to depend on the situation. Sometimes a limit of zero
means "not allowed to do it"; sometimes that makes no sense and it
means "unlimited".
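Chris's point rests on C's defined unsigned conversion: converting -1 to size_t always yields SIZE_MAX. A small sketch of using that as an "effectively unlimited" sentinel (LINE_UNLIMITED and within_limit are illustrative names, not anything from the thread's code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conversion to an unsigned type is defined modulo 2^N, so -1
 * converts to the largest value a size_t can hold, i.e. SIZE_MAX. */
static const size_t LINE_UNLIMITED = (size_t)-1;

/* With limit == LINE_UNLIMITED no length can exceed it, so the
 * "unlimited" case needs no special-casing in the caller. */
static bool within_limit(size_t len, size_t limit)
{
    return len <= limit;
}
```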
 

CBFalconer

Flash said:
Malcolm McLean wrote, On 23/08/07 20:37:

Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the
user of the library function could decide on some suitable upper
bound.

I wrote ggets() to replace gets(). It maintains the simplicity -
you supply only the address of a pointer, which will receive the
pointer to the next input line. The only other thing to worry
about is the return value, which can be 0 (good), EOF (end of file) or
positive non-zero (I/O error). Now you have to remember to arrange
to free() that pointer at some time. You can also copy it
elsewhere, embed it in a linked list, etc. etc.

However, use is always totally safe. The input action will never
overwrite anything. If you put any limits on it, sooner or later
those will bite. Or they are one more parameter to "get right"
before calling. The simplest parameter is no parameter. It is
fairly hard to get that one wrong.

What you can do, without noticeable harm (except to force the user
to initialize something) is say that ggets() will free the pointer
at the beginning of execution, whenever it is non-NULL. I don't
like it, because it complicates the usage, and I have a problem
remembering to do anything, including creating an object and
presetting it to NULL each time it is passed to ggets() (after
freeing, after any earlier ggets() call). You can see how the
specification gets out of hand very rapidly.
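The calling convention Chuck describes might be sketched like this. The stub body and the fggets/ggets split are illustrative stand-ins, not his actual code; the 0/EOF/positive return convention is taken from his description above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for a ggets-style reader: 0 = success,
 * EOF = end of file, positive non-zero = failure.  The caller gets
 * a malloc'd line via *ln and must free() it. */
static int fggets(char **ln, FILE *fp)
{
    size_t cap = 112, len = 0;
    char *buf = malloc(cap);
    int ch = EOF;

    if (buf == NULL)
        return 1;
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {
            char *tmp = realloc(buf, cap += 128);
            if (tmp == NULL) {
                free(buf);
                return 1;
            }
            buf = tmp;
        }
        buf[len++] = (char)ch;
    }
    if (len == 0 && ch == EOF) {
        free(buf);
        return EOF;
    }
    buf[len] = '\0';
    *ln = buf;
    return 0;
}
#define ggets(ln) fggets((ln), stdin)
```

Typical use, as described: `char *line; while (ggets(&line) == 0) { /* use line */ free(line); }` - the only obligations on the caller are checking the return value and freeing the line.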
 

Keith Thompson

Malcolm McLean said:
-1 for unlimited. Demands for zero-length objects should be
honoured. But then the parameter cannot be a size_t :)

No, I'd use 0 for unlimited, since it's a common convention and C does
not support zero-sized objects.

But even if you choose to use -1, (size_t)-1 is a perfectly reasonable
way to specify an effectively unlimited line length.
 

Richard Heathfield

Keith Thompson said:
No, I'd use 0 for unlimited, since it's a common convention and C does
not support zero-sized objects.

But even if you choose to use -1, (size_t)-1 is a perfectly reasonable
way to specify an effectively unlimited line length.

In the library I'm working on right now, I use (size_t)-1 to indicate
"whatever" - in data capture, it means "the programmer doesn't have a
particular upper limit in mind", in array access it means "on the end,
please", and so on. Appropriate #defines disambiguate these meanings.
 

Richard Bos

Keith Thompson said:
That's a good point.

But I can reduce the risk of that kind of crash by limiting the amount
of memory I allocate to some "reasonable" size; for example, I might
want to handle very long lines, but reject lines longer than a megabyte.
Even if malloc and realloc misbehave, that could still be a useful
feature. ggets, in its current form, doesn't let me do that.

Chuck, think of this as a friendly suggestion for a new and useful
feature, not necessarily as a bug report.

True, that would be a useful option in some cases. I don't think it will
reduce the risk of that sort of crash by much, though. Isn't it more
usually caused by an application intentionally asking for too much
memory which it only uses sparsely? Not often, AIUI, by an application
only going slightly over the line by asking for an increasing buffer.
And if so, it probably won't even be the user of ggets itself which
causes the crash.

In any case, if Chuck doesn't want to make the use of ggets() itself
more complicated (for which reluctance there are, IMO, very good
reasons), it would be easily possible to write a new function gngets()
which takes the extra maximum size parameter, and rewrite ggets() as
simply

#define ggets(.....) gngets(....., -1) /* or (....., 0), as the case may be */

Richard
 
