Malcolm's new book


Keith Thompson

CBFalconer said:
Flash Gordon wrote: [...]
Keith's point is that if the user of the library function could
specify a maximum size (possibly 0 meaning unlimited) then the
user of the library function could decide on some suitable upper
bound.

I wrote ggets() to replace gets(). It maintains the simplicity -
you supply only the address of a pointer, which will receive the
pointer to the next input line. The only other thing to worry
about is the return value, which can be 0 (good), EOF (end of
file), or positive (I/O error). Now you have to remember to arrange
to free() that pointer at some time. You can also copy it
elsewhere, embed it in a linked list, etc. etc.

However, use is always totally safe. The input action will never
overwrite anything. If you put any limits on it, sooner or later
those will bite. Or they are one more parameter to "get right"
before calling. The simplest parameter is no parameter. It is
fairly hard to get that one wrong.
[...]
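
(For reference, typical use of ggets as described above looks
something like this - a minimal sketch, assuming the usual prototype
int ggets(char **ln) from CBFalconer's ggets.h:)

#include <stdio.h>
#include <stdlib.h>
#include "ggets.h"              /* CBFalconer's header */

int main(void)
{
    char *line;
    int rc;

    while ((rc = ggets(&line)) == 0) {  /* 0 means a line was read */
        puts(line);
        free(line);             /* the caller owns the buffer */
    }
    return rc == EOF ? EXIT_SUCCESS : EXIT_FAILURE;  /* EOF is normal */
}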

A program using ggets(), reading from an arbitrary input file, can
attempt to allocate an arbitrarily large amount of memory. It will
eventually fail cleanly (assuming malloc and realloc work the way
they're required to), but even so, allocating as much memory as you
can may have negative consequences. I can write a program that calls
malloc() in a loop to see how much I can allocate, but I wouldn't want
to run it on a shared system.

I'm not suggesting changing the default behavior, just providing a way
for the user to change it. You could either provide a routine to set
a maximum size for future calls (though that could introduce issues
for threaded environments), or provide an additional function that
lets you specify a limit. (The behavior on exceeding the limit would
have to be defined.)
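
(For concreteness, the two options might look like the following
prototypes - both hypothetical, neither part of CBFalconer's ggets:)

/* Option 1: set a cap (0 = unlimited) for all future calls;
   returns the previous cap.  Hypothetical. */
size_t ggets_setmax(size_t maxlen);

/* Option 2: a per-call cap; the return convention for "limit
   exceeded" would have to be defined.  Hypothetical. */
int ggets_max(char **ln, size_t maxlen);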

Heck, since it's public domain, I might go ahead and make some changes
myself. Naturally you're under no obligation to accept them; and if I
distribute it myself I'll certainly give you credit. I'll do this in
my copious free time, of course, so don't hold your breath.
 

CBFalconer

Keith said:
[...]

Heck, since it's public domain, I might go ahead and make some changes
myself. Naturally you're under no obligation to accept them; and if I
distribute it myself I'll certainly give you credit. I'll do this in
my copious free time, of course, so don't hold your breath.

Since it is PD you can do whatever you wish. However I request
that, if you change the header file in any way, you also change
the routine's name.

Note the simplicity and safety of the demo file reverse.c.
 

David Thompson

(e-mail address removed)-cnrc.gc.ca (Walter Roberson) writes:
Indeed. Might even be UB if size_t is not documented as an
alternative type for bit-fields. The only portable types for
bit-fields are (optionally qualified) _Bool and int (which may be
signed or unsigned).

s/UB/CV/ (6.7.2.1p4 is a constraint)
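
(Illustration of the constraint - under C99 6.7.2.1 the portable
bit-field types are _Bool, signed int, unsigned int, and plain int;
anything else, size_t included, is only valid if the implementation
documents it:)

struct flags {
    unsigned int ready : 1;     /* portable */
    signed int   delta : 4;     /* portable */
    _Bool        done  : 1;     /* portable in C99 */
/*  size_t       count : 16;       constraint violation (6.7.2.1p4)
                                   unless the implementation documents
                                   size_t as a bit-field type */
};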

- formerly david.thompson1 || achar(64) || worldnet.att.net
 

David Thompson

I understand this is a software group, and people here may not be as
familiar with hardware as either a hardware group or assembly language
group would be. But that statement is totally incorrect. Historically
memory address range has almost always been larger than native data
size.

For micros, minis, and (early) "data processing" machines, _usually_.

One particular bit-slice processor had a native data size of 4 bits,
yet an address bus 20 bits wide. The popular 8085 has a 16-bit
address bus but only an 8-bit native data size.

Also the Intel 8080, Motorola 6809, and MOS Tech (IIRC) 6502.

The 8086 that the IBM-PC was based on had a 20-bit address bus with a
16-bit native data size. I could go on with further examples. The
point is that it is actually very rare that the "natural" integer
size is as large as the addressable memory range. Thus the need for a
special type to hold the size of address ranges.

I wouldn't say _very_ rare, but certainly far from universal.

FYI the one counter-example I can think of is the Data General
Eclipse S/140, which had a 15-bit address bus and a 16-bit data size
but used memory paging to access more than 32K words. And for those
who may wonder why not just add that 1 extra bit, bit #0 (the MSB)
was used for indirect addressing.

I think (some?) HP minis also had 16b data 15+1b address.

A much more widespread counterexample was IBM S/360 (and clones) with
32b data word (but support for 16b and 8b) and 24b address initially,
only later growing to 31b. Another mini (perhaps supermini) case was
DEC PDP-6 and -10 with 36b data and 18(+5)b address. Motorola 68k (in
original Apple Macintosh) was also nominal 32b data 24b address.

And there were a lot of "scientific" machines like IBM 704/709 et seq,
Univac, CDC, Cyber, Cray with data 36b to 72b but address much less.
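
(A quick way to see the modern echo of this on a hosted
implementation - on many 64-bit platforms int is still 32 bits while
object sizes and pointers are 64 bits:)

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    return 0;
}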

- formerly david.thompson1 || achar(64) || worldnet.att.net
 

pete

Keith said:
It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.

I don't think there's any point in attempting to publish portable
code for nonconforming implementations of C.
 

santosh

Keith said:
It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

<OT>
Do the said systems at least deliver a signal to the process chosen to
be terminated? Isn't this one of the purposes behind the existence of
signals? Perhaps ENOMEM or similar?
</OT>

<snip>
 

Peter J. Holzer

However, there is nothing the programmer can do about that. The system
administrator can: He can turn off overcommitment (if the system allows
it - in Linux that was only added in 2.5.x), he can add more swapspace
and/or impose memory limits on individual processes.

<OT>
Do the said systems at least deliver a signal to the process chosen to be
terminated?

On Linux it is SIGKILL - which is one of the two signals which cannot be
caught or ignored.
Isn't this one of the purposes behind the existence of signals?

Yes. In fact, there are some other signals (SIGXCPU, SIGXFSZ) used to
signal excessive resource usage which can be caught or ignored. Using
SIGKILL for exceeding memory usage was probably a bad design decision.
After all, a process can free memory, but it can't decrease CPU usage.

A special signal (something like SIGLOWMEM) might be useful. It should
be sent to processes before the situation gets really desperate, and
maybe only to processes which explicitly request it. Unfortunately, I
don't know of any system which does this.
Perhaps ENOMEM or similar?

ENOMEM is not the name of a signal, but a value for errno. After a
failure of malloc, errno may indeed contain that value.
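
(A small illustration - standard C does not require malloc to set
errno on failure; POSIX does, typically to ENOMEM:)

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    void *p;

    errno = 0;
    p = malloc((size_t)-1);     /* absurdly large request */
    if (p == NULL)
        printf("malloc failed: %s\n", strerror(errno));
    free(p);                    /* free(NULL) is a no-op */
    return 0;
}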

hp
 

santosh

Peter said:
However, there is nothing the programmer can do about that. The system
administrator can: He can turn off overcommitment (if the system allows
it - in Linux that was only added in 2.5.x), he can add more swapspace
and/or impose memory limits on individual processes.

Do the said systems at least deliver a signal to the process chosen to
be terminated?
On Linux it is SIGKILL - which is one of the two signals which cannot be
caught or ignored.

Which is, I suppose, as good as no signal at all.
Yes. In fact, there are some other signals (SIGXCPU, SIGXFSZ) used to
signal excessive resource usage which can be caught or ignored. Using
SIGKILL for exceeding memory usage was probably a bad design decision.
After all, a process can free memory, but it can't decrease CPU usage.

A special signal (something like SIGLOWMEM) might be useful. It should
be sent to processes before the situation gets really desperate, and
maybe only to processes which explicitly request it. Unfortunately, I
don't know of any system which does this.

Yes, any catchable signal indicating resource constraint would do. It
would give the program a chance to deallocate resources and continue
running, or terminate cleanly, without a core dump.
ENOMEM is not the name of a signal, but a value for errno. After a
failure of malloc, errno may indeed contain that value.

Thanks for correcting that slip-up.
 

lawrence.jones

Peter J. Holzer said:
A special signal (something like SIGLOWMEM) might be useful. It should
be sent to processes before the situation gets really desperate, and
maybe only to processes which explicitly request it. Unfortunately, I
don't know of any system which does this.

AIX does - it sends SIGDANGER (whose default action is ignore) to all
processes when memory runs low. It only starts sending SIGKILLs when
the situation gets really desperate and it starts with processes that
are large users of memory that *don't* have handlers for SIGDANGER.
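
(A sketch of how a process might opt in - SIGDANGER is AIX-specific
and not part of standard C, hence the guard:)

#include <signal.h>

#ifdef SIGDANGER
static volatile sig_atomic_t danger = 0;

static void on_danger(int sig)
{
    (void)sig;
    danger = 1;     /* main loop sees this and releases caches */
}
#endif

int main(void)
{
#ifdef SIGDANGER
    signal(SIGDANGER, on_danger);   /* default action is ignore */
#endif
    /* ... normal processing, shedding memory when danger is set ... */
    return 0;
}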

-Larry Jones

Another casualty of applied metaphysics. -- Hobbes
 

Keith Thompson

CBFalconer said:
Keith Thompson wrote: [...]
Heck, since it's public domain, I might go ahead and make some changes
myself. Naturally you're under no obligation to accept them; and if I
distribute it myself I'll certainly give you credit. I'll do this in
my copious free time, of course, so don't hold your breath.

Since it is PD you can do whatever you wish. However I request
that, if you change the header file in any way, you also change the
routine's name.

Hmm. What I was thinking of doing was a drop-in replacement for ggets
and fgets; changing the routine names would make that impossible.
Note the simplicity and safety of the demo file reverse.c.

(It's freverse.c.) When I run "./freverse < /dev/zero", the process
grows continuously until I kill it.
 

Keith Thompson

pete said:
Keith Thompson wrote: [...]
I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.

I don't think there's any point in attempting to publish portable
code for nonconforming implementations of C.

Sure, but in this case what I'm suggesting is adding a new feature
that could be useful even on conforming implementations; it's merely a
bit more important on certain non-conforming systems.

If the new feature catered to non-conforming systems but caused
problems on conforming systems, I'd agree with you (though sometimes
such things are still necessary, alas).
 

Keith Thompson

Peter J. Holzer said:
However, there is nothing the programmer can do about that. The system
administrator can: He can turn off overcommitment (if the system allows
it - in Linux that was only added in 2.5.x), he can add more swapspace
and/or impose memory limits on individual processes.
[...]

There's nothing the programmer can *reliably* do about it. However,
it may be possible to reduce the risk of running into the problem by
checking whether an allocation is unreasonably large before attempting
it (where "unreasonably large" is an extremely slippery concept, but
it might be possible to quantify it for some applications). And doing
so can be beneficial even on properly working systems.
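
(A minimal sketch of that kind of check - the wrapper name and the
cap are purely illustrative, and "unreasonable" is
application-specific:)

#include <stdlib.h>

#define MAX_REASONABLE ((size_t)16 * 1024 * 1024)   /* 16 MiB, say */

static void *malloc_capped(size_t n)
{
    if (n > MAX_REASONABLE)
        return NULL;    /* refuse oversized requests up front */
    return malloc(n);
}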

A library routine that can attempt to allocate arbitrarily large
amounts of memory based on circumstances beyond the programmer's
control, such as the contents of an input file or stdin, makes that
impossible.

<OT>
Incidentally, I do a lot of programming in Perl; its text line input
routines have exactly this issue, but I've never run into a case where
it actually caused a problem.
</OT>
 

CBFalconer

Keith said:
[...]

Note the simplicity and safety of the demo file reverse.c.

(It's freverse.c.) When I run "./freverse < /dev/zero", the process
grows continuously until I kill it.

No, it's patiently trying to find the EOF marker. :) Try "man
/dev/zero" too.
 

Keith Thompson

CBFalconer said:
Keith said:
[...]

(It's freverse.c.) When I run "./freverse < /dev/zero", the process
grows continuously until I kill it.

No, it's patiently trying to find the EOF marker. :) Try "man
/dev/zero" too.

That's exactly the point. I know how /dev/zero works; did you really
think I didn't? (For anyone who doesn't know, "/dev/zero" is a
pseudo-file on Unix-like systems that, on reading, appears as an
endless stream of '\0' characters.)

As a programmer, I may not have control over the content of the files
I read, particularly stdin. If I use gets(), that means I run the
risk of a buffer overflow. ggets() is a vast improvement, but I still
run the risk of attempting to allocate a potentially unbounded amount
of memory. ggets() doesn't give the programmer the ability to set an
upper bound on the amount of memory allocated.
 

Peter J. Holzer

(It's freverse.c.) When I run "./freverse < /dev/zero", the process
grows continuously until I kill it.

You aren't patient enough. It will stop growing when it has consumed all
available space:

% limit addressspace 200M
% ./freverse </dev/zero
Reversing stdin to stdout
0 chars in 0 lines
./freverse < /dev/zero  4.62s user 0.58s system 93% cpu 5.550 total

The problem with arbitrary limits is that they are, well, arbitrary.

You may think that you never need to reverse lines longer than 1000000
characters. But then somebody comes along with an input file with a
line of 1000001 characters. With a hard-coded limit that won't work.
So for every limit you want a way to configure it at run-time
(command-line switch, config file, environment variable, etc.), which
adds extra complexity. So I think you should think twice whether this
is necessary or whether restricting resource usage by external means
is good enough.
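
(For what it's worth, a run-time limit need not add much complexity.
A sketch reading it from an environment variable - the variable name
and the 0-means-unlimited convention are illustrative:)

#include <stdlib.h>

static size_t line_limit(void)
{
    const char *s = getenv("REVERSE_MAX_LINE");     /* hypothetical */

    if (s != NULL) {
        char *end;
        unsigned long v = strtoul(s, &end, 10);

        if (*end == '\0' && v != 0)
            return (size_t)v;
    }
    return 0;   /* 0 = unlimited, the default */
}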

hp
 

Malcolm McLean

Peter J. Holzer said:
However, there is nothing the programmer can do about that. The system
administrator can: He can turn off overcommitment (if the system allows
it - in Linux that was only added in 2.5.x), he can add more swapspace
and/or impose memory limits on individual processes.

I think that's the real answer. Users and processes should have a
memory budget. If a user wants an especially large memory space, for
instance to reverse an encyclopedia, he has got to request it
specially.

However that's got to be done at the system level. At the moment we do have
to deal with systems that will gobble endless resources.
 

CBFalconer

Keith said:
... snip ...

As a programmer, I may not have control over the content of the
files I read, particularly stdin. If I use gets(), that means I
run the risk of a buffer overflow. ggets() is a vast improvement,
but I still run the risk of attempting to allocate a potentially
unbounded amount of memory. ggets() doesn't give the programmer
the ability to set an upper bound on the amount of memory
allocated.

Nobody removed fgets() :)
 

Keith Thompson

CBFalconer said:
Keith Thompson wrote:
... snip ...

Nobody removed fgets() :)

Good point. But fgets doesn't let me read a million-character line
without first allocating a million bytes to hold it.

The capability I'm proposing is to be able to read a line up to some
large size N without first allocating a full N bytes of space, where N
is perhaps determined not by the likely size of an input line but by
how much memory I'm willing to allocate. Neither fgets (which
requires me to allocate N bytes first) nor ggets (which will happily
allocate 10*N bytes for certain inputs) lets me do this, at least not
directly.
 

Joe Wright

Keith said:
Good point. But fgets doesn't let me read a million-character line
without first allocating a million bytes to hold it.

The capability I'm proposing is to be able to read a line up to some
large size N without first allocating a full N bytes of space, where N
is perhaps determined not by the likely size of an input line but by
how much memory I'm willing to allocate. Neither fgets (which
requires me to allocate N bytes first) nor ggets (which will happily
allocate 10*N bytes for certain inputs) lets me do this, at least not
directly.
It is fairly simple code. Show us what ktgets() looks like.
 

Keith Thompson

Joe Wright said:
It is fairly simple code. Show us what ktgets() looks like.

It doesn't exist, and it's entirely possible that it never will.

I should take a closer look at Richard Heathfield's fgetline(); I
think it already does exactly what I'm suggesting ggets should be
extended to do. But it's a bit more complex to use.
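
(A minimal sketch of what such a function might look like - not
Keith's code, which he says above doesn't exist, and not Heathfield's
fgetline; the name, parameters, and error convention are all
illustrative:)

#include <stdio.h>
#include <stdlib.h>

#define CHUNK 128   /* grow the buffer this many bytes at a time */

/* Read one line from f into a freshly allocated buffer at *ln,
   growing as needed but never past maxlen bytes (0 = unlimited).
   Returns 0 on success, EOF at end of file with nothing read, and
   a positive value on error or when the cap is exceeded. */
int ktgets(char **ln, size_t maxlen, FILE *f)
{
    char *buf = NULL;
    size_t size = 0, len = 0;
    int ch;

    while ((ch = getc(f)) != EOF && ch != '\n') {
        if (len + 2 > size) {               /* room for ch and '\0' */
            size_t nsize = size + CHUNK;
            char *tmp;

            if (maxlen != 0) {
                if (len + 2 > maxlen) {
                    free(buf);
                    return 1;               /* line exceeds the cap */
                }
                if (nsize > maxlen)
                    nsize = maxlen;
            }
            tmp = realloc(buf, nsize);
            if (tmp == NULL) {
                free(buf);
                return 2;                   /* out of memory */
            }
            buf = tmp;
            size = nsize;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {
        free(buf);
        return EOF;                         /* nothing left to read */
    }
    if (buf == NULL) {                      /* empty line: return "" */
        buf = malloc(1);
        if (buf == NULL)
            return 2;
    }
    buf[len] = '\0';
    *ln = buf;
    return 0;
}

Calling ktgets(&line, 0, stdin) would behave like ggets, while a
nonzero maxlen bounds both the line length and the allocation.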
 
