Malcolm's new book


Richard Heathfield

Keith Thompson said:

I'm sure you dislike the idea of catering to [broken] systems as much
as I do, but you might consider implementing a way to (optionally)
limit the maximum line length, to avoid attempting to allocate a
gigabyte of memory if somebody feeds your program a file with a
gigabyte-long line of text.

Other suggested improvements:

* take a stream parameter, so that the function can be used on streams
other than stdin
* take a pointer to a size_t - if the pointer is non-null, populate the
size_t with the number of bytes read into the string
* allow the buffer to be re-used (which means taking another pointer to
size_t, so that the buffer size can be tracked)

My own (ostensibly similar) routine supports all these features.
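For illustration only (this is my sketch, not Richard's actual routine; the names read_line and maxlen are hypothetical), a line reader with all of those features, plus Keith's optional length cap, might look like:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch.  Reads one line from `stream` into a
   caller-owned, reusable buffer (*buf, capacity *cap), stores the
   byte count through `len` if it is non-null, and refuses lines
   longer than `maxlen` bytes (0 means no limit).
   Returns 0 on success, EOF at end of input, -1 on failure. */
static int read_line(char **buf, size_t *cap, size_t *len,
                     size_t maxlen, FILE *stream)
{
    size_t n = 0;
    int ch;

    if (*buf == NULL || *cap == 0) {
        *cap = 16;                       /* small start; grows on demand */
        if ((*buf = malloc(*cap)) == NULL)
            return -1;
    }
    while ((ch = getc(stream)) != EOF && ch != '\n') {
        if (maxlen != 0 && n + 2 > maxlen)
            return -1;                   /* caller's length cap reached */
        if (n + 2 > *cap) {              /* room for ch plus the NUL */
            char *tmp = realloc(*buf, *cap * 2);
            if (tmp == NULL)
                return -1;               /* old buffer is still valid */
            *buf = tmp;
            *cap *= 2;
        }
        (*buf)[n++] = (char)ch;
    }
    (*buf)[n] = '\0';
    if (len != NULL)
        *len = n;
    return (ch == EOF && n == 0) ? EOF : 0;
}
```

With maxlen set to, say, a megabyte, a gigabyte-long line from /dev/zero is rejected early instead of exhausting memory.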

Richard Bos

Malcolm McLean said:
There's a case for that. But the reader is more likely to know C than any
other algorithm notation.

I don't see why. The author apparently doesn't.

Richard

CBFalconer

Keith said:
.... snip ...

It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.

I would resist any such change. The chances of running into such a
malloc failure are extremely low, especially since the memory is
used as soon as allocated. The idea is to avoid any limits, rather
than make special adjustments for remote possibilities. A maximum
limit would also complicate the error-returning problem.

CBFalconer

Richard said:
Keith Thompson said:
I'm sure you dislike the idea of catering to [broken] systems as
much as I do, but you might consider implementing a way to
(optionally) limit the maximum line length, to avoid attempting
to allocate a gigabyte of memory if somebody feeds your program
a file with a gigabyte-long line of text.

Other suggested improvements:

* take a stream parameter, so that the function can be used on
streams other than stdin
* take a pointer to a size_t - if the pointer is non-null, populate
the size_t with the number of bytes read into the string
* allow the buffer to be re-used (which means taking another
pointer to size_t, so that the buffer size can be tracked)

My own (ostensibly similar) routine supports all these features.

The first is already handled, since ggets is a macro in ggets.h
that calls fggets. The other suggestions show a difference of
philosophy between us. Both routines allow any size input to be
received, but IMHO yours requires the user to think, worry, etc.

Richard Heathfield

CBFalconer said:
Keith Thompson wrote:


I would resist any such change. The chances of running into such a
malloc failure are extremely low, especially since the memory is
used as soon as allocated.

On the contrary, they're extremely high, especially since you continue
to plug your routine to every newbie that passes through clc (or at
least, so it seems!) - and it is very likely indeed that they will leak
the memory they acquire through ggets(), causing considerable strain on
the allocation system.
The idea is to avoid any limits, rather
than make special adjustments for remote possibilities.

Then why do you ignore the limit of a newbie's understanding of memory
management, which is likely to be a rather low limit?
A maximum
limit would also complicate the error-returning problem.

It didn't for me.

Richard Heathfield

CBFalconer said:
Richard Heathfield wrote:
Other suggested improvements [to ggets]:

* take a stream parameter, so that the function can be used on
streams other than stdin
* take a pointer to a size_t - if the pointer is non-null, populate
the size_t with the number of bytes read into the string
* allow the buffer to be re-used (which means taking another
pointer to size_t, so that the buffer size can be tracked)

My own (ostensibly similar) routine supports all these features.

The first is already handled, since ggets is a macro in ggets.h
that calls fggets.

Fair enough.
The other suggestions show a difference of
philosophy between us. Both routines allow any size input to be
received, but IMHO yours requires the user to think, worry, etc.

My version does indeed require the programmer to think, but then nobody
should be writing programs without thinking.

It is your version, however, that requires the user to worry. :)

santosh

Richard said:
CBFalconer said:


On the contrary, they're extremely high, especially since you continue
to plug your routine to every newbie that passes through clc (or at
least, so it seems!) - and it is very likely indeed that they will leak
the memory they acquire through ggets(), causing considerable strain on
the allocation system.

A newbie shouldn't be using dynamically allocated memory anyway, and by the
time they are ready to do so, it might not be appropriate to label them as
unqualified newbies.

IMHO, it's not practical to avoid all limits. It's extremely unlikely that
any system can allocate more than 95% of SIZE_MAX bytes to your program.

Keith Thompson

CBFalconer said:
I would resist any such change. The chances of running into such a
malloc failure are extremely low, especially since the memory is
used as soon as allocated. The idea is to avoid any limits, rather
than make special adjustments for remote possibilities. A maximum
limit would also complicate the error-returning problem.

If a malloc failure is so unlikely, why do you bother to check whether
malloc returns a null pointer?

I just tried your test program, "./tggets /dev/zero". The process
grew to over a gigabyte before it died. I don't think it actually
crashed, but it easily could have (I'm not going to try it on a system
that I share with anybody else). (/dev/zero acts as an endless source
of null characters.)

Ed Jensen

Keith Thompson said:
But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it.

Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?

CBFalconer

Keith said:
If a malloc failure is so unlikely, why do you bother to check whether
malloc returns a null pointer?

I just tried your test program, "./tggets /dev/zero". The process
grew to over a gigabyte before it died. I don't think it actually
crashed, but it easily could have (I'm not going to try it on a system
that I share with anybody else). (/dev/zero acts as an endless source
of null characters.)

No, when realloc fails the new data is returned to the stream and
the function returns what it had received with an error marker.
This allows the user to examine it, free the memory, and go back to
getting input.
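The safety of that scheme rests on the standard realloc idiom: keep the old pointer until the new one is known good. A minimal sketch of the idiom (grow is my name for it, not a ggets internal):

```c
#include <stdlib.h>

/* Illustrative only.  Doubles the buffer, or returns NULL while
   leaving the caller's old pointer -- and the data already read --
   intact and freeable. */
char *grow(char *buf, size_t *cap)
{
    size_t newcap = *cap ? *cap * 2 : 128;
    char *tmp = realloc(buf, newcap);   /* never buf = realloc(buf, ...) */
    if (tmp == NULL)
        return NULL;                    /* buf itself is untouched */
    *cap = newcap;
    return tmp;
}
```

On failure the caller can examine or free the partial line, exactly as described above.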

Peter J. Holzer

No? Yet the functions as defined _cannot_ accept a legitimately sized
buffer (ints being too small)

Maybe, maybe not. There is no requirement that (size_t)-1 > INT_MAX. It
could be smaller, although it usually is larger.
but *can* accept negative values.

For the umpteenth time: I am *NOT* arguing that the correct type
wouldn't have been size_t. Yes, the correct type would have been size_t.
Can you please accept that?

What I am arguing is that it's not the signedness of int which makes it
a bad choice. unsigned int would have been just as bad as int.
Obviously, the requirement to take negatives was so compelling that it
overrode the ability to handle perfectly valid sizes; there _must_ be
"magic" involved, else there would be no such compelling reason.

Again. This is *not* obvious. It may be obvious to you, but I think it
is contrived, far-fetched bullshit (hint: "obvious" is highly
subjective). While I would also wonder why the author chose int instead
of size_t, I would NOT jump to the conclusion that he wants to be able
to accept negative numbers.

What magic happens if you pass a value outside of [-1.0, 1.0] to an
arcsine function? You'll have to check the documentation.

Again, does the arcsin function clip its input range, to reject
legitimate sensible values for no particular reason?

No, but it allows additional values which make no sense and which may
result in undefined behaviour or another error condition - just like
negative values for a buffer size.

Let's see... nope, it's defined in terms of values _between_ -1 and 1,
so it would not be excluding legitimate values if it allows passing in
(with bogus results or otherwise) of a larger range.

You fail to compare cases. Your exemplars involve functions which accept
the entire legitimate range of possible values, then you ask what happens
beyond those ranges?

That is not the case with MM's code.

But I'm not interested in MM's code. I'm interested only in your
argument that a parameter which allows only non-negative values must not
use a signed type. I think that argument is faulty. There is absolutely
no requirement that a function do something useful with all possible
input values. If a signed type is large enough to represent all possible
values, it can be used, and who cares if it can represent other values,
too? C doesn't even have a type which can represent [-1.0, 1.0] or [-1,
255] (although the latter could be constructed as an enum), so you will
*have* to choose the next larger type and rely on documentation in a
natural language to convey the permissible range of values. And using a
signed type often allows writing clearer code.
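(An aside of mine, not part of the exchange: one common instance of that last point is a downward loop, where a signed index reads naturally and an unsigned one silently wraps. Sketch, with names of my own choosing:)

```c
#include <stddef.h>

/* Illustrative sketch: copies src into dst in reverse order. */
void reverse_copy(const int *src, int *dst, int n)
{
    /* With a signed index the termination test is the obvious one. */
    for (int i = n - 1; i >= 0; i--)
        dst[n - 1 - i] = src[i];

    /* With size_t the same test "i >= 0" is always true (the index
       wraps to SIZE_MAX on i--), so the loop must be rephrased, e.g.:
       for (size_t i = n; i-- > 0; ) ...                              */
}
```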
You do if, in the process of using it, you limit the range of perfectly
valid inputs for no good reason.

I don't propose to do that.
I see. So you're the sort who would write code as he does, with ints
instead of size_ts as size values.

No. I already said that I don't. Please read what I write before you
answer.

hp

Peter J. Holzer

Harald said:
Harald van Dĳk wrote:
pete wrote:
Harald van Dĳk wrote:
Is the correct way to cast the char to unsigned char, or is it to
reinterpret the char as an unsigned char? In other words,

#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(void) {
    char line[100];
    if (fgets(line, sizeof line, stdin) && strchr(line, '\n')) {
#ifdef MAYBE
        char *p;
        for (p = line; *p; p++)
            *p = toupper((unsigned char) *p);
#else
        unsigned char *p;
        for (p = (unsigned char *) line; *p; p++)
            *p = toupper(*p);
#endif
        fputs(line, stdout);
    }
    return 0;
}

Should MAYBE be defined or undefined
for a correct program?

My feeling is that whether an implementation
uses signed magnitude or one's complement,
to represent negative integers,
shouldn't come into play with ctype functions.
I prefer a cast.

Both forms use a cast,
so I'm not completely sure which form you believe is correct.

"cast the char to unsigned char"

Thanks.
Does this also imply that if you want to have helper functions that
operate on arrays of unsigned char (that contain text),
you should not pass them a converted char *,
but you should use an array of unsigned char right
from the start?

As I said in my other post, it's complicated.

I just remembered what it is that I like about casting the values:
It's because putchar works that way.

Yep. But putchar takes an int argument which can represent all possible
values of unsigned char[0]. However, a char may not be able to represent
all values of an unsigned char (indeed on most systems it can't), and
while an assignment from a char to an unsigned char is well defined, the
reverse isn't. So I'm not convinced that:


FILE *fp = fopen(filename, "wb");
putc(200, fp);
putc('\n', fp);
fclose(fp);
fp = fopen(filename, "rb");
char s[3];
fgets(s, sizeof(s), fp);
unsigned char u = s[0];

must result in u having the value 200, although I think each step is
conforming. The missing bit is how fgets converts the data it reads
into chars. 200 isn't representable in an 8-bit signed char, and on a
sign-magnitude or one's-complement system there are two possible ways to
do the conversion: just reinterpret the bits, or do the reverse of the
signed->unsigned conversion, and neither is the "obvious" way. I would
hope that any implementor who has the misfortune to have to target a
system which doesn't use two's complement arithmetic will sidestep the
issue by making the default char type unsigned, but that's probably too
optimistic.

hp


[0] On most systems, and one can argue that this is required for hosted
implementations.

Flash Gordon

Ed Jensen wrote, On 22/08/07 17:16:
Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?

Not necessarily. It is common for the OS to provide zeroed memory, so
calloc will not have to write to it if the memory has been freshly
obtained from the OS, which is exactly when the potential problem arises.

Also, calloc does not resize blocks, and although it is not obvious
from the quoted material, the original discussion was about growing
buffers using realloc.

santosh

Ed said:
Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?

The situations in which the system might make malloc believe that it has
acquired enough memory to satisfy the caller, while in reality the
system has lied to malloc, are cases of huge allocation requests, or at
least requests significantly larger than the amount of physical memory
available. In such cases calloc, by attempting to write to its acquired
memory, will likely trigger the operating system's memory manager to
kill the process, or another one, that much earlier than in the case of
malloc, where the same is likely to happen only when the user's code
tries to actually use all the memory that malloc apparently returned.

Also consider the performance implications of writing to, possibly,
gigabytes of memory, especially on a multiuser system.

So calloc doesn't really solve the problem Keith was talking about.

Keith Thompson

Ed Jensen said:
Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?

I don't know. It would if the calloc() implementation is smart enough
to detect an error while writing to the allocated memory, causing it
to deallocate the memory and return a null pointer. Possibly it does
so. Possibly calloc() can't do this, because it doesn't have the
opportunity to detect the write error before the OS starts killing
processes. And possibly zeroing the allocated memory doesn't require
physically writing to it.

I could probably do some research and answer the question for some
particular system, but other systems could behave differently, so I
won't bother.

And I *shouldn't* have to use calloc() rather than malloc() if
malloc() (assuming it works properly) exactly meets my requirements.

lawrence.jones

pete said:
I suspect that fgetc and fputc exist, at least in part,
to simplify the standard's description of input and output.

No, fgetc and fputc long predated the standard. Back in the dark ages,
getc and putc were *only* implemented as macros with fgetc and fputc
being the corresponding functions.

-Larry Jones

Even though we're both talking english, we're not speaking the same language.
-- Calvin

Keith Thompson

CBFalconer said:
No, when realloc fails the new data is returned to the stream and
the function returns what it had received with an error marker.
This allows the user to examine it, free the memory, and go back to
getting input.

Only if realloc reports its failure by returning a null pointer.

But even if realloc works properly, ggets doesn't provide a way for
the user to ask it not to attempt to allocate more than N bytes. If
my program allocates all the memory that it's permitted to, it might
have some bad impact on the rest of the system. I might reasonably
want to read a text file that may have very long lines (allocating
approximately only as much memory as necessary to hold each line), but
reject any line over, say, a megabyte. ggets doesn't let me exercise
that control.

Of course, since the code is public domain, I can always add such a
capability myself.

CBFalconer

Keith said:
I don't know. It would if the calloc() implementation is smart
enough to detect an error while writing to the allocated memory,
causing it to deallocate the memory and return a null pointer.
Possibly it does so. Possibly calloc() can't do this, because it
doesn't have the opportunity to detect the write error before the
OS starts killing processes. And possibly zeroing the allocated
memory doesn't require physically writing to it.

Why should it? What's to prevent the system from using 'copy on
non-zero write' rather than 'copy on write' as the solution?

pete

No, fgetc and fputc long predated the standard.
Back in the dark ages,
getc and putc were *only* implemented as macros with fgetc and fputc
being the corresponding functions.

Thank you.
 
