size_t or int for malloc-type functions?

K

Keith Thompson

jacob navia said:
Can't you read?

This thread is not for you [...]
You never make any errors, heathfield, since you are a "competent
C programmer". Please go away. This thread is about errors
and their prevention. You do not need it.

Gosh, I didn't know one person could ban someone like that.

Let me try it. navia, this newsgroup is not for you. Please go away.
 
J

jacob navia

P.J. Plauger wrote:
The best of all worlds is probably a calloc that checks as it
should for wraparound, and further checks a non-wrapped byte
count against RSIZE_MAX, a la TR 24731. And this doesn't really
require any change to existing calloc calls -- just better
runtime checking.

I discovered this bug precisely in the Windows CRTDLL.DLL
C runtime library provided by the system. The calloc in
there will NOT check whether the multiplication overflows.

I replaced it with:

void *calloc(size_t n, size_t s)
{
    /* do the multiplication in 64 bits, where it cannot wrap */
    long long siz = (long long)n * (long long)s;
    void *result;

    if (siz >> 32)      /* any high bit set: request exceeds 32 bits */
        return 0;
    result = malloc((unsigned)siz);
    if (result)
        memset(result, 0, (unsigned)siz);
    return result;
}

In an implementation with 32-bit unsigned ints (size_t)
and 64-bit long long the multiplication can never
overflow. I just test whether the upper 32 bits are
nonzero, which catches all errors of this type.

At least in my opinion. Maybe there is a bug above.
But none of this blather about the virtues of signed arithmetic,
or the imperviousness to overflow of unsigned arithmetic,
addresses the true problem of allocating storage sanely and
reporting insane requests properly.

True.
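
[A quick illustration of the guard above, using a request that comes up
later in this thread (65521 objects of 65552 bytes). This is a minimal
sketch under the post's own assumptions (32-bit size_t, 64-bit long
long); the name checked_calloc is ours, chosen to avoid clashing with
the library's calloc:]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* same logic as the replacement calloc quoted above */
static void *checked_calloc(size_t n, size_t s)
{
    long long siz = (long long)n * (long long)s;
    void *result;
    if (siz >> 32)
        return 0;
    result = malloc((unsigned)siz);
    if (result)
        memset(result, 0, (unsigned)siz);
    return result;
}

int main(void)
{
    /* 65521 * 65552 = 4,295,032,592 needs 33 bits, so the guard
       rejects it; an unchecked 32-bit multiply would silently
       hand malloc 65296. */
    void *p = checked_calloc(65521, 65552);
    printf("%s\n", p ? "allocated (unexpected)" : "rejected: NULL");
    free(p);
    return 0;
}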
 
M

Mark McIntyre

As always when I post something, the same group
of people started trying to find possible errors,
a harmless pastime they seem to enjoy.

Don't go giving yourself airs, laddy.

/Anyone/ posting to this group finds their posts scanned for errors.
It'd be pretty poor if mistakes didn't get noticed, don't you think?
You belong to that group.

Glad to hear it.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
M

Mark McIntyre

That's impossible, if the argument is unsigned.

Why don't you take a leaf out of P J Plauger's book and address the
real question,

Which is? If the argument is unsigned, you can't pass a negative value
to it. If you're using an unsigned type throughout you can't even
generate a negative value. Sure, you could somehow pass a huge
positive value to malloc. Where's the problem?
instead of being a tosser and pretending not to
understand?

And why don't you swivel on it, since we're being polite?
best regards

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
M

Mark McIntyre

Clark S. Cox III wrote:

His majesty is always right, no matter how much nonsense
he says.

No, truth is always right, no matter how much nonsense people try to
conceal it behind.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
M

Mark McIntyre

I want to allocate 65520 objects of size 65552 bytes each.

If you do that on a 32-bit system, what do you obtain?
i=65520, i*sizeof(*p)=4,294,967,040
as a signed number = -256

So what?
WE OBTAIN A NEGATIVE NUMBER.

Only if you are stupid enough to copy it into a signed type. Clearly,
since you can't allocate negative memory, that is dopey. Use an
unsigned type, and everything is peachy.
YOU HAVE AN OVERFLOW

You can't overflow unsigned types.
and you get a POSITIVE BUT WRONG number!!!

You end up allocating space for ONE object and not for
65521!!!!!!

And there is NO WAY a malloc will tell you about any errors
since the request is perfectly normal.

Capsitis. Any day now, you'll ANNOUNCE that you are a CHAIR.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
R

Richard Heathfield

jacob navia said:
Richard Heathfield wrote:

This "arithmetic" has nothing to do with arithmetic
and overflow is just declared normal, leading
to unexpected results.

The results are not unexpected to those who know how unsigned integer
arithmetic works. It's not unreasonable to expect people to know a thing or
two about the language before they start using it for stuff like allocating
memory dynamically. Unsigned integer arithmetic is dealt with on page 36 of
K&R, whereas malloc isn't mentioned until page 143. Unsigned integer
arithmetic is primitive, obvious stuff. To call its results "unexpected" is
to betray one's ignorance of the language.
But it is useless to discuss with you

Yes, and you know why? Because you never *listen*, that's why. You
seem to take almost every disagreement as a personal attack.
HINT:

I was not answering you but somebody else

If you want your answer to be read only by a particular individual, use
email. If you publish an article on Usenet, you should be prepared for it
to be read, and replied to, by anybody at all.
Overflow doesn't exist?
Right.

65521 * 65552 --> 65296 ???

Before you can know the answer to that question, you need to define your
universe of discourse. In N, the answer is 4295032592. In the ring of
integers modulo 2^b, the answer is 4295032592 modulo 2^b.
OK.

Then if I want to allocate 65521 objects of
65552 bytes each I obtain a block 65296 bytes long

On any given implementation, either size_t is big enough to store 65521 *
65552 or it isn't. If it is, there is no issue. And if it is not, your
request is meaningless, since you're asking for an object bigger than the
system can provide.

I have demonstrated that the multiplication
65521 * 65552 gives 65296,

No, assuming you are using unsigned arithmetic it gives 4295032592 modulo
2^b where b is the number of bits in the unsigned integer type you are
using. If b is 33 or more, the answer is 4295032592, not 65296. The width
of size_t is implementation-defined.
which is clearly not
enough space to store 65521 objects of size 65552
each. You are free to call that "multiplication that
doesn't overflow".

The Standard says it doesn't overflow, and therefore it doesn't overflow
(unless the Standard is wrong, which is something you can take up with ISO
if you like, but I don't rate your chances).
You can babble as much as you like but that is a fact.

Well, it's a highly selective fact which chooses to ignore all sorts of
other rather important facts, such as what the Standard says, how wide a
size_t is allowed to be, and so on.
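
[The arithmetic is easy to check mechanically. A minimal sketch, using
C99's exact-width types to pin b at 32 and at 64:]

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint64_t full = (uint64_t)65521 * 65552;  /* 4295032592: the answer in N */
    uint32_t wrapped = (uint32_t)full;        /* reduced modulo 2^32: 65296 */
    printf("b = 64: %" PRIu64 "\n", full);
    printf("b = 32: %" PRIu32 "\n", wrapped);
    return 0;
}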
 
R

Randy Howard

Um, *what*?

p = malloc(n * sizeof *p);

is a very common idiom and about the most-typed code fragment in this
group.
It is a well-known, popular idiom.

n is an object count, so it makes sense for it to be a size_t.
sizeof *p is an object size, and is a size_t.

So we have size_t * size_t - how, precisely, will that produce a negative
number when passed to a function taking size_t as an argument?

Well obviously it won't. But the idea was to protect against
programmers who make mistakes, and as we all know not all programmers
use that idiom.

As I said, I don't think the proposal would do much good, but surely
you can see what the idea was?

I'll take a stab at it. The proposal is to arbitrarily chop the available
address in space, for those rare cases in which someone wishes to
malloc an amount of memory /larger/ than the available address space on
the processor. I.e., they want a boatload of RAM, so restrict them to
less than half of what they need to make things "safer". That has BS
written all over it, imo.

If the programmer knows how to ask for that much memory, but can't
figure out how to make sure he doesn't ask for more than will fit in
size_t, he needs to be writing DOS .bat files instead, where he can do
less damage. Or maybe he should just go off and write his own pseudo-C
compiler instead.
 
J

jacob navia

Randy Howard wrote:
I'll take a stab at it. The proposal is to arbitrarily chop the available
address in space, for those rare cases in which someone wishes to
malloc an amount of memory /larger/ than the available address space on
the processor. I.e., they want a boatload of RAM, so restrict them to
less than half of what they need to make things "safer". That has BS
written all over it, imo.

Not at all. Please just see what the proposal really was
before answering fantasy proposals.

The proposal was discussing the idea of using a signed
type to avoid problems with small negative numbers
that get translated into huge unsigned ones.

THAT was the reason.

After this discussion, and after the remarks of Mr Plauger
I am not so sure that that proposal was actually a very good
idea.

The problem in these discussions is that everybody discusses
in such an emotional way that acknowledging
something in your "adversary's" argumentation is seen as
the equivalent of "surrender"...

:)

Look at you, deforming the proposal in such a way that
it is completely NUTS. Of course THEN it is easy to
say:

IT IS NUTS!!!

Obviously, Mr Howard. But if you care to follow the
discussion a bit, that wasn't proposed at all.

The argument of Mr Plauger that convinces me is that a
multiplication error like the one I pointed out to heathfield
will cause havoc ANYWAY even if we put the signed type in
the argument to malloc, so it is actually not a good idea at all.


If the programmer knows how to ask for that much memory, but can't
figure out how to make sure he doesn't ask for more than will fit in
size_t, he needs to be writing DOS .bat files instead, where he can do
less damage. Or maybe he should just go off and write his own pseudo-C
compiler instead.

A programmer is a human, and humans make errors. This discussion
is not designed for people that do not make errors. It is for
people that want to discuss how we can build safety into software
in such a way that the consequences of errors are limited, for instance
a malloc that diagnoses a bogus argument and returns NULL instead of
crashing the software.
 
K

Keith Thompson

Richard Heathfield said:
jacob navia said: [...]
which is clearly not
enough space to store 65521 objects of size 65552
each. You are free to call that "multiplication that
doesn't overflow".

The Standard says it doesn't overflow, and therefore it doesn't overflow
(unless the Standard is wrong, which is something you can take up with ISO
if you like, but I don't rate your chances).
[...]

You're both arguing about the meaning of the word "overflow", which is
IMHO less important than the actual behavior of integer types.

Multiplication of two values of type size_t (or of any unsigned
integer type) can yield a mathematical result that cannot be
represented as a value of type size_t. The standard does not call
this "overflow", but in my opinion "overflow" *would* be a valid term
for it.

The point is this: If a result of an unsigned multiplication cannot be
represented, the result is reduced modulo 2**N (where the maximum
value of the type is 2**N-1 -- "**" denotes exponentiation). If the
result of a signed multiplication cannot be represented, the behavior
is undefined.

Using an unsigned type as the parameter of malloc() means that a
larger range of arguments can be used than if it took the
corresponding signed type. This can be significant for systems with a
16-bit size_t that can allocate more than 32767 bytes, or for systems
with a 32-bit size_t that can allocate more than 2147483647 bytes.

jacob's argument is that using a signed type for the parameter of a
malloc-like function is beneficial, supposedly because if a user
incorrectly passes a very large value as the argument, it is likely to
wrap around to a negative value, which can be detected as an error.

In fact, the standard does not guarantee any such thing. Signed
overflow for an arithmetic operation invokes undefined behavior.
Overflow on conversion to a signed type yields an
implementation-defined result (or, in C99, raises an
implementation-defined signal).

Wraparound to a possibly negative value is not uncommon. However, the
result is just as likely to be positive (and still incorrect). Using
a signed type might catch *some* errors, but in my opinion it's not a
good solution. Unsigned types are tricky; that can't be changed
without changing the language. The only real solution is for the
*programmer* to avoid overflow in the first place, and choosing any
particular type as the parameter to a malloc-like function can't help
much with that. You might as well use size_t for consistency with the
standard library.

At times, it would be very nice to be able to define some different
behavior for unsigned "overflow" (i.e., for operations that yield a
mathematical value outside the range of the type). If I multiply two
unsigned values and get an out-of-range result, the result reduced
modulo 2**N *might* be what I want, but more often it's an error that
I'd like to know about. Ditto for signed and floating-point. But C
doesn't let us do that, at least not portably.
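
[The behavior Keith wishes for can at least be emulated by hand. A
minimal sketch of a post-hoc wrap check; it is not from any post in
this thread, and the helper name size_mul is ours:]

#include <stdbool.h>
#include <stddef.h>

/* Multiply a * b as size_t and report whether the mathematical
   result was reduced modulo 2**N. If the product wrapped, it is
   smaller than the true value, so dividing it back by a nonzero
   a cannot recover b. */
static bool size_mul(size_t a, size_t b, size_t *out)
{
    size_t r = a * b;          /* well defined: reduced modulo 2**N */
    if (a != 0 && r / a != b)
        return false;          /* the "overflow" described above */
    *out = r;
    return true;
}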
 
K

Keith Thompson

jacob navia said:
Suppose:
struct S {
    unsigned n;         /* 4 bytes */
    double d;           /* 8 bytes, typically preceded by 4 bytes of padding */
    double data[8192];  /* 8192 * 8 = 65536 bytes */
};                      /* 65552 bytes in all on such a system */

I want to allocate 65520 objects of size 65552 bytes each.

If you do that on a 32-bit system, what do you obtain?
i=65520, i*sizeof(*p)=4,294,967,040, as a signed number = -256

WE OBTAIN A NEGATIVE NUMBER.
[...]

We obtain undefined behavior.
 
C

CBFalconer

Richard said:
Richard Tobin said:

Quite so.


The fix, then, is obvious.


It appeared to be an attempt to introduce the additional risk of
overflow to the existing risk of specifying an allocation amount
other than the amount actually required. No, I don't see why this
would be of benefit.

What this has brought home to me is that calloc should be included
in the nmalloc package, so that the same maximum size criterion
will be applied, i.e.:

void *ncalloc(size_t nmemb, size_t size) {
    size_t sz;
    void *p;

    sz = nmemb * size;
    if ((sz < nmemb) || (sz < size)) return NULL;
    if (p = nmalloc(sz)) memset(p, 0, sz);
    return p;
}

Since nmalloc drives a 0 size up to 1, this leaves a problem for
nmemb or size being zero. I don't know whether it is worth
worrying about. I am also having qualms about the overflow test.
There doesn't seem to be a SIZE_T_MAX in limits.h. I am worrying
about something like "p = calloc(7, (SIZE_T_MAX/4 + 1));".

cross-posted to c.std.c to see if there is any opinion on this.
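
[The qualms are justified: the sz < nmemb || sz < size test only
catches wraps that land below one of the operands. A minimal sketch of
the worrying case above, modelling a 32-bit size_t with uint32_t so
the numbers are concrete:]

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t nmemb = 7;
    uint32_t size  = UINT32_MAX / 4 + 1;        /* 1073741824 */
    uint32_t sz    = (uint32_t)(nmemb * size);  /* wraps to 3221225472 */
    printf("sz = %u\n", (unsigned)sz);
    printf("test fires: %s\n",
           (sz < nmemb || sz < size) ? "yes" : "no");   /* prints "no" */
    return 0;
}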
 
J

James Daughtry

jacob said:
A programmer is a human, and humans make errors. This discussion
is not designed for people that do not make errors. It is for
people that want to discuss how we can build safety into software
in such a way that the consequences of errors are limited, for instance
a malloc that diagnoses a bogus argument and returns NULL instead of
crashing the software.

The real problem is how you diagnose bogus arguments without making
unwarranted assumptions and severely restricting *everyone* at *every
call* to malloc, just to save people from a pretty rare mistake.
I think you're focusing on solving the problem at the wrong place. It
should be either a behavior adjustment as some have suggested or a
correctness tool outside of malloc that can toss a warning during
compilation. Neither one is a great solution, and I would favor the
former because it doesn't encourage reliance on tools.
 
R

Randy Howard

Randy Howard wrote:

Not at all. Please just see what the proposal really was
before answering fantasy proposals.

The proposal was discussing the idea of using a signed
type to avoid problems with small negative numbers
that get translated into huge unsigned ones.

THAT was the reason.

There was a typo (brain fart) in my original quoted above, which was to
read "chop the address space in half"...

Well, that's exactly what your proposal does. No thank you. size_t
exists for a very good reason. I am willing to take the risk that
if I manage to somehow compute a value that is larger than
size_t variables can hold, problems will occur. That is far and
away better than chopping the range of mallocs in half for a bit of
questionable safety.
After this discussion, and after the remarks of Mr Plauger
I am not so sure that that proposal was actually a very good
idea.

I agree.
The problem in these discussions is that everybody discusses
in such an emotional way that acknowledging
something in your "adversary's" argumentation is seen as
the equivalent of "surrender"...

The only one exhibiting any obvious sign of emotionalism is you. I
have no idea why.
Look at you, deforming the proposal in such a way that
it is completely NUTS. Of course THEN it is easy to
say:

IT IS NUTS!!!

Well, sorry. It pretty much matches the actual /results/ of what you
proposed to the letter. Maybe not the intent, but that's what comes of
it. And those results are nuts. You seem to recognize that yourself
now, so kudos to you.
The argument of Mr Plauger that convinces me is that a
multiplication error like the one I pointed out to heathfield
will cause havoc ANYWAY even if we put the signed type in
the argument to malloc, so it is actually not a good idea at all.
Bingo.


A programmer is a human, and humans make errors.

Yes, and when they do, they debug their programs and attempt to make
those errors go away. Here's a dirty little secret for you: computers
make errors too. They are not even digital on the inside, contrary to
popular mythology. They're confused little layers upon layers of
antennae that pick up noise and crosstalk and do /not/ always put out
the correct 0/1 result for a given set of inputs. Google for
"Simultaneous Switching Output Noise" for a good example of one of the
hairier forms of this problem. They're analog devices that sometimes
do a good job of simulating a binary computer.
This discussion is not designed for people that do not make errors.

No doubt, since I've been searching for several decades for such a
person, and have yet to find one.
It is for
people that want to discuss how we can build safety into software
in such a way that the consequences of errors are limited, for instance
a malloc that diagnoses a bogus argument and returns NULL instead of
crashing the software.

malloc() cannot, and will not, read minds. It will not know whether you
really /meant/ to try to malloc() 2GB+40K of RAM, or whether it was an
accident. All it needs to do is allocate that amount, or return NULL.
It's up to you to decide what to do afterward. Even better, you would
know the range of size_t on your platform and check your numbers
/before/ you call malloc. Plenty of people like to use malloc wrapper
functions; this seems like a candidate for such a thing if you want to
make programming safer, rather than hacking on the libc implementation.
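
[A minimal sketch of the kind of wrapper just described; it is not
code from this thread, and the name malloc_array is ours. It
pre-checks with a division, so the multiplication can never wrap:]

#include <stdlib.h>

/* Allocate space for nmemb objects of the given size, refusing any
   request whose true byte count will not fit in a size_t. */
static void *malloc_array(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > (size_t)-1 / size)
        return NULL;           /* nmemb * size would wrap */
    return malloc(nmemb * size);
}

[The caller then treats NULL uniformly, whether the request wrapped or
the system was simply out of memory.]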
 
S

Simon Biber

jacob said:
P.J. Plauger wrote:

Yes. You are right on this point. For 16-bit systems the
loss of 32K of addressing space is quite a hit. Especially
if you do have the full 64K.

You may even have more than 64K, for example on MS-DOS you can have 10
of the 64K regions for a total of 640K. An int is -32768 to 32767 and
size_t is 0 to 65535. Pointers must be more than 16 bits of course if
they can refer to allocations inside separate regions. Each of these
allocations may take up nearly the full 64K maximum object size.

#include <stdio.h>
#include <stdlib.h>

#define N 10

int main(void)
{
    char *p[N];
    int i;
    for (i = 0; i < N; i++) p[i] = malloc(60000);
    for (i = 0; i < N; i++) printf("%p\n", (void *)p[i]);
    return 0;
}

This program attempts to malloc 10 blocks of 60000 bytes, and prints out
the pointers. On a 640K MS-DOS system in the 'large' or 'huge' memory
model, where pointers are 32 bits in segment:offset format, it outputs
something like:

10C4:0008
1F6B:0008
2E12:0008
3CB9:0008
4B60:0008
5A07:0008
68AE:0008
7755:0008
85FC:0008
0000:0000

The last allocation failed, since the 10th block is unavailable for
malloc's use; it probably contains the program's code, stack, etc.

However, in a 'small' or 'medium' memory model where pointers are just
16 bits, it outputs:

0B24
0000
0000
0000
0000
0000
0000
0000
0000
0000

Indicating that only the first allocation succeeded. Memory outside of
the first 64K is unavailable since 16-bit pointers cannot address it.
 
R

Richard Heathfield

Keith Thompson said:
Richard Heathfield said:
jacob navia said: [...]
which is clearly not
enough space to store 65521 objects of size 65552
each. You are free to call that "multiplication that
doesn't overflow".

The Standard says it doesn't overflow, and therefore it doesn't overflow
(unless the Standard is wrong, which is something you can take up with
ISO if you like, but I don't rate your chances).
[...]

You're both arguing about the meaning of the word "overflow",

No, there is no argument here. The Standard is quite clear on the matter.
That Mr Navia cannot understand this does not mean that the matter is open
to dispute; it merely means that he cannot understand it.
which is
IMHO less important than the actual behavior of integer types.

Again, this is made very clear by the Standard.
Multiplication of two values of type size_t (or of any unsigned
integer type) can yield a mathematical result that cannot be
represented as a value of type size_t.

No, it cannot. Unsigned integer arithmetic is clearly described as being
performed modulo (2 to the power of the number of value bits in the type);
in other words, the unsigned integers representable by the type form a
ring. Mathematically, the multiplication of two integers in a ring yields
another integer in that ring. So, when you multiply two values of size_t
(which are indeed in such a ring, with 2 to the power of b elements where b
is the number of value bits in a size_t), you get another value in the
ring, so it must be representable in a size_t.
The standard does not call
this "overflow", but in my opinion "overflow" *would* be a valid term
for it.

Much as I respect your opinion, it does not take precedence over the
terminology used by the Standard.
The point is this: If a result of an unsigned multiplication cannot be
represented, the result is reduced modulo 2**N (where the maximum
value of the type is 2**N-1 -- "**" denotes exponentiation). If the
result of a signed multiplication cannot be represented, the behavior
is undefined.

It is true that introducing signed ints into the mix exposes the process of
dynamically allocating memory to yet another risk of undefined behaviour,
yes - and since this is supposedly all about protecting the ignorant or
careless programmer from his mistakes, I'm not convinced that giving him
another way of screwing up constitutes protecting him.
Using an unsigned type as the parameter of malloc() means that a
larger range of arguments can be used than if it took the
corresponding signed type. This can be significant for systems with a
16-bit size_t that can allocate more than 32767 bytes, or for systems
with a 32-bit size_t that can allocate more than 2147483647 bytes.
Right.

jacob's argument is that using a signed type for the parameter of a
malloc-like function is beneficial, supposedly because if a user
incorrectly passes a very large value as the argument, it is likely to
wrap around to a negative value, which can be detected as an error.

I agree that that is his argument, but I cannot see that it has any merit,
since there is no way for the compiler to distinguish between a user who
accidentally makes a large memory request (because his program is broken)
and a user who deliberately makes a large memory request (because he needs
lots of memory).

<snip>
 
G

Guest

CBFalconer said:
What this has brought home to me is that calloc should be included
in the nmalloc package, so that the same maximum size criterion
will be applied. I.E:

void *ncalloc(size_t nmemb, size_t size) {
    size_t sz;
    void *p;

    sz = nmemb * size;
    if ((sz < nmemb) || (sz < size)) return NULL;
    if (p = nmalloc(sz)) memset(p, 0, sz);
    return p;
}

Since nmalloc drives a 0 size up to 1, this leaves a problem for
nmemb or size being zero. I don't know whether it is worth
worrying about.

If nmalloc() and ncalloc() are to be used as replacements for or
implementations of malloc() and calloc(), keep in mind that whether
malloc(0) returns NULL is implementation-defined. I don't believe there
is anything preventing the implementation from defining that malloc(0)
and calloc(0, 0) are different from calloc(1, 0) and calloc(0, 1) in
this regard, but this would need to be documented accurately. I believe
it's very easy to avoid this problem by simply replacing your check
with:
if ((sz < nmemb) && (sz < size))
but I may be overlooking something.
I am also having qualms about the overflow test.
There doesn't seem to be a SIZE_T_MAX in limits.h.

There's SIZE_MAX, but even without it, you could convert -1 to size_t.
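
[A minimal illustration of that point; the macro name is ours, not
something from <limits.h>. Converting -1 to an unsigned type yields
that type's maximum value:]

#include <stddef.h>

/* portable stand-in when C99's SIZE_MAX (from <stdint.h>) is absent */
#define MY_SIZE_MAX ((size_t)-1)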
 
G

Guest

Harald said:
I believe
it's very easy to avoid this problem by simply replacing your check
with:
if ((sz < nmemb) && (sz < size))
but I may be overlooking something.

I was, of course. Sorry, and please don't do that, it won't work.
 
C

CBFalconer

Harald said:
CBFalconer wrote:
.... snip ...

There's SIZE_MAX, but even without it, you could convert -1 to size_t.


If nmalloc() and ncalloc() are to be used as replacements for or
implementations of malloc() and calloc(), keep in mind that whether
malloc(0) returns NULL is implementation-defined. I don't believe there
is anything preventing the implementation from defining that malloc(0)
and calloc(0, 0) are different from calloc(1, 0) and calloc(0, 1) in
this regard, but this would need to be documented accurately. I believe
it's very easy to avoid this problem by simply replacing your check
with:
if ((sz < nmemb) && (sz < size))
but I may be overlooking something.

I think the test detects a zero field already. The problem is that
that is legitimate, and so it should proceed to nmalloc, which
handles the zero allocation case already. The test should probably
be:

if ((nmemb && size) && ((sz < nmemb) || (sz < size))) return NULL;

but that doesn't handle the calloc call case I mentioned above.
 
P

Peter Nilsson

CBFalconer said:
We detect the 'unsigned overflow' by:

if (((n = a * b) < a) || (n < b)) overflow();
else alliswell(n);

No. Say UINT_MAX is 65535, observe that 257 * 257 == 65536 + 513.

The simple test for whether two size_t variables can be multiplied
is (assuming b != 0, since the division does the work)...

if (a <= ((size_t) -1) / b)
    /* good */;
else
    /* bad */;
 
