array-size/malloc limit and strlen() failure


jay

#include <stdio.h>
#include <limits.h>

int main(void)
{
char arrc[UINT_MAX] = {'a'};
printf("arrc = %s\n", arrc);

return 0;
}

================ OUTPUT ==============
[myhome]$ gcc -ansi -pedantic -Wall -Wextra test2.c
test2.c: In function `main':
test2.c:6: error: size of array `arrc' is too large
test2.c:6: warning: unused variable `arrc'


I am using gcc 3.4.6 on Solaris 10. I completely understand the error that array size should be below some limit but what is that limit ? Same happens if I try to malloc(UINT_MAX), it fails.

2nd, I am writing a program where input is taken from command-line (think argv[1]). User can input anything of any size. Imagine user inputs a string too long, just like UINT_MAX was way bigger for malloc(), and I try to do strlen() on it ? How will I know/check that string is too long so that I can bail out on time ?

I read about strlen() in C standard (n1570) in 7.24.6.3, it does not have any error condition associated with it.
 

Xavier Roche

char arrc[UINT_MAX] = {'a'};

You have no guarantee that such size can be provided by a compiler. As
far as I know, there is no "minimal" array size a compiler (and
operating system) has to support.
I am using gcc 3.4.6 on Solaris 10. I completely understand the error that array size should be below some limit but what is that limit ?

When allocating on the *stack* (like here), the limit can vary (a typical
value for a newly created thread is a few megabytes), but it is strongly
advised not to put anything "large" on the stack (because the size is
limited, and failing to allocate will probably raise a signal when the
guard page is reached).
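To make the stack-versus-heap advice concrete, here is a minimal sketch that builds the long string on the heap instead of in an automatic array, assuming the length is only known at run time. make_filled_string() is a hypothetical helper, not something from the thread; it returns NULL rather than crashing when the request cannot be met, including the case where n + 1 would overflow size_t. It is written in C89 style to match the -ansi build above.

```c
#include <assert.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memset */

/* Hypothetical helper: returns a heap-allocated string of n copies of c,
   or NULL if n + 1 would overflow size_t or malloc() fails.
   The caller must free() the result. */
char *make_filled_string(size_t n, char c)
{
    char *s;

    assert(c != '\0');          /* a '\0' fill would give length 0, not n */
    if (n == (size_t)-1)        /* largest size_t (SIZE_MAX in C99): n + 1 wraps to 0 */
        return NULL;
    s = malloc(n + 1);
    if (s == NULL)              /* failure is reported to the caller, not fatal */
        return NULL;
    memset(s, c, n);
    s[n] = '\0';
    return s;
}
```

A caller would test the result, e.g. `arrc = make_filled_string(n, 'a'); if (arrc == NULL) { /* report and bail out */ }`, and free() it when done.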
Same happens if I try to malloc(UINT_MAX), it fails.

It does not fail with me (64-bit, no ulimit, plenty of ram). It depends
on the operating system, the available memory / virtual space, etc.
2nd, I am writing a program where input is taken from command-line (think argv[1]). User can input anything of any size. Imagine user inputs a string too long, just like UINT_MAX was way bigger for malloc(), and I try to do strlen() on it ? How will I know/check that string is too long so that I can bail out on time ?

Program argument sizes are generally limited (typically 32K on many
systems), and I do not see any practical case where you may end up with
very long arguments. This case won't happen in the real world (because,
as far as I know, it is not possible to pass arguments that long to a
program).

For strlen(), yes, you can have a string whose size is at most SIZE_MAX
(again a hypothetical case, because of virtual address space limits etc.).

A real-world case is a huge mmap'ed text file whose length would be
computed using strlen(). Yes, it may take ages. Write your own
"strnlen" function if necessary.
 
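A minimal strnlen()-style sketch along those lines. strnlen() itself is in POSIX.1-2008 but not in C89/C99/C11, so the name my_strnlen is hypothetical. It stops at the terminator or after maxlen bytes, whichever comes first, which gives the caller a way to bail out early on an over-long string; it does not, and cannot, make an invalid pointer safe.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */

/* Hypothetical strnlen()-style helper: returns the length of s, or maxlen
   if no '\0' occurs within the first maxlen bytes. Because n < maxlen is
   tested first, s[maxlen] is never read, so a buffer of exactly maxlen
   bytes without a terminator is scanned safely. */
size_t my_strnlen(const char *s, size_t maxlen)
{
    size_t n = 0;

    assert(s != NULL);
    while (n < maxlen && s[n] != '\0')
        n++;
    return n;
}
```

To reject an argument longer than some limit LIMIT (a hypothetical constant), a program could, after checking argc >= 2, test `my_strnlen(argv[1], LIMIT + 1) > LIMIT`, which reads at most LIMIT + 1 bytes no matter how long the argument is.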

Kaz Kylheku

#include <stdio.h>
#include <limits.h>

int main(void)
{
char arrc[UINT_MAX] = {'a'};
printf("arrc = %s\n", arrc);

return 0;
}

================ OUTPUT ==============
[myhome]$ gcc -ansi -pedantic -Wall -Wextra test2.c
test2.c: In function `main':
test2.c:6: error: size of array `arrc' is too large
test2.c:6: warning: unused variable `arrc'

You do realize that on a system with 32 bit pointers and 32 bit unsigned int,
"char [UINT_MAX]" takes up the entire address space?

Are you compiling a 64 bit executable or 32 bit?

Four billion byte arrays call for 64 bit programming on a 64 bit OS.
I am using gcc 3.4.6 on Solaris 10. I completely understand the error that
array size should be below some limit but what is that limit ? Same happens
if I try to malloc(UINT_MAX), it fails.
2nd, I am writing a program where input is taken from command-line (think
argv[1]). User can input anything of any size. Imagine user inputs a string
too long, just like UINT_MAX was way bigger for malloc(), and I try to do
strlen() on it ? How will I know/check that string is too long so that I can
bail out on time ?

The operating system imposes a limit on the total amount of memory which can be passed from one
process to another as environment variables and argument material.

The user will run into that argument passing limit long before we worry about
UINT_MAX sized arrays for a single argument string.
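A sketch for looking at that limit on a POSIX system such as Solaris: sysconf(_SC_ARG_MAX) reports the combined budget for arguments plus environment, and vector_bytes() (a hypothetical helper) totals the string bytes of a NULL-terminated vector such as argv. Exactly what counts against ARG_MAX (pointer slots, alignment padding) varies between systems, so treat any comparison between the two as approximate.

```c
#define _POSIX_C_SOURCE 200112L   /* expose sysconf() even under -ansi */
#include <assert.h>
#include <string.h>   /* strlen */
#include <unistd.h>   /* sysconf, _SC_ARG_MAX: POSIX, not ISO C */

/* Bytes occupied by the strings of a NULL-terminated vector (such as argv),
   counting each terminating '\0'; the pointer slots are not counted. */
size_t vector_bytes(char *const vec[])
{
    size_t total = 0;

    assert(vec != NULL);
    for (; *vec != NULL; vec++)
        total += strlen(*vec) + 1;
    return total;
}

/* The system's combined limit on argument + environment data,
   or -1 if the limit is indeterminate. */
long arg_max_bytes(void)
{
    return sysconf(_SC_ARG_MAX);
}
```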

If an argv[1] could be prepared and passed to your program that is four billion
bytes long, strlen should work on it just fine.
 

BartC

int main(void)
{
char arrc[UINT_MAX] = {'a'};
I am using gcc 3.4.6 on Solaris 10. I completely understand the error
that array size should be below some limit but what is that limit ? Same
happens if I try to malloc(UINT_MAX), it fails.

Why do you need a 4 billion-char array?
2nd, I am writing a program where input is taken from command-line (think
argv[1]). User can input anything of any size. Imagine user inputs a
string too long, just like UINT_MAX was way bigger for malloc(), and I try
to do strlen() on it ? How will I know/check that string is too long so
that I can bail out on time ?

Is the idea that you allocate a 4294967296-char array just in case someone
types something in that long (which would take several years of constantly
holding down a key)? If so then stop worrying about it (it will be checked,
or something will break, long before it gets to your program).
I read about strlen() in C standard (n1570) in 7.24.6.3, it does not have
any error condition associated with it.

I doubt strlen will do much error checking, and I don't see how it can. It
will expect the input to be well-formed, that is, it expects to encounter a
0-byte before it hits the end of the memory block, and before it wraps the
address space. It can't know the size of the memory block, and it would be
too slow to keep checking that it runs out of address space as well as
looking for a zero byte.

However, you can write your own strlen() function which takes some of that
into account, if you're that paranoid about it (although beware that if the
address space is 64-bits, wrap-around might be impossible).
 

Malcolm McLean

#include <stdio.h>
#include <limits.h>

int main(void)
{
char arrc[UINT_MAX] = {'a'};
printf("arrc = %s\n", arrc);
return 0;

}

I am using gcc 3.4.6 on Solaris 10. I completely understand the error that array size should be below some limit but what is that limit ? Same happens if I try to malloc(UINT_MAX), it fails.
It's very system dependent. Memory is getting cheaper all the time, and systems with many gigabytes
of RAM are now common. However, stacks are designed to be small; they grow logarithmically with
program complexity. You shouldn't put big items on the stack. Whilst it's hard to say what the limit
should be, 1024 doubles (8 KB) is a sort of ballpark. Anything bigger than that should be malloced.

malloc() either gives you the memory or it doesn't. If it doesn't, your computer isn't big enough to
handle that particular program with that dataset, and there's nothing you can do other than rewrite
the algorithm, or buy a bigger computer.
2nd, I am writing a program where input is taken from command-line (think argv[1]). User can
input anything of any size. Imagine user inputs a string too long, just like UINT_MAX was way
bigger for malloc(), and I try to do strlen() on it ? How will I know/check that string is too long
so that I can bail out on time ?

I read about strlen() in C standard (n1570) in 7.24.6.3, it does not have any error condition associated with it.
strlen steps through the string one character at a time, looking for the null character. As long as size_t has been
properly defined, it's impossible for the string to be too long for it. Even if the string takes up 4 GB,
strlen will return in three or four seconds on a typical 3 GHz system.
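The loop Malcolm describes can be sketched as follows (naive_strlen is a hypothetical name; real library versions often scan a word at a time). The point is the return type: the count is a size_t, so, assuming size_t can represent the size of any object, no string the program can legitimately hold is too long for it. What it cannot do is detect a missing terminator or a bad pointer; those remain undefined behavior, as the rest of the thread discusses.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */

/* Naive byte-at-a-time strlen(). A size_t counter is used instead of
   pointer subtraction so the result never has to fit in ptrdiff_t.
   There is no error path: a missing '\0' or an invalid pointer is
   undefined behavior, just as with the library strlen(). */
size_t naive_strlen(const char *s)
{
    size_t n = 0;

    assert(s != NULL);   /* catches a null pointer only in debug builds */
    while (s[n] != '\0')
        n++;
    return n;
}
```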
 

James Kuyper

#include <stdio.h>
#include <limits.h>

int main(void)
{
char arrc[UINT_MAX] = {'a'};
printf("arrc = %s\n", arrc);

return 0;
}

================ OUTPUT ==============
[myhome]$ gcc -ansi -pedantic -Wall -Wextra test2.c
test2.c: In function `main':
test2.c:6: error: size of array `arrc' is too large
test2.c:6: warning: unused variable `arrc'


I am using gcc 3.4.6 on Solaris 10. I completely understand the
error that array size should be below some limit but what is that
limit ? Same happens if I try to malloc(UINT_MAX), it fails.

The standard imposes no specific limit. The closest it comes to doing so
is in section 5.2.4.1, which says that an "... implementation shall be
able to translate and execute at least one program that ...", among
other things, contains "65535 bytes in an object (in a hosted
environment only)". Freestanding implementations, which often have very
small amounts of memory, are exempted from even this minimal
requirement. In principle, this clause renders the entire standard
nearly useless, since this is the ONLY program that the implementation is
required to be able to translate and execute, and it's probably not your
program.

However, 5.2.4.1 is generally treated as if it imposed somewhat stronger
requirements than it actually does. This isn't entirely unreasonable -
for instance, if an implementation has a fixed upper limit on the size
of an object, 5.2.4.1p1 implies that this limit must be >= 65535.

A good general rule is that you should worry about the feasibility of
defining any object with a size greater than 65535. In some cases, it's
perfectly feasible to define objects on the stack with millions and even
billions of bytes; in other cases, the total available memory might be
barely sufficient for a single object of 65535 bytes. The only way to be
sure on any given system is to try it.
2nd, I am writing a program where input is taken from command-line
(think argv[1]). User can input anything of any size. Imagine user
inputs a string too long, just like UINT_MAX was way bigger for
malloc(), and I try to do strlen() on it ? How will I know/check
that string is too long so that I can bail out on time ?

As long as you're using a conforming implementation of C, you're
guaranteed that argc is non-negative and <= INT_MAX, argv[argc] is a
null pointer, and argv[i] for 0 <= i && i < argc points at a null
terminated string. On any sane implementation, none of the strings will
have a size larger than can be reported by strlen(), which is SIZE_MAX
(#defined in <stdint.h>).
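Given those guarantees, the "bail out on time" check from the question can be sketched like this. first_overlong_arg() and its max_len policy are hypothetical; strlen() is well defined on every entry because each argv[i] is null-terminated, and the loop stops at the null pointer argv[argc].

```c
#include <assert.h>
#include <string.h>   /* strlen */

/* Hypothetical check: returns the index of the first argv entry longer
   than max_len characters, or -1 if every entry fits. The index fits in
   an int because argc <= INT_MAX. */
int first_overlong_arg(char *const argv[], size_t max_len)
{
    int i;

    assert(argv != NULL);
    for (i = 0; argv[i] != NULL; i++)
        if (strlen(argv[i]) > max_len)
            return i;
    return -1;
}
```

In main(), `if (first_overlong_arg(argv, 4096) >= 0)` would reject the command line; 4096 is an arbitrary example limit.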
I read about strlen() in C standard (n1570) in 7.24.6.3, it does not
have any error condition associated with it.

Correct, strlen() provides no way of reporting errors. There's only one
possible error: passing it a pointer to a block of memory that is not
terminated by a null character. strlen() will continue looking for
characters until it reaches the end of that block of memory - what it
does after that point depends upon how your system works; as far as the
C standard is concerned, the behavior is simply undefined.
 

Thomas Jahns

A good general rule is that you should worry about the feasibility of
defining any object with a size greater than 65535. In some cases, it's
perfectly feasible to define objects on the stack with millions and even
billions of bytes; in other cases, the total available memory might be
barely sufficient for a single object of 65535 bytes. The only way to be
sure on any given system is to try it.

One problem here, though, is that in today's multi-threaded environments the limit
depends on which thread a function is called in and what limits on them were
imposed by the user or system configuration (things like ulimit -s or the
KMP_STACKSIZE environment variable). I therefore often advocate against putting
anything of truly variable size (e.g. depending on some input parameter) on the
stack; things that are sometimes of length 8 and sometimes length 12 are
perfectly fine unless really tight memory requirements are in place.
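A sketch of that policy: a small fixed-size buffer on the stack for the common short case, and malloc() once the size depends on the input. The palindrome test is just a hypothetical task that needs a scratch copy, and the 64-byte cut-off (SMALL_BUF) is an arbitrary assumption, not a recommended value.

```c
#include <assert.h>
#include <ctype.h>    /* tolower */
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* strlen */

#define SMALL_BUF 64  /* hypothetical cut-off: fixed and small, fine on the stack */

/* Case-insensitive palindrome test that needs a scratch copy of its input.
   Short inputs use a fixed-size automatic array; input-dependent sizes go
   to the heap, so stack use does not grow with the input.
   Returns 1 (palindrome), 0 (not a palindrome), or -1 (allocation failed). */
int is_palindrome_nocase(const char *s)
{
    char small[SMALL_BUF];
    char *buf = small;
    size_t len, i;
    int result = 1;

    assert(s != NULL);
    len = strlen(s);
    if (len >= SMALL_BUF) {
        buf = malloc(len);      /* no '\0' needed: buf is never used as a string */
        if (buf == NULL)
            return -1;
    }
    for (i = 0; i < len; i++)
        buf[i] = (char)tolower((unsigned char)s[i]);
    for (i = 0; i < len / 2; i++)
        if (buf[i] != buf[len - 1 - i]) {
            result = 0;
            break;
        }
    if (buf != small)
        free(buf);
    return result;
}
```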

Thomas
 

Keith Thompson

BartC said:
I doubt strlen will do much error checking, and I don't see how it can. It
will expect the input to be well-formed, that is, it expects to encounter a
0-byte before it hits the end of the memory block, and before it wraps the
address space. It can't know the size of the memory block, and it would be
too slow to keep checking that it runs out of address space as well as
looking for a zero byte.

However, you can write your own strlen() function which takes some of that
into account, if you're that paranoid about it (although beware that if the
address space is 64-bits, wrap-around might be impossible).

You can't *portably* write such a strlen() function. There is no
portable way to tell whether a given non-null address is valid or not.

In any case, strlen isn't the problem here. If you can allocate an
array and ensure that it contains a null byte somewhere, you can run
strlen on it.
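One way to ensure that a fixed-size buffer contains a null byte, when it may have been filled without a terminator, is to force the last byte to '\0' before calling strlen(). This is a sketch; len_after_terminating() is a hypothetical name, and it silently truncates, which may or may not be what a given program wants.

```c
#include <assert.h>
#include <string.h>   /* strlen */

/* Hypothetical helper: guarantees buf (size bytes, size > 0) holds a '\0'
   by overwriting its last byte, then returns strlen(buf). An unterminated
   buffer is truncated instead of letting strlen() run past its end. */
size_t len_after_terminating(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    buf[size - 1] = '\0';
    return strlen(buf);
}
```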
 

Keith Thompson

James Kuyper said:
On 04/02/2014 02:31 AM, jay wrote: [...]
I read about strlen() in C standard (n1570) in 7.24.6.3, it does not
have any error condition associated with it.

Correct, strlen() provides no way of reporting errors. There's only one
possible error: passing it a pointer to a block of memory that is not
terminated by a null character. strlen() will continue looking for
characters until it reaches the end of that block of memory - what it
does after that point depends upon how your system works; as far as the
C standard is concerned, the behavior is simply undefined.

There are other possible errors: Passing it a null pointer, and
passing it an invalid non-null pointer.

Implementations of strlen are not required to detect any of these
errors; they all have undefined behavior. (It would be easy to
detect a null pointer argument, but the overhead might be an issue,
and on most systems it's likely to cause a run-time fault anyway
-- and it's not clear what it should return for a null pointer
argument anyway.)
 

James Kuyper

There are other possible errors: Passing it a null pointer, and
passing it an invalid non-null pointer.

You're right. I knew about those, of course, but didn't think of them
while composing that message.
 

glen herrmannsfeldt

(snip, someone wrote)
You can't *portably* write such a strlen() function. There is no
portable way to tell whether a given non-null address is valid or not.

Hmm. You can check to see if p++ is less than p. If that happens before
you find the '\0' in the string, you should probably stop looking.
In any case, strlen isn't the problem here. If you can allocate an
array and ensure that it contains a null byte somewhere, you can run
strlen on it.

-- glen
 

glen herrmannsfeldt

Keith Thompson said:
(snip)
(snip)
There are other possible errors: Passing it a null pointer, and
passing it an invalid non-null pointer.

Seems to me that there are two different types of systems
to consider. Those where the NULL pointer is actually addressable,
and those where it isn't.

On protected mode x86, segment selector zero is reserved by the
hardware. That is, the hardware knows about a NULL pointer.

On some systems, the null pointer is addressable, but outside the
user's address space. It might be that you can read it, but not
write it. When you do read it, it may or may not look like a '\0'.

I have known some systems (that didn't do any memory protection)
to test for writes to NULL by testing the value stored there at
the end of a program. (Better late than never.) In this case, you
can read and write to NULL.
Implementations of strlen are not required to detect any of these
errors; they all have undefined behavior. (It would be easy to
detect a null pointer argument, but the overhead might be an issue,
and on most systems it's likely to cause a run-time fault anyway
-- and it's not clear what it should return for a null pointer
argument anyway.)

I suppose 0 and -1 are not so unreasonable choices.

-- glen
 

BartC

glen herrmannsfeldt said:
I suppose 0 and -1 are not so unreasonable choices.

0 is a valid return value for strlen. I suppose you can also return it for a
NULL pointer, but such a pointer is likely to run into more problems later
on.

-1 might also be problematic when strlen returns an unsigned value.
 

James Kuyper

(snip, someone wrote)


Hmm. You can check to see if p++ is less than p. ...

That depends upon whether you evaluate 'p' before or after evaluating
p++, and the most obvious way of checking that condition leaves that
order unspecified (it also has undefined behavior, for a different but
closely related reason). You can avoid both problems with a temporary:

char *q = p;
if( p++ < q) ...

but in that case the normal result is p==q. How about:

char *q = p++;
if(q < p) ...

but in that case, q == p - 1 is the normal result. I think you didn't
express the condition properly; or perhaps you were thinking of ++p
rather than p++?
... If that happens before
you find the '\0' in the string, you should probably stop looking.

That test won't help you if p wanders into a protected block of memory.
Also, when handled properly, that test will fail only after the behavior
of p++ is already undefined, and in that case it's already too late. A
test that could be performed while the code implementing the test still
has defined behavior would be better.
 

Joe Pfeiffer

glen herrmannsfeldt said:
I suppose 0 and -1 are not so unreasonable choices.

A story I've told on myself before -- on a VAX running BSD, byte 0 was
readable and always contained 0. For years I thought a null pointer was
a valid representation of an empty string...

I still think it would be a good idea to define it like that, but at
least I know it isn't the case!
 

Malcolm McLean

That depends upon whether you evaluate 'p' before or after evaluating
p++, and the most obvious way of checking that condition leaves that
order unspecified (it also has undefined behavior, for a different but
closely related reason). You can avoid both problems with a temporary:
You could hold on to the pointer passed, then compare it with your
temporary travelling pointer. If it compares as equal, wrap-round has
occurred.
 

Keith Thompson

glen herrmannsfeldt said:
(snip, someone wrote)


Hmm. You can check to see if p++ is less than p. If that happens before
you find the '\0' in the string, you should probably stop looking.

Do you mean if p+1 is less than p? If it is (and, in many cases, even
if it isn't), then just evaluating p+1 has undefined behavior.
 

Keith Thompson

glen herrmannsfeldt said:
I suppose 0 and -1 are not so unreasonable choices.

Defining strlen to return 0 would encourage treating both "" and
NULL as empty strings. If you wanted to *consistently* change the
language and library so that a null pointer always acts like an
empty string, I suppose you could, but I think it would be overly
complicated and would, on many systems, impose run-time overhead
without much benefit.

-1 would convert to SIZE_MAX, which would lead to bugs where a
program would treat a null pointer as if it pointed to a very very
long string.

In the absence of exception handling, I prefer leaving strlen(NULL)
undefined, which allows implementations to trap it, which at least
potentially can catch bugs earlier.
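If a program nevertheless wants an explicit check rather than relying on a trap, one option that avoids overloading either 0 or (size_t)-1 is to return a status separately from the length. checked_strlen() is a hypothetical wrapper, not a standard function, and it can only catch a null pointer; an invalid non-null pointer is still undefined behavior.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, NULL */
#include <string.h>   /* strlen */

/* Hypothetical wrapper: reports a null pointer through the return value
   and the length through *len, so neither 0 nor (size_t)-1 (== SIZE_MAX)
   has to double as an error code. */
int checked_strlen(const char *s, size_t *len)
{
    assert(len != NULL);
    if (s == NULL)
        return -1;      /* caller error; *len is left untouched */
    *len = strlen(s);
    return 0;
}
```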
 

Keith Thompson

Malcolm McLean said:
You could hold on to the pointer passed, then compare it with your
temporary travelling pointer. If it compares as equal, wrap-round has
occurred.

And how do you know pointers "wrap around" under ++? I wrote
above that you can't *portably* write such a strlen() function.
Do you disagree? Keep in mind that incrementing a pointer just
past the end of an array has undefined behavior.

Perhaps it can be done if you have detailed knowledge about the memory
layout and addressing scheme of the current platform -- but you could
still get false negatives if, for example, an array with no '\0' happens
to be immediately followed in memory by another array that does have a
'\0'.

For example:

char s0[4] = "abcd"; // no terminating '\0'
char s1[4] = "def";

If s1 happens to immediately follow s0 in memory, strlen(s0) will likely
return 7 rather than reporting an error -- unless your implementation
uses fat pointers.
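When the caller does know the size of the array, as with s0 and s1 above, the standard memchr() can check for a terminator without reading past the end of the object. A sketch, with is_terminated() as a hypothetical name:

```c
#include <assert.h>
#include <string.h>   /* memchr */

/* Returns 1 if the size-byte array buf contains a '\0', else 0.
   memchr() is given the array's real size, so unlike strlen() it never
   looks past the end of buf into a neighbouring object. */
int is_terminated(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

Here is_terminated(s0, sizeof s0) yields 0 and is_terminated(s1, sizeof s1) yields 1. This only works with a genuine array, or a size passed alongside a pointer; given a bare char *, there is still no portable way to learn how many bytes are valid.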
 

glen herrmannsfeldt

(snip)
(snip, then I wrote)
Do you mean if p+1 is less than p? If it is (and, in many cases, even
if it isn't), then just evaluating p+1 has undefined behavior.

Yes, I was thinking about this later (while riding my bike down
the street) that p+1 would have been a better test.

But yes, it is already undefined behavior. Even so, it is nicer
undefined behavior than the infinite loop that you otherwise might
run into.

-- glen
 
