Creating and Accessing Very Big Arrays in C

D

djhulme

Hi,

I'm using GCC. Please could you tell me, what is the maximum number of
array elements that I can create in C, i.e.

char* anArray = (char*) calloc( ??MAX?? , sizeof(char) ) ;

I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.

// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;

// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

Your help and advice would be sinceraly appreciated.

Thanking you in anticipation,

Daniel
 
I

Ian Collins

Hi,

I'm using GCC. Please could you tell me, what is the maximum number of
array elements that I can create in C, i.e.

char* anArray = (char*) calloc( ??MAX?? , sizeof(char) ) ;
Depends on your system.
I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.
Should that be 'int'?
// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;
Realy?

// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;
If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.
 
O

Old Wolf

Ian said:
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;
If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.

Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.
 
I

Ian Collins

Old said:
Ian said:
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.


Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.
In my case, the dim and distant past!

Cheers,
 
T

Tom St Denis

I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.

// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;

I doubt you are allocating what you think you are (hint: truncation).
Also why are you casting the return value?

How many times can people say "don't cast void pointers" before newbs
actually read the advice and do it?
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

Your help and advice would be sinceraly appreciated.

This usually is a sign you are trying to address objects larger than
your platform can natively address. E.g. on a 32-bit platform the
object has more than 4 billion entries. On a 64-bit platform it means
you have an object with 2^64 entries which you most certainly don't
have enough disk or memory for.

Why are you trying to use such large arrays?

There are ways of emulating sparsely populated arrays. Perhaps that's
what you want?

Tom
 
J

jaysome

Ian said:
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;
If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.

Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.

Which begs the question, why would one ever use a suffixed constant?
 
J

jacob navia

jaysome said:
Ian said:
(e-mail address removed) wrote:

// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;


If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.

Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.


Which begs the question, why would one ever use a suffixed constant?

long double a = 1e500;

This is an error because 1e500 is a double constant and its maximum
value can be only around 1e300. To avoid
getting garbage you HAVE to put the 'L' like

long double a = 1e500L;
 
K

Keith Thompson

jacob navia said:
jaysome said:
On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <[email protected]>
wrote: [...]
I don't know where this myth came from that there is
something wrong with un-suffixed constants.
Which begs the question, why would one ever use a suffixed constant?

long double a = 1e500;

This is an error because 1e500 is a double constant and its maximum
value can be only around 1e300. To avoid
getting garbage you HAVE to put the 'L' like

long double a = 1e500L;

Interesting point. Integer literals have a type that depends on the
value of the literal and the range of the type; floating-point
literals don't. I suppose that since floating-point has both range
and precision, it would be more difficult to construct a consistent
set of rules.

(If DBL_MAX >= 1e500, the above declaration is ok, but you do need the
suffix to be safe.)

So the previous question can be changed to: why would one ever use a
suffixed *integer* constant?

One example is an argument to a variadic function, but that's probably
fairly rare. In most contexts, numeric values will be implicitly
converted to the required type.
 
T

tgur

Tom said:
How many times can people say "don't cast void pointers" before newbs
actually read the advice and do it?
Could you explain, why casting void pointers is bad?
 
R

Richard Bos

tgur said:
Could you explain, why casting void pointers is bad?

Because it is rarely if ever necessary.

There are a couple of occasions when an ordinary programmer should cast
a void pointer or should cast any other pointer to void *. One is when
passing them to a variadic function. Another is inside the comparison
function for qsort().
The case Tom complains about, which is on receiving a void pointer from
malloc() and friends, is _not_ one of these occasions. Often, the
perceived need to cast malloc() is a result of one or more of three
faults: not #including <stdlib.h>, using a C++ compiler instead of a C
one, and a lack of clue about void pointers in C.

Richard
 
A

Ancient_Hacker

The limit depends on the particular compiler, memory model, hardware,
and operating system.

Many compilers nowdays generate 32-bit code and 32-bit addresses, which
limits their addressing ability to an absolute maximum of about 4
billion bytes of data plus 4 billion bytes of code. Sometimes enhanced
by another 4GB of heap data.

In addition to that the operating system often grabs a part of the
potential address space. For instance, various versions of Windows
reserve 1/4 to 1/2 the address space for their own use.

In addition, some older motherboards don't implement all 32 address
lines, so you may be limited to 1/2, 1/4, or 1/8th of 4GB.

Then in addition you're limited by the amount of memory the OS can fake
using virtual memory. If your system vm file is only 1GB in size,
that's the upper limit.

THen in addition to all the above, if you want fast access to the
array, you're mostly limited by the amount of actual, real RAM the
computer has, and in addition the other programs and OS tasks running
will subtract from that.

So in a typical 32-bit PC (and I KNOW there are other architectures),
you'll have, best case, about 1GB of fast RAM, that's 250 million
32-bit floats, twice that if you fall back on virtual memory.

If you upgrade to a 64-bit PC, then you can typically address 2^48
bytes of memory. But get real, 2^48 bytes of actual RAM would cost
about 2^42 dollars, about 4 trillion dolalrs.


Perhaps if you explained what you want to do, we can suggest ways of
doing it without requiring ridiculous amounts of memory.
 
D

djhulme

Perhaps if you explained what you want to do, we can suggest ways of
doing it without requiring ridiculous amounts of memory.

I think this is very wise. Thanks for everyone's comments so far.

I'm effectively creating massive 7 dimensional hypermatricies in C.
These matrices are of type Char and only contain boolean values (1 or
0). They are incredibly sparse (I know that there are sparse matrix
techniques, but I haven't found any good implementations in C yet -
can you recommend any?).

I'll write in pseudo code, but I hope you get the gist.

Lets say I want to create a large 7D hypermatrix:

n = 100 ;
m = new boolean_matrix[n][n][n][n][n][n][n] ;

The problem is, this takes up way too much memory.

To save memory, alternatively, I could create a 1D array that stores
n^7 elements:

m = new Boolean_matrix[n*n*n*n*n*n*n] ;

But n^7 (where n = 100) is bigger than the larges INT (2^32).

In C, it seems that I can create an array with a size contained within
DOUBLE number of elements, but I can't access the array once created,
hence my original problem:

-------------------------------------------------------

I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.


// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;


// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

----------------------------------------------------------

There is certainly mileage in using other data storage classes to
represent my sparse matrices, i.e. hashtable, trees, etc - but I
haven't found any that are intuitive and/or efficient.

If anyone has any suggestions , I'd be sincerely grateful

Daniel
 
A

Ancient_Hacker

Okay, 7 dimensions is a lot. If you want 100 elements every which way,
that's 10^14 elements. Since 10^3 is about 2^10, that's about 2^42
bits, or 2^39 bytes, so that's easily addressable on a system with a
x64 CPU.

Nowdays memory is selling for about $60 a gigabyte, and you need about
550 gigabytes, so you better dig deep and find $32,767 just for RAM.
And you'll need a rather special motherboard that can accept that many
memory DIMMS.

I think you may have to scale back the number of dimensions or the
extent along a dimension. Just dropping ONE dimension, down to 6,
brings it within the realm of feasibility, without getting a big
federal grant.

Or scaling the extent is even better , as the total memory will go down
as the seventh power.
 
A

Al Balmer

Because it is rarely if ever necessary.

There are a couple of occasions when an ordinary programmer should cast
a void pointer or should cast any other pointer to void *. One is when
passing them to a variadic function.

That one I get.
Another is inside the comparison
function for qsort().

This one I don't. Why would you need a cast there?

This seems OK (at least the compiler doesn't complain):

int compare(const void *arg1, const void *arg2)
{
const int *warg1 = arg1;
const int *warg2 = arg2;

return *warg1 - *warg2;
}

Actually, the compiler doesn't complain if I leave out all the
"const", either. (With include for stdlib.h and a call to qsort.)
 
K

Keith Thompson

Ancient_Hacker said:
The limit depends on the particular compiler, memory model, hardware,
and operating system.
[...]

What limit? Please quote context.
 
K

Keith Thompson

Perhaps if you explained what you want to do, we can suggest ways of
doing it without requiring ridiculous amounts of memory.

I think this is very wise. Thanks for everyone's comments so far.

I'm effectively creating massive 7 dimensional hypermatricies in C.
These matrices are of type Char and only contain boolean values (1 or
0). They are incredibly sparse (I know that there are sparse matrix
techniques, but I haven't found any good implementations in C yet -
can you recommend any?).

I'll write in pseudo code, but I hope you get the gist.

Lets say I want to create a large 7D hypermatrix:

n = 100 ;
m = new boolean_matrix[n][n][n][n][n][n][n] ;

Your pseudo code looks a whole lot like C++.
The problem is, this takes up way too much memory.

To save memory, alternatively, I could create a 1D array that stores
n^7 elements:

m = new Boolean_matrix[n*n*n*n*n*n*n] ;

Why do you think that would save memory? You're still storing the
same number of elements.
But n^7 (where n = 100) is bigger than the larges INT (2^32).

It's "int", not "INT".
In C, it seems that I can create an array with a size contained within
DOUBLE number of elements, but I can't access the array once created,
hence my original problem:

What do you mean by "DOUBLE"? C has a type called "double", but it's
a floating-point type, and it can't be used as an array index.

The fact that your array size exceeds the largest value of type int
isn't the problem. The problem is just that you're trying to allocate
a really really huge amount of memory.
-------------------------------------------------------

I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.


// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;

The first argument to calloc() is of type size_t, which is an unsigned
integer type. Your argument 999999999999999999 will be implicitly
converted to size_t. If your size_t is 64 bits, then
999999999999999999 is a valid value of that type (and the calloc()
call is likely to fail because you almost certainly don't have that
much memory available). If your size_t is smaller than 60 bits, your
argument will be converted by, in effect, dropping the high-order bits.
For example, if size_t is 32 bits, then the call
calloc(999999999999999999, sizeof(char))
is equivalent to
calloc(2808348671, sizeof(char))
You're still not likely to have that much available memory -- but
unless you check the result of calloc(), your program will never know
that.

(On some systems, particularly Linux, malloc or calloc will pretend to
allocate however much memory you ask for, and problems won't show up
until you try to use that memory.)
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

----------------------------------------------------------

What is an "INT"? Remember that C is case-sensitive. There is, as
you know, a built-in type called "int".

This declaration:

int anArray[1234567891234584];

is legal *unless* it exceeds an implementation-defined limit (which it
most likely does).
There is certainly mileage in using other data storage classes to
represent my sparse matrices, i.e. hashtable, trees, etc - but I
haven't found any that are intuitive and/or efficient.

If anyone has any suggestions , I'd be sincerely grateful

On any modern system, you just aren't going to be able to allocate and
use the amount of memory you want. You'll need to use some kind of
sparse representation. (I don't have any pointers for you.)
 
A

Al Balmer

There is certainly mileage in using other data storage classes to
represent my sparse matrices, i.e. hashtable, trees, etc - but I
haven't found any that are intuitive and/or efficient.

A list or array containing the coordinates of the non-zero elements
seems intuitive enough to me. Whether it's efficient or not depends on
what manipulation you need to do with the matrices.
 
A

Ancient_Hacker

Keith said:
What limit? Please quote context.


Good point. I think we were talking about the limit on the amount of
memory allocatable for this guy's array.

He wants a 100x100x100x100x100x100x100 array of boolean and is a bit
fuzzy on what this entails.
 
H

Herbert Rosenau

Could you explain, why casting void pointers is bad?

should you please read the postings of the last 5 days. Youl'll find
the answer multiple times. You'll learn something more by that.

Or use google and search this group for malloc. Ypu'll get multiple
thousend times the answer you needs.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 Deutsch ist da!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top