Creating and Accessing Very Big Arrays in C

Discussion in 'C Programming' started by djhulme@gmail.com, Aug 8, 2006.

  1. Guest

    Hi,

    I'm using GCC. Please could you tell me, what is the maximum number of
    array elements that I can create in C, i.e.

    char* anArray = (char*) calloc( ??MAX?? , sizeof(char) ) ;

    I've managed to create arrays using DOUBLE data types, but when I try
    to access the array, the compiler complains that the number is not an
    INT, i.e.

    // this succeeds
    char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;

    // this does not compile because the number 1234567891234584 is not an
    INT
    anArray[1234567891234584] = 1 ;

    Your help and advice would be sinceraly appreciated.

    Thanking you in anticipation,

    Daniel
     
    , Aug 8, 2006
    #1
    1. Advertising

  2. Ian Collins Guest

    wrote:
    > Hi,
    >
    > I'm using GCC. Please could you tell me, what is the maximum number of
    > array elements that I can create in C, i.e.
    >
    > char* anArray = (char*) calloc( ??MAX?? , sizeof(char) ) ;
    >

    Depends on your system.

    > I've managed to create arrays using DOUBLE data types, but when I try
    > to access the array, the compiler complains that the number is not an
    > INT, i.e.
    >

    Should that be 'int'?

    > // this succeeds
    > char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;
    >

    Realy?

    > // this does not compile because the number 1234567891234584 is not an
    > INT
    > anArray[1234567891234584] = 1 ;
    >

    If you can allocate more than INT_MAX elements, you should probably
    write 1234567891234584LL to tell the compiler the constant is a long
    long, otherwise it will default to int.

    --
    Ian Collins.
     
    Ian Collins, Aug 9, 2006
    #2
    1. Advertising

  3. Old Wolf Guest

    Ian Collins wrote:
    > wrote:
    > > // this does not compile because the number 1234567891234584 is not an
    > > INT
    > > anArray[1234567891234584] = 1 ;
    > >

    > If you can allocate more than INT_MAX elements, you should probably
    > write 1234567891234584LL to tell the compiler the constant is a long
    > long, otherwise it will default to int.


    Constants always take on a type that can represent them
    (unless there is no such type). In this case, the constant
    will be a long long without you having to specify it --
    assuming that the number exceeds LONG_MAX, and that
    the compiler is in C99 mode.

    I don't know where this myth came from that there is
    something wrong with un-suffixed constants.
     
    Old Wolf, Aug 9, 2006
    #3
  4. Ian Collins Guest

    Old Wolf wrote:
    > Ian Collins wrote:
    >
    >> wrote:
    >>
    >>>// this does not compile because the number 1234567891234584 is not an
    >>>INT
    >>>anArray[1234567891234584] = 1 ;
    >>>

    >>
    >>If you can allocate more than INT_MAX elements, you should probably
    >>write 1234567891234584LL to tell the compiler the constant is a long
    >>long, otherwise it will default to int.

    >
    >
    > Constants always take on a type that can represent them
    > (unless there is no such type). In this case, the constant
    > will be a long long without you having to specify it --
    > assuming that the number exceeds LONG_MAX, and that
    > the compiler is in C99 mode.
    >
    > I don't know where this myth came from that there is
    > something wrong with un-suffixed constants.
    >

    In my case, the dim and distant past!

    Cheers,

    --
    Ian Collins.
     
    Ian Collins, Aug 9, 2006
    #4
  5. Tom St Denis Guest

    wrote:
    > I've managed to create arrays using DOUBLE data types, but when I try
    > to access the array, the compiler complains that the number is not an
    > INT, i.e.
    >
    > // this succeeds
    > char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;


    I doubt you are allocating what you think you are (hint: truncation).
    Also why are you casting the return value?

    How many times can people say "don't cast void pointers" before newbs
    actually read the advice and do it?

    > // this does not compile because the number 1234567891234584 is not an
    > INT
    > anArray[1234567891234584] = 1 ;
    >
    > Your help and advice would be sinceraly appreciated.


    This usually is a sign you are trying to address objects larger than
    your platform can natively address. E.g. on a 32-bit platform the
    object has more than 4 billion entries. On a 64-bit platform it means
    you have an object with 2^64 entries which you most certainly don't
    have enough disk or memory for.

    Why are you trying to use such large arrays?

    There are ways of emulating sparsely populated arrays. Perhaps that's
    what you want?

    Tom
     
    Tom St Denis, Aug 9, 2006
    #5
  6. jaysome Guest

    On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <>
    wrote:

    >Ian Collins wrote:
    >> wrote:
    >> > // this does not compile because the number 1234567891234584 is not an
    >> > INT
    >> > anArray[1234567891234584] = 1 ;
    >> >

    >> If you can allocate more than INT_MAX elements, you should probably
    >> write 1234567891234584LL to tell the compiler the constant is a long
    >> long, otherwise it will default to int.

    >
    >Constants always take on a type that can represent them
    >(unless there is no such type). In this case, the constant
    >will be a long long without you having to specify it --
    >assuming that the number exceeds LONG_MAX, and that
    >the compiler is in C99 mode.
    >
    >I don't know where this myth came from that there is
    >something wrong with un-suffixed constants.


    Which begs the question, why would one ever use a suffixed constant?

    --
    jay
     
    jaysome, Aug 9, 2006
    #6
  7. jacob navia Guest

    jaysome wrote:
    > On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <>
    > wrote:
    >
    >
    >>Ian Collins wrote:
    >>
    >>> wrote:
    >>>
    >>>>// this does not compile because the number 1234567891234584 is not an
    >>>>INT
    >>>>anArray[1234567891234584] = 1 ;
    >>>>
    >>>
    >>>If you can allocate more than INT_MAX elements, you should probably
    >>>write 1234567891234584LL to tell the compiler the constant is a long
    >>>long, otherwise it will default to int.

    >>
    >>Constants always take on a type that can represent them
    >>(unless there is no such type). In this case, the constant
    >>will be a long long without you having to specify it --
    >>assuming that the number exceeds LONG_MAX, and that
    >>the compiler is in C99 mode.
    >>
    >>I don't know where this myth came from that there is
    >>something wrong with un-suffixed constants.

    >
    >
    > Which begs the question, why would one ever use a suffixed constant?
    >


    long double a = 1e500;

    This is an error because 1e500 is a double constant and its maximum
    value can be only around 1e300. To avoid
    getting garbage you HAVE to put the 'L' like

    long double a = 1e500L;
     
    jacob navia, Aug 9, 2006
    #7
  8. jaysome wrote:
    > On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <>
    > wrote:
    >>Constants always take on a type that can represent them
    >>(unless there is no such type). In this case, the constant
    >>will be a long long without you having to specify it --
    >>assuming that the number exceeds LONG_MAX, and that
    >>the compiler is in C99 mode.
    >>
    >>I don't know where this myth came from that there is
    >>something wrong with un-suffixed constants.

    >
    > Which begs the question, why would one ever use a suffixed constant?


    long long int i = 1LL << 40;
     
    Harald van =?UTF-8?B?RMSzaw==?=, Aug 9, 2006
    #8
  9. jacob navia <> writes:
    > jaysome wrote:
    >> On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <>
    >> wrote:

    [...]
    >>>I don't know where this myth came from that there is
    >>>something wrong with un-suffixed constants.

    >> Which begs the question, why would one ever use a suffixed constant?

    >
    > long double a = 1e500;
    >
    > This is an error because 1e500 is a double constant and its maximum
    > value can be only around 1e300. To avoid
    > getting garbage you HAVE to put the 'L' like
    >
    > long double a = 1e500L;


    Interesting point. Integer literals have a type that depends on the
    value of the literal and the range of the type; floating-point
    literals don't. I suppose that since floating-point has both range
    and precision, it would be more difficult to construct a consistent
    set of rules.

    (If DBL_MAX >= 1e500, the above declaration is ok, but you do need the
    suffix to be safe.)

    So the previous question can be changed to: why would one ever use a
    suffixed *integer* constant?

    One example is an argument to a variadic function, but that's probably
    fairly rare. In most contexts, numeric values will be implicitly
    converted to the required type.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Aug 9, 2006
    #9
  10. tgur Guest

    Tom St Denis wrote:

    > How many times can people say "don't cast void pointers" before newbs
    > actually read the advice and do it?
    >

    Could you explain, why casting void pointers is bad?
     
    tgur, Aug 9, 2006
    #10
  11. Richard Bos Guest

    tgur <> wrote:

    > Tom St Denis wrote:
    >
    > > How many times can people say "don't cast void pointers" before newbs
    > > actually read the advice and do it?
    > >

    > Could you explain, why casting void pointers is bad?


    Because it is rarely if ever necessary.

    There are a couple of occasions when an ordinary programmer should cast
    a void pointer or should cast any other pointer to void *. One is when
    passing them to a variadic function. Another is inside the comparison
    function for qsort().
    The case Tom complains about, which is on receiving a void pointer from
    malloc() and friends, is _not_ one of these occasions. Often, the
    perceived need to cast malloc() is a result of one or more of three
    faults: not #including <stdlib.h>, using a C++ compiler instead of a C
    one, and a lack of clue about void pointers in C.

    Richard
     
    Richard Bos, Aug 9, 2006
    #11
  12. The limit depends on the particular compiler, memory model, hardware,
    and operating system.

    Many compilers nowdays generate 32-bit code and 32-bit addresses, which
    limits their addressing ability to an absolute maximum of about 4
    billion bytes of data plus 4 billion bytes of code. Sometimes enhanced
    by another 4GB of heap data.

    In addition to that the operating system often grabs a part of the
    potential address space. For instance, various versions of Windows
    reserve 1/4 to 1/2 the address space for their own use.

    In addition, some older motherboards don't implement all 32 address
    lines, so you may be limited to 1/2, 1/4, or 1/8th of 4GB.

    Then in addition you're limited by the amount of memory the OS can fake
    using virtual memory. If your system vm file is only 1GB in size,
    that's the upper limit.

    THen in addition to all the above, if you want fast access to the
    array, you're mostly limited by the amount of actual, real RAM the
    computer has, and in addition the other programs and OS tasks running
    will subtract from that.

    So in a typical 32-bit PC (and I KNOW there are other architectures),
    you'll have, best case, about 1GB of fast RAM, that's 250 million
    32-bit floats, twice that if you fall back on virtual memory.

    If you upgrade to a 64-bit PC, then you can typically address 2^48
    bytes of memory. But get real, 2^48 bytes of actual RAM would cost
    about 2^42 dollars, about 4 trillion dolalrs.


    Perhaps if you explained what you want to do, we can suggest ways of
    doing it without requiring ridiculous amounts of memory.
     
    Ancient_Hacker, Aug 9, 2006
    #12
  13. Guest

    > Perhaps if you explained what you want to do, we can suggest ways of
    > doing it without requiring ridiculous amounts of memory.


    I think this is very wise. Thanks for everyone's comments so far.

    I'm effectively creating massive 7 dimensional hypermatricies in C.
    These matrices are of type Char and only contain boolean values (1 or
    0). They are incredibly sparse (I know that there are sparse matrix
    techniques, but I haven't found any good implementations in C yet -
    can you recommend any?).

    I'll write in pseudo code, but I hope you get the gist.

    Lets say I want to create a large 7D hypermatrix:

    n = 100 ;
    m = new boolean_matrix[n][n][n][n][n][n][n] ;

    The problem is, this takes up way too much memory.

    To save memory, alternatively, I could create a 1D array that stores
    n^7 elements:

    m = new Boolean_matrix[n*n*n*n*n*n*n] ;

    But n^7 (where n = 100) is bigger than the larges INT (2^32).

    In C, it seems that I can create an array with a size contained within
    DOUBLE number of elements, but I can't access the array once created,
    hence my original problem:

    -------------------------------------------------------

    I've managed to create arrays using DOUBLE data types, but when I try
    to access the array, the compiler complains that the number is not an
    INT, i.e.


    // this succeeds
    char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;


    // this does not compile because the number 1234567891234584 is not an
    INT
    anArray[1234567891234584] = 1 ;

    ----------------------------------------------------------

    There is certainly mileage in using other data storage classes to
    represent my sparse matrices, i.e. hashtable, trees, etc - but I
    haven't found any that are intuitive and/or efficient.

    If anyone has any suggestions , I'd be sincerely grateful

    Daniel
     
    , Aug 9, 2006
    #13
  14. Okay, 7 dimensions is a lot. If you want 100 elements every which way,
    that's 10^14 elements. Since 10^3 is about 2^10, that's about 2^42
    bits, or 2^39 bytes, so that's easily addressable on a system with a
    x64 CPU.

    Nowdays memory is selling for about $60 a gigabyte, and you need about
    550 gigabytes, so you better dig deep and find $32,767 just for RAM.
    And you'll need a rather special motherboard that can accept that many
    memory DIMMS.

    I think you may have to scale back the number of dimensions or the
    extent along a dimension. Just dropping ONE dimension, down to 6,
    brings it within the realm of feasibility, without getting a big
    federal grant.

    Or scaling the extent is even better , as the total memory will go down
    as the seventh power.

    ----

    Exactly how sparse is this matrix? If it's less than 10% filled, it
    might be possible to use some sparse-matrix techniques.
     
    Ancient_Hacker, Aug 9, 2006
    #14
  15. Al Balmer Guest

    On Wed, 09 Aug 2006 10:29:08 GMT, (Richard
    Bos) wrote:

    >tgur <> wrote:
    >
    >> Tom St Denis wrote:
    >>
    >> > How many times can people say "don't cast void pointers" before newbs
    >> > actually read the advice and do it?
    >> >

    >> Could you explain, why casting void pointers is bad?

    >
    >Because it is rarely if ever necessary.
    >
    >There are a couple of occasions when an ordinary programmer should cast
    >a void pointer or should cast any other pointer to void *. One is when
    >passing them to a variadic function.


    That one I get.

    > Another is inside the comparison
    >function for qsort().


    This one I don't. Why would you need a cast there?

    This seems OK (at least the compiler doesn't complain):

    int compare(const void *arg1, const void *arg2)
    {
    const int *warg1 = arg1;
    const int *warg2 = arg2;

    return *warg1 - *warg2;
    }

    Actually, the compiler doesn't complain if I leave out all the
    "const", either. (With include for stdlib.h and a call to qsort.)

    >The case Tom complains about, which is on receiving a void pointer from
    >malloc() and friends, is _not_ one of these occasions. Often, the
    >perceived need to cast malloc() is a result of one or more of three
    >faults: not #including <stdlib.h>, using a C++ compiler instead of a C
    >one, and a lack of clue about void pointers in C.
    >
    >Richard


    --
    Al Balmer
    Sun City, AZ
     
    Al Balmer, Aug 9, 2006
    #15
  16. "Ancient_Hacker" <> writes:
    > The limit depends on the particular compiler, memory model, hardware,
    > and operating system.

    [...]

    What limit? Please quote context.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Aug 9, 2006
    #16
  17. writes:
    >> Perhaps if you explained what you want to do, we can suggest ways of
    >> doing it without requiring ridiculous amounts of memory.

    >
    > I think this is very wise. Thanks for everyone's comments so far.
    >
    > I'm effectively creating massive 7 dimensional hypermatricies in C.
    > These matrices are of type Char and only contain boolean values (1 or
    > 0). They are incredibly sparse (I know that there are sparse matrix
    > techniques, but I haven't found any good implementations in C yet -
    > can you recommend any?).
    >
    > I'll write in pseudo code, but I hope you get the gist.
    >
    > Lets say I want to create a large 7D hypermatrix:
    >
    > n = 100 ;
    > m = new boolean_matrix[n][n][n][n][n][n][n] ;


    Your pseudo code looks a whole lot like C++.

    > The problem is, this takes up way too much memory.
    >
    > To save memory, alternatively, I could create a 1D array that stores
    > n^7 elements:
    >
    > m = new Boolean_matrix[n*n*n*n*n*n*n] ;


    Why do you think that would save memory? You're still storing the
    same number of elements.

    > But n^7 (where n = 100) is bigger than the larges INT (2^32).


    It's "int", not "INT".

    > In C, it seems that I can create an array with a size contained within
    > DOUBLE number of elements, but I can't access the array once created,
    > hence my original problem:


    What do you mean by "DOUBLE"? C has a type called "double", but it's
    a floating-point type, and it can't be used as an array index.

    The fact that your array size exceeds the largest value of type int
    isn't the problem. The problem is just that you're trying to allocate
    a really really huge amount of memory.

    > -------------------------------------------------------
    >
    > I've managed to create arrays using DOUBLE data types, but when I try
    > to access the array, the compiler complains that the number is not an
    > INT, i.e.
    >
    >
    > // this succeeds
    > char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;


    The first argument to calloc() is of type size_t, which is an unsigned
    integer type. Your argument 999999999999999999 will be implicitly
    converted to size_t. If your size_t is 64 bits, then
    999999999999999999 is a valid value of that type (and the calloc()
    call is likely to fail because you almost certainly don't have that
    much memory available). If your size_t is smaller than 60 bits, your
    argument will be converted by, in effect, dropping the high-order bits.
    For example, if size_t is 32 bits, then the call
    calloc(999999999999999999, sizeof(char))
    is equivalent to
    calloc(2808348671, sizeof(char))
    You're still not likely to have that much available memory -- but
    unless you check the result of calloc(), your program will never know
    that.

    (On some systems, particularly Linux, malloc or calloc will pretend to
    allocate however much memory you ask for, and problems won't show up
    until you try to use that memory.)

    > // this does not compile because the number 1234567891234584 is not an
    > INT
    > anArray[1234567891234584] = 1 ;
    >
    > ----------------------------------------------------------


    What is an "INT"? Remember that C is case-sensitive. There is, as
    you know, a built-in type called "int".

    This declaration:

    int anArray[1234567891234584];

    is legal *unless* it exceeds an implementation-defined limit (which it
    most likely does).

    > There is certainly mileage in using other data storage classes to
    > represent my sparse matrices, i.e. hashtable, trees, etc - but I
    > haven't found any that are intuitive and/or efficient.
    >
    > If anyone has any suggestions , I'd be sincerely grateful


    On any modern system, you just aren't going to be able to allocate and
    use the amount of memory you want. You'll need to use some kind of
    sparse representation. (I don't have any pointers for you.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Aug 9, 2006
    #17
  18. Al Balmer Guest

    On 9 Aug 2006 08:35:30 -0700, wrote:

    >
    >There is certainly mileage in using other data storage classes to
    >represent my sparse matrices, i.e. hashtable, trees, etc - but I
    >haven't found any that are intuitive and/or efficient.


    A list or array containing the coordinates of the non-zero elements
    seems intuitive enough to me. Whether it's efficient or not depends on
    what manipulation you need to do with the matrices.

    --
    Al Balmer
    Sun City, AZ
     
    Al Balmer, Aug 9, 2006
    #18
  19. Keith Thompson wrote:

    > What limit? Please quote context.



    Good point. I think we were talking about the limit on the amount of
    memory allocatable for this guy's array.

    He wants a 100x100x100x100x100x100x100 array of boolean and is a bit
    fuzzy on what this entails.
     
    Ancient_Hacker, Aug 9, 2006
    #19
  20. On Wed, 9 Aug 2006 10:13:01 UTC, tgur <> wrote:

    > Tom St Denis wrote:
    >
    > > How many times can people say "don't cast void pointers" before newbs
    > > actually read the advice and do it?
    > >

    > Could you explain, why casting void pointers is bad?


    should you please read the postings of the last 5 days. Youl'll find
    the answer multiple times. You'll learn something more by that.

    Or use google and search this group for malloc. Ypu'll get multiple
    thousend times the answer you needs.

    --
    Tschau/Bye
    Herbert

    Visit http://www.ecomstation.de the home of german eComStation
    eComStation 1.2 Deutsch ist da!
     
    Herbert Rosenau, Aug 9, 2006
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Shaguf
    Replies:
    0
    Views:
    519
    Shaguf
    Dec 24, 2008
  2. Shaguf
    Replies:
    0
    Views:
    479
    Shaguf
    Dec 26, 2008
  3. ericnoonan
    Replies:
    0
    Views:
    355
    ericnoonan
    Dec 14, 2009
  4. Shaguf
    Replies:
    0
    Views:
    258
    Shaguf
    Dec 26, 2008
  5. Shaguf
    Replies:
    0
    Views:
    236
    Shaguf
    Dec 24, 2008
Loading...

Share This Page