Large arrays: malloc or 'just declare'?

Hal Styli

Hello,

Can someone please help.

I need to use a large array and I'm not sure when one should
use (A) and when one should use (B) below:-

#define MAX 10000000

(A) int x[MAX];

(B) int *x = ( int * ) malloc( MAX*sizeof( int ) );

This isn't a compiler specific problem, but just for some background, I was
using (A) for lower values of MAX but when I started increasing it, I hit
some compiler/linker problems with the borland compiler.

I then noticed that the size of the executable was similar regardless of
whether I used (A) or (B), which might have been a mild advantage of (B).

Other than compiler/linker problems, assuming MAX is fixed (as it is!)
why would I use (B) rather than (A)?

I think the answer relates to stack versus heap (but I'm interested in any
other answers that come up). Can anyone point me to an online reference on
stack vs heap, my google searches haven't inspired me.

Thanks in advance
Hal.
 
Rakesh

i) First of all, if you want to go for very large arrays (as you rightly
put it before), go for the heap and avoid taking up stack space for it.

ii) Second, the lifetime of the variable. Most of the time, you will
want to pass the data held in the very large array around and operate
on it. In that case, again, it makes sense to have it on the heap and
pass its address around.
Yes, you can also define a local variable on the stack and pass its
address around, but that storage is lost once the function returns,
whereas memory taken from the heap is still there. (But then there is
always a lurking danger of a memory leak when the malloc() and free()
are separated into different functions.)
My vote goes for the heap!
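A minimal sketch of the lifetime point above, using a hypothetical helper `make_table()`: memory obtained from malloc() outlives the function that allocated it, so its address can be handed back to the caller, who then owns it and must free() it. Returning the address of a local (automatic) array instead would leave the caller holding a dangling pointer.

```c
#include <stdlib.h>

/* Do NOT do this instead:
       int local[100]; ... return local;
   'local' stops existing when the function returns, so the caller would
   get a dangling pointer (undefined behaviour if used). */

/* Hypothetical helper: allocates n ints, fills them with 0..n-1 and
   returns the block. The storage stays valid after the function returns;
   the caller must free() it. Returns NULL if the allocation fails. */
int *make_table(size_t n)
{
    int *t = malloc(n * sizeof *t);
    size_t i;

    if (t == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        t[i] = (int)i;
    return t;
}
```

The allocation and the matching free() live in different functions here, which is exactly the leak risk mentioned above; the caller has to remember to release the block.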





Hal said:
Hello,

Can someone please help.

I need to use a large array and I'm not sure when one should
use (A) and when one should use (B) below:-

#define MAX 10000000

(A) int x[MAX];

(B) int *x = ( int * ) malloc( MAX*sizeof( int ) );

Thanks in advance
Hal.
 
Christopher Benson-Manica

Hal Styli said:
#define MAX 10000000
(A) int x[MAX];
(B) int *x = ( int * ) malloc( MAX*sizeof( int ) );
^^^^^^^^^
Don't cast the return value of malloc, ever. If you've forgotten to
include <stdlib.h>, you won't know it if you cast malloc's return
value. Highly detailed discussions on this topic can be found in this
group's archives.
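For reference, here is a sketch of (B) written the way the replies recommend: <stdlib.h> included so malloc() has a prototype, no cast, the element size taken from the pointer itself (`sizeof *x`), the result tested against NULL, and the memory released with free(). The function name `use_big_array` is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>   /* declares malloc() and free(); this, not a cast, is what matters */

#define MAX 10000000

/* Hypothetical example: allocate MAX ints, touch them, release them.
   Returns 0 on success, -1 if the allocation failed. */
int use_big_array(void)
{
    int *x = malloc(MAX * sizeof *x);   /* no cast needed in C */
    size_t i;

    if (x == NULL) {
        fprintf(stderr, "could not allocate %lu bytes\n",
                (unsigned long)(MAX * sizeof *x));
        return -1;
    }
    for (i = 0; i < MAX; i++)
        x[i] = 0;
    free(x);
    return 0;
}
```

Writing `sizeof *x` rather than `sizeof(int)` keeps the allocation correct if the element type of `x` is ever changed.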
This isn't a compiler specific problem, but just for some background, I was
using (A) for lower values of MAX but when I started increasing it, I hit
some compiler/linker problems with the borland compiler.

If it isn't compiler specific, why mention Borland's compiler? The
amount of automatic storage you can allocate is a function of your
implementation and how you use it.
I then noticed that the size of the executable was similar regardless of
whether I used (A) or (B), which might have been a mild advantage of (B).

Mostly irrelevant, unless you have the (mis)fortune of developing on a
highly space-sensitive platform, such as an embedded device.
Other than compiler/linker problems, assuming MAX is fixed (as it is!)
why would I use (B) rather than (A)?

Use A when:

1) You know how much space you want at compile time,
2) You won't want more space at any future time, and
3) You don't run into problems such as the one you're having.

Otherwise, use B. (Right, clc?) Remember that you must free() memory
you allocate with malloc().
I think the answer relates to stack versus heap (but I'm interested in any
other answers that come up). Can anyone point me to an online reference on
stack vs heap, my google searches haven't inspired me.

On this newsgroup, "stack" and "heap" are dangerous terms, as
implementations need not use such structures to manage the memory they
allocate. As for references, the following contain a wealth of
valuable C information:

http://www.ungerhu.com/jxh/clc.welcome.txt
http://www.eskimo.com/~scs/C-faq/top.html
http://benpfaff.org/writings/clc/off-topic.html
 
August Derleth

Hal Styli said:
#define MAX 10000000
(A) int x[MAX];
(B) int *x = ( int * ) malloc( MAX*sizeof( int ) );
^^^^^^^^^
Don't cast the return value of malloc, ever. If you've forgotten to
include <stdlib.h>, you won't know it if you cast malloc's return
value. Highly detailed discussions on this topic can be found in this
group's archives.
I then noticed that the size of the executable was similar regardless of
whether I used (A) or (B), which might have been a mild advantage of (B).

Mostly irrelevant, unless you have the (mis)fortune of developing on a
highly space-sensitive platform, such as an embedded device.

In which case you should know how much memory your program can use to the
nearest byte or word.
Use A when:

1) You know how much space you want at compile time,
2) You won't want more space at any future time, and
3) You don't run into problems such as the one you're having.

Otherwise, use B. (Right, clc?) Remember that you must free() memory
you allocate with malloc().

And always check to see if your call to malloc() succeeded; that is,
always test the pointer you got back against NULL.

I like to use constructions like the below:

#include <stdlib.h>

/* ... */

int *ptr;

if ((ptr = malloc(sizeof(int) * LEN)) == NULL) {
/* Handle failure to allocate RAM */
}
On this newsgroup, "stack" and "heap" are dangerous terms, as
implementations need not use such structures to manage the memory they
allocate. As for references, the following contain a wealth of
valuable C information:

http://www.ungerhu.com/jxh/clc.welcome.txt
http://www.eskimo.com/~scs/C-faq/top.html
http://benpfaff.org/writings/clc/off-topic.html

Good pointers. ;)
 
Rakesh

August said:
And always check to see if your call to malloc() succeeded; that is,
always test the pointer you got back against NULL.

I like to use constructions like the below:

#include <stdlib.h>

/* ... */

int *ptr;

if ((ptr = malloc(sizeof(int) * LEN)) == NULL) {
/* Handle failure to allocate RAM */
}

My two cents: before you start writing your C++ code, write your
wrappers for all the routine things that you would do (memory allocation
like the one above, sockets, files, etc.). It will really help a lot
later when it comes to debugging the code.
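A minimal sketch of the kind of allocation wrapper suggested above, written in C since that is what this group discusses. The names `xmalloc` and `xmalloc_at` are a common convention, not standard functions, and the policy shown (report the call site and exit) is only one choice; a wrapper could equally log the failure and return NULL.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: behaves like malloc() but never returns NULL for a
   non-zero request. On failure it reports where the request came from and
   exits, so every caller shares one easy-to-debug failure path. */
void *xmalloc_at(size_t size, const char *file, int line)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {   /* malloc(0) may legitimately return NULL */
        fprintf(stderr, "%s:%d: out of memory (%lu bytes requested)\n",
                file, line, (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Records the caller's file and line automatically. */
#define xmalloc(size) xmalloc_at((size), __FILE__, __LINE__)
```

A call then reads `int *x = xmalloc(MAX * sizeof *x);`, still followed by `free(x)` when the array is no longer needed.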
 
August Derleth

My two cents: before you start writing your C++ code, write your
wrappers for all the routine things that you would do (memory allocation
like the one above, sockets, files, etc.). It will really help a lot
later when it comes to debugging the code.

Who says we're going to write C++?

(I probably missed something in the original post. Sorry.)

[OT]
Using malloc() in C++ isn't good style, as I understand it.
[/OT]
 
Rakesh Kumar

Sorry for the confusion and for bringing C++ in here.

I only wanted to mention wrappers (and that has nothing to do with
whether you write C++ or not).

August said:
My two cents: before you start writing your C++ code, write your
wrappers for all the routine things that you would do (memory allocation
like the one above, sockets, files, etc.). It will really help a lot
later when it comes to debugging the code.


Who says we're going to write C++?

(I probably missed something in the original post. Sorry.)

[OT]
Using malloc() in C++ isn't good style, as I understand it.
[/OT]
 
josh

Hal Styli said:
I need to use a large array and I'm not sure when one should
use (A) and when one should use (B) below:-

#define MAX 10000000

(A) int x[MAX];

(B) int *x = ( int * ) malloc( MAX*sizeof( int ) ); [...]
Other than compiler/linker problems, assuming MAX is fixed (as it is!)
why would I use (B) rather than (A)?

I think the answer relates to stack versus heap (but I'm interested in
any other answers that come up). Can anyone point me to an online
reference on stack vs heap, my google searches haven't inspired me.

If "int x[MAX];" is global or static, it will typically not be put in
either place. Usually it'll get added to the .bss section, which is stored
in your executable as just a size, then the OS will allocate that much
memory somewhere for you when your program starts. Completely system
dependent, but pretty common.
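A sketch of the point above, assuming a typical hosted implementation: an array with static storage duration (file scope, or declared `static` inside a function) is not an automatic variable, and because it is zero-initialised it usually costs only a size entry in the executable rather than MAX ints of data. The .bss placement is implementation-specific; the zero initialisation is guaranteed by the C standard. `big`, `sum_big` and `set_big` are hypothetical names.

```c
#include <stddef.h>

#define MAX 10000000

/* Static storage duration: exists for the whole run of the program and is
   guaranteed to start out as all zeros. Many systems place it in a
   .bss-style section, so it does not inflate the executable file. */
static int big[MAX];

/* Hypothetical helper: sums the array (0 until something is stored). */
long sum_big(void)
{
    long total = 0;
    size_t i;

    for (i = 0; i < MAX; i++)
        total += big[i];
    return total;
}

/* Hypothetical helper: stores v at index i. */
void set_big(size_t i, int v)
{
    big[i] = v;
}
```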

Otherwise, the only serious reason I can think of to prefer (B) is: what
happens when you don't have that much memory available? With (A), your
program dies in some way that you probably have no control over. With (B),
you can exit gracefully and let the user know why. (or, algorithm
permitting, fall back to a smaller size)
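The "fall back to a smaller size" option can be sketched as a loop that halves the request until malloc() succeeds or a lower bound is reached. `alloc_with_fallback` and its parameters are hypothetical; whether a smaller buffer is acceptable depends entirely on the algorithm.

```c
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>

/* Hypothetical helper: try to allocate `wanted` ints; on failure keep halving
   the request, but never go below `minimum`. On success, stores the number of
   elements actually obtained in *got and returns the block (caller frees it).
   Returns NULL and sets *got to 0 if not even `minimum` ints are available. */
int *alloc_with_fallback(size_t wanted, size_t minimum, size_t *got)
{
    size_t n = wanted;

    while (n > 0 && n >= minimum) {
        int *p = NULL;

        if (n <= SIZE_MAX / sizeof *p)   /* skip sizes whose byte count would overflow */
            p = malloc(n * sizeof *p);
        if (p != NULL) {
            *got = n;
            return p;
        }
        n /= 2;                          /* fall back to half the size */
    }
    *got = 0;
    return NULL;
}
```

Only the NULL return of malloc() makes this possible; an automatic array that is too large gives the program no such chance to react.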

-josh
 
heyo

August Derleth said:
Hal Styli <no_spam@all> spoke thus:
[...]
Mostly irrelevent, unless you have the (mis)fortune of developing on a
highly space-sensitive platform, such as an embedded device.

In which case you should know how much memory your program can use to the
nearest byte or word.
... and you don't need to malloc() memory, because you know it's there and
it's all yours. :)

4) You really need it.
Otherwise, use B. (Right, clc?) Remember that you must free() memory
you allocate with malloc().

And always check to see if your call to malloc() succeeded; that is,
always test the pointer you got back against NULL.
[...]

That makes (A) a favorite even if you don't know the exact amount but do
know an upper limit (and have enough memory that you needn't be stingy).
I think the answer relates to stack versus heap (but I'm interested in any
other answers that come up). Can anyone point me to an online reference on
stack vs heap, my google searches haven't inspired me.

On this newsgroup, "stack" and "heap" are dangerous terms, as
implementations need not use such structures to manage the memory they
allocate. [...]

... but for most of them you can assume that local variables end up
on the stack and global (and static) variables end up on the heap.
So your problem (a stack overflow?) may vanish if you declare your
big array global (if it doesn't bother you that this is regarded as
bad style).

bye,
heyo
 
Darrell Grainger

Hello,

Can someone please help.

I need to use a large array and I'm not sure when one should
use (A) and when one should use (B) below:-

#define MAX 10000000

(A) int x[MAX];

(B) int *x = ( int * ) malloc( MAX*sizeof( int ) );

Just a side note: if you fail to #include <stdlib.h>, the compiler
will not have a prototype for malloc. This will lead to undefined
behaviour. The behaviour might be that things work fine. On the other hand, it
might crash your computer or format your hard drive. If you had:

int *x = malloc(MAX*sizeof(int));

you should get a warning or error.
This isn't a compiler specific problem, but just for some background, I was
using (A) for lower values of MAX but when I started increasing it, I hit
some compiler/linker problems with the borland compiler.

I then noticed that the size of the executable was similar regardless of
whether I used (A) or (B), which might have been a mild advantage of (B).

Other than compiler/linker problems, assuming MAX is fixed (as it is!)
why would I use (B) rather than (A)?

I think the answer relates to stack versus heap (but I'm interested in any
other answers that come up). Can anyone point me to an online reference on
stack vs heap, my google searches haven't inspired me.

The idea of stack vs heap assumes your compiler is using a stack and a
heap. If you are looking for a generic C answer then forget that.

Method A will only work if you know the size of the array at compile time.
If the size of the array depends on the user, then you must use method
B. Since you indicate that you can use either method, the decision is not
being forced one way or the other.
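To illustrate the user-dependent case, here is a sketch that assumes the element count arrives as text (for example a command-line argument). In C90, (A) cannot express a size known only at run time, so the array comes from the allocator. `alloc_from_string` is a hypothetical name. C99 variable-length arrays are another option, but they are automatic objects too, so they raise the same size concerns as (A) inside a function.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a positive element count from text and allocate
   that many ints, zero-filled. Returns NULL if the text is not a valid count
   or the allocation fails; on a successful parse, *n receives the count. */
int *alloc_from_string(const char *text, size_t *n)
{
    char *end;
    unsigned long count;

    if (text[0] < '0' || text[0] > '9')  /* reject "", "-5", " 7", "abc", ... */
        return NULL;
    errno = 0;
    count = strtoul(text, &end, 10);
    if (*end != '\0' || errno == ERANGE || count == 0)
        return NULL;
    *n = (size_t)count;
    return calloc(*n, sizeof(int));      /* zero-filled; NULL if unavailable */
}
```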

If I use method A, I'll know at compile or load time whether the program
will work: either it will work or it won't. With method B, I have to wait
until run time to see if it will work, but I have the option to recover.
If the call to malloc fails, I can have a fallback plan: I could switch to
secondary memory or process an array of half the size twice. If I'm just
going to exit, then it is no better than method A.
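The "process an array of half the size twice" fallback generalises to working through the data in chunks with a buffer smaller than the whole data set. In this sketch `fill_chunk` is a hypothetical stand-in for reading the next piece of input, `sum_in_chunks` is an illustrative name, and summing is only an example of the per-element work.

```c
#include <stdlib.h>

/* Hypothetical producer standing in for "read the next piece of input":
   writes the values for indices [first, first + count) into buf. */
static void fill_chunk(int *buf, size_t first, size_t count)
{
    size_t k;

    for (k = 0; k < count; k++)
        buf[k] = (int)((first + k) % 1000);
}

/* Sums `total` values using a working buffer of at most `chunk` ints rather
   than one array of `total` ints. If the buffer cannot be allocated, the
   chunk size is halved and the allocation retried. Returns 0 and stores the
   sum in *sum on success, or -1 if no buffer could be allocated at all. */
int sum_in_chunks(size_t total, size_t chunk, long long *sum)
{
    int *buf = NULL;
    size_t done, i;

    while (chunk > 0 && (buf = malloc(chunk * sizeof *buf)) == NULL)
        chunk /= 2;                      /* smaller buffer, more passes */
    if (buf == NULL)
        return -1;

    *sum = 0;
    for (done = 0; done < total; done += chunk) {
        size_t n = (total - done < chunk) ? total - done : chunk;

        fill_chunk(buf, done, n);
        for (i = 0; i < n; i++)
            *sum += buf[i];
    }
    free(buf);
    return 0;
}
```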

On the other hand, I realize that different compilers and platforms behave
differently. I might try both methods and measure the performance of each.
For my particular platform with a specific compiler I might find one
method is notably better than the other. I'd put a comment reminding
myself and anyone else who might inherit my code.

If my testing found that method B was no better than method A, I'd go with
method A because it is less work for me.
 
Severian

Hello,

Can someone please help.

I need to use a large array and I'm not sure when one should
use (A) and when one should use (B) below:-

#define MAX 10000000

(A) int x[MAX];

(B) int *x = ( int * ) malloc( MAX*sizeof( int ) );

Just a side note: if you fail to #include <stdlib.h>, the compiler
will not have a prototype for malloc. This will lead to undefined
behaviour. The behaviour might be that things work fine. On the other hand, it
might crash your computer or format your hard drive.

Yeah, right. I'm writing a compiler and I'm going to make it format
some newbie's hard drive or shoot dragons from his nose when he makes
a mistake. I'm sure I'll sell -- or give away -- *lots* of copies of
that.

The DS9000 (or whatever it is) compiler does not exist. Compilers
*generally* do *reasonable* things when undefined behavior is invoked
(hey ya, hey ya, wave the cloth over the fire). The key is to learn
when you're invoking said undefined behavior. Most compilers will tell
you if you ask properly!

While C has become a more complex language over the last few years, it
is very well defined. Properly-written code will run on many
platforms.

But if you learned to program for Windows using M$ compilers (or
Unix/variants using GCC or other compilers) you may be in for a bit of
a shock when you change platforms. You may actually need to know what
each environment understands and how each environment works.

By learning these things, you will learn what differs between
environments, and what functions to move into platform-specific
modules, and which libraries are available where, and which libraries
provide appropriate user interfaces for each environment. (I can't
offer general guidance here; every cross-platform "solution" I've
seen has been woefully incomplete.)
 
goose

Rakesh said:
My two cents: before you start writing your C++ code, write your

In this group we write only C code.
wrappers for all the routine things that you would do (memory allocation
like the one above, sockets, files, etc.). It will really help a lot
later when it comes to debugging the code.


hand
goose,
 
goose

Rakesh Kumar said:
Sorry for the confusion and bringing in C++ here.

I wanted to mention abt wrappers ( and that has nothing to do with, if u
do in C++ or not ).

Welcome to the group. Has no one yet asked you not to top-post?
No?

Could you please not top-post?

<snipped previous post>

goose,
 
Flash Gordon

On 20 Apr 2004 01:10:55 -0700
I think the answer relates to stack versus heap (but I'm
interested in any other answers that come up). Can anyone point
me to an online reference on stack vs heap, my google searches
haven't inspired me.

On this newsgroup, "stack" and "heap" are dangerous terms, as
implementations need not use such structures to manage the memory
they allocate. [...]

... but for most of them you can assume that local variables end up
on the stack and global (and static) variables end up on the heap.
So your problem (a stack overflow?) may vanish if you declare your
big array global (if it doesn't bother you that this is regarded as
bad style).

Actually, on many systems with a stack and a heap, static and global data
are neither on the stack nor on the heap. This just goes to show why you
should reserve such discussions for more appropriate groups where people
will know about your implementation.
 
Flash Gordon

Hello,

Can someone please help.

I need to use a large array and I'm not sure when one should
use (A) and when one should use (B) below:-

#define MAX 10000000

(A) int x[MAX];

(B) int *x = ( int * ) malloc( MAX*sizeof( int ) );

Just a side note: if you fail to #include <stdlib.h>, the
compiler will not have a prototype for malloc. This will lead to
undefined behaviour. The behaviour might be that things work fine. On the
other hand, it might crash your computer or format your hard drive.

Yeah, right. I'm writing a compiler and I'm going to make it format
some newbie's hard drive or shoot dragons from his nose when he makes
a mistake. I'm sure I'll sell -- or give away -- *lots* of copies of
that.

The DS9000 (or whatever it is) compiler does not exist. Compilers
*generally* do *reasonable* things when undefined behavior is invoked
(hey ya, hey ya, wave the cloth over the fire).

There ARE systems where pointers are returned in one register and
integers in another. On such systems you WILL get strange behaviour, the
most likely symptom being your program crashing. Avoiding this (in this
instance) costs nothing and provides a guarantee of the application
working on all systems which have sufficient memory available to malloc.
The key is to learn
when you're invoking said undefined behavior.

No, the key is to avoid undefined behaviour under almost all
circumstances. Off the top of my head I can't actually think of any
conditions where invoking undefined behaviour would be the preferred way
to go.
Most compilers will tell
you if you ask properly!

Only for certain types of undefined behaviour. It is impossible to
determine at compile time that some constructs will give undefined
behaviour.
While C has become a more complex language over the last few years, it
is very well defined.

I don't think there is significantly more complexity in the language.
There may be more complexity in system specific libraries which are
outside the scope of the C language, but that is a separate issue.
Properly-written code will run on many
platforms.
True.

But if you learned to program for Windows using M$ compilers (or
Unix/variants using GCC or other compilers) you may be in for a bit of
a shock when you change platforms. You may actually need to know what
each environment understands and how each environment works.

If you have learnt C, rather than the MS or GNU (or whatever) languages
based on C, then you will already have a good idea as to where the
problems will be.

This has nothing to do with whether you should invoke undefined
behaviour.
By learning these things, you will learn what differs between
environments, and what functions to move into platform-specific
modules, and which libraries are available where, and which libraries
provide appropriate user interfaces for each environment. (I can't
offer general guidance here; every cross-platform "solution" I've
seen has been woefully incomplete.)

I use one every day for an application that makes 50% of the revenue for
the company I work for. It builds and runs on several *nix variants and
Windows, and we know what the issues will be in porting to any other OS.

Yes, you need to learn system specifics, but what happens when you
invoke undefined behaviour is not something you need to know, since it
might change when you apply the next patch to your system, or even if
you change something completely unrelated, such as which command shell
you are using.
 
Daniel Haude

On Tue, 20 Apr 2004 14:24:28 GMT,
in Msg. said:
While C has become a more complex language over the last few years, it
is very well defined. Properly-written code will run on many
platforms.

How has it become more complex? By the C89->C99 transition?

--Daniel
 
Michael Wojcik

The DS9000 (or whatever it is) compiler does not exist. Compilers
*generally* do *reasonable* things when undefined behavior is invoked

As has been noted here any number of times, there are (successful)
commercial implementations where undefined behavior - including in
particular the case under discussion here, where malloc is invoked
without a correct declaration in scope - causes program malfunction.
The key is to learn when you're invoking said undefined behavior.

And not to do it. What exactly is your point? The post you responded
to pointed out, correctly, that invoking malloc without a correct
declaration in scope causes undefined behavior. That is trivially
avoided. "It works OK with all the compilers I know" is not a helpful
addition.
 
Hal Styli

(A) int x[MAX];

Thanks for replies so far.

If (A) is global, why is this considered bad style? Is it due to issues
raised in CFAQ 1.7 to do with header files or something else?
 
Keith Thompson

Flash Gordon said:
On Tue, 20 Apr 2004, Hal Styli wrote: [...]
I need to use a large array and I'm not sure when one should
use (A) and when one should use (B) below:-

#define MAX 10000000

(A) int x[MAX];

(B) int *x = ( int * ) malloc( MAX*sizeof( int ) );

Just a side note: if you fail to #include <stdlib.h>, the
compiler will not have a prototype for malloc. This will lead to
undefined behaviour. The behaviour might be that things work fine. On the
other hand, it might crash your computer or format your hard drive.

Yeah, right. I'm writing a compiler and I'm going to make it format
some newbie's hard drive or shoot dragons from his nose when he makes
a mistake. I'm sure I'll sell -- or give away -- *lots* of copies of
that.

The DS9000 (or whatever it is) compiler does not exist. Compilers
*generally* do *reasonable* things when undefined behavior is invoked
(hey ya, hey ya, wave the cloth over the fire).

There ARE systems where pointers are returned in one register and
integers in another. On such systems you WILL get strange behaviour, the
most likely symptom being your program crashing. Avoiding this (in this
instance) costs nothing and provides a guarantee of the application
working on all systems which have sufficient memory available to malloc.

And there are ways you can run into problems even on systems where
integers and pointers are returned in the same register. For example,
on an IA-64, int is 32 bits and pointers are 64 bits. Calling
malloc() with no prototype in scope (with a C90 compiler) causes the
high-order 32 bits of the pointer to be lost; casting to a pointer
type constructs a bad 64-bit pointer with all zeros in the high-order
32 bits. Adding a #include for <stdlib.h> avoids the problem;
casting the result doesn't.
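The IA-64 failure can be modelled without invoking undefined behaviour by performing the same truncation explicitly on integers. In this sketch `uint64_t` stands in for a 64-bit pointer and `uint32_t` for the 32-bit `int` that an undeclared malloc() is assumed to return; the address value in the test is made up, and `through_32_bit_int` is a hypothetical name. It models the effect only; no compiler is required to behave this way.

```c
#include <stdint.h>

/* Models a 64-bit address squeezed through a 32-bit int return value and then
   converted back, as happens when malloc() is called with no prototype in
   scope under C90 rules: the high-order 32 bits are lost. */
uint64_t through_32_bit_int(uint64_t address)
{
    uint32_t as_int = (uint32_t)address;   /* what the implicit 'int' return keeps */
    return (uint64_t)as_int;               /* a cast back cannot restore the lost bits */
}
```

An address that happens to fit in 32 bits survives the round trip unchanged, which is why such a bug can stay hidden until the allocator hands out a higher address.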
 
Dave Thompson

(A) int x[MAX];

Thanks for replies so far.

If (A) is global, why is this considered bad style? Is it due to issues
raised in CFAQ 1.7 to do with header files or something else?
I don't think anyone said that; in fact I don't think they even said
it's bad style if local (and automatic).

We said >casting< the result of a malloc call, in your (B), is bad
style because it isn't needed in C, like any unneeded cast adds
clutter and helps lower the "alarm" with which people do and should
view casts, and in C89 can hide the problem of not #include'ing the
correct declaration. (In C++ you generally shouldn't be using malloc
anyway; you should use the typesafe and object-oriented 'new'.)

I think some said that although not officially guaranteed, on nearly
all implementations >as an automatic variable< the array uses stack
space while the malloc is heap, and there are significantly more
implementations where the former is smaller and hence overflowed by a
smaller MAX than the latter and/or where stack overflow (which is
Undefined Behavior) is not handled well while malloc failure (return
NULL) can be handled as well as you wish to code.

As a "global" -- more precisely, since there are several different
though related meanings to the term global and C uses none of them
officially, as a variable defined at file scope with either external
or internal linkage, and also as a local variable explicitly declared
with static duration -- you *can't* write the malloc call as an
initializer; your choices are the array definition, or defining a
pointer and then assigning a malloc'ed pointer in code at some
suitable point like in or called from the beginning of main().
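The two file-scope choices described above can be sketched side by side. `table_a`, `table_b`, `init_tables` and `free_tables` are hypothetical names; `init_tables()` is meant to be called near the start of main(), and `free_tables()` before main() returns.

```c
#include <stdlib.h>

#define MAX 10000000

/* Choice 1: the array itself. One definition, size fixed at compile time,
   zero-initialised because it has static storage duration. */
int table_a[MAX];

/* Choice 2: a pointer. "int *table_b = malloc(...);" is not allowed here,
   because an object with static storage duration needs a constant
   initializer, so the allocation has to happen in code. */
int *table_b = NULL;

/* Hypothetical init function, called near the start of main().
   Returns 0 on success, -1 if the allocation failed. */
int init_tables(void)
{
    table_b = malloc(MAX * sizeof *table_b);
    return table_b == NULL ? -1 : 0;
}

/* Matching cleanup, e.g. called before main() returns. */
void free_tables(void)
{
    free(table_b);
    table_b = NULL;
}
```

The second form needs three separate pieces (definition, allocation, release), which is the extra maintenance cost mentioned below, in exchange for a failure that the program can detect and handle.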

The malloc approach, with two separated pieces, is now modestly more
difficult to write and maintain, while the array approach at best
requires only the one definition. And, while malloc failure (again)
can be handled as well as coded, an overflow of statically allocated
space is usually detected and reported by the linker, before run time,
as contrasted to stack overflow, though this is not guaranteed either.

- David.Thompson1 at worldnet.att.net
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,049
Latest member
Allen00Reed

Latest Threads

Top