file read -- increase the size of an array

Sameer

Hello,

I wish to read a file of ints and store them into a dynamically
allocated array. The memory finally allocated should be just
sufficient to store the n integers. I do not know the number of
integers in the file. How should I go about creating the array
int array[n] dynamically?

The file is:

1
2
3
4
5
...

and so on, up to n.


Kindly reply soon,

Sameer.
 
Sameer

Sameer said:
I wish to read a file of ints and store them into a dynamically
allocated array. <SNIP> How should I go about creating the array
int array[n] dynamically?

#include <stdio.h>

int main(void)
{
    int i = 0, j = 0, k, num;
    int *a;
    FILE *fp;

    fp = fopen("input", "r");

    /* first pass: count the integers */
    while (fscanf(fp, "%d", &num) != EOF)
    {
        i++;
    }

    a = new int;

    for (k = 0; k < i; k++)
    {
        a[k] = 0;
    }
    fclose(fp);

    /* second pass: store the integers */
    fp = fopen("input", "r");
    while (fscanf(fp, "%d", &num) != EOF)
    {
        a[j] = num;
        j++;
    }

    for (k = 0; k < i; k++)
    {
        printf("%d\n", a[k]);
    }
    /* free(a); */

    fclose(fp);
    return 0;
}
Here is one way, but I am reading the file twice. Is it possible to do
it by reading the file only once?
 
Irrwahn Grausewitz

Sameer said:
I wish to read a file of ints and store them into a dynamically
allocated array. The memory finally allocated should be just
sufficient to store the n integers. I do not know the number of
integers in the file. How should I go about creating the array
int array[n] dynamically?

#include <stdio.h>

int main(void)
{
    int i = 0, j = 0, k, num;
    int *a;
    FILE *fp;

    fp = fopen("input", "r");

    while (fscanf(fp, "%d", &num) != EOF)
    {
        i++;
    }

    a = new int;

^^^

This is not a C program.

My suggestion: use malloc() or realloc() to dynamically grow your
'array' as you go. This way you have to go through your file only
once.

<SNIP>

Regards

Irrwahn
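
A minimal sketch (not Irrwahn's own code) of the one-pass approach he
suggests, calling realloc() once per integer read; how often to grow the
array is the subject of the follow-ups below. The file name "input" is
taken from the OP's code.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *a = NULL, num;
    int n = 0, k;
    FILE *fp = fopen("input", "r");

    if (fp == NULL)
        return EXIT_FAILURE;

    while (fscanf(fp, "%d", &num) == 1)
    {
        int *tmp = realloc(a, (n + 1) * sizeof *a);
        if (tmp == NULL)
            break;              /* out of memory: keep what we have */
        a = tmp;
        a[n++] = num;
    }
    fclose(fp);

    for (k = 0; k < n; k++)
        printf("%d\n", a[k]);

    free(a);
    return 0;
}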
 
Christopher Benson-Manica

Irrwahn Grausewitz said:
My suggestion: use malloc() or realloc() to dynamically grow your
'array' as you go. This way you have to go through your file only
once.

My question is, is it more efficient to call realloc() for every integer, or
to go through the file twice and call malloc() only once?
 
Thomas Matthews

Christopher said:
My question is, is it more efficient to call realloc() for every integer, or
to go through the file twice and call malloc() only once?

You want to reduce the number of reallocations while not over-allocating
memory. Choose a number N as the increase amount. When the array is full,
reallocate with space for another N integers.

Perhaps you want to use a dynamic data structure, such as a linked list.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library
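
A rough sketch of the linked-list alternative Thomas mentions: each int
read gets its own node, so the file is traversed once and nothing is
ever reallocated. Error handling is minimal, and the file name "input"
is again taken from the OP's code.

#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

int main(void)
{
    struct node *head = NULL, **tail = &head, *p;
    int num;
    FILE *fp = fopen("input", "r");

    if (fp == NULL)
        return EXIT_FAILURE;

    while (fscanf(fp, "%d", &num) == 1)
    {
        p = malloc(sizeof *p);
        if (p == NULL)
            break;              /* out of memory: keep what we have */
        p->value = num;
        p->next = NULL;
        *tail = p;              /* append at the tail */
        tail = &p->next;
    }
    fclose(fp);

    for (p = head; p != NULL; p = p->next)
        printf("%d\n", p->value);

    while (head != NULL)        /* release the list */
    {
        p = head->next;
        free(head);
        head = p;
    }
    return 0;
}

The trade-off is one malloc() per item plus the per-node pointer
overhead, which is the kind of per-item allocation cost discussed
further down-thread.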
 
Irrwahn Grausewitz

Christopher Benson-Manica said:
My question is, is it more efficient to call realloc() for every integer, or
to go through the file twice and call malloc() only once?

Depends. Personally, I would choose a third option: realloc() memory
in reasonably big chunks, and when done reading do a last realloc() to
shrink-to-fit (if that is necessary at all; alas, the OP's assignment
says so).

Regards,

Irrwahn
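
The shrink-to-fit step Irrwahn describes is a single realloc() after the
read loop. A sketch, assuming (hypothetically) that iarray points to the
over-allocated block and that count integers were actually stored:

/* trim the block down to exactly count integers */
if (count > 0)
{
    int *tmp = realloc(iarray, count * sizeof *iarray);
    if (tmp != NULL)            /* shrinking should succeed, but check */
        iarray = tmp;
}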
 
Al Bowers

Christopher said:
My question is, is it more efficient to call realloc() for every integer, or
to go through the file twice and call malloc() only once?

I would not go through the file twice; I would use realloc().
Instead of reallocating on every int read from the file, I would
reallocate in blocks. For example, you can reallocate after every five
integers read. Define BLOCK to whatever value is most efficient for your
requirements.

#define BLOCK 5

int *iarray, num;
unsigned count;
FILE *fp;

/* open the file for reading */
for (iarray = NULL, count = 0U;
     fscanf(fp, "%d", &num) == 1; count++)
{
    if (count % BLOCK == 0U)
    {
        int *tmp = realloc(iarray, (sizeof *iarray) * (count + BLOCK));
        if (tmp == NULL)
            break;              /* out of memory: keep what we have */
        iarray = tmp;           /* only update on success */
    }
    iarray[count] = num;
}
 
Irrwahn Grausewitz

Christopher Benson-Manica said:
Is that a preferred solution?

Maybe not when dealing with integer items, but of course it is worth
consideration when handling more complex data structures.

Regards

Irrwahn
 
Matt Gregory

Irrwahn said:
Depends. Personally, I would choose a third option: realloc() memory
in reasonably big chunks, and when done reading do a last realloc() to
shrink-to-fit (if that is necessary at all; alas, the OP's assignment
says so).

Is it really that big of a deal to call realloc() every time? Doesn't
realloc() get memory in semi-big blocks anyway? Is it worth the added
complexity to manage the blocks ourselves?

Matt Gregory
 
Irrwahn Grausewitz

Matt Gregory said:
Is it really that big of a deal to call realloc() every time? Doesn't
realloc() get memory in semi-big blocks anyway? Is it worth the added
complexity to manage the blocks ourselves?

Matt Gregory

Again, the answer is: it depends. I remember back in the 80's I wrote a
text editor for DOS 3.x. The call to malloc() (one per text line!) was
slowing down the program considerably: it took about 20s to read approx.
1000 lines of text! After changing my memory allocation strategy it
took about 4s! Nowadays I wouldn't care about 900 calls to malloc() in
a quick'n'dirty throw-away program. But if the mission is time-critical,
well, then ...

Regards

Irrwahn
 
Matt Gregory

Irrwahn said:
Again, the answer is: it depends. I remember back in the 80's I wrote a
text editor for DOS 3.x. The call to malloc() (one per text line!) was
slowing down the program considerably: it took about 20s to read approx.
1000 lines of text! After changing my memory allocation strategy it
took about 4s! Nowadays I wouldn't care about 900 calls to malloc() in
a quick'n'dirty throw-away program. But if the mission is time-critical,
well, then ...

But I was wondering if realloc() had the same overhead as malloc().
Like, if realloc() returns a block that's bigger than what you requested
and you call it again, it could just update the node or whatever without
having to make a system call. I don't know, I guess it's
implementation-dependent, but it seems like realloc() would almost have
to work like that out of necessity.

Matt Gregory
 
Eric Sosman

Matt said:
But I was wondering if realloc() had the same overhead as malloc().
Like, if realloc() returns a block that's bigger than what you requested
and you call it again, it could just update the node or whatever without
having to make a system call. I don't know, I guess it's
implementation-dependent, but it seems like realloc() would almost have
to work like that out of necessity.

Most implementations of malloc() and friends will round
the requested size up to a multiple of some convenient "atom"
size, for alignment if for no better reason. Eight bytes is
a typical "atom" size, but of course implementations vary.
(Some -- "buddy system" allocators, for example -- use several
different "atom" sizes. But for what follows, let's just
pretend that the Universe contains nothing but hydrogen.)

For the sake of argument, let's take eight bytes as the
typical "atom" and assume a four-byte `int'. Then if you
call realloc() for every new `int', half of the requests
can be satisfied by simply recognizing the previously unused
four bytes at the end of the existing area. Of course, the
realloc() function needs to do a certain amount of work to
figure out the actual as opposed to nominal region size, but
that's usually not too hard: let's be optimistic and guess
that it's only as hard as storing five `int' values.

But what about the other half of the realloc() calls, those
that find the area already full? The function must now work
harder ...

Perhaps it can discover that the full region is just below
an unallocated area, making it possible to take one "atom" from
the upper area and attach it to the existing one. Discovering
this fact usually involves a search of some kind, plus some
fiddling with bookkeeping information: two area sizes and a
pointer, at least, would need to be changed. Let's again be
optimistic and suppose that the cost of this lengthier operation
is only fifteen times that of storing the `int'.

Of course, realloc() might also find that the growing region
abuts an allocated area, so stealing an "atom" isn't possible.
In this case, realloc() needs to find and allocate an entirely
new memory region, copy the existing data from the old region
to the new one, and deallocate the old region. This requires
more searching, about twice as much bookkeeping as in the above
scenario, and the cost of copying all the data. But let's not
complicate matters: chances are that if you're just sucking in
data from a single source and stashing it in a growing array,
this "full Monty" realloc() will happen fairly rarely.

So, assuming that half the realloc() calls are of the first
type and the rest are of the second, the cost to swallow N `int'
values comes to

(N/2) * 5 + (N/2) * 15 + N = 11 * N

91% of the cost is in the realloc() function; the other 9% is
"payload."

But what would happen if you grew the array by sixteen
`int's worth each time, instead of just by one? The downside
is that all the realloc() calls are now of the costlier second
variety, but the upside is that there are only one-sixteenth
as many calls to begin with. The total cost becomes

(N/16) * 15 + N ~= 1.94 * N

See what just happened? This section of the program is about
5.7 times faster than it used to be. Grow the array by 100
elements at a whack and the cost drops to

(N/100) * 15 + N = 1.15 * N

... and the program runs at more than nine times the speed of
the original.

Of course, all this is based on guesstimations; the Standard
is silent on the cost of realloc(). The actual cost may be quite
different from the illustrative computations given here. But since
the realloc() operations themselves are "overhead" as opposed to
"payload" as far as the overall progress of the program is
concerned, it makes sense to avoid excessive realloc() calls if
N is large. (If N is small, of course, it makes no sense at
all to bother optimizing this insignificant piece of program.)
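
To put Eric's arithmetic in runnable form, here is a small sketch that
just counts the realloc() calls implied by each growth step for N items;
the per-call costs (5 and 15 stores) remain his guesstimates and are not
modeled here:

#include <stdio.h>

int main(void)
{
    const long N = 10000;                   /* items to append */
    const long steps[] = { 1, 16, 100 };    /* growth chunk sizes */
    size_t i;

    for (i = 0; i < sizeof steps / sizeof steps[0]; i++)
    {
        long chunk = steps[i];
        long calls = (N + chunk - 1) / chunk;   /* ceil(N / chunk) */
        printf("grow by %4ld -> %5ld realloc() calls\n", chunk, calls);
    }
    return 0;
}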
 
Matt Gregory

Eric Sosman wrote:

<SNIP>
Of course, all this is based on guesstimations; the Standard
is silent on the cost of realloc(). The actual cost may be quite
different from the illustrative computations given here. But since
the realloc() operations themselves are "overhead" as opposed to
"payload" as far as the overall progress of the program is
concerned, it makes sense to avoid excessive realloc() calls if
N is large. (If N is small, of course, it makes no sense at
all to bother optimizing this insignificant piece of program.)

I wasn't thinking about all that searching and copying that has to be
done. The complexity of tailoring the allocation to the task really
isn't that bad to begin with, but I was thinking the atom size would
be bigger for some reason, like 64 bytes or something. That would be
an incredible waste of memory though in most circumstances.

Do you think the allocator in K&R is fairly representative of how
malloc() and friends are implemented? I'll have to re-read that
part, since I don't think I understand it very well.

Matt Gregory
 
Eric Sosman

Matt said:
Do you think the allocator in K&R is fairly representative of how
malloc() and friends are implemented? I'll have to re-read that
part, since I don't think I understand it very well.

The implementation shown in K&R (first edition; I don't
have the second) is a simple one, good for smallish memory
"arenas." I've certainly encountered implementations that
used pretty much the K&R technique, but my informal impression
is that modern implementations use fancier data structures.

The motivations for more sophisticated approaches are
not just a matter of being modern for modernity's sake.
Today's systems and applications are far larger (some would
say far more bloated) than those of the original K&R vintage,
and they tend to use a lot more memory and create a lot more
free areas. Each malloc() or free() in the K&R allocator
must conduct a linear search through the free list, and if
the free list is long the searches get slow. This effect
can be even worse than expected on modern machines with
memory caches: traversing a long linked list is an excellent
way to make the cache perform poorly.

Note that implementations of That Other Language commonly
use C's malloc() and free() as the underpinnings of `new' and
`delete', which often translates to a lot more memory management
activity than in "traditional" C programs. The free list for
a TOL program can be expected (as a rule; rules have exceptions)
to be longer than that of a C program.

Also, in a multi-threaded application the single free list
becomes a point of contention: only one thread can be working
on it at a time, and any other thread that wants to malloc() or
free() must stall while the first gets its work done.

Put all this together, and the reasons to seek out slicker
implementations become clear. Still, the K&R allocator remains
a good example of useful techniques; if they'd tried to show a
more modern (and more gnarly) implementation they'd have found
themselves writing about a specialized topic rather than about
the C language.
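
For reference, the bookkeeping in the K&R allocator is a small header
prepended to every block, roughly as shown in the second edition
(section 8.7); reproduced here from memory as a sketch, not verbatim:

typedef long Align;             /* for alignment to long boundary */

union header {                  /* block header */
    struct {
        union header *ptr;      /* next block on the free list */
        unsigned size;          /* size of this block, in header units */
    } s;
    Align x;                    /* force alignment of blocks */
};

typedef union header Header;

Both malloc() and free() walk the circular free list made of these
headers, which is the linear search Eric refers to above.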
 
