Matt said:
But I was wondering if realloc() had the same overhead as malloc().
Like, if realloc() returns a block that's bigger than what you
requested and you call it again, it could just update the node or
whatever without having to make a system call. I don't know, I guess
it's implementation-dependent, but it seems like realloc() would
almost have to work like that out of necessity.
Most implementations of malloc() and friends will round
the requested size up to a multiple of some convenient "atom"
size, for alignment if for no better reason. Eight bytes is
a typical "atom" size, but of course implementations vary.
(Some -- "buddy system" allocators, for example -- use several
different "atom" sizes. But for what follows, let's just
pretend that the Universe contains nothing but hydrogen.)
For the sake of argument, let's take eight bytes as the
typical "atom" and assume a four-byte `int'. Then if you
call realloc() for every new `int', half of the requests
can be satisfied by simply recognizing the previously unused
four bytes at the end of the existing area. Of course, the
realloc() function needs to do a certain amount of work to
figure out the actual as opposed to nominal region size, but
that's usually not too hard: let's be optimistic and guess
that it's only as hard as storing five `int' values.
But what about the other half of the realloc() calls, those
that find the area already full? The function must now work
harder ...
Perhaps it can discover that the full region is just below
an unallocated area, making it possible to take one "atom" from
the upper area and attach it to the existing one. Discovering
this fact usually involves a search of some kind, plus some
fiddling with bookkeeping information: two area sizes and a
pointer, at least, would need to be changed. Let's again be
optimistic and suppose that the cost of this lengthier operation
is only fifteen times that of storing the `int'.
Of course, realloc() might also find that the growing region
abuts an allocated area, so stealing an "atom" isn't possible.
In this case, realloc() needs to find and allocate an entirely
new memory region, copy the existing data from the old region
to the new one, and deallocate the old region. This requires
more searching, about twice as much bookkeeping as in the above
scenario, and the cost of copying all the data. But let's not
complicate matters: chances are that if you're just sucking in
data from a single source and stashing it in a growing array,
this "full Monty" realloc() will happen fairly rarely.
So, assuming that half the realloc() calls are of the first
type and the rest are of the second, the cost to swallow N `int'
values comes to
(N/2) * 5 + (N/2) * 15 + N = 11 * N
91% of the cost is in the realloc() function; the other 9% is
"payload."
But what would happen if you grew the array by sixteen
`int's worth each time, instead of just by one? The downside
is that all the realloc() calls are now of the costlier second
variety, but the upside is that there are only one-sixteenth
as many calls to begin with. The total cost becomes
(N/16) * 15 + N ~= 1.94 * N
See what just happened? This section of the program is about
5.7 times faster than it used to be. Grow the array by 100
elements at a whack and the cost drops to
(N/100) * 15 + N = 1.15 * N
... and the program runs at more than nine times the speed of
the original.
Of course, all this is based on guesstimations; the Standard
is silent on the cost of realloc(). The actual cost may be quite
different from the illustrative computations given here. But since
the realloc() operations themselves are "overhead" as opposed to
"payload" as far as the overall progress of the program is
concerned, it makes sense to avoid excessive realloc() calls if
N is large. (If N is small, of course, it makes no sense at
all to bother optimizing this insignificant piece of program.)