Storing data in a file instead of memory


pereges

I have to store an array of structures:

typedef struct
{
    double origin[3];
    double e_field_at_origin_real, e_field_at_origin_imag;
    double direction[3];
    double pathlength;
    int depth;
} ray;

A single instance occupies 80 bytes of memory on my machine. I need
to store an array of more than 1,000,000 structures. My program is
failing if the array size exceeds 400,000 because I have other big
data structures to store as well. Do you think it would be a better
idea to store the list in a file and use the file as an array?
 

vippstar

I have to store an array of structures:

typedef struct
{
    double origin[3];
    double e_field_at_origin_real, e_field_at_origin_imag;
    double direction[3];
    double pathlength;
    int depth;
} ray;

A single instance occupies 80 bytes of memory on my machine. I need
to store an array of more than 1,000,000 structures. My program is
failing if the array size exceeds 400,000 because I have other big
data structures to store as well. Do you think it would be a better
idea to store the list in a file and use the file as an array?

Depends. You might not need a 1,000,000-element array in the first
place.
The hard disk is a lot slower than the RAM, but a lot bigger. You can
store all your data there and have, say, 10,000 structures loaded in
memory.
But this is not topical for comp.lang.c; FILE streams may very well be
implemented entirely in memory.
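
A minimal sketch of that scheme in standard C, assuming the rays live
in a binary file opened in "r+b" mode; the helper names here are
illustrative, and raw struct dumps like this are only readable again
on the machine that wrote them:

#include <stdio.h>

typedef struct
{
    double origin[3];
    double e_field_at_origin_real, e_field_at_origin_imag;
    double direction[3];
    double pathlength;
    int depth;
} ray;

/* Treat a binary file of rays as an array indexed by i.
   Returns 0 on success, -1 on failure. */
int ray_read(FILE *fp, long i, ray *out)
{
    if (fseek(fp, i * (long) sizeof *out, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}

/* Write one ray back at index i. */
int ray_write(FILE *fp, long i, const ray *in)
{
    if (fseek(fp, i * (long) sizeof *in, SEEK_SET) != 0)
        return -1;
    return fwrite(in, sizeof *in, 1, fp) == 1 ? 0 : -1;
}

The same idea extends to loading a window of, say, 10,000 rays with a
single fread instead of seeking per element, which is far cheaper when
access is mostly sequential.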
 

Antoninus Twink

I have to store an array of structures:

A single instance occupies 80 bytes of memory on my machine. I need
to store an array of more than 1,000,000 structures. My program is
failing if the array size exceeds 400,000 because I have other big
data structures to store as well. Do you think it would be a better
idea to store the list in a file and use the file as an array?

Can't you buy more memory? I mean, most people probably have 80MB going
spare in their wristwatches nowadays...

Failing that, storing to disk should be an absolute last resort. As long
as there aren't too many writes, it will almost certainly be orders of
magnitude quicker to keep the data in RAM using some compression scheme,
and decompress it on the fly as needed.
 

Nick Keighley

On  3 Jun 2008 at  5:57, pereges wrote:


Can't you buy more memory? I mean, most people probably have 80MB going
spare in their wristwatches nowadays...

Failing that, storing to disk should be an absolute last resort. As long
as there aren't too many writes, it will almost certainly be orders of
magnitude quicker to keep the data in RAM using some compression scheme,
and decompress it on the fly as needed.

what a sensible post!

to the OP: the values in your struct mostly seem to be doubles.

typedef struct
{
    double origin[3];
    double e_field_at_origin_real, e_field_at_origin_imag;
    double direction[3];
    double pathlength;
    int depth;
} ray;

Do you really use the full range of double for all the values? And is
depth really a 32-bit value? (int is usually 32 bits on modern
hardware.) I'm not saying change your struct, but consider Twink's
idea of compression and where it might be applied.
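
Purely to illustrate where that compression might be applied, and
assuming (something only the OP can check) that float precision is
enough for the geometry and that depth stays well below 32768, a more
compact layout could look like this:

/* Hypothetical compact variant of the OP's struct: float halves
   every floating-point field, and short is enough for small depths. */
typedef struct
{
    float origin[3];
    float e_field_at_origin_real, e_field_at_origin_imag;
    float direction[3];
    float pathlength;
    short depth;
} compact_ray;   /* typically 40 bytes instead of 80 */

That alone would bring 1,000,000 rays down from about 80MB to 40MB.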
 

Richard Tobin

A single instance occupies 80 bytes of memory on my machine. I need
to store an array of more than 1,000,000 structures. My program is
failing if the array size exceeds 400,000 because I have other big
data structures to store as well. Do you think it would be a better
idea to store the list in a file and use the file as an array?
Can't you buy more memory? I mean, most people probably have 80MB going
spare in their wristwatches nowadays...

He probably already has more memory. The first thing to check is
that he hasn't got some limit set on his memory use that he can
change. That'll be operating-system-specific: on Unix with bash
you would use the "ulimit" command. Also note that there may be
separate limits for stack and heap memory.
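
For what it's worth, on a POSIX system the same limits can be queried
from inside the program with getrlimit (not standard C, so strictly as
off-topic as ulimit); a minimal sketch:

#include <stdio.h>
#include <sys/resource.h>   /* POSIX, not standard C */

static void show_limit(const char *name, int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %llu bytes\n", name,
               (unsigned long long) rl.rlim_cur);
}

int main(void)
{
    show_limit("data segment (RLIMIT_DATA)", RLIMIT_DATA);
    show_limit("stack (RLIMIT_STACK)", RLIMIT_STACK);
    return 0;
}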

-- Richard
 

Jens Thoms Toerring

pereges said:
I have to store an array of structures:
typedef struct
{
    double origin[3];
    double e_field_at_origin_real, e_field_at_origin_imag;
    double direction[3];
    double pathlength;
    int depth;
} ray;
A single instance occupies 80 bytes of memory on my machine. I need
to store an array of more than 1,000,000 structures. My program is
failing if the array size exceeds 400,000 because I have other big
data structures to store as well. Do you think it would be a better
idea to store the list in a file and use the file as an array?

It's not clear if you create an array like this

ray all_my_rays[ 1000000 ];

or if you use dynamically allocated memory, e.g. using

ray *all_my_rays;

all_my_rays = malloc( 1000000 * sizeof *all_my_rays );
if ( all_my_rays == NULL ) {
    fprintf( stderr, "Not enough memory for rays\n" );
    exit( EXIT_FAILURE );
}

If you use the first method then I would consider switching to
dynamically allocated memory. While the amount of memory available
for arrays created with the first method can be rather limited, the
second method should only get you into trouble if you need about as
much memory as is available on your machine (and on many modern
systems with swap space it may even do some storing to disk
automatically for you).
Regards, Jens
 

santosh

FILE streams may very well be implemented entirely in memory.

How would you be able to access disk files (or named files) with fopen
if this were so? I think C streams need some kind of persistent
storage. Maybe I'm not thinking this through, though.
 

Bartc

pereges said:
I have to store an array of structures:

typedef struct
{
    double origin[3];
    double e_field_at_origin_real, e_field_at_origin_imag;
    double direction[3];
    double pathlength;
    int depth;
} ray;

A single instance occupies 80 bytes of memory on my machine. I need
to store an array of more than 1,000,000 structures. My program is
failing if the array size exceeds 400,000 because I have other big
data structures to store as well. Do you think it would be a better
idea to store the list in a file and use the file as an array?

I first read this as 80GB. 80MB doesn't sound like that much, and
you're failing at 32MB anyway; what do all the other tables add up to?

Have you thought at all of using float instead of double? That would
pretty much halve the requirement to 40MB (though it seems that would
still fail).

Where do the fields origin and direction come from? If they are simply
copies of points in other data structures, you might be able to store
pointers or indices to those points instead (so 24 bytes becomes 4 bytes).

Or possibly the same origin or direction is shared amongst many
different rays; then it may be possible to store only one copy of each
xyz value and again use indices or pointers to it. This will require
some analysis.
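
A sketch of that layout; the pool name and index fields are purely
illustrative, and it assumes the set of distinct points is built
somewhere else (e.g. while the scene is loaded):

typedef struct
{
    double x, y, z;
} point3;

/* Shared pool of distinct xyz values; many rays may reference the
   same entry. */
extern point3 point_pool[];

typedef struct
{
    int origin_idx;     /* index into point_pool: 4 bytes vs 24 */
    int direction_idx;  /* likewise */
    double e_field_at_origin_real, e_field_at_origin_imag;
    double pathlength;
    int depth;
} indexed_ray;

Each ray then takes roughly 40 bytes instead of 80, provided there are
far fewer distinct points than rays.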

Some years ago, when I had to store large numbers of xyz points in
memory (together with transformation data and other stuff), I created
a packed (not compressed) record format. This worked well when a large
proportion of the values were 0.0 or 1.0, or had many trailing zero
bits. But access is then much more complex and slower.

It might also be possible (depending on exactly what your application
does) to partition the task, say into regions. It could be that you
can process one region at a time, and that the number of rays for each
region is much smaller than for the entire task.

As for storing the file on disk: your virtual memory system should
easily be able to cope with an extra 80MB or so. If not, then you
should see what else is using the memory. Or maybe just install more
RAM, as someone suggested...
 
