Best way to allocate a large amount of data

Peter Hickman

I have a program that requires x strings all of y length. x will be in the range
of 100-10000 whereas the strings will all be < 200 each.

This does not need to be grown once it has been created.

Should I allocate x strings of y length or should I allocate a single string x *
y long? Which would be more efficient and / or portable?

Thank you.
 
Bjørn Augestad

Peter said:
I have a program that requires x strings all of y length. x will be in
the range of 100-10000 whereas the strings will all be < 200 each.

This does not need to be grown once it has been created.

Should I allocate x strings of y length or should I allocate a single
string x * y long? Which would be more efficient and / or portable?

Allocate x strings of y length. Something like this (untested):

#include <stdlib.h>

/* Allocate x strings of y bytes each; returns NULL on failure. */
char **alloc_strings(size_t x, size_t y)
{
    char **mystrings = malloc(sizeof *mystrings * x);
    if (mystrings != NULL) {
        size_t i;
        for (i = 0; i < x; i++) {
            if ((mystrings[i] = malloc(y)) == NULL) {
                /* Free the strings allocated so far, then the array. */
                while (i--)
                    free(mystrings[i]);
                free(mystrings);
                return NULL;
            }
        }
    }
    return mystrings;
}

Efficiency depends on what you plan to do with all the strings. ;-)

Bjørn
 
Method Man

Peter Hickman said:
I have a program that requires x strings all of y length. x will be in the range
of 100-10000 whereas the strings will all be < 200 each.

This does not need to be grown once it has been created.

Should I allocate x strings of y length or should I allocate a single string x *
y long? Which would be more efficient and / or portable?

You're best off storing them in a string array.

Allocating a single string with x*y length is a bad idea. One problem is
that you'll need the null terminator or a sentinel character to separate the
strings. This would force you to rewrite many of the string functions that
you need, and add housekeeping data (which is not good if you're concerned
about memory space).

Of course, you could have one string that is 200 characters and x-1 strings
that are 1 character long. But, I'm assuming the deviation of lengths isn't
that big, and you don't know any of the data in advance.
 
Lawrence Kirby

Peter Hickman said:
I have a program that requires x strings all of y length. x will be in
the range of 100-10000 whereas the strings will all be < 200 each.

This does not need to be grown once it has been created.

Should I allocate x strings of y length or should I allocate a single
string x * y long? Which would be more efficient and / or portable?

Thank you.

The simplest approach would be to allocate memory for each string
individually. 10000 strings isn't a HUGE number (depending on your
environment) and overheads of separate allocation may not be significant.

Allocating one large memory block for all of the strings is likely to be
more efficient in terms of speed and space. You have to write the code to
suballocate from that block but that isn't very tricky. You can't
realloc() for individual strings and you can only free() everything in one
go, which is very simple if that is what you need.
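
A rough sketch of the suballocation described above, not something posted in the thread: one block comes from a single malloc() and fixed-size slots are handed out from it in order. The names string_pool, pool_init, pool_take and pool_free are made up for illustration, and the sketch assumes every string needs the same number of bytes.

```c
#include <stdlib.h>

/* Hypothetical pool: one malloc() block carved into fixed-size slots. */
struct string_pool {
    char  *block;   /* start of the single large allocation */
    size_t slot;    /* bytes per string, including the terminating '\0' */
    size_t count;   /* total number of slots in the block */
    size_t used;    /* slots handed out so far */
};

/* Returns 0 on success, -1 if the size overflows or malloc() fails. */
int pool_init(struct string_pool *p, size_t count, size_t slot)
{
    if (slot == 0 || count > (size_t)-1 / slot)
        return -1;
    p->block = malloc(count * slot);
    if (p->block == NULL)
        return -1;
    p->slot = slot;
    p->count = count;
    p->used = 0;
    return 0;
}

/* Hands out the next unused slot, or NULL when the pool is exhausted. */
char *pool_take(struct string_pool *p)
{
    if (p->used == p->count)
        return NULL;
    return p->block + p->used++ * p->slot;
}

/* Releases every string at once; individual slots cannot be freed. */
void pool_free(struct string_pool *p)
{
    free(p->block);
    p->block = NULL;
    p->count = p->used = 0;
}
```

The trade-off is the one stated above: there is no per-string free() or realloc(), only pool_free() for the whole block.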

On the portability side there are implementations that can allocate lots
of little objects but not one big object of the same total size. For
example some 16 bit implementations limit the size of any one object to
below 64K but permit the total for all allocations to exceed that. However
a couple of megabytes isn't a particularly large allocation these days and
it is reasonable not to worry about that unless you have a particular
reason to do so.

Lawrence
 
Peter Hickman

Sorry I'm being a bit sloppy with my wording here. The length of a string is
likely to be less than 200 characters but all strings will be the same length,
whatever that length is.

What worries me is that allocating a large number of small strings may eat up
resources out of proportion to the data they hold. So it would seem that a large
block of memory would be a good idea but I don't know if allocating a single
large chunk of data has problems of its own.

Until the 64K limit of some older systems was mentioned I had clean forgotten
about it.
 
Lawrence Kirby

Peter Hickman said:
Sorry I'm being a bit sloppy with my wording here. The length of a string is
likely to be less than 200 characters but all strings will be the same length,
whatever that length is.

What worries me is that allocating a large number of small strings may eat up
resources out of proportion to the data they hold. So it would seem that a large
block of memory would be a good idea but I don't know if allocating a single
large chunk of data has problems of its own.

Until the 64K limit of some older systems was mentioned I had clean forgotten
about it.

If you create for yourself, say, an array of pointers to char which you
set up and use to access the strings, it becomes almost immaterial which
method you use to allocate (and free) them, because the access method will
be consistent. If you implement one allocation method and don't like it
you can alter the allocation code later on without affecting the code that
accesses the string data.

Lawrence
 
Dan Pop

Peter Hickman said:
Sorry I'm being a bit sloppy with my wording here. The length of a string is
likely to be less than 200 characters but all strings will be the same length,
whatever that length is.

What worries me is that allocating a large number of small strings may eat up
resources out of proportion to the data they hold. So it would seem that a large
block of memory would be a good idea but I don't know if allocating a single
large chunk of data has problems of its own.

Much less than any other approach. If the number of strings is known at
compile time, just use:

static char mystrings[X][Y];

If it's not, use a pointer to an array of Y characters:

char (*mystrings)[Y] = malloc(X * sizeof *mystrings);

In either case, you access mystrings using the same syntax: mystrings[i]
refers to a whole string, while mystrings[i][j] refers to a character from
that string.
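
To make the pointer-to-array form above concrete, here is a small sketch. The typedef row_t, the helpers alloc_rows and set_row, and the value 200 for Y are assumptions chosen for illustration; only the malloc expression comes from the post itself.

```c
#include <stdlib.h>
#include <string.h>

#define Y 200               /* assumed row size, including the '\0' */
typedef char row_t[Y];      /* hypothetical name for "array of Y chars" */

/* Allocates x rows of Y chars in one block; returns NULL on failure. */
row_t *alloc_rows(size_t x)
{
    row_t *rows;

    if (x > (size_t)-1 / sizeof *rows)
        return NULL;        /* x * Y would overflow size_t */
    rows = malloc(x * sizeof *rows);
    return rows;
}

/* Copies s into row i, truncating if needed; always terminates. */
void set_row(row_t *rows, size_t i, const char *s)
{
    strncpy(rows[i], s, Y - 1);
    rows[i][Y - 1] = '\0';
}
```

Here rows[i] is a whole string, rows[i][j] is one character, and a single free(rows) releases everything.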

If Y is not a compile-time constant, either, you also need to allocate
a dope vector, which is one more malloc call, but the syntax is still
the same, except that sizeof *mystrings will no longer be equal to Y
(except by pure accident):

char **mystrings = malloc(X * sizeof *mystrings);  /* X row pointers */
mystrings[0] = malloc(X * Y);                      /* one block for all strings */
for (i = 1; i < X; i++) mystrings[i] = mystrings[i - 1] + Y;

If you ever deallocate the strings, call free(mystrings[0]) first and then
free(mystrings).
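
For completeness, the dope-vector approach with the error checking filled in might look roughly like this. It is a sketch only; the function names alloc_table and free_table are invented here, and treating a zero count or length as a failure is an assumption, not something the thread specifies.

```c
#include <stdlib.h>

/* Builds x row pointers into one x*y block; returns NULL on failure. */
char **alloc_table(size_t x, size_t y)
{
    char **rows;
    size_t i;

    /* Reject empty tables and sizes that would overflow size_t. */
    if (x == 0 || y == 0 || x > (size_t)-1 / y ||
        x > (size_t)-1 / sizeof *rows)
        return NULL;

    rows = malloc(x * sizeof *rows);    /* the dope vector */
    if (rows == NULL)
        return NULL;

    rows[0] = malloc(x * y);            /* storage for all the strings */
    if (rows[0] == NULL) {
        free(rows);
        return NULL;
    }

    for (i = 1; i < x; i++)
        rows[i] = rows[i - 1] + y;
    return rows;
}

/* Frees the data block first, then the pointer vector. */
void free_table(char **rows)
{
    if (rows != NULL) {
        free(rows[0]);
        free(rows);
    }
}
```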

Note, however, that, although the syntax to access mystrings is the same,
this method is slower than the first two, because each access needs one
more pointer dereferencing. If this is going to be an issue, you may
want to simply allocate X * Y bytes and do the index arithmetic on your
own (usually using function-like macros).
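
One way the function-like macros could look, as a sketch under assumptions: the macro names STR and CH and the helper alloc_flat are hypothetical, and the string length y is passed explicitly because it is not a compile-time constant.

```c
#include <stdlib.h>

/* Hypothetical accessors over a single flat block of x * y bytes. */
#define STR(base, y, i)     ((base) + (size_t)(i) * (y))        /* string i */
#define CH(base, y, i, j)   ((base)[(size_t)(i) * (y) + (j)])   /* char j of string i */

/* One allocation, no pointer vector; returns NULL on failure. */
char *alloc_flat(size_t x, size_t y)
{
    if (y != 0 && x > (size_t)-1 / y)
        return NULL;        /* x * y would overflow size_t */
    return malloc(x * y);
}
```

Each access costs one multiplication and one addition instead of an extra pointer load, and a single free() on the block releases everything.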

Error checking deliberately omitted, BTW.

Peter Hickman said:
Until the 64K limit of some older systems was mentioned I had clean forgotten
about it.

It was a bogus argument: even those older systems provided ways to
allocate objects as large as the available memory, some of them available
to standard C code (e.g. the huge memory model of the MSDOS C
implementations).

Dan
 
