data storage question...



Randy Howard

Suppose you want to have a large number of items (as an array of struct)
wherein one field is "record-specific" and of variable length, yet not
violate standard C (C90 probably, since C99 isn't available on the platforms
in question) to get it there so that other modules can get the
data at run-time.

I could have the structure contain:

struct element_record foo_tag
{
    ... /* some number of conventional entries */
    size_t blob_size;
    unsigned char *blob;
};

Then, manually declare individually named arrays with the raw "blob"
data of varying sizes, and manually put them and their size into an
array of structures like the one above.

Or,

#define WORST_CASE 2048

struct element_record foo_tag
{
    ... /* some number of conventional entries */
    size_t blob_size;
    unsigned char blob[WORST_CASE];
};

and initialize them in place, or read it in from a data file, but this
winds up bloating the size of the binary (or memory usage) considerably.

For this application, there is a very wide range from smallest
to largest of this variable portion of the data (from 2 bytes
up to a little under 2K), and several hundred instances.

This seems like it should be really obvious, but I haven't tripped over
this before and I'm hoping someone can point out what I'm missing without
using some compiler-specific extension.
 

Eric Sosman

Randy said:
> Suppose you want to have a large number of items (as an array of struct)
> wherein one field is "record-specific" and of variable length, yet not
> violate standard C (C90 probably, since C99 isn't available on the platforms
> in question) to get it there so that other modules can get the
> data at run-time.

C99's "flexible array member" would not work anyhow,
because you can't make an array of FAM-containing structs.
(Think about it: How would array indexing work when the
elements all have different sizes?)

> I could have the structure contain:
>
> struct element_record foo_tag

ITYM `element_record' *or* `foo_tag', but not both.

> {
>     ... /* some number of conventional entries */
>     size_t blob_size;
>     unsigned char *blob;
> };
>
> Then, manually declare individually named arrays with the raw "blob"
> data of varying sizes, and manually put them and their size into an
> array of structures like the one above.
>
> Or,
>
> #define WORST_CASE 2048
>
> struct element_record foo_tag
> {
>     ... /* some number of conventional entries */
>     size_t blob_size;
>     unsigned char blob[WORST_CASE];
> };
>
> and initialize them in place, or read it in from a data file, but this
> winds up bloating the size of the binary (or memory usage) considerably.
>
> For this application, there is a very wide range from smallest
> to largest of this variable portion of the data (from 2 bytes
> up to a little under 2K), and several hundred instances.
>
> This seems like it should be really obvious, but I haven't tripped over
> this before and I'm hoping someone can point out what I'm missing without
> using some compiler-specific extension.

Given the wide range of blob sizes, the first method is
probably preferable. You'd wind up with something like

static unsigned char blob1[] = { ... };
static unsigned char blob2[] = { ... };
...

struct element_record {
    ... conventional entries ...
    size_t blob_size;
    unsigned char *blob_data;
} blob_array[] = {
    { ..., sizeof blob1, blob1 },
    { ..., sizeof blob2, blob2 },
    ...
};

You could save a small amount of typing by using a
simple macro like

#define BLOB(name) sizeof name , name

in the initializer.

If you're willing to abandon the array in favor of some
kind of linked data structure, you could use flexible array
members (C99) or "the struct hack" (a widely-accepted abuse
of C90).

Finally, if you've really got a lot of this stuff you
may want to consider the merits of managing it in some other
way than by compiling it into the program. Read it from a
file or files distributed along with the program, perhaps.
There *may* be some (off-topic) advantages to putting the data
where it might be (off-topic) shared by (off-topic) other
programs, but there's no 100% reliable way to tell the compiler
to put your data in (off-topic) shareable memory. On the other
hand, there may be advantages in being able to distribute just
one new data file when something changes, instead of recompiling
and redistributing the entire monolithic program package.