Initializing an array comprised of very long strings

David Mathog

I'm looking at a program which stores perl scripts in an array. Each
script is stored as a single entry in that array, and the whole set of
them live in a single header file (escaped out the wazoo to get the
perl code intact through the C preprocessor.) The issue is that
many of these strings are quite long, which causes gcc to throw
these sorts of warnings:

scripts.h:1: warning: string length '4918' is greater than the length
'4095' ISO C99 compilers are required to support

Luckily gcc supports (much) longer strings so the warnings are just
warnings. However this makes me wonder if there isn't some clever
preprocessor trick that is standards compliant to get past this limit?
For instance, could several shorter strings be combined somehow into a
single longer string within the header file, or must the longer
string be constructed at run time to safely avoid this warning?
That is, is this string length limit for any const char * no matter
how it is put together, or is it just a limitation that applies to
statements like:

astring = "......(many characters)...";

where the limitation is on the right side of the expression?

Thanks,

David Mathog
 
Guest

David said:
I'm looking at a program which stores perl scripts in an array. Each
script is stored as a single entry in that array, and the whole set of
them live in a single header file (escaped out the wazoo to get the
perl code intact through the C preprocessor.) The issue is that
many of these strings are quite long, which causes gcc to throw
these sorts of warnings:

scripts.h:1: warning: string length '4918' is greater than the length
'4095' ISO C99 compilers are required to support

Luckily gcc supports (much) longer strings so the warnings are just
warnings. However this makes me wonder if there isn't some clever
preprocessor trick that is standards compliant to get past this limit?
For instance, could several shorter strings be combined somehow into a
single longer string within the header file, or must the longer
string be constructed at run time to safely avoid this warning?

One possibility:

char string[] = { 'o', 'n', 'e', ' ', 'b', 'y', ' ', 'o', 'n',
'e', ..., '\0' };

Another possibility:

char string_array[][100] = { "first 100 characters",
"second 100 characters", "..." };
char *string = (char *) string_array;

Both suggestions are hard to maintain. Constructing strings at run
time is probably a better idea.

Or you could ignore or disable the warning.
 
Keith Thompson

David Mathog said:
I'm looking at a program which stores perl scripts in an array. Each
script is stored as a single entry in that array, and the whole set of
them live in a single header file (escaped out the wazoo to get the
perl code intact through the C preprocessor.) The issue is that
many of these strings are quite long, which causes gcc to throw
these sorts of warnings:

scripts.h:1: warning: string length '4918' is greater than the length
'4095' ISO C99 compilers are required to support

Luckily gcc supports (much) longer strings so the warnings are just
warnings. However this makes me wonder if there isn't some clever
preprocessor trick that is standards compliant to get past this limit?
[...]

The limit is on the length of a string *literal*, not of a string.
Specifically (C99 5.2.4.1, Translation limits):

-- 4095 characters in a character string literal or wide string
literal (after concatenation)

-- 65535 bytes in an object (in a hosted environment only)

As long as you don't hit the 65535-byte limit, you can build up the
string at runtime from a set of compile-time string literals. This is
likely to be wasteful of space, since you'll have two copies of all
the data. Some cleverness will also be required to avoid scanning the
data multiple times; for example, a simple series of strcat() calls:

char the_whole_thing[BIG_ENOUGH];
the_whole_thing[0] = '\0';
strcat(the_whole_thing, s[0]);
strcat(the_whole_thing, s[1]);
strcat(the_whole_thing, s[2]);
/* ... */

will re-scan the_whole_thing each time to determine where to start
appending.
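
One way to avoid the re-scan is to remember where the buffer currently ends and copy each piece there directly. The following is only a sketch: the names append_at and join_pieces are made up for illustration, and it assumes the pieces are ordinary '\0'-terminated strings and that the caller supplies a buffer of known capacity.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append src at `end`, the position of the buffer's current '\0'.
 * `limit` points one past the last byte of the buffer.
 * Returns the new end, or NULL if src (plus its '\0') does not fit.
 * append_at is a hypothetical helper name.
 */
char *append_at(char *end, const char *limit, const char *src)
{
    size_t len = strlen(src);

    if (len >= (size_t)(limit - end)) {
        return NULL;            /* no room for src plus the '\0' */
    }
    memcpy(end, src, len + 1);  /* copies the terminating '\0' too */
    return end + len;
}

/*
 * Join n pieces into dst, which has room for cap bytes. Each piece is
 * scanned once by strlen; dst itself is never re-scanned.
 * Returns 0 on success, -1 if dst is too small.
 */
int join_pieces(char *dst, size_t cap, const char *const *s, size_t n)
{
    char *end = dst;
    size_t i;

    if (cap == 0) {
        return -1;
    }
    *end = '\0';
    for (i = 0; i < n; i++) {
        end = append_at(end, dst + cap, s[i]);
        if (end == NULL) {
            return -1;
        }
    }
    return 0;
}
```

With the names used above, the call would be something like join_pieces(the_whole_thing, BIG_ENOUGH, s, number_of_pieces). Returning the new end pointer from each append is what keeps the total work proportional to the length of the data.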

You might be better off just ignoring the warning, assuming you're not
concerned about the possibility of a compiler that actually imposes a
fixed limit on the size of a string literal.
 
Keith Thompson

Harald van Dijk said:
David said:
I'm looking at a program which stores perl scripts in an array. Each
script is stored as a single entry in that array, and the whole set of
them live in a single header file (escaped out the wazoo to get the
perl code intact through the C preprocessor.) The issue is that
many of these strings are quite long, which causes gcc to throw
these sorts of warnings:

scripts.h:1: warning: string length '4918' is greater than the length
'4095' ISO C99 compilers are required to support
[snip]
One possibility:

char string[] = { 'o', 'n', 'e', ' ', 'b', 'y', ' ', 'o', 'n',
'e', ..., '\0' };

Another possibility:

char string_array[][100] = { "first 100 characters",
"second 100 characters", "..." };
char *string = (char *) string_array;

Interesting. That takes advantage of the fact that a string literal
in an initializer doesn't have a trailing '\0' if it's *exactly* the
declared size. A simpler example is:

char s[3] = "foo";

Of course, if you accidentally make any of the literals too short, the
compiler will silently insert a '\0' for you. I wouldn't try that
kind of thing unless I had written a program to generate the C source
code for me.
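
A small sketch of that rule, for anyone who wants to see it in action. The array names and the has_terminator helper are invented for this example. Note that this is valid C only; C++ rejects an initializer that leaves no room for the '\0', and some compilers may warn about it when asked to.

```c
#include <stddef.h>
#include <string.h>

/* Exactly as many elements as characters: no room for the '\0'. */
char exact[3] = "foo";

/* One element to spare: the terminator is stored. */
char roomy[4] = "foo";

/* Literal shorter than the array: the remaining elements are zero. */
char padded[8] = "foo";

/* Returns 1 if the n-byte array a contains a '\0', 0 otherwise. */
int has_terminator(const char *a, size_t n)
{
    return memchr(a, '\0', n) != NULL;
}
```

Here has_terminator(exact, sizeof exact) is 0, while the same call on roomy or padded gives 1. The padded case is the "silently insert a '\0'" hazard: a chunk that is even one character short ends the combined string early.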
Both suggestions are hard to maintain. Constructing strings at run
time is probably a better idea.

Or you could ignore or disable the warning.

Or (I forgot to mention this in my earlier followup) you could read
the data from a file. You (the OP) may have a good reason not to want
to do that, or you probably wouldn't be asking how to do it directly
in your program.
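
For completeness, reading a script from a file might look roughly like the sketch below. read_whole_file is a hypothetical helper, not something from the OP's program, and it assumes the script file contains no embedded '\0' bytes if the result is to be used as a string.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read the whole file at `path` into a malloc'd, '\0'-terminated buffer.
 * On success returns the buffer (the caller frees it) and stores the
 * byte count in *len_out if len_out is not NULL. Returns NULL on failure.
 * read_whole_file is a made-up name for this sketch.
 */
char *read_whole_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    size_t len = 0, cap = 0;

    if (fp == NULL) {
        return NULL;
    }
    for (;;) {
        size_t got;

        if (cap - len < 2) {    /* keep room for data plus the '\0' */
            size_t newcap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, newcap);

            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        got = fread(buf + len, 1, cap - len - 1, fp);
        len += got;
        if (got == 0) {
            break;              /* end of file or read error */
        }
    }
    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[len] = '\0';
    if (len_out != NULL) {
        *len_out = len;
    }
    return buf;
}
```

The buffer grows by doubling, so the file size does not need to be known in advance, and no string literal is involved at all.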
 
David Mathog

Keith said:
Harald van Dijk said:
David said:
I'm looking at a program which stores perl scripts in an array. Each
script is stored as a single entry in that array, and the whole set of
them live in a single header file (escaped out the wazoo to get the
perl code intact through the C preprocessor.) The issue is that
many of these strings are quite long, which causes gcc to throw
these sorts of warnings:
Another possibility:

char string_array[][100] = { "first 100 characters",
"second 100 characters", "..." };
char *string = (char *) string_array;

Interesting. That takes advantage of the fact that a string literal
in an initializer doesn't have a trailing '\0' if it's *exactly* the
declared size. A simpler example is:

char s[3] = "foo";

Of course, if you accidentally make any of the literals too short, the
compiler will silently insert a '\0' for you. I wouldn't try that
kind of thing unless I had written a program to generate the C source
code for me.

The include file is generated by a script or program, unfortunately
one I don't yet have access to. In any case, the actual format is
currently like this (there are more than 2 scripts, but this illustrates
the point):

char *PerlScriptFile[]={"script1...","script2..."};

where the scripts are all sorts of different lengths, and of course the
whole thing is awash in backslash escape characters, lines are all 52
characters long (ending in \ EOL, so effectively 50 characters per
line), and it goes on for several thousand lines. Anyway, if I'm
following this correctly, then doing something like this:

char script1[4500]="script1...";
char script2[7654]="script2...";
char *PerlScriptFile[]={script1,script2};

would eliminate the warnings, so long as the number of characters
used exactly matches the number of characters within the double quotes.

(I think I would have had the program copy from a file or files as well,
instead of doing it this way, but I believe the program's author did
this so that his program could generate these scripts without having to
look around for the source scripts.)

Thanks,

David Mathog
 
Guest

David said:
Keith said:
Harald van Dijk said:
David Mathog wrote:
I'm looking at a program which stores perl scripts in an array. Each
script is stored as a single entry in that array, and the whole set of
them live in a single header file (escaped out the wazoo to get the
perl code intact through the C preprocessor.) The issue is that
many of these strings are quite long, which causes gcc to throw
these sorts of warnings:
Another possibility:

char string_array[][100] = { "first 100 characters",
"second 100 characters", "..." };
char *string = (char *) string_array;

Interesting. That takes advantage of the fact that a string literal
in an initializer doesn't have a trailing '\0' if it's *exactly* the
declared size. A simpler example is:

char s[3] = "foo";

Of course, if you accidentally make any of the literals too short, the
compiler will silently insert a '\0' for you. I wouldn't try that
kind of thing unless I had written a program to generate the C source
code for me.

The include file is generated by a script or program, unfortunately
one I don't yet have access to. In any case, the actual format is
currently like this (there are more than 2 scripts, but this illustrates
the point):

char *PerlScriptFile[]={"script1...","script2..."};

where the scripts are all sorts of different lengths, and of course the
whole thing is awash in backslash escape characters, lines are all 52
characters long (ending in \ EOL, so effectively 50 characters per
line), and it goes on for several thousand lines. Anyway, if I'm
following this correctly, then doing something like this:

char script1[4500]="script1...";
char script2[7654]="script2...";
char *PerlScriptFile[]={script1,script2};

would eliminate the warnings, so long as the number of characters
used exactly matches the number of characters within the double quotes.

Such a solution is error-prone and not easy to maintain, IMHO. How
about folding the lines in each array, something like this?

$ cat a.c
#include <stdio.h>

const char *script1[] = {
"line 1",
"line 2",
"line 3",
"line 4",
};

const char *script2[] = {
"line 1",
"line 2",
"line 3",
"line 4",
};

struct {
size_t nlines;
const char **code;
} scripts[] = {
{ sizeof script1 / sizeof *script1, script1 },
{ sizeof script2 / sizeof *script2, script2 },
};

int main(void)
{
    size_t i, j, nscripts = sizeof scripts / sizeof *scripts;

    /* Print every line of every script, one line per array element. */
    for(i = 0; i < nscripts; i++) {
        for(j = 0; j < scripts[i].nlines; j++) {
            printf("%s\n", scripts[i].code[j]);
        }
    }

    return 0;
}


$ gcc -ansi -pedantic -W -Wall -o a a.c

$ ./a
line 1
line 2
line 3
line 4
line 1
line 2
line 3
line 4

The line lengths can now vary and you can have as many lines per
script(array) as you like. You may need to write a tiny script that
reformats the original code, but that's doable. ;-)
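
The core of such a reformatting tool could be a function along these lines. It is only a sketch: emit_line_literal is a made-up name, the printable-range test assumes an ASCII execution character set, and the input line is assumed to end at its first newline.

```c
#include <stdio.h>

/*
 * Write `line` to `out` as one C string literal followed by a comma,
 * so that   say "hi";   becomes   "say \"hi\";",
 * Output stops at the first newline in `line`; the printing loop adds
 * one newline per array element anyway.
 * emit_line_literal is a hypothetical helper name.
 */
void emit_line_literal(FILE *out, const char *line)
{
    const unsigned char *p = (const unsigned char *)line;

    fputc('"', out);
    for (; *p != '\0' && *p != '\n'; p++) {
        switch (*p) {
        case '\\': fputs("\\\\", out); break;
        case '"':  fputs("\\\"", out); break;
        case '?':  fputs("\\?", out);  break;   /* avoids trigraphs */
        case '\t': fputs("\\t", out);  break;
        default:
            if (*p < 0x20 || *p > 0x7e) {
                /* Three-digit octal escape; assumes ASCII. */
                fprintf(out, "\\%03o", (unsigned)*p);
            } else {
                fputc(*p, out);
            }
        }
    }
    fputs("\",\n", out);
}
```

Calling this once per input line, between a "const char *script1[] = {" header and a closing "};", would produce arrays in the format shown above. Since each literal holds one source line, none of them should come near the 4095-character limit.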

Bjørn
[snip]
 
