Initializing an array comprised of very long strings

Discussion in 'C Programming' started by David Mathog, May 4, 2007.

  1. David Mathog

    I'm looking at a program which stores perl scripts in an array. Each
    script is stored as a single entry in that array, and the whole set of
    them live in a single header file (escaped out the wazoo to get the
    perl code intact through the C preprocessor.) The issue is that
    many of these strings are quite long, which causes gcc to throw
    these sorts of warnings:

    scripts.h:1: warning: string length '4918' is greater than the length
    '4095' ISO C99 compilers are required to support

    Luckily gcc supports (much) longer strings so the warnings are just
    warnings. However this makes me wonder if there isn't some clever
    preprocessor trick that is standards compliant to get past this limit?
    For instance, could several shorter strings be combined somehow into a
    single longer string within the header file, or must the longer
    string be constructed at run time to safely avoid this warning?
    That is, is this string length limit for any const char * no matter
    how it is put together, or is it just a limitation that applies to
    statements like:

    astring = "......(many characters)...";

    where the limitation is on the right side of the expression?

    Thanks,

    David Mathog
     
    David Mathog, May 4, 2007
    #1

  2. David Mathog wrote:
    > I'm looking at a program which stores perl scripts in an array. Each
    > script is stored as a single entry in that array, and the whole set of
    > them live in a single header file (escaped out the wazoo to get the
    > perl code intact through the C preprocessor.) The issue is that
    > many of these strings are quite long, which causes gcc to throw
    > these sorts of warnings:
    >
    > scripts.h:1: warning: string length '4918' is greater than the length
    > '4095' ISO C99 compilers are required to support
    >
    > Luckily gcc supports (much) longer strings so the warnings are just
    > warnings. However this makes me wonder if there isn't some clever
    > preprocessor trick that is standards compliant to get past this limit?
    > For instance, could several shorter strings be combined somehow into a
    > single longer string within the header file, or must the longer
    > string be constructed at run time to safely avoid this warning?


    One possibility:

    char string[] = { 'o', 'n', 'e', ' ', 'b', 'y', ' ', 'o', 'n',
    'e', ..., '\0' };

    Another possibility:

    char string_array[][100] = { "first 100 characters",
                                 "second 100 characters", "..." };
    char *string = (char *) string_array;

    Both suggestions are hard to maintain. Constructing strings at run
    time is probably a better idea.

    Or you could ignore or disable the warning.
     
    Harald van Dijk, May 4, 2007
    #2

  3. David Mathog writes:
    > I'm looking at a program which stores perl scripts in an array. Each
    > script is stored as a single entry in that array, and the whole set of
    > them live in a single header file (escaped out the wazoo to get the
    > perl code intact through the C preprocessor.) The issue is that
    > many of these strings are quite long, which causes gcc to throw
    > these sorts of warnings:
    >
    > scripts.h:1: warning: string length '4918' is greater than the length
    > '4095' ISO C99 compilers are required to support
    >
    > Luckily gcc supports (much) longer strings so the warnings are just
    > warnings. However this makes me wonder if there isn't some clever
    > preprocessor trick that is standards compliant to get past this limit?

    [...]

    The limit is on the length of a string *literal*, not of a string.
    Specifically (C99 5.2.4.1, Translation limits):

    -- 4095 characters in a character string literal or wide string
    literal (after concatenation)

    -- 65535 bytes in an object (in a hosted environment only)

    As long as you don't hit the 65535-byte limit, you can build up the
    string at runtime from a set of compile-time string literals. This is
    likely to be wasteful of space, since you'll have two copies of all
    the data. Some cleverness will also be required to avoid scanning the
    data multiple times; for example, a simple series of strcat() calls:

    char the_whole_thing[BIG_ENOUGH];
    the_whole_thing[0] = '\0';
    strcat(the_whole_thing, s[0]);
    strcat(the_whole_thing, s[1]);
    strcat(the_whole_thing, s[2]);
    /* ... */

    will re-scan the_whole_thing each time to determine where to start
    appending.
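    A minimal sketch of one way to avoid the repeated scanning: keep a
    running offset and copy each piece with memcpy(). The helper name
    join_parts and its error convention are assumptions made for
    illustration; they are not from the program under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins n strings from parts[] into dst, which has
   room for cap bytes. Returns the length of the result, or (size_t)-1
   if dst is too small. Each piece is measured and copied exactly once. */
size_t join_parts(char *dst, size_t cap, const char *const *parts, size_t n)
{
    size_t used = 0, i;

    assert(dst != NULL && cap > 0);
    for (i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        if (len >= cap - used)      /* keep one byte for the final '\0' */
            return (size_t)-1;
        memcpy(dst + used, parts[i], len);
        used += len;                /* remember where the next piece goes */
    }
    dst[used] = '\0';
    return used;
}
```

    Called as join_parts(the_whole_thing, sizeof the_whole_thing, s, count),
    this does the same job as the strcat() series while touching each
    byte of the destination only once.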

    You might be better off just ignoring the warning, assuming you're not
    concerned about the possibility of a compiler that actually imposes a
    fixed limit on the size of a string literal.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 5, 2007
    #3
  4. Harald van Dijk writes:
    > David Mathog wrote:
    >> I'm looking at a program which stores perl scripts in an array. Each
    >> script is stored as a single entry in that array, and the whole set of
    >> them live in a single header file (escaped out the wazoo to get the
    >> perl code intact through the C preprocessor.) The issue is that
    >> many of these strings are quite long, which causes gcc to throw
    >> these sorts of warnings:
    >>
    >> scripts.h:1: warning: string length '4918' is greater than the length
    >> '4095' ISO C99 compilers are required to support

    [snip]
    > One possibility:
    >
    > char string[] = { 'o', 'n', 'e', ' ', 'b', 'y', ' ', 'o', 'n',
    > 'e', ..., '\0' };
    >
    > Another possibility:
    >
    > char string_array[][100] = { "first 100 characters", "second 100
    > characters", "..." };
    > char *string = (char *) string_array;


    Interesting. That takes advantage of the fact that a string literal
    in an initializer doesn't have a trailing '\0' if it's *exactly* the
    declared size. A simpler example is:

    char s[3] = "foo";

    Of course, if you accidentally make any of the literals too short, the
    compiler will silently insert a '\0' for you. I wouldn't try that
    kind of thing unless I had written a program to generate the C source
    code for me.
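    A small sketch of the chunked-array idea, with a row width of 4
    chosen purely for illustration. The names chunks and
    chunks_as_string are hypothetical. Note that this is valid C but not
    valid C++, and some compilers may warn about the dropped terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each row is exactly 4 chars wide. The first two literals fill their
   rows completely, so no '\0' is stored for them. The last literal is
   shorter, so the rest of its row is zero-filled, and that zero
   terminates the whole run of characters. */
static char chunks[][4] = { "abcd", "efgh", "ij" };

/* Hypothetical accessor: view the rows as one contiguous string. */
const char *chunks_as_string(void)
{
    /* Sanity check: the final row must end in '\0', otherwise the
       combined data would not be a string at all. */
    assert(chunks[2][3] == '\0');
    return (const char *)chunks;
}
```

    If any row other than the last were shorter than 4 characters, the
    padding '\0' would cut the combined string short at that point,
    which is the hazard described above.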

    > Both suggestions are hard to maintain. Constructing strings at run
    > time is probably a better idea.
    >
    > Or you could ignore or disable the warning.


    Or (I forgot to mention this in my earlier followup) you could read
    the data from a file. You (the OP) may have a good reason not to want
    to do that, or you probably wouldn't be asking how to do it directly
    in your program.
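    For comparison, a rough sketch of the read-from-a-file approach. The
    helper name read_whole_file, the initial buffer size and the
    doubling policy are all assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at path into a malloc'd,
   '\0'-terminated buffer. Returns NULL on any failure. The caller
   frees the result. */
char *read_whole_file(const char *path)
{
    FILE *fp;
    char *buf = NULL;
    size_t used = 0, cap = 0, n;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    for (;;) {
        if (cap - used < 2) {       /* need room for data plus '\0' */
            size_t newcap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        n = fread(buf + used, 1, cap - used - 1, fp);
        if (n == 0)                 /* end of file or read error */
            break;
        used += n;
    }
    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[used] = '\0';
    return buf;
}
```

    The buffer lives on the heap, so neither the string literal limit
    nor the size of any declared object comes into play.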

     
    Keith Thompson, May 5, 2007
    #4
  5. David Mathog

    Keith Thompson wrote:
    > Harald van Dijk writes:
    >> David Mathog wrote:
    >>> I'm looking at a program which stores perl scripts in an array. Each
    >>> script is stored as a single entry in that array, and the whole set of
    >>> them live in a single header file (escaped out the wazoo to get the
    >>> perl code intact through the C preprocessor.) The issue is that
    >>> many of these strings are quite long, which causes gcc to throw
    >>> these sorts of warnings:


    >> Another possibility:
    >>
    >> char string_array[][100] = { "first 100 characters", "second 100
    >> characters", "..." };
    >> char *string = (char *) string_array;

    >
    > Interesting. That takes advantage of the fact that a string literal
    > in an initializer doesn't have a trailing '\0' if it's *exactly* the
    > declared size. A simpler example is:
    >
    > char s[3] = "foo";
    >
    > Of course, if you accidentally make any of the literals too short, the
    > compiler will silently insert a '\0' for you. I wouldn't try that
    > kind of thing unless I had written a program to generate the C source
    > code for me.


    The include file is generated by a script or program, unfortunately
    one I don't yet have access to. In any case, the actual format is
    currently like this (there are more than 2 scripts, but this illustrates
    the point):

    char *PerlScriptFile[]={"script1...","script2..."};

    where the scripts are all sorts of different lengths, and of course the
    whole thing is awash in backslash escape characters, lines are all 52
    characters long (ending in \ EOL, so effectively 50 characters per
    line), and it goes on for several thousand lines. Anyway, if I'm
    following this correctly, then doing something like this:

    char script1[4500]="script1...";
    char script2[7654]="script2...";
    char *PerlScriptFile[]={script1,script2};

    would eliminate the warnings, so long as the number of characters
    used exactly matches the number of characters within the double quotes.

    (I think I would have had the program copy from a file or files as well,
    instead of doing it this way, but I believe the program's author did
    this so that his program could generate these scripts without having to
    look around for the source scripts.)

    Thanks,

    David Mathog
     
    David Mathog, May 7, 2007
    #5
  6. David Mathog wrote:
    > Keith Thompson wrote:
    >> Harald van Dijk writes:
    >>> David Mathog wrote:
    >>>> I'm looking at a program which stores perl scripts in an array. Each
    >>>> script is stored as a single entry in that array, and the whole set of
    >>>> them live in a single header file (escaped out the wazoo to get the
    >>>> perl code intact through the C preprocessor.) The issue is that
    >>>> many of these strings are quite long, which causes gcc to throw
    >>>> these sorts of warnings:

    >
    >>> Another possibility:
    >>>
    >>> char string_array[][100] = { "first 100 characters", "second 100
    >>> characters", "..." };
    >>> char *string = (char *) string_array;

    >>
    >> Interesting. That takes advantage of the fact that a string literal
    >> in an initializer doesn't have a trailing '\0' if it's *exactly* the
    >> declared size. A simpler example is:
    >>
    >> char s[3] = "foo";
    >>
    >> Of course, if you accidentally make any of the literals too short, the
    >> compiler will silently insert a '\0' for you. I wouldn't try that
    >> kind of thing unless I had written a program to generate the C source
    >> code for me.

    >
    > The include file is generated by a script or program, unfortunately
    > one I don't yet have access to. In any case, the actual format is
    > currently like this (there are more than 2 scripts, but this illustrates
    > the point):
    >
    > char *PerlScriptFile[]={"script1...","script2..."};
    >
    > where the scripts are all sorts of different lengths, and of course the
    > whole thing is awash in backslash escape characters, lines are all 52
    > characters long (ending in \ EOL, so effectively 50 characters per
    > line), and it goes on for several thousand lines. Anyway, if I'm
    > following this correctly, then doing something like this:
    >
    > char script1[4500]="script1...";
    > char script2[7654]="script2...";
    > char *PerlScriptFile[]={script1,script2};
    >
    > would eliminate the warnings, so long as the number of characters
    > used exactly matches the number of characters within the double quotes.


    Such a solution is error-prone and not easy to maintain, IMHO. How
    about folding the lines in each array, something like this?

    $ cat a.c
    #include <stdio.h>

    /* One short string literal per script line. */
    const char *script1[] = {
        "line 1",
        "line 2",
        "line 3",
        "line 4",
    };

    const char *script2[] = {
        "line 1",
        "line 2",
        "line 3",
        "line 4",
    };

    /* Table of scripts: line count plus pointer to the lines. */
    struct {
        size_t nlines;
        const char **code;
    } scripts[] = {
        { sizeof script1 / sizeof *script1, script1 },
        { sizeof script2 / sizeof *script2, script2 },
    };

    int main(void)
    {
        size_t i, j, nscripts = sizeof scripts / sizeof *scripts;

        for(i = 0; i < nscripts; i++) {
            for(j = 0; j < scripts[i].nlines; j++) {
                printf("%s\n", scripts[i].code[j]);
            }
        }

        return 0;
    }


    $ gcc -ansi -pedantic -W -Wall -o a a.c

    $ ./a
    line 1
    line 2
    line 3
    line 4
    line 1
    line 2
    line 3
    line 4

    The line lengths can now vary and you can have as many lines per
    script(array) as you like. You may need to write a tiny script that
    reformats the original code, but that's doable. ;-)
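    If the consumer needs each script back as one contiguous string
    (an assumption, since the thread does not say how the scripts are
    used), the lines can be joined once at startup. A minimal sketch,
    with join_lines as a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenates nlines strings into one malloc'd
   buffer, appending '\n' after each line. Returns NULL if the
   allocation fails. The caller frees the result. */
char *join_lines(const char *const *lines, size_t nlines)
{
    size_t total = 1, i;            /* 1 byte for the final '\0' */
    char *buf, *p;

    assert(lines != NULL || nlines == 0);
    for (i = 0; i < nlines; i++)
        total += strlen(lines[i]) + 1;      /* line plus its '\n' */

    buf = malloc(total);
    if (buf == NULL)
        return NULL;

    p = buf;
    for (i = 0; i < nlines; i++) {
        size_t len = strlen(lines[i]);
        memcpy(p, lines[i], len);
        p += len;
        *p++ = '\n';
    }
    *p = '\0';
    return buf;
}
```

    With the table above, join_lines(scripts[i].code, scripts[i].nlines)
    would rebuild script i. No single literal exceeds the limit, and the
    joined copy lives on the heap.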

    Bjørn
    [snip]
     
    Bjørn Augestad, May 7, 2007
    #6
