Non-constant constant strings

Discussion in 'C Programming' started by Rick C. Hodgin, Jan 19, 2014.

  1. I have a need for something like this, except that I need to edit list[N]'s data, as in memcpy(list[0], "eno", 3):
    char* list[] = { "one", "two", "three", "four" };

    I have a work-around like this:
    char one[] = "one";
    char two[] = "two";
    char three[] = "three";
    char four[] = "four";
    char* list[] = { one, two, three, four };

    However, this is clunky, because I want to be able to change the items easily: in the actual application they are source code that I'm coding within the compiler for an automatic processor. For example:

    char* readonlySourceCode[] =
    {
    "if (something[9999])\r\n",
    "{\r\n",
    " // Do something\r\n",
    "} else {\r\n",
    " // Do something else\r\n",
    "}"
    };

    My algorithm iterates through the list and looks for "[" with 4 characters between, and then a closing "]" ... when found, it injects at runtime the current reference which begins at 1 and increments up to the maximum value which, at present, is 812. It might change over time.

    I want to use the list this way because I will alter the source code from time to time.

    Obviously, this is creating a series of constant strings which are set up in read-only memory. My runtime solutions are two-fold: First, I can replace the pointers with a copy of each one malloc()'d and then memcpy()'d, which is my current solution. Or, I can do this:

    char line1[] = "if (something[9999])\r\n";
    char line2[] = "{\r\n";
    char line3[] = " // Do something\r\n";
    char line4[] = "} else {\r\n";
    char line5[] = " // Do something else\r\n";
    char line6[] = "}";

    char* editableSourceCode[] = { line1, line2, line3, line4, line5, line6 };

    My issue is that the source changes periodically, and the actual use case is about 100 lines, which also increases from time to time, and causes the numbering system in source code to be off.

    Is there a way to create the lines in the readonlySourceCode definition so they're not read-only? I'm using Visual C++, and am looking for something like this:

    #pragma data_seg(push, ".data", all)

    Or this:
    char* readwriteSourceCode[] =
    {
    _rw("if (something[9999])\r\n"),
    _rw("{\r\n"),
    _rw(" // Do something\r\n"),
    _rw("} else {\r\n"),
    _rw(" // Do something else\r\n"),
    _rw("}")
    };

    Thank you in advance.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 19, 2014
    #1
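The scan-and-patch step described in post #1 can be sketched roughly as below. This is only an illustration, assuming the marker is always exactly four digits between brackets and that the line lives in writable memory (a char array, not a string literal). The name patch_refs and its signature are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): overwrite every "[dddd]" marker
 * (four digits between brackets) in a writable line with the next reference
 * number, zero-padded to four digits. Returns the number of markers patched.
 */
static int patch_refs(char *line, unsigned *next_ref)
{
    int patched = 0;
    char *p = line;

    assert(line != NULL && next_ref != NULL);

    while ((p = strchr(p, '[')) != NULL) {
        /* The && chain stops at the first mismatch, so a '\0' ends the test
           before any byte past the terminator is read. */
        if (isdigit((unsigned char)p[1]) && isdigit((unsigned char)p[2]) &&
            isdigit((unsigned char)p[3]) && isdigit((unsigned char)p[4]) &&
            p[5] == ']') {
            char digits[5];
            /* % 10000 keeps the value within the four available columns. */
            snprintf(digits, sizeof digits, "%04u", *next_ref % 10000u);
            memcpy(p + 1, digits, 4);   /* same length, so nothing moves */
            (*next_ref)++;
            patched++;
            p += 6;                     /* continue after the closing ']' */
        } else {
            p++;
        }
    }
    return patched;
}
```

Called on `char line[] = "if (something[9999])\r\n";` with the counter at 1, this leaves `[0001]` in the line and advances the counter to 2. Calling it on a pointer to a string literal would be undefined behavior, which is the problem the thread is about.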

  2. Rick C. Hodgin

    Eric Sosman Guest

    That works -- but not if you try strcpy(one, "five"). That is,
    you can only replace these strings with replacements that are no
    longer than the originals.

    I'm not sure what you mean by "inject." You write as if you
    mean to plop something else down in place of the 9999, but I don't
    see what use that would be: The altered strings would still be inside
    your program, not in a place where a compiler could get at them.

    If you intend to write the altered strings to a file and then
    compile the file, consider how printf() does things: It has no need
    to replace the "%d" in a format string with the "1234" that goes to
    the output; rather, it leaves the "%d" untouched and sends the "1234"
    to the output.

    ... but, as I say, I'm not clear on what "inject" means here.

    Not in C, although particular compilers may allow it as an
    extension. Among other things, a C compiler is permitted to merge
    common suffixes, so (for example) the string literals "ant" and
    "cant" and "secant" might occupy a total of seven bytes.

    I'm using Visual C++, [...]

    If you're writing C++, try comp.lang.c++ instead of here.
     
    Eric Sosman, Jan 19, 2014
    #2
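The printf-style idea in the post above (leave the template untouched and send the number to the output) might look like the following. It is a minimal sketch, assuming the marker is the literal text "[9999]" and that the output goes to a caller-supplied buffer. The names render_line and MARKER are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MARKER "[9999]"   /* placeholder text used in the thread's examples */

/*
 * Hypothetical helper: copy tmpl to out, writing "[NNNN]" wherever MARKER
 * appears. The template is never modified, so it can remain a string
 * literal. Returns 0 on success, -1 if out is too small.
 */
static int render_line(char *out, size_t outsz, const char *tmpl,
                       unsigned *next_ref)
{
    size_t used = 0;
    const size_t mlen = strlen(MARKER);

    assert(out != NULL && tmpl != NULL && next_ref != NULL);

    while (*tmpl != '\0') {
        if (strncmp(tmpl, MARKER, mlen) == 0) {
            /* Emit the current reference instead of the marker. */
            int n = snprintf(out + used, outsz - used, "[%04u]",
                             *next_ref % 10000u);
            if (n < 0 || (size_t)n >= outsz - used)
                return -1;              /* would have been truncated */
            (*next_ref)++;
            used += (size_t)n;
            tmpl += mlen;
        } else {
            if (used + 1 >= outsz)
                return -1;              /* no room for char plus '\0' */
            out[used++] = *tmpl++;
        }
    }
    if (used >= outsz)
        return -1;                      /* only reachable when outsz == 0 */
    out[used] = '\0';
    return 0;
}
```

Because the input is `const char *`, the original `char* readonlySourceCode[]` table could be passed line by line without any copying, and the rendered text could then be written to a file.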

  3. Rick C. Hodgin

    Kaz Kylheku Guest

    This is not how you write compilers or compiler-like transliterators in 2014.

    You do it in a high level language where you don't have to mess around with low
    level string manipulation and memory management (or avoidance thereof).

    You will be done in way less time, and with fewer bugs.

    Possibly, the performance of the thing will be better (if it even matters).
     
    Kaz Kylheku, Jan 20, 2014
    #3
  4. Rick C. Hodgin

    Asaf Las Guest

    You can write your own bytecode machine and statically allocate an array big
    enough for opcodes to be loaded from text files or anything else.

    If speed is not an issue, www.swig.org or similar can glue your program to interpreted languages.

    Or define a generic API and write your logic in C files, so your application
    invokes a C compiler to create a dynamically loaded library and loads it on the fly.
     
    Asaf Las, Jan 20, 2014
    #4
  5. There are lots of solutions and workarounds. I'm looking for a compiler directive that will override the default behavior of allocating constant strings to read-only memory, and instead allocate them to read-write memory.

    char foo[] = "Rick"; // Goes to read-write memory
    char* list[] = { "Rick" }; // Goes to read-only memory

    I want a way for list[0] to go to the same place as foo. I am using the Visual C++ compiler, but I am writing in C. I use the C++ compiler because it has some relaxed syntax constraints.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #5
  6. (snip)
    As I remember it, not having looked recently, the pre-ANSI (K&R)
    compilers allowed writable strings. While not the best practice,
    it was an allowed and sometimes useful technique.

    Some compilers have an option still to do that. Note that this
    option also has to be sure to separately allocate strings that are
    otherwise equal.

    For K&R it would be:

    static char *list[]={"Rick", "Rick", "Rick"};

    (K&R didn't allow initializing auto arrays.)

    Note that in this example all three are the same and, if read only,
    there is no need to store separate copies.

    Is it really so bad to initialize separate variables, and then
    initialize an array with those pointer values?

    -- glen
     
    glen herrmannsfeldt, Jan 20, 2014
    #6

  7. As I change source code from time to time I would like to be able to edit the strings defined within this block, as it's closer to a real source code layout with minimal overhead to maintain:

    char* sourceCode[] =
    {
    "if (foo[9999]) {\r\n",
    " // Do something\r\n",
    "} else {\r\n",
    " // Do something else\r\n",
    "}\r\n"
    };

    Changes to this:
    char* sourceCode[] =
    {
    "if (foo[9999] == 0) {\r\n",
    " // Do something\r\n",
    "} else if (foo[9999] == 1) {\r\n",
    " // Do something else\r\n",
    "} else {\r\n",
    " // Do some other things\r\n",
    "}"
    };

    Rather than this:
    char line10[] = "if (foo[9999]) {\r\n";
    char line20[] = " // Do something\r\n";
    char line30[] = "} else {\r\n";
    char line40[] = " // Do something else\r\n";
    char line50[] = "}\r\n";
    char* sourceCode[] = { line10, line20, line30, line40, line50 };

    Changed to this:
    char line10[] = "if (foo[9999] == 0) {\r\n";
    char line20[] = " // Do something\r\n";
    char line30[] = "} else if (foo[9999] == 1) {\r\n";
    char line40[] = " // Do something else\r\n";
    char line43[] = "} else {\r\n";
    char line47[] = " // Do some other things\r\n";
    char line50[] = "}\r\n";
    char* sourceCode[] = { line10, line20, line30, line40, line43, line47, line50 };

    Because now I'm back in the days of BASICA, needing the RENUM 100,10 ability to redistribute the numbers as my initially defined numbering system fills up. Or I begin having unusual naming conventions with extra parts tagged on (line41a or line41_1, and so on), because (1) I must manually give everything names so they can be explicitly referenced thereafter, and because (2) the code assigned to each item will change from time to time, which (3) introduces the possibility of additional errors due to the mechanics of setting everything up (something the compiler should handle).

    My current solution is to do something along these lines during initialization:
    char* sourceCode[] =
    {
    "if (foo[9999] == 0) {\r\n",
    " // Do something\r\n",
    "} else if (foo[9999] == 1) {\r\n",
    " // Do something else\r\n",
    "} else {\r\n",
    " // Do some other things\r\n",
    "}",
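    NULL
    };

    int i;
    size_t len;
    char* ptr;
    for (i = 0; sourceCode[i] != NULL; i++)
    {
        len = strlen(sourceCode[i]) + 1;   /* include the terminating '\0' */
        ptr = (char*)malloc(len);          /* not checked for failure here */
        memcpy(ptr, sourceCode[i], len);
        sourceCode[i] = ptr;               /* entry now points to writable memory */
    }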

    This works ... but with the compiler switch it wouldn't even be necessary. The compiler would remove the possibility of these errors.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #7

  8. That would fix it. I appreciate the suggestion. I'm still holding out hope for a #pragma directive, or constant string wrapper macro, that does what I'm after. :)

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #8
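One candidate for the wrapper macro hoped for above is a C99 compound literal. The sketch below rests on several assumptions: the macro name RW and the helper overwrite_prefix are made up here, the file must be compiled as C (compound literals are not part of standard C++), and whether a particular Visual C++ release accepts them in C mode should be checked in its documentation.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical wrapper macro (the name RW is illustrative). A C99 compound
 * literal of type char[] is an unnamed, modifiable array initialized from
 * the string literal, much as `char foo[] = "Rick";` would be.
 */
#define RW(s) ((char[]){ s })

/* At file scope each compound literal has static storage duration. */
static char *readwriteSourceCode[] = {
    RW("if (something[9999])\r\n"),
    RW("{\r\n"),
    RW("    // Do something\r\n"),
    RW("} else {\r\n"),
    RW("    // Do something else\r\n"),
    RW("}"),
};

/*
 * Hypothetical helper: overwrite the first len bytes of entry idx.
 * The arrays are exactly as long as their initializers, so len must not
 * exceed the current string length.
 */
static void overwrite_prefix(size_t idx, const char *text, size_t len)
{
    assert(idx < sizeof readwriteSourceCode / sizeof readwriteSourceCode[0]);
    assert(text != NULL && len <= strlen(readwriteSourceCode[idx]));
    memcpy(readwriteSourceCode[idx], text, len);
}
```

The table keeps the single-block layout of the original example, with no per-line variable names to maintain. As with the named-array workaround, each entry can only be overwritten in place and cannot grow.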


  9. And what, exactly, is wrong with the basic principle of this approach?

    I personally would have done something like this:
    char *read_only[] = { "Rick", "Jane", "Marc", 0 };
    char **read_write;

    char **init_readwrite(char **readonly) {
        unsigned int i, count;
        char **readwrite;

        for (count=0, i=0; readonly[i]; i++) {
            count++;
        }
        count++;   /* also copy the terminating 0 entry */
        readwrite = malloc(count * sizeof(*readwrite));
        /* no check */
        /* copies the pointers only, not the strings they point to */
        return memcpy(readwrite, readonly, count * sizeof(*readwrite));
    }

    read_write = init_readwrite(read_only);

    ....And then you operate on read_write and ignore read_only.
     
    Aleksandar Kuktin, Jan 20, 2014
    #9
  10. Rick C. Hodgin

    Eric Sosman Guest

    ... and you've been told (by more than one respondent) where
    to find such a thing, if it exists: In the documentation of the
    compiler you happen to be using, somewhere in the "Extensions to
    C" or "Beyond C" or "Things That Aren't Quite C" section.

    The C language does not offer what you ask for.
     
    Eric Sosman, Jan 20, 2014
    #10


  11. What "exactly" is wrong with this approach is that I must do something manually in code, something that is (a) unnecessary, (b) rather cumbersome mechanically, and (c) something the compiler would be capable of doing for me were it not for design protocol limitations being artificially imposed upon an otherwise valid data request for a block of read-write memory.


    Now you're dealing with a copy that must be free()'d each time after use. In my example I'm replacing the "[9999]" portion with something akin to printf("[%04u]", my_int_iterator_value++).

    I don't have need of making copies of my data. It introduces unnecessary code, complexity, opportunity for errors. What I do have need of is accessing the data I've encoded, as it's encoded at compile-time, to be altered at run-time.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #11
  12. Yup.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #12


  13. I have not gone through this deeply or tried it in code, but I'm thinking the theory of this solution would not work in all cases (and that this particular implementation also will not work).

    Since each read_only[] pointer is to a constant string, and the compiler creates the entry in read-only memory, it could optimize and allow lines like "red" and "fred" to be mapped to the same four byte area, one pointing to "f" and one pointing to "r" after "f". So making a bulk copy would not copy all things properly in all cases.

    I believe to be sure, you must copy each pointer out one-by-one to verify you'll always get an appropriate copy into read-write memory.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #13
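Copying each string out one by one, as suggested in the post above, could be sketched like this. The helper names dup_list and free_list are hypothetical. The sketch assumes the list is terminated by a null pointer and duplicates every string separately, so literals that share storage (such as "red" and "fred") still end up as independent writable copies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: release a list produced by dup_list(). */
static void free_list(char **list)
{
    size_t i;

    if (list == NULL)
        return;
    for (i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}

/*
 * Hypothetical helper: build a new null-terminated list in which every
 * entry is a malloc()'d copy of the corresponding source string.
 * Returns NULL if any allocation fails.
 */
static char **dup_list(const char *const *src)
{
    size_t count = 0, i;
    char **copy;

    assert(src != NULL);
    while (src[count] != NULL)
        count++;

    copy = malloc((count + 1) * sizeof *copy);  /* +1 for the terminator */
    if (copy == NULL)
        return NULL;

    for (i = 0; i < count; i++) {
        size_t len = strlen(src[i]) + 1;        /* include the '\0' */
        copy[i] = malloc(len);
        if (copy[i] == NULL) {
            free_list(copy);    /* copy[i] is NULL, so this frees 0..i-1 */
            return NULL;
        }
        memcpy(copy[i], src[i], len);
    }
    copy[count] = NULL;
    return copy;
}
```

The caller works on the returned list and hands it to free_list() when done. The original table is never written to, so it can be declared with const-qualified pointers.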
  14. All versions of the C language, from K&R to ISO C11, have permitted
    compilers to make string literals writable. What's changed over
    time is that most compilers don't take advantage of that permission.
    They don't *have* to do that unless they make additional guarantees
    beyond what the language specifies.

    If I write:

    char *a = "foo";
    char *b = "foo";
    a[0] = 'F';
    puts(b);

    and the puts call is actually executed, the language permits it to
    print either "foo", or "Foo" (or "fnord", or a suffusion of yellow).
    The behavior of the assignment to a[0] is undefined, and once you
    do that all bets are off.

    But if a compiler were to guarantee, as a language extension,
    that string literals are meaningfully modifiable, then it would
    probably have to guarantee that the strings pointed to by a and
    b must be distinct (unless the compiler can prove that they're
    never modified). The compiler's documentation would have to spell
    out just what additional guarantees it offers. (Such an extension
    would not make the compiler non-conforming, since any code that
    takes advantage of it has undefined behavior.)

    [...]
     
    Keith Thompson, Jan 20, 2014
    #14
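For contrast with the undefined pointer-to-literal case in the post above, the array form is fully defined: each array is a separate object initialized with a copy of the literal's characters. A small sketch, with the hypothetical function name arrays_are_independent:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical demo: returns 1 if modifying one array leaves the other
 * untouched, which the language guarantees for distinct array objects.
 */
static int arrays_are_independent(void)
{
    char a[] = "foo";   /* each array gets its own copy of the characters */
    char b[] = "foo";

    assert(sizeof a == 4);  /* three characters plus the terminating '\0' */
    a[0] = 'F';             /* well defined: a is an ordinary char array */

    return strcmp(a, "Foo") == 0 && strcmp(b, "foo") == 0;
}
```

Here the result does not depend on whether the compiler merges identical string literals, because the literals serve only as initializers.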
  15. [...]

    This isn't relevant to your question, but why do you have explicit
    "\r\n" line endings? If your program reads and/or writes the
    source code in text mode (or if you're on a UNIX-like system),
    line endings are marked by a single '\n' character, regardless of
    the format used by the operating system.
     
    Keith Thompson, Jan 20, 2014
    #15
  16. [...]

    It would be helpful if you'd format your articles to have lines
    no longer than about 72 columns. Usenet is not the web, and
    newsreaders don't necessarily deal with arbitrarily long lines.
    (My newsreader does split long lines, but not at word boundaries.)
     
    Keith Thompson, Jan 20, 2014
    #16
  17. Ah! That's a shame. :)
    I believe the language should operate such that as I've defined a to point to "foo", and b to point to "foo", and these are separate strings, then they should be separate strings in memory, the same as if I'd said char* a="123"; char* b="456".
    I personally believe it's a silly requirement to do such a comparison to save a few bytes of space by default. I'd rather have it always duplicated and then allow the developer to provide a manually inserted command line switch which specifically turns on that kind of checking, and that kind of substituting.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #17
  18. To be consistent with the source file input I'm processing. Without using both \r and \n it gives warnings when opening the files that the line endings are not consistent.

    The program I'm writing this for augments code generated by another tool. The other tool generates for a version 1.0 of the tool, and my tool modifies it for a version 2.0 of the tool.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #18
  19. Will do. I'm using Google Groups and it handles wrapping. I never knew
    it was an issue for anyone.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Jan 20, 2014
    #19
  20. (snip, I wrote)
    My usual solution in that case is to put all the data into a file
    of some kind, then, as part of the build process, usually with make,
    convert that file into appropriate C, just before compiling it.

    In many cases when one wants initialized char data to modify,
    the new length can be different. In that case, this solution doesn't
    work, which reminded me of one case that does:

    char sourcecode[][80]={
    "if (foo[9999]) {\r\n",
    " // Do something\r\n",
    "} else {\r\n",
    " // Do something else\r\n",
    "}\r\n",
    };

    In this case, you get the appropriate number of 80 element
    char arrays, initialized to the given values.

    Oh, also, one of my favorite C features (Java also has), you
    can have the extra comma on the last line. Convenient for program
    generated text, though most likely in the standard as it allows
    for easy preprocessor conditionals.

    -- glen
     
    glen herrmannsfeldt, Jan 20, 2014
    #20
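The fixed-width array from the post above also allows a line to be replaced by a longer one, as long as it fits in the row. A minimal sketch follows. The names LINE_WIDTH, LINE_COUNT and set_line are hypothetical, and the width of 80 is taken from the example rather than from any requirement.

```c
#include <assert.h>
#include <string.h>

#define LINE_WIDTH 80   /* row size taken from the example above */

/* Every row is a writable 80-byte array; unused bytes are zero-filled. */
static char sourcecode[][LINE_WIDTH] = {
    "if (foo[9999]) {\r\n",
    "    // Do something\r\n",
    "} else {\r\n",
    "    // Do something else\r\n",
    "}\r\n",
};

/* The compiler counts the rows, so adding a line needs no renumbering. */
#define LINE_COUNT (sizeof sourcecode / sizeof sourcecode[0])

/*
 * Hypothetical helper: replace row idx with text.
 * Returns 0 on success, -1 if text (plus its '\0') does not fit in a row.
 */
static int set_line(size_t idx, const char *text)
{
    assert(idx < LINE_COUNT && text != NULL);
    if (strlen(text) >= LINE_WIDTH)
        return -1;
    strcpy(sourcecode[idx], text);      /* length was checked above */
    return 0;
}
```

The trade-off is space: every row occupies LINE_WIDTH bytes regardless of how short the line is, and a line longer than the row is rejected at run time rather than at compile time.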