Non-constant constant strings

Rick C. Hodgin

I have a need for something like this, except that I need to edit list[N]'s data, as in memcpy(list[0], "eno", 3):
char* list[] = { "one", "two", "three", "four" };

I have a work-around like this:
char one[] = "one";
char two[] = "two";
char three[] = "three";
char four[] = "four";
char* list[] = { one, two, three, four };

However, this is clunky because I want to be able to change the items because in the actual application it is source code that I'm coding within the compiler for an automatic processor. For example:

char* readonlySourceCode[] =
{
"if (something[9999])\r\n",
"{\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}"
};

My algorithm iterates through the list and looks for "[" with 4 characters between, and then a closing "]" ... when found, it injects at runtime the current reference which begins at 1 and increments up to the maximum value which, at present, is 812. It might change over time.

I want to use the list this way because I will alter the source code from time to time.

Obviously, this is creating a series of constant strings which are set up in read-only memory. My runtime solutions are two-fold: First, I can replace the pointers with a copy of each one malloc()'d and then memcpy()'d, which is my current solution. Or, I can do this:

char line1[] = "if (something[9999])\r\n";
char line2[] = "{\r\n";
char line3[] = " // Do something\r\n";
char line4[] = "} else {\r\n";
char line5[] = " // Do something else\r\n";
char line6[] = "}";

char* editableSourceCode[] = { line1, line2, line3, line4, line5, line6 };

My issue is that the source changes periodically, and the actual use case is about 100 lines, which also increases from time to time, and causes the numbering system in source code to be off.

Is there a way to create the lines in the readonlySourceCode definition so it's not read-only? I'm using Visual C++, and am looking for something like this:

#pragma data_seg(push, ".data", all)

Or this:
char* readwriteSourceCode[] =
{
_rw("if (something[9999])\r\n"),
_rw("{\r\n"),
_rw(" // Do something\r\n"),
_rw("} else {\r\n"),
_rw(" // Do something else\r\n"),
_rw("}")
};

Thank you in advance.

Best regards,
Rick C. Hodgin
 
Eric Sosman

I have a need for something like this, except that I need to edit list[N]'s data, as in memcpy(list[0], "eno", 3):
char* list[] = { "one", "two", "three", "four" };

I have a work-around like this:
char one[] = "one";
char two[] = "two";
char three[] = "three";
char four[] = "four";
char* list[] = { one, two, three, four };

That works -- but not if you try strcpy(one, "five"). That is,
you can only replace these strings with replacements that are no
longer than the originals.

However, this is clunky because I want to be able to change the items because in the actual application it is source code that I'm coding within the compiler for an automatic processor. For example:

char* readonlySourceCode[] =
{
"if (something[9999])\r\n",
"{\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}"
};

My algorithm iterates through the list and looks for "[" with 4 characters between, and then a closing "]" ... when found, it injects at runtime the current reference which begins at 1 and increments up to the maximum value which, at present, is 812. It might change over time.

I'm not sure what you mean by "inject." You write as if you
mean to plop something else down in place of the 9999, but I don't
see what use that would be: The altered strings would still be inside
your program, not in a place where a compiler could get at them.

If you intend to write the altered strings to a file and then
compile the file, consider how printf() does things: It has no need
to replace the "%d" in a format string with the "1234" that goes to
the output; rather, it leaves the "%d" untouched and sends the "1234"
to the output.
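
A minimal sketch of that printf()-style idea, assuming the goal is to write the generated source to a stream: the template line stays untouched and the running number is emitted in place of the placeholder. The names emit_line and PLACEHOLDER are made up for illustration and do not come from the thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical placeholder marking where the running reference number goes. */
#define PLACEHOLDER "[9999]"

/*
 * Write one template line to `out`, emitting "[NNNN]" (zero padded, taken
 * from *counter) wherever the placeholder occurs. The template itself is
 * never modified, so it may live in read-only memory.
 * Returns 0 on success, -1 on a write error.
 */
int emit_line(FILE *out, const char *line, unsigned *counter)
{
    const size_t ph_len = strlen(PLACEHOLDER);
    const char *hit;

    while ((hit = strstr(line, PLACEHOLDER)) != NULL) {
        size_t before = (size_t)(hit - line);

        /* Copy the text in front of the placeholder, then the number. */
        if (fwrite(line, 1, before, out) != before)
            return -1;
        if (fprintf(out, "[%04u]", (*counter)++) < 0)
            return -1;
        line = hit + ph_len;
    }
    /* Emit whatever follows the last placeholder. */
    return fputs(line, out) < 0 ? -1 : 0;
}
```

Because nothing is written back into the strings, this variant needs neither writable literals nor copies.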

... but, as I say, I'm not clear on what "inject" means here.
[...]
Is there a way to create the lines in the readonlySourceCode definition so it's not read-only.

Not in C, although particular compilers may allow it as an
extension. Among other things, a C compiler is permitted to merge
common suffixes, so (for example) the string literals "ant" and
"cant" and "secant" might occupy a total of seven bytes.

I'm using Visual C++, [...]

If you're writing C++, try comp.lang.c++ instead of here.
 
Kaz Kylheku


This is not how you write compilers or compiler-like transliterators in 2014.

You do it in a high level language where you don't have to mess around with low
level string manipulation and memory management (or avoidance thereof).

You will be done in way less time, and with fewer bugs.

Possibly, the performance of the thing will be better (if it even matters).
 
Asaf Las

However, this is clunky because I want to be able to change
the items because in the actual application it is source code
that I'm coding within the compiler for an automatic processor.
Rick C. Hodgin

You could write your own bytecode machine and statically allocate an array
big enough to hold opcodes loaded from text files or anywhere else.

If speed is not an issue, www.swig.org or something similar can glue your program to an interpreted language.

Or define a generic API and write your logic in C files, so that your application
invokes a C compiler to build a dynamically loaded library and loads it on the fly.
 
Rick C. Hodgin

There are lots of solutions and workarounds. I'm looking for a compiler directive that will override the default behavior of allocating constant strings to read-only memory, and instead allocate them to read-write memory.

char foo[] = "Rick"; // Goes to read-write memory
char* list[] = { "Rick" }; // Goes to read-only memory

I want a way for list[0] to go to the same place as foo. I am using Visual C++ compiler, but I am writing in C. I use the C++ compiler because it has some relaxed syntax constraints.
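
One possibility that may be worth checking, assuming the file can be compiled as C99 or later rather than as C++: a compound literal of type char[] is an unnamed, modifiable array, and at file scope it has static storage duration. The macro name RW below is hypothetical (it stands in for the _rw() wish above), and whether a given Visual C++ version accepts this in its C mode is something to confirm in its documentation.

```c
#include <string.h>

/*
 * RW() is a made-up helper name, not a compiler feature. Each use expands
 * to a C99 compound literal: an unnamed char array initialised from the
 * string, which the program is allowed to modify. Not valid ISO C++.
 */
#define RW(s) (char[]){ s }

char *readwriteSourceCode[] = {
    RW("if (something[9999])\r\n"),
    RW("{\r\n"),
    RW("    // Do something\r\n"),
    RW("}"),
    NULL    /* terminator so the list can be walked without a count */
};
```

Each row keeps the length of its initialiser, so replacements must not be longer than the original text, the same restriction as with separately named arrays.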

Best regards,
Rick C. Hodgin
 
glen herrmannsfeldt

(snip)
char foo[] = "Rick"; // Goes to read-write memory
char* list[] = { "Rick" } // Goes to read-only memory
I want a way for list[0] to go to the same place as foo.
I am using Visual C++ compiler, but I am writing in C.
I use the C++ compiler because it has some relaxed syntax
constraints.

As I remember it, not having looked recently, the pre-ANSI (K&R)
compilers allowed writable strings. While not the best practice,
it was an allowed and sometimes useful technique.

Some compilers have an option still to do that. Note that this
option also has to be sure to separately allocate strings that are
otherwise equal.

For K&R it would be:

static char *list[]={"Rick", "Rick", "Rick"};

(K&R didn't allow initializing auto arrays.)

Note that in this example all three are the same and, if read only,
there is no need to store separate copies.

Is it really so bad to initialize separate variables, and then
initialize an array with those pointer values?

-- glen
 
Rick C. Hodgin

Is it really so bad to initialize separate variables, and then
initialize an array with those pointer values?


As I change source code from time to time I would like to be able to edit the strings defined within this block, as it's closer to a real source code layout with minimal overhead to maintain:

char* sourceCode[] =
{
"if (foo[9999]) {\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}\r\n"
};

Changes to this:
char* sourceCode[] =
{
"if (foo[9999] == 0) {\r\n",
" // Do something\r\n",
"} else if (foo[9999] == 1) {\r\n",
" // Do something else\r\n",
"} else {\r\n",
" // Do some other things\r\n",
"}"
};

Rather than this:
char line10[] = "if (foo[9999]) {\r\n";
char line20[] = " // Do something\r\n";
char line30[] = "} else {\r\n";
char line40[] = " // Do something else\r\n";
char line50[] = "}\r\n";
char* sourceCode[] = { line10, line20, line30, line40, line50 };

Changed to this:
char line10[] = "if (foo[9999] == 0) {\r\n";
char line20[] = " // Do something\r\n";
char line30[] = "} else if (foo[9999] == 1) {\r\n";
char line40[] = " // Do something else\r\n";
char line43[] = "} else {\r\n";
char line47[] = " // Do some other things\r\n";
char line50[] = "}\r\n";
char* sourceCode[] = { line10, line20, line30, line40, line43, line47, line50 };

Because now I'm back in the days of BASICA and needing the RENUM 100,10 ability to redistribute as my initially defined numbering system gets bigger. Or I begin having unusual naming conventions with extra parts tagged on (line41a or line41_1, and so on) because (1) I must manually give everything names so they are explicitly referenced thereafter, and because (2) the code assigned to each item will change from time to time it (3) introduces the possibility of additional errors due to the mechanics of setting everything up (something the compiler should handle).

My current solution is to do something along these lines during initialization:
char* sourceCode[] =
{
"if (foo[9999] == 0) {\r\n",
" // Do something\r\n",
"} else if (foo[9999] == 1) {\r\n",
" // Do something else\r\n",
"} else {\r\n",
" // Do some other things\r\n",
"}",
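NULL
};

size_t i, len;
char* ptr;
for (i = 0; sourceCode[i] != NULL; i++)
{
    // Copy each literal, including its terminating null, into writable heap memory
    len = strlen(sourceCode[i]) + 1;
    ptr = (char*)malloc(len);
    memcpy(ptr, sourceCode[i], len);
    sourceCode[i] = ptr;
}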

This works ... but with the compiler switch it wouldn't even be necessary. The compiler would remove the possibility of these errors.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

one option might be to change your
char* sourceCode[] = { ... };

to
char sourceCode[][MAXLEN] = { ... };

This will "waste" some memory, as all the lines will take the space of a
"full" line, and runs the danger of losing the terminating null on a
line that just exactly overruns the length, but does give you the easy
to edit format.
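
The "losing the terminating null" risk comes from a C rule: an initialiser that exactly fills a row is accepted and the '\0' is silently dropped. A small sketch of a startup check, assuming a row width named MAXLEN as in the suggestion; the value 16 and the function name first_unterminated are illustrative only.

```c
#include <stddef.h>
#include <string.h>

#define MAXLEN 16   /* hypothetical row width; the thread does not give one */

/*
 * Return the index of the first row that has no '\0' anywhere in its
 * MAXLEN bytes, or -1 if every row is properly terminated.
 */
long first_unterminated(char rows[][MAXLEN], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (memchr(rows[i], '\0', MAXLEN) == NULL)
            return (long)i;
    }
    return -1;
}
```

Running such a check once at startup turns an over-long line into an immediate, visible failure instead of a string that runs into the next row.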


That would fix it. I appreciate the suggestion. I'm still holding out hope for a #pragma directive, or constant string wrapper macro, that does what I'm after. :)

Best regards,
Rick C. Hodgin
 
Aleksandar Kuktin

My current solution to do something along these lines during
initialization:
char* sourceCode[] =
{
"if (foo[9999] == 0) {\r\n",
" // Do something\r\n",
"} else if (foo[9999] == 1) {\r\n",
" // Do something else\r\n",
"} else {\r\n",
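" // Do some other things\r\n",
"}",
NULL
};

size_t i, len;
char* ptr;
for (i = 0; sourceCode[i] != NULL; i++)
{
    // Copy each literal, including its terminating null, into writable heap memory
    len = strlen(sourceCode[i]) + 1;
    ptr = (char*)malloc(len);
    memcpy(ptr, sourceCode[i], len);
    sourceCode[i] = ptr;
}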


And what, exactly, is wrong with the basic principle of this approach?

I personally would have done something like this:
char *read_only[] = { "Rick", "Jane", "Marc", 0 };
char **read_write;

char **init_readwrite(char **readonly) {
unsigned int i, count;
char **readwrite;

for (count=0, i=0; readonly[i]; i++) {
count++;
}
readwrite = malloc(count * sizeof(*readwrite));
/* no check */
return memcpy(read_write, read_only, count * sizeof(*readwrite));
}

read_write = init_readwrite(read_only);

....And then you operate on read_write and ignore read_only.
 
Eric Sosman

one option might be to change your
char* sourceCode[] = { ... };

to
char sourceCode[][MAXLEN] = { ... };

This will "waste" some memory, as all the lines will take the space of a
"full" line, and runs the danger of losing the terminating null on a
line that just exactly overruns the length, but does give you the easy
to edit format.


That would fix it. I appreciate the suggestion. I'm still holding out hope for a #pragma directive, or constant string wrapper macro, that does what I'm after. :)

... and you've been told (by more than one respondent) where
to find such a thing, if it exists: In the documentation of the
compiler you happen to be using, somewhere in the "Extensions to
C" or "Beyond C" or "Things That Aren't Quite C" section.

The C language does not offer what you ask for.
 
Rick C. Hodgin

My current solution to do something along these lines during
initialization:
char* sourceCode[] =
{
"if (foo[9999] == 0) {\r\n",
" // Do something\r\n",
"} else if (foo[9999] == 1) {\r\n",
" // Do something else\r\n",
"} else {\r\n",
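" // Do some other things\r\n",
"}",
NULL
};

size_t i, len;
char* ptr;
for (i = 0; sourceCode[i] != NULL; i++)
{
    // Copy each literal, including its terminating null, into writable heap memory
    len = strlen(sourceCode[i]) + 1;
    ptr = (char*)malloc(len);
    memcpy(ptr, sourceCode[i], len);
    sourceCode[i] = ptr;
}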


And what, exactly, is wrong with the basic principle of this approach?


What "exactly" is wrong with this approach is that I must do something manually in code, something that is (a) unnecessary, (b) rather cumbersome mechanically, and (c) something the compiler would be capable of doing for me were it not for design protocol limitations being artificially imposed upon an otherwise valid data request for a block of read-write memory.

I personally would have done something like this:
char *read_only[] = { "Rick", "Jane", "Marc", 0 };
char **read_write;

char **init_readwrite(char **readonly) {
unsigned int i, count;
char **readwrite;

for (count=0, i=0; readonly[i]; i++) {
count++;
}
readwrite = malloc(count * sizeof(*readwrite));
/* no check */
return memcpy(read_write, read_only, count * sizeof(*readwrite));
}

read_write = init_readwrite(read_only);
...And then you operate on read_write and ignore read_only.


Now you're dealing with a copy that must be free()'d each time after use. In my example I'm replacing the "[9999]" portion with something akin to printf("[%04u]", my_int_iterator_value++).

I don't have need of making copies of my data. It introduces unnecessary code, complexity, opportunity for errors. What I do have need of is accessing the data I've encoded, as it's encoded at compile-time, to be altered at run-time.
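
For reference, the in-place replacement described here could look roughly like the sketch below, assuming the line already lives in writable memory (a char array or a heap copy). The function name patch_line is made up, and the slot rule (an opening bracket, four characters, a closing bracket) is taken from the first post.

```c
#include <stdio.h>
#include <string.h>

/*
 * Overwrite the four characters inside every "[xxxx]" slot of a WRITABLE
 * line with the zero-padded value of *counter, incrementing the counter
 * after each slot. Returns the number of slots patched.
 */
int patch_line(char *line, unsigned *counter)
{
    int patched = 0;
    char *p = line;

    while ((p = strchr(p, '[')) != NULL) {
        /* Need '[' + 4 characters + ']' before the end of the string. */
        if (strlen(p) >= 6 && p[5] == ']') {
            char digits[5];

            /* Keep the value to four digits so it always fits the slot. */
            snprintf(digits, sizeof digits, "%04u", *counter % 10000u);
            memcpy(p + 1, digits, 4);   /* same length, no '\0' copied */
            (*counter)++;
            patched++;
            p += 6;
        } else {
            p++;
        }
    }
    return patched;
}
```

Since the replacement has exactly the same length as the slot, the line never grows and no reallocation is needed.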

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

I personally would have done something like this:
char *read_only[] = { "Rick", "Jane", "Marc", 0 };
char **read_write;

char **init_readwrite(char **readonly) {
unsigned int i, count;
char **readwrite;

for (count=0, i=0; readonly[i]; i++) {
count++;
}
readwrite = malloc(count * sizeof(*readwrite));
/* no check */
return memcpy(read_write, read_only, count * sizeof(*readwrite));
}
read_write = init_readwrite(read_only);

...And then you operate on read_write and ignore read_only.


I have not gone through this deeply or tried it in code, but I'm thinking the theory of this solution would not work in all cases (and that this particular implementation also will not work).

Since each read_only[] pointer is to a constant string, and the compiler creates the entry in read-only memory, it could optimize and allow lines like "red" and "fred" to be mapped to the same four byte area, one pointing to "f" and one pointing to "r" after "f". So making a bulk copy would not copy all things properly in all cases.

I believe to be sure, you must copy each pointer out one-by-one to verify you'll always get an appropriate copy into read-write memory.
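
A sketch of that one-by-one copy, assuming a NULL-terminated list as in the examples above. The names copy_lines and free_lines are illustrative; each string gets its own allocation, so it does not matter whether the compiler overlapped or shared the literals.

```c
#include <stdlib.h>
#include <string.h>

/* Release a NULL-terminated list produced by copy_lines(). */
void free_lines(char **lines)
{
    if (lines == NULL)
        return;
    for (char **p = lines; *p != NULL; p++)
        free(*p);
    free(lines);
}

/*
 * Deep-copy a NULL-terminated list of strings into writable heap memory.
 * Returns a new NULL-terminated list, or NULL if any allocation fails.
 */
char **copy_lines(char *const *readonly)
{
    size_t count = 0;
    while (readonly[count] != NULL)
        count++;

    char **copy = malloc((count + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(readonly[i]) + 1;   /* include the '\0' */

        copy[i] = malloc(len);
        if (copy[i] == NULL) {
            free_lines(copy);   /* copy[i] is NULL, so this stops here */
            return NULL;
        }
        memcpy(copy[i], readonly[i], len);
    }
    copy[count] = NULL;
    return copy;
}
```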

Best regards,
Rick C. Hodgin
 
Keith Thompson

glen herrmannsfeldt said:
(snip)
char foo[] = "Rick"; // Goes to read-write memory
char* list[] = { "Rick" } // Goes to read-only memory
I want a way for list[0] to go to the same place as foo.
I am using Visual C++ compiler, but I am writing in C.
I use the C++ compiler because it has some relaxed syntax
constraints.

As I remember it, not having looked recently, the pre-ANSI (K&R)
compilers allowed writable strings. While not the best practice,
it was an allowed and sometimes useful technique.

All versions of the C language, from K&R to ISO C11, have permitted
compilers to make string literals writable. What's changed over
time is that most compilers don't take advantage of that permission.

Some compilers have an option still to do that. Note that this
option also has to be sure to separately allocate strings that are
otherwise equal.

They don't *have* to do that unless they make additional guarantees
beyond what the language specifies.

If I write:

char *a = "foo";
char *b = "foo";
a[0] = 'F';
puts(b);

and the puts call is actually executed, the language permits it to
print either "foo", or "Foo" (or "fnord", or a suffusion of yellow).
The behavior of the assignment to a[0] is undefined, and once you
do that all bets are off.

But if a compiler were to guarantee, as a language extension,
that string literals are meaningfully modifiable, then it would
probably have to guarantee that the strings pointed to by a and
b must be distinct (unless the compiler can prove that they're
never modified). The compiler's documentation would have to spell
out just what additional guarantees it offers. (Such an extension
would not make the compiler non-conforming, since any code that
takes advantage of it has undefined behavior.)
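
For contrast, a small sketch of the well-defined counterpart: when the literals initialise char arrays rather than pointers, each array is a separate writable object, so changing one cannot affect the other. The function name is made up for the example.

```c
#include <string.h>

/* Returns 1 if modifying one array leaves the other untouched. */
int arrays_are_independent(void)
{
    char a[] = "foo";   /* each array holds its own copy of the literal */
    char b[] = "foo";

    a[0] = 'F';         /* well defined: a is an ordinary char array */
    return strcmp(a, "Foo") == 0 && strcmp(b, "foo") == 0;
}
```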

[...]
 
Keith Thompson

Rick C. Hodgin said:
As I change source code from time to time I would like to be able to
edit the strings defined within this block, as it's closer to a real
source code layout with minimal overhead to maintain:

char* sourceCode[] =
{
"if (foo[9999]) {\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}\r\n"
};
[...]

This isn't relevant to your question, but why do you have explicit
"\r\n" line endings? If your program reads and/or writes the
source code in text mode (or if you're on a UNIX-like system),
line endings are marked by a single '\n' character, regardless of
the format used by the operating system.
 
Keith Thompson

Rick C. Hodgin said:
I have a need for something like this, except that I need to edit list[N]'s data, as in memcpy(list[0], "eno", 3):
char* list[] = { "one", "two", "three", "four" };
[...]

It would be helpful if you'd format your articles to have lines
no longer than about 72 columns. Usenet is not the web, and
newsreaders don't necessarily deal with arbitrarily long lines.
(My newsreader does split long lines, but not at word boundaries.)
 
Rick C. Hodgin

glen herrmannsfeldt said:
Rick C. Hodgin said:
char foo[] = "Rick"; // Goes to read-write memory
char* list[] = { "Rick" } // Goes to read-only memory
I want a way for list[0] to go to the same place as foo.
I am using Visual C++ compiler, but I am writing in C.
I use the C++ compiler because it has some relaxed syntax
constraints.
As I remember it, not having looked recently, the pre-ANSI (K&R)
compilers allowed writable strings. While not the best practice,
it was an allowed and sometimes useful technique.

All versions of the C language, from K&R to ISO C11, have permitted
compilers to make string literals writable. What's changed over
time is that most compilers don't take advantage of that permission.

Ah! That's a shame. :)

Some compilers have an option still to do that. Note that this
option also has to be sure to separately allocate strings that are
otherwise equal.
They don't *have* to do that unless they make additional guarantees
beyond what the language specifies.

If I write:
char *a = "foo";
char *b = "foo";
a[0] = 'F';
puts(b);
and the puts call is actually executed, the language permits it to
print either "foo", or "Foo" (or "fnord", or a suffusion of yellow).
The behavior of the assignment to a[0] is undefined, and once you
do that all bets are off.

I believe the language should operate such that as I've defined a to point to "foo", and b to point to "foo", and these are separate strings, then they should be separate strings in memory, the same as if I'd said char* a="123"; char* b="456".

But if a compiler were to guarantee, as a language extension,
that string literals are meaningfully modifiable, then it would
probably have to guarantee that the strings pointed to by a and
b must be distinct (unless the compiler can prove that they're
never modified). The compiler's documentation would have to spell
out just what additional guarantees it offers. (Such an extension
would not make the compiler non-conforming, since any code that
takes advantage of it has undefined behavior.)

I personally believe it's a silly requirement to do such a comparison to save a few bytes of space by default. I'd rather have it always duplicated and then allow the developer to provide a manually inserted command line switch which specifically turns on that kind of checking, and that kind of substituting.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

Rick C. Hodgin said:
As I change source code from time to time I would like to be able to
edit the strings defined within this block, as it's closer to a real
source code layout with minimal overhead to maintain:

char* sourceCode[] =
{
"if (foo[9999]) {\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}\r\n"
};

This isn't relevant to your question, but why do you have explicit
"\r\n" line endings? If your program reads and/or writes the
source code in text mode (or if you're on a UNIX-like system),
line endings are marked by a single '\n' character, regardless of
the format used by the operating system.

To be consistent with the source file input I'm processing. Without using both \r and \n it gives warnings when opening the files that the line endings are not consistent.

The program I'm writing this for augments code generated by another tool. The other tool generates code for version 1.0 of the tool, and my tool modifies it for version 2.0.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

Rick C. Hodgin said:
I have a need for something like this, except that I need to edit
list[N]'s data, as in memcpy(list[0], "eno", 3):
char* list[] = { "one", "two", "three", "four" };
[...]

It would be helpful if you'd format your articles to have lines
no longer than about 72 columns. Usenet is not the web, and
newsreaders don't necessarily deal with arbitrarily long lines.
(My newsreader does split long lines, but not at word boundaries.)

Will do. I'm using Google Groups and it handles wrapping. I never knew
it was an issue for anyone.

Best regards,
Rick C. Hodgin
 
glen herrmannsfeldt

(snip, I wrote)
As I change source code from time to time I would like to be able
to edit the strings defined within this block, as it's closer to
a real source code layout with minimal overhead to maintain:
char* sourceCode[] =
{
"if (foo[9999]) {\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}\r\n"
};
(snip)

Because now I'm back in the days of BASICA and needing the
RENUM 100,10 ability to redistribute as my initially defined
numbering system gets bigger. Or I begin having unusual naming
conventions with extra parts tagged on (line41a or line41_1,
and so on) because (1) I must manually give everything names
so they are explicitly referenced thereafter, and because (2)
the code assigned to each item will change from time to time it
(3) introduces the possibility of additional errors due to the
mechanics of setting everything up (something the compiler
should handle).

My usual solution in that case is to put all the data into a file
of some kind, then, as part of the build process, usually with make,
convert that file into appropriate C, just before compiling it.
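
The conversion step might be sketched like this, assuming a small generator program run by the build that reads the data file line by line and calls a helper for each line. The helper name emit_c_literal and the four-space indent are illustrative choices, and only the most common escapes are handled.

```c
#include <stdio.h>

/*
 * Write `text` (one input line, including its line ending) to `out` as an
 * indented C string literal followed by a comma, escaping the characters
 * that cannot appear raw inside a literal.
 * Returns 0 on success, -1 on a write error.
 */
int emit_c_literal(FILE *out, const char *text)
{
    if (fputs("    \"", out) < 0)
        return -1;

    for (; *text != '\0'; text++) {
        int rc;

        switch (*text) {
        case '\\': rc = fputs("\\\\", out); break;
        case '"':  rc = fputs("\\\"", out); break;
        case '\r': rc = fputs("\\r", out);  break;
        case '\n': rc = fputs("\\n", out);  break;
        case '\t': rc = fputs("\\t", out);  break;
        default:   rc = fputc(*text, out);  break;
        }
        if (rc < 0)     /* both fputs and fputc report failure as EOF */
            return -1;
    }
    return fputs("\",\n", out) < 0 ? -1 : 0;
}
```

The generated file can then be pulled into the array initialiser with #include, so the editable text lives in an ordinary data file rather than in hand-numbered variables.
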
My current solution to do something along these lines
during initialization:
(snip)

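size_t i, len;
char* ptr;
for (i = 0; sourceCode[i] != NULL; i++)
{
    // Copy each literal, including its terminating null, into writable heap memory
    len = strlen(sourceCode[i]) + 1;
    ptr = (char*)malloc(len);
    memcpy(ptr, sourceCode[i], len);
    sourceCode[i] = ptr;
}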

This works ... but with the compiler switch it wouldn't even
be necessary. The compiler would remove the possibility of
these errors.

In many cases when one wants initialized char data to modify,
the new length can be different. In that case, this solution doesn't
work, which reminded me of one case that does:

char sourcecode[][80]={
"if (foo[9999]) {\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}\r\n",
};

In this case, you get the appropriate number of 80 element
char arrays, initialized to the given values.
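
A short sketch of how such a table might be used, assuming the 80-byte row width from the example. ROW_LEN, ROW_COUNT and set_row are names invented here; the bounds check is what lets a replacement be longer than the original line, as long as it still fits the row.

```c
#include <string.h>

#define ROW_LEN 80   /* row width taken from the example above */

char sourcecode[][ROW_LEN] = {
    "if (foo[9999]) {\r\n",
    " // Do something\r\n",
    "} else {\r\n",
    " // Do something else\r\n",
    "}\r\n",
};

/* Number of rows, computed by the compiler from the initialiser. */
#define ROW_COUNT (sizeof sourcecode / sizeof sourcecode[0])

/*
 * Replace row `i` with `text` if the row exists and the text fits,
 * terminating '\0' included. Returns 0 on success, -1 otherwise.
 */
int set_row(size_t i, const char *text)
{
    size_t len = strlen(text) + 1;

    if (i >= ROW_COUNT || len > ROW_LEN)
        return -1;
    memcpy(sourcecode[i], text, len);
    return 0;
}
```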

Oh, also, one of my favorite C features (Java also has), you
can have the extra comma on the last line. Convenient for program
generated text, though most likely in the standard as it allows
for easy preprocessor conditionals.

-- glen
 
