Constant Strings

Adam L.

Hello all, again.

It's the Pascal guy trying to figure stuff out in C. :)

One of my programming 'ways' in Pascal is to create a unit file that has
most of the program's strings. Error messages, window titles, file paths,
etc... These are all constants.

1) What is the best way to have a long list of constant strings in C? I
read somewhere that I shouldn't define variables in a header file (which I
would do in the Interface part of a Pascal Unit). Do I just make some .c
file with all the strings and #include it somewhere?

2) What would you recommend as the type? A #define, const char[], ?

Just a note - I'm not using any type of RAD environment. This is a Linux
project. Much like my FreePascal coding, I use NEdit and the compiler. So
I don't have a fancy resource editor to click on.

Thanks!
 
Ian Collins

Adam said:
Hello all, again.

It's the Pascal guy trying to figure stuff out in C. :)

One of my programming 'ways' in Pascal is to create a unit file that has
most of the program's strings. Error messages, window titles, file paths,
etc... These are all constants.

1) What is the best way to have a long list of constant strings in C? I
read somewhere that I shouldn't define variables in a header file (which I
would do in the Interface part of a Pascal Unit). Do I just make some .c
file with all the strings and #include it somewhere?

2) What would you recommend as the type? A #define, const char[], ?

I would declare them as "extern const char*" in a header and define them
in a source module.
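
A minimal sketch of that layout, collapsed into a single file so it compiles on its own. The file names and the identifiers msg_title, msg_no_file and title_length are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* --- would live in a header, e.g. a hypothetical "strings.h" --- */
extern const char *msg_title;
extern const char *msg_no_file;

/* --- would live in exactly one source file, e.g. "strings.c" --- */
const char *msg_title   = "My Program";
const char *msg_no_file = "file not found";

/* --- any other source file includes the header and uses the names --- */
size_t title_length(void)
{
    return strlen(msg_title);
}
```

Every file that includes the header sees only the declarations; the linker resolves them to the one set of definitions.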
 
Old Wolf

One of my programming 'ways' in Pascal is to create a unit file that has
most of the program's strings. Error messages, window titles, file paths,
etc... These are all constants.

1) What is the best way to have a long list of constant strings in C?

char const *const strings[] =
{ "string1"
, "string2"
, "the next string"
};

Then in your header file you can make this accessible to the
outside world by writing:
extern char const * const strings[];

If you want some bounds checking you'll have to do that
explicitly, e.g. include a:
#define NUM_STRINGS 10
in the header file, and then perhaps a static assert in
the source file to check you actually have enough strings.
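
One way the count check could look, assuming a C11 compiler (for _Static_assert). NUM_STRINGS is set to 3 here to match the three example strings, and get_string is a hypothetical accessor added for illustration.

```c
#include <stddef.h>

/* --- header part (hypothetical "strings.h") --- */
#define NUM_STRINGS 3
extern char const *const strings[];

/* --- source part --- */
char const *const strings[] =
{ "string1"
, "string2"
, "the next string"
};

/* C11 compile-time check: the build fails if the table and
   NUM_STRINGS disagree. */
_Static_assert(sizeof strings / sizeof strings[0] == NUM_STRINGS,
               "strings[] does not have NUM_STRINGS entries");

/* Hypothetical bounds-checked accessor: NULL for an out-of-range index. */
char const *get_string(size_t i)
{
    return i < NUM_STRINGS ? strings[i] : NULL;
}
```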
 
Richard Heathfield

Adam L. said:

1) What is the best way to have a long list of constant strings in C?

"Best" is a somewhat nebulous term.

One very simple way is to encapsulate the strings in a function:

const char *GetErrorString(size_t idx)
{
  const char *ma[] =
  {
    "OK",
    "Not enough wings for sodium substrate",
    "Hub light has fallen out",
    "Grain is too soft",
    "Ink leak in cowshed",
    "Microfilter is the wrong colour"
  };
  size_t len = sizeof ma / sizeof ma[0];
  return idx < len ? ma[idx] : NULL;
}

You might wish to consider making the message array static.
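
A sketch of the static variant, with the message list shortened for brevity. The extra const on the pointers is an addition not present in the original function.

```c
#include <stddef.h>

const char *GetErrorString(size_t idx)
{
    /* static: the table has static storage duration, so it need not be
       set up again each time the function is entered */
    static const char *const ma[] =
    {
        "OK",
        "Grain is too soft",
        "Ink leak in cowshed"
    };
    size_t len = sizeof ma / sizeof ma[0];
    return idx < len ? ma[idx] : NULL;
}
```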
(which I
would do in the Interface part of a Pascal Unit). Do I just make some
.c file with all the strings and #include it somewhere?

When you understand all eight translation phases, you will be in a good
position to realise not only why it's generally a bad idea to #include
.c files, but also why on very rare occasions it can be a good idea.

This is not one of those very rare occasions.
Just a note - I'm not using any type of RAD environment. This is a
Linux project.

Linux /is/ a RAD environment. :)
 
CBFalconer

Adam L. said:
It's the Pascal guy trying to figure stuff out in C. :)

One of my programming 'ways' in Pascal is to create a unit file
that has most of the program's strings. Error messages, window
titles, file paths, etc... These are all constants.

How are you going to refer to them? If by index number (possibly
something enumerated) you have to create an array of pointers and
initialize that. That can be a const array, with the pointers
pointing to const strings. Put it in a .c file, and make a .h file
that provides the essential information to other compilation units.
 
Chris Dollin

CBFalconer said:
How are you going to refer to them? If by index number (possibly
something enumerated) you have to create an array of pointers and
initialize that. That can be a const array, with the pointers
pointing to const strings. Put it in a .c file, and make a .h file
that provides the essential information to other compilation units.

And do that latter /with a program/, to avoid horrible didn't-match-up
and didn't-recompile-everything errors.
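
Where an external generator program is more than is wanted, the preprocessor can do a similar job. The sketch below uses the "X-macro" technique, which is a substitute for (not the same thing as) generating the files with a program: one list is expanded twice, so the enumeration and the table cannot drift apart. All identifiers are hypothetical.

```c
#include <stddef.h>

/* Single list of (identifier, text) pairs. */
#define MESSAGE_LIST(X)                 \
    X(MSG_OK,       "OK")               \
    X(MSG_NO_INPUT, "input required")   \
    X(MSG_BAD_PATH, "bad path")

/* Expand the list once as enumerators; MSG_COUNT ends up as the count. */
#define AS_ENUM(id, text) id,
enum msg_id { MESSAGE_LIST(AS_ENUM) MSG_COUNT };

/* Expand it again as the matching table of strings. */
#define AS_TEXT(id, text) text,
static const char *const message_table[] = { MESSAGE_LIST(AS_TEXT) };

/* Bounds-checked lookup: NULL for anything outside the list. */
const char *message_text(enum msg_id id)
{
    return (unsigned)id < (unsigned)MSG_COUNT ? message_table[id] : NULL;
}
```

In a real project the list and the enum would go in the .h file and the table in the .c file.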
 
Peter J. Holzer

Adam said:
One of my programming 'ways' in Pascal is to create a unit file that has
most of the program's strings. Error messages, window titles, file paths,
etc... These are all constants.

1) What is the best way to have a long list of constant strings in C? I
read somewhere that I shouldn't define variables in a header file (which I
would do in the Interface part of a Pascal Unit). Do I just make some .c
file with all the strings and #include it somewhere?

2) What would you recommend as the type? A #define, const char[], ?
I would declare them as "extern const char*" in a header and define them
in source module.

Why "extern const char*" and not "extern const char[]"?

(Yes, I can think of a reason - you don't have to change the interface
if in the future you decide that the strings shouldn't really be
constant)

hp
 
Ben Bacarisse

Adam L. said:
One of my programming 'ways' in Pascal is to create a unit file that has
most of the program's strings. Error messages, window titles, file paths,
etc... These are all constants.

1) What is the best way to have a long list of constant strings in C? I
read somewhere that I shouldn't define variables in a header file (which I
would do in the Interface part of a Pascal Unit). Do I just make some .c
file with all the strings and #include it somewhere?

As has been said, "best" is rather hard to pin down. If you want to
name your strings (rather than having them indexed) you can do it like
this:

#ifndef H_MYSTRINGS
#define H_MYSTRINGS

#ifdef DEFINE_THE_STRINGS
#define MY_STRING(n, s) const char n[] = s
#else
#define MY_STRING(n, s) extern const char n[]
#endif

MY_STRING(error_one, "input required");
MY_STRING(error_two, "no input expected");
#endif

You put this in, say, "mystrings.h" and include it in any .c files
that use strings. One, and only one, .c file will have this code:

#define DEFINE_THE_STRINGS
#include "mystrings.h"

causing the const char arrays to be defined and initialised.
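
To see what the two expansions amount to, here is a single-file simulation: the same two MY_STRING lines are expanded first as a user file would see them and then as the defining file would. The helper error_one_length is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* What a file sees when DEFINE_THE_STRINGS is not set: declarations only. */
#define MY_STRING(n, s) extern const char n[]
MY_STRING(error_one, "input required");
MY_STRING(error_two, "no input expected");
#undef MY_STRING

/* What the one defining file sees: the same lines become definitions. */
#define MY_STRING(n, s) const char n[] = s
MY_STRING(error_one, "input required");
MY_STRING(error_two, "no input expected");
#undef MY_STRING

/* Hypothetical user code. */
size_t error_one_length(void)
{
    return strlen(error_one);
}
```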

You can reduce the problem of having so many global names by using
token pasting to add a prefix to them all, if you like:

#define MY_STRING(n, s) const char msg_##n[] = s

I'll add an observation. This only pays off if these strings are used
all over the place, and such programs are not common. If you find
that you are referring to shared strings in lots of places, you may
want to think about some other design.

For example, error messages are often better handled by codes, with
only one function that needs to know how to turn them into strings.
In that case, the strings will just be in a table inside (or "close
to") the error function.

File names are usually best coming from outside of the program. They
should be set in configuration files or supplied as command-line
arguments. A few default names might be wanted, but it is likely that
they will be all in one place and the more usual static declaration
or simply a #define will suffice.
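
As a rough illustration of a default that the command line can override, assuming the path is passed as the first argument. DEFAULT_CONFIG_PATH and config_path are made-up names, and the path itself is only a placeholder.

```c
#include <stddef.h>

#define DEFAULT_CONFIG_PATH "/etc/myprog.conf"  /* hypothetical default */

/* Use the first command-line argument if there is one,
   otherwise fall back to the compiled-in default. */
const char *config_path(int argc, char *argv[])
{
    return (argc > 1 && argv[1] != NULL) ? argv[1] : DEFAULT_CONFIG_PATH;
}
```

Called from main as config_path(argc, argv), only this one function needs to know the default name.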

To put it simply, not all strings are created alike, and collecting
them together because they are strings may not be the right pattern.
 
CBFalconer

Peter J. Holzer said:
Ian Collins said:
Adam said:
One of my programming 'ways' in Pascal is to create a unit file
that has most of the program's strings. Error messages, window
titles, file paths, etc... These are all constants.

1) What is the best way to have a long list of constant strings
in C? I read somewhere that I shouldn't define variables in a
header file (which I would do in the Interface part of a Pascal
Unit). Do I just make some .c file with all the strings and
#include it somewhere?

2) What would you recommend as the type? A #define, const
char[], ?

I would declare them as "extern const char*" in a header and
define them in source module.

Why "extern const char*" and not "extern const char[]"?

(Yes, I can think of a reason - you don't have to change the
interface if in the future you decide that the strings shouldn't
really be constant)

And I can think of an anti-reason. You don't want to generate the
extra coding to copy all those strings into their storage in the
first place. The code generated is not proportional in size to the
source text.
 
Peter J. Holzer

Peter J. Holzer said:
Ian Collins said:
Adam L. wrote:
One of my programming 'ways' in Pascal is to create a unit file
that has most of the program's strings. Error messages, window
titles, file paths, etc... These are all constants.

1) What is the best way to have a long list of constant strings
in C? I read somewhere that I shouldn't define variables in a
header file (which I would do in the Interface part of a Pascal
Unit). Do I just make some .c file with all the strings and
#include it somewhere?

2) What would you recommend as the type? A #define, const
char[], ?

I would declare them as "extern const char*" in a header and
define them in source module.

Why "extern const char*" and not "extern const char[]"?

(Yes, I can think of a reason - you don't have to change the
interface if in the future you decide that the strings shouldn't
really be constant)

And I can think of an anti-reason. You don't want to generate the
extra coding to copy all those strings into their storage in the
first place.

Which extra coding? I was assuming that Ian meant something like this:

const char *msg1 = "Hello, world";
const char *msg2 = "How are your nasal demons?";
....

and asked why he preferred that to this:

const char msg1[] = "Hello, world";
const char msg2[] = "How are your nasal demons?";
....

In both cases there is no code here which copies anything. The linker
produces a suitable data segment in the executable, which is loaded at
startup.

(The possible change I was hinting at was that if msg1 is a pointer, you
can do something like

fgets(s, sizeof(s), msg_catalog_fp);
msg1 = mystrdup(s);

in an initialization routine and the rest of the application won't
notice any change - that is extra code, of course, but I mentioned that
as a future option)
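
A self-contained sketch of that future option. Instead of reading from a catalog file it takes the replacement text as a parameter, and it uses malloc and memcpy in place of the mystrdup helper; override_msg1 is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

const char *msg1 = "Hello, world";   /* compiled-in default */

/* Hypothetical init routine: point msg1 at a heap copy of `text`.
   Returns 0 on success, -1 if allocation fails (msg1 is then unchanged).
   The copy is deliberately never freed; it lives until the program exits. */
int override_msg1(const char *text)
{
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, n);
    msg1 = copy;
    return 0;
}
```

Code that only reads msg1 is unaffected. This would not be possible if msg1 were declared as const char msg1[].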

hp
 
CBFalconer

Peter J. Holzer said:
.... snip ...

Which extra coding? I was assuming that Ian meant something like this:

const char *msg1 = "Hello, world";
const char *msg2 = "How are your nasal demons?";
...

and asked why he preferred that to this:

const char msg1[] = "Hello, world";
const char msg2[] = "How are your nasal demons?";
...

In both cases there is no code here which copies anything. The linker
produces a suitable data segment in the executable, which is loaded at
startup.

In the first case the actual strings are located somewhere, and may
be shared with other code. They are also non-writable. All that
is inserted in the user memory is a pointer.

In the second case something has to copy the strings into the user
memory, byte by byte. The method of doing this can vary greatly.
The results are NOT protected against alteration. The strings to
be copied exist somewhere, possibly only in the object code module
(as you suggested), but not limited to that.
 
Ian Collins

Peter said:
Adam said:
One of my programming 'ways' in Pascal is to create a unit file that has
most of the program's strings. Error messages, window titles, file paths,
etc... These are all constants.

1) What is the best way to have a long list of constant strings in C? I
read somewhere that I shouldn't define variables in a header file (which I
would do in the Interface part of a Pascal Unit). Do I just make some .c
file with all the strings and #include it somewhere?

2) What would you recommend as the type? A #define, const char[], ?
I would declare them as "extern const char*" in a header and define them
in source module.

Why "extern const char*" and not "extern const char[]"?
Because it's idiomatic C to use const char* for a string literal.
 
Flash Gordon

CBFalconer wrote, On 31/08/07 22:58:
:
... snip ...
Which extra coding? I was assuming that Ian meant something like this:

const char *msg1 = "Hello, world";
const char *msg2 = "How are your nasal demons?";
...

and asked why he preferred that to this:

const char msg1[] = "Hello, world";
const char msg2[] = "How are your nasal demons?";
...

In both cases there is no code here which copies anything. The linker
produces a suitable data segment in the executable, which is loaded at
startup.

In the first case the actual strings are located somewhere, and may
be shared with other code. They are also non-writable. All that
is inserted in the user memory is a pointer.

Apart from on implementations where the strings *will* have to be
copied, such as some of the ones I have worked on.
In the second case something has to copy the strings into the user
memory, byte by byte.

Why? Why can't the address of the array be the address of wherever the
string started?
The method of doing this can vary greatly.
The results are NOT protected against alteration.

The array is const qualified so attempting to modify it invokes
undefined behaviour. This means the implementation can put it in read
only memory just as it can put string literals in writeable memory.
The strings to
be copied exist somewhere, possibly only in the object code module
(as you suggested), but not limited to that.

Why does this apply to a const qualified array but not a string literal?
Both are arrays, and modifying either invokes undefined behaviour.
 
CBFalconer

Flash said:
CBFalconer wrote, On 31/08/07 22:58:
:

... snip ...
Which extra coding? I was assuming that Ian meant something like this:

const char *msg1 = "Hello, world";
const char *msg2 = "How are your nasal demons?";
...

and asked why he preferred that to this:

const char msg1[] = "Hello, world";
const char msg2[] = "How are your nasal demons?";
...

In both cases there is no code here which copies anything. The linker
produces a suitable data segment in the executable, which is loaded at
startup.

In the first case the actual strings are located somewhere, and may
be shared with other code. They are also non-writable. All that
is inserted in the user memory is a pointer.

Apart from on implementations where the strings *will* have to be
copied, such as some of the ones I have worked on.
In the second case something has to copy the strings into the user
memory, byte by byte.

Why? Why can't the address of the array be the address of
wherever the string started?

Constant strings can be shared. The pointer system uses constant
strings. The array system uses copies of strings, which are not
necessarily constant.
The array is const qualified so attempting to modify it invokes
undefined behaviour. This means the implementation can put it in
read only memory just as it can put string literals in writeable
memory.


Why does this apply to a const qualified array but not a string
literal? Both are arrays, and modifying either invokes undefined
behaviour.

Remember the sharing?
 
CBFalconer

Ian said:
Peter J. Holzer wrote:
.... snip ...
Why "extern const char*" and not "extern const char[]"?

Because it's idiomatic C to use const char* for a string literal.

Also the const array can only be initialized at declaration with a
constant. Hard to do from an external file. The pointer can be
initialized at any time, it isn't a const.
 
pete

Adam said:
Hello all, again.

It's the Pascal guy trying to figure stuff out in C. :)

One of my programming 'ways' in Pascal
is to create a unit file that has most of the program's strings.

Because it was one of your programming 'ways' in Pascal,
is not really a very good reason
to write C code in a certain way.

There is such a thing as
C code that looks like it was written by a Pascal writer at gunpoint.
I don't like it when I see code like that.
 
Peter J. Holzer

Flash said:
CBFalconer wrote, On 31/08/07 22:58:
:
Which extra coding? I was assuming that Ian meant something like this:

const char *msg1 = "Hello, world";
const char *msg2 = "How are your nasal demons?";
...

and asked why he preferred that to this:

const char msg1[] = "Hello, world";
const char msg2[] = "How are your nasal demons?";
...

In both cases there is no code here which copies anything. The linker
produces a suitable data segment in the executable, which is loaded at
startup.

In the first case the actual strings are located somewhere, and may
be shared with other code. They are also non-writable. All that
is inserted in the user memory is a pointer.

Apart from on implementations where the strings *will* have to be
copied, such as some of the ones I have worked on.
In the second case something has to copy the strings into the user
memory, byte by byte.

Why? Why can't the address of the array be the address of
wherever the string started?

Constant strings can be shared.

Mostly irrelevant in this case, I think. Since the purpose is to build a
message catalog, the strings will be mostly unique anyway.
The pointer system uses constant strings. The array system uses
copies of strings, which are not necessarily constant.

No, it does not use copies of strings. It uses initialized character
arrays. As a specific example, let's look at what code gcc produces in
these cases.

        .file   "cb_ptr.c"
.globl msg1
        .section        .rodata
.LC0:
        .string "Hello, world"
        .data
        .align 4
        .type   msg1, @object
        .size   msg1, 4
msg1:
        .long   .LC0
.globl msg2
        .section        .rodata
.LC1:
        .string "How are your nasal demons?"
        .data
        .align 4
        .type   msg2, @object
        .size   msg2, 4
msg2:
        .long   .LC1
        .ident  "GCC: (GNU) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)"
        .section        .note.GNU-stack,"",@progbits


There are the two strings (.LC0 and .LC1) in the .rodata section.
Additionally, there are two .long (i.e., 4-byte) objects msg1 and msg2
in the .data section, which are initialized with the addresses of the
strings.
const char msg1[] = "Hello, world";
const char msg2[] = "How are your nasal demons?";

        .file   "cb_arr.c"
.globl msg1
        .section        .rodata
        .type   msg1, @object
        .size   msg1, 13
msg1:
        .string "Hello, world"
.globl msg2
        .type   msg2, @object
        .size   msg2, 27
msg2:
        .string "How are your nasal demons?"
        .ident  "GCC: (GNU) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)"
        .section        .note.GNU-stack,"",@progbits

Again we have the two strings in the .rodata section, but this time they
are called msg1 and msg2. So the only difference is that there is no
extra indirection through an additional pointer.
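
The same difference is visible from within C through sizeof, without reading assembly. A small sketch; msg_ptr, msg_arr and the two helper functions are hypothetical names.

```c
#include <stddef.h>

const char *msg_ptr   = "Hello, world";  /* pointer object plus a separate string */
const char  msg_arr[] = "Hello, world";  /* the array itself holds the characters */

/* Size of the pointer object only, i.e. sizeof(const char *). */
size_t ptr_object_size(void) { return sizeof msg_ptr; }

/* Size of the whole array: 12 characters plus the terminating '\0'. */
size_t arr_object_size(void) { return sizeof msg_arr; }
```

arr_object_size() gives 13, matching the ".size msg1, 13" line in the array listing, while ptr_object_size() gives the platform's pointer size (4 in the pointer listing).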


ACK. As demonstrated above.


Neither is a C implementation required to share strings or put them into
read-only memory. There is no difference between a string literal and an
initialized const char[] of static duration, except that the latter must
have a unique address (and I'm not even sure of that).

hp
 
Peter J. Holzer

Ian said:
Peter said:
Why "extern const char*" and not "extern const char[]"?

Because it's idiomatic C to use const char* for a string literal.

Emulating B with C is idiomatic C?

Also the const array can only be initialized at declaration with a
constant. Hard to do from an external file. The pointer can be
initialized at any time, it isn't a const.

That was the reason I mentioned in the beginning. But apparently this
wasn't the one Ian was thinking about.

hp
 
