Constant Strings

Discussion in 'C Programming' started by Adam L., Aug 30, 2007.

  1. Adam L.

    Adam L. Guest

    Hello all, again.

    It's the Pascal guy trying to figure stuff out in C. :)

    One of my programming 'ways' in Pascal is to create a unit file that has
    most of the program's strings. Error messages, window titles, file paths,
    etc... These are all constants.

    1) What is the best way to have a long list of constant strings in C? I
    read somewhere that I shouldn't define variables in a header file (which I
    would do in the Interface part of a Pascal Unit). Do I just make some .c
    file with all the strings and #include it somewhere?

    2) What would you recommend as the type? A #define, const char[], ?

    Just a note - I'm not using any type of RAD environment. This is a Linux
    project. Much like my FreePascal coding, I use NEdit and the compiler. So
    I don't have a fancy resource editor to click on.

    Thanks!
     
    Adam L., Aug 30, 2007
    #1

  2. Adam L.

    Ian Collins Guest

    I would declare them as "extern const char*" in a header and define them
    in a source module.
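
    A minimal sketch of that split, shown as one listing so it compiles on
    its own. The file names app_strings.h and app_strings.c and the two
    string names are assumptions made for illustration, not names from the
    thread.

```c
/* --- app_strings.h (hypothetical header): declarations only --- */
extern const char *err_no_input;
extern const char *title_main;

/* --- app_strings.c (hypothetical source): the one definition of each --- */
const char *err_no_input = "input required";
const char *title_main   = "Main window";
```

    Any other .c file that includes the header can then use err_no_input
    and title_main without defining them again.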
     
    Ian Collins, Aug 30, 2007
    #2

  3. Adam L.

    Old Wolf Guest

    char const *const strings[] =
    { "string1"
    , "string2"
    , "the next string"
    };

    Then in your header file you can make this accessible to the
    outside world by writing:
    extern char const * const strings[];

    If you want some bounds checking you'll have to do that
    explicitly, e.g. include a:
    #define NUM_STRINGS 10
    in the header file, and then perhaps a static assert in
    the source file to check you actually have enough strings.
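
    One way the static assert could look, as a sketch that assumes a C11
    compiler (static_assert from <assert.h>); older compilers would need a
    different trick. NUM_STRINGS is set to 3 here only so that it matches
    the three strings in the example table.

```c
#include <assert.h>   /* static_assert (C11) */

#define NUM_STRINGS 3   /* would normally live in the header */

char const *const strings[] = {
    "string1",
    "string2",
    "the next string"
};

/* Compilation fails if the table and NUM_STRINGS disagree. */
static_assert(sizeof strings / sizeof strings[0] == NUM_STRINGS,
              "strings[] does not match NUM_STRINGS");
```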
     
    Old Wolf, Aug 30, 2007
    #3
  4. Adam L. said:

    "Best" is a somewhat nebulous term.

    One very simple way is to encapsulate the strings in a function:

    const char *GetErrorString(size_t idx)
    {
        const char *ma[] =
        {
            "OK",
            "Not enough wings for sodium substrate",
            "Hub light has fallen out",
            "Grain is too soft",
            "Ink leak in cowshed",
            "Microfilter is the wrong colour"
        };
        size_t len = sizeof ma / sizeof ma[0];
        return idx < len ? ma[idx] : NULL;
    }

    You might wish to consider making the message array static.
    When you understand all eight translation phases, you will be in a good
    position to realise not only why it's generally a bad idea to #include
    .c files, but also why on very rare occasions it can be a good idea.

    This is not one of those very rare occasions.
    Linux /is/ a RAD environment. :)
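
    A sketch of the "static" variant of the message function suggested in
    this post. The table is shortened to three entries for brevity; making
    the array static const means it is set up once rather than on every
    call.

```c
#include <stddef.h>   /* size_t, NULL */

const char *GetErrorString(size_t idx)
{
    /* static: one table for the whole program run, not one per call */
    static const char *const ma[] = {
        "OK",
        "Hub light has fallen out",
        "Grain is too soft"
    };
    size_t len = sizeof ma / sizeof ma[0];

    /* out-of-range codes yield NULL, as in the original function */
    return idx < len ? ma[idx] : NULL;
}
```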
     
    Richard Heathfield, Aug 30, 2007
    #4
  5. Adam L.

    CBFalconer Guest

    How are you going to refer to them? If by index number (possibly
    something enumerated) you have to create an array of pointers and
    initialize that. That can be a const array, with the pointers
    pointing to const strings. Put it in a .c file, and make a .h file
    that provides the essential information to other compilation units.
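
    A sketch of an enum-indexed table along these lines, assuming C99
    designated initializers. The file names, the enum and the message
    texts are made up for illustration.

```c
/* --- messages.h (hypothetical): what other compilation units see --- */
enum msg_id { MSG_OK, MSG_NO_INPUT, MSG_BAD_PATH, MSG_COUNT };
extern const char *const messages[MSG_COUNT];

/* --- messages.c (hypothetical): const array of pointers to const strings --- */
const char *const messages[MSG_COUNT] = {
    [MSG_OK]       = "OK",
    [MSG_NO_INPUT] = "input required",
    [MSG_BAD_PATH] = "cannot open file"
};
```

    Callers then write messages[MSG_NO_INPUT] instead of a bare index.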
     
    CBFalconer, Aug 30, 2007
    #5
  6. Adam L.

    Chris Dollin Guest

    And do that latter /with a program/, to avoid horrible didn't-match-up
    and didn't-recompile-everything errors.
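
    This is not the generator program the post asks for, but a
    preprocessor-only alternative (often called an X-macro) that some
    people use for the same goal of keeping names and strings in step.
    All identifiers below are hypothetical.

```c
/* Single list of (identifier, text) pairs; edit only this list. */
#define MESSAGE_LIST(X) \
    X(MSG_OK,       "OK") \
    X(MSG_NO_INPUT, "input required") \
    X(MSG_BAD_PATH, "cannot open file")

#define AS_ENUM(id, text)   id,
#define AS_STRING(id, text) text,

/* Expands to: MSG_OK, MSG_NO_INPUT, MSG_BAD_PATH, MSG_COUNT */
enum msg_id { MESSAGE_LIST(AS_ENUM) MSG_COUNT };

/* Expands to the three strings, in the same order as the enum. */
const char *const messages[] = { MESSAGE_LIST(AS_STRING) };
```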
     
    Chris Dollin, Aug 30, 2007
    #6
  7. Why "extern const char*" and not "extern const char[]"?

    (Yes, I can think of a reason - you don't have to change the interface
    if in the future you decide that the strings shouldn't really be
    constant)

    hp
     
    Peter J. Holzer, Aug 30, 2007
    #7
  8. As has been said, "best" is rather hard to pin down. If you want to
    name your strings (rather than having them indexed) you can do it like
    this:

    #ifndef H_MYSTRINGS
    #define H_MYSTRINGS

    #ifdef DEFINE_THE_STRINGS
    #define MY_STRING(n, s) const char n[] = s
    #else
    #define MY_STRING(n, s) extern const char n[]
    #endif

    MY_STRING(error_one, "input required");
    MY_STRING(error_two, "no input expected");
    #endif

    You put this in, say, "mystrings.h" and include it in any .c files
    that use strings. One, and only one, .c file will have this code:

    #define DEFINE_THE_STRINGS
    #include "mystrings.h"

    causing the const char arrays to be defined and initialised.

    You can reduce the problem of having so many global names by using
    token pasting to add a prefix to them all, if you like:

    #define MY_STRING(n, s) const char msg_##n[] = s
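
    Put together, the defining .c file might look roughly like this after
    the header is included. The sketch assumes the msg_ prefix is applied
    to both forms of the macro, so that the extern declarations and the
    definitions agree on the names.

```c
#define DEFINE_THE_STRINGS   /* only the one defining .c file does this */

/* --- contents of mystrings.h, with the prefix on both branches --- */
#ifdef DEFINE_THE_STRINGS
#define MY_STRING(n, s) const char msg_##n[] = s
#else
#define MY_STRING(n, s) extern const char msg_##n[]
#endif

MY_STRING(error_one, "input required");     /* defines msg_error_one */
MY_STRING(error_two, "no input expected");  /* defines msg_error_two */
```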

    I'll add an observation. This only pays off if these strings are used
    all over the place, and such programs are not common. If you find
    that you are referring to shared strings in lots of places, you may
    want to think about some other design.

    For example, error messages are often better handled by codes, with
    only one function that needs to know how to turn them into strings.
    In that case, the strings will just be in a table inside (or "close
    to") the error function.

    File names are usually best coming from outside of the program. They
    should be set in configuration files or supplied as command-line
    arguments. A few default names might be wanted, but it is likely that
    they will all be in one place and the more usual static declaration
    or simply a #define will suffice.
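
    A small sketch of the "default name, overridable from outside" idea.
    The macro DEFAULT_CONFIG_PATH, the file name app.conf and the function
    config_path are all hypothetical.

```c
#include <stddef.h>   /* NULL */

#define DEFAULT_CONFIG_PATH "app.conf"   /* hypothetical default */

/* Returns the first command-line argument when given, else the default. */
const char *config_path(int argc, char *const argv[])
{
    return (argc > 1 && argv[1] != NULL) ? argv[1] : DEFAULT_CONFIG_PATH;
}
```

    main() would pass its own argc and argv straight through.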

    To put it simply, not all strings are created alike, and collecting
    them together because they are strings may not be the right pattern.
     
    Ben Bacarisse, Aug 30, 2007
    #8
  9. Adam L.

    CBFalconer Guest

    And I can think of an anti-reason. You don't want to generate the
    extra coding to copy all those strings into their storage in the
    first place. The code generated is not proportional in size to the
    source text.
     
    CBFalconer, Aug 30, 2007
    #9
  10. Which extra coding? I was assuming that Ian meant something like this:

    const char *msg1 = "Hello, world";
    const char *msg2 = "How are your nasal demons?";
    ....

    and asked why he preferred that to this:

    const char msg1[] = "Hello, world";
    const char msg2[] = "How are your nasal demons?";
    ....

    In both cases there is no code here which copies anything. The linker
    produces a suitable data segment in the executable, which is loaded at
    startup.

    (The possible change I was hinting at was that if msg1 is a pointer, you
    can do something like

    fgets(s, sizeof(s), msg_catalog_fp);
    msg1 = mystrdup(s);

    in an initialization routine and the rest of the application won't
    notice any change - that is extra code, of course, but I mentioned that
    as a future option)
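
    A sketch of that future option. mystrdup in the post is not a standard
    function, so my_strdup below is a hypothetical stand-in, and
    override_msg1 is likewise an invented name for the initialization
    step.

```c
#include <stdlib.h>
#include <string.h>

const char *msg1 = "Hello, world";   /* built-in default */

/* Hypothetical stand-in for mystrdup(): malloc'd copy, or NULL on failure. */
static char *my_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Point msg1 at text obtained at run time; keep the default on failure. */
int override_msg1(const char *text)
{
    char *copy = my_strdup(text);
    if (copy == NULL)
        return -1;
    msg1 = copy;
    return 0;
}
```

    Code that only reads msg1 needs no change. With the array form,
    msg1 could not be redirected like this. Note that the sketch never
    frees the copy, which is acceptable only if it runs once at startup.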

    hp
     
    Peter J. Holzer, Aug 31, 2007
    #10
  11. Adam L.

    CBFalconer Guest

    In the first case the actual strings are located somewhere, and may
    be shared with other code. They are also non-writable. All that
    is inserted in the user memory is a pointer.

    In the second case something has to copy the strings into the user
    memory, byte by byte. The method of doing this can vary greatly.
    The results are NOT protected against alteration. The strings to
    be copied exist somewhere, possibly only in the object code module
    (as you suggested), but not limited to that.
     
    CBFalconer, Aug 31, 2007
    #11
  12. Adam L.

    Ian Collins Guest

    Because it's idiomatic C to use const char* for a string literal.
     
    Ian Collins, Aug 31, 2007
    #12
  13. Adam L.

    Flash Gordon Guest

    CBFalconer wrote, On 31/08/07 22:58:
    Apart from on implementations where the strings *will* have to be
    copied, such as some of the ones I have worked on.
    Why? Why can't the address of the array be the address of wherever the
    string started?
    The array is const qualified so attempting to modify it invokes
    undefined behaviour. This means the implementation can put it in
    read-only memory just as it can put string literals in writeable memory.
    Why does this apply to a const qualified array but not a string literal?
    Both are arrays, and modifying either invokes undefined behaviour.
     
    Flash Gordon, Sep 1, 2007
    #13
  14. Adam L.

    CBFalconer Guest

    Constant strings can be shared. The pointer system uses constant
    strings. The array system uses copies of strings, which are not
    necessarily constant.
    Remember the sharing?
     
    CBFalconer, Sep 1, 2007
    #14
  15. Adam L.

    CBFalconer Guest

    Also the const array can only be initialized at declaration with a
    constant. Hard to do from an external file. The pointer can be
    initialized at any time, it isn't a const.
     
    CBFalconer, Sep 1, 2007
    #15
  16. Adam L.

    pete Guest

    Because it was one of your programming 'ways' in Pascal,
    is not really a very good reason
    to write C code in a certain way.

    There is such a thing as
    C code that looks like it was written by a Pascal writer at gunpoint.
    I don't like it when I see code like that.
     
    pete, Sep 1, 2007
    #16
  17. Mostly irrelevant in this case, I think. Since the purpose is to build a
    message catalog, the strings will be mostly unique anyway.
    No, it does not use copies of strings. It uses initialized character
    arrays. As a specific example, let's look at what code gcc produces in
    these cases.

        .file "cb_ptr.c"
    .globl msg1
        .section .rodata
    .LC0:
        .string "Hello, world"
        .data
        .align 4
        .type msg1, @object
        .size msg1, 4
    msg1:
        .long .LC0
    .globl msg2
        .section .rodata
    .LC1:
        .string "How are your nasal demons?"
        .data
        .align 4
        .type msg2, @object
        .size msg2, 4
    msg2:
        .long .LC1
        .ident "GCC: (GNU) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)"
        .section .note.GNU-stack,"",@progbits


    There are the two strings (.LC0 and .LC1) in the .rodata section.
    Additionally, there are two .long (i.e., 4-byte) objects msg1 and msg2
    in the .data section, which are initialized with the addresses of the
    strings.

        .file "cb_arr.c"
    .globl msg1
        .section .rodata
        .type msg1, @object
        .size msg1, 13
    msg1:
        .string "Hello, world"
    .globl msg2
        .type msg2, @object
        .size msg2, 27
    msg2:
        .string "How are your nasal demons?"
        .ident "GCC: (GNU) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)"
        .section .note.GNU-stack,"",@progbits

    Again we have the two strings in the .rodata section, but this time they
    are called msg1 and msg2. So the only difference is that there is no
    extra indirection through an additional pointer.


    ACK. As demonstrated above.


    Neither is a C implementation required to share strings or put them into
    read-only memory. There is no difference between a string literal and an
    initialized const char[] of static duration, except that the latter must
    have a unique address (and I'm not even sure of that).
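
    The difference between the two forms is also visible from C itself
    through sizeof. A small sketch, with hypothetical names msg_ptr and
    msg_arr standing in for the two versions of msg1:

```c
#include <stddef.h>   /* size_t */

const char *msg_ptr   = "Hello, world";  /* pointer object plus a separate literal */
const char  msg_arr[] = "Hello, world";  /* the characters themselves */

/* Size of the pointer object, whatever the string length. */
size_t ptr_size(void) { return sizeof msg_ptr; }

/* Size of the array: 12 characters plus the terminating '\0'. */
size_t arr_size(void) { return sizeof msg_arr; }
```

    The array size of 13 agrees with the ".size msg1, 13" line in the
    cb_arr.c listing, and the pointer size corresponds to ".size msg1, 4"
    on the 32-bit target used for cb_ptr.c.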

    hp
     
    Peter J. Holzer, Sep 1, 2007
    #17
  18. Emulating B with C is idiomatic C?

    That was the reason I mentioned in the beginning. But apparently this
    wasn't the one Ian was thinking about.

    hp
     
    Peter J. Holzer, Sep 1, 2007
    #18