Mixed initializer for char arrays?

Lauri Alanko

There are two ways to initialize a char array: with integer
expressions or with a string literal:

char foo[3] = { 102, 111, 111 };
char bar[4] = "bar";

However, I'd like to initialize the array so that its first element is
defined with a (constant) integer expression, and the rest of the
elements are defined with a string literal, i.e. something like:

char baz[4] = { 3, "baz" };

This is of course illegal. What I can do is:

char* baz = (char*)&(struct {char i; char c[3];}){ 3, "baz"};

But to my understanding there can be padding between i and c, so this
isn't fully portable. Is there a better way?
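One way to turn the padding worry into a build failure rather than a silent bug is a compile-time layout check. The following is only a sketch under assumptions: the names `counted3`, `counted3_is_packed`, `baz_block` and `counted3_bytes` are invented for illustration, and the negative-array-size typedef stands in for C11's `_Static_assert` so that it also works with C99.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical named version of the anonymous struct from the question. */
struct counted3 { char i; char c[3]; };

/* Compile-time check: the array size becomes -1, and compilation fails,
   if the implementation inserts padding before or after 'c'. */
typedef char counted3_is_packed[
    (offsetof(struct counted3, c) == 1 &&
     sizeof (struct counted3) == 4) ? 1 : -1];

/* 'c' has exactly three elements, so the terminating '\0' of "baz"
   is not stored. */
const struct counted3 baz_block = { 3, "baz" };

/* View the struct as raw bytes; meaningful only because the check
   above passed. */
const char *counted3_bytes(const struct counted3 *p) {
    assert(p != NULL);
    return (const char *)p;
}
```

This does not make the struct approach portable; it only makes a non-matching platform refuse to compile instead of producing a wrong layout.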


Lauri
 
Eric Sosman

There are two ways to initialize a char array: with integer
expressions or with a string literal:

char foo[3] = { 102, 111, 111 };
char bar[4] = "bar";

However, I'd like to initialize the array so that its first element is
defined with a (constant) integer expression, and the rest of the
elements are defined with a string literal, i.e. something like:

char baz[4] = { 3, "baz" };

This is of course illegal. What I can do is:

char* baz = (char*)&(struct {char i; char c[3];}){ 3, "baz"};

But to my understanding there can be padding between i and c, so this
isn't fully portable. Is there a better way?

char baz[4] = "\003baz";

Note that there's no trailing '\0', because of the [4].
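To illustrate the remark about the missing terminator, here is a small sketch comparing an explicitly sized array with one whose size is deduced. The names `baz_exact`, `baz_auto` and `counted_len` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Exactly four elements: 3, 'b', 'a', 'z'. There is no room for the
   terminating '\0', so this is not a C string. */
char baz_exact[4] = "\003baz";

/* Size deduced from the literal: five elements, terminator included. */
char baz_auto[] = "\003baz";

/* Read the count stored in the first byte of such an array. */
size_t counted_len(const char *s) {
    assert(s != NULL);
    return (unsigned char)s[0];
}
```

Passing `baz_exact + 1` to functions such as `strlen` would read past the end of the array, which is why the stored count has to be used instead.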
 
Barry Schwarz

There are two ways to initialize a char array: with integer
expressions or with a string literal:

char foo[3] = { 102, 111, 111 };
char bar[4] = "bar";

However, I'd like to initialize the array so that its first element is
defined with a (constant) integer expression, and the rest of the
elements are defined with a string literal, i.e. something like:

char baz[4] = { 3, "baz" };

This is of course illegal. What I can do is:

char* baz = (char*)&(struct {char i; char c[3];}){ 3, "baz"};

But to my understanding there can be padding between i and c, so this
isn't fully portable. Is there a better way?

Consider using an escape sequence, such as
char baz[4] = "\003baz";
 
James Kuyper

There are two ways to initialize a char array: with integer
expressions or with a string literal:

char foo[3] = { 102, 111, 111 };

As a general rule, if you're going to use a character type to store
numbers, you should explicitly declare it as either 'signed' or
'unsigned', depending upon how you're going to use it. Otherwise, weird
things can happen if you port the code to a platform where plain char
has a different signedness than the one you're expecting.

char bar[4] = "bar";

However, I'd like to initialize the array so that its first element is
defined with a (constant) integer expression, and the rest of the
elements are defined with a string literal, i.e. something like:

char baz[4] = { 3, "baz" };

This is of course illegal. What I can do is:

char* baz = (char*)&(struct {char i; char c[3];}){ 3, "baz"};

But to my understanding there can be padding between i and c, so this
isn't fully portable. Is there a better way?

Would this be acceptable?

char baz[4] = "\003baz";

By the way, you do realize that neither foo nor baz contains a
null-terminated string? If that wasn't deliberate, it's probably a
problem. If it was deliberate, I'd recommend adding a comment to your
code about that fact, for the benefit of future maintenance programmers.
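The signedness point can be sketched as follows. This is an illustration only: `count_from_byte`, `plain_char_value` and `big_count` are hypothetical names, and the value 200 is chosen simply because it does not fit in a signed 8-bit char.

```c
#include <assert.h>
#include <stddef.h>

/* A count byte that is too large for a signed 8-bit char. */
unsigned char big_count[1] = { 200 };

/* Reading through unsigned char always yields 0..UCHAR_MAX. */
int count_from_byte(const void *p) {
    assert(p != NULL);
    return *(const unsigned char *)p;
}

/* Reading the same byte through plain char yields a negative value
   on implementations where plain char is signed. */
int plain_char_value(const void *p) {
    assert(p != NULL);
    return *(const char *)p;
}
```

With `unsigned char` the stored count reads back as 200 everywhere; with plain `char` the result depends on the platform.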
 
Lauri Alanko

char baz[4] = "\003baz";

I see only a string literal, no integer expression.

The initializer is to be produced by a macro. The integer expression
involves sizeof, so the above approach doesn't work.

Well, except that I can do:

#define FOO(n,s) \
(n == 0 ? "\0" s : n == 1 ? "\1" s : n == 2 ? "\2" s : ....)

for 256 cases or so. This hardly seems ideal, though.


Lauri
 
Ike Naar

char baz[4] = "\003baz";

I see only a string literal, no integer expression.

The initializer is to be produced by a macro. The integer expression
involves sizeof, so the above approach doesn't work.

Well, except that I can do:

#define FOO(n,s) \
(n == 0 ? "\0" s : n == 1 ? "\1" s : n == 2 ? "\2" s : ....)

for 256 cases or so. This hardly seems ideal, though.

Would

char baz[4] = { 3, 'b', 'a', 'z' };

be acceptable? Or something like

#define INIT3(s) { 3, s[0], s[1], s[2] }

char foo[4] = INIT3("foo");
char bar[4] = INIT3("bar");
char baz[4] = INIT3("baz");
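A variant of the macro that derives the count from `sizeof`, as the macro requirement mentioned earlier in the thread suggests, might look like the sketch below. Two assumptions are worth stating: the helper `init3_count` is invented for the example, and the array is given automatic storage because an expression such as `"bar"[0]` is not required to be accepted as a constant expression in an initializer for a static object.

```c
#include <assert.h>
#include <string.h>

/* Count comes from sizeof, payload from indexing the literal.
   Still limited to literals of exactly three characters. */
#define INIT3(s) { sizeof (s) - 1, (s)[0], (s)[1], (s)[2] }

/* Builds a counted array inside a function, checks the payload,
   and returns the stored count byte. */
int init3_count(void) {
    char bar[4] = INIT3("bar");
    assert(memcmp(bar + 1, "bar", 3) == 0);
    return bar[0];
}
```

The fixed arity is the obvious drawback: each literal length needs its own macro.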
 
Mark Bluemel

char baz[4] = "\003baz";

I see only a string literal, no integer expression.

The initializer is to be produced by a macro. The integer expression
involves sizeof, so the above approach doesn't work.

Perhaps you could use a more advanced preprocessor such as "m4"?

Or could you generate (the relevant parts of) your source code from
another program?
 
Shao Miller

There are two ways to initialize a char array: with integer
expressions or with a string literal:

char foo[3] = { 102, 111, 111 };
char bar[4] = "bar";

Hmmm...

However, I'd like to initialize the array so that its first element is
defined with a (constant) integer expression, and the rest of the
elements are defined with a string literal, i.e. something like:

char baz[4] = { 3, "baz" };

This is of course illegal. What I can do is:

char* baz = (char*)&(struct {char i; char c[3];}){ 3, "baz"};

But to my understanding there can be padding between i and c, so this
isn't fully portable. Is there a better way?

It looks like you have C99, based on your "can do" with a compound
literal. If you don't mind having roughly an equal amount of padding to
your string literals, perhaps you could use something like:

#define M_FOO_STR "foo"

#include <stdio.h>

int main(void) {
    char
        foo_[2][sizeof M_FOO_STR] = {
            { [sizeof M_FOO_STR - 1] = sizeof M_FOO_STR },
            { M_FOO_STR },
        },
        * foo = foo_[0] + sizeof M_FOO_STR - 1;
    char * i = foo + 1;

    printf("size: %d, chars: { ", foo[0]);
    while (*i)
        printf("'%c', ", *i++);
    printf("'\\0' }\n");
    return 0;
}
 
Shao Miller

There are two ways to initialize a char array: with integer
expressions or with a string literal:

char foo[3] = { 102, 111, 111 };
char bar[4] = "bar";

Hmmm...

However, I'd like to initialize the array so that its first element is
defined with a (constant) integer expression, and the rest of the
elements are defined with a string literal, i.e. something like:

char baz[4] = { 3, "baz" };

This is of course illegal. What I can do is:

char* baz = (char*)&(struct {char i; char c[3];}){ 3, "baz"};

But to my understanding there can be padding between i and c, so this
isn't fully portable. Is there a better way?

It looks like you have C99, based on your "can do" with a compound
literal. If you don't mind having roughly an equal amount of padding to
your string literals, perhaps you could use something like:

#define M_DECL_STRING(id, string) \
char \
id ## _[2][sizeof string] = { \
{ [sizeof string - 1] = sizeof string }, \
{ string }, \
}, \
* id = id ## _[0] + sizeof string - 1

#include <stdio.h>

#define M_FOO_STR "foo"
#define M_BAR_STR "hum a few bars"

static M_DECL_STRING(foo, M_FOO_STR);
static M_DECL_STRING(bar, M_BAR_STR);

int main(void) {
    char * i;

    i = foo + 1;
    printf("size: %d, chars: { ", foo[0]);
    while (*i)
        printf("'%c', ", *i++);
    printf("'\\0' }\n");

    i = bar + 1;
    printf("size: %d, chars: { ", bar[0]);
    while (*i)
        printf("'%c', ", *i++);
    printf("'\\0' }\n");

    return 0;
}
 
Shao Miller

There are two ways to initialize a char array: with integer
expressions or with a string literal:

char foo[3] = { 102, 111, 111 };
char bar[4] = "bar";

Hmmm...

However, I'd like to initialize the array so that its first element is
defined with a (constant) integer expression, and the rest of the
elements are defined with a string literal, i.e. something like:

char baz[4] = { 3, "baz" };

This is of course illegal. What I can do is:

char* baz = (char*)&(struct {char i; char c[3];}){ 3, "baz"};

But to my understanding there can be padding between i and c, so this
isn't fully portable. Is there a better way?

It looks like you have C99, based on your "can do" with a compound
literal. If you don't mind having roughly an equal amount of padding to
your string literals, perhaps you could use something like:
#define M_DECL_STRING(id, string) \
char \
id ## _[2][sizeof string] = { \
{ [sizeof string - 1] = sizeof string }, \
{ string }, \
}, \
* id = id ## _[0] + sizeof string - 1

#include <stdio.h>

#define M_FOO_STR "foo"
#define M_BAR_STR "hum a few bars"

static M_DECL_STRING(foo, M_FOO_STR);
static M_DECL_STRING(bar, M_BAR_STR);

static void dump_string(char * string) {
    printf("size: %d, chars: { ", *string);
    while (*++string)
        printf("'%c', ", *string);
    printf("'\\0' }\n");
    return;
}

int main(void) {
    dump_string(foo);
    dump_string(bar);
    return 0;
}
 
Shao Miller

But make sure that your string literals do not exceed 'CHAR_MAX' in
size, since you are assigning a 'size_t' to a 'char'.
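That limit can be checked during translation as well. The sketch below is one possible way, with invented names (`FITS_IN_CHAR`, `STATIC_CHECK`, `foo_size_fits`, `fits_in_char`, `size_byte`); the negative-array-size typedef is a pre-C11 substitute for `_Static_assert`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* True when a size can be stored in a plain char without loss. */
#define FITS_IN_CHAR(n) ((n) <= (size_t)CHAR_MAX)

/* Compilation fails with a negative array size if 'cond' is false. */
#define STATIC_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

/* Checked while compiling: the size of "foo" fits in a char. */
STATIC_CHECK(foo_size_fits, FITS_IN_CHAR(sizeof "foo"));

/* Run-time form of the same test, for sizes not known until execution. */
int fits_in_char(size_t n) {
    return FITS_IN_CHAR(n);
}

/* Convert a size to the byte that will be stored, refusing overflow. */
char size_byte(size_t n) {
    assert(FITS_IN_CHAR(n));
    return (char)n;
}
```

A `STATIC_CHECK` line could be added to a declaration macro so that an over-long literal is rejected at the point of use.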
 
Lauri Alanko

Perhaps you could use a more advanced preprocessor such as "m4"?

Or could you generate (the relevant parts of) your source code from
another program?

Well, this is always an option for _every_ problem regarding C:
compile some other language into C. But taking that route amounts to
admitting that C alone isn't sufficient for your need, and you have to
resort to some other language. Obviously I'd like to avoid that, since
I have (in hindsight, quite questionably) chosen C as my language for
this particular project.


Lauri
 
Lauri Alanko

It looks like you have C99, based on your "can do" with a compound
literal. If you don't mind having roughly an equal amount of padding to
your string literals, perhaps you could use something like:
char
foo_[2][sizeof M_FOO_STR] = {
{ [sizeof M_FOO_STR - 1] = sizeof M_FOO_STR },
{ M_FOO_STR },
},

That's pretty ingenious. The overhead is annoying, so I'm not sure if
I really want to take this route, but it's nice to know there is at
least one really portable way to do this. Thanks!


Lauri
 
Shao Miller

Shao Miller said:
It looks like you have C99, based on your "can do" with a compound
literal. If you don't mind having roughly an equal amount of padding to
your string literals, perhaps you could use something like:
       char
         foo_[2][sizeof M_FOO_STR] = {
             { [sizeof M_FOO_STR - 1] = sizeof M_FOO_STR },
             { M_FOO_STR },
           },

That's pretty ingenious. The overhead is annoying, so I'm not sure if
I really want to take this route, but it's nice to know there is at
least one really portable way to do this. Thanks!

The assumption I had here was that you are trying to have a 'char[]'
where the first element indicates the number of subsequent elements.

A couple of reasons I can think of for doing that would be:

- You are attempting to conform to some external-to-C requirement,
such as calling a function that expects such a format of 'char[]'.

- You yourself have a function which walks such a format of 'char[]'.

If the latter is the case, perhaps there's a "better way."

But for either case, do the arrays need to be set in stone or can they
be built at some point during execution?

/* Leave room for the size byte */
char
    foo[1 + sizeof "foo"] = " " "foo",
    bar[1 + sizeof "hum a few bars"] = " " "hum a few bars";

int main(void) {
    one_time_init(foo);
    one_time_init(bar);
    /* Work with 'foo' and 'bar' */
    /* ... */
    return 0;
}

Or maybe your target implementations provide 'struct' "packing"
extensions?

Or maybe something else entirely... Depending on further detail
regarding the requirements/use case.
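The post leaves 'one_time_init' undefined. A minimal sketch of what such a function might do is given below; the body is an assumption, not something from the thread. It stores the number of characters after the size byte, excluding the terminator, which is a choice made for this example only. It also returns its argument, purely for convenience.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Leave room for the size byte: the leading " " is a placeholder. */
char foo[1 + sizeof "foo"] = " " "foo";
char bar[1 + sizeof "hum a few bars"] = " " "hum a few bars";

/* Hypothetical implementation: overwrite the placeholder with the
   length of the text that follows it. */
char *one_time_init(char *s) {
    size_t n;

    assert(s != NULL);
    n = strlen(s + 1);
    assert(n <= CHAR_MAX);   /* the count must fit in a char */
    s[0] = (char)n;
    return s;
}
```

Since this runs during execution, it does not satisfy a requirement that the array be fully formed at translation time.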
 
Barry Schwarz

char baz[4] = "\003baz";

I see only a string literal, no integer expression.

\003 is a constant integer expression.

The initializer is to be produced by a macro. The integer expression
involves sizeof, so the above approach doesn't work.

Well, except that I can do:

#define FOO(n,s) \
(n == 0 ? "\0" s : n == 1 ? "\1" s : n == 2 ? "\2" s : ....)

If the character following the integer could be a digit, you need to
expand these to three octal characters. But the preprocessor will not
evaluate the ?: operator so you could not use this in an attempt to
concatenate an integer to a string as part of an initializer.
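The digit hazard mentioned above can be seen in a short sketch. The array names and the helper `same_bytes` are made up for illustration. Escape sequences are converted before adjacent literals are concatenated, so splitting the literal keeps the escape from absorbing a following digit, as does writing all three octal digits.

```c
#include <assert.h>
#include <string.h>

/* "\37" is a single octal escape (decimal 31) followed by "up". */
const char glued[] = "\37up";

/* Two literals: the escape \3, then the characters '7', 'u', 'p'. */
const char split[] = "\3" "7up";

/* An octal escape takes at most three digits, so '7' stays separate. */
const char padded[] = "\0037up";

/* Compare two byte sequences including their lengths. */
int same_bytes(const char *a, size_t na, const char *b, size_t nb) {
    assert(a != NULL && b != NULL);
    return na == nb && memcmp(a, b, na) == 0;
}
```

Here `split` and `padded` hold the same five bytes, while `glued` holds only four.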
 
Eric Sosman

char baz[4] = "\003baz";

I see only a string literal, no integer expression.

The initializer is to be produced by a macro. The integer expression
involves sizeof, so the above approach doesn't work.

Works for all the examples in your original post, though.
Maybe if you'd explain what you're trying to do, and why...
Or is it more fun to change the rules after each response? ;-)
 
Alphaeus

Shao Miller said:
It looks like you have C99, based on your "can do" with a compound
literal. If you don't mind having roughly an equal amount of padding to
your string literals, perhaps you could use something like:
       char
         foo_[2][sizeof M_FOO_STR] = {
             { [sizeof M_FOO_STR - 1] = sizeof M_FOO_STR },
             { M_FOO_STR },
           },

That's pretty ingenious. The overhead is annoying, so I'm not sure if
I really want to take this route, but it's nice to know there is at
least one really portable way to do this. Thanks!

Lauri

Would this work for you?


#include <stdio.h>

#define mystring(variable, str_value) \
char variable[] = " " str_value; \
variable[0] = sizeof str_value;


#define int_char(variable, i, str) \
/* add an assert for i < 256 */ \
char variable[] = " " str; \
variable[0] = (char) i;

// Shao, I just copied your dump_string function :)
static void dump_string(char * string) {
    printf("size: %d, chars: { ", *string);
    while (*++string)
        printf("'%c', ", *string);
    printf("'\\0' }\n");
    return;
}

int main(void)
{
    mystring(var1, "hello_world")
    int_char(var2, 5, "hello_world")

    dump_string(var1);
    dump_string(var2);

    return 0;
}
 
Shao Miller

... The overhead is annoying, so I'm not sure if
I really want to take this route, but it's nice to know there is at
least one really portable way to do this. Thanks!

The assumption I had here was that you are trying to have a 'char[]'
where the first element indicates the number of subsequent elements.
...

The code example below attempts to adjust for the padding you were
worried about. If you have:

struct {
    unsigned char size;
    char string[sizeof "foo"];
} foo = {
    sizeof "foo",
    .string = "foo",
};

then I agree with you that it's possible that there might be padding
between the 'size' member and the 'string' member.

So we can look at the offset of the 'string' member and consider
something like this:

struct {
    unsigned char size[XXX];
    char string[sizeof "foo"];
} foo = {
    { [XXX - 1] = sizeof "foo" },
    .string = "foo",
};

where 'XXX' was the offset of 'string' we noted and the position 'XXX -
1' ought to designate the 'size' element that comes immediately before
the 'string' member. That'll probably do the trick. (Some insane
implementation might add further padding even though we "adjusted" for
padding. If we used struct tags, we could actually get guarantees,
instead!)

But if we are going to point a 'char *' to the last element of the
'size' member and we pass that pointer to a function which attempts to
read 'char' elements beyond, we could run out-of-bounds.

So we can "protect" the whole thing by wrapping with a union that has
the above struct as a member but also has a 'char[]' member that matches
the whole size of the struct. We can then point to where the last
element of 'size' would be if it were in that member. The bounds should
be sufficient.

If you need more portable guarantees for the tag-less struct business,
one could potentially make use of '__LINE__'.

#define AUTOTAG__(line) autotag_ ## line ## __
#define AUTOTAG_(line) AUTOTAG__(line)
#define AUTOTAG AUTOTAG_(__LINE__)

Anyway, the code is also available, with nice colours, at:

http://codepad.org/qYUfXcmR

Here's the code:

#include <stddef.h>
#include <stdio.h>

typedef unsigned char byte__;

/* Derived from Chris M. Thomasson */
#define STRING_ALIGNMENT(str) ( \
    offsetof( \
        struct { \
            byte__ b; \
            char ca[sizeof (str)]; \
        }, \
        ca \
    ) \
)

#define STRING_STRUCT(str) \
    struct { \
        byte__ pre[STRING_ALIGNMENT(str)]; \
        char ca[sizeof (str)]; \
    }

#define STRING_STRUCT_INIT(str) { \
    .pre = { \
        [STRING_ALIGNMENT(str) - 1] = \
            sizeof (str), \
    }, \
    .ca = str, \
}

#define STRING_UNION(str) \
    union { \
        STRING_STRUCT(str) s; \
        byte__ ba[sizeof (STRING_STRUCT(str))]; \
    }

#define STRING(str) ( \
    (char *) ( \
        (STRING_UNION(str)) { \
            STRING_STRUCT_INIT(str), \
        }.ba + \
        STRING_ALIGNMENT(str) - \
        1 \
    ) \
)

static char * foo = STRING("foo");
static char * bar = STRING("hum a few bars");
static char * baz = STRING("very, very baz");

static void dump_string(char * string) {
    printf("size: %d, chars: { ", *string);
    while (*++string)
        printf("'%c', ", *string);
    printf("'\\0' }\n");
    return;
}

int main(void) {
    dump_string(foo);
    dump_string(bar);
    dump_string(baz);
    return 0;
}
 
Shao Miller

Shao Miller said:
It looks like you have C99, based on your "can do" with a compound
literal. If you don't mind having roughly an equal amount of padding to
your string literals, perhaps you could use something like:
char
foo_[2][sizeof M_FOO_STR] = {
{ [sizeof M_FOO_STR - 1] = sizeof M_FOO_STR },
{ M_FOO_STR },
},

That's pretty ingenious. The overhead is annoying, so I'm not sure if
I really want to take this route, but it's nice to know there is at
least one really portable way to do this. Thanks!

Would this work for you?


#include<stdio.h>

#define mystring(variable, str_value) \
char variable[] = " " str_value; \
variable[0] = sizeof str_value;

I got the impression that the original poster wanted the construction to
be completed during translation, rather than at execution time. Since
you have an assignment there, that requires execution. I certainly
agree that it'd be easier during execution!
[...]

// [...] I just copied your dump_string function :)
static void dump_string(char * string) {
printf("size: %d, chars: { ", *string);
while (*++string)
printf("'%c', ", *string);
printf("'\\0' }\n");
return;
}

Heheh. I got the impression that the original poster might be more
interested in a dumping function which actually uses the stored size,
rather than the null character sentinel value, so this function could
actually be rewritten for that scenario, instead...
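A rewrite along those lines might look like the following sketch. The name `dump_counted`, the `FILE *` parameter and the return value are assumptions added for illustration. It also assumes the first byte holds the number of characters that follow, not counting any terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Print a counted string using only the stored size, so the data
   need not be null-terminated. Returns the number of characters
   printed from the payload. */
size_t dump_counted(FILE *out, const char *s) {
    size_t n, i;

    assert(out != NULL && s != NULL);
    n = (unsigned char)s[0];   /* count byte, read as unsigned */

    fprintf(out, "size: %lu, chars: { ", (unsigned long)n);
    for (i = 1; i <= n; ++i)
        fprintf(out, "'%c'%s", s[i], i < n ? ", " : " ");
    fputs("}\n", out);
    return n;
}
```

Called as `dump_counted(stdout, "\003baz")`, this prints `size: 3, chars: { 'b', 'a', 'z' }` and never looks at the byte after 'z'.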

Within functions, it can be fun to do something like:

#include <stddef.h>

char * init_string(char * string, size_t size) {
    /* Omit size and null terminator */
    string[0] = size - 2;
    return string;
}

void func(void) {
    char
        foo[1 + sizeof "foo"] = " " "foo",
        * dummy = init_string(foo, sizeof foo);
    /* ... */
    return;
}

thus lending itself to a macro like:

#define MAKE_STRING(string_name, string_literal) \
char \
(string_name)[1 + sizeof string_literal] = \
" " string_literal, \
* string_name ## _dummy_ = init_string( \
string_name, \
sizeof string_name \
)

void func(void) {
MAKE_STRING(foo, "foo");
/* ... */
return;
}

but that only works within functions due to the function call. :S
 
Lauri Alanko

struct {
    unsigned char size[XXX];
    char string[sizeof "foo"];
} foo = {
    { [XXX - 1] = sizeof "foo" },
    .string = "foo",
};

where 'XXX' was the offset of 'string' we noted and the position 'XXX -
1' ought to designate the 'size' element that comes immediately before
the 'string' member. That'll probably do the trick. (Some insane
implementation might add further padding even though we "adjusted" for
padding. If we used struct tags, we could actually get guarantees,
instead!)

How would a struct tag help? If we define a new struct type after
checking the offset in an older one, we have to provide a new struct
tag (if any).

But if we are going to point a 'char *' to the last element of the
'size' member and we pass that pointer to a function which attempts to
read 'char' elements beyond, we could run out-of-bounds.

How could this happen, if we ensure there is no padding?

Anyway, this is a nice approach. Perhaps one can use ?: to check
structs with various lengths of the size field and choose the one
where there is no padding. A bit cumbersome, but should be workable.

Thanks again.


Lauri
 
