Non-constant constant strings

Joe Keane · Jan 21, 2014

But the array whose first element s points to is still
just 6 characters long, and unlike string literals, an object created by
a compound literal has automatic storage duration (it ceases to exist when you
leave the enclosing block).

How about this?

@ cat bar.c
char *bar[2] =
{
"jjj",
(char []) { "kkk" },
};
@ cc -S bar.c
@ cat bar.s
.file "bar.c"
.data
.type __compound_literal.0, @object
.size __compound_literal.0, 4
__compound_literal.0:
.string "kkk"
.globl bar
.section .rodata
.LC0:
.string "jjj"
.data
.align 4
.type bar, @object
.size bar, 8
bar:
.long .LC0
.long __compound_literal.0
.ident "GCC: (NetBSD nb2 20110806) 4.5.3"

Rick C. Hodgin · Jan 21, 2014

I'm one who would not readily change his mind, because (in
part) as things stand I can write stuff like:

const char *archiveFormats[] = {
#if CPIO_SUPPORTED
"cpio",
#endif
#if TAR_SUPPORTED
"tar",
#endif
#if ZIP_SUPPORTED
"ZIP",
#endif
#if APK_SUPPORTED
"apk",
#endif
};

It's *possible* to manage this sort of thing without introducing
an extra comma, but it's ugly as all-get-out:
[snip]

Try this, and then just always start at 1 instead of 0, and process until
you reach null:

const char *archiveFormats[] = {
null
#if CPIO_SUPPORTED
,"cpio"
#endif
#if TAR_SUPPORTED
,"tar"
#endif
#if ZIP_SUPPORTED
,"ZIP"
#endif
#if APK_SUPPORTED
,"apk"
#endif
,null
};

Best regards,
Rick C. Hodgin

Keith Thompson · Jan 21, 2014

Ian Collins said:
Most if not all of the programmer's editors I've used on Windows
recognise Unix line endings and gcc on Unix recognises Windows endings.
Text mode is something of a curse!

Not all Unix tools tolerate Windows-style line endings. For example,
if you write:

if [ "$x" = 42 ] ; then
echo ok
fi

in a bash script, and the script file uses Windows-style line endings,
bash will complain that "then\r" is an unrecognized token. (Except that
it will print the "\r" literally, causing a very confusing error
message.)

Blindly using "foreign" format text files on any system is not a good
idea.

James Kuyper · Jan 21, 2014

I'm one who would not readily change his mind, because (in
part) as things stand I can write stuff like:

const char *archiveFormats[] = {
#if CPIO_SUPPORTED
"cpio",
#endif
#if TAR_SUPPORTED
"tar",
#endif
#if ZIP_SUPPORTED
"ZIP",
#endif
#if APK_SUPPORTED
"apk",
#endif
};

It's *possible* to manage this sort of thing without introducing
an extra comma, but it's ugly as all-get-out:
[snip]

Try this, and then just always start at 1 instead of 0, and process until
you reach null:

const char *archiveFormats[] = {
null
#if CPIO_SUPPORTED
,"cpio"
#endif
#if TAR_SUPPORTED
,"tar"
#endif
#if ZIP_SUPPORTED
,"ZIP"
#endif
#if APK_SUPPORTED
,"apk"
#endif
,null
};

As he said: ugly. The two extra nulls (and "null" needs to be defined)
seem far worse to me than the extra comma - they survive into the object
file, and even into the final executable, taking up extra space. The
extra comma disappears during translation phase 7 and has no impact on
the actual executable.
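
To make the trade-off concrete, here is a minimal sketch of the trailing-comma form. The macro values and the helper name archive_format_count are assumptions added for illustration; they do not come from the posts above. Because the array's size is taken from its initializer, sizeof yields the element count and no sentinel entry is needed.

```c
#include <stddef.h>

/* Assumed configuration, for illustration only. */
#define CPIO_SUPPORTED 1
#define TAR_SUPPORTED  0
#define ZIP_SUPPORTED  1
#define APK_SUPPORTED  0

const char *archiveFormats[] = {
#if CPIO_SUPPORTED
    "cpio",
#endif
#if TAR_SUPPORTED
    "tar",
#endif
#if ZIP_SUPPORTED
    "ZIP",
#endif
#if APK_SUPPORTED
    "apk",
#endif
};

/* Hypothetical helper: number of entries that survived preprocessing. */
size_t archive_format_count(void)
{
    return sizeof archiveFormats / sizeof archiveFormats[0];
}
```

With the assumed settings the array holds "cpio" and "ZIP", so archive_format_count() returns 2. One caveat: if every macro were 0, the initializer would be empty, which standard C does not accept for an array declared with [].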

Keith Thompson · Jan 21, 2014

Rick C. Hodgin said:
I'm not sure I would've been keen on that idea. I would rather have
maintained it as a deprecated functionality that would have been
slated to be removed in a few version releases. The old compilers
could've generated object code in a particular version of a compiler
that could be maintained for backward compatibility without negating
the language in moving forward. My opinion.

(Reformatting your long lines *again*.)

That's exactly what they did. I gave you a link to a recent draft
of the C standard. Take a look at section 6.11.6:

The use of function declarators with empty parentheses (not
prototype-format parameter type declarators) is an obsolescent
feature.

I'm personally not happy with how long it's taken to actually remove the
feature, but it's been officially obsolescent (which means that it may
be considered for withdrawal in future revisions of the standard) since
1989.
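
As a small illustration of the feature that 6.11.6 marks obsolescent (the function names are made up for this sketch), compare a declarator with empty parentheses against a prototype. Through C17 the first form tells the compiler nothing about the parameters; the second states that there are none.

```c
/* Obsolescent form: empty parentheses in a declaration say nothing
   about the number or types of the parameters. */
int legacy_answer();

/* Prototype form: the function explicitly takes no arguments. */
int modern_answer(void);

int legacy_answer()
{
    return 42;
}

int modern_answer(void)
{
    return 42;
}
```

With the prototype, a call such as modern_answer(1) must be diagnosed. With the empty-parentheses declaration, through C17, a call such as legacy_answer(1) need not be diagnosed, even though its behavior is undefined.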

[...]

Rick C. Hodgin · Jan 21, 2014

Most if not all of the programmer's editors I've used on Windows
recognise Unix line endings and gcc on Unix recognises Windows endings.
Text mode is something of a curse!

Not all Unix tools tolerate Windows-style line endings. For example,
if you write:

if [ "$x" = 42 ] ; then
echo ok
fi

in a bash script, and the script file uses Windows-style line endings,
bash will complain that "then\r" is an unrecognized token. (Except that
it will print the "\r" literally, causing a very confusing error
message.)

Blindly using "foreign" format text files on any system is not a good
idea.

It's why my algorithm looks for \r or \n in any order, and then checks the
character after for the alternate (\r\n or \n\r combinations). If found,
it considers that grouping to be one newline. If not, it considers the single
character to be one newline. Then it continues parsing.

bash sounds like it needs some post-rebirth rehabilitation.

Best regards,
Rick C. Hodgin

Rick C. Hodgin · Jan 21, 2014

As he said: ugly. The two extra nulls (and "null" needs to be defined)
seem far worse to me than the extra comma - they survive into the object
file, and even into the final executable, taking up extra space. The
extra comma disappears during translation phase 7 and has no impact on
the actual executable.

There's a part of me that agrees with you. I would go to lengths to avoid
having this kind of issue. Since this is a heavily used feature, I would
probably create some type of generic tool to distribute around which is an
on-the-fly builder capable of preparing lists, and then returning the source
code. And it would know how to handle commas.

Best regards,
Rick C. Hodgin

Rick C. Hodgin · Jan 21, 2014

I'm not sure I would've been keen on that idea. I would rather have

(Reformatting your long lines *again*.)

I see the text in a window on Google Groups which is about 72 characters
wide. I have to manually insert carriage returns to break it up. I
sometimes forget. I apologize.

What news reader are you using? Try groups.google.com and subscribe to the
comp.lang.c group.

That's exactly what they did. I gave you a link to a recent draft
of the C standard. Take a look at section 6.11.6:
The use of function declarators with empty parentheses (not
prototype-format parameter type declarators) is an obsolescent
feature.

Awesome!

To quote the three drones from Voyager, former members of the
tertiary adjunct of unimatrix one that seven was in, one who appears very
much like Admiral Forrest from ST:Enterprise, "we have consensus."

I'm personally not happy with how long it's taken to actually remove the
feature, but it's been officially obsolescent (which means that it may
be considered for withdrawal in future revisions of the standard) since
1989.

I hear you. Always that backward compatibility. It's why it's important to
include dates. We have them in our U.S. Constitution even.

From the 21st amendment:
3. The article shall be inoperative unless it shall have been
ratified as an amendment to the Constitution ... within seven
years from the date of the submission hereof to the States
by the Congress.

Seven years is a good time period. It's the biblical period of forgiveness
(Deu 15:1), "At the end of every seven years you must cancel debts." How
the world would be better were that guidance followed.

Best regards,
Rick C. Hodgin

Keith Thompson · Jan 21, 2014

Rick C. Hodgin said:
In your proposed C-like language, what would this snippet print?
for (int i = 0; i < 2; i ++) {
char *s = "hello";
if (i == 0) {
s[0] = 'H';
}
puts(s);
}

In my proposed language, it would print "Hello" both times because the
char* s definition would've been pulled out of the loop and defined as a
function variable.

That's fine; if you don't want to write code like that, you don't have
to. But I didn't ask how you'd re-write it; I asked how *that code*
should behave.

I answered you. How should it behave?

I don't believe you did answer me.

In my compiler, I would pull the variable out and make it a function-variable
defined at the top, so it would've been altered the first time through and
both times would print Hello.

Do you mean by that that you would *change the source code I posted* so
that s is declared at a higher level? The result might be a better
program, but it's a different program than the one I posted, so that
doesn't answer my question at all.

Or do you mean that the compiler would implicitly do the equivalent of
moving s to a higher level? If so, it's unclear what that would mean.

How *should* it behave? In standard C, the behavior is undefined,
because it attempts to modify a string literal. I have no interest in
changing that rule (well, I'd prefer string literals to be const, but I
understand why they're not), so I have no further answer. You're the
one proposing changes; I'm asking you for details on how you can make
those changes consistently.
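
For comparison, a sketch of a well-defined variant in standard C: declaring s as an array rather than a pointer gives each iteration its own writable copy, initialized afresh. The function run_loop and its results parameter are hypothetical, added so that the two strings can be inspected instead of printed.

```c
#include <string.h>

/* Stores the string seen at the end of each of the two iterations. */
void run_loop(char results[2][6])
{
    for (int i = 0; i < 2; i++) {
        char s[] = "hello";   /* writable array, re-initialized every iteration */
        if (i == 0) {
            s[0] = 'H';
        }
        memcpy(results[i], s, sizeof s);   /* sizeof s is 6, including the '\0' */
    }
}
```

After run_loop(results), results[0] holds "Hello" and results[1] holds "hello": the change made in the first iteration does not carry over to the second.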

No ... I'm creating my own new language, RDC, which is C-like, but dumps a
lot of what I view as "hideous baggage left over from a bygone era" ... while
also adding a lot of new features I see as looking to the future of multiple
cores, GUI developer environments, touch screens, eventual 3D interfaces, and
more.

Ok. Then why are you discussing your non-C language in comp.lang.c?
Perhaps comp.lang.misc would be of interest to you.

[snip]

Keith Thompson · Jan 21, 2014

But the array whose first element s points to is still
just 6 characters long, and unlike string literals, an object created by
a compound literal has automatic storage duration (it ceases to exist when you
leave the enclosing block).

How about this?

@ cat bar.c
char *bar[2] =
{
"jjj",
(char []) { "kkk" },
};
@ cc -S bar.c
@ cat bar.s

[SNIP]

I don't understand assembly language well enough to figure out what
point you're making.

glen herrmannsfeldt · Jan 21, 2014

(snip)

In my experience, the special features of text mode as compared to
binary mode are conventions associated with operating systems. As such,
files adhering to those conventions can be used to communicate between
any two programs compiled for that operating system, whether or not
they're running on the same platforms or different platforms.
I wouldn't be surprised to learn that there are conventions for the
layout of text files that are associated with things other than
operating systems - but offhand I can't think of any.

I believe that HTTP (and so HTML) are OS independent, and,
as well as I know, use the "\r\n" line endings.

-- glen

glen herrmannsfeldt · Jan 21, 2014

(snip)

I don't use that feature, and I don't like it. However, this feature
simplifies the creation of machine-generated C code, and people who
write such generators are apparently sufficiently numerous that the
committee felt a need to accommodate their desires.

It does, and I do sometimes generate look-up tables.

But I believe that simplifying the use of the preprocessor is
a more important use. One can #ifdef table entries, without
a special case for the last one. (Since you don't know which one
will be the last.)

The second best choice would be to waste the last entry, with a
null, zero, or some other useless item. Complicates a lot of
other coding, though.

-- glen

Keith Thompson · Jan 21, 2014

Rick C. Hodgin said:
Most if not all of the programmer's editors I've used on Windows
recognise Unix line endings and gcc on Unix recognises Windows endings.
Text mode is something of a curse!

Not all Unix tools tolerate Windows-style line endings. For example,
if you write:

if [ "$x" = 42 ] ; then
echo ok
fi

in a bash script, and the script file uses Windows-style line endings,
bash will complain that "then\r" is an unrecognized token. (Except that
it will print the "\r" literally, causing a very confusing error
message.)

Blindly using "foreign" format text files on any system is not a good
idea.

It's why my algorithm looks for \r or \n in any order, and then checks the
character after for the alternate (\r\n or \n\r combinations). If found,
it considers that grouping to be one newline. If not, it considers the single
character to be one newline. Then it continues parsing.

That's workable if your tool runs only on systems that use one of \r,
\n, \r\n, or \n\r to mark line endings. (And either you treat \n\n as
an empty line, or you can safely ignore empty lines.) But it breaks
down if you want to *write* text files.

C has text mode for a reason. Take a moment to consider the bare
possibility that the people who designed it were not idiots.

Keith Thompson · Jan 21, 2014

Rick C. Hodgin said:
I see the text in a window on Google Groups which is about 72 characters
wide. I have to manually insert carriage returns to break it up. I
sometimes forget. I apologize.

What news reader are you using? Try groups.google.com and subscribe to the
comp.lang.c group.

groups.google.com is the problem. Google provides a web interface to
Usenet, something that predates the web and even the Internet. Google
has done a horribly poor job with their interface and has been
unresponsive to complaints.

I use the news.eternal-september.org free Usenet server. The client I
use is Gnus, which runs under Emacs. Mozilla Thunderbird is another
popular client.

Rick C. Hodgin · Jan 21, 2014

Rick C. Hodgin said:
Rick C. Hodgin said:

In your proposed C-like language, what would this snippet print?
for (int i = 0; i < 2; i ++) {
char *s = "hello";
if (i == 0) {
s[0] = 'H';
}
puts(s);
}
In my proposed language, it would print "Hello" both times because the
char* s definition would've been pulled out of the loop and defined as a
function variable.

That's fine; if you don't want to write code like that, you don't have
to. But I didn't ask how you'd re-write it; I asked how *that code*
should behave.

I answered you. How should it behave?

I don't believe you did answer me.

In my compiler, I would pull the variable out and make it a function-
variable defined at the top, so it would've been altered the first
time through and both times would print Hello.

Do you mean by that that you would *change the source code I posted* so
that s is declared at a higher level? The result might be a better
program, but it's a different program than the one I posted, so that
doesn't answer my question at all.

Or do you mean that the compiler would implicitly do the equivalent of
moving s to a higher level? If so, it's unclear what that would mean.

The compiler would receive the definition of char* s where it is, but
it would logically create it as a local variable within the single
function. In short, I would not allow scoped variables within a block
within a function. I would have them all defined as local variables,
and they would all be available for use inside or outside of the block
they were defined in.

How *should* it behave? In standard C, the behavior is undefined,
because it attempts to modify a string literal. I have no interest in
changing that rule (well, I'd prefer string literals to be const, but I
understand why they're not), so I have no further answer. You're the
one proposing changes; I'm asking you for details on how you can make
those changes consistently.

In my case, it would not be a constant, but would be a string defined to
be the initial value indicated.

Ok. Then why are you discussing your non-C language in comp.lang.c?
Perhaps comp.lang.misc would be of interest to you.

Perhaps. It is/was all back story to my original question, the explanation
as to why I believe the strings in char* list[] = { "one", "two", "three" } should be read/write.

Best regards,
Rick C. Hodgin

Rick C. Hodgin · Jan 21, 2014

I don't understand assembly language well enough to figure out what
point you're making.

I do understand assembly language, but I still didn't understand the
point being made.

Best regards,
Rick C. Hodgin

Rick C. Hodgin · Jan 21, 2014

It's why my algorithm looks for \r or \n in any order, and then checks the

That's workable if your tool runs only on systems that use one of \r,
\n, \r\n, or \n\r to mark line endings. (And either you treat \n\n as
an empty line, or you can safely ignore empty lines.) But it breaks
down if you want to *write* text files.

Not at all. If it finds \r\n it is a single newline. If it finds \n\r it
is a single newline. If it finds \n\n it stops after the first \n and
considers it its own newline, and then continues parsing and encounters
the second \n and it is also its own newline. \n\n would be a double space.
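
A minimal sketch of the scanning rule described here, offered as one interpretation rather than the actual implementation; newline_length and count_newlines are hypothetical names. A \r or \n followed immediately by the other character counts as one newline, and any other \r or \n counts on its own.

```c
#include <stddef.h>

/* Length of the line terminator starting at s[i]:
   0 if s[i] is neither \r nor \n, 2 for \r\n or \n\r, otherwise 1. */
size_t newline_length(const char *s, size_t len, size_t i)
{
    if (i >= len || (s[i] != '\r' && s[i] != '\n')) {
        return 0;
    }
    char other = (s[i] == '\r') ? '\n' : '\r';
    if (i + 1 < len && s[i + 1] == other) {
        return 2;   /* paired terminator, treated as a single newline */
    }
    return 1;       /* lone \r or lone \n */
}

/* Counts newlines in the first len bytes of s under the rule above. */
size_t count_newlines(const char *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        size_t n = newline_length(s, len, i);
        if (n == 0) {
            i++;            /* ordinary character */
        } else {
            count++;
            i += n;         /* skip the whole terminator */
        }
    }
    return count;
}
```

Under this rule "a\r\nb" and "a\n\rb" each contain one newline, and "a\n\nb" contains two. A mixed sequence such as "\n\r\n" is read as \n\r followed by a lone \n, which is a choice the scanner makes rather than something the input determines.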

C has text mode for a reason. Take a moment to consider the bare
possibility that the people who designed it were not idiots.

It's interesting that such a handy helper feature like text mode exists
to "help" developers, while other more obvious assistance features are
left completely out -- such as certain variable types not always being a
specified number of bits across platforms.

For the record, I believe C is one of the best languages ever constructed.
I also believe it has many many flaws. I hope to undo many of them with
my effort.

Best regards,
Rick C. Hodgin

Rick C. Hodgin · Jan 21, 2014

groups.google.com is the problem. Google provides a web interface to
Usenet, something that predates the web and even the Internet. Google
has done a horribly poor job with their interface and has been
unresponsive to complaints.

I use the news.eternal-september.org free Usenet server. The client I
use is Gnus, which runs under Emacs. Mozilla Thunderbird is another
popular client.

I cannot help but consider the fact that Google Groups provides a free
web-based interface which removes shortcomings in the text-based Usenet
group. It allows HTML messages, longer lines with automatic wrapping,
immediate access to many groups, complex searching, and more.

It seems that the future may be speaking, in an attempt to bring Usenet
into the 2010s and beyond.

Text-based interfaces were nice ... they used the technology available at
the time (limited disk space, limited memory, slower clock speeds). But
the technology of the 2010s is significantly beyond anything we've had
previously. Most modern multi-core CPU desktops with 8+ GB of memory,
1+ TB of disk storage, an average to high-end GPU, have more computing
power than supercomputers did 15+ years ago.

GUIs provide a far better user experience, and are only becoming more
common as time goes on. Smart phones. Tablets. Touch screen. We're
changing our computing needs.

Best regards,
Rick C. Hodgin

Ben Bacarisse · Jan 21, 2014

Rick C. Hodgin said:
I do understand assembly language, but I still didn't understand the
point being made.

The listing (from Joe Keane) contains a fragment of C with an excellent
suggestion in it:

char *bar[2] =
{
"jjj",
(char []) { "kkk" },
};

The construct (char []){ "kkk" } is called a compound literal and
represents an anonymous object of the type that heads it up -- in this
example, array of char. The resulting object is writable.

The array bar contains two pointers to the start of two arrays. The one
built from a string literal is not writable, but the one built by the
compound literal is. Pretty much what you want.
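
A minimal sketch of that point, assuming the declaration sits at file scope as in bar.c, where a compound literal has static storage duration. The helper capitalize_writable_entry is hypothetical.

```c
/* Same shape as the bar.c fragment above. */
char *bar[2] = {
    "jjj",               /* string literal: modifying it is undefined behavior */
    (char []) { "kkk" }, /* file-scope compound literal: static and writable */
};

/* Hypothetical helper: modifies only the entry built by the compound literal. */
void capitalize_writable_entry(void)
{
    bar[1][0] = 'K';
}
```

After the call, bar[1] reads "Kkk". The same assignment through bar[0] would attempt to modify a string literal, which is undefined behavior.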

Keith Thompson · Jan 21, 2014

Rick C. Hodgin said:
Not at all. If it finds \r\n it is a single newline. If it finds \n\r it
is a single newline. If it finds \n\n it stops after the first \n and
considers it its own newline, and then continues parsing and encounters
the second \n and it is also its own newline. \n\n would be a double space.

It's interesting that such a handy helper feature like text mode exists
to "help" developers, while other more obvious assistance features are
left completely out -- such as certain variable types not always being a
specified number of bits across platforms.

Are you acknowledging that text mode is useful?

If C had defined specified sizes for predefined types, then int would
probably be 16 bits and long would be 32 (the sizes they had in early
PDP-11 implementations).

Or perhaps not. The first edition of K&R, the book that defined the
language in 1978, showed int with a size of 16 bits on the PDP-11, 36
bits on the Honeywell 6000, and 32 bits on the IBM 370 and Interdata
8/32. None of those platforms had a 64-bit integer type, because 64-bit
integer arithmetic was not supported on the hardware of the time.

Which of those choices would you want to impose on all C implementations
for all platforms?

On the other hand, if you want fixed-width types, you can use int8_t,
which was added to [...]

Rick C. Hodgin said:
For the record, I believe C is one of the best languages ever constructed.
I also believe it has many many flaws. I hope to undo many of them with
my effort.

I suggest you need to be more familiar with what's already been done.
Reinventing the wheel is fine, but you may find that someone has already
figured out how to make it round.
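
A short sketch of the fixed-width types mentioned above, which live in <stdint.h> (C99). The function names are hypothetical. Note that int8_t and int32_t are optional: an implementation provides them only if it has types of exactly those widths.

```c
#include <limits.h>
#include <stdint.h>

/* Storage size of plain int in bits; this varies between implementations. */
unsigned int_storage_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}

/* int32_t has exactly 32 bits and no padding wherever it exists. */
unsigned int32_storage_bits(void)
{
    return (unsigned)(sizeof(int32_t) * CHAR_BIT);
}
```

int32_storage_bits() returns 32 on any implementation that defines int32_t, while int_storage_bits() is only guaranteed to be at least 16.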
