Non-constant constant strings

Seebs · Jan 29, 2014

If a, b, and c are of some built-in type, then `a = b + c;` cannot
involve any function calls in either C or C++.

Sure it can. It could involve a call to a function called __addquad().

-s

Keith Thompson · Jan 29, 2014

Seebs said:
Sure it can. It could involve a call to a function called __addquad().

As I mentioned in the following paragraph (though the part you quoted
was not quite correct).

James Kuyper · Jan 29, 2014

Sure it can. It could involve a call to a function called __addquad().

... For that matter, the addition or one
of the conversions could, on some systems, require an implicit function
call. ...

So his first statement was clearly intended to distinguish between an
actual C function call and an implicit function call. I would have chosen
different wording to make that distinction clearer, but he did address
the issue you raise.
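
To make the distinction concrete, here is a minimal sketch. The name `add_wide` is made up for illustration, and whether any helper gets called depends entirely on the target. As one known case, GCC's runtime library (libgcc) provides routines such as `__addtf3` for 128-bit floating-point addition on targets that lack hardware support for it. Either way, the C source itself contains no function call.

```c
/* No C-level function call appears in this function body.
 * On a target without hardware support for the operand type, the
 * compiler may still lower the '+' into a call to a runtime helper
 * routine. That is an implementation detail, not something the C
 * source expresses. */
long double add_wide(long double b, long double c)
{
    long double a;

    a = b + c;
    return a;
}
```

Calling `add_wide(1.0L, 2.0L)` yields 3.0L on any conforming implementation, regardless of how the addition is carried out underneath.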

Rick C. Hodgin · Jan 29, 2014

My point is that C cannot reasonably be described as any kind of
assembly language. Do you actually disagree with that statement?

No. My position is that C is exceedingly mechanical ... so much so that
there is nearly a 1:1 ratio between the things it does and the things the
CPU must do to conduct the workload. That closeness is known to all who
know C and assembly. The fact that other C developers may not know it as
a matter of stored knowledge is immaterial because the relationship exists
fundamentally.

I also prefer Colby Jack on pizza in lieu of Mozzarella. I'm not sure
what bearing that has on this discussion, but it seemed appropriate to
add as a drop in (in case we ever want to get together and have a pizza
party).

Best regards,
Rick C. Hodgin

James Kuyper · Jan 29, 2014

No. My position is that C is exceedingly mechanical ... so much so that
there is nearly a 1:1 ratio between the things it does and the things the
CPU must do to conduct the workload.

The term ratio really implies numbers; in the only senses I can figure
out for which numbers might apply, 1:1 is clearly false. For instance,
the number of lines of assembler is almost completely unrelated to the
number of lines of C code. It can be quite a bit larger or smaller,
depending upon what the C code actually says.

I assume that what you really mean is "correspondence", rather than
"ratio". For some simple low-level languages, it is possible to set up a
1-to-1 correspondence between language constructs and the generated
assembly code. However, such a language is really nothing more than a
high-level assembler. C is not, and never has been, such a language,
though the correspondence was closer in the early days of C than it is
now. You're underestimating the looseness of the correspondence between
C code and assembly; combined with the fact that you've also proposed
eliminating some of that looseness, this suggests that you may be
unaware of just how much advantage the typical C compiler takes of that
looseness. Turn optimization up high on a sophisticated modern C
compiler targeting a well-known platform that has been around for a
while. If you really believe that there is (or at least, should be) a
1-to-1 correspondence, you're going to be quite shocked at how hard it
is to identify the corresponding elements.
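
A small sketch of the kind of code where this shows up. The function names are hypothetical. An optimizer is permitted to replace the loop in `sum_loop` with straight-line arithmetic along the lines of `sum_closed`; whether a particular compiler actually does so is something to check in its output (for example with the `-S` option accepted by GCC and Clang), not something this sketch can promise.

```c
/* Sum of the integers 1..n written as an explicit loop.
 * At the source level this is n additions and n comparisons. */
unsigned sum_loop(unsigned n)
{
    unsigned total = 0;

    for (unsigned i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Closed form giving the same result, valid as long as
 * n * (n + 1) does not overflow unsigned. A compiler that
 * performs this rewrite leaves no loop to map back to. */
unsigned sum_closed(unsigned n)
{
    return n * (n + 1) / 2;
}
```

Both functions return 5050 for `n == 100`. If the loop disappears from the generated code, there is no instruction left to pair with the loop counter, the comparison, or the individual additions.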

One of the more common events in this newsgroup is someone posting a
message complaining about the fact that he can't figure out how to make
a C compiler generate assembly language that matches that person's
opinion of how the assembler should be written. This complaint is based
upon the same mistaken 1-to-1 assumption that you're making. Such people
should either abandon that assumption, or write in assembler; the C
standard, as a matter of deliberate design, goes far out of its way to
make sure that the correspondence does NOT have to be 1-to-1.

... That closeness is known to all who
know C and assembly. The fact that other C developers may not know it as
a matter of stored knowledge is immaterial because the relationship exists
fundamentally. ...

The fact that some people mistakenly assume a 1-to-1 correspondence is
immaterial to the fact that, as a matter of very deliberate design, it
is not actually required to be 1-to-1, and usually isn't.

Keith Thompson · Jan 29, 2014

Rick C. Hodgin said:
No. My position is that C is exceedingly mechanical ... so much so that
there is nearly a 1:1 ratio between the things it does and the things the
CPU must do to conduct the workload. That closeness is known to all who
know C and assembly. The fact that other C developers may not know it as
a matter of stored knowledge is immaterial because the relationship exists
fundamentally.

Do you ignore optimization?

A compiler can, for example, generate no code at all for a given
statement, or for an entire function, if it can prove that that
statement or function has no effect.

If I write:

#include <stdio.h>
int main(void) {
    int x = 2;
    int y = 2;
    printf("2 + 2 = %d\n", x + y);
}

I expect the program, when I run it, to print:

2 + 2 = 4

I don't care (except as a matter of idle curiosity) whether the
compiler generates code that performs the addition and calls printf,
or reduces the 3 lines of code to the equivalent of

puts("2 + 2 = 4");

I care about the behavior of the running program. Machine or
assembly language is nothing more or less than a means to that end.

(If I cared for some reason about the existence of an ADD instruction in
the generated code, then I'd use assembly language.)
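
As a further sketch of the "no code at all" case: the names `square` and `answer` are hypothetical. Because `square` has no side effects and its result is never observed, a compiler is allowed to drop the call entirely. Whether it does depends on the compiler and its options.

```c
/* A helper with no side effects: the result depends only on v. */
static int square(int v)
{
    return v * v;
}

int answer(void)
{
    int unused = square(123); /* value is never observed */

    (void)unused;             /* silences unused-variable warnings */
    return 4;
}
```

The observable behavior of `answer()` is to return 4. A translation that contains no multiplication and no call to `square` is just as correct as one that contains both.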

Phil Carmody · Jan 29, 2014

David Brown said:
<snip snivelling drivel>

You had the opportunity to snip the rest of the drivel too.
This discussion, and all like it, will never lead anywhere
useful.

Phil

Rick C. Hodgin · Jan 29, 2014

Rick C. Hodgin said:
Rick C. Hodgin said:

No. My position is that C is exceedingly mechanical ... so much so that
there is nearly a 1:1 ratio between the things it does and the things the
CPU must do to conduct the workload. That closeness is known to all who
know C and assembly. The fact that other C developers may not know it as
a matter of stored knowledge is immaterial because the relationship exists
fundamentally.

Do you ignore optimization?
No.

A compiler can, for example, generate no code at all for a given
statement, or for an entire function, if it can prove that that
statement or function has no effect.
[snip]

I care about the behavior of the running program. Machine or
assembly language is nothing more or less than a means to that end.
Agreed.

(If I cared for some reason about the existence of an ADD instruction in
the generated code, then I'd use assembly language.)

Agreed.

Best regards,
Rick C. Hodgin

Rick C. Hodgin · Jan 29, 2014

The term ratio really implies numbers; in the only senses I can figure
out for which numbers might apply, 1:1 is clearly false.

I said "nearly a 1:1 ratio between the things it does and the things the
CPU must do to conduct the workload."

int main(void)
{
    int a, b;

    populate_my_variables(&a, &b);
    printf("The product is: %d\n", a, b);
    return 0;
}

; Off the top of my head, please forgive any mistakes:
; // int main(void)
; // {
; // int a, b
enter 8,0
; [ebp-0] - a
; [ebp-4] - b

; // populate_my_variables(&a, &b);
; // [implicit return][populate_my_variables][address_of a][address_of b]
push ebp ; [address_of b]
mov eax,ebp ; [address_of b];
sub eax,4
push eax
call populate_my_variables ; [populate_my_variables]
add esp,8 ; [implicit return]

; // printf("The product is: %d\n", a, b);
; // [printf]["The..\n"][a]
push dword ptr [ebp-4] ;
push dword ptr [ebp-0] ; [a]
push address_of "The product is: %d\n" ; ["The..\n"]
call printf ; [printf]
add esp,12

; // return 0
mov eax,0

; // }
leave
ret

In this case, there are 14 separate things that must be considered
for conversion:

[1]int [2]main([3]void)
{
    int [4]a, [5]b;

    [6]populate_my_variables([7]&a, [8]&b);
    [9]printf([10]"The product is: %d\n", [11]a, [12]b);
    [13]return [14]0;
}

These are translated to 15 separate things done in assembly (including
function overhead), and this in wholly un-optimized mode.

; Off the top of my head, please forgive any mistakes:
; // int main(void)
; // {
; // int a, b
01: enter 8,0
; [ebp-0] - a
; [ebp-4] - b

; // populate_my_variables(&a, &b);
; // [implicit return][populate_my_variables][address_of a][address_of b]
02: push ebp ; [address_of b]
03: mov eax,ebp ; [address_of b];
04: sub eax,4
05: push eax
06: call populate_my_variables ; [populate_my_variables]
07: add esp,8 ; [implicit return]

; // printf("The product is: %d\n", a, b);
; // [printf]["The..\n"][a]
08: push dword ptr [ebp-4] ;
09: push dword ptr [ebp-0] ; [a]
10: push address_of "The product is: %d\n" ; ["The..\n"]
11: call printf ; [printf]
12: add esp,12

; // return 0
13: mov eax,0

; // }
14: leave
15: ret

My compiler actually does things notably differently so I never have to
pass more than one parameter (which is register passed). But, that's
a whole separate discussion.

One of the more common events in this newsgroup is someone posting a
message complaining about the fact that he can't figure out how to make
a C compiler generate assembly language that matches that person's
opinion of how the assembler should be written. This complaint is based
upon the same mistaken 1-to-1 assumption that you're making. Such people
should either abandon that assumption, or write in assembler; the C
standard, as a matter of deliberate design, goes far out of its way to
make sure that the correspondence does NOT have to be 1-to-1.

I apologize if the wrong idea was conveyed through my wording. I hope
the example above now makes it clearer.

The fact that some people mistakenly assume a 1-to-1 correspondence is
immaterial to the fact that, as a matter of very deliberate design, it
is not actually required to be 1-to-1, and usually isn't.

Best regards,
Rick C. Hodgin

Rick C. Hodgin · Jan 29, 2014

This should be:

int main(void)
{
    int a, b;

    populate_my_variables(&a, &b);
    printf("The values are a:%d b:%d\n", a, b);
    return 0;
}

Best regards,
Rick C. Hodgin

Keith Thompson · Jan 29, 2014

Rick C. Hodgin said:
Rick C. Hodgin said:

No. My position is that C is exceedingly mechanical ... so much so that
there is nearly a 1:1 ratio between the things it does and the things the
CPU must do to conduct the workload. That closeness is known to all who
know C and assembly. The fact that other C developers may not know it as
a matter of stored knowledge is immaterial because the relationship exists
fundamentally.

Do you ignore optimization?
No.

A compiler can, for example, generate no code at all for a given
statement, or for an entire function, if it can prove that that
statement or function has no effect.
[snip]

I care about the behavior of the running program. Machine or
assembly language is nothing more or less than a means to that end.
Agreed.

(If I cared for some reason about the existence of an ADD instruction in
the generated code, then I'd use assembly language.)

Agreed.

So you've completely reversed your position?

Rick C. Hodgin · Jan 29, 2014

So you've completely reversed your position?

Not at all. You've misunderstood a great deal about what I've been
talking about, and there are aspects of the things you say which I do
agree with, but these are secondary to the remaining components which
are not being accurately conveyed between us because we are not using
a common language.

Best regards,
Rick C. Hodgin

James Kuyper · Jan 29, 2014

The term ratio really implies numbers; in the only senses I can figure
out for which numbers might apply, 1:1 is clearly false.

I said "nearly a 1:1 ratio between the things it does and the things the
CPU must do to conduct the workload."

int main(void)
{
    int a, b;

    populate_my_variables(&a, &b);
    printf("The product is: %d\n", a, b);
    return 0;
}

; Off the top of my head, please forgive any mistakes:
; // int main(void)
; // {
; // int a, b
enter 8,0
; [ebp-0] - a
; [ebp-4] - b

; // populate_my_variables(&a, &b);
; // [implicit return][populate_my_variables][address_of a][address_of b]
push ebp ; [address_of b]
mov eax,ebp ; [address_of b];
sub eax,4
push eax
call populate_my_variables ; [populate_my_variables]
add esp,8 ; [implicit return]

; // printf("The product is: %d\n", a, b);
; // [printf]["The..\n"][a]
push dword ptr [ebp-4] ;
push dword ptr [ebp-0] ; [a]
push address_of "The product is: %d\n" ; ["The..\n"]
call printf ; [printf]
add esp,12

; // return 0
mov eax,0

; // }
leave
ret

In this case, there are 14 separate things that must be considered
for conversion:

[1]int [2]main([3]void)
{
    int [4]a, [5]b;

    [6]populate_my_variables([7]&a, [8]&b);
    [9]printf([10]"The product is: %d\n", [11]a, [12]b);
    [13]return [14]0;
}

These are translated to 15 separate things done in assembly (including
function overhead), and this in wholly un-optimized mode.

Well, it's the optimized mode that's really most relevant, and this code
is far too simple to allow significant opportunities for optimization.

; Off the top of my head, please forgive any mistakes:
; // int main(void)
; // {
; // int a, b
01: enter 8,0
; [ebp-0] - a
; [ebp-4] - b

; // populate_my_variables(&a, &b);
; // [implicit return][populate_my_variables][address_of a][address_of b]
02: push ebp ; [address_of b]
03: mov eax,ebp ; [address_of b];
04: sub eax,4
05: push eax
06: call populate_my_variables ; [populate_my_variables]
07: add esp,8 ; [implicit return]

; // printf("The product is: %d\n", a, b);
; // [printf]["The..\n"][a]
08: push dword ptr [ebp-4] ;
09: push dword ptr [ebp-0] ; [a]
10: push address_of "The product is: %d\n" ; ["The..\n"]
11: call printf ; [printf]
12: add esp,12

; // return 0
13: mov eax,0

; // }
14: leave
15: ret

You'll have to provide a platform-dependent definition of how to count
the number of things that a given piece of C code must do, in order to
make your comment meaningful. If you do, I guarantee that whatever
definition you choose, it will be trivial to identify some combination
of source code, platform, compiler, and compiler options for which the
ratio is NOT 1:1. In fact, it will be far easier to identify such
combinations than to identify ones for which it is 1:1.

It's feasible, in code this simple, to take each line of assembly code
and associate it with a unique part of the original source code. If you
call those the "things" that the C code must do, then of course the
numbers will match. But different compilers will produce different sets
of assembly language instructions when targeting different platforms,
and when different optimizations are turned on. Those can't all be in a
1-to-1 relationship to the same "thing" count; which renders your claim
that there must be such a relationship nonsense. I've seen the same C
code converted into 10 assembly language instructions by one compiler,
and 500 instructions by another. Both sets of "things" to do were fully
consistent with the requirements of the C standard. Which number
constituted the correct count of the "things" that the C code was
supposed to do? Note that some of the relevant optimizations take things
like a+b*c and convert them into a single floating point instruction
that takes three arguments.
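
For reference, a sketch of the source-level form in question. The function name `mul_add` is made up. On hardware with a fused multiply-add instruction, a compiler may contract the expression into that one instruction; C99 also exposes the fused operation explicitly as `fma(b, c, a)` in `<math.h>`.

```c
/* Source level: one multiplication and one addition.
 * A compiler targeting hardware with a fused multiply-add
 * instruction may emit a single three-operand instruction
 * for the whole expression. */
double mul_add(double a, double b, double c)
{
    return a + b * c;
}
```

With small exactly representable inputs such as `mul_add(1.0, 2.0, 3.0)`, the result is 7.0 whether or not the operations are fused, so the difference in instruction count is invisible to the program.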

On the flip side, on a platform where there is no native support for
floating point or for any data type larger than 32 bits, a simple
statement a=b, which does a maximum of three "things" as far as C is
concerned, can involve a MUCH larger list of assembly language if a and b
have the types "long long" and "long double complex", respectively.
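
A minimal sketch of that assignment, with a hypothetical wrapper function so the types are explicit. The single `=` asks for the imaginary part to be discarded and the real part to be converted from `long double` to `long long`, truncating toward zero. How many instructions or helper calls that takes is up to the platform.

```c
#include <complex.h>

/* One assignment in the source; the implicit conversion drops the
 * imaginary part and converts the real part to long long. */
long long assign_from_complex(long double complex b)
{
    long long a;

    a = b;
    return a;
}
```

Passing `42.75L + 3.0L * I` gives 42: the imaginary part is dropped and the fractional part of the real part is truncated.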

My compiler actually does things notably differently so I never have to
pass more than one parameter (which is register passed). But, that's
a whole separate discussion.

Actually, the fact that your compiler does something different is the
norm, not the exception, which is precisely what is being discussed.

Rick C. Hodgin · Jan 29, 2014

On 01/29/2014 04:57 PM, Rick C. Hodgin wrote:
Well, it's the optimized mode that's really most relevant, and this code
is far too simple to allow significant opportunities for optimization.

Optimized code is nearly always shorter than un-optimized code.

You'll have to provide a platform-dependent definition of how to count
the number of things that a given piece of C code must do, in order to
make your comment meaningful.

In general, everything that has a name definition, has an operator, is
separated by a comma, is part of an assignment, or is part of a logic test.

If you do, I guarantee that whatever
definition you choose, it will be trivial to identify some combination
of source code, platform, compiler, and compiler options for which the
ratio is NOT 1:1. In fact, it will be far easier to identify such
combinations than to identify ones for which it is 1:1.

No doubt. I'm not proposing a standard. It's just the way things are.

It's feasible, in code this simple, to take each line of assembly code
and associate it with a unique part of the original source code. If you
call those the "things" that the C code must do, then of course the
numbers will match.

It works the same in all code (except for certain types of parallel
heterogeneous code, some types of parallel homogeneous code, and
some code that works in serial on parallel items (such as SSE code)
due to the requirements of filling data horizontally in preparation
for vertical processing).

But different compilers will produce different sets
of assembly language instructions when targeting different platforms,

And the same platforms.

and when different optimizations are turned on. Those can't all be in a
1-to-1 relationship to the same "thing" count;

The 1:1 ratio is on un-optimized code. It typically only gets better
when optimizations are added.

which renders your claim
that there must be such a relationship nonsense. I've seen the same C
code converted into 10 assembly language instructions by one compiler,
and 500 instructions by another.

Let's go with the upper echelon of leading compilers in our comparisons,
shall we?

Both sets of "things" to do were fully
consistent with the requirements of the C standard. Which number
constituted the correct count of the "things" that the C code was
supposed to do?

The one that is not obviously taxed by inadequacy.

Note that some of the relevant optimizations take things
like a+b*c and convert them into a single floating point instruction
that takes three arguments.

Yes. That's exactly my point. Optimizations generally only improve
upon the 1:1 ratio, but the ratio generally holds true across all
generated assembly instructions on 32-bit x86 platforms. It may not
be true on other platforms which contain more general purpose registers,
or other hardware features which alter the way the C code is translated.

On the flip side, on a platform where there is no native support for
floating point or for any data type larger than 32 bits, a simple
statement a=b, which does a maximum of three "things" as far as C is
concerned, can involve a MUCH larger list of assembly language if a and b
have the types "long long" and "long double complex", respectively.

There are a million ways you can bring in unique conditions which destroy
my argument. It doesn't change the merit of it on the whole.

Actually, the fact that your compiler does something different is the
norm, not the exception, which is precisely what is being discussed.

To this point, I had not mentioned mechanics of my compiler's implementation,
but only outward syntax and theory.

Best regards,
Rick C. Hodgin

Seebs · Jan 29, 2014

No. My position is that C is exceedingly mechanical ... so much so that
there is nearly a 1:1 ratio between the things it does and the things the
CPU must do to conduct the workload. That closeness is known to all who
know C and assembly.

.... Uh. I am pretty sure I know a number of people who know C and assembly
who don't "know" that. Possibly because it's not true.

-s

Kaz Kylheku · Jan 29, 2014

Optimized code is nearly always shorter than un-optimized code.

Firstly, there is no upper bound on the length of a piece of code to solve a
task, or the time that it takes; there is no uniquely determined "unoptimized"
state.

For any program which we call "optimized", we can make a program which is
longer, and slower and call that one "unoptimized". Q.E.D.: unoptimized
programs are longer.

However, the changes which make a given program faster, do not always make it
shorter.

If we insert NOP instructions so that branch targets are cache-aligned, the
program grows longer.

If we unroll loops for speed, the program grows longer.

If we inline functions for speed, all the places where they are inlined grow
larger.

If we use static lookup tables instead of computing something at run-time, the
program may grow larger.
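
As a sketch of the unrolling case, written out by hand: the function names are hypothetical, and a compiler that unrolls for speed produces something with a similar shape on its own. The unrolled version does the same work with fewer loop-control steps but more code.

```c
#include <stddef.h>

/* Straightforward loop: one add and one loop test per element. */
int sum_rolled(const int *v, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by four: fewer loop tests per element, but a longer
 * body plus a cleanup loop for the leftover elements. */
int sum_unrolled(const int *v, size_t n)
{
    int total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        total += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}
```

Both functions return the same sum for any array; only the amount of code differs.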

The "meat and potatoes" optimizations performed on crudely compiled code
usually do make it shorter, because crudely compiled code does some obviously
poor things, like load values into registers which are then not subsequently
used and such: a consequence of translating the various pieces of the program
in isolation, according to fixed translation templates that "dove tail"
together according to rules that make the code generation easy.

In general, everything that has a name definition, has an operator, is
separated by a comma, is part of an assignment, or is part of a logic test.

Shorter characterization: anything that has a node in the abstract syntax tree.

glen herrmannsfeldt · Jan 30, 2014

I don't think this is a universal view of C. C is derived (as I
may have said before) from BCPL, and one of the stated aims
of BCPL was to eliminate hidden overheads. I believe C was also
intended to follow this philosophy. So users can expect their
code to do no more than what they have written, and this may
well be one of the reasons why they are using C in the first place.

I suppose, but some of that went away with the ANSI standard.

For one, ANSI allows initializing auto arrays, which involve a
fair amount of work each time. For another, passing struct by
value. K&R allowed passing a pointer to a struct, but not the
value of the struct. Again, more hidden work.
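
A sketch of both cases, with hypothetical names. Neither function contains an explicit copy loop in the source, yet each call implies copying: the struct argument is copied into the parameter, and the auto array is filled from its initializer every time the function is entered.

```c
struct big {
    int data[64];
};

/* Passing by value: the whole struct is copied for the callee,
 * so changes here do not affect the caller's object. */
int first_after_bump(struct big s)
{
    s.data[0] += 1; /* modifies only the local copy */
    return s.data[0];
}

/* Initialized auto array: the eight values are stored into
 * 'table' on every call, before the loop even starts. */
int auto_array_sum(void)
{
    int table[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int total = 0;

    for (int i = 0; i < 8; i++)
        total += table[i];
    return total;
}
```

If the caller's struct has `data[0] == 41`, `first_after_bump` returns 42 while the caller's copy still holds 41, which is the visible evidence of the hidden copy.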

But I still miss an exponential operator.

(snip)

-- glen

glen herrmannsfeldt · Jan 30, 2014

(snip)

Not really. If the computation isn't horribly expensive,
you can run through 2**32 examples in a few minutes.

2**64 would likely be intractable, though.

I do remember when 2**32 was close enough to infinity.
Maybe greater than the MTBF for many processors, or at least
more than the CPU time you would want to pay for.

Some years ago, I had a book on digital logic where one of the
projects was a 40 bit counter counting at 1Hz. Next to each
light was the time that it would first come on.

As most people here can figure out those times, I won't mention
them, but the 40th one is a year starting with 19 and three more
digits after that.
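
For anyone who would rather compute the times than work them out by hand, a small sketch. The function names are made up, and it assumes the counter starts at zero, increments once per second, and that light 1 is the least significant bit.

```c
/* Seconds until light n (1 = least significant bit) first turns on,
 * assuming the counter starts at zero and counts at 1 Hz.
 * Valid for 1 <= n <= 64. */
unsigned long long first_on_seconds(unsigned n)
{
    return 1ULL << (n - 1);
}

/* The same interval in years, assuming 365.25 days per year. */
double first_on_years(unsigned n)
{
    return (double)first_on_seconds(n) / (365.25 * 24.0 * 60.0 * 60.0);
}
```

Adding `first_on_years(40)` to the year the counter is switched on gives the date for the last light.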

-- glen

BartC · Jan 30, 2014

James Kuyper said:
The term ratio really implies numbers; in the only senses I can figure
out for which numbers might apply, 1:1 is clearly false. For instance,
the number of lines of assembler is almost completely unrelated to the
number of lines of C code. It can be quite a bit larger or smaller,
depending upon what the C code actually says.

I assume that what you really mean is "correspondence", rather than
"ratio". For some simple low-level languages, it is possible to set up a
1-to-1 correspondence between language constructs and the generated
assembly code. However, such a language is really nothing more than a
high-level assembler.

I once implemented an actual /machine-oriented/ language which was quite low
level (lower than C), and even then there wasn't a 1:1 correspondence with
machine instructions, although it was extremely easy to see how any
statement mapped to actual instructions (it didn't need a stack for example
as there was no operator precedence, and there was no optimisation).

The execution model however was directly linked to the machine, with the
data-types being the available word-sizes, and you could refer to registers
by name.

But it wasn't a high-level assembler in my opinion, because it wasn't
possible to directly express machine instructions (iirc).

C implements a more abstract execution model, with data-types that may or
may not be available in the hardware, although the model is still simple:
int types of various widths, floating point, and pointers. It tries to be
independent of the hardware, but invariably an 'int' type might still be a
machine word in width.

It can still be possible, with the simpler types and detailed knowledge of
the target hardware, to guess what machine instructions *might* be used to
implement a statement, and thereby get some idea of its efficiency or
otherwise. But optimising compilers make that more difficult. The 700 pages
of the C standard which sets out how any construct has to behave don't make
it any easier either.
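
A tiny sketch of how loosely the types are pinned down. The function name is hypothetical. The standard only guarantees a minimum range for `int`; the actual width is whatever the implementation chose, which is often but not necessarily the machine word.

```c
#include <limits.h>

/* Number of bits in the storage of an int on this implementation.
 * The standard's minimum range for int implies at least 16. */
unsigned int_storage_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}
```

The result differs between implementations, so any guess about the instructions behind an `int` operation has to start from the specific target.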

C is not, and never has been, such a language,
though the correspondence was closer in the early days of C than it is
now.

I might actually be agreeing with you for once...

James Kuyper · Jan 30, 2014

On 01/29/2014 07:59 PM, Rick C. Hodgin wrote:
....

There are a million ways you can bring in unique conditions which destroy
my argument. It doesn't change the merit of it on the whole.

They're not particularly unique, if you think there's as few as a
million of them. Personally, I think there's a lot more than that. I
think that some variant of those "unique conditions" applies in almost
every imaginable case. Therefore, the way you had to adjust your claim
to deal with each of those issues pretty much completely demolishes your
argument.

What you really seem to be claiming is that it might be possible to
define a pseudo-generic assembly language for which there can be a
1-to-1 correspondence between C constructs and instructions in that
assembly language, if no optimizations are performed. I'm not willing to
concede the feasibility of defining such an assembly language unless and
until you actually provide a precise definition for it, but it might be
possible. However, if it can be done, I suspect that many aspects of it
would look more like a complicated encoding of C than a real-world
assembly language. But even if you can define such a language, the only
way to accommodate real world assembly languages is to dismiss the
differences in the way they handle things from the way the
pseudo-generic one does as mere "optimizations", as you have already
done. Your claim becomes pretty pointless if such optimizations are
sufficient to render it irrelevant, because your claim was in response
to statements Keith was making that included optimized translations of C
code.
