reference type for C

Malcolm McLean

On 05/21/2013 04:44 AM, Malcolm McLean wrote:

for ordinary numbers being added by people, the difference between 3+42 as "counting on" and 3+42 as "arithmetic"
depends upon how the person thinks about the process.
Yes. Counting on is how you do addition, without really
understanding that you are adding two values.
Actually, if we define addition as operation one,
multiplication as operation two, exponentiation as
operation three, then there's an "operation zero" which bears
the same relation to addition as addition does to
multiplication.
Operation zero is "a plus count of a's". So it's a counting
operation, you start from a then count one for each time
the value a appears in the list.
However, for pointers being added by a computer, array[42]
is inherently "counting on", regardless of how the
computer implements it, even though the way that the
computer handles it has no meaningful similarity to the
way human beings do "counting on"?
Looking at the machine code is a distraction. There might
be an explicit "array + 42" op code in there, there
might not. The store might even be optimised out to a
temporary register.
 
glen herrmannsfeldt

(e-mail address removed) wrote:

(snip)
But if you want my pick on the subject ;-) : the discussion is
somewhat missing the point. I think Malcolm was in hindsight
talking about behaviour vs implementation.
A compiler language only talks about behaviour.
In the discussion, I feel that people talk about the equivalence
of indexing and pointer (arithmetic, arghh -no) usage, mostly
because they "know" how it translates in machine code.

Yes. And while it is well known that the ++ and -- operators
did not have their origin in the PDP-11, much early C coding
was done on that machine.

As far as I know, the while(*s++=*t++) form was popularized on
the PDP-11 (even if it existed earlier) where it was known to
generate fast code. (And also a few fewer keystrokes are used.)

As machines changed, and compilers improved, people didn't
rethink the way to write those loops.

There should be reasonable names for the "indexing in loop"
and "increment pointers in loop" forms, but I don't know what
they are.

-- glen
 
glen herrmannsfeldt

(snip)
I find the nitpicking somewhat more interesting than your original
question, but then I'm well known for my pedantry. Most of the other
people monitoring this newsgroup seem to find both topics equally
uninteresting. You seem to think I'm distracting people from discussing
the topic you want to talk about. The fact that no one's talking about
your topic is due to the fact that no one monitoring this newsgroup is
interested in talking about it; my side issue has nothing to do with that.
(snip)
Yes.

My understanding of what constitutes pointer arithmetic is grounded
entirely in the abstract definition of the language itself, and does not
depend in any way upon the implementation of that language. That seems
reasonable to me, since my understanding of what "arithmetic" means for
ordinary numbers is also grounded entirely in the abstract definitions
of mathematics, and does not depend in any way upon the specific symbol
manipulation processes that people use to implement arithmetic.

Yes, but consider two specific examples to compare:

for(i=0;i<n;i++) *s++=*t++;
s -=n;
t -=n;

and

for(i=0;i<n;i++) s[i]=t[i];

it doesn't seem so unreasonable to call these the "pointer
arithmetic" version and the "indexed version" even though,
as we well know, indexing is defined in C in terms of pointer
arithmetic.
Malcolm's understanding does appear to depend upon the implementation,
but in ways that make no sense to me: array subscripting has never,
as far as I know, been implemented in a way
that has any meaningful similarity to "counting on", as Malcolm defines
that term for ordinary numbers.

Yes, but it does seem a convenient example to show that there are
two different ways to look at what is otherwise the same problem.

In the two cases above, one would expect the compiler to increment
two registers containing addresses for the first case, and to
increment one register containing the index for the second.

Fortran compilers have known how to optimize the index case when
needed for about 40 years now, keeping the addresses in temporary
registers. The former example requires restoring the pointers, as
they might be used again. That seems fair to me, but in some
cases, such as strcpy, not needed.

Strictly as written, the first case might require fetching two
pointers from memory each iteration, where the second might keep
the index (and loop invariant origins) in registers.
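For reference, here is a minimal sketch of the two loop forms being compared, wrapped in functions so they can be run side by side. The function names are made up for illustration. Because the pointer parameters are local copies, the "s -= n" restore step from the example is not needed inside the function.

```c
#include <assert.h>
#include <stddef.h>

/* "Pointer notation": both pointers are advanced on every iteration.
   s and t are local copies of the caller's pointers, so nothing has
   to be restored afterwards. */
void copy_ptr(int *s, const int *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *s++ = *t++;
}

/* "Array notation": s[i] is defined by the language as *(s + i). */
void copy_idx(int *s, const int *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i] = t[i];
}

/* Returns 1 when both forms leave identical results behind. */
int copies_agree(void)
{
    const int src[4] = {10, 20, 30, 40};
    int a[4] = {0}, b[4] = {0};

    copy_ptr(a, src, 4);
    copy_idx(b, src, 4);
    for (size_t i = 0; i < 4; i++)
        assert(a[i] == b[i] && a[i] == src[i]);
    return 1;
}
```

Whether a given compiler emits the same machine code for both is an implementation matter; the sketch only shows that the observable behaviour is the same.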

-- glen
 
Keith Thompson

David Brown said:
This is not about "garbage collection" (which C++ does not have any more
than C does - you need libraries or classes that support it if you want
garbage collection), or the lifetime of memory allocations (which is a
separate issue). C objects never "die" - they just fade away from lack
of use, and the compiler re-uses their space (registers, stack space,
etc.)

As James Kuyper points out, the beginning and end of the lifetimes of
objects are very well defined in C. The end of an object's lifetime
typically doesn't require any specific action to occur; it just means
that the behavior of accessing the object becomes undefined.
In C++, objects have a specific "point of death" - it is when their
destructor is called.

Not all C++ objects have destructors. (C++'s definition of the word
"object" is quite similar to C's, and has nothing to do with
"object-oriented"; given "int foo;", "foo" is an object.)
Normally for a local or temporary object, that will happen when it goes
out of scope (the compiler can, of course, shuffle things around to
improve the code - but logically the destructor is called at the end of
scope). Using a reference, you can extend and delay this destructor call.

Scope and lifetime, in both C and C++, are distinct things. For
example, if an object is defined with the "static" keyword inside a
block, it has block scope (meaning that its identifier is visible only
inside the enclosing block), but its lifetime (static storage duration)
is the entire execution of the program.

For a non-static object defined inside a block, its scope and its
lifetime end at the same point (the closing "}" of the nearest enclosing
block) -- but the scope is a region of program text, and the lifetime is
a range of time during the program's execution.
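The scope/lifetime distinction can be seen in a small C sketch. The function names are hypothetical; the point is only that a block-scope identifier can name an object that outlives the block when it is declared static.

```c
#include <assert.h>

/* `calls` has block scope (the name is visible only inside this
   function) but static storage duration (the object exists for the
   entire execution of the program). */
int next_id(void)
{
    static int calls = 0;
    return ++calls;
}

/* `tmp` has block scope and automatic storage duration: a new object
   is created on every call, so nothing carries over. */
int fresh(void)
{
    int tmp = 0;
    return ++tmp;
}

void lifetime_demo(void)
{
    int first = next_id();
    int second = next_id();

    assert(second == first + 1);   /* state survived between calls */
    assert(fresh() == 1);
    assert(fresh() == 1);          /* no state survived here */
}
```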

I don't know C++ as well as I know C, but I don't think that a reference
can affect the lifetime of an object with automatic storage duration.
For example, I think this:

int& func() {
int local = 42;
int &ref = local;
return ref;
}

has undefined behavior in C++.
 
Keith Thompson

A compiler language only talks about behaviour. In the discussion, I
feel that people talk about the equivalence of indexing and pointer
(arithmetic, arghh -no) usage, mostly because they "know" how it
translates in machine code.
[...]

At least around here, most people talk about the equivalence of
indexing and pointer arithmetic simply because the C language
standard defines them to be equivalent. It defines the behavior
of pointer arithmetic (in abstract terms, not as machine-level
operations), and defines the [] operator on top of that.

I suppose it could have been done the other way around with no change
in semantics. Indexing could have been defined as a fundamental
operation, with *ptr defined as equivalent to ptr[0].

C's semantics are based on, but avoid directly depending on,
machine-level semantics. That's the motivation for defining []
in terms of pointer arithmetic.
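A short sketch of that equivalence, using a hypothetical helper name. It relies only on the language rule that E1[E2] means *((E1)+(E2)).

```c
#include <assert.h>

/* Returns 1 if every spelling of "element i" names the same object. */
int same_element(int *a, int i)
{
    return &a[i] == a + i    /* a[i] is *(a + i) by definition        */
        && &i[a] == a + i    /* the addition commutes, so i[a] works  */
        && &*a == &a[0];     /* *a and a[0] are the same object       */
}

void indexing_demo(void)
{
    int v[5] = {1, 2, 3, 4, 5};

    assert(same_element(v, 3));
    assert(v[3] == *(v + 3));
    assert(3[v] == 4);
}
```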
 
Bart van Ingen Schenau

There should be reasonable names for the "indexing in loop" and
"increment pointers in loop" forms, but I don't know what they are.

I would think that "array notation" and "pointer notation" cover the
territory well enough.
The most important aspect of the terms being that they indicate what it
looks like in the source code, not how that source code should be
interpreted by either the human reader or the compiler.

Bart v Ingen Schenau
 
Bart van Ingen Schenau

James Kuyper said:
My understanding of what constitutes pointer arithmetic is grounded
entirely in the abstract definition of the language itself, and does
not depend in any way upon the implementation of that language. That
seems reasonable to me, since my understanding of what "arithmetic"
means for ordinary numbers is also grounded entirely in the abstract
definitions of mathematics, and does not depend in any way upon the
specific symbol manipulation processes that people use to implement
arithmetic.

Yes, but consider two specific examples to compare:

for(i=0;i<n;i++) *s++=*t++;
s -=n;
t -=n;

and

for(i=0;i<n;i++) s[i]=t[i];

it doesn't seem so unreasonable to call these the "pointer arithmetic"
version and the "indexed version" even though, as we well know, indexing
is defined in C in terms of pointer arithmetic.


It is not unreasonable to refer to the two different ways to write that
loop with different names (I personally prefer respectively "pointer
notation" and "array notation"), but it *is* unreasonable to claim that
the index/array version does not have any pointer arithmetic in it.

Yes, but it does seem a convenient example to show that there are two
different ways to look at what is otherwise the same problem.

But it does not help if you need to re-define core language constructs in
the process, such as re-defining array indexing such that it no longer
involves pointer arithmetic, while talking about the C language.
In the two cases above, one would expect the compiler to increment two
registers containing addresses for the first case, and to increment one
register containing the index for the second.

For a straight-forward, unoptimized implementation, yes.
Fortran compilers have known how to optimize the index case when needed
for about 40 years now, keeping the addresses in temporary registers.
The former example requires restoring the pointers, as they might be
used again. That seems fair to me, but in some cases, such as strcpy,
not needed.

Strictly as written, the first case might require fetching two pointers
from memory each iteration, where the second might keep the index (and
loop invariant origins) in registers.

I am sorry, but I don't see why in the first case the pointers can't be
held in registers as well.
And you forgot an address calculation on each iteration of the second
loop, which isn't needed in the first loop.

I don't believe there is a clear-cut winner here and any half-decent
optimizer should be able to produce entirely equivalent code for both of
them.

Bart v Ingen Schenau
 
glen herrmannsfeldt

(snip)
(snip, then I wrote)
Yes, but consider two specific examples to compare:
for(i=0;i<n;i++) *s++=*t++;
s -=n;
t -=n;
and
for(i=0;i<n;i++) s[i]=t[i];
it doesn't seem so unreasonable to call these the "pointer arithmetic"
version and the "indexed version" even though, as we well know, indexing
is defined in C in terms of pointer arithmetic.

It is not unreasonable to refer to the two different ways to write that
loop with different names (I personally prefer respectively "pointer
notation" and "array notation"), but it *is* unreasonable to claim that
the index/array version does not have any pointer arithmetic in it.

Any two names are fine with me, as long as they are different.
But it does not help if you need to re-define core language constructs
in the process, such as re-defining array indexing such that it no
longer involves pointer arithmetic, while talking about the C language.

I don't think it requires redefining the language, but how you use
the available language features. a[i] is defined in terms of pointer
arithmetic, but that doesn't require that the compiler actually
do the arithmetic. On the PDP-11, it may have actually done it.
For a straight-forward, unoptimized implementation, yes.

It could get more interesting if you add:

volatile i, s, t;
I am sorry, but I don't see why in the first case the pointers can't be
held in registers as well.

Yes, they can be. Assuming the processor has enough available.
(And you don't add volatile.)
And you forgot an address calculation on each iteration of the second
loop, which isn't needed in the first loop.

You mean multiplying by sizeof(s[i]) and sizeof(t[i])?

For one, some hardware has indexing modes that index in units of
the size being referenced. That is in addition to the ability to
do indexed addressing at all.

Otherwise, previously mentioned Fortran compilers have been known
to generate a loop with the loop variable four times the value of i,
and so avoid any multiply. Even more, fold in any constant multiplier
in the index itself:

for(i=0;i<n;i++) s[3*i]=t[3*i];

compilers can avoid both the explicit multiply by three, and the
implied multiply by sizeof(s[i]) and sizeof(t[i]).
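The transformation described here is usually called strength reduction. The sketch below applies it by hand so the two forms can be compared; the function names are invented, and a real optimizer may or may not produce anything resembling the second form. A running offset is used rather than stepping the pointers by 3, because stepping the pointers would move them beyond one past the end of the array after the last iteration.

```c
#include <assert.h>
#include <stddef.h>

/* As written: the index is multiplied by 3 on every iteration. */
void copy_stride3_idx(int *s, const int *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[3 * i] = t[3 * i];
}

/* Hand-applied strength reduction: the multiply is replaced by a
   running offset k that is increased by 3 each time round. */
void copy_stride3_reduced(int *s, const int *t, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++, k += 3)
        s[k] = t[k];
}

/* Copies elements 0, 3, 6 and 9 with both forms and checks that the
   destination arrays match. Returns the last element copied. */
int stride_demo(void)
{
    int t[10], a[10] = {0}, b[10] = {0};

    for (int i = 0; i < 10; i++)
        t[i] = i + 1;
    copy_stride3_idx(a, t, 4);
    copy_stride3_reduced(b, t, 4);
    for (int i = 0; i < 10; i++)
        assert(a[i] == b[i]);
    return a[9];
}
```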
I don't believe there is a clear-cut winner here and any half-decent
optimizer should be able to produce entirely equivalent code for
both of them.

Consider the Fortran 66 loop:

I=5
DO 1 I=1,10
1 S(I)=T(I)

what should the value of I be after the loop?
(Especially when the compiler keeps I in a register.)

-- glen
 
army1987

This is the only silent difference between C and C++ that I am aware of.
If there are any others, they will likely affect you even less than this
one.

Well, if you're using C89, there's silly stuff like 2//* */2, which will
equal 2 in C++ (and C99) but 1 in C89.
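Written out as a complete function, the expression looks like the sketch below (function names are made up). Under C89 rules the text `//* */` is a division operator followed by an empty block comment, so the value is 2/2; under C99 or C++ rules everything from `//` to the end of the line is a comment, so the value is just 2. Which result you get depends on the language mode the compiler is run in.

```c
#include <assert.h>

int comment_probe(void)
{
    /* The semicolon sits on its own line so that the statement is
       still complete when the rest of the first line is a comment. */
    int x = 2 //* */ 2
        ;
    return x;
}

/* Describes which comment rules the compiler applied. */
const char *comment_rules(void)
{
    int x = comment_probe();

    assert(x == 1 || x == 2);
    return x == 1 ? "C89 rules (no // comments)"
                  : "C99 or C++ rules (// starts a comment)";
}
```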
 
army1987

You can compile and link both modules as C code, or as C++ code; the
resulting executables are guaranteed by the applicable standards to exit
with a failure status, [...]
[...]

int main(void) {
return Cfunc(&Cppname_Ctype) && tag;
}

Actually, it's implementation-defined what values EXIT_SUCCESS and
EXIT_FAILURE have and what exit(i) does for i other than 0 or those two;
so in principle an implementation could have EXIT_SUCCESS equal to 1, and
then the program would return a successful status no matter what. (But
that's trivial to patch up.)
 
James Kuyper

You can compile and link both modules as C code, or as C++ code; the
resulting executables are guaranteed by the applicable standards to exit
with a failure status, [...]
[...]

int main(void) {
return Cfunc(&Cppname_Ctype) && tag;
}

Actually, it's implementation-defined what values EXIT_SUCCESS and
EXIT_FAILURE have and what exit(i) does for i other than 0 or those two;
so in principle an implementation could have EXIT_SUCCESS equal to 1, and
then the program would return a successful status no matter what. (But
that's trivial to patch up.)

I now remember that there was a "? EXIT_SUCCESS : EXIT_FAILURE" at the
end of that expression in an earlier version of that program; it must
have gotten lost at some point during editing. That would also explain
the discrepancy Bart brought up - which doesn't explain why I failed to
catch that discrepancy during testing (I know that I tested it - but I
must have been concentrating on getting different results with the two
compilers, rather than matching the description I had already written
up). I remember wanting to remove the dependency on <stdlib.h> - but I
think the reason why I wanted to do that is no longer present in this
version of the code.
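A minimal sketch of the portable pattern mentioned above: map a truth value onto the two macros from <stdlib.h> instead of returning it directly. The helper name exit_status is hypothetical and is not part of the program under discussion.

```c
#include <assert.h>
#include <stdlib.h>

/* Converts a boolean-style result into a portable exit status.
   Returning the raw 0/1 value would depend on how the implementation
   interprets statuses other than 0, EXIT_SUCCESS and EXIT_FAILURE. */
int exit_status(int ok)
{
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}

void exit_status_demo(void)
{
    assert(exit_status(1) == EXIT_SUCCESS);
    assert(exit_status(0) == EXIT_FAILURE);
    assert(exit_status(42) == EXIT_SUCCESS);  /* any nonzero counts as true */
}
```

In a program's main this would be used as `return exit_status(some_condition);`.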
 
army1987

C does of course have types, but I
think it would be considered a leap too far for a function call that
looks like it is referring to a variable to actually be pushing a
disguised pointer instead.

Even in C++, I can't stand using non-const references as function
parameters in most cases (at least for POD types), because I don't like
to not be able to assume that calling foo(i) won't affect i.
 
Martin Shobe

The "as-if" rule often applies - actually making use of a C++ object
after its lifetime has ended has undefined behavior, so the only things
that prevent a delay in execution of a destructor are any side-effects
it may cause. If all the destructor does is release resources (a common
case), the "as-if" rule allows it to be delayed almost indefinitely.

There's a reason why I wanted to exclude those that were a result of the
"as if" rule. Since the actual execution of a destructor isn't
observable, you can't even count on an explicitly invoked destructor
executing if the implementation can find a different way to do the same
thing.
In any event, there is one case that does not rely on the "as-if" rule:
"The completions of the destructors for all initialized objects with
thread storage duration within that thread are sequenced before the
initiation of the destructors of any object with static storage
duration." (3.6.3p1) It could be a VERY long time between the end of any
particular thread and the time when objects with static storage duration
get destroyed.

That's more of what I was looking for. Thanks.

Martin Shobe
 
Rosario1903

Even in C++, I can't stand using non-const references as function
parameters in most cases (at least for POD types), because I don't like
to not be able to assume that calling foo(i) won't affect i.
-----------
There is no need to use the useless word "const". It is enough to adopt,
as a default programming rule, that values and references passed as
function arguments, as in
f(a,b,c)
are not modified by the call unless a, b or c is declared as a pointer.

To modify a value, the argument has to be a pointer, not a value or a
reference, e.g.
int a;
f(&a, b)
so the value that a contains can be changed.

If instead
g(int& a, b)

int a;
g(a,b)

then a is not changed.
 
Stephen Sprunk

one would not use the useless word "const"... it is enough one has
the formal programming law that arg value for function as in f(a,b,c)
not modify a, b c just after the call if their definition [a,b,c] is
not type pointer

for modify that value their have to be pointers not values or
reference. e.g int a; f(&a, b) so the value that contain a can be
changed

if g(int& a, b)

int a; g(a,b)

than a is not changed..

The way C++ does operator overloading requires that functions be able to
modify (non-const) reference arguments.

IIRC, C++'s other major need for references is to avoid needing to call
copy constructors for const arguments.

Since C doesn't have operator overloading or constructors, though, they
would be mere syntactic sugar, not a necessity as they are in C++.

S
 
Rosario1903

one would not use the useless word "const"... it is enough one has
the formal programming law that arg value for function as in f(a,b,c)
not modify a, b c just after the call if their definition [a,b,c] is
not type pointer

for modify that value their have to be pointers not values or
reference. e.g int a; f(&a, b) so the value that contain a can be
changed

if g(int& a, b)

int a; g(a,b)

than a is not changed..

The way C++ does operator overloading requires that functions be able to
modify (non-const) reference arguments.

but i not modify them as a law for write code
IIRC, C++'s other major need for references is to avoid needing to call
copy constructors for const arguments.

Do not compiler find if no instruction modify what reference
'point'to?

reference is useful for pass big data as pointer in function args
without write the big data in to the stack
 
Bart van Ingen Schenau

You mean multiplying by sizeof(s[i]) and sizeof(t[i])?

No, I meant adding the offset/index value to the origin pointers.
But I see that that can be part of the CPU instruction.

Consider the Fortran 66 loop:

I=5
DO 1 I=1,10
1 S(I)=T(I)

what should the value of I be after the loop? (Especially when the
compiler keeps I in a register.)

I am sorry, but I am not familiar enough with Fortran 66 to determine
that with any certainty.
My initial guess would be 10, but I would not be surprised if it should
be 5 or 11.
And if it is about the value in the register, I would have no problems
with that being 20, 40, or even 0.

But I don't see what that has to do with what an optimizer can do with two
semantically equivalent pieces of code written in different styles.

Bart v Ingen Schenau
 
Stephen Sprunk

one would not use the useless word "const"... it is enough one
has the formal programming law that arg value for function as in
f(a,b,c) not modify a, b c just after the call if their
definition [a,b,c] is not type pointer

for modify that value their have to be pointers not values or
reference. e.g int a; f(&a, b) so the value that contain a can
be changed

if g(int& a, b)

int a; g(a,b)

than a is not changed..

The way C++ does operator overloading requires that functions be
able to modify (non-const) reference arguments.

but i not modify them as a law for write code

Consider this basic snippet:

cin << "Hello World!" << endl;

That calls operator<<() twice, and in both cases, the function modifies
its first reference argument (cin); it cannot work any other way.

Granted, you may not take advantage of that feature of the language for
your own functions, but you're probably missing out on opportunities to
write more efficiently and clearly when that's the best solution.
Do not compiler find if no instruction modify what reference
'point'to?

Sorry; I don't understand what you're saying here.
reference is useful for pass big data as pointer in function args
without write the big data in to the stack

Pass-by-reference and pass-by-pointer are no different in that way.

Like pointer arguments, a reference argument may be const or not
depending on whether the function needs to modify it. There is no good
reason to require that non-const arguments always be passed by pointer;
it is perfectly valid to pass them by reference. Use whichever form
makes the code easier for humans (particularly ones other than you) to
understand.
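C has no references, so the closest C analogue of this choice is between passing by value, passing a pointer to const, and passing a pointer to non-const. The sketch below uses made-up function names; it shows that the `&` at the call site is what signals a possible modification, and that const on the pointee documents that none will happen.

```c
#include <assert.h>

/* Pointer to non-const: the caller writes bump(&n), so the chance of
   modification is visible at the call site. */
void bump(int *p)
{
    *p += 1;
}

/* Pointer to const: the function cannot modify the pointee through p. */
int twice(const int *p)
{
    return 2 * *p;
}

/* Pass by value: the function only ever sees a copy. */
int bumped_copy(int v)
{
    v += 1;
    return v;
}

void argument_demo(void)
{
    int n = 5;

    assert(bumped_copy(n) == 6 && n == 5);  /* caller's n untouched */
    assert(twice(&n) == 10 && n == 5);      /* read-only access */
    bump(&n);
    assert(n == 6);                         /* modified through the pointer */
}
```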

S
 
Keith Thompson

Stephen Sprunk said:
Consider this basic snippet:

cin << "Hello World!" << endl;

I think you mean "cout << ...".
That calls operator<<() twice, and in both cases, the function modifies
its first reference argument (cin); it cannot work any other way.

Does it? As far as I know, the overloaded << operator takes two
arguments of types whatever-the-type-of-cout-is and some-other-type, and
returns a result of type whatever-the-type-of-cout-is. It doesn't need
to modify either operand.

Similarly, the C equivalent:

fprintf(some_file, "%s\n", "Hello World!");

does not (and cannot) modify its first argument, which is of type FILE*.
If the type whatever-the-type-of-cout-is is similar to FILE*, there's no
need for << to modify it.

It may well be the case that it *does* modify it, but I think it could
work perfectly well if it didn't.
 
glen herrmannsfeldt

Keith Thompson said:
Does it? As far as I know, the overloaded << operator takes two
arguments of types whatever-the-type-of-cout-is and some-other-type, and
returns a result of type whatever-the-type-of-cout-is. It doesn't need
to modify either operand.
Similarly, the C equivalent:
fprintf(some_file, "%s\n", "Hello World!");
does not (and cannot) modify its first argument, which is of type FILE*.
If the type whatever-the-type-of-cout-is is similar to FILE*, there's no
need for << to modify it.
It may well be the case that it *does* modify it, but I think it could
work perfectly well if it didn't.

As you say, it doesn't and can't modify its argument (some_file),
but it can modify (*some_file), and sort of by definition modifies
the actual file (disk, printer, terminal screen).

Would you say it doesn't modify the argument of type (FILE*) but
does modify the pointee of type (FILE)?
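The argument/pointee distinction can be sketched without any real I/O. Here struct stream is a hypothetical stand-in for FILE, and put is a made-up function playing the role of fprintf; neither reflects how any actual library implements its streams.

```c
#include <assert.h>
#include <stddef.h>

struct stream {
    size_t written;   /* hypothetical per-stream state */
};

/* Receives the pointer by value. It can change the object the pointer
   refers to, but not the caller's copy of the pointer itself. */
void put(struct stream *out, const char *text)
{
    while (*text++ != '\0')
        out->written++;   /* modifies the pointee */
    out = NULL;           /* modifies only the local copy of the pointer */
}

void stream_demo(void)
{
    struct stream st = {0};
    struct stream *p = &st;

    put(p, "Hello");
    assert(p == &st);          /* the argument is unchanged */
    assert(st.written == 5);   /* the pointee was modified */
}
```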

-- glen
 
