Reading from files and range of char and friends

James Kuyper · Mar 26, 2011

James Kuyper said:

Yes, and I'll provide it soon, when I don't have a wife yelling at me
that it's time to get to bed.

[...]

http://xkcd.com/386/

Precisely

James Kuyper · Mar 26, 2011

For types T1 and T2 I define INPL(T1 , T2) ("IN PLace") to mean
that by using the standard and without considering the peculiarities of
any specific implementation you can reach the following conclusion :
for every i>0, for every function foo() which takes at least i
arguments , for every call foo() which has an object of type T2 given
by expression E at position i and the call does not cause undefined
behavior , then by putting at position i "(T1)E" and keeping everything
else the same , the function will operate in precisely the same manner
i.e. return the same value (if it returns a value) , cause the same
side effects etc.

I call T1 and T2 interchangeable (as arguments to functions) if we have
INPL(T1 , T2) and INPL(T2 , T1) .

OK, I'll first explore the implications of this definition. First of
all, it seems clear that compatible types are always interchangeable
under this definition. As a special case, every type is compatible with
itself, and interchangeable with itself. As far as that goes, we're in
agreement. Therefore, I'll be concentrating my attention on incompatible
types, to see how "interchangeable" differs from "compatible".

Some of the functions your definition refers to would have been declared
with a prototype. When that's the case, the argument is implicitly
converted to the type of the corresponding parameter, if the conversion
is permitted; otherwise, it's a syntax error, and undefined behavior if
the implementation chooses to accept the code after having issued the
required diagnostic.
Therefore, in order for two incompatible types to be interchangeable, we
must have

(T1)(T2)t1 == t1 && (T2)(T1)t2 == t2

for all possible values t1 and t2 of types T1 and T2, respectively. When
T2 is void*, intptr_t, or uintptr_t, and T1 is a pointer to an object
type, the first half of that requirement is guaranteed by the standard,
but it doesn't say anything to ensure that the second half is true. If
T1 and T2 are pointers that differ only in the qualification of the type
that they point at, both sides of the comparison are guaranteed to be
true; this happens to be one of the cases where SRAR is required. The
same is true of pointers to two types which are guaranteed to have the
same alignment, and to two pointer-to-function types.
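
To make the round-trip requirement concrete, here is a minimal sketch of the first half, (T1)(T2)t1 == t1, for three of the pairs mentioned above. The function names are hypothetical, and uintptr_t is an optional type, so the sketch assumes an implementation that provides it. It says nothing about the second half, (T2)(T1)t2 == t2, which the standard does not guarantee for these pairs.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if an int* compares equal to itself after int* -> void* -> int*. */
int survives_void_round_trip(int *p)
{
    void *vp = p;     /* implicit conversion to void* */
    int *back = vp;   /* implicit conversion back to int* */
    return back == p;
}

/* Returns 1 if a void* compares equal to itself after a trip through
   uintptr_t (assumed to exist on this implementation). */
int survives_uintptr_round_trip(void *vp)
{
    uintptr_t u = (uintptr_t)vp;
    return (void *)u == vp;
}

/* Returns 1 if a char* survives char* -> const char* -> char*,
   two pointer types that differ only in qualification. */
int survives_qualifier_round_trip(char *p)
{
    const char *cp = p;
    return (char *)cp == p;
}

/* Self-check of the three round trips on ordinary objects. */
void check_round_trips(void)
{
    int x = 0;
    char s[] = "abc";
    assert(survives_void_round_trip(&x));
    assert(survives_uintptr_round_trip(&x));
    assert(survives_qualifier_round_trip(s));
}
```

Calling check_round_trips() from a test program should complete without tripping an assertion.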

So far, so good: we've got a fair selection of incompatible but possibly
interchangeable types, all of them pointer types.

However, some of the functions your definition refers to would have been
declared without a prototype - a K&R style declaration. When that's the
case, arguments are promoted, where applicable, and are otherwise passed
on, unchanged. The behavior is undefined if the promoted type is not
compatible with the promoted type of the corresponding parameter, with
two specific exceptions. Your definition appears to require identical
behavior for all possible values of the argument, whereas one of those
exceptions only applies to positive values of signed integers. The other
exception is from the last sentence of 6.5.2.2p6:

"both types are pointers to qualified or unqualified versions
of a character type or void."

Therefore, two incompatible types could be interchangeable only if
covered by that exception, or if their promoted types are compatible.
The only cases where the promoted type is different from the actual type
are integer types with a rank less than 'int', and 'float' - this
excludes every pointer type not covered by the exception at the end of
6.5.2.2p6.
However, in each of those cases, conversion from the promoted type to
the unpromoted type can be lossy if the two types have different ranges,
or (in the case of float/double, have different precision). Therefore,
since you defined interchangeability as being independent of the
peculiarities of particular implementations, none of those types can be
interchangeable, either.
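
A small sketch of the lossy conversions referred to above, with hypothetical function names. The float case assumes that float has less precision than double on the implementation (true for the usual IEEE 754 formats, but not required by the standard), and that the value passed is within the range of float.

```c
#include <assert.h>

/* Returns 1 if d is unchanged by double -> float -> double.
   d must be within the range of float. */
int float_round_trip_is_exact(double d)
{
    volatile float f = (float)d;   /* volatile forces storage at float precision */
    return (double)f == d;
}

/* Returns 1 if i is unchanged by int -> unsigned char -> int. */
int uchar_round_trip_is_exact(int i)
{
    unsigned char c = (unsigned char)i;   /* reduced modulo UCHAR_MAX + 1 */
    return (int)c == i;
}

/* Self-check using values whose outcome does not depend on the implementation. */
void check_lossy_round_trips(void)
{
    assert(float_round_trip_is_exact(0.5));    /* exactly representable */
    assert(uchar_round_trip_is_exact(65));     /* within 0..UCHAR_MAX */
    assert(!uchar_round_trip_is_exact(-1));    /* -1 becomes UCHAR_MAX */
}
```

On an implementation where float is narrower than double, float_round_trip_is_exact(0.1) would be expected to return 0, which is the kind of value-dependent difference that rules these types out.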

Net result: the only interchangeable types that are incompatible are
those types which are covered by the last sentence of 6.5.2.2p6. The
term "interchangeable" is otherwise equivalent to "compatible". Was that
your intent? The issue you raised involves only those types, so I could
imagine that you'd be content with this conclusion. However, it excludes
most of the other types for which the standard requires "same
representation and alignment".

In quote A I pointed out that if you have at some point in the code
foo(...,E,...) and further down foo(...,(T1)E,...) then it may happen
that the object in position i may be passed in register R in the first
call and R' in the second call with R!=R' and the types being
interchangeable or compatible has nothing to do with it. From your
puzzled reaction I take it that when you spoke of "different registers"
you had something else in mind.

If foo() was declared with a prototype, then both E and (T1)E will be
converted to the type specified for the corresponding parameter in that
prototype, before the call to foo() ever occurs. Since data of the same
type will be sent to foo() in either case, I would be quite shocked to
find it passed to foo() using different registers. foo() will try to
read that value from one specific register; if that's not the register
which it was written to, how could foo() retrieve the right value? The
behavior is perfectly well-defined - the function call is not permitted
to fail, and I don't understand how you think it could succeed in both
cases with such an implementation. Because it would not work in one of
those two cases, a compiler which did this would not be conforming.
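
As an illustration of the prototyped case, here is a minimal sketch with hypothetical names. Because add_one() is declared with a prototype, both the plain argument and the cast argument are converted to long before the call, so the function receives the same value either way.

```c
#include <assert.h>

/* Declared with a prototype: every argument is converted to long
   before the call takes place. */
long add_one(long n)
{
    return n + 1;
}

/* Passes E directly: short is implicitly converted to long. */
long call_with_plain_argument(short e)
{
    return add_one(e);
}

/* Passes (T1)E with T1 = int: short -> int -> long, same final value. */
long call_with_cast_argument(short e)
{
    return add_one((int)e);
}

/* Self-check: the two calls are indistinguishable to add_one(). */
void check_prototyped_calls(void)
{
    assert(call_with_plain_argument(41) == call_with_cast_argument(41));
    assert(call_with_plain_argument(-7) == call_with_cast_argument(-7));
}
```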

On the other hand, if foo() was declared without a prototype, what
you're saying is precisely what I was thinking of. If E and (T1)E and
the type of the corresponding parameter are all types covered by the
exception at the end of 6.5.2.2p6, there is no problem. Otherwise, if
either E or (T1)E does not have the same promoted type as the
corresponding parameter, then the corresponding function call could pass
the value of that argument using a different register than the one
foo() will be reading to set the value of the corresponding parameter.
As a result, foo() won't actually receive the value written by the
function call. This is allowed for a conforming implementation only
because the two types are incompatible, so the behavior is undefined,
and a failure to communicate the value is therefore permitted. I don't
understand how you could imagine that this would not be a problem.

....

And here you lost me. Interchangeability being meaningful has nothing
to do with whether it is implied by SRAR or anything else. If A has a
meaningful definition then A is meaningful. You can't know whether B
implies A without first knowing that A is meaningful. Similarly I can't
imagine what connection can there possibly be between interchangeability
being meaningful and undefined behavior or lack thereof.

I had a great deal of trouble trying to figure out the right way to
explain what I was trying to say. The way I chose wasn't it. However,
analyzing your definition of interchangeability gave me the necessary
perspective to choose a different way of saying it.

Considered purely as a definition of interchangeability, yours is fine
(though it differs so little from the definition of compatibility that
it is of arguably negligible value).

However, you're proposing that something like this definition could
possibly be what the committee had in mind when they wrote that "[same
representation and alignment] is meant to imply interchangeability".
That doesn't make any sense, because the representation of two different
types is completely irrelevant to your definition of interchangeability.
Your definition works through conversions: it has an explicit conversion
built directly into the definition, implicit conversions occur if the
function is declared with a prototype, and the standard
promotions are involved if it's not declared with a prototype.
Conversions take a value, and convert it to a value of the other type;
if the resulting value gets written to an object of the destination
type, that value is represented by a bit pattern that conforms to the
representation of the destination type. At no point is there any
interaction between the representations of the two types. They could be
completely different, but if the appropriate conversions are
appropriately defined, they would still be interchangeable, according to
your definition.
Therefore, how could the committee have concluded, even mistakenly, that
"same representation" has anything to do with interchangeability?
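
A minimal sketch of that distinction, with hypothetical function names: a conversion preserves the value even though the two types may use entirely different bit patterns. The comparison of object representations is only meaningful when int and float happen to have the same size, so the function returns 0 otherwise.

```c
#include <assert.h>
#include <string.h>

/* Value conversion: the result has the same value, in the other type. */
float convert_int_to_float(int i)
{
    return (float)i;
}

/* Returns 1 if the object representation of (float)i is byte-for-byte
   identical to that of i; returns 0 if they differ or the sizes differ. */
int conversion_keeps_bits(int i)
{
    float f = (float)i;
    if (sizeof f != sizeof i)
        return 0;
    return memcmp(&f, &i, sizeof i) == 0;
}

/* Self-check: the value survives the conversion. */
void check_value_conversion(void)
{
    assert(convert_int_to_float(3) == 3.0f);
}
```

On a typical implementation conversion_keeps_bits(1) would be expected to return 0, even though convert_int_to_float(1) == 1.0f holds: the values agree while the representations do not.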

My understanding of what interchangeability means does involve mixing up
the representations of the two types; in that context, "same
representation and same alignment" makes a lot of sense - in fact, it's
essential; but it's not sufficient.

If the behavior is undefined then there are no constraints whatsoever
nor are there any expected results. Undefined behavior can turn your
computer into a frog for all you know.

"undefined behavior" only means that the standard fails to define it. An
implementation is still free to define what the behavior will actually
be. Reality is still allowed to constrain the actual possibilities. An
implementation that converted my computer into a frog when the behavior
of the code is undefined would not violate any requirement of the
standard, but it would violate any number of the requirements imposed by
reality itself.

If those footnotes were in fact intended (or will be rewritten) to
encourage interchangeability of those types as "Recommended practice",
then it would be entirely reasonable to expect behavior in accordance
with that recommended practice, even though it is only recommended, and
not mandatory. As a matter of actual practical fact, interchangeability
of those types can be expected, because most real implementations do
allow them to be interchanged. This isn't so much because implementors
were following that recommendation; it's simply because not making those
types interchangeable is difficult to do unless you're deliberately
trying to make them not be interchangeable.

I take it that last line should read "called function".
correct.

All this may happen but I still don't see how it justifies your quote
B. Perhaps this particular ABI makes it impossible to write a
conforming C implementation ?

Not as far as I can see. It only affects code with behavior not defined
by the C standard, and as such has no conformance implications.

... And I'm still not clear what scenario
you have in mind. Could you show some C code which may cause all these
things to happen ?

Here's my example of code that I would expect to work on any
implementation where the recommendation of making these types
interchangeable were followed. Please note, I'm not recommending such
code, merely explaining what I think the committee meant by making that
recommendation:

func.c:
=========================================================================
#include <locale.h>
#include <stdio.h>
#include <time.h>

struct lconv* func(
    struct tm *pt
){
    char utc[]="2011-03-24T16:19";
    struct lconv *pl = localeconv();
    if(pt)
    {
        strftime(utc, sizeof utc, "%YT%X", pt);
        printf("Now: %s\n", utc);
    }
    else
        printf("pt is null.\n");

    return pl;
}
===========================================================================
main.c:
===========================================================================
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <complex.h>
#endif
#include <locale.h>
#include <stdio.h>
#include <time.h>

/* Deliberately incompatible with the definition in func.c. */
struct tm* func(struct lconv*);

int main(void)
{
    char hello[] = "Hello, world!";
    const void* cvp = hello;
    time_t now = time(NULL);
    struct lconv* pl;
#ifdef _Imaginary_I
    double imaginary di = 3.0*_Imaginary_I;

    /* Deliberate mismatch: double imaginary passed for %f. */
    printf("di: %f" , di);
#endif
    /* Deliberate mismatch: char* passed for %p, const void* for %s. */
    printf("%p:\"%s\"\n", hello, cvp);

    pl = (struct lconv*)func( (struct lconv*)gmtime(&now) );
    if(pl)
        printf("decimal point:\"%s\"\n", pl->decimal_point);
    else
        printf("pl is null\n");

    return 0;
}

The argument type and the return type of func() are both different in
func.c and main.c, rendering their declarations incompatible, and the
behavior of the program undefined.
None of the arguments after the format string in the first two calls to
printf() in main() has a type that is even compatible with the type
expected for the corresponding format specifier (much less being the
exact same type, as mandated in 7.19.6.1p9). This provides yet another
reason why the behavior is undefined.
However, those footnotes were intended to recommend that each of the
problematic pairs of types in the above code be interchangeable. If that
recommendation were followed, I would expect the code to behave just as
it would if those problematic type pairs were compatible types (or, in
the case of printf(), the exact same type).

....

It's not clear to me what distinction you're making here. If the
correctness of the belief has to be used in reaching a contradiction
then there's no contradiction "between those two facts" , the
contradiction arises if we assume that a) "Same" in footnote 39 of
6.2.5 p27 has its usual meaning b) The belief expressed in footnote 39
is correct c) unsigned char* and char* are not compatible d) A few
other parts of the standard. That's the contradiction I was describing
in [2] where I used the term "contradiction" and I suggested that we
resolve the contradiction by abandoning the assumption that "SRAR is
transitive" which follows from a).

Yes, but a and c are not necessary to demonstrate the contradiction. b)
and d) are sufficient. That was the point I was making. Concluding from
that contradiction that one possible solution is to assume that b) is
false, is both unnecessary and insufficient.

What do you mean by "conventional logic" ?

Mathematicians have invented a large variety of alternative logic
systems, and the proof I'm referring to probably isn't valid for all of
them. However, it does apply to the oldest, most traditional forms of
logic. I'm being deliberately vague because I know enough about these
issues to realize that anything more specific that I said about this
topic would almost certainly contain an error, and there might be people
reading this newsgroup capable of identifying the error. However, I had
not intended, by my vagueness, to imply that there was anything informal
about the system of logic that I described as "conventional".

... If you mean the logic we use
in everyday life then for the most part it doesn't have formal rules.

I most certainly did not intend to refer to anything that informal.

....

Ok , the assumption that "same" is used with its usual meaning in
footnote 39.

Could you show me some code which you find useful and depends on the
transitivity of SRAR ?

The examples I gave above involve what I would consider sloppy coding
practices; however, I would no longer consider them sloppy if the
standard were changed to explicitly mandate interchangeability (as
opposed to merely recommending it). In that case, I would identify the
printf() cases involving %p and %f as slightly useful, because they
remove the need for type casts that would, in fact, be nops. I see no
point in taking advantage of the other aspects of interchangeability.
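
For comparison, here is a sketch of the printf() case written with the casts that interchangeability would make unnecessary. The function name is hypothetical, and snprintf() is used instead of printf() only so that the result can be inspected. With the casts in place the argument types match %p and %s exactly, so the behavior is defined today.

```c
#include <assert.h>
#include <stdio.h>

/* Formats the address and contents of text into buf.
   Returns the value returned by snprintf(). */
int format_pointer_and_text(char *buf, size_t size, char *text)
{
    const void *cvp = text;
    /* %p requires void*, %s requires a pointer to a character type. */
    return snprintf(buf, size, "%p:\"%s\"", (void *)text, (const char *)cvp);
}

/* Self-check: the formatted text fits and was produced without error. */
void check_formatting(void)
{
    char hello[] = "Hello, world!";
    char buf[128];
    int n = format_pointer_and_text(buf, sizeof buf, hello);
    assert(n > 0 && (size_t)n < sizeof buf);
}
```

On most real implementations the two casts generate no code at all, which is the sense in which they "would, in fact, be nops".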

In order to explain this, I need to use a word like "same", but
interpreted strictly in a transitive sense, and not in the
possibly-intransitive sense that you're considering might apply in the
phrase "same representation and alignment". I can't come up with a good
alternative to the word "same", so I'm warning you in advance: every use
of the word "same" below is intended to be understood in a strictly
transitive sense, one which makes "different" == "not the same". I will
take SRAR to be defined as "similar representation and alignment", where
"similar" has the same intransitive meaning that you've been
hypothesizing was intended when the word "same" was selected.

While "same representation and alignment" is not sufficient to ensure
interchangeability, I see no way for interchangeability to be true
unless the interchangeable types have the same representation and
alignment. If two types have similar representations, but not the same
representation, then there must be at least one bit pattern which
represents a different value in the first type than in the second type.
If one of those types is used in place of the other, that bit pattern
will be misinterpreted, and interchangeability (as I understand the
term) will have failed. If the two types have similar alignments, but
not the same alignment, the situation is much worse. Replacing one type
with the other may cause the value to be stored in a different memory
location, because the memory location that would have been used with the
first type might not have had the correct alignment for the second type.

Therefore, my code example above depends upon the transitive
relationship "same representation and alignment"; not as something that
ensures interchangeability, but as something that can be inferred from
interchangeability. "similar representation and alignment" (a
relationship that could be intransitive) would simply not be adequate.

....

How can you be sure he didn't make a mistake ?

I can't be; certainty about the truth of any statement about reality is
always a delusional state. I merely said "I would expect", a phrase
that explicitly allows for the possibility that my expectations might be
incorrect in any particular case. However, I would still consider that
expectation to be justified.

... Apart from that , does
your idea of interchangeability imply that the assignment from unsigned
char to char , which may happen when strcmp is executing , will always
produce predictable results ?

Assignment? Perhaps you meant strcpy(), rather than strcmp()? Even then,
I can't figure out why you might expect a copy from unsigned char to
char; I can see why you might incorrectly expect a copy from char to
unsigned char. For my answer, I'll assume you were referring to
strcpy(); if that's not correct, I'll need you to explain why you would
expect any assignments (whether with matched types or mismatched types)
to occur.

Nothing of the kind would occur if unsigned char* and char * are
interchangeable. What would happen would be that the bit patterns
representing each pointer parameter would be interpreted as char*
pointers, regardless of the types of the corresponding arguments. The
reinterpreted pointers would point at the same location in memory as the
originals, because those two types are required to use the same
representation. The second pointer would be used to interpret the bit
patterns in the input string as representing char values. The first
pointer would be used to write the bit patterns corresponding to those
char values to the output. It's been argued that if char is a signed
type with 1's complement or sign-magnitude representation, such a
process would be required to convert negative zeroes into positive
zeroes, but that's most certainly not the intent of the committee. The
result should be that the output string has bytes containing bit
patterns exactly identical to that of the input string. The types used
as arguments to strcpy() are irrelevant to that conclusion, so long as
they are interchangeable with char*.
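
A minimal sketch of the expected outcome, with a hypothetical function name. It uses explicit casts so that the behavior is defined regardless of interchangeability, and it assumes an implementation without negative zeroes in char (for example, two's complement), so the question raised above about sign-magnitude and 1's complement does not arise.

```c
#include <assert.h>
#include <string.h>

/* Copies the string at src to dst through strcpy(), then reports whether
   every byte, including the terminator, is identical in both buffers.
   dst must have room for the whole string. Returns 1 if identical. */
int copy_preserves_bytes(unsigned char *dst, const unsigned char *src)
{
    strcpy((char *)dst, (const char *)src);
    return memcmp(dst, src, strlen((const char *)src) + 1) == 0;
}

/* Self-check with bytes above CHAR_MAX on signed-char implementations. */
void check_copy(void)
{
    const unsigned char src[] = { 0xE9, 0x80, 'a', 0 };
    unsigned char dst[sizeof src];
    assert(copy_preserves_bytes(dst, src));
}
```

No value conversion between unsigned char and char is visible in the result: the output bytes are the input bytes.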

It would stop the footnote from being misleading so it wouldn't be
pointless at all.

In the absence of a change to the normative words of the standard, such
a re-wording would make a claim that had as little basis in the
normative text as the current claim, and would therefore be equally
misleading.

lawrence.jones · Mar 26, 2011

Spiros Bousbouras said:
I don't see why it should work.

Because C has traditionally considered things with the same
representation and alignment to be interchangeable and even goes so far
as to require it in some cases (e.g., calls to non-prototyped functions
and varargs parameters). So it makes sense that it should also apply to
those cases where it isn't strictly required (because the committee
didn't want to screw up the type system to require it or just require it
by fiat).

gcc , tcc and Sun Studio 12 c99 with all warnings turned on do not
complain. Shocking.

Hardly, that's how separate compilation works.

But Sun Studio 12 lint and splint both point out
both type mismatches.

One of the major reasons for creating lint was to find cross-file issues
that compilers typically don't find.

James Kuyper · Mar 31, 2011

I apologize for beating a dead horse here (I hadn't realized that it was
dead - I had been expecting a response from you to my last message
sooner than this). However, I've since figured out a simpler, and I
hope, clearer, way of responding to one question you asked:

On 03/23/2011 06:59 PM, Spiros Bousbouras wrote:
....

Could you show me some code which you find useful and depends on the
transitivity of SRAR ?

The transitivity of SRAR isn't the thing that the code (such as the
examples that I and Lawrence Jones provided) relies upon. It's the
sameness that the code relies upon; transitivity is merely a trivially
demonstrated consequence of the sameness.
