translating C++ -> C

R

Roderick Bloem

I am going to take the liberty of crossposting this to comp.lang.c++
(originally comp.lang.c), and to summarize the discussion for the sake
of those reading only c++.

The question is: If you are writing C and you have a struct P, can you
create a struct C that is an extension of the first (starts just like P
and adds some data), and then use a C* as if it were an P*?

Example:

typedef struct {
int a;
short b;
} P;

typedef struct {
int a;
short b;
long long c;
} C;

C *c; P *p;
c = (C*) malloc(sizeof(C));
c->a = 1; c->b =2;
p = (P*) c;
printf("%d\n", p->a);


P and C stand for parent and child, and hint at the OO structure that we
are tying to mimick in C. We want to be able to use a child struct as a
parent struct, as you would in C++.

The basic answer in comp.lang.c is "it works on any compiler I have
seen, but there is no guarantee".

The standard appears to limit the freedom of the compiler in laying out
the struct: the order of the elements if fixed, padding can be added
between elements and at the end, but only if necessary for alignment.
This does not quite prescribe where the padding should be. If you have
three byte-aligned bytes, and a 4-byte aligned 4-byte word, you need one
byte of padding, but you can put that whereever you want before the
word: bbbpwwww or bpbbwwww are both allowed (b is a byte, p padding, and
w part of the word). The standard apperently does not require that the
padding is applied the same way in different structs.

Another problem that has been pointed out is this: what if P ends in a
4-byte aligned byte b and 3 bytes of padding. The compiler may decide
that the most efficient way to clear b is to do a four-byte clear
operation. If C adds 3 bytes to the struct, these may go in the
padding, and an attempt to assign p.b=0 may clear the extra the extra
bytes in C if p points to a C struct.

Now the reason to crosspost to comp.lang.c++: I think the c++ to c
translator used overlapping for inheritance, so the c++ people must be
experts. Am I correct? Does that mean that the translator depended on
features of the compilers that are not prescribed by the standard, or am
I missing something?

It is clear that there are alternatives, e.g., we may define C as
typedef struct {
P p;
long c;
} C;
at the expense of some extra typing when accessing common elements.

[disclaimer: I do not have the C standard. Everyting I write about it
is either hearsay or Harbison & Steele.]

Roderick
 
K

Keith Thompson

Roderick Bloem said:
The standard appears to limit the freedom of the compiler in laying
out the struct: the order of the elements if fixed, padding can be
added between elements and at the end, but only if necessary for
alignment.

C99 6.7.2.1p13:
Within a structure object, the non-bit-field members and the units
in which bit-fields reside have addresses that increase in the
order in which they are declared. A pointer to a structure object,
suitably converted, points to its initial member (or if that
member is a bit-field, then to the unit in which it resides), and
vice versa. There may be unnamed padding within a structure
object, but not at its beginning.

C99 6.7.2.1p15:
There may be unnamed padding at the end of a structure or union.

There is no implication that padding can be added only if necessary
for alignment. The compiler is free to insert padding because it
makes the struct look bigger and scares away predators.

[...]
Now the reason to crosspost to comp.lang.c++: I think the c++ to c
translator used overlapping for inheritance, so the c++ people must be
experts. Am I correct? Does that mean that the translator depended
on features of the compilers that are not prescribed by the standard,
or am I missing something?

Are you referring to cfront?

It probably means that the author(s) of the translator either were
experts on C, or were lucky enough not to run into any problems. It
doesn't imply anything about the C expertise of C++ programmers other
than the ones who worked on the translator.

There's no fundamental reason why either the translator or the code it
generated had to be written in perfectly portable C. As long as it
did the job, that may have been good enough, and the authors were free
to take advantage of assumptions that happen to be valid for all C
implementations of interest, even if they're not guaranteed by the
standard. (Portable standard-conforming code is generally better, all
else being equal, but all else is not always equal.)
 
C

Chris Torek

Now the reason to crosspost to comp.lang.c++: I think the c++ to c
translator used overlapping for inheritance, so the c++ people must be
experts. Am I correct?

On the first, perhaps; on the second, well... :)
Does that mean that the translator depended on features of the
compilers that are not prescribed by the standard ...

If you are referring to cfront, it *definitely* *did* depend on
non-portable features. In particular, you had to tell it all about
how the C compiler it used as its "assembler" laid out structures,
including padding, so that it could track the C compiler's work
and subvert it.

Note that cfront was in fact a "real compiler" according to the
definition I prefer:

To decide if Step S is a "preprocessor" or a "compiler",
answer the following question: if an error occurs *after*
Step S, is it a mistake by the programmer, or is it a
mistake in Step S?

Consider the following examples:

foo.c, line 123: invalid operand to unary &
# or same with "foo.cpp" as the file name

/tmp/151522.c, line 123: invalid operand to unary &

/tmp/151523.s, line 5012: invalid register operand to add

When compiling a C or C++ program named "foo.c" or "foo.cpp", the
first message is perfectly natural if you goofed up some "#define",
because the preprocessor part of the language does not understand
the language proper. But getting (just) the second message from
a C++ compiler, when compiling "foo.cpp", indicates a bug in the
C++ compiler, not invalid C++ code that was simply copied through
to the C compiler. So C++ is not a "preprocessor", because it is
a bug in the C++ system, not a bug in your own code, that produced
the message about file in /tmp.

In all cases, the last message (from the assembler) indicates a
bug in the compiler, because the compiler should not be emitting
invalid CPU register names. The exception to this rule occurs if
the compiler happens to have an "insert arbitrary assembly code"
escape clause (like __asm__), and you used it.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,060
Latest member
BuyKetozenseACV

Latest Threads

Top