C as a Subset of C++ (or C++ as a superset of C)

Jens Gustedt · Aug 28, 2012

On 28.08.2012 22:56, James Kuyper wrote:

On 08/28/2012 04:36 PM, Jens Gustedt wrote:

You won't be happy. "in-order" has two different possible meanings, and
each implementation has the option of choosing either one. Both C and
C++ are in agreement on this. Different implementations for the same
platform could make different choices (though market forces make that
unlikely); a single implementation could even make it a command-line option.
C++ allows more freedom than C, because it doesn't require consecutive
bit-fields in the same allocation unit to occupy adjacent sets of bits.
That means that there's a lot more than just two different possible orders.
argh

Even in C, there's a lot more than just two possible orders...

Effectively, I read this phrase

The order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined.

as meaning that an implementation would have to choose one of the orderings once
and for all and apply this to all cases consistently.

But it seems that interpreting such text then becomes more a question
of English skills than of anything else.

If in fact people are interpreting this differently, in the sense that
"given storage location X, here is an algorithm to determine the
ordering" is sufficient as a definition for an implementation, thus
allowing the ordering to be determined from the location of the
storage unit inside a bigger unit and the like, then

Do you still feel that the C standard is sufficiently specific?

no

and then it would need more precision. But my English is probably not
good enough to make a proposal that is better than the existing text.

Jens

Melzzzzz · Aug 28, 2012

Given that C++ doesn't allow implicit conversions from void*, char*
can be just about as good as void* for that kind of purpose.

True... The only useful thing is that one doesn't have to cast when
converting to void*, so it's halfway there.

BGB · Aug 28, 2012

BGB wrote 2012-08-28 05:34:

But you don't use memcpy for C++ type, you use std::copy which is typed.
And you don't do malloc either, you use the containers from the C++
standard library. Or use the occasional new X, which doesn't need a cast
either.

yes, but the topic here would be C code running in a C++ compiler, in
such a scenario where plain C in a C++ compiler worked the same as in a
C compiler.

expecting the code to be rewritten into C++ in the process is a no-go.

No, these can be templates and accept or return T*, for any type T.

not in C code.

If you do it the C++ way, it actually uses fewer casts.

A big problem with trying to find the common ground between C and C++
is that lots of good C++ code isn't using many C features at all. And
well written C code tends to be very non-idiomatic C++ code.

but, more importantly though:
existing C code has to still work, if the languages were to be merged.

idiomatic or not, things still working is the main thing.

yes, it is possible to just write in the "least-common-denominator" of C
and C++ (I have done this a few times), but this is not the issue here.

Jens Gustedt · Aug 28, 2012

On 28.08.2012 23:25, Casey Carter wrote:

malloc is not the only function - standard or otherwise - that returns a
void*.

In the C11 standard I found:

void *aligned_alloc(size_t alignment, size_t size);
void *calloc(size_t nmemb, size_t size);
void *malloc(size_t size);
void *realloc(void *ptr, size_t size);
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
void *memmove(void *s1, const void *s2, size_t n);
void *memchr(const void *s, int c, size_t n);
void *memset(void *s, int c, size_t n);

void *bsearch(const void *key, const void *base,
size_t nmemb, size_t size,
int (*compar)(const void *, const void *));

void *bsearch_s(const void *key, const void *base,
rsize_t nmemb, rsize_t size,
int (*compar)(const void *k, const void *y, void *context),
void *context);

void *tss_get(tss_t key);

Maybe I overlooked something, but these are not really many, and none
of them would be considered good style in C++, I think.

Are you suggesting that C++ should forbid malloc, or that C++
should forbid void*, or that C++ should have some other type whose only
purpose is to represent void* returns from C functions?

That it would leave "void*" alone and use it as it was originally
defined in C. It just makes no sense. Somebody who uses C memory
functions is beyond the red line anyhow. Requiring a cast here is just
useless, and does not add to compatibility, quite the contrary.

Use "unsigned char*" instead, when you feel the need of an untyped
pointer inside C++, which should be rare in C++, anyhow.
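
To make the point about the cast concrete, here is a minimal C sketch.
The helper name make_filled is made up for illustration. The
assignment from malloc compiles as C because void* converts implicitly
to int*; a C++ compiler would reject the same line unless a cast is
added.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an array of n ints, each set to value.
 * Returns NULL if the allocation fails. */
int *make_filled(size_t n, int value)
{
    int *p = malloc(n * sizeof *p);   /* valid C, ill-formed C++ without a cast */
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)
        p[i] = value;
    return p;
}
```

The caller owns the result and releases it with free().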

Jens

Jens Gustedt · Aug 28, 2012

On 28.08.2012 20:38, Bo Persson wrote:

Jens Gustedt wrote 2012-08-27 18:34:

Why would you want to make this mandatory for everyone? Why not be
satisfied if your code is portable to systems having these types?

No, I was asking a question to somebody and not necessarily promoting
that idea. The starting point of that discussion was seeking a way to
be able to access a buffer in "typeless" chunks that correspond to the
common object types that a particular architecture might
have. Seemingly the word "mandatory" provoked some allergic reaction.

Would it be more acceptable for you to have different names for such
beasts (say void8_t, void16_t etc) or would you be ok to mandate
uintXX_t where XX are some multiples of CHAR_BIT?

Jens

James Kuyper · Aug 28, 2012

On 28.08.2012 22:56, James Kuyper wrote:

Sorry, that was an editing error. The previous comment was the correct
one: "C++ ... [allows] a lot more than two different possible orders"

Effectively, I read this phrase

The order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined.

that an implementation would have to choose one of the orderings once
and for all and apply this to all cases consistently.

That's correct. The comment was actually intended to be about C++, not
C. The text in the C++ standard that comes closest to saying the same
thing merely gives "left-to-right" and "right-to-left" (sic) as examples
of orders that actually occur, without specifying that they are the only
permitted orders.

no

and then it would need more precision. But my English is probably not
good enough to be capable to make a proposal that is better than the
existing text, then.

It would be a trivial exercise to rewrite this to remove almost all of
the freedom implementations currently have in laying out bit fields. It
wouldn't require particularly good English skills:

Specify that plain 'int' is signed in bit fields, just as it is
everywhere else. Specify the size of the allocation unit. Specify 2's
complement representation for signed integer types. Prohibit
representations that have padding bits or allow trap representations.
Mandate big-endian order (or little endian order - but don't allow both
- don't give implementations a choice) prohibiting all other orders.
Specify that bits are assigned to bit-fields in order from high-order
bits to low-order bits (or vice-versa - but pick one as allowed by the
standard, and prohibit the other). Continue to require, as C already
does, but C++ does not, that consecutive bit-fields within the same
allocation unit be allocated adjacent sets of bits. Either mandate or
prohibit bit-fields that cross allocation unit boundaries - but choose one.

The hard part would not be writing up such a change, but getting it
approved. It would make bit-fields much more portable, at the cost of
breaking huge quantities of existing code when it tries to read data
written using a different lay-out, or tries to communicate with existing
routines (many of them not written in C) that use a different lay-out.

Ike Naar · Aug 28, 2012

There's no such thing as 'extern "C" code': extern "foo" is a *linkage
specification* whose sole purpose is to tell the implementation to
engage the ABI machinery appropriate for language "foo".

For one thing, 'extern "C"' may turn off C++ name mangling.
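
As a rough illustration of the linkage specification in practice, the
usual idiom for a header shared between C and C++ looks like the sketch
below. The function add_ints is a made-up example. A C compiler never
sees the extern "C" lines because __cplusplus is not defined there.

```c
/* Hypothetical header content, accepted by both C and C++ compilers. */
#ifdef __cplusplus
extern "C" {            /* seen only by C++: request C linkage */
#endif

int add_ints(int a, int b);

#ifdef __cplusplus
}
#endif

/* Definition, normally placed in a separate .c file. */
int add_ints(int a, int b)
{
    return a + b;
}
```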

Öö Tiib · Aug 28, 2012

Specify that plain 'int' is signed in bit fields, just as it is
everywhere else. Specify the size of the allocation unit. Specify 2's
complement representation for signed integer types. Prohibit
representations that have padding bits or allow trap representations.
Mandate big-endian order (or little endian order - but don't allow both
- don't give implementations a choice) prohibiting all other orders.
Specify that bits are assigned to bit-fields in order from high-order
bits to low-order bits (or vice-versa - but pick one as allowed by the
standard, and prohibit the other). Continue to require, as C already
does, but C++ does not, that consecutive bit-fields within the same
allocation unit be allocated adjacent sets of bits. Either mandate or
prohibit bit-fields that cross allocation unit boundaries - but choose one.

Why is it needed and what does it help? Two platforms do not operate
on same memory. When two platforms communicate then there is
either some protocol or file format involved. So it is up to the
protocol or file format specification to specify the bits not up
to C or C++ standards.

To keep things internally in memory bit by bit like in some protocol
or file format would be terribly inefficient.

James Kuyper · Aug 28, 2012

On 28.08.2012 22:56, James Kuyper wrote: ....

Sorry, that was an editing error. The previous comment was the correct
one: "C++ ... [allows] a lot more than two different possible orders"

When I wrote that reply, I couldn't easily see my previous message,
only the part that you had quoted. I assumed that I'd left a stupid
editing error in my message. That assumption was incorrect.

My comment was in fact perfectly correct. The key point that you missed
was that the order is specified only "within a unit", not "across
units". If the unit is 16 bits long, then "a" and "b" must both be in
the first allocation unit, and "c", "d", and "e" must be in the second
allocation unit, but "a" could occupy either the first byte or the
second byte of the first unit, and "e" could occupy either the first
byte or the second byte of the second unit. There's no good reason for
an implementation to make inconsistent choices in the two units, but
it's not required to make consistent ones, either. "ab cde" and "ba ecd"
are plausible possibilities, "ab ecd" and "ba cde" are implausible but
permitted possibilities, while (assuming 16-bit allocation units) "edc
ba" is NOT a possibility. With 32-bit allocation units, "edcba" is a
possibility.
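
One way to see what a given implementation actually does is to look at
the object representation. The sketch below assumes a struct with the
field widths shown, which is a guess since the original declaration is
not quoted here, and the function name is made up. It zeroes the
struct, stores all ones in "a", and reports which byte changed. The
answer is implementation-defined, and the standard leaves padding bytes
unspecified after a member store, so treat this as a probe rather than
a guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout for illustration only. */
struct fields {
    unsigned a : 8;
    unsigned b : 8;
    unsigned c : 4;
    unsigned d : 4;
    unsigned e : 8;
};

/* Returns the index of the first byte of the object representation that
 * becomes non-zero when member a is set to all ones, or -1 if none does. */
int first_byte_of_a(void)
{
    struct fields s;
    unsigned char raw[sizeof s];

    memset(&s, 0, sizeof s);
    s.a = 0xFF;
    memcpy(raw, &s, sizeof raw);

    for (size_t i = 0; i < sizeof raw; ++i)
        if (raw[i] != 0)
            return (int)i;
    return -1;
}
```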

James Kuyper · Aug 28, 2012

Why is it needed and what does it help? Two platforms do not operate
on same memory. When two platforms communicate then there is
either some protocol or file format involved. So it is up to the
protocol or file format specification to specify the bits not up
to C or C++ standards.

With the above changes (and a few additional ones I didn't bother to
specify), there could be a struct definition that uniquely corresponds
to any given protocol or file format, making reading and writing such
files, or parsing/writing such protocols, a lot easier. With the
language as currently written, structs are useless for portable code of
that kind, only bit-wise operations on arrays of unsigned char can be
used (even that breaks down if CHAR_BIT != 8, but that's rather uncommon).

To keep things internally in memory bit by bit like in some protocol
or file format would be terribly inefficient.

You can always read it into a bit-packed struct, and then unpack it into
a more efficient structure. The code would still be a lot simpler than
what I currently have to write.
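
For comparison, this is roughly what the portable approach with
bit-wise operations on unsigned char looks like. The 16-bit wire format
(4-bit version, 4-bit flags, 8-bit length, most significant bits first)
and all names are invented for this sketch, and it assumes each element
of the buffer holds one octet.

```c
/* Unpacked, efficient in-memory form of the hypothetical header. */
struct header {
    unsigned version;   /* 4 bits on the wire */
    unsigned flags;     /* 4 bits on the wire */
    unsigned length;    /* 8 bits on the wire */
};

/* Decode two octets into the unpacked struct. */
struct header unpack_header(const unsigned char buf[2])
{
    struct header h;
    h.version = (buf[0] >> 4) & 0x0Fu;
    h.flags   = buf[0] & 0x0Fu;
    h.length  = buf[1] & 0xFFu;
    return h;
}

/* Encode the unpacked struct back into two octets. */
void pack_header(const struct header *h, unsigned char buf[2])
{
    buf[0] = (unsigned char)(((h->version & 0x0Fu) << 4) | (h->flags & 0x0Fu));
    buf[1] = (unsigned char)(h->length & 0xFFu);
}
```

Every field needs its own shift and mask in both directions, which is
the boilerplate a fully specified bit-field layout would remove.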

Keith Thompson · Aug 28, 2012

Jens Gustedt said:
Also I think that CHAR_BIT==9 could imply the existence of uint9_t,
wouldn't it? One could expect that it then has the types that
correspond to sizeof(int)*CHAR_BIT and similar multiples, no?

The standard makes {,u}int{8,16,32,64}_t mandatory if the
implementation provides types that meet the requirements. All other
such types are optional. A system with CHAR_BIT==9 would *probably*
provide uint9_t, but the C standard doesn't require it to. I suppose
the same applies to C++ as of the 2011 standard.
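
Because the exact-width types are optional, code can test for them at
compile time. The sketch below relies on the rule that <stdint.h>
defines UINT8_MAX only when it also provides uint8_t; the function
names are made up.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if this implementation provides uint8_t, 0 otherwise. */
int has_uint8(void)
{
#ifdef UINT8_MAX
    return 1;
#else
    return 0;
#endif
}

/* Width in bits of unsigned char, available on every implementation. */
int byte_width(void)
{
    return CHAR_BIT;
}
```

If uint8_t exists then CHAR_BIT has to be 8, since no object type can
be smaller than one char.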

88888 Dihedral · Aug 28, 2012

On Wednesday, 29 August 2012 at 01:18:35 UTC+8, Casey Carter wrote:

I should have been more clear about my point here: I was trying to say
that the only reason this definition of NULL is unacceptable to C++ is
because of the lack of the implicit conversion from void*. Given that
conversion, there would be no other barrier in C++ to using ((void*)0)
for NULL.

I think the rules for overload resolution and template type deduction
are complicated enough without introducing a feature that behaves
differently depending on whether or not it interacts with those systems.

There's no such thing as 'extern "C" code': extern "foo" is a *linkage
specification* whose sole purpose is to tell the implementation to
engage the ABI machinery appropriate for language "foo".

Returns, not accepts. The implicit conversion _from_ void* is dangerous,
implicit conversion _to_ void* is perfectly safe type erasure.

I'll handwave here and claim that's what compiler options are for. Valid
C90 code will always be valid C90 code, I see no reason why a future C20
compiler couldn't be instructed to compile code as C90. Old code will
always be old code, but does that mean that we have to keep on writing
old code forever?

It's already the case that C11 made some C99 features optional: there
may be conforming C11 compilers that refuse to compile some conforming
C99 programs. The kind of change I suggest is quantitatively but not
qualitatively different.

I am differentiating between the implicit conversion _from_ any pointer
type and the implicit conversion _to_ any pointer type; I posit that
those two features have unique design intentions and should therefore be
represented by distinct types. C conflates the two ideas in void*, C++
doesn't have the conversion _to_ any pointer type at all, except for
nullptr. (Given how simple it is to make a user-defined type in C++ that
implicitly converts to any pointer type, it's notable that I've never
seen anyone feel the need to do so.)

void* has 2 uses in C:

1. it's the "sink" type to which all other pointer types can be
implicitly converted.
2. it's the "source" type that implicitly converts to all other pointer
types.

and 2 uses in C++:

1. "sink" type just as in C.
2. type-erased pointer that designates _some_kind_ of object about which
nothing is known except its location in memory.

The secondary usage is diametrically opposite between C and C++: one
disallows using a void* for any purpose without a cast, the other allows
you to pass a void* where you would any pointer type without a cast. In
C, I can pass the same void* to fclose, free, strcat, and
hundreds/thousands of other functions without a compiler diagnostic.
Using void* you've effectively opted out of a large part of the type system.

C programmers also often use void* as either a type-erased or generic
pointer but do so purely based on convention and discipline: you will
get no help from the compiler. If an intern jumps into your code the
next day and passes your type-erased pointer to fputs, the compiler will
accept it happily.
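
A small C sketch of that hazard follows; both function names are
invented. The first function erases the type, which both languages
allow implicitly. The second treats the handle as text, which C
accepts without any diagnostic while C++ would demand a cast.

```c
/* A "type-erased" handle: by convention it points at an int. */
void *erase(int *p)
{
    return p;            /* int* -> void*: implicit in C and in C++ */
}

/* Nothing stops a caller from treating the handle as a string. */
char *misuse_as_text(void *handle)
{
    char *text = handle; /* void* -> char*: implicit in C only */
    return text;
}
```

The returned pointer still refers to the first byte of the int, so the
conversion itself is well defined; the danger lies in what a caller
such as fputs would then do with it.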

While it's true that making the language more permissive doesn't impact
the correctness of existing programs, that doesn't necessarily make it
always a good idea. I seriously doubt that the C++ community would ever
accept implicit conversion from void* into the language; the case I'm
attempting to make here is that C shouldn't have that feature either and
likely would not if it was designed afresh today.

Given that preserving the semantics of old code has a higher priority to
the C community than almost any other concern, I think it's unlikely
that we will ever have a C++ that is truly a superset of C. If anything
I think it's more likely that C++ would introduce even more breaking
changes to become _less_ compatible with C.

Please check Objective-C. Programmers will pick the right tool
at the right time for their tasks to earn money in the industry.

Jens Gustedt · Aug 29, 2012

On 29.08.2012 00:03, Leigh Johnston wrote:

What utter nonsense mate.

That could well be possible. Did you read the previous discussion?
Otherwise your statement doesn't qualify for much more intelligence
than mine.

I would just have preferred a reason that goes beyond something like
"we traditionally use void* like this". I still haven't got a
consistent reply as to why "void*" is needed in C++ (except for
compatibility with C and for "operator new").

It still seems that such discussions aren't possible and that at least
some people just react with ideology.

Use void* not unsigned char* for untyped pointers in C++.

What higher entity revealed that commandment to you?

Jens

Jens Gustedt · Aug 29, 2012

On 29.08.2012 09:41, David Brown wrote:

Correct me if I'm wrong, but I believe this is actually just an
"unwritten rule" that all compilers have agreed to follow. If you make
a union of two types (say, a "float" and a "uint32_t"), then the
compiler must put the two entries in the same space, and the programmer
may access either field. But if the program writes to one field, then
reads from the other field are undefined according to the standards. The
compiler can therefore ignore aliasing issues, and even ignore the data
completely, if it sees this situation. Type-punning unions only work
because all compiler vendors make it work - not because the standards
say so.

No, this is not an "unwritten rule" for C. C99 originally had a
wording that could be interpreted as you state. A corrigendum has made
it clear that it is not intended. Type-punning in C works as long as
the value that you are reading is valid for the type through which you
are doing so.

A footnote in the C standard clarifies that:

95) If the member used to read the contents of a union object is not
the same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be
a trap representation.

So if you happen to have a platform that implements uint32_t (or
another type that has no trap representations and no padding bits) it
is always safe to
use a union that overlays your data with such a beast or even an array
of these. You'd just have to be careful when interpreting this data,
but it is not UB as such.
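
A minimal sketch of such a union in C, assuming float is 32 bits wide
(the static assertion checks this) and that uint32_t exists. The names
are made up. On a platform with IEEE 754 single precision, bits_of(1.0f)
gives 0x3F800000; elsewhere the value depends on the float format.

```c
#include <stdint.h>

/* The sketch only makes sense if both members have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float must be 32 bits wide for this sketch");

union float_bits {
    float    f;
    uint32_t u;
};

/* Return the object representation of value as an unsigned integer. */
uint32_t bits_of(float value)
{
    union float_bits pun;
    pun.f = value;    /* store through one member */
    return pun.u;     /* read through the other: reinterpretation in C */
}
```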

AFAIK C++ stayed with the original wording, and so now the
interpretation of union is different in both languages. Basically the
only use of union in C++ would be to save storage in low level
implementations. (Real high-level C++ has completely different means to
achieve that.)

The main intent of unions in C, namely type-punning, isn't guaranteed
in C++.

Jens

Jens Gustedt · Aug 29, 2012

On 29.08.2012 10:44, David Brown wrote:

On 29/08/2012 10:19, Jens Gustedt wrote:
The great thing about standards is that there are so many to choose from!

I don't know that type-punning was the "main intent" of unions - I think
saving storage space was the main motivation.

I was perhaps pushing a bit too far, but type punning has been around
since the beginning. And if I understand it right it was the idea of
the committee to affirm that it always had been an intended use case,
and that just the wording in the standard had been too ambiguous.

Type-punning just became
a popular use of it, and has been even more important since alias
analysis has broken older methods of type-punning. This is backed up by
the fact that unions for saving space have been around as long as C, but
type-punning unions have only been officially supported since a
correction to C99.

The most prominent use case of type punning is the network layer in
POSIX and probably all its predecessors. It is really there since the
beginning.

I guess with C++, the "correct" way to handle type-punning is with
static_cast. I don't know whether this guarantees the same alias-safe
characteristics as a type-punning union in C.

No, I don't think that static_cast will do. Probably reinterpret_cast
would be better. But I am not sure that I personally would be able
to come up with code that is strictly conforming to C++ and that would
deal with the socket layer correctly.

Jens

James Kuyper · Aug 29, 2012

On 08/29/2012 04:44 AM, David Brown wrote:
....

I guess with C++, the "correct" way to handle type-punning is with
static_cast. I don't know whether this guarantees the same alias-safe
characteristics as a type-punning union in C.

The relevant named cast is reinterpret_cast<>, not static_cast<>. While
it is safer than the corresponding C cast for a number of reasons,
avoiding aliasing issues is not one of those reasons. reinterpret_cast
is the least safe of the named casts, and should always be regarded as a
danger sign.
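
An alternative that is not mentioned in this thread, but that sidesteps
the aliasing question in both languages, is to copy the bytes with
memcpy. The sketch below is written in C with made-up function names
and assumes float and uint32_t have the same size.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float must be 32 bits wide for this sketch");

/* Copy the bytes of a float into an integer of the same size. */
uint32_t float_to_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Copy the bytes back; the inverse of float_to_bits. */
float bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

No object is ever accessed through an lvalue of the wrong type here,
which is why the approach does not depend on union or cast semantics.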

Casey Carter · Aug 29, 2012

Given that C++ doesn't allow implicit conversions from void*, char* can
be just about as good as void* for that kind of purpose.

The standard requires that casting from any object pointer type to void*
and back preserves the original pointer value (5.2.9/13). That guarantee
is not present for any other pointer type.

Casey Carter · Aug 29, 2012

What higher entity revealed that commandment to you?

Jens

ISO 14882-2011 5.2.9/13 (static_cast) says:

....A value of type pointer to object converted to "pointer to cv void"
and back, possibly with a different cv-qualification, shall have its
original value.

Jens Gustedt · Aug 29, 2012

On 29.08.2012 17:09, Casey Carter wrote:

The standard requires that casting from any object pointer type to void*
and back preserves the original pointer value (5.2.9/13).

Right.

That guarantee is not present for any other pointer type.

That is wrong. It also says (this is from C11 but I suppose that C++11
should have something similar):

A pointer to void shall have the same representation and alignment
requirements as a pointer to a character type.

It also says

A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is
undefined. Otherwise, when converted back again, the result shall
compare equal to the original pointer.

So at least in C there is no problem in converting back and forth to
"unsigned char*". (Alignment is not a problem, since "char" types have
the weakest alignment requirements.) So the same guarantee as you
mention above for "void*" also holds for all character pointer types.
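
In C the round trip described above can be written as follows. The
function name is made up. Unlike the void* case, both conversions need
an explicit cast, but the result is required to compare equal to the
original pointer.

```c
/* Convert an int* to unsigned char* and back again. */
int *round_trip(int *p)
{
    unsigned char *bytes = (unsigned char *)p;  /* always suitably aligned */
    return (int *)bytes;                        /* compares equal to p */
}
```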

+++++

In any case, the use cases for "void*" in C++ haven't convinced me
much, yet. I still think that usage of "void*" is peripheral to C++
and mostly there to ensure interface compatibility with C.

Genuine C++ code that uses "void*" to implement, say, the common core
of a template library is not very consistent with itself. To me it
looks as if the implementation of such a core would better be done
with the right tool. In the same way as it is sometimes appropriate to
use some inline assembler from C, it should be more common practice to
use just pure C for the implementation of some type-agnostic core
functionality.

Jens

James Kuyper · Aug 29, 2012

On 2012-08-28 16:30, James Kuyper wrote: ....

The standard requires that casting from any object pointer type to void*
and back preserves the original pointer value (5.2.9/13). That guarantee
is not present for any other pointer type.

Section 6.3.2.3p7 of the C standard says: "When a pointer to an object
is converted to a pointer to a character type, the result points to the
lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining
bytes of the object." It says nothing about the reverse conversion, however.
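
The guarantee quoted from 6.3.2.3p7 is what makes byte-wise access like
the following well defined in C. This is only a sketch with a made-up
function name; in real code memcpy does the same job.

```c
#include <stddef.h>

/* Copy size bytes from src to dst by walking both objects through
 * pointers to unsigned char. The regions must not overlap. */
void copy_bytes(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;          /* points to the lowest addressed byte */
    const unsigned char *s = src;
    for (size_t i = 0; i < size; ++i)
        d[i] = s[i];                 /* successive increments reach each byte */
}
```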

In general, such a requirement would be inconsistent with the C++ object
model, but I'm surprised at being unable to find a corresponding
guarantee in the C++ standard for some restricted category of types,
such as "trivially copyable" or "standard layout" types.
