C as a Subset of C++ (or C++ as a superset of C)

Jens Gustedt · Aug 30, 2012

On 30.08.2012 02:38, Luca Risolia wrote:

That is a valid use of void*, although some modern compilers are smart
enough to avoid code bloat without that kind of help.

So OK, can I conclude that "void*" was useful in C++ in the early
stages, when the tool chains were not yet where the designers of C++
wanted them to be?

Sure, that is a valid use case. Historically it would perhaps have been
better to say:

"ok, these things can't be done with C++, yet, let's use C until we
can get rid of it"

So if I understand correctly, such a use should now be more and more
rare, if not completely avoided in new code?

Jens

Jens Gustedt · Aug 30, 2012

On 30.08.2012 09:47, David Brown wrote:

You skipped the important part of Leigh's post - "char* is .. wait for
it .. a pointer to char; void* is .. wait for it .. a pointer to
anything; this has not changed from C to C++."

If you write in your code "char* p", you are saying that "p is a pointer
to a character or a list of characters". If you write "void* p", you
are saying that "p points to something which could be anything".

These are different things. It really doesn't matter how these are
implemented, and what casts can be done implicitly or explicitly - they
/say/ different things. Your compiler will spend milliseconds reading
your code, and doesn't care about style, logic, or readability. But
programmers might spend days reading it or writing it, and they might
need to do so again after years have passed. This is why you write code
that says what it should say. This is why you never use an "unsigned
char" to store a small number, but rather use an "uint8_t" - it does
exactly the same thing, but makes the purpose clear.

You probably didn't see my reply to that either; this thread has
somehow gotten out of hand.

You should just push a bit further from where Leigh stopped. He
resolved the term "void" but not "char". "char" is the smallest
addressable unit in C and C++ (Pascal had byte for that, no) and is
almost completely stripped of a semantic interpretation.

Traditionally "char*" was also that, a pointer to a region of
unspecific bytes. Nowadays "unsigned char*" serves that purpose,
because it has the advantage of letting you access any bit (!) and
byte of the data, and you don't have confusion with the other,
completely orthogonal use of "char*" for C strings.

Semantically I don't see much difference in that respect between
"void*" and "unsigned char*". Both are pointers to unspecific
data. They have different properties, sure; one of my interests in this
thread was to find out which properties people actually use. (And
from the C++ side I have only got partial answers so far.)

And you are completely right with your argument that small numbers
that are meant as such should use "uint8_t" or similar and not
"unsigned char". Generally, in C you should use the predefined
semantic typedefs as much as possible to mark your intent for a
particular piece of data.

Jens

James Kuyper · Aug 30, 2012

On 29/08/2012 20:34, James Kuyper wrote: ....

You skipped the important part of Leigh's post - "char* is .. wait for
it .. a pointer to char; void* is .. wait for it .. a pointer to
anything; this has not changed from C to C++."

If you write in your code "char* p", you are saying that "p is a pointer
to a character or a list of characters". If you write "void* p", you
are saying that "p points to something which could be anything".

Yes, I skipped that, because I considered it an unimportant and silly
distinction. It's the functional differences between char* and void*
that render void* the appropriate way to store a pointer of unknown
type; remove those functional differences, and I wouldn't bother using
'void*' rather than 'char*' just because 'void*' documents a supposedly
different meaning. My point being that C++ lacks some (though not yet
all) of the functional differences that C has between those two types.

These are different things. It really doesn't matter how these are
implemented, and what casts can be done implicitly or explicitly - they
/say/ different things.

If they were functionally equivalent, the fact that they had different
names wouldn't mean that they actually say anything different, however
much you might want to think otherwise.

Casey Carter · Aug 30, 2012

So again give me a proper use case in C++ that isn't just for calling
a C interface.

void* is used to point at space that has been allocated but not yet
initialized, or destroyed but not yet freed, to indicate that references
to that space are semantically invalid.
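
A minimal sketch of that use, written in C, where the conversion from void* is implicit (in C++ the "construct" step would typically be a placement new). The type `struct point` and the three function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example type; not taken from the thread. */
struct point {
    int x;
    int y;
};

/* Allocated but not yet initialized: handed out as void* so that
   nobody is tempted to read it as a struct point. */
void *point_storage_acquire(void)
{
    return malloc(sizeof(struct point));
}

/* Initialization turns raw storage into a usable object. */
struct point *point_construct(void *raw, int x, int y)
{
    struct point *p = raw;   /* implicit in C; C++ needs a cast */
    assert(p != NULL);
    p->x = x;
    p->y = y;
    return p;
}

/* "Destroyed but not yet freed": the storage goes back to being void*. */
void *point_destroy(struct point *p)
{
    return p;
}
```

The pointer type tracks the lifetime here: only between `point_construct` and `point_destroy` does the caller hold a `struct point *`; before and after, the same address is visible only as `void *`.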

Jens Gustedt · Aug 30, 2012

On 30.08.2012 14:54, David Brown wrote:

A "char *" says "this is a pointer to a character", while a "void *"
says "this is a pointer to an unknown type". These convey different
meanings to the reader.

No, a "char*" doesn't say pointer to character; it is you who is
putting the semantics there. It says "pointer to char" and nothing
else. (And "unsigned char*" even less defines a pointer to an
"unsigned character", whatever that would be.)

Clearly there is a certain amount of style - and therefore differing
personal opinions - in these matters. But you cannot deny the fact that
different names for the same thing convey different meanings beyond
their function.

Would you be more comfortable if the standard introduced a type alias
"byte" for "unsigned char"?

That would be entirely conceivable. For example, C11 introduces

typedef int errno_t;

while claiming a different semantics for "errno_t" than for "int".

Jens
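
To make the point concrete, here is a small sketch showing that such a typedef changes nothing for the compiler and only documents intent. It assumes a C11 compiler for `_Generic`. The alias `my_errno_t` and the function `parse_digit` are hypothetical; a separate name is used because the real `errno_t` belongs to the optional Annex K of C11.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alias, named to avoid clashing with a real errno_t. */
typedef int my_errno_t;

/* Returns 1 when the compiler sees my_errno_t as plain int. */
int alias_is_int(void)
{
    my_errno_t code = 0;
    return _Generic(code, int: 1, default: 0);
}

/* The alias documents intent: this return value is an error code,
   not a count or an index. */
my_errno_t parse_digit(char c, int *out)
{
    assert(out != NULL);
    if (c < '0' || c > '9')
        return 1;            /* hypothetical nonzero error code */
    *out = c - '0';
    return 0;
}
```

Because the alias and `int` are the same type, the compiler would accept any `int` where a `my_errno_t` is expected; the distinction exists for the human reader only.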

Jens Gustedt · Aug 30, 2012

On 30.08.2012 15:53, Scott Lurndal wrote:

What about the existing massive volumes of C++ code which use void*? Particularly
code that dates from the 80's when there was no STL?

I don't completely get the sense of your question. Do you mean to take
them as examples of use cases? Or do you want to emphasize that there
would be a problem with old code?

If (supposing that for a minute) C++ aligned itself with the C usage
of "void*", no old code would break, since all of it casts explicitly
to the target type, if I understand correctly.
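
As an illustration of that argument, the sketch below contrasts the two spellings. The function names are hypothetical. The first form relies on C's implicit conversion from void* and is rejected by a C++ compiler; the second carries the explicit cast that existing C++ code already has to write, and it is accepted by both languages.

```c
#include <assert.h>
#include <stdlib.h>

/* C style: the void* returned by malloc converts implicitly to int*.
   This line does not compile as C++. */
int *clone_int_c_style(const int *src)
{
    assert(src != NULL);
    int *copy = malloc(sizeof *copy);
    if (copy != NULL)
        *copy = *src;
    return copy;
}

/* With an explicit cast: valid in both C and C++. */
int *clone_int_with_cast(const int *src)
{
    assert(src != NULL);
    int *copy = (int *)malloc(sizeof *copy);
    if (copy != NULL)
        *copy = *src;
    return copy;
}
```

Code of the second kind would keep its meaning if the implicit conversion were also allowed, which is the sense in which no old code would break.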

Then, I would also tend to say that C++ code from the 80's wouldn't
compile with a modern compiler anyhow. At least that is my narrow
experience with C++. The big project I wrote some years ago
still needs syntax adjustments at every major revision of g++.

There may be other possible solutions to get C and C++
closer on that point. But before thinking of solutions I first wanted
to know if this corresponds to something real on the C++ side.

I conclude for myself that the reality is that (besides interfacing
with C) "void*" is less and less useful in C++, and that declaring
variables of type "void*" and manipulating them (and the objects they
represent) should be considered more the "C side" of the game.

Jens

Garrett Hartshaw · Aug 30, 2012

On 30.08.2012 14:54, David Brown wrote:

No, a "char*" doesn't say pointer to character; it is you who is
putting the semantics there. It says "pointer to char" and nothing
else. (And "unsigned char*" even less defines a pointer to
an "unsigned character", whatever that would be.)

'char *' says "this is a pointer to an object of type 'char'",
'unsigned char *' says "this is a pointer to an object of type
'unsigned char'", 'void *' says "this is a pointer to an object of an
unknown/any type".

If you want to interpret an object as a sequence of bytes, then use
'unsigned char *' or 'uint8_t *'. I see 'unsigned char' as meaning a
character in an encoding that is represented as an 8-bit unsigned
integer, while 'uint8_t' is an arbitrary 8-bit unsigned integer. (e.g.
I would write "unsigned char a = 'a'" and "uint8_t b = 10", but not
"unsigned char c = 10" or "uint8_t d = 'd'")

If on the other hand you wish to have an opaque pointer to an object
of unknown type, or to an uninitialized memory region, then use 'void *'.

Even if 'void *' was redefined to mean 'unsigned char *', I would
still distinguish between their uses just as I distinguish between
'unsigned char' and 'uint8_t'.

Would you be more comfortable if the standard introduced a type
alias "byte" for "unsigned char"?

That would be entirely conceivable. For example, C11 introduces

typedef int errno_t;

while claiming a different semantics for "errno_t" than for
"int".

Jens

James Kuyper · Aug 30, 2012

That is obviously incorrect. A comment is functionally equivalent to
white space - are you suggesting that comments do not "say" anything in
source code?

Comments are for talking with other humans; the active part of the
language is for talking to the compiler. If the differences, for the
compiler, between 'void*' and 'char*' were reduced to a matter of how
they were spelled, I would not recommend using that spelling to convey
meaningful information to any humans who were reading the code.

This assertion is somewhat inconsistent with the fact that I have chosen
to use distinctions between other equivalent constructs to convey actual
meanings. For instance, when writing function declarations in C, I use T
*parameter_name to represent a pointer to a single object, and T
parameter_name[] when it is a pointer to the first in a series of
objects (in C++ I would use a reference for the first purpose).

I'll not bother to try and excuse the inconsistency; I'll just say that
the pointer parameter thing feels harmless to me, whereas I feel a
strong antipathy about removing the distinctions between void* and char*
(I really dislike the gnu extension that allows pointer arithmetic on
void*).
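
For reference, a sketch of the portable alternative to that GNU extension, which treats `sizeof(void)` as 1 so that `p + n` works on a void*. In standard C the same effect needs a detour through `unsigned char *`. The helper names `byte_offset` and `element_at` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Portable replacement for `base + offset` on a void*:
   step through unsigned char*, whose element size is 1 by definition. */
void *byte_offset(void *base, size_t offset)
{
    assert(base != NULL);
    return (unsigned char *)base + offset;
}

/* Address of element `index` in an array whose element type is
   not known at this point, only its size. */
void *element_at(void *array, size_t elem_size, size_t index)
{
    return byte_offset(array, elem_size * index);
}
```

The explicit cast keeps the distinction visible: the address arithmetic is done on bytes, and the result goes back to being a pointer of unknown type.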

Malcolm McLean · Aug 30, 2012

On Thursday, 30 August 2012 18:57:44 UTC+1, James Kuyper wrote:

On 08/30/2012 08:54 AM, David Brown wrote:

I'll not bother to try and excuse the inconsistency; I'll just say that
the pointer parameter thing feels harmless to me, whereas I feel a
strong antipathy about removing the distinctions between void* and char*
(I really dislike the gnu extension that allows pointer arithmetic on
void*).

The problem is that void * often points to objects which you need to copy. For instance, if you're writing a qsort()-style sorting routine or a general-purpose compressor, you'll need to cast to unsigned char * to actually access the bits.
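
A sketch of such a routine, assuming the same interface shape as qsort(). An insertion sort is used inside purely for brevity. The names `generic_sort`, `swap_bytes` and `compare_ints` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two objects of `size` bytes, one byte at a time. */
static void swap_bytes(void *a, void *b, size_t size)
{
    unsigned char *pa = a;
    unsigned char *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}

/* qsort-style interface: the routine sees the array only as bytes. */
void generic_sort(void *base, size_t count, size_t size,
                  int (*compare)(const void *, const void *))
{
    assert(compare != NULL);
    unsigned char *bytes = base;
    for (size_t i = 1; i < count; i++) {
        for (size_t j = i; j > 0; j--) {
            unsigned char *prev = bytes + (j - 1) * size;
            unsigned char *curr = bytes + j * size;
            if (compare(prev, curr) <= 0)
                break;
            swap_bytes(prev, curr, size);
        }
    }
}

/* Example comparator: the callee converts void* back to the real type. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}
```

The interface takes `void *` because the element type is unknown, while the implementation works on `unsigned char *` because it has to address and copy individual bytes.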

Keith Thompson · Aug 30, 2012

Garrett Hartshaw said:
If you want to interpret an object as a sequence of bytes, then use
'unsigned char *' or 'uint8_t *'. I see 'unsigned char' as meaning a
character in an encoding that is represented as an 8-bit unsigned
integer, while 'uint8_t' is an arbitrary 8-bit unsigned integer. (e.g.
I would write "unsigned char a = 'a'" and "uint8_t b = 10", but not
"unsigned char c = 10" or "uint8_t d = 'd'")

[...]

Strictly speaking, `unsigned char *` and `uint8_t *` are not
interchangeable. `unsigned char` is one byte by definition.
`uint8_t` is 8 bits by definition -- and if CHAR_BIT > 8, then
`uint8_t` will not exist.
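
A short sketch of how a program can check this, assuming a C99 implementation. `<stdint.h>` defines `UINT8_MAX` only when `uint8_t` itself exists; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of bits in a byte on this implementation. */
int bits_per_byte(void)
{
    assert(CHAR_BIT >= 8);   /* guaranteed by the standard */
    return CHAR_BIT;
}

/* uint8_t is optional; UINT8_MAX is defined only if the type exists. */
int has_uint8(void)
{
#ifdef UINT8_MAX
    return 1;
#else
    return 0;
#endif
}

/* Largest value an unsigned char can hold, i.e. 2^CHAR_BIT - 1. */
unsigned int max_byte_value(void)
{
    return UCHAR_MAX;
}
```

On an implementation with CHAR_BIT == 16, `has_uint8()` would return 0, while `unsigned char` would still be exactly one byte.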

C's treatment of characters is a bit of a mess IMHO, partly due to
changes in character semantics since C was first defined. When we
could assume 7-bit ASCII (or 8-bit EBCDIC with plain char being
unsigned), C's model worked. With the introduction of ASCII-based
8-bit character representations, with plain char remaining signed
on many systems, things started getting confusing.

Conflating the concepts of a single character and a single
fundamental storage unit into the types `char`, `signed char`, and
`unsigned char`, all required to be the same size, no longer makes
as much sense now as it did then.
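
One everyday consequence of that conflation, sketched under the assumption of a hosted implementation with `<ctype.h>`: the signedness of plain `char` is implementation-defined, so character data has to be converted to `unsigned char` before being handed to the classification functions. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* 1 if plain char is signed on this implementation, else 0. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Counts alphabetic characters. The cast matters: passing a negative
   char value (other than EOF) to isalpha() is undefined behavior. */
size_t count_alpha(const char *text)
{
    assert(text != NULL);
    size_t n = 0;
    for (; *text != '\0'; text++) {
        if (isalpha((unsigned char)*text))
            n++;
    }
    return n;
}
```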

If I were designing a C-like language from scratch today, I'd
separate the concepts of characters and fundamental storage units.
I'd probably also separate the concept of very short integers
from both. So `char`, `byte`, and `int8` might be three distinct
and incompatible types.

Making such a change to C now would break so much code that the
resulting language could not reasonably be called "C"; it ain't
gonna happen.

Jens Gustedt · Aug 30, 2012

On 30.08.2012 19:49, Garrett Hartshaw wrote:

'char *' says "this is a pointer to an object of type 'char'",
'unsigned char *' says "this is a pointer to an object of type
'unsigned char'", 'void *' says "this is a pointer to an object of an
unknown/any type".

no, at least for C "void" is a type, too.

The void type comprises an empty set of values; it is an incomplete
object type that cannot be completed.

so "void*" says pointer to void. This is not "nothing" or "unknown";
in that sense void is not much different from "struct nix" when you
never actually define "struct nix" anywhere. It behaves a bit
differently in terms of implicit conversions, sure, but the semantics
are the same.

If you want to interpret an object as a sequence of bytes, then use
'unsigned char *' or 'uint8_t *'. I see 'unsigned char' as meaning a
character in an encoding that is represented as an 8-bit unsigned
integer, while 'uint8_t' is an arbitrary 8-bit unsigned integer. (e.g.
I would write "unsigned char a = 'a'" and "uint8_t b = 10", but not
"unsigned char c = 10" or "uint8_t d = 'd'")

If on the other hand you wish to have an opaque pointer to an object
of unknown type, or to an uninitialized memory region, then use 'void *'.

A "void*" pointer is not "an uninitialized" memory region. It may well
already be initialized.

But I am really surprised about this strong need for opaque pointers in
C++. If we follow the basic ideas of C++, such usage should be very
rare in C++, no? Isn't that what "abstract base classes" were
invented for?

Even if 'void *' was redefined to mean 'unsigned char *', I would
still distinguish between their uses just as I distinguish between
'unsigned char' and 'uint8_t'.

That is why I asked the question that came afterward, which you seem to
have overlooked:

The naming of "char" (and the two others) certainly doesn't cover all
the uses that these types have nowadays, and it would be good to find
semantic names for the different use cases.

typedef char byte_t;            // for the minimal addressable storage unit
typedef unsigned char bitX_t;   // for an arbitrary collection of unspecific data, usually X == 8
typedef unsigned char uintX_t;  // for use as a small unsigned integer, usually X == 8
typedef signed char intX_t;     // for use as a small signed integer, usually X == 8

Jens

Keith Thompson · Aug 30, 2012

Jens Gustedt said:
On 30.08.2012 19:49, Garrett Hartshaw wrote:

no, at least for C "void" is a type, too.

Yes, void is a type; specifically, it's an incomplete type that cannot
be completed.

The void type comprises an empty set of values; it is an incomplete
object type that cannot be completed.

so "void*" says pointer to void. This is not "nothing" or "unknown";
in that sense void is not much different from "struct nix" when you
never actually define "struct nix" anywhere. It behaves a bit
differently in terms of implicit conversions, sure, but the semantics
are the same.

Not really. A `char*` pointer points to an object of type `char`. A
`void*` pointer does not point to an object of type `void`, because
there is no such thing.

Any type of pointer to object or incomplete type (i.e., any pointer
other than a function pointer) points to some object in memory (or to
nothing if it's currently a null pointer). What's special about `void*`
is that, while you can perform certain pointer operations on it, you
can't access the object it points to without first converting it to some
other pointer type.
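
The operations in question can be sketched as follows, in C, where the conversions to and from void* are implicit. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Allowed directly on void*: assignment, comparison, conversion.
   Not allowed: dereferencing, and (in standard C) arithmetic. */
int same_object(const void *a, const void *b)
{
    return a == b;
}

/* To read the object, convert to the type the caller promises it has. */
int read_as_int(const void *p)
{
    assert(p != NULL);
    const int *ip = p;       /* implicit in C; C++ requires a cast */
    return *ip;
}

/* Round trip: T* -> void* -> T* yields a pointer equal to the original. */
int *round_trip(int *original)
{
    void *erased = original;
    int *restored = erased;
    return restored;
}
```

Writing `*p` with `p` of type `const void *` inside `read_as_int` would be a constraint violation; the conversion to `const int *` is the step that makes the access possible.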

[...]

A "void*" pointer is not "an uninitialized" memory region. It may well
already be initialized.

I think that was a recommendation for *how* to use `void*`, not a
statement about what `void*` means to the compiler.

[...]

Garrett Hartshaw · Aug 30, 2012

On 30.08.2012 19:49, Garrett Hartshaw wrote:

no, at least for C "void" is a type, too.

The void type comprises an empty set of values; it is an
incomplete object type that cannot be completed.

so "void*" says pointer to void. This is not "nothing" or
"unknown"; in that sense void is not much different from "struct
nix" when you never actually define "struct nix" anywhere. It
behaves a bit differently in terms of implicit conversions, sure, but
the semantics are the same.

A "void*" pointer is not "an uninitialized" memory region. It may
well already be initialized.

As Keith said, that was meant to be how it should be used, rather than
exactly what it means to the compiler.

But I am really surprised about this strong need for opaque pointers
in C++. If we follow the basic ideas of C++, such usage should be
very rare in C++, no? Isn't that what "abstract base classes" were
invented for?

That is why I asked the question that came afterward, which you seem
to have overlooked:

The naming of "char" (and the two others) certainly doesn't cover
all the uses that these types have nowadays, and it would be good to
find semantic names for the different use cases.

typedef char byte_t;            // for the minimal addressable storage unit
typedef unsigned char bitX_t;   // for an arbitrary collection of unspecific data, usually X == 8
typedef unsigned char uintX_t;  // for use as a small unsigned integer, usually X == 8
typedef signed char intX_t;     // for use as a small signed integer, usually X == 8

Jens

One of the major uses of typedef is to allow this kind of
disambiguation between types that are represented identically. While I
don't often have need to work low-level directly on bytes, if I did I
would not hesitate to create a byte_t typedef, and if it was already
included in the standard, so much the better (although I would
probably typedef it as 'unsigned char' to avoid the
implementation-defined signed-ness of plain 'char').

However, even if 'byte_t' were added to the standard, I would still
want to distinguish between 'byte_t *' (used as a pointer to a byte)
and 'void *' (used as an opaque pointer or a pointer to uninitialized
memory) *even if they both meant the same thing to the compiler*.

Granted, it would be better for 'memcpy' and friends to use 'byte_t *'
rather than 'void *', as their underlying functionality is specified
in terms of moving/copying bytes. 'void *' would still be required *in
C* as the return value of 'malloc' (uninitialized memory) and for use as
a userData parameter in callbacks (opaque pointer to an object of
unspecified type). There is little use for 'void *' in pure C++ code,
but it would still be required in the language to interface with C code.
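
The userData pattern mentioned here can be sketched as follows. The iterator never looks at the pointer; only the callback knows what it refers to. All names (`visit_fn`, `for_each_int`, `add_to_total`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type: `user_data` is opaque to the iterator. */
typedef void (*visit_fn)(int value, void *user_data);

/* Calls `visit` for each element, passing `user_data` through untouched. */
void for_each_int(const int *values, size_t count,
                  visit_fn visit, void *user_data)
{
    assert(visit != NULL);
    for (size_t i = 0; i < count; i++)
        visit(values[i], user_data);
}

/* One possible callback: it knows user_data really points to a long. */
void add_to_total(int value, void *user_data)
{
    long *total = user_data;
    *total += value;
}
```

In C++ the same job is usually done with a function object or a lambda capture, which is one reason the pattern is rare in pure C++ code.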

Jens Gustedt · Aug 31, 2012

On 31.08.2012 04:07, Garrett Hartshaw wrote:

Granted, it would be better for 'memcpy' and friends to use 'byte_t *'
rather than 'void *', as their underlying functionality is specified
in terms of moving/copying bytes. 'void *' would still be required *in
C* as the return value of 'malloc' (uninitialized memory) and for use as
a userData parameter in callbacks (opaque pointer to an object of
unspecified type).

Perhaps this got a bit lost in the discussion, but all of that started
by trying to find replacements for "void*" in C++.

"void*" is a concept that works suitably well in C. It is just a
nuisance in C++.

There is little use for 'void *' in pure C++ code,
but it would still be required in the language to interface with C code.

Sure that was my starting point.

I even think that in C++ all conversions from "void*" to another type
should and could be avoided. Basically I would say that code that uses
"void*" (manipulating pointers to void, or obfuscating types) is "C
in disguise".

I haven't heard of a good C++ use case yet that couldn't be packed
into an 'extern "C"' interface, with the implementation then
done in proper C. This is much in the same spirit as implementing
some OS- or compiler-specific functions in assembler.

Jens

Nick Keighley · Aug 31, 2012

On 30.08.2012 09:47, David Brown wrote:

ugly though

You probably didn't see my reply to that either, this thread got
somehow out of bounds.

You just should push the button a bit further where Leigh stopped. He
resolved the term "void" but didn't for "char". "char" is the smallest
addressable unit in C and C++ (Pascal had byte for that, no)

no. Pascal did not have a byte type

that is
almost completely stripped of a semantic interpretation.

Traditionally "char*" was also that, a pointer to a region of
unspecific bytes. Nowadays "unsigned char*" serves that purpose,
because it has the advantage of letting you access any bit (!) and
byte of the data, and you don't have confusion with the other,
completely orthogonal use of "char*" for C strings.

which is why I use a byte or octet typedef

BGB · Aug 31, 2012

Garrett Hartshaw said:

If you want to interpret an object as a sequence of bytes, then use
'unsigned char *' or 'uint8_t *'. I see 'unsigned char' as meaning a
character in an encoding that is represented as an 8-bit unsigned
integer, while 'uint8_t' is an arbitrary 8-bit unsigned integer. (e.g.
I would write "unsigned char a = 'a'" and "uint8_t b = 10", but not
"unsigned char c = 10" or "uint8_t d = 'd'")

[...]

Strictly speaking, `unsigned char *` and `uint8_t *` are not
interchangeable. `unsigned char` is one byte by definition.
`uint8_t` is 8 bits by definition -- and if CHAR_BIT > 8, then
`uint8_t` will not exist.

C's treatment of characters is a bit of a mess IMHO, partly due to
changes in character semantics since C was first defined. When we
could assume 7-bit ASCII (or 8-bit EBCDIC with plain char being
unsigned), C's model worked. With the introduction of ASCII-based
8-bit character representations, with plain char remaining signed
on many systems, things started getting confusing.

Conflating the concepts of a single character and a single
fundamental storage unit into the types `char`, `signed char`, and
`unsigned char`, all required to be the same size, no longer makes
as much sense now as it did then.

If I were designing a C-like language from scratch today, I'd
separate the concepts of characters and fundamental storage units.
I'd probably also separate the concept of very short integers
from both. So `char`, `byte`, and `int8` might be three distinct
and incompatible types.

my own language (BGBScript) more or less does this.

char is 16-bits (more-or-less, 1), byte is 8-bits unsigned, and sbyte is
8-bits signed.

'int8' is also a type but is essentially an alias to sbyte.
'char8' also exists as an 8-bit character (also a distinct type, and is
mostly intended for ASCII).

1: char is often interpreted as a single UTF-16 codepoint for storage
reasons, but may often be 24-bit internally, and strings may often be
stored using UTF-8 to save space (and still tend to be null-terminated,
note that this can emulate the external behavior of a length-bound
UTF-16 string, but on-average uses around 1/2 as much memory).

Making such a change to C now would break so much code that the
resulting language could not reasonably be called "C"; it ain't
gonna happen.

yep.

it is about like the issues of making C be based on a module-import
system (rather than including headers and linking objects into
binaries), and generating target neutral VM bytecode. maybe it can be
done, theoretically, but it is not likely an easy task to do so without
violating the standards (or at least adding some fairly ugly extensions).

BGB · Aug 31, 2012

The active part of the language is for talking to the compiler /and/
humans. That is the reason we have typedefs, and the reason we give
variables useful names rather than "i1", "i2", etc.

(partial ironic satire):

but what if i, j, and k are already used up?
i0/i1/i2/i3, j0/j1/j2/j3, and k0/k1/k2/k3, can effectively get 4x (or
5x) as many integer-variables from the same letters...

then there may be {li/lj/lk}{0/1/2/3} for long-long variables,
{a/b/c/d}{0/1/2/3} for doubles, {f/g/h}{0/1/2/3} for floats, ...

If you don't think "void*" and "char*" convey a different meaning to
humans, then I see your point. IMHO, they /do/ convey a different
meaning - so they have different usage even if they had identical
functional meaning.

in a lot of GCC compiled code, they are not too far off.
I remember sometimes porting code from GCC and often having to
convert "void *" to "byte *" (typedef'ed "unsigned char") or similar to
make it work.

This assertion is somewhat inconsistent with the fact that I have chosen
to use distinctions between other equivalent constructs to convey actual
meanings. For instance, when writing function declarations in C, I use T
*parameter_name to represent a pointer to a single object, and T
parameter_name[] when it is a pointer to the first in a series of
objects (in C++ I would use a reference for the first purpose).

I'll not bother to try and excuse the inconsistency; I'll just say that
the pointer parameter thing feels harmless to me, whereas I feel a
strong antipathy about removing the distinctions between void* and char*
(I really dislike the gnu extension that allows pointer arithmetic on
void*).

There is nothing wrong with a bit of inconsistency here and there - you
don't want your style rules to be /too/ rigid!

better IMO to be like "whatever makes the most sense at the moment".

as long as there is at least some semblance of consistency, then it is
probably good enough.

Keith Thompson · Aug 31, 2012

David Brown said:
On 30/08/2012 23:07, Keith Thompson wrote: [...]

If I were designing a C-like language from scratch today, I'd
separate the concepts of characters and fundamental storage units.
I'd probably also separate the concept of very short integers
from both. So `char`, `byte`, and `int8` might be three distinct
and incompatible types.

Making such a change to C now would break so much code that the
resulting language could not reasonably be called "C"; it ain't
gonna happen.

This is pretty much exactly what Python did with Python 3. Previously,
strings and characters were also used as raw data - now they are
separate types. But I agree that it is not going to happen in C.

I would be happy with a "byte" being officially defined as the type for
memory units, however. Maybe "data8_t" would be more appropriate - then
we could have "data16_t", etc., as well.

The name "data8_t" works only if a byte is 8 bits, something that's
not guaranteed by the language.

If you want to require CHAR_BIT to be exactly 8, there's an argument
to be made for that, though it would exclude some systems (mostly
DSPs (digital signal processors), I think) from having conforming
C implementations.

Otherwise, the size in bits of the fundamental memory unit *has*
to be implementation-defined.

Keith Thompson · Aug 31, 2012

Kenneth Brody said:
Not really. A `char*` pointer points to an object of type `char`. A
`void*` pointer does not point to an object of type `void`, because
there is no such thing.

Any type of pointer to object or incomplete type (i.e., any pointer
other than a function pointer) points to some object in memory (or to
nothing if it's currently a null pointer). What's special about `void*`
is that, while you can perform certain pointer operations on it, you
can't access the object it points to without first converting it to some
other pointer type.

[...]

While technically correct that "void *" doesn't point to an object of
type "void" because there's no such thing, I think it is convenient to
think of it in those terms.

I disagree. That is, I agree that it's technically correct (the
best kind of correct!), but I think it's correct in every other
sense as well. I don't think pretending that there's such a thing
as an object of type void, that a void* pointer points to, is useful.

Consider, for example:

struct IncompleteType;
typedef struct IncompleteType MyVoid;

In this case, "MyVoid *" is kind of like "void *", as long as struct
IncompleteType is never "completed".

And that's a *huge* difference. If you have a pointer to struct,
even if the struct type hasn't yet been completed, it tells you
that it actually points to an object of that type. Presumably code
that has visibility to the full definition of struct IncompleteType
will be able to dereference the pointer and access the pointed-to
object. And that's the whole point of declaring something as struct
IncompleteType*.

On the other hand, if you deliberately never complete the type
struct IncompleteType, then there can be no objects of type struct
IncompleteType, and you'll never be able to dereference a pointer to
struct IncompleteType without first converting it to some other
pointer type. If void* didn't exist, that might be a reasonable
thing to do -- but it does.
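
The first case, an incomplete type that is completed where the implementation lives, is the usual opaque-handle idiom. A minimal sketch follows; `struct counter` and its functions are hypothetical, and both views are shown in one file, although the private part would normally sit in a separate .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* Public view: the type is incomplete, so callers cannot look inside. */
struct counter;
struct counter *counter_create(void);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_destroy(struct counter *c);

/* Private view: this code sees the full definition and may
   dereference the pointer. */
struct counter {
    int value;
};

struct counter *counter_create(void)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_increment(struct counter *c)
{
    assert(c != NULL);
    c->value++;
}

int counter_value(const struct counter *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_destroy(struct counter *c)
{
    free(c);
}
```

Unlike `void *`, a `struct counter *` still says what it points to: passing a pointer to some other struct type requires a cast, so the compiler can catch the mix-up.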

[...]

As I said, it is technically correct that "void *" cannot point to
something of type "void", because there is no such thing. (You can't
have an object of type "struct IncompleteType" in my example, either.)
However, I do think it's a convenient way to think of it, since any
other "X *" would be a pointer to something of type "X".

Yes, any *other* type X* points to something of type X. void*
is an exception to that rule, which is why it exists.

[...]

Les Cargill · Aug 31, 2012

Scott said:
What about the existing massive volumes of C++ code which use void*? Particularly
code that dates from the 80's when there was no STL?

So use the old compiler. Check *everything* in to the CM system; ideally
you can run a script that'll check it out, install the tools and build
it all in one fell swoop. Makes for a fine test of the CM system....
