On what does size of data types depend?

Skarmander

Alexei A. Frounze wrote:
C is wonderful in this respect. Perhaps because of this Java AFAIK has no
unsigned types.
Which, incidentally, is a spectacular misfeature when they then call the
8-bit signed type "byte". Either you keep promoting things back and
forth to integers, or you use two's complement (which Java conveniently
guarantees). Either way it's annoying. Consistency isn't everything. :)

S.
 
Alexei A. Frounze

Skarmander said:
Which, incidentally, is a spectacular misfeature when they then call the
8-bit signed type "byte". Either you keep promoting things back and
forth to integers, or you use two's complement (which Java conveniently
guarantees). Either way it's annoying. Consistency isn't everything. :)

But you know, there are different kinds and levels of consistency. In
certain places I'd like C to behave more like math (e.g. signed vs.
unsigned, promotions and related things), to be more humane and
straightforward (e.g. the way the type of a variable is specified in a
declaration/definition), etc. etc. I'm not saying Java or C is
definitely better, no. Each has its good sides and bad sides, and
there's always room for improvement, not necessarily big or very
important, but good enough to be considered and desired...
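
For example, a minimal sketch of the kind of thing I mean: under the
usual arithmetic conversions, -1 is converted to unsigned before the
comparison, so a mathematically false statement holds in C.

#include <stdio.h>

int main(void)
{
    if (-1 > 0U)  /* -1 becomes UINT_MAX, so this branch is taken */
        printf("-1 > 0U holds after the usual arithmetic conversions\n");
    return 0;
}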

Alex
 
Eric Sosman

Alexei A. Frounze wrote On 10/05/05 18:30,:
> ...

Correct :)

> C is wonderful in this respect. Perhaps because of this Java AFAIK has no
> unsigned types.

Java has two unsigned types (one of which might be
better termed "signless"). IMHO, it would be better if
it had three.
 
Jack Klein

<SNIP>

> This brings to mind something that I have wondered about.
>
> I often see advice elsewhere, and in other people's programs,
> suggesting hiding all C "fundamental" types behind typedefs such as
>
> typedef char CHAR;
> typedef int INT32;
> typedef unsigned int UINT32;

The first one is useless; the second two are worse than useless, they
are dangerous, because on another machine int might have only 16 bits
and INT32 might need to be a signed long.
> typedef char* PCHAR;

This is more dangerous, yes: never typedef a pointer this way, at
least not if the pointer will ever be dereferenced using that alias.
One well-known pitfall is sketched below.
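
For instance, a minimal sketch (names illustrative, and not
necessarily the only danger): qualifying the typedef qualifies the
pointer itself, not what it points to.

typedef char *PCHAR;

const PCHAR p = 0;  /* this is "char * const p": a constant pointer */
const char *q = 0;  /* pointer to constant char: a different type   */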
> The theory is that application code which always uses these typedefs
> will be more likely to run on multiple systems (provided the typedefs
> are changed of course).

More than theory: very real fact.
> I used to do this. Then I found out that C99 defined things like
> "uint32_t", so I started using these versions instead. But after
> following this group for a while I now find even these ugly and don't
> use them unless unavoidable.

Nobody says you have to care about portability if you don't want to.
That's between you, your bosses, and your users. If you are writing a
program for your own use, the only one you ever have to answer to is
yourself.

On the other hand, both UNIXy and Windows platforms are having the
same problems with the transition from 32 to 64 bits that they had
moving from 16 to 32 bits, if perhaps not quite so extreme.

For more than a decade, the natural integer type and native machine
word on Windows have been called a DWORD, and on 64-bit Windows the
native machine word is going to be a QWORD.
> What do people here think is best?

On one embedded project we had CAN communications between the main
processor, a 32-bit ARM, and slave processors that were 16/32 bit
DSPs.

The only types that were identical between the two were signed and
unsigned short, and signed and unsigned long. In fact, here are the
different integer types for the two platforms:

Type            ARM (32-bit)      DSP (16/32-bit)
'plain' char    unsigned 8-bit    signed 16-bit
signed char     signed 8-bit      signed 16-bit
unsigned char   unsigned 8-bit    unsigned 16-bit
signed short    signed 16-bit     signed 16-bit
unsigned short  unsigned 16-bit   unsigned 16-bit
signed int      signed 32-bit     signed 16-bit
unsigned int    unsigned 32-bit   unsigned 16-bit
signed long     signed 32-bit     signed 32-bit
unsigned long   unsigned 32-bit   unsigned 32-bit

Both processors had hardware alignment requirements. The 32-bit
processor can only access 16-bit data at an even address and 32-bit
data on an address divisible by four. The penalty for misaligned
access is a hardware trap. The DSP only addresses memory in 16-bit
words, so there is no misalignment possible for anything but long, and
they had to be aligned on an even address (32-bit alignment). The
penalty for misaligned access is just wrong data (read), or
overwriting the wrong addresses (write).

Now the drivers for the CAN controller hardware are completely
off-topic here, but the end result on both systems is two 32-bit words
in memory containing the 0 to 8 octets (0 to 64 bits) of packet data.
These octets can represent any quantity of 8-bit, signed or unsigned
16-bit, or 32-bit data values that can fit in 64 bits, and have any
alignment.

So your mission, Mr. Phelps, if you decide to accept it, is to write
code that will run on both processors despite their different
character sizes and alignment requirements, that can use a format
specifier to parse 1 to 8 octets into the proper types with the proper
values.

The code I wrote runs on both processors with no modifications. And I
couldn't even use 'uint8_t', since the DSP doesn't have an 8-bit type.
I used 'uint_least8_t' instead.
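
To give the flavor of the technique (a minimal sketch, not the actual
project code): each buffer element holds exactly one octet even where
uint_least8_t is wider than 8 bits, so octets are masked to 8 bits as
they are combined.

#include <stdint.h>

/* Assemble a big-endian 16-bit value from two octets. On the DSP,
   uint_least8_t is 16 bits wide, so the & 0xFFu mask keeps only the
   octet actually stored in each element. */
uint_least16_t unpack_u16_be(const uint_least8_t *octets)
{
    return (uint_least16_t)((((unsigned)octets[0] & 0xFFu) << 8)
                          | ((unsigned)octets[1] & 0xFFu));
}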

As for the C99 choice of type definitions like 'uint8_t' and so on,
they are not the best I have ever seen, but they are also far from the
worst. And they have the advantage of being in a C standard, so with
a little luck they will eventually edge out all the others.
 
John Devereux

Jack Klein said:
On 05 Oct 2005 14:14:03 +0100, John Devereux


> The first one is useless; the second two are worse than useless, they
> are dangerous, because on another machine int might have only 16 bits
> and INT32 might need to be a signed long.

Perhaps I was not clear; the typedefs go in a single "portability"
header file and are specific to the machine. E.g.

#ifdef __X86
typedef short int INT16;
...
#endif
#ifdef __AVR
typedef int INT16;
...
#endif

(made up examples)

It should be understood that this file will need to be changed for
each new machine, but that hopefully nothing else will. By using
UINT32 etc. throughout, nothing needs to change except this one file.
> typedef char* PCHAR;

> This is more dangerous, yes: never typedef a pointer this way, at
> least not if the pointer will ever be dereferenced using that alias.


> More than theory: very real fact.

So that would make them a good thing? Sorry if I miss the point; you
seem to be saying they are "worse than useless" but do improve
portability?
> Nobody says you have to care about portability if you don't want to.
> That's between you, your bosses, and your users. If you are writing a
> program for your own use, the only one you ever have to answer to is
> yourself.

I don't really care about portability to the extent sometimes apparent
on CLC. For example, I am quite happy to restrict myself to two's
complement machines. However, the idea of writing code the right way
once, rather than the wrong way lots of times, does appeal! I am
starting to see real productivity benefits from my attempts to do this
in my work.
> On the other hand, both UNIXy and Windows platforms are having the
> same problems with the transition from 32 to 64 bits that they had
> moving from 16 to 32 bits, if perhaps not quite so extreme.
>
> For more than a decade, the natural integer type and native machine
> word on Windows have been called a DWORD, and on 64-bit Windows the
> native machine word is going to be a QWORD.

I had to write a fairly simple Windows program last week, and it was
horrible. All those WORDs, DWORDs, LPCSTRs, WPARAMs, LPARAMs etc. I
think that experience was what prompted my post.
> On one embedded project we had CAN communications between the main
> processor, a 32-bit ARM, and slave processors that were 16/32 bit
> DSPs.
>
> The only types that were identical between the two were signed and
> unsigned short, and signed and unsigned long. In fact, here are the
> different integer types for the two platforms:
>
> Type            ARM (32-bit)      DSP (16/32-bit)
> 'plain' char    unsigned 8-bit    signed 16-bit
> signed char     signed 8-bit      signed 16-bit
> unsigned char   unsigned 8-bit    unsigned 16-bit
> signed short    signed 16-bit     signed 16-bit
> unsigned short  unsigned 16-bit   unsigned 16-bit
> signed int      signed 32-bit     signed 16-bit
> unsigned int    unsigned 32-bit   unsigned 16-bit
> signed long     signed 32-bit     signed 32-bit
> unsigned long   unsigned 32-bit   unsigned 32-bit

> Both processors had hardware alignment requirements. The 32-bit
> processor can only access 16-bit data at an even address and 32-bit
> data on an address divisible by four. The penalty for misaligned
> access is a hardware trap. The DSP only addresses memory in 16-bit
> words, so there is no misalignment possible for anything but long, and
> they had to be aligned on an even address (32-bit alignment). The
> penalty for misaligned access is just wrong data (read), or
> overwriting the wrong addresses (write).
>
> Now the drivers for the CAN controller hardware are completely
> off-topic here, but the end result on both systems is two 32-bit words
> in memory containing the 0 to 8 octets (0 to 64 bits) of packet data.
> These octets can represent any quantity of 8-bit, signed or unsigned
> 16-bit, or 32-bit data values that can fit in 64 bits, and have any
> alignment.
>
> So your mission, Mr. Phelps, if you decide to accept it, is to write
> code that will run on both processors despite their different
> character sizes and alignment requirements, that can use a format
> specifier to parse 1 to 8 octets into the proper types with the proper
> values.
>
> The code I wrote runs on both processors with no modifications. And I
> couldn't even use 'uint8_t', since the DSP doesn't have an 8-bit type.
> I used 'uint_least8_t' instead.
>
> As for the C99 choice of type definitions like 'uint8_t' and so on,
> they are not the best I have ever seen, but they are also far from the
> worst. And they have the advantage of being in a C standard, so with
> a little luck they will eventually edge out all the others.

Thanks for the detailed discussion. I have been working on slightly
similar programming tasks recently, implementing Modbus on a PC and two
embedded systems. I must be getting better; the generic Modbus code I
wrote for the (8-bit) AVR system compiled and ran fine on the 32-bit
ARM system.
 
Eric Sosman

John Devereux wrote On 10/07/05 05:40,:
Perhaps I was not clear; the typedefs go in a single "portability"
header file and are specific to the machine. E.g.

#ifdef __X86
typedef short int INT16;
...
#endif
#ifdef __AVR
typedef int INT16;
...
#endif

(made up examples)

It should be understood that this file will need to be changed for
each new machine, but that hopefully nothing else will. By using
UINT32 etc. throughout, nothing needs to change except this one file.

IMHO it's preferable to base such tests on the actual
characteristics of the implementation and not on the name
of one of its constituent parts:

#include <limits.h>
#if INT_MAX == 32767
typedef int INT16;
#elif SHRT_MAX == 32767
typedef short INT16;
#else
#error "DeathStation 2000 not supported"
#endif

This inflicts <limits.h> on every module that includes
the portability header, but that seems a benign side-
effect.
 
John Devereux

Eric Sosman said:
John Devereux wrote On 10/07/05 05:40,:

IMHO it's preferable to base such tests on the actual
characteristics of the implementation and not on the name
of one of its constituent parts:

#include <limits.h>
#if INT_MAX == 32767
typedef int INT16;
#elif SHRT_MAX == 32767
typedef short INT16;
#else
#error "DeathStation 2000 not supported"
#endif

This inflicts <limits.h> on every module that includes
the portability header, but that seems a benign side-
effect.

That does seem much better. Why did I not think of that?
 
Walter Roberson

> IMHO it's preferable to base such tests on the actual
> characteristics of the implementation and not on the name
> of one of its constituent parts:
>
> #include <limits.h>
> #if INT_MAX == 32767
> typedef int INT16;
> #elif SHRT_MAX == 32767
> typedef short INT16;
> #else
> #error "DeathStation 2000 not supported"
> #endif

An implementation is not required to use the entire arithmetic space
possible with its hardware. In theory, INT_MAX == 32767 could
happen on (say) an 18-bit machine.
 
Skarmander

John said:
That does seem much better. Why did I not think of that?

Possibly because when you've got system dependencies, there tend to be
more of them than the size of the data types. So it's very common to get
stuff like

everything.h:
#ifdef __FOONLY
typedef short INT16;
#define HAVE_ALLOCA 1
#define HCF __asm__("hcf")
#define TTY_SUPPORTS_CALLIGRAPHY 1
#include <foonlib.h>
...etc...

In fact, the ever-popular GNU autoconf does this, except that it takes
care of all the tests and writes just one header with the appropriate
defines.

S.
 
Eric Sosman

Walter Roberson wrote On 10/07/05 11:16,:
An implementation is not required to use the entire arithmetic space
possible with its hardware. In theory, INT_MAX == 32767 could
happen on (say) an 18-bit machine.

Adjust the tests appropriately for the semantics
you desire for "INT16". As shown they're appropriate
for an "exact" type (which is a pretty silly thing to
ask for in a signed integer; sorry for the bad example).
If you want "fastest," change == to >=. If you want
"at least," change == to >= and test short before int.
If you want some other semantics, test accordingly.
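
For instance, a minimal sketch of the "at least" variant for 32 bits
(the name INT_LEAST32 is illustrative, not standard):

#include <limits.h>
/* Smallest type with at least 32 bits: test the smaller candidates
   first. long is guaranteed to span at least 32 bits, so the chain
   always terminates. */
#if SHRT_MAX >= 2147483647
typedef short INT_LEAST32;
#elif INT_MAX >= 2147483647
typedef int INT_LEAST32;
#else
typedef long INT_LEAST32;
#endif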

It is not possible to test in this way for every
possible characteristic somebody might want to ask
about -- there's no Standard macro or other indicator
to say what happens on integer overflow, for example.
Still, I believe tests that *can* be made portably
*should* be made portably, and as broadly as possible.
Testing the name of the compiler or of the host machine
is not broad; it's the opposite. Test them if you must,
but test more portably if you can.
 
Keith Thompson

John Devereux said:
Perhaps I was not clear; the typedefs go in a single "portability"
header file and are specific to the machine. E.g.

#ifdef __X86
typedef short int INT16;
...
#endif
#ifdef __AVR
typedef int INT16;
...
#endif

(made up examples)

It should be understood that this file will need to be changed for
each new machine, but that hopefully nothing else will. By using
UINT32 etc. throughout, nothing needs to change except this one file.

Given that the definitions change for each platform (and assuming that
you always get it right), the INT16 and UINT32 typedefs are reasonable.
Since C99 defines similar typedefs in <stdint.h>, and since it also
distinguishes among exact-width, minimum-width, and fastest types,
you'd probably be better off using <stdint.h> if it's available, or
using a C90-compatible version of it if it's not (see
<http://www.lysator.liu.se/c/q8/>). (I can't connect to that site at
the moment.)
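
A minimal sketch of those three flavors for, say, 16 bits:

#include <stdint.h>

int16_t       a;  /* exactly 16 bits; optional, absent where no such type exists */
int_least16_t b;  /* smallest type with at least 16 bits; always provided */
int_fast16_t  c;  /* "fastest" type with at least 16 bits; always provided */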

But the typedefs CHAR (for char) and PCHAR (for char*) are either
utterly useless or dangerously misleading. If you want type char, use
char; if you want a pointer to char, use char*. There's no point in
hiding these types behind typedefs that won't change from one platform
to another. And if they are going to change, they should be called
something other than CHAR and PCHAR.
> So that would make them a good thing? Sorry if I miss the point; you
> seem to be saying they are "worse than useless" but do improve
> portability?

I'm not sure what Jack Klein meant here, but I doubt that he meant
that CHAR and PCHAR are useful.
 
Chris Torek

> ... But the typedefs CHAR (for char) and PCHAR (for char*) are either
> utterly useless or dangerously misleading. If you want type char, use
> char; if you want a pointer to char, use char*. There's no point in
> hiding these types behind typedefs that won't change from one platform
> to another. And if they are going to change, they should be called
> something other than CHAR and PCHAR.

Indeed. The whole point to "creating a type" (which typedef fails
to do, but that is another problem entirely) is to obtain abstraction:
"moving up a level" in a problem, making irrelevant detail go away
so that you work only with relevant detail. "Pointer to char" is
no more abstract than C's raw "char *": what irrelevant detail has
been removed?
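
A minimal sketch of the contrast (the first name is illustrative, not
from any real API): the first typedef hides a representation detail
that might legitimately change, so callers move up a level; the second
removes no detail at all.

typedef long account_id;  /* representation can change without touching callers */
typedef char *PCHAR;      /* exactly as concrete as char*: nothing removed */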
 
Christian Bau

John Devereux said:
I had to write a fairly simple Windows program last week, and it was
horrible. All those WORDs, DWORDs, LPCSTRs, WPARAMs, LPARAMs etc. I
think that experience was what prompted my post.

I can feel your pain. I don't mind things like UINT32; it seems to be
quite self-explanatory. I have a real problem with "WORD" and "DWORD",
which are used in Windows programs a lot: "WORD" is defined as a 16-bit
type and DWORD as a 32-bit type, which means that on your average
Pentium or Athlon processor a WORD is a halfword and a DWORD is a word,
whereas on a 64-bit processor a WORD is a quarterword and a DWORD is a
halfword - in other words, these typenames are complete nonsense.

And LPCSTR - "Long Pointer to C String". For heaven's sake, what is a
"long pointer"?
 
Ben Pfaff

Christian Bau said:
I have a real problem with "WORD" and "DWORD",
which are used in Windows programs a lot: "WORD" is defined as a 16-bit
type and DWORD as a 32-bit type, which means that on your average
Pentium or Athlon processor a WORD is a halfword and a DWORD is a word,
whereas on a 64-bit processor a WORD is a quarterword and a DWORD is a
halfword - in other words, these typenames are complete nonsense.

And LPCSTR - "Long Pointer to C String". For heaven's sake, what is a
"long pointer"?

I suppose you do realize that these names refer to the types that
they do for historical reasons? That's not to say that they
aren't deceptive, but there was some sense behind them at the
time they were invented.
 
Skarmander

Christian said:
I can feel your pain. I don't mind things like UINT32; it seems to be
quite self-explanatory. I have a real problem with "WORD" and "DWORD",
which are used in Windows programs a lot: "WORD" is defined as a 16-bit
type and DWORD as a 32-bit type, which means that on your average
Pentium or Athlon processor a WORD is a halfword and a DWORD is a word,
whereas on a 64-bit processor a WORD is a quarterword and a DWORD is a
halfword - in other words, these typenames are complete nonsense.

And LPCSTR - "Long Pointer to C String". For heaven's sake, what is a
"long pointer"?

No, LPCSTR is Hungarian abracadabra for "long pointer to *constant*
string". These days, it's the same thing as a regular pointer, and
"LPCSTR" is the same thing as "PCSTR", which, however, is almost never
used for hysterical reasons.

But back when Windows 3.0 roamed the earth, the 8086 segmented memory
model meant Windows too drew a distinction between "far" and "near"
pointers (calling them "long" and, well, nothing pointers for
consistency), depending on whether a pointer was constrained to the 64K
range of a segment or not.

The problem is that Microsoft tried to abstract away from actual data
types and mostly got it wrong; the abstraction wasn't one, and code that
went from 16 to 32 bits still broke happily -- and while that wasn't all
Microsoft's fault, they didn't help matters either.

They had an idea that might have been worthwhile, didn't stop to think
whether it was feasible, and went on to implement it in a half-assed
way, yielding the current mess. You see, char* is typedef'ed to PCCHAR
(yes, "pointer to C char", not "constant char" -- const char* has no
typedef), to PSZ ("pointer to string that's zero-terminated", of
course), then char is typedef'ed to CHAR (huh?) and CHAR* is in turn
typedef'ed to PCHAR, LPCH, PCH, NPSTR, LPSTR and PSTR!

The semantic differences these are intended to convey are lost on the
vast majority of Windows programmers out there, and small wonder, too.
Of course the C compiler doesn't give a rat's ass about these fancy
typedefs, which means any "errors" in using them go undetected (a
sketch of this follows below), except by people who are fluent in this
make-believe type system.
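
A minimal sketch: because typedef creates an alias rather than a new
type, the compiler accepts every mix-and-match without a peep.

typedef char CHAR;
typedef CHAR *PCHAR, *LPSTR, *NPSTR;

void mix(void)
{
    CHAR c = 'x';
    PCHAR p = &c;
    NPSTR n = p;  /* "near" string from a generic pointer: no diagnostic */
    LPSTR l = n;  /* "long" string from a "near" one: still no diagnostic */
    (void)l;      /* all these names denote the same type, char * */
}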

S.
 
Anonymous 7843

> This is more dangerous, yes: never typedef a pointer this way, at
> least not if the pointer will ever be dereferenced using that alias.

What exactly is the danger you are alluding to here?
 
