Syntax for union parameter


David Brown

Remember the complaints he made about certain aspects of C that turned
out to be due to his using a C++ compiler to compile programs written in
files with names ending in *.cpp? His understanding of the difference
between C and C++ is even poorer than his understanding of C - but he
does use C++, if only by accident.

Yes, I meant that he doesn't /intentionally/ use C++. At best, he uses
a C++ compiler as "a better C" (for some definition of "better" that
probably only applies when using MSVS).
 

David Brown

I am sure it goes back much farther than that.

Yes, but I don't! I was about ten when I played with p-code Pascal.

As another example of the fuzzy boundaries, the format string for
"printf" is arguably an interpreted mini-language.
 

David Brown

That's what I keep hearing, however...

The following are timings, in seconds, for an interpreter written in C,
running a set of simple benchmarks.

GCC     A     B     C     D

-O3    79   130   152   176
-O0    87   284   304   297

A, B, C, D represent different bytecode dispatch methods; 'C' and 'D'
use standard C, while 'B' uses a GCC extension.

'A' however uses a dispatch loop in x86 assembler (handling the simpler
bytecodes).
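The post doesn't show the interpreter's code, but as a rough sketch of
what methods along the lines of 'C' and 'B' typically look like (the
opcodes here are hypothetical):

    /* Plain C dispatch ('C'-style): a switch inside a loop. */
    enum { OP_PUSH, OP_ADD, OP_HALT };

    long run_switch(const unsigned char *code)
    {
        long stack[64], *sp = stack;
        for (;;) {
            switch (*code++) {
            case OP_PUSH: *sp++ = *code++;       break;
            case OP_ADD:  sp--; sp[-1] += sp[0]; break;
            case OP_HALT: return sp[-1];
            }
        }
    }

    /* GCC extension ('B'-style): computed goto ("labels as values")
       gives each opcode its own indirect branch and skips the
       switch's range check. */
    long run_goto(const unsigned char *code)
    {
        static void *table[] = { &&op_push, &&op_add, &&op_halt };
        long stack[64], *sp = stack;
        goto *table[*code++];
    op_push: *sp++ = *code++;       goto *table[*code++];
    op_add:  sp--; sp[-1] += sp[0]; goto *table[*code++];
    op_halt: return sp[-1];
    }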

The difference is not huge: barely twice as fast as the 'C' method, and
when executing real programs the difference narrows considerably. But it
still seems worth having as an option. (And with other compilers which
are not as aggressive at optimising as GCC, it might be more worthwhile.)

In general however you are probably right; interpreters are a
specialised application, as the assembler code is only written once, not
for each program it will run, and there are issues with maintenance,
portability and reliability. (And even with interpreters, there are
cleverer ways of getting them up to speed than this brute-force method.)

That's what it boils down to - in general, well-written C and a good
compiler will outperform hand-written assembly, especially on modern
processors. But there are exceptions - areas where assembly can give
higher speeds, and where these higher speeds are worth the extra effort.
A bytecode dispatcher is definitely one such case. (Actually,
hand-written assembly is probably a poor choice here - something that
generates specialised assembly is likely to give better results for less
effort.)

You have to be very careful, however - even if the same assembly works
as expected on different x86 cpus, the tuning for maximum speed can be
very different. It would not be a surprise to see that method "B" above
was faster than "A" on some cpus. A compiler can tune to different cpus
using different flags, and can even make multiple different variations
that are chosen at run-time for optimal speed on the cpu in use at the
time. This can all be done in assembly too, of course, but the
development, testing, and maintenance costs go up considerably.
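For reference, gcc can generate those run-time-selected variants from a
single source function (a sketch; the attribute needs gcc 6 or later on
x86 with glibc ifunc support):

    /* gcc builds one clone per listed target and emits a resolver
       that picks the best one for the running cpu at load time. */
    __attribute__((target_clones("avx2", "sse4.2", "default")))
    int sum(const int *a, int n)
    {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }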
 

David Brown

(snip)

I agree, but ...


In addition, it sometimes results in a tendency to write for specific
machine code when there is no need to do that. That is, the old saying
"Premature optimization is the root of all evil".

Agreed. Of course, people also "optimise" their C code (changing array
operations into pointers "for speed", and so on) - they too are evil.
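The sort of rewrite meant here, for anyone who hasn't watched it happen -
with any modern optimiser the two forms compile to essentially the same
code:

    /* Clear, idiomatic indexing. */
    void scale_idx(int *a, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] *= 2;
    }

    /* The hand-"optimised" pointer version - no faster, just harder
       to read. */
    void scale_ptr(int *a, int n)
    {
        for (int *p = a, *end = a + n; p != end; p++)
            *p *= 2;
    }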

I find it useful to examine the generated assembly from time to time.
Much of my programming is for small processors - sometimes it only takes
a few changes to make significant differences. But I find that I need
to do far less "target-optimised" C programming than I did a decade ago
- partly due to better small microcontrollers, and partly due to better
compiler tools. I still have some systems where it can make a
significant difference using something like a "while" loop rather than a
"for" loop, but thankfully these moments are rare.
 

Rick C. Hodgin

The convention in C and C++ is to generally *not* #include .cpp files.

Yes. It has highlighted a bug in the Whole Tomato Visual Assist X tool:

wholetomatoforum.com/forum/topic.asp?TOPIC_ID=11654

If you search for "#include" in that forum, you'll find the reference.
When I renamed the .cpp file I #include to .h, the bug in VAX went away.

Best regards,
Rick C. Hodgin
 

Rick C. Hodgin

Eh? It's not commented at all, it's "commented" by #ifdef #endif.

The #define _TEST_ME line is commented out.
Yes, I do know this stuff and I can read. I am letting you know that it
does not compile as a stand-alone program due to errors unrelated to the
types.

It does in Visual Studio. I'll tweak what's missing for GCC and push it
back up at some point. Thank you for reporting the bug.
You took a perfectly normal translation unit and turned it into a file
whose purpose is to be included with #include? That seems... odd.

I'm a hard man to know.
Note that no one would normally deduce "as an include file" from "can be
directly included in other projects".

Perhaps. If they were to look at the remainder of my projects, or search
for the text sha1.cpp" (including the trailing quote), they would find the
references and see how it's used.
Does that make it not C anymore? I.e. is there some stuff related to VS
that makes it compile there when it won't just by giving the file to a
compiler?

Yes. It's not C. The extension on sha1.cpp is .cpp.
Nothing I can do can make "only in the land of C" be correct, nor make
"from assembly through Java, they are a known size" look any less
ill-informed.

Then I wouldn't worry about trying to do so. It would be a waste of time.

Best regards,
Rick C. Hodgin
 

Rick C. Hodgin

If the compiler is installed properly no-one needs to remember to point
anything at anything. Are you suggesting that GCC would accidentally
pick up Microsoft's header files and not throw any errors?

I don't know, I've never tried it. My concerns were more along the lines
of multiple installed cross-compilers all based on GCC, for example. I
have one for x86, one for ARM, one for some other CPU, all running on my
x86 box. In those cases, the #include files are probably very similar
(because they are all GCC), but there are likely subtle differences, such
as where int32_t maps to.

Picking up the wrong include file path in that case may not be such a
difficult thing to do ... and it would take a bit of tracking down to sort
out the cause.

It's never happened to me though (that I remember). But I can see it being
a possibility.
Or do you think header files are necessary for execution?

If used, header files are necessary for compilation, and compilation is
necessary for execution, so in that way ... yes.

Best regards,
Rick C. Hodgin
 

Rick C. Hodgin


ARM also provides an optional module which executes Java byte codes directly
in hardware.

http://www.arm.com/products/processors/technologies/jazelle.php

That doesn't change the fact that they are interpreted on all other machines,
and were always interpreted prior to those specialized chips coming into
existence.

Java's bytecodes were created to run in the Java virtual machine on any
hardware, with the JVM providing the standardized environment.
It is very similar to what I'm doing with my virtual machine.

Best regards,
Rick C. Hodgin
 

Rick C. Hodgin

If you run more than one compiler on the same machine, there is
always the possibility of overlap in environment variable usage.

Yes. And that's all I was saying.
It is way too common to use LIB and INCLUDE as environment
variables for the corresponding directories. Some compilers
have other variables that they will check for ahead of those,
to allow them to coexist.

I believe GCC also supports the -I command line switch to provide the path
to include files, and -L for lib.
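For example (the paths here are made up):

    gcc -I/opt/cross-arm/include -L/opt/cross-arm/lib -o prog prog.c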

Best regards,
Rick C. Hodgin
 

James Kuyper

On 02/10/2014 07:00 AM, Robert Wessel wrote:
....
Something like "#if (-INT_MAX) > INT_MIN" should do the trick for
detecting twos complement.

Even simpler:

#if INT_MAX + INT_MIN
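Spelled out (a minimal sketch - note the #else branch, as discussed
later in the thread, also catches a two's complement system that makes
the most negative value a trap representation):

    #include <limits.h>

    #if INT_MAX + INT_MIN
    /* sum is -1: INT_MIN == -INT_MAX - 1, i.e. two's complement */
    #else
    /* sum is 0: INT_MIN == -INT_MAX (ones' complement, sign-magnitude,
       or two's complement with the minimum value made a trap) */
    #endif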
 

Kaz Kylheku

C is backwards. It is backwards in this area of unknown sizes, but minimum
allowed sizes, and it is backwards in the area of undefined behavior, where
it should have defined behavior and overrides.

That's my position.

It appears you have a position about a written work (the C standard, in its
various editions).

Have you ever seen at least its cover page?

C has support for integral types of exact sizes. In 1990 it didn't; it was
added in a newer revision of the C standard in 1999.
There is a header <stdint.h> which declares various typedefs for them,
and you can test for their presence with macros.

If your program requires a 32 bit unsigned integer type, its name is
uint32_t. If you're concerned that it might not be available, you can test
for it. (Believe it or not, there exist computers that don't have a 32
bit integral type natively: historic systems like 36 bit IBM mainframes,
and some DEC PDP models.)
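The presence test can hang off the <stdint.h> limit macros, which an
implementation defines exactly when it provides the corresponding type:

    #include <stdint.h>

    #ifdef UINT32_MAX
    typedef uint32_t word32;    /* exact 32-bit unsigned type exists */
    #else
    #error "no 32-bit unsigned integer type on this target"
    #endif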

Programs can be written such that the exact size doesn't matter.
For instance a library which implements bit vectors can use any unsigned
integer type for its "cell" size, and adjust its access methods accordingly.
I use a multi-precision integer library whose basic "radix" can be any one
of various unsigned integer sizes, chosen at compile time. It will build
with 16 bit "digits", or 32 bit "digits", or 64 bit "digits".
In principle, it could work with 36 bit "digits".
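A sketch of that compile-time "digit" selection (the names are
hypothetical, not taken from any particular library):

    #include <limits.h>
    #include <stdint.h>

    #if defined(RADIX_64)
    typedef uint64_t digit_t;     /* 64 bit "digits" */
    #elif defined(RADIX_32)
    typedef uint32_t digit_t;     /* 32 bit "digits" */
    #else
    typedef uint16_t digit_t;     /* default: 16 bit "digits" */
    #endif

    /* The algorithms are written against digit_t and this constant,
       so the exact width never appears in them. */
    #define DIGIT_BITS ((int)(sizeof(digit_t) * CHAR_BIT))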

C is defined in such a way that it can be efficiently implemented in
situations that don't look like a SPARC or x86 box.

Thompson and Ritchie initially worked on the PDP-7, a machine with 18-bit
words. To specify sizes like 16 and 8 would ironically have made C poorly
targetable to its birthplace.

C is not Java; C targets real hardware. If the natural word size on some
machine is 17 bits, then that can readily be "int", and cleverly written
programs can take advantage of that while also working on other machines.

Just because you can't deal with it doesn't mean it's insane or wrong;
maybe you're just inadequate as a programmer.
 

David Brown

I don't know, I've never tried it. My concerns were more along the lines
of multiple installed cross-compilers all based on GCC, for example. I
have one for x86, one for ARM, one for some other CPU, all running on my
x86 box. In those cases, the #include files are probably very similar
(because they are all GCC), but there are likely subtle differences, such
as what int32_t maps to.

Picking up the wrong include file path in that case may not be such a
difficult thing to do ... and it would take a bit of tracking down to sort
out the cause.

It's never happened to me though (that I remember). But I can see it being
a possibility.

I have at least 30 different gcc cross-compilers on my oldest PC
currently in use, plus 15 or so non-gcc cross-compilers (this is
counting different versions of tools for the same target - I only have
about 10 different targets). I have never seen - or even heard of - the
sorts of problems you are worrying about here. It is only even a
/possibility/ if you deliberately and intentionally go out of your way
to cause yourself problems.

What /is/ a possibility, however, is that you get your IDE pointing to
the wrong include paths - you often need to do that somewhat manually
for cross compilers. But that won't affect compilation, and anyway ARM
and x86 have the same sizes for their integers (assuming you really mean
x86, and not amd64).
If used, header files are necessary for compilation, and compilation is
necessary for execution, so in that way ... yes.

"Necessary for execution" means that the files are needed at run-time,
which is not the case for headers (Dr. Nick's question was rhetorical).
 

David Brown

Total brain-fart on my part.

Good explanation - I was a little surprised to see you write "at least".
Something like "#if (-INT_MAX) > INT_MIN" should do the trick for
detecting twos complement.

That would certainly work, as would James' version, but a nice, obvious
pre-defined symbol would be clearer. A lot of implementation-defined
behaviour could be covered this way, with a final "__SANE_ARCHITECTURE"
symbol being defined for processors with two's complement, 8-bit chars,
clear endian ordering, etc.

Incidentally, don't the standards allow two's complement signed integers
ranging from -INT_MAX to +INT_MAX, with -(INT_MAX + 1) being undefined
behaviour? I have a hard time imagining such an architecture in practice.
I've been of the opinion for years that C should have a good way to
specify external structures. Heck, Cobol does a better job.

Agreed.
 

James Kuyper

On 10/02/14 13:00, Robert Wessel wrote:
....

That would certainly work, as would James' version, but a nice, obvious
pre-defined symbol would be clearer. A lot of implementation-defined
behaviour could be covered this way, with a final "__SANE_ARCHITECTURE"
symbol being defined for processors with two's complement, 8-bit chars,
clear endian ordering, etc.

Incidentally, don't the standards allow two's complement signed integers
ranging from -INT_MAX to +INT_MAX, with -(INT_MAX + 1) being undefined
behaviour? I have a hard time imagining such an architecture in practice.

Yes, and my test doesn't handle that issue any better than his does. A
test that deals with that issue, the possibility of padding bits, and
the complete lack of restrictions on bit-ordering, gets somewhat
complicated, and can't be done in the pre-processor. A way of checking
directly would help.
 

Keith Thompson

David Brown said:
One feature I would love to see in the standards - which would require
more work from the compiler and not just a typedef - is to have defined
integer types that specify explicitly big-endian or little-endian
layout. Non-two's-complement systems are rare enough to be relegated to
history, but there are lots of big-endian and little-endian systems, and
lots of data formats with each type of layout. I have used a compiler
with this as an extension feature, and it was very useful.
[...]

Implementing arithmetic on foreign-endian integers strikes me as
a waste of time. If I need to read and write big-endian integers
on a little-endian machine (a common requirement, since network
protocols are typically big-endian), I can just convert big-
to little-endian on input and little- to big-endian on output.

And POSIX provides htonl, htons, ntohl, and ntohs for exactly that
purpose ("h" for host, "n" for network).

There are no such functions that will convert between big-endian
and little-endian on a big-endian system, but such conversions are
rarely needed (and it's easy enough to roll your own if necessary).

https://en.wikipedia.org/wiki/Endianness
 

Rick C. Hodgin

It does in Visual Studio. I'll tweak what's missing for GCC and push it
back up at some point. Thank you for reporting the bug.

Fixed. Pushed. Compiles in GCC for x86 on Windows, and VS 2008.

Please report any additional bugs if you'd like.

Best regards,
Rick C. Hodgin
 

Keith Thompson

David Brown said:
Incidentally, don't the standards allow two's complement signed integers
ranging from -INT_MAX to +INT_MAX, with -(INT_MAX + 1) being undefined
behaviour? I have a hard time imagining such an architecture in practice.

Yes, the most negative value can be a trap representation. It lets an
implementation use that value as a distinguished representation; it
might implicitly initialize all int objects to that value, making
detection of uninitialized variables easier.

Making it a trap representation doesn't mean that references to it must
"trap". You could take an existing implementation, change the
definition of INT_MIN, document that the representation that would
otherwise have been INT_MIN is a trap representation, and still have a
conforming implementation. The fact that operations on that value would
still "work" is within the bounds of undefined behavior.
 

James Kuyper

....
Although the test actually works for what most people are probably
assuming when they specify twos complement. In the case of ones
complement or sign/mag, the trap value doesn't reduce the range (it
just replaces the negative zero).

True, but what some people are worried about is the result of applying
bit-wise operators to negative values, and neither of those tests covers
that issue properly. I'd recommend strongly against applying those
operators to signed values, but not everyone follows that recommendation.
 

glen herrmannsfeldt

(snip)
If your program requires a 32 bit unsigned integer type, its name is
uint32_t. If you're concerned that it might not be available, you can test
for it. (Believe it or not, there exist computers that don't have a 32
bit integral type natively: historic systems like 36 bit IBM mainframes,
and some DEC PDP models.)

The DEC 36 bit machines are twos complement, the IBM machines
sign magnitude. Would be nice to have a C compiler for the 7090
so we could try out sign magnitude arithmetic in 36 bits.

-- glen
 

Ben Bacarisse

James Kuyper said:
Yes, and my test doesn't handle that issue any better than his does. A
test that deals with that issue, the possibility of padding bits, and
the complete lack of restrictions on bit-ordering, gets somewhat
complicated, and can't be done in the pre-processor. A way of checking
directly would help.

What's the problem with testing -1 & 3? You get 3, 2 or 1 depending on
whether ints are represented using 2's complement, 1's complement or
sign-magnitude.
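As a run-time check, a minimal sketch of that test:

    #include <stdio.h>

    int main(void)
    {
        int n = -1;
        switch (n & 3) {    /* looks at the low two bits of -1 */
        case 3: puts("two's complement"); break;
        case 2: puts("ones' complement"); break;
        case 1: puts("sign-magnitude");   break;
        }
        return 0;
    }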
 
