Syntax for union parameter


glen herrmannsfeldt

(snip)
Java has never been an interpreted language. The original
implementations of the JVM all strictly interpreted the byte codes, as
opposed to more modern JVMs that just-in-time compile many of the byte
codes and execute them in that form (although mostly, if not all, JVMs
fall back on byte code interpretation in some cases). But the
language has always been compiled (to byte codes).

Yes. Well, some might have a different definition, but there are many
languages that can only be interpreted. TeX, for example, lets you
change the category codes of characters, which changes, among other
things, which characters count as letters and so are allowed in
multi-character names.

Languages that allow dynamic typing of variables, such as Matlab and
Octave, are mostly meant to be interpreted. (Though many now use
just-in-time compilation to speed things up.)

It is pretty much always possible to interpret a compiled language,
but not always the other way around.

-- glen
 

Keith Thompson

Robert Wessel said:
And that deserves some emphasis: if you do use int32_t, presumably
because you want an (at least) 32-bit, signed, two's-complement
integer, then that code will fail to compile when someone tries to
build it on a ones'-complement Clearpath/2220, which supports no such
thing. Which, presumably, is exactly what you'd want.

Or you *might* want a type for which the compiler emulates
2's-complement in software -- but you're not going to get that unless
you pay for it somehow. C doesn't require compiler implementers
to do that work.

It might be useful to have a standard typedef that's guaranteed to refer
to a 32-bit signed integer type without specifying which of the three
permitted representations it uses, but I think non-two's-complement
systems are rare enough that it wasn't thought to be worth adding it to
the standard.
 

Rick C. Hodgin

Sorry, I meant *not* defined. (It doesn't compile with _TEST_ME defined
either, but that is because of other problems not related to the types.)

The comment reads:

"Note: For testing this in Visual Studio, see the #define _TEST_ME
line. It is not normally un-commented so the sha1.cpp can be
directly included in other projects.
However, for testing it must be uncommented to expose the main()
function and related at the end of this source code file."

And:

"[_TEST_ME] should only be defined when testing from within the
sha1.sln project."

It indicates three things:

(1) That for stand-alone testing, _TEST_ME needs to be defined.
(2) That it is included in other libsf projects as an include file.
(3) That it is designed for Visual Studio (sha1.sln).
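
In other words, the file relies on conditional compilation to hide its
test driver. A minimal sketch of that general pattern (the names below
are invented for illustration and are not taken from sha1.cpp):

#include <stdio.h>

/* library code, always compiled */
static unsigned long checksum(const unsigned char *p, unsigned long n)
{
    unsigned long sum = 0;
    while (n--)
        sum += *p++;
    return sum;
}

#ifdef _TEST_ME
/* test driver, compiled only when _TEST_ME is defined, so the file can
   also be #included into another project without a clashing main() */
int main(void)
{
    const unsigned char data[] = "hello";
    printf("%lu\n", checksum(data, sizeof data - 1));
    return 0;
}
#endif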

Since it uses non-standard naming for the fixed-size types, porting may
be an issue.
So, what are the languages in this wide range which do as you expect?
None of the ones you listed as knowing fit the bill.

I don't know. It's not a great concern for me to know. There are
undoubtedly several. Choose several of those and there's your list. :)

Best regards,
Rick C. Hodgin
 

Keith Thompson

David Brown said:
On 09/02/14 23:10, Rick C. Hodgin wrote: [...]
It is a wide range. Assembly is the machine level. Java is a virtual
machine, originally a fully interpreted language. The scope from the
hardware to the entirely software has fixed components.

It is a /tiny/ range of programming languages. The fact that you even
imagine that Assembly, C, xbase, and a smattering of Java is a "wide
range" shows how ignorant you are.
My scope in computer languages is limited. I did nearly all of my [40 lines deleted]
debugger, and plugin framework).

Best regards,
Rick C. Hodgin

David, you don't need to quote the entire parent article. Just delete
any quoted text that you're not replying to. You can mark deletions
with something like "[...]" or "[snip]" if you like. Apart from saving
bandwidth, snipping avoids making readers scroll down to the bottom of
your article to see if you've added anything else.
 

Keith Thompson

BartC said:
Why not? That range is pretty much universal now, and everyone seems to use
their own private naming schemes anyway which can itself lead to problems.
If it was all built-in then they wouldn't arise.

Fair enough -- except of course that adding new keywords like u8 et al
would break existing code.
(I've just been sorting a problem with my '#define byte unsigned char' which
somehow clashes with a 'byte' type in windows.h. If I change my definition
to use typedef, then it goes away, but only because C seems to allow the
same typedef twice. The point however is if all these little types were
built-in to the compiler, the problem wouldn't have arisen.)

Just use a typedef; that's what it's for. (Windows does have a
regrettable tendency to pollute the user's namespace with its own type
names if you include <windows.h>.)

Probably not (but see below).


I'm working on a language front-end for C, where the types have to be
compatible. In this front-end, an exact 32-bit signed integer type can be
represented as any of i32, int32, int:32 or int*4 (and written out in C as
i32, which is an alias for int32_t).

Any of these would have been fine choices to adopt in C itself (in fact
something like int:32 is already used for bitfields, and this form can allow
9-bit types to be defined too, although I prohibit it).

C as it's currently defined carefully allows for conforming
implementations on systems that don't have the most common integer sizes
(8, 16, 32, 64). You can either continue that (which means that u8 et
al have to be optional), or you can change it (which means that no
future system that doesn't conform to your new requirements can have a
conforming C implementation).
It's not hard even for a language that has been around a long time, nor is
it difficult to add some sort of language version control (there are already
options for C99 and so on), and I can't believe it is out of the question
for some refactoring program to pick out and change identifiers in source
code which are now new keywords.

Changing source code to conform to a new standard shouldn't be terribly
difficult -- but it only solves a fairly small part of the problem. How
sure can you be that that conversion tool is 100% reliable?

If you force a change to millions of lines of source code, it all has to
be retested, rebuilt, and rereleased, and that process will not go as
smoothly as you expect it to. If a new C standard broke existing code
like that, most large projects would probably just continue to use
compilers that only support the older standards. It would probably
permanently fragment the language.
Would u8 and unsigned char be the same type?

I have had u8 and c8 as distinct types, but there was very little advantage
(and that wouldn't apply in C). I would now make them the same (if char is
going to stay an 8-bit type).

char, signed char, and unsigned char are all distinct types. Keeping
that rule while making u8 and c8 aliases for unsigned char (or for char)
and for signed char (or for char) would IMHO be too confusing. And
there's already a mechanism for creating new type names for existing
 

glen herrmannsfeldt

BartC said:

You mean Oak?
I think these days, 'interpreted' means, at the very least,
working on compiled byte-code, not pure source code.
(I doubt Java was dynamic enough anyway to interpret
directly from source.)

Well, interpreters have often used various tricks to speed things up.

Languages meant to be user commands, such as unix shells and
Windows CMD, are likely 100% interpreted.

I remember the HP TSB2000 BASIC systems would convert numeric constants
to internal form, such that when you list the program it comes out
different than you typed it in.

But you should always be able to interpret a compiled language,
though that might mean an initial pass to find variable names and
statement labels.

-- glen
 

glen herrmannsfeldt

(snip, someone wrote)
I'd reject that definition, while the boundary between an interpreter
and a compiler is certainly fuzzy, Java comes nowhere near the fuzz.
A byte code is a virtual machine code. The case of Java is no
different than the p-codes of the old UCSD p-system. The target of
the compiler happened to be JBC or p-code rather than x86 machine
code.
And if the target "machine code" needs to be "real" in some sense,
note that both p-codes and JBCs have been implemented in hardware as
the machine codes of real processors. Unless you'd care to argue that
the machine on which the "machine code" being executed defines the
language as being interpreted or not?

And soon you get to the question of microprogrammed processors
"interpreting" the machine code. In most cases, though, the
underlying hardware is specifically designed to make the machine code
easy to implement in microcode, and calling it interpretation
definitely doesn't fit horizontal microcode, which might execute only
one microinstruction per host instruction.
In which case a C program compiled to x86 machine code when
run on a POWER7 machine with DOSbox suddenly becomes interpreted.
Or since JBCs and UCSD p-codes have had hardware implementations,
perhaps only MSIL/CIL based languages (C#,
etc.) are really interpreted, since there's never been a
hardware implementation of MSIL? And where then would we put
a program running via binary recompilation?
To be sure, a Java interpreter ought to be possible, but to the best
of my knowledge, no such has ever existed, and certainly none of the
original Java implementations shown to the outside world by Sun ever
were.

No idea about Oak. It was originally designed for set-top boxes,
and may have been somewhat different than now.
It would have been silly too - the JBC interpreter would have
been far smaller than a full scale interpreter, and given that Java
was originally targeted at embedded devices, going the interpreter
route would have been counterproductive.

-- glen
 

Ben Bacarisse

Rick C. Hodgin said:
The comment reads:

"Note: For testing this in Visual Studio, see the #define _TEST_ME
line. It is not normally un-commented so the sha1.cpp can be
directly included in other projects.
However, for testing it must be uncommented to expose the main()
function and related at the end of this source code file."

Eh? It's not commented at all, it's "commented" by #ifdef #endif.
And:

"[_TEST_ME] should only be defined when testing from within the
sha1.sln project."

Yes, I do know this stuff and I can read. I am letting you know that it
does not compile as a stand-alone program due to errors unrelated to the
types.
It indicates three things:

(1) That for stand-alone testing, _TEST_ME needs to be defined.
(2) That it is included in other libsf projects as an include
file.

You took a perfectly normal translation unit and turned it into a file
whose purpose is to be included with #include? That seems... odd.

Note that no one would normally deduce "as an include file" from "can be
directly included in other projects".
(3) That it is designed for Visual Studio (sha1.sln).

Does that make it not C anymore? I.e. is there some stuff related to VS
that makes it compile there when it won't just by giving the file to a
compiler?

I don't know. It's not a great concern for me to know. There are
undoubtedly several. Choose several of those and there's your
list. :)

| I've never had another language where fundamental data types are of
| variable size. From assembly through Java, they are a known size.
| Only in the land of C, the home of the faster integer for those
| crucial "for (i=0; i<10; i++)" loops, do we find them varying in
| size.

Nothing I can do can make "only in the land of C" be correct, nor make
"from assembly through Java, they are a known size" look any less
ill-informed.
 

David Brown

FWIW, I run Windows 2000 Professional, or Windows Server 2003 for all of
my personal development, using Visual Studio 2003, or Visual Studio 2008.
I use Windows 7/64 and Visual Studio 2008 at my job.

Out of curiosity, what is your day job? We know you are not qualified
to program in C, and you don't know or use C++, so what do you do with
VS2008 at work? I gather VS supports other languages (like C#, F#,
etc.), so maybe you work with one of them? In which case, why don't you
use that for RDC?
 

David Brown

It is a wide range. Assembly is the machine level. Java is a virtual
machine, originally a fully interpreted language. The scope from the
hardware to the entirely software has fixed components.

My scope in computer languages is limited. I did nearly all of my
development in assembly, xbase, and C/C++, while also writing my own
compilers, interpreters, operating system, in assembly and C. I have
always had a particular vision in mind for what a language should be.
I came to that vision by examining the x86 CPU and asking myself the
question, "Knowing what I know about its design, what would a computer
language designed to run on that hardware need to look like?" And I
went from there. It's the same question I asked myself about how to
build my operating system. I didn't look at existing standards, or
current designs. I looked at the hardware, and then went up from there.

Over the years, as I later came to understand Itanium, and more
recently ARM, I have been exceedingly glad to see that the design I had
for Exodus, and for my compiler, is exactly in line with what is
required at the machine level. It provides high-level abilities through
the C-like language, yet also low-level abilities, so it is not so far
from the machine as to prevent the use of many features that are hidden
away from C source code today.

Just to be clear on a point here (especially since most of my posts have
been somewhat negative), I think it is a very good thing for developers
to have assembly language experience - precisely because it helps them
understand what is going on "under the hood". This is particularly true
for lower level languages such as C. I don't mean that it is a good
idea to do /much/ programming in assembly, especially with overly
complex architectures like x86, but understanding how the generated code
works lets you get a better feel for some types of C programming. For
embedded systems with small cpus, it is particularly important, so that
you have a reasonable idea of the size and speed you can expect from a
given section of C code.

However, don't get carried away - there are lots of things that are done
in assembly programming that don't match well in higher level code (less
so if your assembly programming is structured, modular, and
maintainable). And there are some things that can be done in assembly
that are almost impossible in most high level languages (such as
co-routines, multiple entry points, etc.). And of course, on a modern
cpu, well-written C compiled with a good compiler will outclass
hand-written assembly on most tasks - so there is seldom reason to write
assembly for real code.

And one should certainly /never/ try to learn assembly for the Itanium,
unless someone is paying you lots of money to write a C compiler for it.
It's a dead-end architecture, so all the effort will be wasted, and
you'll quickly drive yourself insane as you try to track all these
registers and manual instruction scheduling while being unable to resist
the temptation to squeeze another instruction into the same cycle count.
 

David Brown

I think these days, 'interpreted' means, at the very least, working on
compiled byte-code, not pure source code. (I doubt Java was dynamic
enough anyway to interpret directly from source.)

"Interpreted" means reading the source code and handling it directly.
"Compiled" means a tool reads the source code and generates an
executable that is later run.

I don't think the meaning of "interpreted" has changed - it is simply
that languages are no longer easily divided into "interpreted" and
"compiled". In particular, many languages are "bytecode compiled" and
run on virtual machines, and sometimes these are "Just In Time" compiled
to machine code. And the boundaries between "compiled", "bytecoded" and
"interpreted" have become more blurred.

This is not actually a new thing - it is just that bytecoding has become
a lot more relevant for modern languages. There have been bytecoded
languages for decades (such as the "P-Code" Pascal system I used briefly
some thirty years ago).
 

David Brown

On 10/02/14 00:48, Keith Thompson wrote:
David, you don't need to quote the entire parent article. Just delete
any quoted text that you're not replying to. You can mark deletions
with something like "[...]" or "[snip]" if you like. Apart from saving
bandwidth, snipping avoids making readers scroll down to the bottom of
your article to see if you've added anything else.

Yes, I know - sorry. My excuse is the post was late at night and I was
lazy.
 

David Brown

You use "int32_t" when you want /exactly/ 32-bit bits - not "at least
32-bits". For that, you can use int_least32_t". When you are
communicating outside the current program (files on a disk, network
packets, access to hardware, etc.), then exact sizes are often important.

I agree that such systems are too rare to make it worth having to
specify integer representations in standard typedefs (and who wants to
write "int_twoscomp32_t"? Certainly not those who already complain
about "int32_t"!). It might be nice to have some standardised
pre-defined macros, however, so that one could write:

#ifndef __TWOS_COMPLEMENT
#error This code assumes two's complement signed integers
#endif
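
Pending such a macro (__TWOS_COMPLEMENT above is hypothetical), a check
is already expressible; a minimal sketch, relying on the fact that only
on a two's-complement machine does -1 have all value bits set:

/* C11: refuses to compile on ones'-complement or sign-magnitude
   systems, where (-1 & 3) evaluates to 2 or 1 rather than 3 */
_Static_assert((-1 & 3) == 3,
               "this code assumes two's complement signed integers");

/* preprocessor variant; #if arithmetic acts as if done in (u)intmax_t,
   so this tests the representation of intmax_t */
#if (-1 & 3) != 3
#error This code assumes two's complement signed integers
#endif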


One feature I would love to see in the standards - which would require
more work from the compiler and not just a typedef - is to have defined
integer types that specify explicitly big-endian or little-endian
layout. Non-two's-complement systems are rare enough to be relegated to
history, but there are lots of big-endian and little-endian systems, and
lots of data formats with each type of layout. I have used a compiler
with this as an extension feature, and it was very useful.
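
In the absence of endian-typed integers in standard C, the usual
portable fallback is explicit byte-order accessors; a small sketch (the
function names are invented):

#include <stdint.h>

/* read a 32-bit big-endian value, and write a 32-bit little-endian
   value, independently of the host machine's byte order */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}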
It might, and would involve nothing more than some additional typedefs
and #defines in stdint.h, but given the tiny number of C users not
running on twos complement machines, I'm content to just offer them
sympathies.

One might argue that the real solution is to specify what you want the
variable to contain, say "declare a(-1000..1000);" and then let the
compiler pick an appropriate implementation (an integer capable of
holding that range of values, in this case). That doesn't prevent you
from supplying something like an stdint.h which defines "typedef
int16_t declare(-16384..16383);" to handle common/standard cases.

Ranged integer types would be nice - some other languages (like Pascal
and Ada) have them. They let you be explicit about the ranges in your
code, and give the compiler improved opportunity for compile-time checking.

Of course, it gets complicated trying to specify behaviour of how these
types should interact with other integral types.
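
In C one can only approximate a ranged type with run-time checks; a
rough sketch of the idea, with invented names and none of the
compile-time enforcement Pascal or Ada give:

#include <assert.h>

typedef int temperature_t;            /* intended range: -55..125 */

static inline temperature_t temperature_check(int v)
{
    assert(v >= -55 && v <= 125);     /* checked only at run time */
    return v;
}

/* usage: temperature_t t = temperature_check(raw_reading); */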
OTOH, you'd probably also want some way of declaring the contents of
bits of memory, so that you can easily deal with binary formats from
outside.

But that takes us rather far afield of C.

Yes. There is probably a C++ template library that covers all this :)
 

glen herrmannsfeldt

David Brown said:
On 10/02/14 00:38, BartC wrote:
(snip)
"Interpreted" means reading the source code and handling it directly.
"Compiled" means a tool reads the source code and generates an
executable that is later run.

Except that it often doesn't mean exactly that.

It probably does in the case of command languages, as often the
program execution logic is just spliced into the command processor,
and most often speed isn't all that important.

But note that the original statement was "interpreted language"
not just "interpreter".

For one, it is often desirable that the interpreter do a syntax
check first, as it is surprising to users to have a syntax error
detected much later. If the language allows for GOTO and labels,
the first pass may also recognize the position of labels for faster
reference. Maybe also put the symbols into a symbol table, and
allocate space for variables. All features that an "interpreter"
shouldn't have, but aren't much extra work.

As I mentioned previously, some BASIC interpreters convert constants
to internal form on input, and convert back when generating a listing.
(Funny, because a different value might come back.) It is also
simple to convert keywords to a single character (byte) token,
especially if keywords are reserved. The result is sometimes
called an incremental compiler, but the result isn't so different
from the usual interpreter.

Just to add more confusion, consider the in-core compiler. Just
like usual compilers, it generates actual machine instructions,
but doesn't write them to a file. That avoids much overhead of I/O,
and in addition simplifies the fixup problem on forward branches.
You have to have enough memory for both the program and compiler
at the same time, but that is often the case for smaller programs.
The OS/360 WATFOR and WATFIV are favorite examples for this case.
I don't think the meaning of "interpreted" has changed - it is simply
that languages are no longer easily divided into "interpreted" and
"compiled". In particular, many languages are "bytecode compiled" and
run on virtual machines, and sometimes these are "Just In Time" compiled
to machine code. And the boundaries between "compiled", "bytecoded" and
"interpreted" have become more blurred.

For another case, consider exactly the same code as might be a
bytecode (except that the size might be other than bytes) and
instead for each one place a subroutine call instruction to the
routine that processes that code. (Or maybe indirectly to that
processing routine.) On many machines, the result looks exactly
the same, except for a machine-specific operation code instead of
zeros before each operation. Yet now the result is executable code,
as usually generated by compilers.
This is not actually a new thing - it is just that bytecoding has become
a lot more relevant for modern languages. There have been bytecoded
languages for decades (such as the "P-Code" Pascal system I used briefly
some thirty years ago).

I am sure it goes back much farther than that.

-- glen
 

glen herrmannsfeldt

If the compiler is installed properly no-one needs to remember to point
anything at anything. Are you suggesting that GCC would accidentally
pick up Microsoft's header files and not throw any errors? Or do you
think header files are necessary for execution?

If you run more than one compiler on the same machine, there is
always the possibility of overlap in environment variable usage.

It is way too common to use LIB and INCLUDE as environment
variables for the corresponding directories. Some compilers
have other variables that they will check for ahead of those,
to allow them to coexist.

-- glen
 

BartC

David Brown said:
On 09/02/14 23:10, Rick C. Hodgin wrote:
And of course, on a modern
cpu, well-written C compiled with a good compiler will outclass
hand-written assembly on most tasks - so there is seldom reason to write
assembly for real code.

That's what I keep hearing, however...

The following are timings, in seconds, for an interpreter written in C,
running a set of simple benchmarks.

GCC       A     B     C     D
-O3      79   130   152   176
-O0      87   284   304   297

A, B, C, D represent different bytecode dispatch methods; 'C' and 'D' use
standard C, while 'B' uses a GCC extension.

'A' however uses a dispatch loop in x86 assembler (handling the simpler
bytecodes).

The difference is not huge: barely twice as fast as the 'C' method, and when
executing real programs the difference narrows considerably. But it still
seems worth having as an option. (And with other compilers which are not as
aggressive at optimising as GCC, it might be more worthwhile.)

In general however you are probably right; interpreters are a specialised
application, as the assembler code is only written once, not for each
program it will run, and there are issues with maintenance, portability and
reliability. (And even with interpreters, there are cleverer ways of getting
them up to speed than this brute-force method.)
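
For anyone unfamiliar with the dispatch methods being compared, here is
a minimal sketch of two of them; the opcode set is invented, method 'B'
is assumed to be GCC's "labels as values" (computed goto) extension,
and the assembler dispatch ('A') is not shown:

#include <stdint.h>
#include <stdio.h>

enum { OP_INC, OP_DEC, OP_HALT };

/* standard C: a loop around a switch, one branch per opcode */
static long run_switch(const uint8_t *code)
{
    long acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        default:      return acc;     /* OP_HALT or unknown opcode */
        }
    }
}

/* GCC/Clang extension: one indirect jump per opcode, no outer loop */
static long run_goto(const uint8_t *code)
{
    static void *dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
    long acc = 0;
    goto *dispatch[*code++];
do_inc:  acc++; goto *dispatch[*code++];
do_dec:  acc--; goto *dispatch[*code++];
do_halt: return acc;
}

int main(void)
{
    const uint8_t prog[] = { OP_INC, OP_INC, OP_INC, OP_DEC, OP_HALT };
    printf("%ld %ld\n", run_switch(prog), run_goto(prog));
    return 0;
}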
 

glen herrmannsfeldt

(snip)
Just to be clear on a point here (especially since most of my posts have
been somewhat negative), I think it is a very good thing for developers
to have assembly language experience - precisely because it helps them
understand what is going on "under the hood". This is particularly true
for lower level languages such as C. I don't mean that it is a good
idea to do /much/ programming in assembly, especially with overly
complex architectures like x86, but understanding how the generated code
works lets you get a better feel for some types of C programming. For
embedded systems with small cpus, it is particularly important, so that
you have a reasonable idea of the size and speed you can expect from a
given section of C code.

I agree, but ...
However, don't get carried away - there are lots of things that are
done in assembly programming that don't match well in higher level
code (less so if your assembly programming is structured, modular, and
maintainable).

In addition, it sometimes results in a tendency to write for specific
machine code when there is no need to do that. That is, the old saying
"Premature optimization is the root of all evil".
And there are some things that can be done in assembly
that are almost impossible in most high level languages (such as
co-routines, multiple entry points, etc.). And of course, on a modern
cpu, well-written C compiled with a good compiler will outclass
hand-written assembly on most tasks - so there is seldom reason to write
assembly for real code.

Yes. And also not to try to write C code to "help" the compiler
along when it isn't needed. (But not forget how when it is.)
And one should certainly /never/ try to learn assembly for the Itanium,
unless someone is paying you lots of money to write a C compiler for it.
It's a dead-end architecture, so all the effort will be wasted, and
you'll quickly drive yourself insane as you try to track all these
registers and manual instruction scheduling while being unable to resist
the temptation to squeeze another instruction into the same cycle count.

-- glen
 

glen herrmannsfeldt

(snip)
You use "int32_t" when you want /exactly/ 32-bit bits - not "at least
32-bits". For that, you can use int_least32_t". When you are
communicating outside the current program (files on a disk, network
packets, access to hardware, etc.), then exact sizes are often important.
(snip)

Ranged integer types would be nice - some other languages (like Pascal
and Ada) have them. They let you be explicit about the ranges in your
code, and give the compiler improved opportunity for compile-time checking.

Some languages allow you to specify approximately the needed range.
PL/I allows one to specify the number of decimal digits or binary
bits needed, such that the compiler can supply at least that many.

Fortran now has SELECTED_INT_KIND() and SELECTED_REAL_KIND that allow
one to specify the needed number of decimal digits.

Still, I remember compiling Metafont on a Pascal compiler that didn't
generate a single byte for a 0..255 integer. The result was that the
output files had a null byte before each actual byte, so I wrote one of
the simplest C programs doing both input and output:

while(getchar() != EOF) putchar(getchar());
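
(Each iteration reads and discards the spurious null byte in the loop
test, then reads and writes the real byte that follows it.)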
Of course, it gets complicated trying to specify behaviour of
how these types should interact with other integral types.

Not to mention floating point types.

-- glen
 

BartC

By that definition, then, Python is not interpreted, since it is
translated into bytecode. (Usually on the fly, but it can also be
pre-compiled.)
I'd reject that definition, while the boundary between an interpreter
and a compiler is certainly fuzzy, Java comes nowhere near the fuzz.

A byte code is a virtual machine code. The case of Java is no
different than the p-codes of the old UCSD p-system. The target of
the compiler happened to be JBC or p-code rather than x86 machine
code.

There seem to be dozens of ways of executing Java bytecode. But if that
involves repeatedly re-interpreting the same codes, then most people
would say that it is being interpreted (and you will notice the
difference, because it might be an order of magnitude slower).

But this makes Java 'fuzzier', unless you pin down exactly what happens
to the bytecode.
And if the target "machine code" needs to be "real" in some sense,
note that both p-codes and JBCs have been implemented in hardware as
the machine codes of real processors.

So? In theory you can create a machine to run intermediate code. And Java
bytecode seems particularly simple, since it is statically typed (I haven't
looked into it in detail, but I don't understand why a load-time conversion
to native code isn't just done anyway, the result of which can be cached.)
 

James Kuyper

On 09/02/14 23:34, Rick C. Hodgin wrote: ....
Out of curiosity, what is your day job? We know you are not qualified
to program in C, and you don't know or use C++, so what do you do with
VS2008 at work?

Remember the complaints he made about certain aspects of C that turned
out to be due to his using a C++ compiler to compile programs written in
files with names ending in *.cpp? His understanding of the difference
between C and C++ is even poorer than his understanding of C - but he
does use C++, if only by accident.
 
