9-bit arithmetic in C

Hallvard B Furuseth · May 6, 2009

Tim said:
Unwise.

Why so? Yes, it's likely it'll be a bad idea due to unportability, but
if a simple solution happens to be available and satisfactory, why not
use it?

C++ operator overloading, like someone mentioned, might also be such a
simple solution - unless it turns out horribly complex of course...

This could be error prone as it would be easy to miss a check, and
depending on how the checks/conversions are implemented it could spread
unportable code throughout the code base (having to deal with endian
order, 1s vs 2s complement, etc.).

Well yes, I certainly think such details should be centralized in a few
places if possible. However, from what I vaguely remember, code related to
such details tends to ooze into a lot of the operations anyway. Carry vs
shift, address arithmetic, etc. At least unless the emulated
instruction set is very orderly. But if what you say below works out,
that sounds nice.

I would think that it would be far better to write two translation
functions where implementation-specific code would be stored.
Data values are translated to normal C types before any operations
take place. The arithmetic is done normally without having to deal
with any special conditions. Then when the operation is finished, the
C types are handed back to an implementation-specific function which
handles overflow, endian issues, sign issues (1s vs 2s complement), and
hands back the necessary 9-bit representations for storage. It would be
important, however, to keep operations as simple, atomic, single steps,
as performing multiple operations might proceed normally without regard
for the fact that an intermediate operation would have caused an overflow
in the 9-bit representation.
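A minimal sketch of the two translation functions described above, assuming (the thread does not say) that the emulated machine uses 9-bit two's complement words held in the low bits of an unsigned int. The names to_internal, from_internal and emu_add are hypothetical. Each emulated operation is one atomic step: translate in, compute with ordinary C arithmetic, translate out.

```c
/* Hypothetical names; the thread does not define these functions.
   Assumption: the emulated machine stores 9-bit two's complement words
   in the low 9 bits of an unsigned int (the "virtual" value). */
#define WORD_MASK 0x1FFu   /* bits 0-8 */
#define SIGN_BIT  0x100u   /* bit 8 */

/* Virtual 9-bit pattern -> ordinary C value in -256..255. */
long to_internal(unsigned raw)
{
    raw &= WORD_MASK;
    return (raw & SIGN_BIT) ? (long)raw - 0x200 : (long)raw;
}

/* Ordinary C value -> virtual pattern. Out-of-range results wrap
   modulo 2^9, as a 9-bit two's complement adder would. */
unsigned from_internal(long v)
{
    /* v % 0x200 may be negative; adding 0x200 and reducing again
       always lands in 0..0x1FF */
    return (unsigned)(((v % 0x200) + 0x200) % 0x200);
}

/* One atomic emulated operation: translate in, compute, translate out. */
unsigned emu_add(unsigned a, unsigned b)
{
    return from_internal(to_internal(a) + to_internal(b));
}
```

For example, emu_add(0x0FF, 0x001) returns 0x100, the 9-bit pattern for -256, just as a 9-bit adder wraps 255 + 1.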

About 1s/2s complement etc: Such implementation-specific details can be
avoided by using unsigned types internally, since these have defined
behavior and no arithmetic overflow. Then translate to/from signed
representations in the emulated processor. Haven't tried though, so I
don't know if it costs more than it saves.
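The unsigned idea can be sketched like this, assuming 9-bit words kept in the low bits of an unsigned int (add9, signed9 and overflow9 are hypothetical names). Unsigned arithmetic is defined to wrap, so carry and signed overflow fall out of masks and shifts without depending on how the host represents negative numbers; the signed value is only produced when something needs it.

```c
/* Assumption: a 9-bit word lives in the low bits of an unsigned int. */
#define MASK9 0x1FFu

/* 9-bit add; the carry out of bit 8 is reported separately. */
unsigned add9(unsigned a, unsigned b, int *carry_out)
{
    unsigned sum = (a & MASK9) + (b & MASK9);   /* at most 0x3FE, no wrap */
    *carry_out = (int)((sum >> 9) & 1u);        /* bit 9 = carry out */
    return sum & MASK9;
}

/* Translate a 9-bit two's complement pattern to a signed value, using
   only arithmetic on values (no host representation involved). */
int signed9(unsigned w)
{
    w &= MASK9;
    return (w & 0x100u) ? (int)w - 0x200 : (int)w;
}

/* Signed overflow of r = a + b: both operands share a sign bit that the
   result does not. Inputs are assumed to be 9-bit patterns. */
int overflow9(unsigned a, unsigned b, unsigned r)
{
    return (int)(((~(a ^ b) & (a ^ r)) >> 8) & 1u);
}
```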

Flash Gordon · May 6, 2009

I can't see a compelling reason for them *not* to be the same. Of
course a bad choice of "virtual format" might give you no end of
trouble, but if you get that right, I can't see why the code could not
be portable.

Ben's post (most of which I've snipped) correctly states and expands on
my position.

One clarification, where Ben says "...can't see a compelling reason..."
I would say this also means that making the code portable is not
particularly difficult. If it was difficult (but still possible) that
would, in my opinion, be a compelling reason not to do it!

Tim Harig · May 6, 2009

There may be a terminology problem here. I think what Flash Gordon is
saying is that there does not appear to be any reason why the code
could not be written in such a way as to be entirely portable.
I.e. there would be no need to change the source when moving from a
big- to little-endian machine. Let me illustrate where the problem
might lie.

Again, that is absolutely true if all you want is the arithmetic values.
That would work sufficiently if you just want a hex printout of the
arithmetic result values for analysis. This might also work for simple
functional testing of a DSP without actually doing a full hardware simulation
to get an idea of what a waveform should look like if the DSP software
is properly implemented.

It may not, however, be close enough for the emulation required to
develop and debug software designed to run on the DSP/MCU/MPU. That would
require not only the arithmetic value and printed representation of that
value, but also a true to form binary equivalent of how the value
is actually stored inside of the emulated hardware.

I work with PIC and AVR 8-bit (actually 12-16 bits internally) MCUs. I have
to test and debug software that I write for them using an emulator. In
doing so, I need to know the exact binary representation of values within
the MCU's memory. If I copy the result of an arithmetic operation
(say -64) to a pin register, I need to know which pins are on and off:
7,6 for 2's complement, 7,6 for sign and magnitude, or 7,5,4,3,2,1,0 for 1's
complement. The end results could be very different. The OP may be
making emulation software for such purposes.
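The -64 example can be made concrete with a sketch that computes the 8-bit pin pattern for each of the three sign representations using only value arithmetic, so the host's own representation never matters (the enum and function names are hypothetical). Note that in 1's complement, -64 is 0xBF, so bit 0 is set as well.

```c
/* Hypothetical names; assumption: v is in range for the chosen format
   (-127..127, or -128..127 for two's complement). */
enum repr { TWOS_COMPLEMENT, ONES_COMPLEMENT, SIGN_MAGNITUDE };

/* Value -> 8-bit pattern as it would appear on the pin register. */
unsigned encode8(int v, enum repr r)
{
    if (v >= 0)
        return (unsigned)v;             /* all three formats agree */
    switch (r) {
    case TWOS_COMPLEMENT: return (unsigned)(256 + v);
    case ONES_COMPLEMENT: return (unsigned)(255 + v);   /* ~|v| in 8 bits */
    case SIGN_MAGNITUDE:  return 0x80u | (unsigned)(-v);
    }
    return 0;
}

/* Is pin (bit) n on in the pattern? */
int pin_on(unsigned pattern, int n)
{
    return (int)((pattern >> n) & 1u);
}
```

encode8(-64, TWOS_COMPLEMENT) and encode8(-64, SIGN_MAGNITUDE) both give 0xC0 (pins 7 and 6), while encode8(-64, ONES_COMPLEMENT) gives 0xBF.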

Say I choose to represent 9-bit signed values as long ints in the range
-256 to 255. Let's also assume the simulation requires that these
9-bit ints look like 2's complement numbers (no matter how they may
actually be held in the simulation). I can dump a single memory
location (x) like this:
    printf("%d%02x", x < 0, (unsigned)((0x100 + x) % 0x100));

Once again, this is great for printed representation, but lacks the fine
grained understanding of the actual binary representation on the emulated
hardware. It basically just ignores it entirely. Dumping like this also
creates a problem with streams. The actual returned value is still more
than the nine bits that should actually be stored in the stream (whether
using int or long), thereby synthetically padding the stream with extra
bits. Analysis of this stream would therefore yield gibberish.

For example, if I am trying to debug software involving serial
communications I might run the software and dump the generated output
from the serial pin to a file for validation and analysis. Dumping the
following three words:

100100110
001101011
101010111

the actual meaning for this example is irrelevant and for this example I
just added 1s and 0s. What is important is that the actual dump should
be:

100100110001101011101010111

The extra bits that I stored in the host computer, which only works with
byte-sized divisions, must be removed or the end result would be something
like (again not calculated to be exact):

000000010010011000000000011010110000000101010111

which is gibberish with regard to the program running on the simulated hardware.
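Removing the padding bits is a packing step. A sketch, assuming the words are held in unsigned ints and should be packed most significant bit first (pack9 is a hypothetical name): packing the three example words 100100110, 001101011 and 101010111 gives the bytes 0x93 0x1A 0xEA 0xE0, the last byte padded on the right with five zero bits.

```c
#include <stddef.h>
#include <string.h>

/* Pack n 9-bit words, MSB first, into contiguous bits with no padding
   between words. Assumption: out has room for (9*n + 7) / 8 bytes; the
   final byte is zero-filled on the right. Returns bytes written. */
size_t pack9(const unsigned *words, size_t n, unsigned char *out)
{
    size_t nbytes = (9 * n + 7) / 8, bit = 0, i;
    int b;

    memset(out, 0, nbytes);
    for (i = 0; i < n; i++)
        for (b = 8; b >= 0; b--, bit++)        /* bit 8 of each word first */
            if ((words[i] >> b) & 1u)
                out[bit / 8] |= (unsigned char)(0x80u >> (bit % 8));
    return nbytes;
}
```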

Yes, all of this detail may not be needed depending on what the OP is
trying to do. If it is needed, then I don't see any way around binary
manipulation.

Unless there is some devious problem to do with the OP's question, all
assumptions about the hosting computer can be avoided.

The OP didn't state his exact intentions or the hardware that he is working
with. It may very well be that he only needs the arithmetic values. If
so, great. Not knowing, I prefer to be safe and assume that he might need
a binary accurate model.

I can't see a compelling reason for them *not* to be the same. Of
course a bad choice of "virtual format" might give you no end of
trouble, but if you get that right, I can't see why the code could not
be portable.

I will define "internal format" as the normal C types which are actually
used for calculations, printed representations, etc. Everything done at
this level is perfectly valid, portable C. No implementation differences
as we are merely dealing with arithmetic values which will add, subtract,
divide, modulo, etc more or less the same no matter how they are
represented.

I will define "virtual format" as the format actually apparent to
the software running on the virtual machine, and which for simulation
purposes should have a binary representation and sequential storage that
is as close as possible to the actual emulated hardware. These values are
only used when importing data from I/O sources as the emulated hardware
would receive it, at which time they would be converted to internal format
for host storage; when viewing the internal binary representation of the
emulated machine's memory, at which time they would be converted from the
internal format; when dealing with instructions of the simulated hardware
that are binary in nature (instructions for flipping single bits and
shifting registers are very common operations for MCUs) and must act
on the data as apparent to the simulated hardware, in which case they are
converted from the internal format and then converted back to the internal
format after the operation is completed; or when determining the pin
outputs of the MCU, which would again be determined from a conversion of
the internal format stored by the host computer.
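A sketch of a bit-oriented instruction going through the conversion functions, assuming a 9-bit two's complement virtual format and a long internal format (all names are hypothetical). The instruction acts on the pattern the emulated program would see, and only the result is converted back to the internal format.

```c
/* Assumption: virtual format = 9-bit two's complement pattern,
   internal format = long in -256..255. Names are hypothetical. */
static unsigned to_virtual(long v)
{
    return (unsigned)(((v % 512) + 512) % 512);
}

static long to_internal(unsigned w)
{
    w &= 0x1FFu;
    return (w & 0x100u) ? (long)w - 512 : (long)w;
}

/* Bit-toggle instruction: convert out, flip bit, convert back. */
long emu_bit_toggle(long reg, int bit)
{
    return to_internal(to_virtual(reg) ^ (1u << bit));
}

/* Logical shift right of the 9-bit register (a 0 enters bit 8). */
long emu_shift_right(long reg)
{
    return to_internal(to_virtual(reg) >> 1);
}
```

For instance, toggling bit 8 of 0 gives -256, and a logical shift right of -2 (pattern 0x1FE) gives 255.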

Only the conversion routines ever know about the actual virtual format and
its differences from the host hardware. Everything else is portable. The
conversion routines must know how to convert the internal representation to
the virtual representation and vice versa. As the binary representation of
internal format data on the host may differ from the binary format of the
same value on the simulated machine, these routines may need to do binary
manipulation to achieve a proper conversion. This may be simple and
portable recasting and calculating rollover/clamps, it may be removing
padding and combining entries of the internal format into virtual formats
to make streams which do not align on even byte boundaries, or it may mean
converting one sign format to another. Depending on the host machine and
its similarity to the simulated machine this may be possible with portable
code. But other machines will require binary manipulation to meet these
goals, making it implementation dependent, as C does not provide binary
standards for its numeric types.
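As an illustration of converting one sign format to another, suppose (purely as an assumption) that some emulated device stores 9-bit sign-magnitude words. Extracting fields with masks on unsigned values and rebuilding the result arithmetically gives the same answer on any host; the function names are hypothetical.

```c
/* Assumption: 9-bit sign-magnitude, bit 8 = sign, bits 0-7 = magnitude. */
int from_signmag9(unsigned w)
{
    int mag = (int)(w & 0xFFu);
    return (w & 0x100u) ? -mag : mag;       /* negative zero becomes 0 */
}

/* Assumption: -255 <= v <= 255. */
unsigned to_signmag9(int v)
{
    return v < 0 ? 0x100u | (unsigned)(-v) : (unsigned)v;
}

/* Re-encode a sign-magnitude word as a 9-bit two's complement pattern. */
unsigned signmag9_to_twos9(unsigned w)
{
    int v = from_signmag9(w);
    return (unsigned)((v + 512) % 512);     /* v + 512 is always positive */
}
```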

The end result is that if one is concerned about a highly realistic
emulation of the simulated hardware and about the portability of the
emulator, one should consider keeping the internal format and virtual
format separated. They may then place any implementation-dependent
code in two conversion functions, which allows the rest of the program
to ignore implementation-specific details. If you don't need one or the
other, then my comments need not apply.

Tim Harig · May 6, 2009

C++ operator overloading, like someone mentioned, might also be such a
simple solution - unless it turns out horribly complex of course...

Agreed, using a class object that could worry about its own representations,
calculations, and interfaces would be ideal.

Well yes, I certainly think such details should be centralized in a few
places if possible. However, from what I vaguely remember, code related to
such details tends to ooze into a lot of the operations anyway. Carry vs
shift, address arithmetic, etc. At least unless the emulated
instruction set is very orderly. But if what you say below works out,
that sounds nice.

It's certainly possible to 'ooze' responsibility out into the code if one
is not very careful. I think keeping it centralized is probably possible in
this case if one uses discipline. It should be possible to perform all
operations on internal representations, especially given that overflows do
not need to be calculated. One must always be wary of shortcuts, make all
operations atomic, and make sure the internal representation agrees with
the virtual representation.

About 1s/2s complement etc: Such implementation-specific details can be
avoided by using unsigned types internally, since these have defined
behavior and no arithmetic overflow. Then translate to/from signed
representations in the emulated processor. Haven't tried though, so I
don't know if it costs more than it saves.

I would somewhat agree. Signs make trouble, and this problem would be much
simpler if not for having to deal with possibly different binary
representations for sign. All of them agree where positive numbers
are concerned. I would still use signed numbers for the calculations,
as they are more intuitive. With these small numbers and
saturation there shouldn't be too many problems with overflow, and it will
keep the code more readable. I would definitely use unsigned numbers for
the virtual representation where binary manipulation may be involved in the
conversion. Here, not worrying about sign or binary representation,
and having simple whole numbers (1,2,4,8,16,32,64,128) to work with
corresponding bits becomes invaluable.
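A sketch of this split: signed internal arithmetic with saturation, plus an unsigned view for bit-level work. It assumes the emulated ALU saturates at the 9-bit limits (the thread does not say which behaviour the OP's hardware has); the names are hypothetical.

```c
/* Assumption: saturation at the 9-bit two's complement limits. */
#define MIN9 (-256L)
#define MAX9 255L

/* Operands are in -256..255, so the long sum itself cannot overflow. */
long sat_add9(long a, long b)
{
    long s = a + b;
    if (s > MAX9) return MAX9;
    if (s < MIN9) return MIN9;
    return s;
}

/* Unsigned virtual view: bit n of the 9-bit pattern has weight 1u << n. */
int virtual_bit(long v, int n)
{
    unsigned w = (unsigned)(((v % 512) + 512) % 512);
    return (int)((w >> n) & 1u);
}
```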

Tim Harig · May 6, 2009

One clarification, where Ben says "...can't see a compelling reason..."
I would say this also means that making the code portable is not
particularly difficult. If it was difficult (but still possible) that
would, in my opinion, be a compelling reason not to do it!

Simple. We have different goals. You only want arithmetic representation.
That is easy to produce with simple, portable C code using same
representations for input, calculation, and output.

If you need binary accuracy to the hardware that you are emulating, then
you need implementation-specific code somewhere, as C does not define
any (or very little/not enough) standards for binary representation.
Portable code that does not meet its goals is worthless. Sometimes you
need to do non-portable things to meet your goals. This is generally
true where working with binary data in C is concerned. Meeting the goal must
supersede great code. As it stands, only a small amount of
implementation-specific code is needed, and it can be easily hidden behind
two functions, for the goal of binary representation true to the emulated
hardware to be achieved. I must consider this an acceptable situation, use
another language, or beg the C standards committee for the C standard to
include binary definitions for C types (which is not entirely desirable).

Whether the OP needs binary accuracy is unstated. I know that I probably
would. I have certainly needed it before when working with 8-bit MCUs.

Flash Gordon · May 6, 2009

Tim said:
Again, that is absolutely true if all you want is the arithmetic values.
That would work sufficiently if you just want a hex printout of the
arithmetic result values for analysis. This might also work for simple
functional testing of a DSP without actually doing a full hardware simulation
to get an idea of what a waveform should look like if the DSP software
is properly implemented.

It's enough for a lot more than that.

It may not, however, be close enough for the emulation required to
develop and debug software designed to run on the DSP/MCU/MPU. That would
require not only the arithmetic value and printed representation of that
value, but also a true to form binary equivalent of how the value
is actually stored inside of the emulated hardware.

No, it does NOT "require a true to form binary equivalent", it only
requires that when it makes the information available to you it has been
transformed into such.

I work with PIC and AVR 8-bit (actually 12-16 bits internally) MCUs. I have
to test and debug software that I write for them using an emulator. In
doing so, I need to know the exact binary representation of values within
the MCU's memory. If I copy the result of an arithmetic operation
(say -64) to a pin register, I need to know which pins are on and off:
7,6 for 2's complement, 7,6 for sign and magnitude, or 7,5,4,3,2,1,0 for 1's
complement. The end results could be very different. The OP may be
making emulation software for such purposes.

Guess what, the emulation engine can be written entirely in portable
standard C. Since I know of simulators which run on both big and little
endian hardware (although it is over 10 years since I used the one I'm
thinking of) I suspect that at least some companies do this!

Once again, this is great for printed representation, but lacks the fine
grained understanding of the actual binary representation on the emulated
hardware. It basically just ignores it entirely. Dumping like this also
creates a problem with streams. The actual returned value is still more
than the nine bits that should actually be stored in the stream (whether
using int or long), thereby synthetically padding the stream with extra
bits. Analysis of this stream would therefore yield gibberish.

No, when you analyse the stream you ignore the extra bits. This is not
rocket science, it is extremely simple.

For example, if I am trying to debug software involving serial
communications I might run the software and dump the generated output
from the serial pin to a file for validation and analysis. Dumping the
following three words:

100100110
001101011
101010111

the actual meaning for this example is irrelevant and for this example I
just added 1s and 0s. What is important is that the actual dump should
be:

100100110001101011101010111

The extra bits that I stored in the host computer, which only works with
byte-sized divisions, must be removed or the end result would be something
like (again not calculated to be exact):

000000010010011000000000011010110000000101010111

which is gibberish with regard to the program running on the simulated hardware.

No it isn't. You use programs which ignore the extra bits. This is
extremely simple.

Yes, all of this detail may not be needed depending on what the OP is
trying to do. If it is needed, then I don't see any way around binary
manipulation.

That seems to be because you are determined to assume there is no way
around it. In actual fact every issue you have raised is easy to deal
with in standard portable C.

The OP didn't state his exact intentions or the hardware that he is working
with. It may very well be that he only needs the arithmetic values. If
so, great. Not knowing, I prefer to be safe and assume that he might need
a binary accurate model.

If you need a binary accurate output that is easy to provide and does
not require what you are suggesting.

I will define "internal format" as the normal C types which are actually
used for calculations, printed representations, etc. Everything done at
this level is perfectly valid, portable C. No implementation differences
as we are merely dealing with arithmetic values which will add, subtract,
divide, modulo, etc more or less the same no matter how they are
represented.

Which is fine.

I will define "virtual format" as the format actually apparent to
the software running on the virtual machine, and which for simulation
purposes should have a binary representation and sequential storage that
is as close as possible to the actual emulated hardware. These values are
only used when importing data from I/O sources as the emulated hardware
would receive it, at which time they would be converted to internal format
for host storage;

Which can easily be done in standard portable C.

when viewing the internal binary representation of the
emulated machine's memory, at which time they would be converted from the
internal format;

Which can be done in standard portable C.

when dealing with instructions of the simulated hardware
that are binary in nature (instructions for flipping single bits and
shifting registers are very common operations for MCUs) and must act
on the data as apparent to the simulated hardware, in which case they are
converted from the internal format and then converted back to the internal
format after the operation is completed; or when determining the pin
outputs of the MCU, which would again be determined from a conversion of
the internal format stored by the host computer.

Which can be done in standard portable C.

Only the conversion routines ever know about the actual virtual format and
its differences from the host hardware.

Which can be written in standard portable C.

Everything else is portable. The
conversion routines must know how to convert the internal representation to
the virtual representation and vice versa. As the binary representation of
internal format data on the host may differ from the binary format of the
same value on the simulated machine, these routines may need to do binary
manipulation to achieve a proper conversion. This may be simple and
portable recasting and calculating rollover/clamps, it may be removing
padding and combining entries of the internal format into virtual formats
to make streams which do not align on even byte boundaries, or it may mean
converting one sign format to another. Depending on the host machine and
its similarity to the simulated machine this may be possible with portable
code. But other machines will require binary manipulation to meet these
goals, making it implementation dependent, as C does not provide binary
standards for its numeric types.

No, WHATEVER the C implementation uses, these routines can STILL be
written in standard portable C!

The end result is that if one is concerned about a highly realistic
emulation of the simulated hardware and about the portability of the
emulator, one should consider keeping the internal format and virtual
format separated.

Yes, you do.

They may then place any implementation-dependent
code in two conversion functions which allows the rest of the program
to ignore implementation specific details. If you don't need one or the
other, then my comments need not apply.

You do NOT need implementation dependent code to write those conversion
functions. They CAN be written in standard portable C and, in fact, it
is easy to do so.

In the office I've got standard portable C code that reads a 32 bit
integer from a file in a specific endianness (I can't remember if it is
big or little endian off the top of my head) and whilst reading it
converts it to the native endianness of the C implementation, and this
code works correctly on both big and little endian machines, so the
files often have been written with a big-endian machine and then read by
a little endian machine. There is NO conditional compilation involved
and all the code is completely standard. If I wanted I could easily make
the code read/write sign-magnitude correctly whether or not the host
implementation is sign-magnitude.
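A sketch of the kind of endian-neutral reader described here. Big-endian is an assumption (the byte order of those files is not stated), as is the use of 8-bit octets; the function names are hypothetical. The value is assembled with shifts, so the host's byte order never enters into it, and the same idea handles a sign-magnitude field.

```c
#include <stdio.h>

/* Assemble a 32-bit value from 4 bytes stored most significant first.
   unsigned long is guaranteed to hold at least 32 bits. */
unsigned long u32_from_be(const unsigned char b[4])
{
    return ((unsigned long)b[0] << 24) | ((unsigned long)b[1] << 16)
         | ((unsigned long)b[2] << 8)  |  (unsigned long)b[3];
}

/* Read one such value from a stream; returns 0 on success, -1 on EOF/error. */
int read_u32_be(FILE *fp, unsigned long *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = u32_from_be(b);
    return 0;
}

/* Interpret a 32-bit sign-magnitude pattern as a host value, whatever
   representation the host itself uses for negative numbers. */
long signmag32_value(unsigned long w)
{
    long mag = (long)(w & 0x7FFFFFFFUL);
    return (w & 0x80000000UL) ? -mag : mag;
}
```

No conditional compilation is involved: the bytes {0x12, 0x34, 0x56, 0x78} give 0x12345678 on any host.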

Ben Bacarisse · May 6, 2009

Tim Harig said:
Again, that is absolutely true if all you want is the arithmetic values.
That would work sufficiently if you just want a hex printout of the
arithmetic result values for analysis. This might also work for simple
functional testing of a DSP without actually doing a full hardware simulation
to get an idea of what a waveform should look like if the DSP software
is properly implemented.

It may not, however, be close enough for the emulation required to
develop and debug software designed to run on the DSP/MCU/MPU. That would
require not only the arithmetic value and printed representation of that
value, but also a true to form binary equivalent of how the value
is actually stored inside of the emulated hardware.

I work with PIC and AVR 8-bit (actually 12-16 bits internally) MCUs. I have
to test and debug software that I write for them using an emulator. In
doing so, I need to know the exact binary representation of values within
the MCU's memory. If I copy the result of an arithmetic operation
(say -64) to a pin register, I need to know which pins are on and off:
7,6 for 2's complement, 7,6 for sign and magnitude, or 7,5,4,3,2,1,0 for 1's
complement. The end results could be very different. The OP may be
making emulation software for such purposes.

I still don't see why the code to do this can't be portable. Maybe
you could give a real example in C of something that has to be
re-written when you move the source between implementations (say
because of endian-ness or number format).

Once again, this is great for printed representation, but lacks the fine
grained understanding of the actual binary representation on the emulated
hardware. It basically just ignores it entirely.

I picked dumping because you suggested it as an area where there might
be problems.

Dumping like this also
creates a problem with streams. The actual returned value is still more
than the nine bits that should actually be stored in the stream (whether
using int or long), thereby synthetically padding the stream with extra
bits. Analysis of this stream would therefore yield gibberish.

For example, if I am trying to debug software involving serial
communications I might run the software and dump the generated output
from the serial pin to a file for validation and analysis. Dumping the
following three words:

100100110
001101011
101010111

the actual meaning for this example is irrelevant and for this example I
just added 1s and 0s. What is important is that the actual dump should
be:

100100110001101011101010111

That is exactly what my example code would print. I am having real
trouble understanding what problem you are seeing.

The extra bits that I stored in the host computer, which only works with
byte-sized divisions, must be removed or the end result would be something
like (again not calculated to be exact):

000000010010011000000000011010110000000101010111

which is gibberish with regard to the program running on the simulated
hardware.

Of course it would be removed. To print a stream of these you'd
simply wrap the printf in a function and iterate over the stream of
values:

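#include <stdio.h>
#include <stddef.h>

typedef long int ninebits;

void dump(ninebits x)
{
    /* sign bit, then the low 8 bits of the 2's complement pattern in hex;
       %02x needs an unsigned int, so convert the long explicitly */
    printf("%d%02x", x < 0, (unsigned)((0x100 + x) % 0x100));
}

void dump_stream(size_t n, ninebits *stream)
{
    while (n--) dump(*stream++);
}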

Because this is an emulation, you never have to expose the
representation you have chosen to use.
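If the dump has to be the literal bit string rather than sign-plus-hex, the same wrapping approach still works. A sketch using the ninebits type above; bits9 and dump_stream_bits are hypothetical names, and writing into a caller-supplied buffer instead of stdout is a choice made here so the result is easy to check.

```c
#include <stddef.h>

typedef long int ninebits;   /* internal type as in the example above */

/* Write the 9-bit two's complement pattern of x as '0'/'1', MSB first.
   buf needs room for 9 characters (no terminator written here). */
static void bits9(ninebits x, char *buf)
{
    unsigned w = (unsigned)((x % 512 + 512) % 512);
    int b;

    for (b = 8; b >= 0; b--)
        *buf++ = ((w >> b) & 1u) ? '1' : '0';
}

/* Dump a stream with no padding between words; out needs 9*n + 1 chars. */
void dump_stream_bits(size_t n, const ninebits *stream, char *out)
{
    while (n--) {
        bits9(*stream++, out);
        out += 9;
    }
    *out = '\0';
}
```

With the stream {-218, 107, -169} (the three example words read as 9-bit two's complement values) the buffer holds 100100110001101011101010111, with no extra bits.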

Yes, all of this detail may not be needed depending on what the OP is
trying to do. If it is needed, then I don't see any way around binary
manipulation.

As I said, there may be a reason but I can't think of it. Maybe if
you gave a real C example it would help me.

The OP didn't state his exact intentions or the hardware that he is working
with. It may very well be that he only needs the arithmetic values. If
so, great. Not knowing, I prefer to be safe and assume that he might need
a binary accurate model.

I will define "internal format" as the normal C types which are actually
used for calculations, printed representations, etc. Everything done at
this level is perfectly valid, portable C. No implementation differences
as we are merely dealing with arithmetic values which will add, subtract,
divide, modulo, etc more or less the same no matter how they are
represented.

OK, but I am a little worried because there are "implementation
differences" in arithmetic in some cases. The trick, of course, is to
avoid them.

I will define "virtual format" as the format actually apparent to
the software running on the virtual machine, and which for simulation
purposes should have a binary representation and sequential storage that
is as close as possible to the actual emulated hardware. These values are
only used when importing data from I/O sources as the emulated hardware
would receive it, at which time they would be converted to internal format
for host storage; when viewing the internal binary representation of the
emulated machine's memory, at which time they would be converted from the
internal format; when dealing with instructions of the simulated hardware
that are binary in nature (instructions for flipping single bits and
shifting registers are very common operations for MCUs) and must act
on the data as apparent to the simulated hardware, in which case they are
converted from the internal format and then converted back to the internal
format after the operation is completed; or when determining the pin
outputs of the MCU, which would again be determined from a conversion of
the internal format stored by the host computer.

Only the conversion routines ever know about the actual virtual format and
its differences from the host hardware. Everything else is portable. The
conversion routines must know how to convert the internal representation to
the virtual representation and vice versa. As the binary representation of
internal format data on the host may differ from the binary format of the
same value on the simulated machine, these routines may need to do binary
manipulation to achieve a proper conversion. This may be simple and
portable recasting and calculating rollover/clamps, it may be removing
padding and combining entries of the internal format into virtual formats
to make streams which do not align on even byte boundaries, or it may mean
converting one sign format to another. Depending on the host machine and
its similarity to the simulated machine this may be possible with portable
code. But other machines will require binary manipulation to meet these
goals, making it implementation dependent, as C does not provide binary
standards for its numeric types.

I really need an example. It all sounds as if it can be done in
portable C. I am sorry if I sound obstinate, but I've seen a lot of
non-portable code used where perfectly portable alternatives exist and
I can't yet see where the unavoidable implementation dependencies are
that you are clearly seeing.

The end result is that if one is concerned about a highly realistic
emulation of the simulated hardware and about the portability of the
emulator, one should consider keeping the internal format and virtual
format separated.

Absolutely. Agree 100%.

They may then place any implementation-dependent
code in two conversion functions which allows the rest of the program
to ignore implementation specific details. If you don't need one or the
other, then my comments need not apply.

The bit I don't get is the need for "implementation specific details"
in the conversion functions.

Willem · May 6, 2009

Tim Harig wrote:
) If you need binary accuracy to the hardware that you are emulating; then,
) you need implementation specific code somewhere as C does not define
) any (or very little/not enough) standards for binary representation.

Nonsense. You don't need any standards for binary representation, all you
need is a way to map the emulated binary representation on an unsigned int.
Which is perfectly possible in standard C.
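Willem's point in code: the emulated register is just a pattern in the low 9 bits of an unsigned int, so even shift-and-carry instructions (the "carry vs shift" concern raised earlier in the thread) need nothing implementation-specific. A sketch assuming a 9-bit register with a separate carry flag; struct cpu9, rlc9 and asr9 are hypothetical.

```c
/* Assumption: 9-bit register plus a separate carry flag. */
struct cpu9 {
    unsigned reg;    /* only bits 0-8 are used */
    unsigned carry;  /* 0 or 1 */
};

/* Rotate left through carry: old bit 8 -> carry, old carry -> bit 0. */
void rlc9(struct cpu9 *c)
{
    unsigned out = (c->reg >> 8) & 1u;
    c->reg = ((c->reg << 1) | c->carry) & 0x1FFu;
    c->carry = out;
}

/* Arithmetic shift right: bit 0 -> carry, bit 8 (the sign) is kept. */
void asr9(struct cpu9 *c)
{
    c->carry = c->reg & 1u;
    c->reg = (c->reg >> 1) | (c->reg & 0x100u);
}
```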

) Portable code that does not meet its goals is worthless. Sometimes you
) need to do non-portable things to meet your goals. This is generally
) true where working with binary data in C is concerned.

Why is that ? AFAIK, binary data can be handled quite portably.
Can you give an example of what you'd want to do that's not doable in
portable C ?

SaSW, Willem

Ben Bacarisse · May 6, 2009

Tim Harig said:
Simple. We have different goals. You only want arithmetic representation.
That is easy to produce with simple, portable C code using same
representations for input, calculation, and output.

If you need binary accuracy to the hardware that you are emulating; then,
you need implementation specific code somewhere as C does not define
any (or very little/not enough) standards for binary representation.

Can you give an example of how one could tell the difference between
an emulator that did it one way and one that had "binary accuracy"?

This may be the heart of the issue. To me, an emulator that does all
the right things is accurate. There is no issue about whether it is
"binary accurate" or "arithmetically accurate", it is simply accurate.
What kind of things could an emulator have to do that require one sort
of accuracy (the "binary" sort). If I knew that I might be able to
say how to do it portably, or I might concede that it can't be done
without knowledge of the implementation.

[I'm sorry for putting so many phrases in quotes -- I am not mocking
your use of words. It is simply that I don't understand the
distinction so I feel uncomfortable using them until I know what I am
saying by using them!]

<snip>

Tim Harig · May 6, 2009

I really need an example. It all sounds as if it can be done in
portable C. I am sorry if I sound obstinate, but I've seen a lot of
non-portable code used where perfectly portable alternatives exist and
I can't yet see where the unavoidable implementation dependencies are
that you are clearly seeing.

You are absolutely correct. It can be done with Standard/Portable C. I
could add a level of indirection and ignore direct binary issues entirely by
representing bits using bitfield structures, chars, short ints, or even
manipulating BCD values. It would employ a natural duplication of code and
it would bring a performance hit, but the code readability would be
excellent -- almost taking on an object oriented quality.

I could also use simple tests in portable code to detect
endianness, sign representation, and type sizes, and then use that
metadata with conditionals to determine which steps match the
implementation currently being used. Endianness is easily checked by
examining the byte order of a short integer holding a known value
(you cannot use chars because you don't know what character set the C
implementation is using). Sign representation may be checked by comparing
the binary representation of known unsigned integers against the signed
integers produced by the machine. Type sizes are available from limits.h.

Once the metadata about the machine is known, I just select the
proper procedure for the given implementation to make the conversion. I
could do this with a single function with conditionals, or I could make the
determination at startup and assign a function pointer to the proper
procedure. This conforms 100% to the C99 standard and it should be
extremely portable. Only the most freakish C implementations would stand
any chance of breaking it. A C interpreter that stores all of its
numerical values as numeric character strings might have a chance.
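A minimal sketch of such host probes, assuming a C99 host that is either big- or little-endian. The names `sign_repr`, `host_info` and `probe_host` are hypothetical. For the sign test it swaps in the common `(-1 & 3)` trick rather than the comparison described above: bitwise AND acts on the representation, and the low two bits of -1 are 01, 10 or 11 under sign-and-magnitude, ones' complement and two's complement respectively. The byte-order test reads an `unsigned short` through an `unsigned char` pointer, which C permits.

```c
#include <limits.h>

/* Hypothetical names: each enumerator equals (-1 & 3) under that representation. */
enum sign_repr {
    SIGN_MAGNITUDE  = 1,  /* -1 is 1...0001 */
    ONES_COMPLEMENT = 2,  /* -1 is 1...1110 */
    TWOS_COMPLEMENT = 3   /* -1 is 1...1111 */
};

struct host_info {
    enum sign_repr sign;  /* how the host stores negative ints */
    int little_endian;    /* 1 if the least significant byte comes first */
    int uint_bits;        /* value bits in unsigned int, derived from UINT_MAX */
};

struct host_info probe_host(void)
{
    struct host_info h;
    unsigned short one = 1;
    unsigned max = UINT_MAX;

    /* Bitwise & works on the representation, so the low bits reveal it. */
    h.sign = (enum sign_repr)(-1 & 3);

    /* Reading an object's bytes through unsigned char is always allowed. */
    h.little_endian = (*(unsigned char *)&one == 1);

    /* Count the value bits of unsigned int by shifting UINT_MAX down to 0. */
    h.uint_bits = 0;
    while (max != 0) {
        max >>= 1;
        h.uint_bits++;
    }
    return h;
}
```

The result could then drive the startup selection of a function pointer. As later replies point out, though, C's value-based conversions make this selection unnecessary for the 9-bit conversion itself.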

It's standards compliant, highly portable, and likely good-quality code. I
still do not consider it to be implementation agnostic. The bottom line is
that the procedure for converting a 64-bit, little-endian, sign-and-magnitude
number to a 9-bit, 2's complement number is still slightly
different from making the conversion from a 16-bit, little-endian, 2's
complement number. Therefore, I still suggest that this standard,
portable procedure be encapsulated behind a set of functions, along with code
that handles saturation and overflow, rather than being dispersed
throughout the rest of the project's code.
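A minimal sketch of what such a set of functions might look like for 9-bit two's complement addition. The names `add9_wrap`, `add9_sat`, `add9_overflows`, `NINE_MIN` and `NINE_MAX` are hypothetical, and the operands are assumed to already lie in the 9-bit range -256..255. Only unsigned arithmetic and value comparisons are used, so nothing depends on the host's sign representation.

```c
/* Hypothetical 9-bit two's complement range. */
#define NINE_MIN (-256)
#define NINE_MAX 255

/* Wrap-around add: reduce the sum modulo 2^9 using unsigned arithmetic,
   then map 256..511 back to -256..-1. */
int add9_wrap(int a, int b)
{
    unsigned u = ((unsigned)a + (unsigned)b) & 0x1FFu;
    return (int)(u ^ 0x100u) - 0x100;
}

/* Saturating add: clamp the exact sum to the 9-bit range. */
int add9_sat(int a, int b)
{
    int s = a + b;  /* cannot overflow int: |s| <= 512 */
    if (s > NINE_MAX) return NINE_MAX;
    if (s < NINE_MIN) return NINE_MIN;
    return s;
}

/* Overflow flag: set when the exact sum does not fit in 9 bits. */
int add9_overflows(int a, int b)
{
    int s = a + b;
    return s > NINE_MAX || s < NINE_MIN;
}
```

For example, `add9_wrap(255, 1)` gives -256 while `add9_sat(255, 1)` gives 255; keeping both behind one interface leaves the choice to whatever the emulated instruction specifies.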

Does that finally make everybody happy?

Tim Harig · May 6, 2009

Nonsense. You don't need any standards for binary representation, all you
need is a way to map the emulated binary representation on an unsigned int.
Which is perfectly possible in standard C.

8Bit:
10000000
11111111
10000001
16Bit:
1111111110000001
0000000011111111

All of the above values can be used to specify the number -127, but they
are not equivalent. If these binary values are used to control motors
in a CNC mill, they may not produce the same shape. To effectively model
this system, it is necessary that the emulator deal with the same
binary format as the real hardware, whether or not it is the same
as the C implementation modeling it.
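The reply that follows argues that such patterns can be derived from the value alone. A minimal sketch of that idea, with hypothetical function names and inputs assumed to be in -127..127, encodes a value into the 8-bit pattern a two's complement, ones' complement or sign-and-magnitude target would hold, using only well-defined unsigned arithmetic. For -127 these give 10000001, 10000000 and 11111111, the three 8-bit patterns listed above.

```c
/* Hypothetical encoders: produce the 8-bit pattern a target machine would
   hold for v (assumed to be in -127..127), using only value arithmetic. */
unsigned enc8_twos(int v)
{
    return (unsigned)v & 0xFFu;               /* value reduced modulo 2^8 */
}

unsigned enc8_ones(int v)
{
    return v < 0 ? ~(unsigned)-v & 0xFFu      /* invert all bits of |v| */
                 : (unsigned)v;
}

unsigned enc8_sign_mag(int v)
{
    return v < 0 ? 0x80u | (unsigned)-v       /* sign bit plus |v| */
                 : (unsigned)v;
}
```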

Willem · May 6, 2009

Tim Harig wrote:
)> Nonsense. You don't need any standards for binary representation, all you
)> need is a way to map the emulated binary representation on an unsigned int.
)> Which is perfectly possible in standard C.
)
) All of the above values can be used to specify the number -127; but, they
) are not equivalent.

They are irrelevant. All you use is the value itself: -127.

) If these binary values are used to control motors
) in a CNC mill they may not produce the same shape.

The emulator is not linked to a real CNC machine, is it ?
So, the CNC machine emulator should also be written in
standard C, using the value, and not caring about
the underlying representation.

) To effectively model
) this system, it is necessary that the emulator deal with the same
) binary format as will the real hardware whether or not it is the same
) as the C implementation modeling it.

Why ? C operators are defined on values. They *don't care* about
the underlying binary format. You don't need to care either.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Tim Harig · May 6, 2009

Tim Harig wrote:
) All of the above values can be used to specify the number -127; but, they
) are not equivalent.
They are irrelevant. All you use is the value itself: -127.

They are very relevant when emulating MCU designs for debugging. The bits
very likely run on individual relays which in turn control individual
motors, lights, pumps, etc.

) If these binary values are used to control motors
) in a CNC mill they may not produce the same shape.
The emulator is not linked to a real CNC machine, is it ?
So, the CNC machine emulator should also be written in
standard C, using the value, and not caring about
the underlying representation.

When you program software for embedded systems, you generally test and
debug your software using an emulator. If you have the cash, you might also
be able to test and debug using a simulator, which is actually a cable that
plugs into the IC socket and uses a bridge whereby the computer
directly emulates the hardware of the microcontroller. In either instance,
the actual binary representation used within the microcontroller is
significant. Microcontrollers place a high emphasis on individual bits.
Individual bits likely control the pins of the IC, which switch the
relay, which controls the current to a motor, etc. Alternatively, they
could represent the current on an input pin which detects whether a safety
shield has been deployed before starting the motor, or be part of a bitmask
used to control which interrupts are operational.

Therefore, it is very important to model the actual binary representation
used within the microprocessor, not just the arithmetic value.
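A minimal sketch, with hypothetical names and pin assignments not taken from any real part, of how an emulator might keep such a bit-level register: the emulated 9-bit word lives in an `unsigned int`, and individual pins are set and tested with masks. Unsigned bitwise operations are defined by value, so the bit positions the emulated program sees are the same on every host.

```c
/* Hypothetical model of a 9-bit I/O port on an emulated microcontroller.
   The pin assignments below are illustrative only. */
#define PORT_WIDTH 9u
#define PORT_MASK  ((1u << PORT_WIDTH) - 1u)   /* 0x1FF */
#define PIN_MOTOR  0u   /* output: drives a motor relay */
#define PIN_SHIELD 8u   /* input: safety shield deployed */

typedef struct {
    unsigned bits;   /* only the low PORT_WIDTH bits are meaningful */
} port9;

/* Drive one pin high or low, keeping the register within 9 bits. */
void port_set_pin(port9 *p, unsigned pin, int high)
{
    unsigned m = 1u << pin;
    if (high)
        p->bits |= m;
    else
        p->bits &= ~m;
    p->bits &= PORT_MASK;
}

/* Read one pin as 0 or 1. */
int port_get_pin(const port9 *p, unsigned pin)
{
    return (int)((p->bits >> pin) & 1u);
}

/* Store a whole word written by the emulated CPU, truncated to 9 bits. */
void port_write(port9 *p, unsigned word)
{
    p->bits = word & PORT_MASK;
}
```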

) To effectively model
) this system, it is necessary that the emulator deal with the same
) binary format as will the real hardware whether or not it is the same
) as the C implementation modeling it.
Why ? C operators are defined on values. They *don't care* about
the underlying binary format. You don't need to care either.

C operators are defined for the hardware and implementation on which they
run. They are not designed to match the representations of other hardware
and implementations.

Tim Harig · May 6, 2009

Here are functions to convert between a native int_least64_t and a 9-bit
two's complement value in two bytes, with the second byte having the sign
in the lowest bit and the upper bits zero (and ignored when reading):

void to_9bit( unsigned char out [2], int_least64_t in )
{
    unsigned u = in; // convert to two's complement
    out [0] = u & 0xFF;
    out [1] = u >> 8 & 0x01;
}

Let's check how your function would fare on different hosts using different
binary representations of signed numbers. We will test how they convert -64
in their native implementation to 9-bit 2's complement. I am only using 32
bits to save space and to make the calculations more manageable.

2's complement:
-- I don't think that it should make too much difference here.

out[0]
-40 = 11111111111111111111111111000000 = -64d
&FF = 00000000000000000000000011111111 = 255d
=C0 = 00000000000000000000000011000000 = 192d

out[1]
-40 = 11111111111111111111111111000000 = -64d
>>8 = 00000000000000000000000000001000 = 08d
=-1 = 11111111111111111111111111111111 = -1d *
&01 = 00000000000000000000000000000001 = 01d
=01 = 00000000000000000000000000000001 = 1d

* from C Pocket Reference, Peter Prinz & Ulla Kirch-Prinz,
O'Reilly, p.25
> The bit positions vacated at the left by the right shift >> are
> filled with 0 bits if the left operand is an unsigned type or
> has a non-negative value. If the left operand is signed and
> negative, the left bits may be filled with 0 (logical shift) or
> with the value of the sign bit (arithmetic shift), depending on
> the compiler.
Note that it makes no difference for this calculation. The version
I used fills using the sign bit.

111000000 = -64. Good, although the value needs to be returned or
redirected by pointer.

1's complement:

out[0]
-40 = 11111111111111111111111110111111 = -64d
&FF = 00000000000000000000000011111111 = 255d
=BF = 00000000000000000000000010111111 = 191d

out[1]
-40 = 11111111111111111111111110111111 = -64d
>>8 = 00000000000000000000000000001000 = 08d
=-0 = 11111111111111111111111111111111 = -0d * see above
&01 = 00000000000000000000000000000001 = 01d
=01 = 00000000000000000000000000000001 = 01d

110111111 = -65 Close but no cigar, the correct value should
be 111000000.

That means your function will not work when the host computer uses
1's complement signed numbers. If you want this application to
work on those machines then you need to use a different procedure
for those implementations.

sign and magnitude:

out[0]
-40 = 10000000000000000000000001000000 = -64d
&FF = 00000000000000000000000011111111 = 255d
=40 = 00000000000000000000000001000000 = 64d

out[1]
-40 = 10000000000000000000000001000000 = -64d
>>8 = 00000000000000000000000000001000 = 08d
= = 11111111100000000000000000000000 = A very small number or a
&01 = 00000000000000000000000000000001 = 01d
=0 = 00000000000000000000000000000000 = 00d

001000000 = 64. Right number, wrong sign. So your code will not
work on a host machine that uses sign-and-magnitude representation
either. If you want this application to work on those machines, then
you need to use a different procedure for those implementations.

int_least64_t from_9bit( unsigned char const in [2] )
{
    int i = (in [1] << 8 & 0x100) | in [0];
    return (i ^ 0x100) - 0x100; // convert from two's complement
}
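For reference, a self-contained version of the two functions quoted above, adding the `<stdint.h>` include they need; the comments are added here and are not from the original post. The line `unsigned u = in;` is a conversion defined by value (C99 6.3.1.3), so -64 becomes UINT_MAX + 1 - 64 on every conforming host, and its low nine bits are 111000000 whatever the host's sign representation.

```c
#include <stdint.h>

/* Store the low 9 bits of in: out[0] = bits 0-7, bit 0 of out[1] = bit 8 (sign). */
void to_9bit(unsigned char out[2], int_least64_t in)
{
    unsigned u = in;          /* value conversion: in reduced modulo UINT_MAX + 1 */
    out[0] = u & 0xFF;
    out[1] = u >> 8 & 0x01;   /* >> binds tighter than & */
}

/* Rebuild the 9-bit pattern and sign-extend it: 0x100..0x1FF map to -256..-1. */
int_least64_t from_9bit(unsigned char const in[2])
{
    int i = (in[1] << 8 & 0x100) | in[0];
    return (i ^ 0x100) - 0x100;
}
```

Round-tripping -64 yields the bytes 0xC0 and 0x01 and then -64 again; the same holds for the range ends -256 and 255.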

I might try these on for size later.

These are fully portable and don't need to know anything about the host
implementation. They are also reasonably efficient for most purposes.

The calculations beg to differ. Your calculations do not work for
host machines which do not use 2's complement representation for signed
numbers. Thus, other implementations are required to work on those machines.

Tim Harig · May 6, 2009

"One solution to the problem of portable binary files is to avoid them."
~ Practical C Programming, Steve Oualline, p.339

Ben Bacarisse · May 6, 2009

Tim Harig said:
Here are functions to convert between a native int_least64_t and a 9-bit
two's complement value in two bytes, with the second byte having the sign
in the lowest bit and the upper bits zero (and ignored when reading):

void to_9bit( unsigned char out [2], int_least64_t in )
{
    unsigned u = in; // convert to two's complement
    out [0] = u & 0xFF;
    out [1] = u >> 8 & 0x01;
}

Let's check how your function would fare on different hosts using different
binary representations of signed numbers. We will test how they convert -64
in their native implementation to 9-bit 2's complement. I am only using 32
bits to save space and to make the calculations more manageable.

2's complement:

1's complement:

out[0]
-40 = 11111111111111111111111110111111 = -64d
&FF = 00000000000000000000000011111111 = 255d
=BF = 00000000000000000000000010111111 = 191d

out[1]
-40 = 11111111111111111111111110111111 = -64d
>>8 = 00000000000000000000000000001000 = 08d
=-0 = 11111111111111111111111111111111 = -0d * see above
&01 = 00000000000000000000000000000001 = 01d
=01 = 00000000000000000000000000000001 = 01d

110111111 = -65 Close but no cigar, the correct value should
be 111000000.

You've forgotten what 'unsigned u = in;' does. Blargg commented it so
as to be clear. C's arithmetic conversions are defined by value not
bit pattern.

Tim Harig · May 6, 2009

Tim Harig said:
Tim Harig said:

Here are functions to convert between a native int_least64_t and a 9-bit
two's complement value in two bytes, with the second byte having the sign
in the lowest bit and the upper bits zero (and ignored when reading):

void to_9bit( unsigned char out [2], int_least64_t in )
{
    unsigned u = in; // convert to two's complement
    out [0] = u & 0xFF;
    out [1] = u >> 8 & 0x01;
}

1's complement:

out[0]
-40 = 11111111111111111111111110111111 = -64d
&FF = 00000000000000000000000011111111 = 255d
=BF = 00000000000000000000000010111111 = 191d

out[1]
-40 = 11111111111111111111111110111111 = -64d
>>8 = 00000000000000000000000000001000 = 08d
=-0 = 11111111111111111111111111111111 = -0d * see above
&01 = 00000000000000000000000000000001 = 01d
=01 = 00000000000000000000000000000001 = 01d

110111111 = -65 Close but no cigar, the correct value should
be 111000000.

You've forgotten what 'unsigned u = in;' does. Blargg commented it so
as to be clear. C's arithmetic conversions are defined by value not
bit pattern.

This is what I have found:

from: http://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe

When you cast from signed to unsigned (and vice versa) the internal
representation of the number does not change. What changes is how the
compiler interprets the sign bit. So yes, aside from the possible
overflows, it is safe to cast from signed to unsigned, though the result
will probably be much larger after changing sign.

There are several comments. One says the results are implementation
dependent. He links to what seems to be the K&R book, but the link leads to
a 404.

This is a great source which also discusses 1s complement vs. 2s
complement.

from: http://www.codeguru.com/cpp/sample_chapter/article.php/c11111

When signed integer types are converted to unsigned, there is no lost data
because the bit pattern is preserved. However, the high-order bit loses its
function as a sign bit. If the value of the signed integer is not negative,
the value is unchanged. If the value is negative, the resulting unsigned
value is evaluated as a large, signed integer. In line 3 of Figure 5-8, the
value of c is compared to the value of l. Because of integer promotions, c
is converted to an unsigned integer with a value of 0xFFFFFFFF or
4,294,967,295.

I hate to reference Microsoft but... Have a look here:

http://msdn.microsoft.com/en-us/library/xbfs6fd4(VS.80).aspx

This seems self-explanatory.

from: http://docs.sun.com/app/docs/doc/802-5776/6i9gsvj0b?l=Ja&a=view

The result of the addition has type int (value preserving) or unsigned int
(unsigned preserving), but the bit pattern does not change between these
two.

from: http://www.fiendish.demon.co.uk/c/casting.html#signedvals

A quick word about signed values and types before we go any further. A
signed type like int or long can hold a value that is either negative,
positive or zero. An unsigned type like unsigned long or unsigned int can
only hold positive or zero values. Often, signed and unsigned types of the
same basic type (unsigned int and int for example) take up the same number
of bytes of memory. The difference is in how the underlying hardware treats
the individual bit values in the bytes that make up the value. Usually
there is what is called a sign-bit in a signed value that determines
whether the value is positive or negative.

Consider what happens then to a signed value whose sign-bit is set
(indicating a negative value) that is cast to an unsigned type of the same
basic type. The value of the bits do not change in this instance, only
their meaning. For example:

int sVal;
unsigned int uVal;

sVal = -1;
uVal = sVal;

Here the compiler casts sVal to an unsigned int following the general rules
above. Assuming the types are the same number of bytes in size uVal will be
a very large positive number. Why ?

The answer lies in how negative numbers are represented and good old binary
math. To hold a value of -1 and be able to perform binary math on it (don't
forget computers only use binary) all the bits in the number are set,
including the sign-bit. If the number of bytes required to store an int on
our imaginary machine is 4 and the sign-bit is the left most one, then -1
would be represented by a binary number consisting of 32 1s (0xffffffff in
hexadecimal). Casting this to an unsigned int leaves the number and values
of the bits the same, but now there is no sign-bit. The left most bit is
treated as part of the number. So we still have 32 1s, so the resulting
value is 2^32 - 1, a fairly large positive number. Any negative number when
cast to an unsigned type will miraculously change its value. Unfortunately
the degree to which the value will change can not be predicted in general
terms because it is dependent on the hardware and the number of bytes used
to hold values of differing types.

There is discussion here:
http://bytes.com/groups/c/215930-question-integral-promotion-signed-unsigned

I could be very wrong in relation to the standard. But if I am, I am in
good company.

Keith Thompson · May 6, 2009

Tim Harig said:
from: http://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe

There are several comments. One says the results are implementation
dependent. He links to what seems to be the K&R book, but the link leads to
a 404.

You can't believe everything you read on the web. (Or on Usenet, for
that matter).

Conversion from signed to unsigned does *not* necessarily keep the
same representation. It happens to do so on a 2's-complement system
with no padding bits.

Here's what the standard says (C99 6.3.1.3):

When a value with integer type is converted to another integer
type other than _Bool, if the value can be represented by the
new type, it is unchanged.

Otherwise, if the new type is unsigned, the value is converted
by repeatedly adding or subtracting one more than the maximum
value that can be represented in the new type until the value
is in the range of the new type.

Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

If signed integers are represented using something other
than 2's-complement, the implementation *must* still make
signed-to-unsigned conversion work as specified -- which means that
it can't just copy or reinterpret the representation. For
example, (unsigned)-1 == UINT_MAX, regardless of the representation.
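To see that this rule never consults the bit pattern, here is a minimal sketch that applies the second paragraph literally, by repeatedly adding or subtracting UCHAR_MAX + 1, and can be checked against an ordinary conversion. The function name `to_uchar_by_rule` is hypothetical, `long long` is used only so the intermediate values cannot overflow, and the loop is meant for small inputs only.

```c
#include <limits.h>

/* Apply C99 6.3.1.3p2 literally for the target type unsigned char:
   add or subtract (UCHAR_MAX + 1) until the value is in range. */
long long to_uchar_by_rule(long long v)
{
    const long long modulus = (long long)UCHAR_MAX + 1;

    while (v < 0)
        v += modulus;
    while (v > UCHAR_MAX)
        v -= modulus;
    return v;   /* always in 0..UCHAR_MAX */
}
```

With an 8-bit `unsigned char`, `to_uchar_by_rule(-64)` is 192, the same as `(unsigned char)-64`, on any conforming implementation.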

(Nearly all implementations these days use 2's-complement, so the
distinction rarely matters.)

The latest draft of the C standard is available at
<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf> if you
want to see for yourself.

I've just posted a comment on stackoverflow.com.

[big snip]

I could be very wrong in relation to the standard. But if I am, I am in
good company.

Yes, it seems to be a widespread misconception.

Ben Bacarisse · May 7, 2009

Tim Harig said:
Tim Harig said:

Here are functions to convert between a native int_least64_t and a 9-bit
two's complement value in two bytes, with the second byte having the sign
in the lowest bit and the upper bits zero (and ignored when reading):

void to_9bit( unsigned char out [2], int_least64_t in )
{
    unsigned u = in; // convert to two's complement
    out [0] = u & 0xFF;
    out [1] = u >> 8 & 0x01;
}

1's complement:

out[0]
-40 = 11111111111111111111111110111111 = -64d
&FF = 00000000000000000000000011111111 = 255d
=BF = 00000000000000000000000010111111 = 191d

out[1]
-40 = 11111111111111111111111110111111 = -64d
>>8 = 00000000000000000000000000001000 = 08d
=-0 = 11111111111111111111111111111111 = -0d * see above
&01 = 00000000000000000000000000000001 = 01d
=01 = 00000000000000000000000000000001 = 01d

110111111 = -65 Close but no cigar, the correct value should
be 111000000.

You've forgotten what 'unsigned u = in;' does. Blargg commented it so
as to be clear. C's arithmetic conversions are defined by value not
bit pattern.

This is what I have found:

I'll go through them all but there is a definitive reference -- the C
standard and it specifies what happens.

from: http://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe

Best ignore that page. A set of random people comment on what some C
code does and then vote on the results. Only the bottom ranked answer
is correct. That one is new though and I am pretty sure I know who
wrote it. If you knew to believe it (if indeed it was there when you
looked) then you'd have the answer.

There are several comments. One says the results are implementation
dependent. He links to what seems to be the K&R book, but the link leads to
a 404.

I have K&R and it says the same as the standard: the conversion is to the
least unsigned integer congruent to the signed value modulo 2^wordsize.

This is a great source which also discusses 1s complement vs. 2s
complement.

from: http://www.codeguru.com/cpp/sample_chapter/article.php/c11111

That is flat out wrong. The page is not bad though it has a few
mistakes. It fails to warn people about padding bits in integer types
for example and is plain wrong (again) about unsigned to signed
conversions. In fact the conversions bit is pretty awful.

Yes, but fails to say which one! The rule is exact and not
implementation-defined. They could actually tell you!

I hate to reference Microsoft but... Have a look here:

http://msdn.microsoft.com/en-us/library/xbfs6fd4(VS.80).aspx

This seems self-explanatory.

They are describing what their compiler does and that conforms to the
standard. They don't say anything about what you can rely on from the
C language. Everything they say is system specific but as a C
implementor they are permitted to do that.

Nowhere do they say what happens on non 2's complement systems which
is what you were considering.

from: http://docs.sun.com/app/docs/doc/802-5776/6i9gsvj0b?l=Ja&a=view

Again, I think that is assuming a 2's complement system. If it is
intended as a general description of C, it is wrong, but I think that
would be unfair. It seems to be about specific systems and
compilers.

from: http://www.fiendish.demon.co.uk/c/casting.html#signedvals

I think this is purporting to be about C in general so this one is
plain wrong. Everything it says is correct on 2's complement
machines, but if you were to take it literally you'd make a mistake
about the others.

It is also sadly vague. Neither of the two general sources actually
tells you the simple rule: the value is reduced modulo 2^(number of
value bits). I get the feeling they were written by people who've been
a bit perplexed and have just got out of the state of perplexity but
have not grasped the whole picture.

There is discussion here:
http://bytes.com/groups/c/215930-question-integral-promotion-signed-unsigned

I could be very wrong in relation to the standard. But if I am, I am in
good company.

Well the first answer there from Dan Pop is (unsurprisingly) correct
and it clears that matter up. In any discussion the trouble is who do
you believe? In this case you read the standard (only paragraphs 1
and 2 apply to this case we are considering):

6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer
type other than _Bool, if the value can be represented by the new
type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value
that can be represented in the new type until the value is in the
range of the new type.49)

3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or
an implementation-defined signal is raised.

None of this depends on whether negative numbers are stored using 1's
complement, sign-and-magnitude or 2's complement -- the result depends
only on the maximum value of the unsigned type.

Ben Bacarisse · May 7, 2009

Keith Thompson said:
I've just posted a comment on stackoverflow.com.

I was surprised to see a full and correct answer on a page so riddled
with misconceptions! I failed (at first) to spot the time the comment
was added so I was puzzled that it was rated so low.
