Two's Complement, Serialization, etc.

Beej Jorgensen · Sep 9, 2009

Here's what I'm trying to do: portably serialize signed 16-bit integers.
I'd like some ideas for improvement here.

1 void pack16(unsigned char buf[2], int16_t v)
2 {
3 uint16_t v2 = v;
4 buf[0] = v2 >> 8;
5 buf[1] = v2;
6 }

7 int16_t unpack16(unsigned char buf[2])
8 {
9 uint16_t v2 = buf[0] << 8 | buf[1];
10
11 if (v2 > 0x7fffu) { // check for negative number
12 return -(0xffffu - v2 + 1u);
13 }
14
15 return v2;
16 }

Comments/Questions:

1. I believe the direct assignment to the unsigned variable in pack16()
line 3 should give us the two's complement negative by C99-6.3.1.3p2.

2. By the same paragraph, I think "&0xff"s on lines 4 and 5 are
unnecessary.

3. Are casts required on line 12 for any of the constants? I'm thinking
the answer is "no" by C99-6.4.4.1p5.

4. Are casts required on the return values? Or anywhere on the
expression on line 12? I'm thinking no, but can't find a passage to
cite.

5. In unpack16(), I cannot portably assign (buf[0]<<8)|buf[1] into an
int16_t as it might overflow (C99-6.3.1.3p3). And in pack16(), I
cannot portably assign v>>8 into buf[0] if v is negative
(C99-6.5.7p5).

6. In the code, I assume the implementation supports int16_t and
uint16_t, which it is not required to do. But I don't see why the
above code shouldn't work with the "least16" variants.

7. Endianness issues? sizeof-related issues? sizeof-related issues if
least16-type variables were used? CHAR_BIT-related issues? Integer
promotion-related issues?

8. Any way to take advantage of the fact that int16_t is two's
complement in unpack16()?

Anything else? Something feels wrong or oversimplified or
overcomplicated or lame.

-Beej
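
On questions 4 and 8, one possible way to avoid converting an out-of-range unsigned value to int16_t is to do the sign adjustment in a wider signed type. The following is only a minimal sketch, not a vetted implementation: the name unpack16_alt is hypothetical, it assumes int16_t exists, and it relies on long being at least 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical alternative to unpack16(): rebuild the signed value in a
 * long, which can always hold every value from -32768 to 65535. */
int16_t unpack16_alt(const unsigned char buf[2])
{
    /* Mask each byte to 8 bits in case CHAR_BIT > 8, and widen before
     * shifting so the shift never happens in a 16-bit signed int. */
    long v = (long)(((unsigned long)(buf[0] & 0xffu) << 8) | (buf[1] & 0xffu));

    if (v > 0x7fffL) {   /* top bit set: the encoded value is negative */
        v -= 0x10000L;   /* map 0x8000..0xffff onto -32768..-1 */
    }

    return (int16_t)v;   /* v is in range here, so the value is preserved */
}
```

With this sketch the bytes 0xff 0xfe decode to -2 and 0x80 0x00 decode to -32768, with no out-of-range conversion to a signed type along the way.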

Ben Bacarisse · Sep 9, 2009

Beej Jorgensen said:
Here's what I'm trying to do: portably serialize signed 16-bit integers.
I'd like some ideas for improvement here.

1 void pack16(unsigned char buf[2], int16_t v)
2 {
3 uint16_t v2 = v;
4 buf[0] = v2 >> 8;
5 buf[1] = v2;
6 }

7 int16_t unpack16(unsigned char buf[2])
8 {
9 uint16_t v2 = buf[0] << 8 | buf[1];
10
11 if (v2 > 0x7fffu) { // check for negative number
12 return -(0xffffu - v2 + 1u);
13 }
14
15 return v2;
16 }

Comments/Questions:

Anything else? Something feels wrong or oversimplified or
overcomplicated or lame.

One thing that may be useful. I've tested code that does this sort of
thing using signed bit fields of odd sizes. The idea is you define

struct char9 { signed int v: 9; };
struct int17 { signed int v:17; };

and with some macro magic you can (on some compilers) try out signed
9-bit chars and 17-bit ints.
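
As a rough illustration of the bit-field idea (without the macro magic, which is not shown in this thread), here is a small sketch. The helper name roundtrip17 is hypothetical, and the code assumes int is wider than 17 bits and that negative values use two's complement.

```c
/* Same declaration as in the post: a signed field exactly 17 bits wide. */
struct int17 { signed int v : 17; };

/* Hypothetical helper: store x in the 17-bit field and read it back.
 * Values in -65536..65535 survive unchanged on a two's complement
 * implementation; anything outside that range is converted in an
 * implementation-defined way. */
long roundtrip17(long x)
{
    struct int17 s;
    s.v = (int)x;
    return s.v;
}
```

Calling roundtrip17(65535) and roundtrip17(-65536) should return the arguments unchanged under those assumptions, which makes the field usable as a stand-in for a 17-bit int when testing range handling.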

Beej Jorgensen · Sep 9, 2009

Eric Sosman said:
Beej said:

1 void pack16(unsigned char buf[2], int16_t v)
2 {
3 uint16_t v2 = v;
4 buf[0] = v2 >> 8;
5 buf[1] = v2;
6 }

7 int16_t unpack16(unsigned char buf[2])
8 {
9 uint16_t v2 = buf[0] << 8 | buf[1];
10
11 if (v2 > 0x7fffu) { // check for negative number
12 return -(0xffffu - v2 + 1u);
13 }
14
15 return v2;
16 }

2. By the same paragraph, I think "&0xff"s on lines 4 and 5 are
unnecessary.

Unnecessary in line 4, but required in line 5 if CHAR_BIT > 8.

I wonder whether, since this is for networking where all bytes are
octets, it wouldn't be better just to use uint8_t for the individual
bytes and be done with it. Sorry, CHAR_BIT==9!

Portable representations of binary data have received a lot of
study, and you might consider adopting an existing solution instead
of rolling your own freshly-smoothed wheel ...

I agree. I actually recommend a couple of them (which would, aside from
being correct, probably also be faster), but people are always curious,
so I like to provide them a simple example of what kind of mechanisms
are used in cases like this.

Packing up multibyte ints into an array is a fairly simple example, but
recently someone took the stuff and had it not work with negative
numbers on a different architecture, and so now I am compelled to make
it righter--there are several portability flaws in the current code.
(This is basically Exercise 9-1 in POP).

Or I could just go the other way and just make everything unsigned. But
where's the fun in that?

Thanks for the feedback!

-Beej
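
For what it's worth, the CHAR_BIT > 8 point above could be handled with explicit masks. This is just a sketch under the assumption that each array element should carry exactly one octet; the name pack16_masked is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical variant of pack16() that stores exactly 8 bits per element,
 * even where unsigned char is wider than 8 bits. */
void pack16_masked(unsigned char buf[2], int16_t v)
{
    uint16_t v2 = (uint16_t)v;     /* reduced modulo 65536 (C99 6.3.1.3p2) */

    buf[0] = (v2 >> 8) & 0xffu;    /* high octet; mask is redundant here */
    buf[1] = v2 & 0xffu;           /* low octet; mask matters if CHAR_BIT > 8 */
}
```

The mask on the high octet is kept only for symmetry, since v2 >> 8 cannot exceed 0xff for a 16-bit value.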

Beej Jorgensen · Sep 9, 2009

Ben Bacarisse said:
struct char9 { signed int v: 9; };
struct int17 { signed int v:17; };

and with some macro magic you can (on some compilers) try out signed
9-bit chars and 17-bit ints.

Oh, cool!

-Beej

Mark · Sep 10, 2009

Eric said:
1 void pack16(unsigned char buf[2], int16_t v)
2 {
3 uint16_t v2 = v;
4 buf[0] = v2 >> 8;
5 buf[1] = v2;
6 }

1. I believe the direct assignment to the unsigned variable in
pack16() line 3 should give us the two's complement negative by
C99-6.3.1.3p2.

The conversion "stuff" still confuses me a lot. Am I right assuming that in

int16_t v = 3;
uint16_t v2 = v;

the rules of integer promotions are applied? So, 'v' in this example is
promoted to uint16_t, because int16_t can't represent all the values of the
original type and should be converted to 'unsigned'. But it applies only to
'int' and 'unsigned int' operands.

In truth, it can't give a negative of any description, since
an unsigned type cannot express a negative value. However, you're
right that the bit pattern in `v2' will be the same that `v' would
have if `v' used two's complement (whether it actually does or not).

... and the bit pattern in `v2' remains because 6.3.1.1p3 says so, is that
correct?

Eric Sosman · Sep 10, 2009

Mark said:
Eric said:

1 void pack16(unsigned char buf[2], int16_t v)
2 {
3 uint16_t v2 = v;
4 buf[0] = v2 >> 8;
5 buf[1] = v2;
6 }

1. I believe the direct assignment to the unsigned variable in
pack16() line 3 should give us the two's complement negative by
C99-6.3.1.3p2.

The conversion "stuff" still confuses me a lot. Am I right assuming that in

int16_t v = 3;
uint16_t v2 = v;

the rules of integer promotions are applied? So, 'v' in this example is
promoted to uint16_t, because int16_t can't represent all the values of
the original type and should be converted to 'unsigned'. But it applies
only to 'int' and 'unsigned int' operands.

No promotions appear to be called for, but there are two
conversions (promotions are conversions, but not all conversions
are promotions):

- The `int' value 3 is converted from `int' to `int16_t'.
Since 3 is within the range of the latter type, nothing
peculiar happens: `v' is initialized with the value three,
type `int16_t'.

- The `int16_t' value of `v' is converted to `uint16_t'.
Since the value of `v' (three) is within the range of
`uint16_t', nothing weird happens and `v2' is initialized
with the value three, type `uint16_t'.

... and the bit pattern in `v2' remains because 6.3.1.1p3 says so, is
that correct?

I don't think so, because that paragraph concerns the integer
promotions, and no promotions occur. I think the applicable text
is 6.3.1.3p1, which describes integer conversions where the "target"
type can represent the value of the "source."

The other applicable pieces are 6.2.6.2p1, which describes the
representation of unsigned integer types, and 7.18.1.1p2, which
forbids padding bits in exact-width unsigned integers.

(And 7.18.1.1p1 requires padding-free two's complement for
the exact-width signed integers, making my parenthetical remark
vacuous. I've already dope-slapped myself for that blunder; see
elsethread.)
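
The two cases of 6.3.1.3 mentioned in this exchange can be seen in a tiny sketch. The function name to_u16 is hypothetical, and the code assumes the exact-width types are available.

```c
#include <stdint.h>

/* Hypothetical helper: convert an int16_t to uint16_t the way the
 * initialization `uint16_t v2 = v;` does.
 *  - In-range values such as 3 are unchanged (6.3.1.3p1).
 *  - Negative values have 65536 added until they fit (6.3.1.3p2),
 *    so -1 becomes 65535 and -32768 becomes 32768. */
uint16_t to_u16(int16_t v)
{
    return (uint16_t)v;
}
```

So to_u16(3) is 3, while to_u16(-1) is 0xffff, which is the same bit pattern a 16-bit two's complement -1 has.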

Mark · Sep 10, 2009

Eric said:
int16_t v = 3;
uint16_t v2 = v;

[skip]
No promotions appear to be called for, but there are two
conversions (promotions are conversions, but not all conversions
are promotions):

I carefully read the Standard; it says:
"The integer promotions are applied only: as part of the usual arithmetic
conversions, to certain argument expressions, to the operands of the unary
+, -, and ~ operators, and to both operands of the shift operators, as
specified by their respective subclauses"

I understand that promotion occurs for function arguments, for operands of
+/-/~ and the shift operators. Any other "converting activity" would be a
*conversion*. Is that the more correct wording?

- The `int' value 3 is converted from `int' to `int16_t'.
Since 3 is within the range of the latter type, nothing
peculiar happens: `v' is initialized with the value three,
type `int16_t'.

- The `int16_t' value of `v' is converted to `uint16_t'.
Since the value of `v' (three) is within the range of
`uint16_t', nothing weird happens and `v2' is initialized
with the value three, type `uint16_t'.

Do the conversions you described have anything to do with the usual
arithmetic conversions? Also I don't understand why 6.3.1.3 is separate
from 6.3.1.8; their purposes are very close.

Eric Sosman · Sep 10, 2009

Mark said:
Eric said:

int16_t v = 3;
uint16_t v2 = v;

[skip]
No promotions appear to be called for, but there are two
conversions (promotions are conversions, but not all conversions
are promotions):

I carefully read the Standard; it says:
"The integer promotions are applied only: as part of the usual
arithmetic conversions, to certain argument expressions, to the operands
of the unary +, -, and ~ operators, and to both operands of the shift
operators, as specified by their respective subclauses"

I understand that promotion occurs for function arguments, for operands
of +/-/~ and the shift operators. Any other "converting activity" would be
a *conversion*. Is that the more correct wording?

The broad topic is "Conversions" (6.3). All these things
are conversions. Some subsets of all the possible conversions
are given special names:

1) The conversions described in 6.3.1.1p2 are called the
"integer promotions." They are still conversions, just
a specially designated subset of all conversions.

2) The integer promotions plus float-to-double are called
the "default argument promotions" (6.5.2.2p6). They,
too, are still conversions, just a slightly larger
designated subset.

3) The "usual arithmetic conversions" (6.3.1.8) is more
than just a subset of all possible conversions; it's
accompanied by an algorithm that chooses which conversions
to apply in particular circumstances. If we ignore the
algorithm, though, what's left are the conversions it can
choose from, and those are a subset of all conversions.

Do the conversions you described have to do with usual arithmetic
conversion?

Not as far as I can see. The purpose of the U.A.C. is to
find "common ground" for the operands of an operator, and the
examples have no operators (they are initializations). Even if
we think of them as assignments (initialization occurs "as if"
by assignment), the assignment operator doesn't use the U.A.C.
(see 6.5.16.1). So I don't think the U.A.C. enter in at all.

Also I don't understand why 6.3.1.3 is separate from
6.3.1.8, their purpose is very close.

6.3.1.3 describes the effect of conversion between integers
of various types; it says "what happens" when conversion is
performed. 6.3.1.8 describes some conversions that are done
automatically; it says "which" conversions are chosen.
6.3.1.8 tells you whether to bake potatoes or brownies;
6.3.1.3 has the brownie recipe.
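
A small sketch may make the distinction concrete. The function names below are invented for illustration; the first shows the integer promotions at work on a unary operator, and the last shows the usual arithmetic conversions choosing a common type for a comparison.

```c
#include <stddef.h>

/* Unary + applies the integer promotions, so the result has the promoted
 * type (int or unsigned int); sizeof reports that type without evaluating
 * the operand. */
size_t size_after_unary_plus(void)
{
    unsigned char c = 0;
    return sizeof (+c);
}

/* Without an operator that promotes, the object keeps its own type. */
size_t size_of_plain_char(void)
{
    unsigned char c = 0;
    return sizeof c;
}

/* The usual arithmetic conversions pick unsigned int as the common type,
 * so -1 is converted to UINT_MAX (by the 6.3.1.3p2 rule) before the
 * comparison, and the result is 0. */
int minus_one_less_than_one_u(void)
{
    int i = -1;
    unsigned int u = 1u;
    return i < u;
}
```

Here size_after_unary_plus() equals sizeof(int), size_of_plain_char() is 1, and minus_one_less_than_one_u() returns 0. A compiler may warn about the signed/unsigned comparison, which is the behavior being demonstrated.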

Jef Driesen · Sep 10, 2009

Eric said:
Unnecessary in line 4, but required in line 5 if CHAR_BIT > 8.

What happens if you try to write a character to disk (or socket, or
anything else that expects 8bit octets) on such a system? Are all bits
written, or only 8 of them? If everything is written, how can you even
communicate with 8bit based systems, given that even the smallest
datatype is larger?

Beej Jorgensen · Sep 11, 2009

Beej Jorgensen said:
7 int16_t unpack16(unsigned char buf[2])
8 {
9 uint16_t v2 = buf[0] << 8 | buf[1];

Looks like I also need to cast these to the result type when it's
possible they don't fit in an int, due to the usual arithmetic
conversions? Is that right?

uint32_t v2 = ((uint32_t)buf[0]<<24) | ((uint32_t)buf[1]<<16) |
(buf[2]<<8) | buf[3];

I get warnings if I don't when messing with 64-bit values on this
machine (Athlon64, gcc, 32-bit ints).

-Beej
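
A possible fully-cast version is sketched below, assuming uint32_t exists. The name unpack_u32 is hypothetical. Every byte is widened before it is shifted, because the shift operators apply the integer promotions to each operand separately, which would otherwise leave buf[n] as a plain int.

```c
#include <stdint.h>

/* Hypothetical helper: assemble four big-endian octets into a uint32_t.
 * Each byte is masked to 8 bits (in case CHAR_BIT > 8) and converted to
 * uint32_t before shifting, so no shift is performed on a signed int. */
uint32_t unpack_u32(const unsigned char buf[4])
{
    return ((uint32_t)(buf[0] & 0xffu) << 24) |
           ((uint32_t)(buf[1] & 0xffu) << 16) |
           ((uint32_t)(buf[2] & 0xffu) << 8)  |
            (uint32_t)(buf[3] & 0xffu);
}
```

For the bytes 0xde 0xad 0xbe 0xef this yields 0xdeadbeef. Casting the last two bytes as well keeps the expression safe where int is only 16 bits wide.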

Beej Jorgensen · Sep 11, 2009

Eric Sosman said:
You're right: You need a cast on buf[0] before shifting it, to
make sure it doesn't promote to a signed 16-bit int.

Gah--thanks!

I'm not sure what "messing with" means. Warnings are at the
compiler's whim; some are important, some are not.

I just included the warnings as evidence I was doing something wrong in
this particular case.

-Beej
