Storing 4 characters using 3 bytes

B

becte

I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

(The 6 bits integers represent characters in range A-Z and 0-9)
 
T

Tristan Miller

Greetings.

I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

Sounds suspiciously like a homework problem, so you're unlikely to get
complete solutions here. Have you read up on C's bit-shifting and bitwise
comparison operators? If not, do so and ask again if you still need help.

Regards,
Tristan
 
M

Mark A. Odell

(e-mail address removed) (becte) wrote in

I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

(The 6 bits integers represent characters in range A-Z and 0-9)

How about the untested:

#define PACKED_INT_RANGE 0x3F /* 6-bits */

/* pPacked had better be 3 bytes or more
*/
int packInts(int first, int second, int third, int fourth, char *pPacked)
{
int failures = !0;

if ((first & PACKED_INT_RANGE) == first
&& (second & PACKED_INT_RANGE) == second
&& (third & PACKED_INT_RANGE) == third
&& (fourth & PACKED_INT_RANGE) == fourth)
{
pPacked[0] = (first & PACKED_INT_RANGE) << 2;
pPacked[0] |= (second & PACKED_INT_RANGE) >> 4;
pPacked[1] = (second & PACKED_INT_RANGE) << 4;
pPacked[1] |= (third & PACKED_INT_RANGE) >> 2;
pPacked[2] = (third & PACKED_INT_RANGE) << 6;
pPacked[2] |= (fourth & PACKED_INT_RANGE) >> 0;

failures = 0;
}

return failures;
}
 
I

Ivan Vecerina

becte said:
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

(The 6 bits integers represent characters in range A-Z and 0-9)

You could store the 4 6-bit values contiguously into a 32-bit
unsigned long, then split these 24-bit value back into 3 bytes.

I think you should give it a try, and post some code if you
need additional help. All you need is the bit-shift and
bit-or/bit-and operators ( << >> | & ).


Cheers,
Ivan
 
E

Eric Sosman

becte said:
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

"Straightforward" suffices:

o1 = (c1 << 2) | (c2 >> 4);
o2 = ((c2 & 0xF) << 4) | (c3 >> 2);
03 = ((c3 & 0x3) << 2) | c4;

For "clever" you might try something like this (requires
an `int' of 25 or more bits):

o1 = (c1 << 18) | (c2 << 12) | (c3 << 6) | c4;
o3 = o1 & 0xFF;
o1 >>= 8;
o2 = o1 & 0xFF;
o1 >>= 8;
 
C

Christopher Benson-Manica

Mark A. Odell said:
int packInts(int first, int second, int third, int fourth, char *pPacked)
int failures = !0;

Just FMI, this assumes that !0 isn't a trap representation and that CHAR_BIT
is 8, right? Does it do anything else non-portable?
 
J

Joona I Palaste

Just FMI, this assumes that !0 isn't a trap representation and that CHAR_BIT
is 8, right? Does it do anything else non-portable?

Umm... what do you think !0 equals?

--
/-- Joona Palaste ([email protected]) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"When a man talks dirty to a woman, that's sexual harassment. When a woman talks
dirty to a man, that's 14.99 per minute + local telephone charges!"
- Ruben Stiller
 
N

Nudge

becte said:
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

How about bit fields?
 
C

cody

typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED
 
C

Chris Torek

typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED

Even once the syntax errors are fixed, the code simply does not
work at all on the SPARC. Try it and see.

(You will find that c[0] is never changed. This is a hint as to
what is going wrong. :) )
 
I

Irrwahn Grausewitz

cody said:
typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED

If you'd followed the thread "How to extract bytes from a long?"
you'd already know this is calling for nasal demons.

Regards
 
C

cody

typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED

Even once the syntax errors are fixed, the code simply does not
work at all on the SPARC. Try it and see.

Missing semicolons? Sorry.
(You will find that c[0] is never changed. This is a hint as to
what is going wrong. :) )

Why? Structure padding? I don't think the struct BITS will be padding bits
inserted. added at the end maybe but that is not of interest here.
 
C

CBFalconer

becte said:
I need to use three bytes to store four 6-bit integers (4 * 6 = 3 * 8)
like this
11111122|22223333|33444444
Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
and the output is int o1,o2,o3, range 0 .. 2^8-1
How to do this in a clever way?

(The 6 bits integers represent characters in range A-Z and 0-9)

You can do better. You can store three of those chars into 16
bits. You should include at least a blank, for 37 chars each
represented by a value in 0 through 39

You store it by: i = 40 * (40 * c3 + c2) + c1. You extract by
dividing by 40 and taking the remainder.

This is a very old mechanism, and is at the root of 6 char
filenames in bygone years. Look up Rad50 (the 50 is in octal).
 
C

Chris Torek

[regarding a union of "char" array and bitfields, I wrote:]
Actually, I picked the wrong machine; I now think it *would* work
(depending on desired output) on the SPARC. Ones on which it would
likely fail badly are certain Motorola 680x0 implementations.

Missing semicolons? Sorry.

That, and the names "_1" through "_4" are dubious. (I would have
to consult the C standards to see whether those names are reserved
to users in the structure-member namespace. Instead of using names
whose form requires careful scrutiny of a C standard, why not use
names that are clearly off-limits to the implementor?)
Why? Structure padding? I don't think the struct BITS will be padding bits
inserted. added at the end maybe but that is not of interest here.

For discussion purposes, here is the proposed code, turned into
compilable C:

void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
union {
unsigned char c[3];
struct {
unsigned int one:6, two:6, three:6, four:6;
} bits;
} u;
u.bits.one = in1;
u.bits.two = in2;
u.bits.three = in3;
u.bits.four = in4;

out[0] = u.c[0];
out[1] = u.c[1];
out[2] = u.c[2];
}

Now, suppose we have a fairly typical (in today's terms) machine
with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".
If you print out the size of the union, you will find that it
requires four bytes, not three.

These four bytes are laid out in memory in the following way,
as required by the C standards:

a) when viewed as array-of-char:
at offset 0: u.c[0] (one byte)
at offset 1: u.c[1] (one byte)
at offset 2: u.c[2] (one byte)
at offset 3: <unnamed> (one byte of padding)
(remember that arrays *must* be contiguous)

b) when viewed as unnamed structure with bitfields:
storage unit at offset 0: <four bytes wide, unspecified layout>

Part (b) is where the problem occurs. The layout of the bitfields
within the storage unit is unspecified. Suppose the compiler
chooses to start at "bit 0" of a 32-bit big-endian word. In this
case, u.bits.one occupies bits 0 through 5 inclusive of that 32-bit
word, which live in the unnamed padding byte at offset 3 in the
array in u.c. u.bits.two occupies bits 6 through 11, which uses
two more bits in the unnamed byte and then four bits in the array
element u.c[3]. (At least one Motorola 68020 C compiler handles
-- or handled -- bitfields this way; I used it in the early Sun
days.)

As it happens, SPARC compilers in big-endian mode (which is to say
most of them) are "encouraged" to allocate bitfields starting at
bit 31 and working down towards bit 0. This means that u.bits.one
occupies bits 31 through 26 inclusive, which live in the array
element u.c[0].

On Intel x86 compilers, the situation is reversed: the usual
convention is to allocate bitfields starting at bit 0 and working
up towards bit 31. As it happens, Intel x86 CPUs are little-endian
-- so this means that u.bits.one occupies bits 0 through 5 inclusive,
which are also bits 0 through 5 of u.c[0]. u.bits.two occupies
bits 6 through 11, which are in bits 6 and 7 of u.c[0] (for the
two low-order bits of u.bits.two) and then bits 0 through 3 of
u.c[1].

Of course, there is nothing but convention to prevent an x86 C
compiler from allocating bits from 31 down to 0, in which case
u.c[0] would not map to any of the bits in u.bits. In any case,
the output on the x86 seems a bit peculiar -- the bits are spread
over the bytes in an interesting manner:

#include <stdio.h>

int main(void) {
unsigned char buf[3];
pack(buf, 0x00, 0x3f, 0x00, 0x00);
printf("result: buf[0..3] = 0x%2.2x%2.2x%2.2x\n",
buf[0], buf[1], buf[2]);
return 0;
}

produces:

result: buf[0..3] = 0xc00f00

Packing the tuple <0x00, 0x03, 0x00, 0x00> results in 0xc00000, so
we see that, indeed, buf[0] contains as its two *uppermost* bits
the two *lowermost* bits of the 2nd value.

(The result on the big-endian SPARC is probably closer to what was
expected.)
 
D

Dan Pop

In said:
typedef union
{
char chars[3];
struct
{
int _1 : 6;
int _2 : 6;
int _3 : 6;
int _4 : 6;
} BITS

} PACKED

Even once the syntax errors are fixed, the code simply does not
work at all on the SPARC. Try it and see.

Missing semicolons? Sorry.
(You will find that c[0] is never changed. This is a hint as to
what is going wrong. :) )

Why? Structure padding? I don't think the struct BITS will be padding bits
inserted. added at the end maybe but that is not of interest here.

Clue: in what order do bit fields get allocated inside the storage unit?
What if this storage unit happens to be a 32-bit entity?

10 An implementation may allocate any addressable storage unit large
enough to hold a bit-field. If enough space remains,
a bit-field that immediately follows another bit-field in
a structure shall be packed into adjacent bits of the same
unit. If insufficient space remains, whether a bit-field that
does not fit is put into the next unit or overlaps adjacent
units is implementation-defined. The order of allocation of
bit-fields within a unit (high-order to low-order or low-order
to high-order) is implementation-defined. The alignment of the
addressable storage unit is unspecified.

Dan
 
C

CBFalconer

Chris said:
[regarding a union of "char" array and bitfields, I wrote:] .... snip ...

For discussion purposes, here is the proposed code, turned into
compilable C:

void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
union {
unsigned char c[3];
struct {
unsigned int one:6, two:6, three:6, four:6;
} bits;
} u;
u.bits.one = in1;
u.bits.two = in2;
u.bits.three = in3;
u.bits.four = in4;

out[0] = u.c[0];
out[1] = u.c[1];
out[2] = u.c[2];
}
... snip about problems etc. ...

However, for this problem, simply work with values using
multiplication and division and all the problems and
non-portabilities go away. See my post elsethread.
 
B

becte

Eric Sosman said:
"Straightforward" suffices:

o1 = (c1 << 2) | (c2 >> 4);
o2 = ((c2 & 0xF) << 4) | (c3 >> 2);
03 = ((c3 & 0x3) << 2) | c4;

I go for the straightforward solution.
BTW, there is an error in the last line, should be
o3 = ((c3 & 0x3) << 6) | c4; // shift 6, not 2
 
C

cody

Actually, I picked the wrong machine; I now think it *would* work
(depending on desired output) on the SPARC. Ones on which it would
likely fail badly are certain Motorola 680x0 implementations.

Why? byte order doesn't matter here.You put 3 bytes in and get 4 bytes out.
That, and the names "_1" through "_4" are dubious. (I would have
to consult the C standards to see whether those names are reserved
to users in the structure-member namespace. Instead of using names
whose form requires careful scrutiny of a C standard, why not use
names that are clearly off-limits to the implementor?)

OK, I admit that _1 is a stupid identifier.
Why? Structure padding? I don't think the struct BITS will be padding bits
inserted. added at the end maybe but that is not of interest here.

For discussion purposes, here is the proposed code, turned into
compilable C:

void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
union {
unsigned char c[3];
struct {
unsigned int one:6, two:6, three:6, four:6;
} bits;
} u;
u.bits.one = in1;
u.bits.two = in2;
u.bits.three = in3;
u.bits.four = in4;

out[0] = u.c[0];
out[1] = u.c[1];
out[2] = u.c[2];
}

Now, suppose we have a fairly typical (in today's terms) machine
with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".
If you print out the size of the union, you will find that it
requires four bytes, not three.

These four bytes are laid out in memory in the following way,
as required by the C standards:

a) when viewed as array-of-char:
at offset 0: u.c[0] (one byte)
at offset 1: u.c[1] (one byte)
at offset 2: u.c[2] (one byte)
at offset 3: <unnamed> (one byte of padding)
(remember that arrays *must* be contiguous)

b) when viewed as unnamed structure with bitfields:
storage unit at offset 0: <four bytes wide, unspecified layout>

Part (b) is where the problem occurs. The layout of the bitfields
within the storage unit is unspecified. Suppose the compiler
chooses to start at "bit 0" of a 32-bit big-endian word. In this
case, u.bits.one occupies bits 0 through 5 inclusive of that 32-bit
word, which live in the unnamed padding byte at offset 3 in the
array in u.c. u.bits.two occupies bits 6 through 11, which uses
two more bits in the unnamed byte and then four bits in the array
element u.c[3]. (At least one Motorola 68020 C compiler handles
-- or handled -- bitfields this way; I used it in the early Sun
days.)

As it happens, SPARC compilers in big-endian mode (which is to say
most of them) are "encouraged" to allocate bitfields starting at
bit 31 and working down towards bit 0. This means that u.bits.one
occupies bits 31 through 26 inclusive, which live in the array
element u.c[0].

On Intel x86 compilers, the situation is reversed: the usual
convention is to allocate bitfields starting at bit 0 and working
up towards bit 31. As it happens, Intel x86 CPUs are little-endian
-- so this means that u.bits.one occupies bits 0 through 5 inclusive,
which are also bits 0 through 5 of u.c[0]. u.bits.two occupies
bits 6 through 11, which are in bits 6 and 7 of u.c[0] (for the
two low-order bits of u.bits.two) and then bits 0 through 3 of
u.c[1].

Of course, there is nothing but convention to prevent an x86 C
compiler from allocating bits from 31 down to 0, in which case
u.c[0] would not map to any of the bits in u.bits. In any case,
the output on the x86 seems a bit peculiar -- the bits are spread
over the bytes in an interesting manner:

#include <stdio.h>

int main(void) {
unsigned char buf[3];
pack(buf, 0x00, 0x3f, 0x00, 0x00);
printf("result: buf[0..3] = 0x%2.2x%2.2x%2.2x\n",
buf[0], buf[1], buf[2]);
return 0;
}

produces:

result: buf[0..3] = 0xc00f00

Packing the tuple <0x00, 0x03, 0x00, 0x00> results in 0xc00000, so
we see that, indeed, buf[0] contains as its two *uppermost* bits
the two *lowermost* bits of the 2nd value.

(The result on the big-endian SPARC is probably closer to what was
expected.)

Who says that the size of the struct will be 32bit? The standard just says:
[#9] An implementation may allocate any addressable storage
unit large enough to hold a bit-field. If enough space
remains, a bit-field that immediately follows another bit-
field in a structure shall be packed into adjacent bits of
the same unit. If insufficient space remains, whether a
bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined. The
order of allocation of bit-fields within a unit (high-order
to low-order or low-order to high-order) is implementation-
defined. The alignment of the addressable storage unit is
unspecified.Additionally you can adjust the structure padding with
compiler options.

union {
unsigned char c[3*4];
struct {
unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
unsigned int bit8:6, bit2:9, bit3:10, bit11:6;
unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
} bits;
} u;

This should solve the problem because the union is now a multiple of 32 bits
in size so there will be no internal padding and byte order won't matter.
 
C

Chris Torek

[I wrote, in part:]
Who says that the size of the struct will be 32bit?

I did, right in the text you quoted (and I retained) above. And
it is -- on today's typical implementations. It does not *have*
to be, but it *is*, right now. (On a few implementations the
storage-units are 16 bits, and on a few more they are 64 bits, and
there are no doubt some oddballs with 24 bits and the like out
there today. But 32 is pretty common at the moment.)
The standard just says:
[#9] An implementation may allocate any addressable storage
unit large enough to hold a bit-field. If enough space
remains, a bit-field that immediately follows another bit-
field in a structure shall be packed into adjacent bits of
the same unit. If insufficient space remains, whether a
bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined. The
order of allocation of bit-fields within a unit (high-order
to low-order or low-order to high-order) is implementation-
defined. The alignment of the addressable storage unit is
unspecified.

Yes, this is what it says.
Additionally you can adjust the structure padding with compiler options.

(If they exist.)
union {
unsigned char c[3*4];
struct {
unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
unsigned int bit8:6, bit2:9, bit3:10, bit11:6;
unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
} bits;
} u;

This should solve the problem because the union is now a multiple of 32 bits
in size so there will be no internal padding and byte order won't matter.

Will it? (Assume we have fixed the typos above -- bit2:9 and
bit3:10 should be bit9:6 and bit10:6. I let a compiler find this.)

"If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent
bits of the same [storage] unit."

Suppose storage-units are 32 bits wide. Then 6+6+6+6+6 = 30, so
that the fields name "bit0" through "bit4" inclusive are packed
into the first 32-bit storage unit. The field named "bit5" now
needs another 6 bits, which is greater than the remaining space
(2 bits), so the Standard's wording continues:

"If insufficient space remains," -- and this is indeed the
case -- "whether a bit-field that does not fit is put into
the next unit or overlaps adjacent units is implementation-
defined."

So the implementation must document whether "bit5" is split across
two storage units, or the two leftover bits are skipped and then
"bit5" is placed in the next storage-unit.

Suppose the implementation's documentation says that bitfields are
*not* so split. In this case, "bit5" through "bit9" inclusive go
in the 2nd 32-bit storage-unit, "bit10" through "bit14" in the
third, and "bit15" occupies some implementation-defined portion
of the last 32-bit storage unit.

Thus, the "struct ... bits" member will need four 32-bit storage
units, or 16 bytes (remember that we are still assuming a common
8-bit-byte 32-bit-int machine). This is indeed a multiple of 32
bits -- but hardly the desired multiple, as illustrated by the
following program and its output:

#include <stdio.h>

union u {
unsigned char c[3*4];
struct s {
unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
unsigned int bit8:6, bit9:6, bit10:6, bit11:6;
unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
} bits;
};

int main(void) {
printf("sizeof union u = %lu\n",
(unsigned long)sizeof(union u));
printf("sizeof struct s = %lu\n",
(unsigned long)sizeof(struct s));
printf("sizeof unsigned char [3*4] = %lu\n",
(unsigned long)sizeof(unsigned char [3*4]));
return 0;
}

When compiled with "gcc -ansi -pedantic" and run, this prints:

sizeof union u = 16
sizeof struct s = 16
sizeof c[3*4] = 12

which means that the various bitfields were in fact padded out in
just the way I suggested above. The array, of course, has no
padding.

Ultimately, you *can* do what the original poster asked using unions
and bitfields -- but the result is inherently implementation
dependent, because of all those "implementation-defined" items
above; and it is surprisingly hard to get the right answers.
(Dennis Ritchie once called C's bitfields "a botch and a blemish",
and I think this helps illustrate why.) As a general rule, if you
want to do bitwise computations in C, it all works a lot better if
you use the bitwise operators instead of bitfield members of
structures. The results are far easier to predict -- thus easier
to get right -- and you have much more hope of producing portable,
or even just "mostly portable", code.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top