T Koster
Hi group,
I'm having some difficulty figuring out the most portable way to read 24
bits from a file. This is related to a Base-64 encoding.
The file is opened in binary mode, and I'm using fread to read three
bytes from it. The question, though, is where fread should put them. I
have considered two alternatives, but neither seems like a good idea:
In most cases, the width of a char is 8 bits, so an array of 3 chars
would suffice, but the width of a char is guaranteed to be only *at
least* 8 bits, so the actual number of chars required would be 24 /
CHAR_BIT, rounded up. Since you can't call a rounding function in an
integer constant expression, 3 chars is a good safe buffer size because
it's guaranteed to be at least 24 bits. However, since I need to be able
to divide
those 24 bits into four 6-bit numbers, indices into the char array
become more complicated as the 6-bit numbers do not fall evenly on the
(presumably) 8-bit boundaries that indices in the array would give me.
If the width of a char is not 8 bits, then knowing which indices to look
at and shift/mask is even more difficult. As such, I thought of the
second option.
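To make the index arithmetic of the first option concrete, here is a minimal sketch of the shift/mask step on an array of three unsigned chars. The helper name split_sextets is hypothetical, and the sketch assumes each array element carries exactly one octet of the input (values 0 to 255), whatever CHAR_BIT is on the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split three octets into four 6-bit values.
 * Assumption: each element of in[] holds one octet (0..255); any
 * higher bits of a wider-than-8-bit char are masked off first. */
static void split_sextets(const unsigned char in[3], unsigned char out[4])
{
    unsigned b0, b1, b2;

    assert(in != NULL && out != NULL);

    b0 = in[0] & 0xFFu;
    b1 = in[1] & 0xFFu;
    b2 = in[2] & 0xFFu;

    out[0] = (unsigned char)(b0 >> 2);                         /* top 6 bits of b0 */
    out[1] = (unsigned char)(((b0 & 0x03u) << 4) | (b1 >> 4)); /* 2 bits b0, 4 bits b1 */
    out[2] = (unsigned char)(((b1 & 0x0Fu) << 2) | (b2 >> 6)); /* 4 bits b1, 2 bits b2 */
    out[3] = (unsigned char)(b2 & 0x3Fu);                      /* low 6 bits of b2 */
}
```

Called on the octets 0x4D, 0x61, 0x6E, this should fill out[] with 19, 22, 5 and 46, which are the Base-64 indices usually shown for that input.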
The second option is to make the input buffer a single integer object
that is guaranteed to be at least 24 bits wide: the long int, which is
at least 32 bits wide and so even has 8 bits to spare. fread can safely
write 3 bytes of data into a long int. My only worry is that because a
long int is a multi-byte integer, accessing various parts of it is
dangerous due to endianness considerations. Or is endianness only
relevant to the represented *value* of the multi-byte integer as a
whole? fread doesn't
care about that: it writes three bytes into the address of the long int,
starting at the lowest-positioned byte, but would the shifting/masking
be portable? For example a multi-byte integer constant 0x1234 has a
most-significant byte of value 0x12, but on a big-endian machine would
be stored on the *lowest* memory address of the space it takes up. As
such, the mask required to leave only the *lowest* 6 bits of a 32-bit
integer could be either 0x3F000000 or 0x0000003F depending on
endianness, right? Or are hexadecimal integer constants always stored
as-is? That is, is it the lowest-addressed byte that is written last in
an integer constant, rather than the least significant byte? This
seems counter-intuitive.
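For comparison, here is a minimal sketch of the long-int idea that never lets fread write into the integer's storage. The three octets are read into unsigned chars and then combined arithmetically. In C, shifts and masks operate on the value of an integer, not on its bytes in memory, so 0x3F selects the six least significant bits on any machine. The helper names pack24 and sextet are hypothetical, and the sketch assumes the first octet read should be the most significant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pack three octets into the low 24 bits of an
 * unsigned long, first octet most significant. Only values are
 * combined, so the byte order of the machine does not matter. */
static unsigned long pack24(const unsigned char in[3])
{
    assert(in != NULL);
    return ((unsigned long)(in[0] & 0xFFu) << 16)
         | ((unsigned long)(in[1] & 0xFFu) << 8)
         |  (unsigned long)(in[2] & 0xFFu);
}

/* Hypothetical helper: extract 6-bit group i of a 24-bit value,
 * where i == 0 is the most significant group. */
static unsigned sextet(unsigned long v, int i)
{
    assert(i >= 0 && i < 4);
    return (unsigned)((v >> (18 - 6 * i)) & 0x3Ful);
}
```

With the octets 0x4D, 0x61, 0x6E, pack24 should yield 0x4D616E, and sextet(v, 0) through sextet(v, 3) should yield 19, 22, 5 and 46 regardless of endianness.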
If neither of these options is good, is there another way?
Thanks in advance,
Thomas