Implementation Independent Way to Move LSByte of Char to MSB of Int, etc

Guest · Jan 12, 2005

What is the implementation independent way of moving the least significant
byte of unsigned char to the most significant byte of unsigned int?

And the least significant word (if not a word, have the preprocessor force
an error) of unsigned int to the most significant word of unsigned long?

Mike Wahler · Jan 12, 2005

no spam said:
What is the implementation independent way of moving the least significant
byte of unsigned char

This is the easy part. All the character types have a size of
one byte by definition. So the least significant byte and
most significant byte are the same.

to the most significant byte of unsigned int?

unsigned char c = 42;
unsigned int i = 0;
*(unsigned char *)&i = c;

Note that this is only well defined for unsigned integer types;
signed integer types can have representations where putting
arbitrary bit patterns into them could produce, e.g., a
'trap' representation.

And the least significant word (if not a word, have the preprocessor force
an error) of unsigned int to the most significant word of unsigned long?

C does not define the concept of 'word'.

-Mike

Mike Wahler · Jan 12, 2005

Mike Wahler said:
This is the easy part. All the character types have a size of
one byte by definition. So the least significant byte and
most significant byte are the same.

unsigned char c = 42;
unsigned int i = 0;
*(unsigned char *)&i = c;

This won't necessarily assign to the most significant byte
of the unsigned integer, it depends upon the underlying platform's
representation of the integer. The above stores the char value
in the first byte (in memory) of its representation.

This might or might not be what you want.
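A small sketch of that point. The helper names `store_in_first_byte` and `first_byte_is_low_order` are hypothetical: the first performs the pointer-cast store from the earlier post, and the second reports whether the lowest-addressed byte turned out to hold the low-order bits by value, which is what happens on common little-endian machines (where the "moved" byte therefore ends up least, not most, significant).

```c
#include <limits.h>

/* The pointer-cast store from the post: c goes into the
 * lowest-addressed byte of an otherwise-zero unsigned int. */
unsigned int store_in_first_byte(unsigned char c)
{
    unsigned int i = 0;
    *(unsigned char *)&i = c;
    return i;
}

/* Returns 1 if the lowest-addressed byte holds the low-order
 * CHAR_BIT value bits (as on little-endian machines), 0 otherwise. */
int first_byte_is_low_order(void)
{
    return store_in_first_byte(UCHAR_MAX) == UCHAR_MAX;
}
```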

-Mike

Guest · Jan 12, 2005

Mike Wahler said:
This won't necessarily assign to the most significant byte
of the unsigned integer, it depends upon the underlying platform's
representation of the integer. The above stores the char value
in the first byte (in memory) of its representation.

This might or might not be what you want.

-Mike

Does C define the concept of "most significant" and "least significant"?

xarax · Jan 12, 2005

no spam said:
Does C define the concept of "most significant" and "least significant"?

Maybe something like this:

#include <stddef.h>
#include <limits.h>

void lsb_to_msb(unsigned int * out, unsigned char in)
{
    /* assumes unsigned int has no padding bits */
    const unsigned int shift = CHAR_BIT * (sizeof *out - 1);
    const unsigned int mask = ~((unsigned int)UCHAR_MAX << shift);

    *out &= mask;                       /* clear msb */
    *out |= (unsigned int)in << shift;  /* set msb; the cast keeps the shift unsigned */
}

Eric Sosman · Jan 12, 2005

no said:
What is the implementation independent way of moving the least significant
byte of unsigned char to the most significant byte of unsigned int?

#include <limits.h>
unsigned char uc = ...;
unsigned int ui;

/* Almost right: works on every machine I've ever
* run across, but is not actually guaranteed by
* the Standard
*/
ui = (unsigned int)uc << (CHAR_BIT * (sizeof ui - 1));

/* Best "completely portable" solution I've thought of;
* allows for padding bits in unsigned int
*/
ui = uc;
for (unsigned int n = UINT_MAX >> CHAR_BIT; n > 0; n >>= 1)
    ui <<= 1;

Maybe there's a way to calculate (UINT_MAX + 1)/(UCHAR_MAX + 1)
without risking zero in the numerator and/or denominator, but I
haven't figured one out. (Note that UCHAR_MAX == ULLONG_MAX is
permitted by the Standard.)
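One way to get that quotient is to shift UINT_MAX right by CHAR_BIT in two steps, so that no single shift count reaches the width of unsigned int even when UINT_MAX == UCHAR_MAX; the same split-shift trick appears in Peter Nilsson's macro later in the thread. The names `UC_TO_UINT_SCALE` and `uc_to_top` below are hypothetical. Multiplying by the resulting power of two, rather than shifting, also leaves any padding bits out of the calculation.

```c
#include <limits.h>

/* (UINT_MAX + 1) / (UCHAR_MAX + 1), i.e. 2 raised to
 * (value bits of unsigned int - CHAR_BIT), without overflow.
 * Splitting the shift into (CHAR_BIT - 1) and 1 keeps each count
 * below the width of unsigned int; if UINT_MAX == UCHAR_MAX the
 * result is 0 + 1 = 1. */
#define UC_TO_UINT_SCALE ((UINT_MAX >> (CHAR_BIT - 1) >> 1) + 1u)

/* Place uc in the most significant CHAR_BIT value bits. */
unsigned int uc_to_top(unsigned char uc)
{
    return (unsigned int)uc * UC_TO_UINT_SCALE;
}
```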

And the least significant word (if not a word, have the preprocessor force
an error) of unsigned int to the most significant word of unsigned long?

What is a "word?" The C Standard uses the term mostly to
refer to its own content, twice to refer to "words in a line
of text," and once in connection with floating-point numbers;
it is never used in connection with an unsigned int.

S.Tobias · Jan 12, 2005

no spam said:
Does C define the concept of "most significant" and "least significant"?

No, or at least none that I know of. "Most/least significant byte"
concepts are taken from machine representation of multi-byte integers,
and integer operations in ISO-C are defined in terms of values
and mathematical operations, and they don't depend on any representation
(however we know that integers consist of bits that have a few specific
properties).
In an integer value bits might be scattered randomly throughout
the whole object mixed together with some padding bits; which byte
should be called MSB/LSB?
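In the value-based terms described here, the "most significant" bits of an unsigned int are simply its highest-valued value bits, and how many there are can be found from UINT_MAX without inspecting the object representation. A minimal sketch (the function name `uint_value_bits` is hypothetical):

```c
#include <limits.h>

/* Number of value bits in unsigned int, derived from UINT_MAX.
 * Unlike CHAR_BIT * sizeof(unsigned int), this does not count
 * padding bits. */
int uint_value_bits(void)
{
    unsigned int max = UINT_MAX;
    int bits = 0;

    while (max != 0) {   /* UINT_MAX is 2^N - 1, so this runs N times */
        max >>= 1;
        bits++;
    }
    return bits;
}
```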

Lawrence Kirby · Jan 13, 2005

Eric Sosman said:

#include <limits.h>
unsigned char uc = ...;
unsigned int ui;

/* Almost right: works on every machine I've ever
* run across, but is not actually guaranteed by
* the Standard
*/
ui = (unsigned int)uc << (CHAR_BIT * (sizeof ui - 1));

/* Best "completely portable" solution I've thought of;
* allows for padding bits in unsigned int
*/
ui = uc;
for (unsigned int n = UINT_MAX >> CHAR_BIT; n > 0; n >>= 1)
    ui <<= 1;

If sizeof(unsigned int) is 1 the right shift results in undefined
behaviour. You could do something like

static int shift_width = -1;
unsigned char uc;
unsigned ui;

if (shift_width < 0) {
unsigned testbit = UCHAR_MAX + 1U;

for (shift_width = 0; testbit != 0; shift_width++, testbit <<= 1)
;
}

ui = (unsigned)uc << shift_width;

If you need to set the top byte in an existing unsigned int value

value = (value & ~((unsigned)UCHAR_MAX << shift_width)) | ui;

The mask value is also a constant that can be set up once.
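The fragments above can be combined into one function, sketched below. The name `set_top_byte` is hypothetical, the mask is recomputed on each call for brevity (it could be cached alongside `shift_width`, as suggested), and the static cache is not thread-safe.

```c
#include <limits.h>

/* Replace the top CHAR_BIT value bits of value with uc.
 * shift_width counts the value bits above the low CHAR_BIT
 * ones and is computed once, on the first call. */
unsigned int set_top_byte(unsigned int value, unsigned char uc)
{
    static int shift_width = -1;

    if (shift_width < 0) {
        /* wraps to 0 if UCHAR_MAX == UINT_MAX, giving a width of 0 */
        unsigned int testbit = UCHAR_MAX + 1u;

        for (shift_width = 0; testbit != 0; shift_width++, testbit <<= 1)
            ;
    }

    return (value & ~((unsigned int)UCHAR_MAX << shift_width))
           | ((unsigned int)uc << shift_width);
}
```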

Lawrence

Eric Sosman · Jan 13, 2005

Lawrence said:
If sizeof(unsigned int) is 1 the right shift results in undefined
behaviour.

Oh, drat. You're right: my attempt to be "completely
portable" merely traded one error for another.

Personally, I prefer the first of the two erroneous
forms as "less likely to get caught" ...

CBFalconer · Jan 13, 2005

Eric said:
Oh, drat. You're right: my attempt to be "completely
portable" merely traded one error for another.

Personally, I prefer the first of the two erroneous
forms as "less likely to get caught" ...

OTOH if we revise the specification to specify "8 bits" in place of
"byte" we can handle it in a portable manner:

#define UINT_BIT (CHAR_BIT * sizeof(unsigned int))

unsigned int ui;
unsigned char uc;

ui = (unsigned int)(uc & 255) << (UINT_BIT - 8);

xarax · Jan 13, 2005

CBFalconer said:
OTOH if we revise the specification to specify "8 bits" in place of
"byte" we can handle it in a portable manner:

#define UINT_BIT (CHAR_BIT * sizeof(unsigned int))

unsigned int ui;
unsigned char uc;

ui = (unsigned int)(uc & 255) << (UINT_BIT - 8);

You are presuming that CHAR_BIT == 8.

aegis · Jan 13, 2005

Lawrence said:
If sizeof(unsigned int) is 1 the right shift results in undefined
behaviour. You could something like

Why would N >> X cause undefined behavior, where N is some object
and X is a value equal to the width of that object?

static int shift_width = -1;
unsigned char uc;
unsigned ui;

if (shift_width < 0) {
unsigned testbit = UCHAR_MAX + 1U;

What if UCHAR_MAX == UINT_MAX? Then you would get zero,

for (shift_width = 0; testbit != 0; shift_width++, testbit <<= 1)
;

and then this condition would fail

}

ui = (unsigned)uc << shift_width;

and you would shift left by -1?
Could you clarify this? Shifting left by -1
does not make sense to me.

If you need to set the top byte in an existing unsigned int value

If UINT_MAX == UCHAR_MAX, then there is no top byte, right?
They would both be a single byte.

Eric Sosman · Jan 13, 2005

aegis said:
Why would N >> X cause undefined behavior, where N is some object
and X is a value equal to the width of that object?

Answer #1: Because the Standard says so, in section
6.5.7 paragraph 3.

Answer #2: Some machines' instruction sets are unable
to express shift amounts as large as the operand width.
For example, an instruction to left-shift a 32-bit value
by X bits might encode X in a five-bit field, making it
impossible to perform a 32-bit shift in one instruction.

Observation: Answer #2 is probably the motivation
behind Answer #1 ...
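Code that may legitimately need a shift of the full width can test the count first. A minimal sketch, with the hypothetical name `shr_checked`, that returns 0 for counts at or beyond the number of value bits instead of invoking undefined behaviour (the width is recomputed on each call for brevity):

```c
#include <limits.h>

/* Right shift that yields 0, rather than undefined behaviour,
 * when count >= the number of value bits in unsigned int. */
unsigned int shr_checked(unsigned int value, unsigned int count)
{
    unsigned int width = 0;
    unsigned int max;

    for (max = UINT_MAX; max != 0; max >>= 1)
        width++;

    return count >= width ? 0u : value >> count;
}
```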

CBFalconer · Jan 14, 2005

xarax said:
You are presuming that CHAR_BIT == 8.

No I am not. Read the revised specification, and then the code.

Peter Nilsson · Jan 14, 2005

no said:
What is the implementation independent way of moving the least
significant byte of unsigned char to the most significant byte
of unsigned int?

There isn't one. The following will place an unsigned char in the
highest bits of an unsigned int...

#include <limits.h>

#define move_uc_to_umsb(u,uc) \
((((unsigned )(u )) & (-1u >> (CHAR_BIT - 1) >> 1) ) \
|(((unsigned char)(uc)) * ((-1u >> (CHAR_BIT - 1) >> 1) + 1)))

And the least significant word (if not a word, have the preprocessor
force an error) of unsigned int to the most significant word of
unsigned long?

Define what you mean by 'word'.

Whilst the above probably does what you want, it sounds like you're
trying to do something inherently implementation specific.
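To check the macro, the hypothetical helper `uc_from_umsb` below undoes it by dividing by the same power of two the macro multiplies by; the macro itself is repeated unchanged so the block compiles on its own.

```c
#include <limits.h>

/* Peter Nilsson's macro, as posted. */
#define move_uc_to_umsb(u,uc) \
((((unsigned )(u )) & (-1u >> (CHAR_BIT - 1) >> 1) ) \
|(((unsigned char)(uc)) * ((-1u >> (CHAR_BIT - 1) >> 1) + 1)))

/* Recover the byte stored in the top CHAR_BIT value bits. */
unsigned char uc_from_umsb(unsigned int u)
{
    return (unsigned char)(u / ((-1u >> (CHAR_BIT - 1) >> 1) + 1));
}
```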

dandelion · Jan 14, 2005

Eric Sosman said:
Answer #1: Because the Standard says so, in section
6.5.7 paragraph 3.

Answer #2: Some machines' instruction sets are unable
to express shift amounts as large as the operand width.
For example, an instruction to left-shift a 32-bit value
by X bits might encode X in a five-bit field, making it
impossible to perform a 32-bit shift in one instruction.

Observation: Answer #2 is probably the motivation
behind Answer #1 ...

That, and the fact that such a shift (in either direction) inevitably
results in 0, I presume. The operation seems pointless, which is
(I suspect) why many hardware vendors do not support it.

I would not call it an "observation" though.

Lawrence Kirby · Jan 14, 2005

....

Why would N >> X cause undefined behavior, where N is some object
and X is a value equal to the width of that object?

As others have noted this is because the standard says so.

What if UCHAR_MAX == UINT_MAX? Then you would get zero,

and then this condition would fail

Which results in a final value of 0 for shift_width, which is
appropriate. The first expression in a for () loop is always executed
once, when the loop is entered.

and you would shift left by -1? Could you clarify this?
Shifting left by -1 does not make sense to me.

Shifting by a negative value is an error. However that doesn't happen.

If UINT_MAX == UCHAR_MAX, then there is no top byte, right? They would
both be a single byte.

Correct, which means that the byte data is already in the correct place, so
no shifting is required.

Lawrence

Allan Bruce · Jan 14, 2005

no spam said:
What is the implementation independent way of moving the least significant
byte of unsigned char to the most significant byte of unsigned int?

And the least significant word (if not a word, have the preprocessor force
an error) of unsigned int to the most significant word of unsigned long?

If you know how your platform stores the data, or can determine it
programmatically, then you can convert from big-endian to little-endian (or
vice versa) with the following function:

void ReverseBytes(void *xbBytes, int xiNumBytes)
{
    int loop;
    char *lT1 = (char *)xbBytes;   /* first byte */
    char *lT2 = (char *)xbBytes;
    char lT3;
    lT2 += xiNumBytes-1;           /* last byte */

    for (loop=0; loop<xiNumBytes/2; loop++)
    {
        /* swap the outermost pair, then move both pointers inward */
        lT3 = *lT2;
        *lT2 = *lT1;
        *lT1 = lT3;
        lT1++;
        lT2--;
    }
}
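If the goal is a particular byte order in memory rather than a swap, an alternative that does not require knowing the host's byte order is to build the bytes from the value with shifts. A sketch, assuming unsigned int has no padding bits; the name `put_uint_be` is hypothetical, and the caller must supply `sizeof(unsigned int)` bytes.

```c
#include <limits.h>
#include <stddef.h>

/* Write v into out[0 .. sizeof v - 1], most significant byte first
 * (big-endian order), using only value operations, so the host's own
 * byte order never matters. Assumes no padding bits in unsigned int. */
void put_uint_be(unsigned char out[], unsigned int v)
{
    size_t i;

    for (i = sizeof v; i > 0; i--) {
        out[i - 1] = (unsigned char)(v & UCHAR_MAX); /* lowest remaining byte */
        if (i > 1)
            v >>= CHAR_BIT;  /* skipped on the last pass, so no full-width shift */
    }
}
```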

Allan

pete · Jan 14, 2005

dandelion said:
That, and the fact that such a shift (in either direction) inevitably
results in 0, I presume. The operation seems pointless, which is
(I suspect) why many hardware vendors do not support it.

I would not call it an "observation" though.

One common form of undefined behavior
for the above-mentioned 5-bit field
is that for (u >> x), you wind up shifting by (x % 32) bits.
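A model of that hardware behaviour, not of anything the C Standard guarantees: with a five-bit count field only x & 31 (that is, x % 32) reaches the shifter, so a shift by 32 acts like a shift by 0 instead of producing 0. The 32-bit shift instructions on x86, for example, mask the count this way. The name `hw_shr32` is hypothetical.

```c
#include <stdint.h>

/* Models a 32-bit shifter with a five-bit count field: only the
 * low five bits of x are used, so the effective count is x & 31. */
uint32_t hw_shr32(uint32_t u, unsigned int x)
{
    return u >> (x & 31u);
}
```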

Eric Sosman · Jan 14, 2005

dandelion said:
That, and the fact that such a shift (in either direction) inevitably
results in 0, I presume. The operation seems pointless, which is
(I suspect) why many hardware vendors do not support it.

Shifting by zero bits is even more pointless, but that
doesn't seem to have prompted instruction-set designers to
omit the operation (unless they also omit all multi-bit
shifts; I've used machines whose only shift instructions
were single-bit shifts).

Bit positions in CPU instructions are usually a scarce
resource, because the machine can usually be made faster if
its instructions require less memory (it takes fewer cycles
to fetch and decode an instruction that occupies one word
than an instruction requiring three). Given the scarcity of
instruction bits, a designer faced with encoding a shift
distance that "should almost always" lie between 1..31 will
be unlikely to allocate a six-bit field; the "spare" bit can
probably be put to more effective use. And that, I think, is
the motivation for hardware ceilings on shift counts.

(Since the zero-bit shift also seems useless, I imagine a
designer might decide to use the opcode that resembles "shift
by zero" to denote some entirely different operation. I don't
know whether any have done so, but I imagine it might complicate
the instruction decode process and require a bunch of extra
silicon -- it's probably easier to allow the pointless zero-bit
shift than to detect it and recycle its code space for other
purposes.)

I would not call it an "observation" though.

All right, how about "conjecture?" Or would you prefer
"damfoolishness?"
