Implementation Independent Way to Move LSByte of Char to MSB of Int, etc

Guest

What is the implementation independent way of moving the least significant
byte of unsigned char to the most significant byte of unsigned int?

And the least significant word (if not a word, have the preprocessor force
an error) of unsigned int to the most significant word of unsigned long?
 
Mike Wahler

no spam said:
What is the implementation independent way of moving the least significant
byte of unsigned char

This is the easy part. All the character types have a size of
one byte by definition. So the least significant byte and
most significant byte are the same.

to the most significant byte of unsigned int?

unsigned char c = 42;
unsigned int i = 0;
*(unsigned char *)&i = c;

Note that this is only well defined for unsigned integer types;
signed integer types can have representations where storing
arbitrary bit patterns into them could produce, e.g., a
'trap' representation.
And the least significant word (if not a word, have the preprocessor force
an error) of unsigned int to the most significant word of unsigned long?

C does not define the concept of 'word'.

-Mike
 
Mike Wahler

Mike Wahler said:
This is the easy part. All the character types have a size of
one byte by definition. So the least significant byte and
most significant byte are the same.



unsigned char c = 42;
unsigned int i = 0;
*(unsigned char *)&i = c;

This won't necessarily assign to the most significant byte
of the unsigned integer, it depends upon the underlying platform's
representation of the integer. The above stores the char value
in the first byte (in memory) of its representation.

This might or might not be what you want.

-Mike
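
To make the platform dependence concrete, here is a minimal sketch that compares the pointer store with a store by value. The helper names are made up for illustration, and `store_top_byte` assumes that unsigned int has no padding bits.

```c
#include <limits.h>

/* Hypothetical helper: write c into the first byte in memory (lowest
 * address) of a zeroed unsigned int, as the pointer cast above does. */
unsigned int store_first_byte(unsigned char c)
{
    unsigned int i = 0;
    *(unsigned char *)&i = c;
    return i;
}

/* Hypothetical helper: put c in the top byte by value.
 * Assumption: unsigned int has no padding bits. */
unsigned int store_top_byte(unsigned char c)
{
    return (unsigned int)c << (CHAR_BIT * (sizeof(unsigned int) - 1));
}

/* 1 if the pointer store landed in the most significant byte
 * (typical of big-endian machines), 0 otherwise. */
int first_byte_is_msb(void)
{
    return store_first_byte(1) == store_top_byte(1);
}
```

On a typical little-endian machine `first_byte_is_msb()` should return 0, because the first byte in memory holds the lowest value bits there.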
 
Guest

Mike Wahler said:
This won't necessarily assign to the most significant byte
of the unsigned integer, it depends upon the underlying platform's
representation of the integer. The above stores the char value
in the first byte (in memory) of its representation.

This might or might not be what you want.

-Mike
Does C define the concept of "most significant" and "least significant"?
 
xarax

no spam said:
Does C define the concept of "most significant" and "least significant"?

Maybe something like this:

#include <stddef.h>
#include <limits.h>

void lsb_to_msb(unsigned int * out, unsigned char in)
{
    /* shift that reaches the top byte, assuming no padding bits */
    const unsigned int shift = (CHAR_BIT * ((sizeof *out)-1));
    const unsigned int mask = ~((unsigned int)UCHAR_MAX << shift);

    *out &= mask; /* clear msb */
    *out |= ((unsigned int)in << shift); /* set msb; cast avoids shifting a signed int */
}
 
Eric Sosman

no said:
What is the implementation independent way of moving the least significant
byte of unsigned char to the most significant byte of unsigned int?

#include <limits.h>
unsigned char uc = ...;
unsigned int ui;

/* Almost right: works on every machine I've ever
* run across, but is not actually guaranteed by
* the Standard
*/
ui = (unsigned int)uc << (CHAR_BIT * (sizeof ui - 1));

/* Best "completely portable" solution I've thought of;
* allows for padding bits in unsigned int
*/
ui = uc;
for (unsigned int n = UINT_MAX >> CHAR_BIT; n > 0; n >>= 1)
    ui <<= 1;

Maybe there's a way to calculate (UINT_MAX + 1)/(UCHAR_MAX + 1)
without risking zero in the numerator and/or denominator, but I
haven't figured one out. (Note that UCHAR_MAX == ULLONG_MAX is
permitted by the Standard.)
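
One way to avoid the division altogether is to count the value bits of UINT_MAX at run time and shift by the difference. This is only a sketch: the names `uint_value_bits` and `uc_to_top` are invented here, and "most significant byte" is taken to mean the top CHAR_BIT value bits.

```c
#include <limits.h>

/* Number of value bits in unsigned int. The bits of UINT_MAX are counted
 * one at a time, so no shift count can ever reach the width. */
unsigned int uint_value_bits(void)
{
    unsigned int n = 0;
    unsigned int m = UINT_MAX;

    while (m != 0) {
        m >>= 1;
        ++n;
    }
    return n;
}

/* Place uc in the top CHAR_BIT value bits of an unsigned int.
 * If unsigned int is only CHAR_BIT bits wide, the shift count is 0. */
unsigned int uc_to_top(unsigned char uc)
{
    return (unsigned int)uc << (uint_value_bits() - CHAR_BIT);
}
```

Padding bits do not matter here because only values are inspected, never the object representation.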
And the least significant word (if not a word, have the preprocessor force
an error) of unsigned int to the most significant word of unsigned long?

What is a "word?" The C Standard uses the term mostly to
refer to its own content, twice to refer to "words in a line
of text," and once in connection with floating-point numbers;
it is never used in connection with an unsigned int.
 
S.Tobias

no spam said:
Does C define the concept of "most significant" and "least significant"?

No, or at least none that I know of. "Most/least significant byte"
concepts are taken from machine representation of multi-byte integers,
and integer operations in ISO-C are defined in terms of values
and mathematical operations, and they don't depend on any representation
(however we know that integers consist of bits that have a few specific
properties).
In an integer value bits might be scattered randomly throughout
the whole object mixed together with some padding bits; which byte
should be called MSB/LSB?
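
Which byte in memory holds the highest value bit can at least be probed on a given implementation. A rough sketch, with a made-up function name, assuming that any padding bits read as zero:

```c
#include <limits.h>
#include <stddef.h>

/* Index (in memory order) of the byte that holds the highest value bit
 * of unsigned int on this implementation.
 * Assumption: padding bits, if there are any, are zero. */
size_t byte_index_of_top_bit(void)
{
    unsigned int top = UINT_MAX ^ (UINT_MAX >> 1); /* only the highest value bit set */
    const unsigned char *p = (const unsigned char *)&top;
    size_t i;

    for (i = 0; i < sizeof top; ++i) {
        if (p[i] != 0)
            return i;
    }
    return sizeof top; /* not reached: top is nonzero */
}
```

The answer is a property of the implementation, not of the language, which is why the value-based shifts elsewhere in this thread never look at individual bytes.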
 
Lawrence Kirby

#include <limits.h>
unsigned char uc = ...;
unsigned int ui;

/* Almost right: works on every machine I've ever
* run across, but is not actually guaranteed by
* the Standard
*/
ui = (unsigned int)uc << (CHAR_BIT * (sizeof ui - 1));

/* Best "completely portable" solution I've thought of;
* allows for padding bits in unsigned int
*/
ui = uc;
for (unsigned int n = UINT_MAX >> CHAR_BIT; n > 0; n >>= 1)
    ui <<= 1;

If sizeof(unsigned int) is 1 the right shift results in undefined
behaviour. You could do something like



static int shift_width = -1;
unsigned char uc;
unsigned ui;

if (shift_width < 0) {
unsigned testbit = UCHAR_MAX + 1U;

for (shift_width = 0; testbit != 0; shift_width++, testbit <<= 1)
;
}

ui = (unsigned)uc << shift_width;

If you need to set the top byte in an existing unsigned int value

value = (value & ~((unsigned)UCHAR_MAX << shift_width)) | ui;

The mask value is also a constant that can be set up once.

Lawrence
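
For reference, the fragments above can be put together into two small functions. This is a sketch only; the names `top_shift` and `set_top_byte` are not from the post.

```c
#include <limits.h>

/* Shift that moves an unsigned char into the top of an unsigned int,
 * computed once with the loop from the post above. */
unsigned int top_shift(void)
{
    static int shift_width = -1;

    if (shift_width < 0) {
        /* testbit is 0 if unsigned int is only one byte wide */
        unsigned testbit = UCHAR_MAX + 1U;

        for (shift_width = 0; testbit != 0; shift_width++, testbit <<= 1)
            ;
    }
    return (unsigned int)shift_width;
}

/* Replace the top byte of value with uc, leaving the lower bits alone. */
unsigned int set_top_byte(unsigned int value, unsigned char uc)
{
    unsigned int shift = top_shift();
    unsigned int mask = (unsigned)UCHAR_MAX << shift;

    return (value & ~mask) | ((unsigned)uc << shift);
}
```

Because testbit starts at the first bit above an unsigned char and is shifted until it falls off the top, the loop counts exactly the value bits above the low byte.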
 
Eric Sosman

Lawrence said:
If sizeof(unsigned int) is 1 the right shift results in undefined
behaviour.

Oh, drat. You're right: my attempt to be "completely
portable" merely traded one error for another.

Personally, I prefer the first of the two erroneous
forms as "less likely to get caught" ...
 
CBFalconer

Eric said:
Oh, drat. You're right: my attempt to be "completely
portable" merely traded one error for another.

Personally, I prefer the first of the two erroneous
forms as "less likely to get caught" ...

OTOH if we revise the specification to specify "8 bits" in place of
"byte" we can handle it in a portable manner:

#define UINT_BIT (CHAR_BIT * sizeof(unsigned int))

unsigned int ui;
unsigned char uc;

ui = ((unsigned int)uc & 255u) << (UINT_BIT - 8);
 
xarax

CBFalconer said:
OTOH if we revise the specification to specify "8 bits" in place of
"byte" we can handle it in a portable manner:

#define UINT_BIT (CHAR_BIT * sizeof(unsigned int))

unsigned int ui;
unsigned char uc;

ui = ((unsigned int)uc & 255u) << (UINT_BIT - 8);

You are presuming that CHAR_BIT == 8.
 
aegis

Lawrence said:
If sizeof(unsigned int) is 1 the right shift results in undefined
behaviour. You could do something like

Why would N >> X cause undefined behavior?
where N is some object and X is a value equal in width
to that object.
static int shift_width = -1;
unsigned char uc;
unsigned ui;

if (shift_width < 0) {
unsigned testbit = UCHAR_MAX + 1U;

What if UCHAR_MAX == UINT_MAX? then you would get zero
for (shift_width = 0; testbit != 0; shift_width++, testbit <<= 1)
;

and then this condition would fail
}

ui = (unsigned)uc << shift_width;

and you would shift left by negative one?
Could you clarify this? Shifting left by negative one
does not make sense to me.

If you need to set the top byte in an existing unsigned int value

if UINT_MAX == UCHAR_MAX then there is no top byte, right?
They would both be a single byte.
 
Eric Sosman

aegis said:
Why would N >> X cause undefined behavior?
where N is some object and X is a value equal in width
to that object.

Answer #1: Because the Standard says so, in section
6.5.7 paragraph 3.

Answer #2: Some machines' instruction sets are unable
to express shift amounts equal to or greater than the operand width.
For example, an instruction to left-shift a 32-bit value
by X bits might encode X in a five-bit field, making it
impossible to perform a 32-bit shift in one instruction.

Observation: Answer #2 is probably the motivation
behind Answer #1 ...
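
If a shift count might reach the width, the usual workaround is to test it first. A small sketch, with an invented function name, assuming the optional type `uint32_t` from `<stdint.h>` is available:

```c
#include <stdint.h>

/* Right shift that is defined for every count: counts of 32 or more
 * yield 0 instead of running into undefined behaviour. */
uint32_t shr32(uint32_t v, unsigned int count)
{
    return count < 32 ? v >> count : 0;
}
```

The comparison costs an extra instruction or two, which is the price for not depending on what the hardware does with an oversized count.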
 
Peter Nilsson

no said:
What is the implementation independent way of moving the least
significant byte of unsigned char to the most significant byte
of unsigned int?

There isn't one. The following will place an unsigned char in the
highest bits of an unsigned int...

#include <limits.h>

#define move_uc_to_umsb(u,uc) \
((((unsigned )(u )) & (-1u >> (CHAR_BIT - 1) >> 1) ) \
|(((unsigned char)(uc)) * ((-1u >> (CHAR_BIT - 1) >> 1) + 1)))
And the least significant word (if not a word, have the preprocessor
force an error) of unsigned int to the most significant word of
unsigned long?

Define what you mean by 'word'.

Whilst the above probably does what you want, it sounds like you're
trying to do something inherently implementation specific.
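
A short usage sketch for the macro above. The wrapper function `with_top_byte` is hypothetical; it exists only so that each argument is evaluated once. The expression `-1u >> (CHAR_BIT - 1) >> 1` is UINT_MAX shifted right by CHAR_BIT in two steps, so neither shift count can reach the width even if unsigned int is a single byte.

```c
#include <limits.h>

#define move_uc_to_umsb(u,uc) \
    ((((unsigned     )(u )) &  (-1u >> (CHAR_BIT - 1) >> 1)      ) \
    |(((unsigned char)(uc)) * ((-1u >> (CHAR_BIT - 1) >> 1) + 1)))

/* Hypothetical wrapper: returns u with its highest CHAR_BIT bits
 * replaced by uc. */
unsigned int with_top_byte(unsigned int u, unsigned char uc)
{
    return move_uc_to_umsb(u, uc);
}
```

Multiplying by the mask plus one has the same effect as a left shift into the top bits, without ever naming a shift count.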
 
dandelion

Eric Sosman said:
Answer #1: Because the Standard says so, in section
6.5.7 paragraph 3.

Answer #2: Some machines' instruction sets are unable
to express shift amounts equal to or greater than the operand width.
For example, an instruction to left-shift a 32-bit value
by X bits might encode X in a five-bit field, making it
impossible to perform a 32-bit shift in one instruction.

Observation: Answer #2 is probably the motivation
behind Answer #1 ...

That and the fact that such a shift (in either direction) inevitably
results in 0, I presume. The operation seems pointless, which is
(I suspect) the motivation behind many HW vendors not supporting it.

I would not call it an "observation" though.
 
Lawrence Kirby

....


Why would N >> X cause undefined behavior?
where N is some object and X is a value equal in width
to that object.

As others have noted, this is because the Standard says so.
What if UCHAR_MAX == UINT_MAX? then you would get zero


and then this condition would fail

Which results in a final value of 0 for shift_width, which is
appropriate. The first expression in a for () loop is always executed
once, when the loop is entered.
and you would shift left by a negative one? could you clarify this?
shifting left by negative one does not make sense to me.

Shifting by a negative value is an error. However that doesn't happen.
if UINT_MAX == UCHAR_MAX then there is no top byte, right? They would
both be a single byte.

Correct, which means that the byte data is already in the correct place, so
no shifting is required.

Lawrence
 
Allan Bruce

no spam said:
What is the implementation independent way of moving the least significant
byte of unsigned char to the most significant byte of unsigned int?

And the least significant word (if not a word, have the preprocessor force
an error) of unsigned int to the most significant word of unsigned long?

If you know how your platform stores the data or can determine it
programmatically then you can convert from Big Endian to Little Endian (or
vice versa) with the following function:

void ReverseBytes(void *xbBytes, int xiNumBytes)
{
    int loop;
    char *lT1 = (char *)xbBytes; /* walks forward from the first byte */
    char *lT2 = (char *)xbBytes; /* walks backward from the last byte */
    char lT3;                    /* temporary for the swap */
    lT2 += xiNumBytes-1;

    for (loop=0; loop<xiNumBytes/2; loop++)
    {
        lT3 = *lT2;
        *lT2 = *lT1;
        *lT1 = lT3;
        lT1++;
        lT2--;
    }
}

Allan
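
A sketch of how the byte order might be determined programmatically and the reversal applied. All three names are invented here, the reversal is restated with `size_t` so the block stands alone, and `swap_uint` assumes unsigned int has no padding bits.

```c
#include <stddef.h>

/* Reverse the order of n bytes in place (same idea as ReverseBytes). */
void reverse_bytes(void *buf, size_t n)
{
    unsigned char *p = buf;
    size_t i;

    for (i = 0; i < n / 2; ++i) {
        unsigned char tmp = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = tmp;
    }
}

/* 1 if the first byte in memory of an unsigned int holds its lowest
 * value bits (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    unsigned int one = 1;
    return *(unsigned char *)&one == 1;
}

/* Byte-swapped copy of v. Assumption: no padding bits in unsigned int. */
unsigned int swap_uint(unsigned int v)
{
    reverse_bytes(&v, sizeof v);
    return v;
}
```

Note that this changes the object representation, which is a different thing from moving a value into the high-order bits with a shift.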
 
pete

dandelion said:
That and the fact that such a shift (in either direction) inevitably
results in 0, I presume. The operation seems pointless, which is
(I suspect) the motivation behind many HW vendors not supporting it.

I would not call it an "observation" though.

One common form of undefined behavior
for the above mentioned 5 bit field
is that for (u >> x), you wind up shifting by (x % 32) bits.
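
What such hardware effectively computes can be modelled in well-defined C by masking the count. This is only an illustration with an invented function name, assuming `uint32_t` is available; the Standard promises nothing of the kind for an out-of-range shift.

```c
#include <stdint.h>

/* Models hardware that keeps only the low five bits of the shift count. */
uint32_t shr32_wrapping(uint32_t v, unsigned int count)
{
    return v >> (count & 31u); /* same as count % 32 */
}
```

With this model a shift by 32 leaves the value unchanged and a shift by 36 behaves like a shift by 4.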
 
Eric Sosman

dandelion said:
That and the fact that such a shift (in either direction) inevitably
results in 0, I presume. The operation seems pointless, which is
(I suspect) the motivation behind many HW vendors not supporting it.

Shifting by zero bits is even more pointless, but that
doesn't seem to have prompted instruction-set designers to
omit the operation (unless they also omit all multi-bit
shifts; I've used machines whose only shift instructions
were single-bit shifts).

Bit positions in CPU instructions are usually a scarce
resource, because the machine can usually be made faster if
its instructions require less memory (it takes fewer cycles
to fetch and decode an instruction that occupies one word
than an instruction requiring three). Given the scarcity of
instruction bits, a designer faced with encoding a shift
distance that "should almost always" lie between 1..31 will
be unlikely to allocate a six-bit field; the "spare" bit can
probably be put to more effective use. And that, I think, is
the motivation for hardware ceilings on shift counts.

(Since the zero-bit shift also seems useless, I imagine a
designer might decide to use the opcode that resembles "shift
by zero" to denote some entirely different operation. I don't
know whether any have done so, but I imagine it might complicate
the instruction decode process and require a bunch of extra
silicon -- it's probably easier to allow the pointless zero-bit
shift than to detect it and recycle its code space for other
purposes.)
I would not call it an "observation" though.

All right, how about "conjecture?" Or would you prefer
"damfoolishness?"
 
