Noob question about the >> operator

Michael Barry

Hello, everyone. As an inexperienced/rusty user of C, I have come across a question for you all.

Does C's >> operator shift in zeros or sign-bits in the compilers with which you are familiar? For example:

Code:
/* let's assume that unsigned is 32-bits wide here */
unsigned a = 0x80000000U;
printf("%X", a >> 3);

Should I expect 10000000 or F0000000 to be printed?

I seem to remember reading somewhere that this behavior of >> is compiler-dependent, and I will be using gcc for my 65m32 simulator. I can spend a bit more effort to explicitly deal with the MSBs in my code, should it turn out to be a potential portability issue.

Thanks,

Mike
 
Kaz Kylheku

Michael Barry said:
Hello, everyone. As an inexperienced/rusty user of C, I have come across a question for you all.

Does C's >> operator shift in zeros or sign-bits in the compilers with which
you are familiar? For example:

The C99 standard has this wording, in 6.5.7 Bitwise shift operators

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
an unsigned type or if E1 has a signed type and a nonnegative value,
the value of the result is the integral part of the quotient of
E1 / 2^E2. If E1 has a signed type and a negative value, the resulting
value is implementation-defined.

In practice, two's complement machines repeat the sign bit.

If you don't want to rely on that, consider this: division is now required to
be truncating toward zero. This means that we can obtain truncating toward
negative infinity semantics by subtracting 1. And that is equivalent to
shifting with a sign extension.

Example:

For instance -5, which is the two's complement bit pattern ...11011,
when divided by two, is required to yield -2 (truncation toward zero).
And that pattern is ...11110: not the pattern we want in terms of
a right shift.

However, if we subtract 1 first, then divide, it works out.
...11011 (-5) goes to ...11010 (-6), which shifts to ...11101 (-3)

Thus we can do something like this:

int sign_extending_right_shift(int x)
{
    if (x < 0)
        return (x - 1) / 2;
    return x >> 1;
}
Code:
/* let's assume that unsigned is 32-bits wide here */
unsigned a = 0x80000000U;
printf("%X", a >> 3);

This doesn't match your question. Unsigned types have no sign bit.

They always shift in a zero.

That's what the "un" means in front of "signed" in "unsigned".
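
So for your example, a one-line check confirms it (a minimal sketch, assuming 32-bit unsigned as stipulated in your snippet):

#include <stdio.h>

int main(void)
{
    /* assumes unsigned is 32 bits wide, as in the original snippet */
    unsigned a = 0x80000000U;
    printf("%X\n", a >> 3);  /* prints 10000000: zeros are shifted in */
    return 0;
}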
 
David Brown

Michael Barry said:
Hello, everyone. As an inexperienced/rusty user of C, I have come
across a question for you all.

Does C's >> operator shift in zeros or sign-bits in the compilers
with which you are familiar? For example:

Code:
/* let's assume that unsigned is 32-bits wide here */
unsigned a = 0x80000000U;
printf("%X", a >> 3);

Should I expect 10000000 or F0000000 to be printed?

I seem to remember reading somewhere that this behavior of >> is
compiler-dependent, and I will be using gcc for my 65m32 simulator. I
can spend a bit more effort to explicitly deal with the MSBs in my
code, should it turn out to be a potential portability issue.

Thanks,

Mike

Kaz has given quite a complete answer. But the easy answer is to stick
to unsigned data when shifting (as you are doing here) - then there is
no sign bit to worry about. For left shifts and other bit manipulation
(&, |, ~, ^) you should usually use unsigned data - preferably
length-specific types like uint32_t. Without signs, everything is
clearly defined in the standards to work as you would expect.

If you really need to do shifts with signed integers, you can use tricks
like Kaz posted. Alternatively, you can cast your signed data into
unsigned, do the shifts unsigned, and cast back. The details of this
sort of thing are "implementation defined" in the standards - so the
exact results can vary between compilers and targets. In practice,
however, if you are sticking to "normal" processors then everything
works as you might expect using twos-compliment.

If you are thinking - as some "noobs" do - that it might be a good idea
to use shifts for "optimising" multiplies and divides by powers of two,
then don't do that. If you want to divide "a" by 8, write "a / 8" and
let the compiler figure out the most efficient implementation.
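
A quick way to see why the compiler cannot blindly swap one for the other on signed operands (a minimal sketch; the shift result assumes an implementation with arithmetic right shift):

#include <stdio.h>

int main(void)
{
    int a = -9;
    printf("%d\n", a / 8);   /* -1: C99 division truncates toward zero */
    printf("%d\n", a >> 3);  /* typically -2: an arithmetic shift rounds
                                toward negative infinity */
    return 0;
}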


Here would be one way of being sure of signed shifts that I believe will
be correct (at least on "normal" processors with two's compliment
arithmetic, and support for the given integer sizes):


// Note that when pre-processing, the same rules should apply as
// during runtime for the target

#if (-1 >> 2) < 0
// Right-shift duplicates the sign bit ("arithmetic right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    return x >> 1;
}

static inline int32_t zero_extending_right_shift(int32_t x) {
    return (int32_t) (((uint32_t) x) >> 1);
}

#else
// Right-shift zeros the sign bit ("logical right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    if (x < 0) return (x - 1) / 2;
    return x >> 1;
}

static inline int32_t zero_extending_right_shift(int32_t x) {
    return x >> 1;
}

#endif
 
BartC

Michael Barry said:
Hello, everyone. As an inexperienced/rusty user of C, I have come across a
question for you all.

Does C's >> operator shift in zeros or sign-bits in the compilers with
which you are familiar? For example:

Others have already answered that. But if you're new to C, watch out for the
strange precedence of the << and >> operators. For example, you might think
that a<<b+c means (a<<b)+c, but it doesn't!
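
A two-line demonstration of the trap:

#include <stdio.h>

int main(void)
{
    int a = 1, b = 2, c = 3;
    printf("%d\n", a << b + c);    /* 32: parsed as a << (b + c) */
    printf("%d\n", (a << b) + c);  /* 7: what you might have expected */
    return 0;
}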
 
Michael Barry

Thanks to all for your prompt and polite replies.

@Bart: Thanks for the extra heads-up about operator precedence. I have a feeling that my source is going to contain a lot of unnecessary casts and extra parentheses, which will undoubtedly be optimized away by gcc, but still show my noobness to all who care to read the source.

A follow-up question, drifting a bit OT: On a hypothetical naked CPU, assuming that there is not enough opcode space for both types of shifts (signed and unsigned), which group is more useful to keep? In other words, is it 'less-expensive' to convert an unsigned shift to a signed, or vice-versa? I know that I could probably figure it out for myself with a few test-cases, but I thought that it wouldn't hurt to ask you nice folks for your opinions.

Mike
 
Eric Sosman

Michael Barry said:
Thanks to all for your prompt and polite replies.

@Bart: Thanks for the extra heads-up about operator precedence. I have a feeling that my source is going to contain a lot of unnecessary casts and extra parentheses, which will undoubtedly be optimized away by gcc, but still show my noobness to all who care to read the source.

Try to avoid unnecessary casts, or at least keep them down
to a dull roar. The problem is that a cast operator often tells
a compiler not to emit a warning (or error!) diagnostic that it
otherwise would have generated; the compiler will frequently read
a cast as the programmer's claim that "I know what I'm doing here."
If the cast is truly unnecessary there won't be any such diagnostic,
but if you get into the habit of throwing casts around (casting
them around?), sooner or later you'll find that you've silenced
some useful advice.

Parentheses are another matter: Unlike casts they are not
operators, but separators and groupers. They can certainly be
overdone, but they can also be underdone -- especially if you
find yourself working in multiple languages whose precedence rules
are subtly different. (The inventor of C admitted that there'd
been some poor choices in C's precedence rules, but that with
"several hundred kilobytes of source code and maybe 3 installations"
it was too late to change things.)
A follow-up question, drifting a bit OT: On a hypothetical naked CPU, assuming that there is not enough opcode space for both types of shifts (signed and unsigned), which group is more useful to keep? In other words, is it 'less-expensive' to convert an unsigned shift to a signed, or vice-versa? I know that I could probably figure it out for myself with a few test-cases, but I thought that it wouldn't hurt to ask you nice folks for your opinions.

Mu.
 
glen herrmannsfeldt

(snip)
A follow-up question, drifting a bit OT: On a hypothetical naked
CPU, assuming that there is not enough opcode space for both types
of shifts (signed and unsigned), which group is more useful to keep?

Well, first note that the C standard requires unsigned to shift in
zeros, but doesn't restrict signed in the same way.

Partly that is because C allows for ones complement or sign magnitude
representation for signed integer types, in addition to the more
common twos complement. The numerical effect of shifts, and what the
proper arithmetic shift is, depend on the representation.

But a CPU so restricted likely only has one bit shifts, where a
loop is required for multiple bit shifts. Many early CPUs had no
multiply or divide instruction, but add/subtract and a special shift
(or sometimes rotate) instruction to make software multiply and
divide easier. Often there were an accumulator and an MQ (multiplier/
quotient) register, the latter convenient for the use it was named
for.
In other words, is it 'less-expensive' to convert an unsigned shift
to a signed, or vice-versa? I know that I could probably figure it
out for myself with a few test-cases, but I thought that it wouldn't
hurt to ask you nice folks for your opinions.

Not so many years ago, I was working with Dynamic-C (the name of
the compiler, it isn't actually so dynamic) and the Rabbit processor
(similar to Z280, descendant of the 8080 and Z80). Seems that there
are no signed 16 bit compare instructions, so 16 bit compare is done by
complementing the sign bit of both operands, doing an unsigned compare
and, if necessary, uncomplementing.
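
In C, that trick looks something like this (a sketch, assuming two's complement; the helper name is made up):

#include <stdbool.h>
#include <stdint.h>

/* Signed 16-bit "less than" using only an unsigned compare: flipping
   the sign bit of both operands maps the signed range [-32768, 32767]
   onto the unsigned range [0, 65535] in an order-preserving way. */
static bool signed16_less_than(uint16_t a, uint16_t b)
{
    return (uint16_t)(a ^ 0x8000u) < (uint16_t)(b ^ 0x8000u);
}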

But there could be processors with signed, but not unsigned, shift
instructions.

-- glen
 
Jorgen Grahn

Michael Barry said:
Hello, everyone. As an inexperienced/rusty user of C, I have come
across a question for you all.
Does C's >> operator shift in zeros or sign-bits in the compilers
with which you are familiar?

I have no idea -- I avoid using it with signed types! If I had to
know in order to fix a bug in someone else's code, I'd write a test
program to find out.
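
Something along these lines would do (a quick sketch that only checks one value):

#include <stdio.h>

int main(void)
{
    int x = -8;
    if ((x >> 1) == -4)
        puts("signed >> behaves as an arithmetic shift here");
    else
        puts("signed >> behaves as a logical shift here");
    return 0;
}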

/Jorgen
 
Tim Rentsch

Kaz Kylheku said:
The C99 standard has this wording, in 6.5.7 Bitwise shift operators

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If
E1 has an unsigned type or if E1 has a signed type and a
nonnegative value, the value of the result is the integral part
of the quotient of E1 / 2^E2. If E1 has a signed type and a
negative value, the resulting value is implementation-defined.

In practice, two's complement machines repeat the sign bit.

If you don't want to rely on that, consider this: division is now
required to be truncating toward zero. This means that we can
obtain truncating toward negative infinity semantics by
subtracting 1. And that is equivalent to shifting with a sign
extension.

Example:

For instance -5, which is the two's complement bit pattern ...11011,
when divided by two, is required to yield -2 (truncation toward
zero). And that pattern is ...11110: not the pattern we want in
terms of a right shift.

However, if we subtract 1 first, then divide, it works out.
...11011 (-5) goes to ...11010 (-6), which shifts to ...11101 (-3)

Thus we can do something like this:

int sign_extending_right_shift(int x)
{
    if (x < 0)
        return (x - 1) / 2;
    return x >> 1;
}

It would be better not to rely on C99 division rules and
not to risk undefined behavior:

int sign_extending_right_shift(int x)
{
    return (x - !!(x%2)) / 2;
}
 
Tim Rentsch

David Brown said:
If you really need to do shifts with signed integers, you can use
tricks like Kaz posted. Alternatively, you can cast your signed
data into unsigned, do the shifts unsigned, and cast back. The
details of this sort of thing is "implementation defined" in the
standards - so the exact results can vary between compilers and
targets. In practice, however, if you are sticking to "normal"
processors then everything works as you might expect using
twos-compliment.

complement, not compliment.
If you are thinking - as some "noobs" do - that it might be a good
idea to use shifts for "optimising" multiplies and divides by
powers of two, then don't do that. If you want to divide "a" by
8, write "a / 8" and let the compiler figure out the most
efficient implementation.

Here would be one way of being sure of signed shifts that I
believe will be correct (at least on "normal" processors with
two's compliment arithmetic, and support for the given integer
sizes):

// Note that when pre-processing, the same rules should apply as
// during runtime for the target

#if (-1 >> 2) < 0
// Right-shift duplicates the sign bit ("arithmetic right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    return x >> 1;
}

static inline int32_t zero_extending_right_shift(int32_t x) {
    return (int32_t) (((uint32_t) x) >> 1);
}

#else
// Right-shift zeros the sign bit ("logical right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    if (x < 0) return (x - 1) / 2;
    return x >> 1;
}

static inline int32_t zero_extending_right_shift(int32_t x) {
    return x >> 1;
}

#endif

Better (using 'int' rather than the platform-dependent 'int32_t'):

static inline int sign_extending_right_shift(int x) {
    return (x - !!(x%2)) / 2;
}

static inline int zero_extending_right_shift(int x) {
    typedef union { unsigned u; int i; } ui;
    return (ui){ .u = (ui){ .i = x }.u >> 1 }.i;
}

No platform-dependent preprocessor tests needed. Also the
second function shifts the actual bits in machines that
use signed magnitude or ones complement.
 
David Brown

Tim Rentsch said:
complement, not compliment.

Spell chequers can't catch /all/ mistakes :)
Better (using 'int' rather than the platform-dependent 'int32_t'):

When you are dealing with shifts and other bit manipulation, you almost
certainly know the sizes of the data you are working with. It is
(normally) better to be explicit about that sort of thing - then any
assumptions you make are clear in the text.

"int32_t" is only "platform-dependent" in that theoretically, there may
be platforms that don't support it. I say "theoretically", because
there are no normal modern processors that don't support it. (There are
a few DSP's that don't support the 8-bit or 16-bit types, but they are
niche devices.)

"int", on the other hand, is totally "platform-dependent" and there are
many real-world systems where "int" is not 32-bit.

It is your own choice, of course, but I strongly prefer to use
explicitly sized types when the size matters.
static inline int sign_extending_right_shift(int x) {
    return (x - !!(x%2)) / 2;
}

Yes, that works without needing a platform-dependent preprocessor test -
but it is bigger, uglier, slower and harder to understand than either of
the platform-specific versions I gave. If you have some reason to
dislike having such preprocessor tests and multiple choices of
implementation (they are arguably a "hack"), then turn the check into a
static assert and just use the first versions of the functions. They
will work fine on virtually all useful platforms, and the static assert
will give you a compile-time error rather than weird run-time behaviour
in the rare cases that the assumption is not true.
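
As a sketch, that variant might look like this (assuming C11's _Static_assert is available; under C99, the #if/#error form above does the same job):

#include <stdint.h>

/* Refuse to compile on any implementation where signed right shift is
   not arithmetic, rather than silently misbehaving at run time. */
_Static_assert((-1 >> 2) < 0,
               "this code assumes arithmetic right shift of signed values");

static inline int32_t sign_extending_right_shift(int32_t x) {
    return x >> 1;
}
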
static inline int zero_extending_right_shift(int x) {
    typedef union { unsigned u; int i; } ui;
    return (ui){ .u = (ui){ .i = x }.u >> 1 }.i;
}

That is directly equivalent to the version I gave using casting (and I
agree it will work on either type of platform). But some compilers
might do messy things when implementing your version, such as allocating
the union on the stack. Even the most brain-dead compiler with
optimisations disabled will implement the simple cast version directly.

I know optimisation is not always vital - and it is /always/ secondary
to correctness - but code like this often needs to be optimal if it is
going to be useful. It is particularly common in embedded programming,
where "small and fast" means "cheap".
 
David Brown

Tim Rentsch said:
It would be better not to rely on C99 division rules and
not to risk undefined behavior:

int sign_extending_right_shift(int x)
{
    return (x - !!(x%2)) / 2;
}

As far as I understand it, the rules for "%" are defined in terms of the
rules for division. So your version still relies on those same rules.

From N1570 (C11 near-final draft):


6.5.5 Multiplicative operators

Syntax

1   multiplicative-expression:
        cast-expression
        multiplicative-expression * cast-expression
        multiplicative-expression / cast-expression
        multiplicative-expression % cast-expression

Constraints

2   Each of the operands shall have arithmetic type. The operands of
    the % operator shall have integer type.

Semantics

3   The usual arithmetic conversions are performed on the operands.

4   The result of the binary * operator is the product of the operands.

5   The result of the / operator is the quotient from the division of
    the first operand by the second; the result of the % operator is
    the remainder. In both operations, if the value of the second
    operand is zero, the behavior is undefined.

6   When integers are divided, the result of the / operator is the
    algebraic quotient with any fractional part discarded. If the
    quotient a/b is representable, the expression (a/b)*b + a%b
    shall equal a; otherwise, the behavior of both a/b and a%b is
    undefined.
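
A concrete instance of that identity under the C99 rules:

#include <stdio.h>

int main(void)
{
    int a = -5, b = 2;
    /* C99: a/b truncates toward zero, so a/b == -2 and a%b == -1,
       and (a/b)*b + a%b == (-2)*2 + (-1) == -5 == a. */
    printf("%d %d %d\n", a / b, a % b, (a / b) * b + a % b);
    return 0;
}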
 
Tim Rentsch

David Brown said:
It would be better not to rely on C99 division rules and
not to risk undefined behavior:

int sign_extending_right_shift(int x)
{
    return (x - !!(x%2)) / 2;
}

As far as I understand it, the rules for "%" are defined in terms
of the rules for division. So your version still relies on those
same rules.

[snip quoted excerpt of section 6.5.5]

Great! But now look at the code again and see if you can
understand why it works under both the C99 rules and the more
permissive C90 rules. The code I gave does rely on the rules for
remainder (and so indirectly division), but it does not rely on
the more restrictive C99 rules. That's a key distinction.
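
Spelled out, the case analysis goes like this:

int sign_extending_right_shift(int x)
{
    /* x even: x%2 == 0, so this is x/2 -- an exact division, so the
       C90 latitude in rounding negative quotients never applies.
       x odd:  under C90, x%2 may be 1 or -1 when x is negative, but
       !!(x%2) is 1 either way, and x - 1 is then even, so (x - 1)/2
       is again exact.  No inexact quotient is ever formed, so no
       rounding rule is ever relied upon. */
    return (x - !!(x % 2)) / 2;
}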
 
David Brown

David Brown said:
Hello, everyone. As an inexperienced/rusty user of C, I have come
across a question for you all.

Does C's >> operator shift in zeros or sign-bits in the compilers
with which you are familiar? For example:

The C99 standard has this wording, in 6.5.7 Bitwise shift operators

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If
E1 has an unsigned type or if E1 has a signed type and a
nonnegative value, the value of the result is the integral part
of the quotient of E1 / 2^E2. If E1 has a signed type and a
negative value, the resulting value is implementation-defined.

In practice, two's complement machines repeat the sign bit.

If you don't want to rely on that, consider this: division is now
required to be truncating toward zero. This means that we can
obtain truncating toward negative infinity semantics by
subtracting 1. And that is equivalent to shifting with a sign
extension.

Example:

For instance -5, which is the two's complement bit pattern ...11011,
when divided by two, is required to yield -2 (truncation toward
zero). And that pattern is ...11110: not the pattern we want in
terms of a right shift.

However, if we subtract 1 first, then divide, it works out.
...11011 (-5) goes to ...11010 (-6), which shifts to ...11101 (-3)

Thus we can do something like this:

int sign_extending_right_shift(int x)
{
    if (x < 0)
        return (x - 1) / 2;
    return x >> 1;
}

It would be better not to rely on C99 division rules and
not to risk undefined behavior:

int sign_extending_right_shift(int x)
{
    return (x - !!(x%2)) / 2;
}

As far as I understand it, the rules for "%" are defined in terms
of the rules for division. So your version still relies on those
same rules.

[snip quoted excerpt of section 6.5.5]

Great! But now look at the code again and see if you can
understand why it works under both the C99 rules and the more
permissive C90 rules. The code I gave does rely on the rules for
remainder (and so indirectly division), but it does not rely on
the more restrictive C99 rules. That's a key distinction.

Fair enough. But the very fact that this is difficult to spot counts
against writing the code in this way. When you have the choice, a
clearer and simpler solution is the best.

On the other hand, gcc (with -O) on amd64 spots that your version is
equivalent to "x >> 1", and generates optimal code. The conditional
version generates conditional code (at least with my somewhat outdated
gcc). And when you have a choice, something that produces optimal code
is always better.


My preference:

#if (-1 >> 2) < 0
// Right-shift duplicates the sign bit ("arithmetic right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    return x >> 1;
}

#else
// Right-shift zeros the sign bit ("logical right shift")
#error You've found one of the most awkward C compilers on the planet.
#endif

If you want to substitute your version of the function instead of a
compile-time error message, that's fine - but I would not consider it
acceptable code without a hefty comment section explaining how it works.

(I don't mean to say that there is no place for "clever" code like this
- just that it should only be used when needed, and it must be
clarified. It's the kind of thing you might put in a library but not in
application code.)
 
Tim Rentsch

David Brown said:
Better (using 'int' rather than the platform-dependent 'int32_t'):

When you are dealing with shifts and other bit manipulation, you
almost certainly know the sizes of the data you are working with.
[snip elaboration re int vs int32_t]

By focusing on this incidental distinction you are missing
the point. The examples I gave apply to any integer type
(adjusting various declarations appropriately), including
int32_t. However, because the point is about writing code
that will work on all platforms, including those where
int32_t cannot be provided, the examples use int instead
of int32_t. How to make these available with explicit
widths is a separate discussion - possibly an important
discussion, but nevertheless a separate discussion.
Yes, that works without needing a platform-dependent preprocessor
test - but it is bigger, uglier, slower and harder to understand
than either of the platform-specific versions I gave.

If you had bothered to try actually compiling it, as I did, I
believe you would find my version generates code that is just as
fast as the smaller of your two cases, and faster than the other,
on platforms of interest (I tried two).

Minor point: your statement that this way is "bigger" is simply
false. My version has one line of function body. Your version
has either one or two, or a total of three, and that is not
counting the preprocessor statements to choose between the two
alternatives. If we take into account the other needed overhead
then your version is much bigger.
If you have
some reason to dislike having such preprocessor tests and multiple
choices of implementation (they are arguably a "hack"), then turn
the check into a static assert and just use the first versions of
the functions. They will work fine on virtually all useful
platforms, and the static assert will give you a compile-time
error rather than weird run-time behaviour in the rare cases that
the assumption is not true.

My version has no need of either preprocessor tests or static
asserts, and works in all conforming implementations. Given
that it is just as fast or faster than the larger and less
general version, I see no reason to prefer the latter.
That is directly equivalent to the version I gave using casting
(and I agree it will work on either type of platform).

No, it isn't. The casting version works correctly only
on platforms that use twos complement. This version
doesn't have that restriction.
But some
compilers might do messy things when implementing your version,
such as allocating the union on the stack. Even the most
brain-dead compiler with optimisations disabled will implement the
simple cast version directly.

This is a specious argument. No real work is done under such
conditions. Your original example uses 'inline', which means the
compiler postdates 1999. No contemporary compiler is going
to be lacking in such rudimentary transformations, and anyone
interested in good performance obviously is going to enable
them.
I know optimisation is not always vital - and it is /always/
secondary to correctness - but code like this often needs to be
optimal if it is going to be useful. It is particularly common in
embedded programming, where "small and fast" means "cheap".

The functions I wrote produced code that is just as fast
or faster than either of your alternate versions, using
the lowest levels of optimization available. That includes
test compiles targeting an embedded processor.
 
Tim Rentsch

David Brown said:
David Brown said:
On 07/02/14 01:56, Tim Rentsch wrote:

Hello, everyone. As an inexperienced/rusty user of C, I have come
across a question for you all.

Does C's >> operator shift in zeros or sign-bits in the compilers
with which you are familiar? For example:

The C99 standard has this wording, in 6.5.7 Bitwise shift operators

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If
E1 has an unsigned type or if E1 has a signed type and a
nonnegative value, the value of the result is the integral part
of the quotient of E1 / 2^E2. If E1 has a signed type and a
negative value, the resulting value is implementation-defined.

In practice, two's complement machines repeat the sign bit.

If you don't want to rely on that, consider this: division is now
required to be truncating toward zero. This means that we can
obtain truncating toward negative infinity semantics by
subtracting 1. And that is equivalent to shifting with a sign
extension.

Example:

For instance -5, which is the two's complement bit pattern ...11011,
when divided by two, is required to yield -2 (truncation toward
zero). And that pattern is ...11110: not the pattern we want in
terms of a right shift.

However, if we subtract 1 first, then divide, it works out.
...11011 (-5) goes to ...11010 (-6), which shifts to ...11101 (-3)

Thus we can do something like this:

int sign_extending_right_shift(int x)
{
    if (x < 0)
        return (x - 1) / 2;
    return x >> 1;
}

It would be better not to rely on C99 division rules and
not to risk undefined behavior:

int sign_extending_right_shift(int x)
{
    return (x - !!(x%2)) / 2;
}


As far as I understand it, the rules for "%" are defined in terms
of the rules for division. So your version still relies on those
same rules.

[snip quoted excerpt of section 6.5.5]

Great! But now look at the code again and see if you can
understand why it works under both the C99 rules and the more
permissive C90 rules. The code I gave does rely on the rules for
remainder (and so indirectly division), but it does not rely on
the more restrictive C99 rules. That's a key distinction.

Fair enough. But the very fact that this is difficult to spot
counts against writing the code in this way. When you have the
choice, a clearer and simpler solution is the best.

Other things being equal, clearer and simpler are better. But
here they are not equal.
On the other hand, gcc (with -O) on amd64 spots that your version
is equivalent to "x >> 1", and generates optimal code. The
conditional version generates conditional code (at least with my
somewhat outdated gcc). And when you have a choice, something
that produces optimal code is always better.

My preference:

#if (-1 >> 2) < 0
// Right-shift duplicates the sign bit ("arithmetic right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    return x >> 1;
}

#else
// Right-shift zeros the sign bit ("logical right shift")
#error You've found one of the most awkward C compilers on the planet.
#endif

I'm a bit confused by your statement here - you prefer a larger,
less general solution to a smaller, optimal, and completely
general one?

Incidentally, I recently have been working on code that has lots of
platform variability under control of #ifdef, etc. This results in
a heavy cost doing code maintenance. IMO the complexity of having
different versions far outweighs the complexity of the expression
in the (one and only) return statement.
If you want to substitute your version of the function instead of
a compile-time error message, that's fine - but I would not
consider it acceptable code without a hefty comment section
explaining how it works.

I agree, a comment would be good.
(I don't mean to say that there is no place for "clever" code like
this - just that it should only be used when needed, and it must
be clarified. It's the kind of thing you might put in a library
but not in application code.)

IMO the code with the preprocessor test is more "clever" than
the platform-independent version. Also the name is wrong,
because what you want is an "arithmetic shift" (ie, divide
rounding down), not a "sign extending shift", which has a
completely different meaning on signed magnitude platforms.
So which version is more "clever" is subjective.
 
David Brown

David Brown said:
On 07/02/14 01:56, Tim Rentsch wrote:

Hello, everyone. As an inexperienced/rusty user of C, I have come
across a question for you all.

Does C's >> operator shift in zeros or sign-bits in the compilers
with which you are familiar? For example:

The C99 standard has this wording, in 6.5.7 Bitwise shift operators

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If
E1 has an unsigned type or if E1 has a signed type and a
nonnegative value, the value of the result is the integral part
of the quotient of E1 / 2^E2. If E1 has a signed type and a
negative value, the resulting value is implementation-defined.

In practice, two's complement machines repeat the sign bit.

If you don't want to rely on that, consider this: division is now
required to be truncating toward zero. This means that we can
obtain truncating toward negative infinity semantics by
subtracting 1. And that is equivalent to shifting with a sign
extension.

Example:

For instance -5, which is the two's complement bit pattern ...11011,
when divided by two, is required to yield -2 (truncation toward
zero). And that pattern is ...11110: not the pattern we want in
terms of a right shift.

However, if we subtract 1 first, then divide, it works out.
...11011 (-5) goes to ...11010 (-6), which shifts to ...11101 (-3)

Thus we can do something like this:

int sign_extending_right_shift(int x)
{
    if (x < 0)
        return (x - 1) / 2;
    return x >> 1;
}

It would be better not to rely on C99 division rules and
not to risk undefined behavior:

int sign_extending_right_shift(int x)
{
    return (x - !!(x%2)) / 2;
}


As far as I understand it, the rules for "%" are defined in terms
of the rules for division. So your version still relies on those
same rules.

[snip quoted excerpt of section 6.5.5]

Great! But now look at the code again and see if you can
understand why it works under both the C99 rules and the more
permissive C90 rules. The code I gave does rely on the rules for
remainder (and so indirectly division), but it does not rely on
the more restrictive C99 rules. That's a key distinction.

Fair enough. But the very fact that this is difficult to spot
counts against writing the code in this way. When you have the
choice, a clearer and simpler solution is the best.

Other things being equal, clearer and simpler are better. But
here they are not equal.
On the other hand, gcc (with -O) on amd64 spots that your version
is equivalent to "x >> 1", and generates optimal code. The
conditional version generates conditional code (at least with my
somewhat outdated gcc). And when you have a choice, something
that produces optimal code is always better.

My preference:

#if (-1 >> 2) < 0
// Right-shift duplicates the sign bit ("arithmetic right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    return x >> 1;
}

#else
// Right-shift zeros the sign bit ("logical right shift")
#error You've found one of the most awkward C compilers on the planet.
#endif

I'm a bit confused by your statement here - you prefer a larger,
less general solution to a smaller, optimal, and completely
general one?

I prefer a simpler, clearer solution that works on all realistic
platforms, and will be optimal on all compilers and all choices of flag
settings (not every compiler is as smart as gcc, and some people don't
enable optimisation). The pre-processor check is just for paranoia, in
case the code is used on a particularly odd architecture - compile-time
error messages are much better than run-time mistakes.
Incidentally, I recently have been working on code that has lots of
platform variability under control of #ifdef, etc. This results in
a heavy cost doing code maintenance. IMO the complexity of having
different versions far outweighs the complexity of the expression
in the (one and only) return statement.

If the complex stuff is well enough hidden from "normal" users, and well
enough commented, then I agree with you.

A few #ifdef's for conditional compilation is okay, but there is no
doubt that it makes the code hard to work with if there is too much of
it. Sometimes the balance tips in favour of a complex general solution
rather than multiple simple expressions (especially if you can be sure
the compiler will optimise it well). Sometimes it is best with general
solutions even if they are sub-optimal. Sometimes it is best to have
separate files (headers or even C files) that handle each platform
individually, so that you only have a single #ifdef at the start of each
file - that makes it clear what code is actually used, but risks
duplication and the maintenance issues that follow.

I don't think there is any good, general solution to such issues. But
if the job were easy, everyone could do it :)
I agree, a comment would be good.


IMO the code with the preprocessor test is more "clever" than
the platform-independent version. Also the name is wrong,
because what you want is an "arithmetic shift" (ie, divide
rounding down), not a "sign extending shift", which has a
completely different meaning on signed magnitude platforms.
So which version is more "clever" is subjective.

I copied the function name from a previous post - I agree that
"arithmetic_right_shift" would be better.

I suppose any discussion about what is "clever" code is highly
subjective - it depends on the programmer's experience, and the style of
code they typically work with.

I guess we can agree that both methods work, and both methods should
have at least some comments to say /how/ they work. And we let the OP
choose whatever he prefers.
 
David Brown

David Brown said:
Here would be one way of being sure of signed shifts that I
believe will be correct (at least on "normal" processors with
two's compliment arithmetic, and support for the given integer
sizes):

// Note that when pre-processing, the same rules should apply as
// during runtime for the target

#if (-1 >> 2) < 0
// Right-shift duplicates the sign bit ("arithmetic right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    return x >> 1;
}

static inline int32_t zero_extending_right_shift(int32_t x) {
    return (int32_t) (((uint32_t) x) >> 1);
}

#else
// Right-shift zeros the sign bit ("logical right shift")

static inline int32_t sign_extending_right_shift(int32_t x) {
    if (x < 0) return (x - 1) / 2;
    return x >> 1;
}

static inline int32_t zero_extending_right_shift(int32_t x) {
    return x >> 1;
}

#endif

Better (using 'int' rather than the platform-dependent 'int32_t'):

When you are dealing with shifts and other bit manipulation, you
almost certainly know the sizes of the data you are working with.
[snip elaboration re int vs int32_t]

By focusing on this incidental distinction you are missing
the point. The examples I gave apply to any integer type
(adjusting various declarations appropriately), including
int32_t. However, because the point is about writing code
that will work on all platforms, including those where
int32_t cannot be provided, the examples use int instead
of int32_t. How to make these available with explicit
widths is a separate discussion - possibly an important
discussion, but nevertheless a separate discussion.

The versions I gave also work for all integer types in the same way.

When I write C, I aim for the code to work on different platforms
(unless the code is directly accessing target-specific details, which is
not uncommon for my embedded code). I try to write in a way that will
work as expected on realistic platforms, and will produce compile-time
errors if any of my assumptions do not hold. So if I want to work with
32-bit values, I use the platform-independent type "int32_t" (or perhaps
"int_least32_t", depending on the circumstances). I rarely use plain
"int", precisely because my code often needs to be portable across
different platforms. ("int" is usually synonymous with "int_fast16_t"
as far as I am concerned.)

(You may consider this a separate discussion, but you brought it up
here. But I will try not to add more on the topic in this thread.)
If you had bothered to try actually compiling it, as I did, I
believe you would find my version generates code that is just as
fast as the smaller of your two cases, and faster than the other,
on platforms of interest (I tried two).

I tried the code, and reached the same basic conclusion.

However, it is only as small and fast when optimisation is enabled (not
everyone enables optimisation - I think that is generally a mistake on
their part, but it is still true), and with a good compiler (I have used
many compilers that would not have a chance of optimising your code
here). And we only know that it is as fast as the plain "x >> 1"
version on platforms which implement "x >> 1" as an arithmetic right
shift - we don't know what code a compiler would generate on a platform
that implemented "x >> 1" as a logical right shift, because we don't
have easy access to such a platform (if one even exists).

So we know your version is never faster than mine on any platform we can
try, and that it is slower than mine on some platforms (I tried
compiling for the 8-bit AVR, using avr-gcc 4.5.1), and slower if
optimisation is disabled. We know nothing about the case for platforms
which use logical right shift.
Minor point: your statement that this way is "bigger" is simply
false. My version has one line of function body. Your version
has either one or two, or a total of three, and that is not
counting the preprocessor statements to choose between the two
alternatives. If we take into account the other needed overhead
then your version is much bigger.

I meant bigger object code, not source code (I can count source lines
too). But to be fair, your code is not bigger than mine on "typical"
platforms.
My version has no need of either preprocessor tests or static
asserts, and works in all conforming implementations. Given
that it is just as fast or faster than the larger and less
general version, I see no reason to prefer the latter.

See above. If your version were always as optimal, then I would mostly
agree. (I still think your version is less clear - but as I said in
another post, that is a subjective opinion.)
No, it isn't. The casting version works correctly only
on platforms that use twos complement. This version
doesn't have that restriction.

Would that be because a cast /could/ change the bit representation of
the data when converting from unsigned to signed? I am not sure if I am
getting the details of the standards correct here, but I believe a cast
between signed int and unsigned int will count as a "conversion". The
cast from signed to unsigned is well defined, and will work even on
one's complement systems (-1 converts to 0xffff ffff for 32-bit ints).
But conversion from unsigned 0xffff ffff back to -1 is apparently
implementation dependent. If that interpretation is correct, then
perhaps my version is implementation-dependent even for two's complement
systems. I'd appreciate a "ruling" on this point.

As far as I can see from the standards, your type-punning method should
be correct.

However, the casting method will work correctly on platforms that are
realistic for running new code (how many targets are /not/ two's
complement?).

Testing with gcc on amd64 and the AVR shows the casting version to be
optimal for all settings, and the union method is optimal when
optimisation is enabled and very sub-optimal when optimisation is
disabled. And if anyone is interested (I know this is almost blasphemy
in this group), the casting version is also valid in C++ while the union
version is not. (This is the fault of the C++ committee - designated
initialisers should have been included in C++.)
This is a specious argument. No real work is done under such
conditions. Your original example uses 'inline', which means the
compiler postdates 1999. No contemporary compiler is going
to be lacking in such rudimentary transformations, and anyone
interested in good performance obviously is going to enable
them.

Unfortunately, that is not true. In the world of embedded programming,
there are some very poor compilers around. For many reasons, it is
sometimes necessary to use very old tools even for new code - and there
are plenty of "modern" compilers that do very little in the way of
optimisation.
The functions I wrote produced code that is just as fast
or faster than either of your alternate versions, using
the lowest levels of optimization available. That includes
test compiles targeting an embedded processor.

I tested on an 8-bit AVR, which is quite a common embedded processor.
Admittedly it was not the latest gcc available, but it was approximately
the same version as for amd64 that I have conveniently on my system. (I
also tested for a 16-bit msp430, but as the results were the same as for
amd64, I didn't bother giving the details.)
 
