"Interesting" C behaviours

Rennie deGraaf

In the last few days, I have discovered two "interesting" behaviours of
the C language, both of which are apparently correct. Could someone
please explain the reasoning behind them?

1. The operators '^', '&', and '|' have lower precedence than '==',
'!=', '>=', etc. I discovered this when the statement "if (array1 ^
array2 == 0xff)" failed to do what I expected. To me, it doesn't
make any sense to give the bitwise operators lower precedence than
comparators. I can't see any situation where someone would want to
perform a bitwise operation on a truth value, but that is what the
language specifies for the above expression.
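
A minimal sketch of the parse described above. The function names are hypothetical; the first one spells out with explicit parentheses how C groups the unparenthesized expression "a ^ b == 0xff", and the second shows what was presumably intended.

```c
/* How C groups `a ^ b == 0xff` when no parentheses are written:
   the comparison runs first and yields 0 or 1, which is then XORed with a. */
int as_parsed(int a, int b)
{
    return a ^ (b == 0xff);
}

/* What was probably intended: XOR first, then compare with 0xff. */
int as_intended(int a, int b)
{
    return (a ^ b) == 0xff;
}
```

With a = 0x0f and b = 0xf0, the first form gives 0x0f (nonzero, so the "if" is taken for the wrong reason), while the second gives 1 because the XOR really is 0xff.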

2. In C99, the expression 'a%b' where a<0 and b>0 will return a negative
result (K&R lets this be machine dependent). While it is technically
correct (-1 is congruent to 6 mod 7), it isn't exactly what most people
expect. Mathematically, the congruence classes modulo n are usually
expressed as [0], [1], ... [n-1]. The value -1 would be a member of the
congruence class [n-1]. So, when I'm performing certain operations, I
have to check if a number is negative, and if so, add the modulus to the
residue to get a sensible result. (For example, when subtracting two
struct timevals, and passing the result to select(). select() will barf
if the tv_usec field is negative, at least on Linux, so I have to set it
to (a.tv_usec-b.tv_usec)%1000000, and then add 1000000 if the result is
negative.)
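
The timeval case above might be sketched as follows. The helper name timeval_sub is hypothetical, and the sketch assumes both inputs already have tv_usec in the range 0 to 999999, so the raw difference needs no '%' at all, only a borrow from the seconds field when it goes negative.

```c
#include <sys/time.h>

/* Hypothetical helper: returns a - b with tv_usec normalized to [0, 999999].
   Assumes a and b are themselves normalized. */
struct timeval timeval_sub(struct timeval a, struct timeval b)
{
    struct timeval r;

    r.tv_sec = a.tv_sec - b.tv_sec;
    r.tv_usec = a.tv_usec - b.tv_usec;   /* lies in (-1000000, 1000000) */
    if (r.tv_usec < 0) {
        r.tv_usec += 1000000;            /* move into [0, 999999] */
        r.tv_sec -= 1;                   /* borrow one second */
    }
    return r;
}
```

Note the borrow: adding 1000000 to tv_usec without decrementing tv_sec would make the result one second too large. Some systems also provide a timersub() macro in <sys/time.h> for this job.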

Rennie
 
Chris Torek

In the last few days, I have discovered two "interesting" behaviours of
the C language, both of which are apparently correct. Could someone
please explain the reasoning behind them?

1. The operators '^', '&', and '|' have lower precedence than '==',
'!=', '>=', etc. I discovered this when the statement "if (array1 ^
array2 == 0xff)" failed to do what I expected. To me, it doesn't
make any sense to give the bitwise operators lower precedence than
comparators. I can't see any situation where someone would want to
perform a bitwise operation on a truth value, but that is what the
language specifies for the above expression.


Dennis Ritchie has noted that, in Primeval C, there were no "&&" and
"||" operators at all, and the single "&" and single "|" operators
were overloaded, so that:

result = f() | g();

would call both f() and g(), and bitwise-OR the results together to
store in the variable "result", while:

if (f() | g())

would call f() first, and if the result was nonzero, would omit the
call to g() entirely (i.e., what || does now). Likewise:

result = f() & g();

always called both, while:

if (f() & g())

called g() only if f() returned a nonzero value. Stopping as soon
as the result is known is called "short-circuit behavior". Presumably
the operators' priorities were set based on the short-circuit
versions, which probably occurred more often.

At some point, it was deemed bogus to have a single operator for
two entirely separate meanings, so the short-circuit logical
operators were separated out into new && and || operators. But by
then there were perhaps a few dozens of kilobytes :) of source
code, so the operator parsing was left unchanged.

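
To see the difference between the two kinds of operator in today's C, here is a small sketch. The counters and the functions f() and g() are stand-ins invented for illustration; they just record how often they are called.

```c
/* Call counters, so the effect of short-circuiting is observable. */
int calls_f, calls_g;

static int f(void) { calls_f++; return 1; }
static int g(void) { calls_g++; return 1; }

/* Bitwise OR: both operands are always evaluated. */
int bitwise_or(void)
{
    return f() | g();
}

/* Logical OR: g() is skipped because f() already returned nonzero. */
int logical_or(void)
{
    return f() || g();
}
```

Calling bitwise_or() bumps both counters; calling logical_or() bumps only calls_f, which is the behaviour the old overloaded "|" had inside an "if".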
2. In C99, the expression 'a%b' where a<0 and b>0 will return a negative
result (K&R lets this be machine dependent). While it is technically
correct (-1 is congruent to 6 mod 7), it isn't exactly what most people
expect.

I think you have overly high expectations of "most people" here. :)

Mathematically, the congruence classes modulo n are usually
expressed as [0], [1], ... [n-1].

True; but the "%" operator is really the "remainder after division"
operator. The goal is to have ((a/b)*b + a%b) == a, whenever b != 0.
To have ((-3) % 7) == 4, we would have to have ((-3) / 7) == -1,
but 0 is the most common result today for this division. Thus,
machines with "remainder after divide" instructions mostly produce
-3 here, and machines without such an instruction require computing
(a % b) via (a - (a / b) * b) in the first place.
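
A short sketch of the relationship between "/" and "%" described above, assuming a C99 compiler (where division truncates toward zero). Both function names are made up for illustration.

```c
/* Remainder computed by hand, the way a machine without a
   remainder instruction would have to do it. b must be nonzero. */
int rem_by_hand(int a, int b)
{
    return a - (a / b) * b;
}

/* The identity tying '/' and '%' together. b must be nonzero, and the
   case a == INT_MIN, b == -1 is excluded because a / b overflows there. */
int identity_holds(int a, int b)
{
    return (a / b) * b + a % b == a;
}
```

For a = -3 and b = 7, the quotient is 0, so both "%" and rem_by_hand() give -3. Getting 4 instead would require the quotient to be -1.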
 
Malcolm

Rennie deGraaf said:
In the last few days, I have discovered two "interesting" behaviours of
the C language, both of which are apparently correct. Could someone
please explain the reasoning behind them?

1. The operators '^', '&', and '|' have lower precedence than '==',
'!=', '>=', etc.

You'd have to ask K and R about this. The problem is that the language was
designed by two people in a back room somewhere, and once set, the
precedence rules are very difficult to change.

2. In C99, the expression 'a%b' where a<0 and b>0 will return a negative
result (K&R lets this be machine dependent).

That would probably be driven by architecture. The idea is that a % b would
compile to a single machine instruction (on K and R's original platform).
Just a guess, but I suspect this is the motivation.
 
Rennie deGraaf

Malcolm said:
That would probably be driven by architecture. The idea is that a % b would
compile to a single machine instruction (on K and R's original platform).
Just a guess, but I suspect this is the motivation.

It was architecture dependent in K&R, but C99 standardized it (see
http://home.tiscalinet.ch/t_wolf/tw/c/c9x_changes.html#Semantics,
section 25). I know that the x86 idiv instruction spits out negative
residues, but is the fact that a common architecture does something
unusual a reason to standardize a programming language on an unusual
behaviour? It would make more sense to me to work around the
architecture when it does something weird.

GCC, for instance, frequently doesn't even compile a%b to a single idiv
instruction - it compiles it to a big mess of imul, shift, and leal
instructions.

Rennie
 
Erik Trulsson

Rennie deGraaf said:
In the last few days, I have discovered two "interesting" behaviours of
the C language, both of which are apparently correct. Could someone
please explain the reasoning behind them?

1. The operators '^', '&', and '|' have lower precedence than '==',
'!=', '>=', etc. I discovered this when the statement "if (array1 ^
array2 == 0xff)" failed to do what I expected. To me, it doesn't
make any sense to give the bitwise operators lower precedence than
comparators. I can't see any situation where someone would want to
perform a bitwise operation on a truth value, but that is what the
language specifies for the above expression.


I believe it is mostly historical baggage from the earliest C
compilers.

2. In C99, the expression 'a%b' where a<0 and b>0 will return a negative
result (K&R lets this be machine dependent). While it is technically
correct (-1 is congruent to 6 mod 7), it isn't exactly what most people
expect. Mathematically, the congruence classes modulo n are usually
expressed as [0], [1], ... [n-1]. The value -1 would be a member of the
congruence class [n-1]. So, when I'm performing certain operations, I
have to check if a number is negative, and if so, add the modulus to the
residue to get a sensible result. (For example, when subtracting two
struct timevals, and passing the result to select(). select() will barf
if the tv_usec field is negative, at least on Linux, so I have to set it
to (a.tv_usec-b.tv_usec)%1000000, and then add 1000000 if the result is
negative.)

It should always be true that (a/b)*b+(a%b) == a, if b is non-zero.
When you have negative values involved the behaviour of the '%'
operator therefore depends on the behaviour of the '/' operator.
In C89 it was implementation-defined which way '/' truncated when it
had negative operands. In C99 this was fixed as being towards zero.
I believe this change was made for compatibility with Fortran (which
has always done it the same way as C99 does.)
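
For reference, the C99 rules work out as in the comment table below. The standard library's div() function, which was already specified in C89 to truncate toward zero, can serve as a cross-check; matches_div is a hypothetical helper name.

```c
#include <stdlib.h>

/*  C99 results (quotient truncates toward zero,
 *  remainder takes the sign of the dividend):
 *
 *     a    b    a / b   a % b
 *     7    2      3       1
 *    -7    2     -3      -1
 *     7   -2     -3       1
 *    -7   -2      3      -1
 */

/* Returns nonzero when '/' and '%' agree with div() for these operands.
   b must be nonzero. */
int matches_div(int a, int b)
{
    div_t d = div(a, b);

    return a / b == d.quot && a % b == d.rem;
}
```

On a C89 compiler the operators were allowed to disagree with div() for negative operands; under C99 they should always match.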
 
Christian Bau

Rennie deGraaf said:
GCC, for instance, frequently doesn't even compile a%b to a single idiv
instruction - it compiles it to a big mess of imul, shift, and leal
instructions.

Because it is faster.
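
To give a feel for the kind of transformation involved, here is a sketch of division by a constant done with a multiply and a shift. It is not GCC's actual output; it covers only the unsigned 32-bit case with the divisor 10, and the function names are invented. The signed case needs extra sign fix-ups, which is where the longer instruction sequences come from.

```c
#include <stdint.h>

/* x / 10 for any 32-bit unsigned x, without a divide instruction:
   multiply by 0xCCCCCCCD, which is ceil(2^35 / 10), then shift right by 35. */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* x % 10, derived from the quotient as x - (x / 10) * 10. */
uint32_t mod10(uint32_t x)
{
    return x - div10(x) * 10u;
}
```

A multiply plus a shift is typically much cheaper than a hardware divide, which is why compilers prefer this form when the divisor is known at compile time.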
 
glen herrmannsfeldt

Erik said:

It is what most hardware designers expect.

I do it by adding the modulus and an additional %,

a=(x%y+y)%y;

It is unknown if the branch penalty is more or less than the
cost of the extra divide. It is less typing, anyway.

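
The two approaches side by side, as a sketch. The function names are hypothetical, and both assume y > 0. The double-remainder form additionally assumes y is no larger than INT_MAX / 2, since otherwise the intermediate sum could overflow.

```c
/* Double-remainder form: no branch, but two '%' operations.
   Requires 0 < y <= INT_MAX / 2 so that x % y + y cannot overflow. */
int mod_double(int x, int y)
{
    return (x % y + y) % y;
}

/* Branch form: one '%', then add the modulus if the result is negative.
   Requires y > 0. */
int mod_branch(int x, int y)
{
    int r = x % y;

    return r < 0 ? r + y : r;
}
```

Both return a value in the range 0 to y-1, so -1 modulo 7 comes out as 6 either way.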
It should always be true that (a/b)*b+(a%b) == a, if b is non-zero.
When you have negative values involved the behaviour of the '%'
operator therefore depends on the behaviour of the '/' operator.
In C89 it was implementation-dependent which way '/' truncated when it
had negative operands. In C99 this was fixed as being towards zero.
I believe this change was made for compatibility with Fortran (which
has always done it the same way as C99 does.)

Well, there is a chicken and egg problem. Pretty much all
twos complement machines do it that way. I am not sure at
all what ones complement machines do. It might be, though,
that machines do it that way because Fortran does it that
way. From the Fortran 66 standard:

"The function MOD or AMOD(a1,a2) is defined as a1-[a1/a2]*a2
where [x] is the integer whose magnitude does not exceed
the magnitude of x, and whose sign is the same as x."

Note that many early Fortran machines were sign magnitude
or ones complement machines, and that C89 supports both
sign magnitude and ones complement arithmetic.
(I am not sure about C99.)

-- glen
 
