Good way to write integer overflow checks?

  • Thread starter Alf P. Steinbach
  • Start date
R

Rosario1903

The code Rosario posted used unsigned integers (I'm assuming that's what
he meant with "u32"), so if I've interpreted the C++ standards
correctly, "a * b + c" cannot overflow as the operations are defined as
modulo 2^32, and so no undefined behaviour.

overflow can happen, even if in one machine, in a set of machine of 2
complement, there would be no UB
The cpu's overflow flag may
still be set.

"Overflow" as defined by C++ and "overflow" in a typical cpu flag
register are therefore somewhat orthogonal concepts, I think.

with "result in overflow" i mean, that result type can not hold the
full result [number] of one math operation
 
A

Alf P. Steinbach

it is the compiler that have to do that...
i think the easy form for a C or C++ language would be the follow:

int function(void)
{u32 a, b, r, s, cf;
a=0xFF; b=789799; r=7877;
makeCarryFlagTheLastStatement(&cf);
/* signal to the compiler cf var is the carry flag for overflow
in the last statement and initalize it to 0
*/
s=a*b+c;
if(cf==0) printf("Not carry flag the last statement\n");
else printf("There is one statement with overflow\n");
return 0;
}

where cf would detect integer overflow, unsigned overflow and float
point overflow

In very-much-pre-standard Borland's Turbo C++ version 2.01 a
pseudo-variable called _FLAGS was added to the language[1]. Accessing
this variable gave you the i8086 flags register value, presumably like
the value that would be pushed on the stack. Sorry, I don't remember the
details, and documentation is no longer readily available.

A similar but not processor-specific feature /could/ be standardized for
C++, because it would only have possible overhead (such as added
checking or foiled optimizations) where it was used.

So, your suggestion, and the once existing practice of _FLAGS, are two
pretty similar ways to provide overflow checking in an extended
language. A third way, library based, and I think less similar in the
basic idea, is to provide special "checked" types corresponding to the
built-in ones, but with checking. The C# language's `checked` keyword
can be regarded as a way to transparently replace the ordinary built-in
types with such checked types in a designated region of code.

Counting, we now have two main ideas, with a total of four detailed
variations.

Still, I was mainly asking about better ways (for some reasonable
definition of "better") to express the code I quoted. I think Öö Tiib's
final suggestion in this thread, of ...

template<typename T>
inline
bool can_enwiden(T const low, T const high, T delta)
{
return delta < 0 ? (std::numeric_limits<T>::min() - delta <= high
&& std::numeric_limits<T>::max() + delta >= low)
: (std::numeric_limits<T>::max() - delta >= high
&& std::numeric_limits<T>::min() + delta <= low);
}

is one such better way, for "better" defined as clarity with not too
much of a performance impact. However, since I didn't understand it up
front, but had to peer at it for a while, I think it needs some
reformulation -- and for the moment I'm keeping the original code.


Cheers, & thanks,

- Alf

Notes:
[1] <url: http://edn.embarcadero.com/article/20841>
[2] <url: news:[email protected]>
 
R

Rosario1903

The standards are quite specific regarding unsigned arithmetic -
arithmetic operations in C and C++ are defined modulo 2^n, and therefore
/cannot/ overflow

there is one mathematical overflow, it is not matter how machine
handle that [or standard handle that], yes i would know it handle that
overflwo using result "modulo 2^n"

i'm not much sure of the word "handle" that i mean "tratta"
it seems i lost my dictionary
 
W

woodbrian77

For whom? It's been less than a year that I've been able to use
a limited set of C++11, and most people I know in industry
cannot use it yet.

For lots of people including Google and Facebook
employees. (I'm not really persuaded by Alf in
this case of using auto in function declarations,
but think the number using compilers with 2011
support is both substantial and growing.) The
author of the Boost multi_index library made a
number of C++ 2011 updates

http://lists.boost.org/boost-users/2013/07/79348.php

Those updates were made some time ago so that now
they are available in the latest release of Boost.
I've only heard of one reported problem with the
updates and the author has, I think been able to
address it. Other Boost authors have also done
similar work to this. I'm also among those using
newer compilers.


Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net
 
W

woodbrian77

I'm also among those using newer compilers.

I've argued similarly before, but C++ 2011 was badly
delayed and should have been 2008 or earlier. Think
of it as C++ 2008 and the adoption rates may make
more sense.
 
G

Gennaro Prota

On Sat, 16 Nov 2013 03:27:13 +0100, Alf P. Steinbach wrote:

[...]
Inflation subtracts dx from left and adds it to right, so the total
width increase is 2*dx. Hence the particular range values, ensuring that
2*x doesn't overflow. It's just a little sanity check though.

Ah, OK :) I hadn't examined the subsequent code, and had
assumed you would add to "one side only" (e.g. only to the right
in the horizontal direction, and only to the bottom in the
vertical direction)

I had a few routines for checking integral operations but not on
this machine, so I'll have to resort to the old school of
"programming in the news reader" :) The idea was along these
lines, IIRC:

template< typename T >
bool
can_add( T const & a, T const & b )
{
if ( b < 0 ) { // NOTE: T_MIN not handled
return can_subtract( a, -b ) ;
}

T const m( std::numeric_limits< T >::max() ) ;

// the first member can't overflow here, because
// 0 <= b <= m implies 0 <= (m - b) <= m
//
return m - b >= a ;
}

Of course, can_subtract() can be written similarly.

But, as hinted at in the comment, this doesn't handle an
addition with T == int and b == INT_MIN, for instance. I don't
remember how I handled such cases.

[...]
Hm, checking...

Content-Type: text/plain; charset=ISO-8859-1; format=flowed

That looks good (it's ISO Latin-1), OK.

But at the start in October I think it was I posted via Google Groups,
and GG does all sorts of unthinkable unmentionable things.

Hmm... I see (for your <[email protected]> message)
ISO-8859-15. But I think yesterday I saw a KOI8-R somewhere. I
hope it's not my shiny new newsreader messing up. (But could it?
Isn't the header taken verbatim from the server?)
 
G

Gennaro Prota

On Sat, 16 Nov 2013 22:44:08 +0100, Gennaro Prota wrote:

[...]
I had a few routines for checking integral operations but not on
this machine, so I'll have to resort to the old school of
"programming in the news reader" :) The idea was along these
lines, IIRC:

template< typename T >
bool
can_add( T const & a, T const & b )
{
if ( b < 0 ) { // NOTE: T_MIN not handled
return can_subtract( a, -b ) ;
}

T const m( std::numeric_limits< T >::max() ) ;

// the first member can't overflow here, because
// 0 <= b <= m implies 0 <= (m - b) <= m
//
return m - b >= a ;
}

Of course, can_subtract() can be written similarly.

But, as hinted at in the comment, this doesn't handle an
addition with T == int and b == INT_MIN, for instance. I don't
remember how I handled such cases.

This is better (NOTE: untested):

template< typename T >
bool
can_add( T const & a, T const & b )
{
if ( b < 0 ) {
T const mn( (std::numeric_limits< T >::min)() );

return ( mn - b ) <= a ;
} else {

T const mx( (std::numeric_limits< T >::max)() );

return a <= ( mx - b ) ;
}
}

I employed some min-max macros protection (the parentheses, and the weird names
"mn" and "mx"). Note that you might want to use the ?: operator.

PS: back to your library, what do you do if dx (or dy) is negative and not
smaller than the rectangle semi-width (or semi-height)? Think e.g. of left=0,
top=0, right=10, bottom=5 and dx=-6, dy=-3.
 
A

Alf P. Steinbach

On Sat, 16 Nov 2013 22:44:08 +0100, Gennaro Prota wrote:

[...]
I had a few routines for checking integral operations but not on
this machine, so I'll have to resort to the old school of
"programming in the news reader" :) The idea was along these
lines, IIRC:

template< typename T >
bool
can_add( T const & a, T const & b )
{
if ( b < 0 ) { // NOTE: T_MIN not handled
return can_subtract( a, -b ) ;
}

T const m( std::numeric_limits< T >::max() ) ;

// the first member can't overflow here, because
// 0 <= b <= m implies 0 <= (m - b) <= m
//
return m - b >= a ;
}

Of course, can_subtract() can be written similarly.

But, as hinted at in the comment, this doesn't handle an
addition with T == int and b == INT_MIN, for instance. I don't
remember how I handled such cases.

This is better (NOTE: untested):

template< typename T >
bool
can_add( T const & a, T const & b )
{
if ( b < 0 ) {
T const mn( (std::numeric_limits< T >::min)() );

return ( mn - b ) <= a ;
} else {

T const mx( (std::numeric_limits< T >::max)() );

return a <= ( mx - b ) ;
}
}

I employed some min-max macros protection (the parentheses, and the
weird names "mn" and "mx").

Well, I don't think any of that is necessary; I can't see any vexing
parse here. But if there was then I'd avoid it simply by using "=" copy
initialization syntax. I like that syntax better anyway. :)

Note that you might want to use the ?: operator.

PS: back to your library, what do you do if dx (or dy) is negative and not
smaller than the rectangle semi-width (or semi-height)? Think e.g. of
left=0, top=0, right=10, bottom=5 and dx=-6, dy=-3.

The library is just API support, a C++ binding to the API, so to speak.
The API produces an "inverted" rectangle in this case. I.e. it just does
a straightforward update without any checking.

I think it's not unreasonable since the rectangles are used with many
different coordinate systems, and depending on the coordinate system the
"inverted" result can make sense directly.

Someone else asked about the same earlier in the thread.


Cheers,

- Alf
 
A

Alf P. Steinbach

I employed some min-max macros protection (the parentheses, and the
weird names
"mn" and "mx"). Note that you might want to use the ?: operator.

Oh! You're thinking of MACRO interference. Well I say, those who let
macros called "min" and "max" exist, deserve whatever they get! :D

Cheers,

- Alf
 
A

Alf P. Steinbach

Gennaro Prota writes:
[snip]

It's quite depressing to learn they don't teach this stuff in school any
more, apparently. If you really want to be fancy, you could throw in a
compile-time check that T is really an integer type, but that's as far
as I'd go.

plink

- Alf
 
A

Alf P. Steinbach

Gennaro Prota writes:
[snip]

It's quite depressing to learn they don't teach this stuff in school any
more, apparently. If you really want to be fancy, you could throw in a
compile-time check that T is really an integer type, but that's as far
as I'd go.

plink

Maybe I should better explain that plink to causal readers.

If Gennaro was clearly a student lacking in basic knowledge, and I also
was such a one, then Sam's comment could be interpreted to actually be
about the school system, and then to some degree excusing the students'
poor understanding of the issue at hand. However, Gennaro is, among
other things, a long time Boost library contributor, I'm a long time
clc++m moderator, and far from indicating novice level participants the
thread so far has been about ordinary professional level stuff. Thus,
Sam's comment is a personal slight that's counter to what he knows,
which strongly indicates pure trolling, a flame bait.

The "plink"[1] then indicates that I've placed him in my news client's
kill file, so that I won't even SEE his further postings in clc++.


- Alf

Notes:
[1] "plink" is (so it was explained to me) the sound of a light-weight
troll hitting the inner surface of the toilet bowl, while "plonk" is the
sound of a more heavy-weight troll being flushed down.
 
D

Daniel

Maybe I should better explain that plink to causal readers.
[Uninteresting self-indulgent explanation snipped]

It seems to this casual reader that Alf is the one playing the troll.

Daniel
 
G

Gennaro Prota

Oh! You're thinking of MACRO interference. Well I say, those who let
macros called "min" and "max" exist, deserve whatever they get! :D

It isn't an easy problem. One of the main offenders in this case
is <windows.h>. I've never found a reassurance in Microsoft
documentation that their min/max macros are not used in their
own functions. This means that using NOMINMAX in your code and
linking to something else that doesn't use it could potentially
lead to ODR violations.

(I think they do not use them; but that's not a certainty.)

Up until a few weeks ago, I've been avoiding the names "min" and
"max" in my own code, exactly to avoid problems. Then, I wrote a
little class intended to be a model of standard uniform random
number generator; and that *requires* those names.


To protect those names (including if you use
numeric_limits<>::min/max), you have two main alternatives:
either you parenthesize or you use a dummy macro like this:

#define NO_MACRO_EXPANSION
int x = std::numeric_limits< int >::max NO_MACRO_EXPANSION () ;

It prevents expanding any function-like macro named "max" for
the same reason that a ')' does: the macro name isn't followed
by an open parenthesis.

But parentheses also inhibit ADL, whereas the NO_MACRO_EXPANSION
trick doesn't. And, all in all, the macro is probably clearer if
you give it a good name.
 
G

Gennaro Prota

]
Maybe I should better explain that plink to causal readers.

If Gennaro was clearly a student lacking in basic knowledge, and I also
was such a one, then Sam's comment could be interpreted to actually be
about the school system, and then to some degree excusing the students'
poor understanding of the issue at hand. However, Gennaro is, among
other things, a long time Boost library contributor

I have been a contributor. But I attach great importance to
saying that it was an error of youth.
 
A

alf.p.steinbach

Alf, doesn't your newsreader support wildcards in "plink"? This way you
could plink the whole group at once and feel much more relaxed!

plonk
i should have done that long before, but i've been reluctant to do so.
however, this latest sour and misleading comment of yours does not occur asnatural part of a discussion you're involved in, it's just you trying to kick someone, anyone, for your own satisfaction of kicking, out of the blue,when you saw a chance. that makes you a pretty bad person as i see it. andconsidering earlier behavior, in spite of apparent age, you're trolling.


- Alf
 
T

Tobias Müller

Gennaro Prota said:
It isn't an easy problem. One of the main offenders in this case
is <windows.h>. I've never found a reassurance in Microsoft
documentation that their min/max macros are not used in their
own functions. This means that using NOMINMAX in your code and
linking to something else that doesn't use it could potentially
lead to ODR violations.

(I think they do not use them; but that's not a certainty.)

I don't get that. Macros are always expanded, they _never_ generate linker
symbols.
Up until a few weeks ago, I've been avoiding the names "min" and
"max" in my own code, exactly to avoid problems. Then, I wrote a
little class intended to be a model of standard uniform random
number generator; and that *requires* those names.


To protect those names (including if you use
numeric_limits<>::min/max), you have two main alternatives:
either you parenthesize or you use a dummy macro like this:

#define NO_MACRO_EXPANSION
int x = std::numeric_limits< int >::max NO_MACRO_EXPANSION () ;

It prevents expanding any function-like macro named "max" for
the same reason that a ')' does: the macro name isn't followed
by an open parenthesis.

But parentheses also inhibit ADL, whereas the NO_MACRO_EXPANSION
trick doesn't. And, all in all, the macro is probably clearer if
you give it a good name.

Just
#include <windows.h>
#ifdef max
#undef max
#endif
#ifdef min
#undef min
#endif
and you're done. No reason to clutter your code with yet another macro...

Tobi
 
T

Tobias Müller

Paavo Helde said:
org:

This does not work in library headers, which might be included in some
client code which expects to see Windows min/max macros.

True. My public APIs are usually in C (implemented in C++) because of
dynamic loading and interfacing other languages, so I never had that
problem.

Still, you could reimplement min/max in the global namespace if you have to
include windows.h in your header.
Been there, burned
myself. Sometimes it is really cumbersome to move the min/max usage into
the .cpp file, and with templates it might be impossible. Fortunately there
are (ugly) workarounds.

Besides, these #ifdef-s above are unneeded, #undef of a name which is not a
macro is a null-op.

Thanks for the hint.

Tobi
 
J

James Kanze

On 13/11/13 16:02, Rosario1903 wrote:
[...]
The compiler has to generate code /as if/ it followed the ordering in
the source code and the precedence rules for operators. But it can
re-arrange the /actual/ generated code, as long as the effect is the
same. Since any effect on flags is not visible in C or C++ in a
calculation like "a * b + c", the compiler does not have to consider it
when arranging instructions. If the "readOverflowFlag()" is an inline
assembly function, the compiler will probably treat it like a volatile
access - and obey its ordering with respect to other volatile accesses,
to function calls that /may/ have unknown effects, code that /may/ cause
an exception, etc. But simple arithmetic using local data can usually
be moved around quite freely.
That's true as far as it goes. Nothing in the condition code is
"observable" in C++. On the other hand:
-- any calculations that would set the overflow flag would
result in undefined behavior in C++, so the compiler can
assume that they won't occur, and
-- the compiler also knows that the condition code is not
observable, and may rearrange code so that the overflow bit
gets set in an intermediate results, as long as it knows
that the final results will be correct. (For example, given
a + b + c, where a is a large positive number, and b and
c fairly large negative values. Executing a + b, and then
the results + c, will not overflow. If the compiler knows
that the hardware rounds results, it may execute
b + c first, then add a, even if b + c results in an
overflow.)
I don't think there is any sort of undefined behaviour here - it is just
that C and C++ does not consider effects on flags as "behaviour" of
arithmetic instructions.
Overflow is undefined behavior. Flags or whatever have nothing
to do with it.
The code Rosario posted used unsigned integers (I'm assuming that's what
he meant with "u32"), so if I've interpreted the C++ standards
correctly, "a * b + c" cannot overflow as the operations are defined as
modulo 2^32, and so no undefined behaviour. The cpu's overflow flag may
still be set.

Even if the unsigned operations don't overflow, the CPU's
overflow flag may be set. Typical CPU's don't distinguish
between signed and unsigned operations; they just add. The
carry flags signals overflow if the operation is treated as
unsigned, the overflow flag if it is treated as signed. It's up
to the assembler programmer (or the compiler) to decide what he
wants to do with these flags, if anything.
"Overflow" as defined by C++ and "overflow" in a typical cpu flag
register are therefore somewhat orthogonal concepts, I think.

Not really. They are both based on a more general concept of
overflow. The hardware could call the flags signed overflow and
unsigned overflow if it wanted. The reason it probably doesn't
is that for the most part, the only time a programmer will use
unsigned arithmetic is either for bit manipulations, when
overflow doesn't have any really well defined meaning, or for
multiple precision arithmetic, where the low order elements are
handles as unsigned, and their "overflow" is the carry for the
next higher elements.
If only that were true! Developer writing this type of code are mostly
doing so because they believe it is correct, having seen lots of
examples in books, websites, and example code from their toolchain
vendor or microcontroller manufacturer.

I'm not saying that there aren't any incompetent programmers
around, but the only people I've seen who write this sort of
thing know very well that it isn't portable. It's quite correct
for the platform they are developing for, and that's what
counts.
It is common that compilers /do/ generate code in the order the user
wants,

What does the user want? It's actually very, very rare for a
compiler to generate code in a purely left to right order, as a
naïve programmer might expect (and Java requires). Things like
using Sethi-Ullman numbers to choose which branch of the parse
tree to generate first were wide-spread practice back when I was
writing compilers, twenty-five years ago. Today, instruction
ordering plays an important role, and even the most trivial
peephole optimizer will reorder operations considerably.
but I have never seen it documented or guaranteed in any way.
(Of course, there are lots of embedded toolchains around - I have used
quite a few, but it's still just a tiny percentage - so I can't
generalise too much.)
For a lot of code, there is no particular reason for the compiler to
generate code in a different order - and compilers generally order the
output based on the source code if there is no good reason to do
something else.

There are a very large number of reasons why a compiler might
reorder code.

Don't forget, too, that modern processors don't always execute
the machine instructions in the order they occur, and
particularly, they often reorder memory accesses (or suppress
some completely) even when the machine instructions are in the
order you might expect.
But the fact that it mostly works makes it more surprising to people
when they hit the unusual cases.

Only if they are incompetent.` (Incompetent to program at this
level, of course. I currently work with some world class
programmers... in numerical analysis; they probably wouldn't be
competent to program at this level.)
I think the compiler guarantees that it will send the write instructions
(since "globalInterruptEnable" is volatile). But there is no guarantee
that the /processor/ and/or memory caches and other hardware will pass
the writes on. In more sophisticated cpu's with write buffers could
easily combine the two writes before it finally hits memory. And
certainly they don't guarantee any ordering between writes to
"globalInterruptEnable" and "partA" and "partB".

Exactly. For the code you post to work, the compiler would have
to generate barriers or fences for all of the volatile accesses.

That's an aspect that many programmers might not be aware of,
but all programmers should certainly know that volatile only
affects the variable it is attached to, and the the compiler is
free to move accesses to other variabes accross the volatile
access.
When using targets with that sort of caching and buffering, you need to
set things up to ensure that writes like "globalInterruptEnable" are
marked appropriately to avoid such problems - clearly such a flag has a
specific address in the memory map rather than being allocated in ram.
And usually you also need to add extra memory synchronisation
instructions - your interrupt control is by function, not just a single
write.
But for smaller and simpler cpus, people write code like that - the
hardware will apply the writes in the order given by the instruction scheme.

Yes. In the past, it was a lot simpler. The compiler finished
one instruction before starting the next, and all reads and
writes were directly to memory---no caches and no pipelines.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top