micro-optimization question

H

Hallvard B Furuseth

Malcolm said:
Signed ints can trap, whilst unsigned ones cannot. So it is easy to
think of an architecture on which signed int is slower, hard to think
of one on which it is faster.

I've seen loops that gcc warned it could not optimize well because the
loop variables were unsigned. Don't remember what kind of loop caused
that though. I'm guessing it's because unsigned arithmetic is fully
defined, while signed arithmetic has undefined behavior on overflow - so
the compiler sometimes has more options about what to do with signed
arithmetic. The compiler may do whatever it wants with potential
undefined behavior, such as to assume some particular trouble case will
not happen.

Of course, that can work the other way too if the cpu will do nasty
things like traps on overflow: the compiler must in that case make sure
not to introduce a signed overflow where the program had none.

Anyway, just take it as another case of how intuition can be unreliable
about optimization.


Before optimizing, I hope you've been through the normal checklist:
Don't do it.
Don't do it yet.
Profile to find the hot spots where optimization will make a difference,
and cold spots where it won't.
Optimize algorithms and data structures before doing micro-optimizations
and code tweaks.

Regarding the latter, there are some techniques e.g. Paul Hsieh's pages:
http://www.azillionmonkeys.com/qed/tech.shtml
Google(optimization techniques C) has some things to say too.
 
H

Hallvard B Furuseth

Tim said:
(...)
sometimes discussed as a C vulnerability than as a performance question
(is either of those topical here?):

https://www.securecoding.cert.org/c...p;jsessionid=C0133C003E1DC5DFFA83BF496534B139

One of the strangest rules that project came up with, in my opinion.
As if the equivalent signed overflows were any better. With unsigned
you can at least do the arithmetic first and test afterwards, when that
is simpler. Though some of the bad cases are indeed specific to
unsigned, like expecting a careless 'unsigned - unsigned' to be small.
 
N

Nick Keighley

I am not sure I understand what you mean by "trap". So, I request you to explain
 it a bit more.

Everyone else seems to have ignored this so I'll have a go.
On some architectures some types can have a special "illegal" value
referred to as a trap value. This can generates a trap or signal if
read. IEEE floating point allows for things called "signalling NaNs"
that act as trap values. Few platforms implement signalling NaNs
though.
Some arechitecures use 1s complement arithmatic (most modern
architectures
use 2s complement) which allows for noth +0 and -0. -0 can then be
used as a trap.
Any C type except unisgned int can (but doesn't have to) contain a
trap
representation.

The poster is a little confused between trap representations and
overflow.
ints when they go beyond the legal range have Undefined Behaviour.
unsigned
ints have well defined behaviour (wrap around).


commonly they are the same
 
P

Phil Carmody

Richard Heathfield said:
Hallvard B Furuseth said:


And it came up with some very strange rules indeed. Last year I did
a detailed review of the first few rules, but the author seemed
uninterested so I stopped bothering.

I was wondering what the respected reg's would say about them too.
It seems as if unsignedness was being blamed for things that can
happen equally easily, and with less defined results, with signed
types. The signedness was pretty irrelevant.
C's guarantee that unsigned arithmetic is reduced modulo (N + 1) -
where N is the largest value representable in the relevant unsigned
type - is a useful one, and I see no particular reason to avoid
exploiting it for programmatical gain.

And I bet you see no problem with having a pointer to the non-existent
post-final element of an array for comparison in a loop. That's what
it's there for. To avoid using it, and use more contrived code instead
seems silly, and more prone to error.

Phil
 
P

Phil Carmody

Nick Keighley said:
Everyone else seems to have ignored this so I'll have a go.
On some architectures some types can have a special "illegal" value
referred to as a trap value. This can generates a trap or signal if
read. IEEE floating point allows for things called "signalling NaNs"
that act as trap values. Few platforms implement signalling NaNs
though.
Some arechitecures use 1s complement arithmatic (most modern
architectures
use 2s complement) which allows for noth +0 and -0. -0 can then be
used as a trap.
Any C type except unisgned int can (but doesn't have to) contain a
trap
representation.

I thought unsigned chars were the canonical can-be-set-to-anything
type?

Phil
 
P

Phil Carmody

Richard Heathfield said:
Cross said:


Well, that's up to you. Basically, you're allowed to assume that
unsigned arithmetic wraps (see below). How you exploit this is up
to you. Hashes are one obvious beneficiary of the guarantee,
because you can do something like:

unsigned long hash(const char *s)
{
unsigned long h = 0;
while(*s)
{
h *= 33;
h += *s++;

Isn't that implementation defined? For *s = 0b1111111, the integer
promotion might be to 255 (char is unsigned), but might be -1 (char
is signed, promotion preserving sign).

The mere fact I'm asking is one reason I always like to just have
my arrays of bytes as unsigned char[] right from the outset. Most
head-scratching is a warning sign.

Phil
 
P

Phil Carmody

Richard Heathfield said:
Phil Carmody said:

For a hash, does it matter?

Very much so.

If you're thinking of only using a hash function as a way of
locally indexing a hash table, then no, but that's a pretty
restricted view of the possible uses of a hash function.

Have you not seen how proud Linus Torvalds is of being able
to reproduce his entire Linux repository from a single hash
value? (E.g. most famously his talk to google about GIT, c.
2007, as viewable on google video or youtube.)

Phil
 
B

Ben Pfaff

Phil Carmody said:
Richard Heathfield said:
Phil Carmody said:

For a hash, does it matter?

Very much so. [...]
Have you not seen how proud Linus Torvalds is of being able
to reproduce his entire Linux repository from a single hash
value? [...]

That's a cryptographically secure hash function, which is almost
a completely different category. The design criteria for a
cryptographically secure hash are so much different, and more
stringent, than a hash function for hash table lookup.
 
P

Phil Carmody

Ben Pfaff said:
Phil Carmody said:
Richard Heathfield said:
Phil Carmody said:
Isn't that implementation defined?

For a hash, does it matter?

Very much so. [...]
Have you not seen how proud Linus Torvalds is of being able
to reproduce his entire Linux repository from a single hash
value? [...]

That's a cryptographically secure hash function, which is almost
a completely different category. The design criteria for a
cryptographically secure hash are so much different, and more
stringent, than a hash function for hash table lookup.

However, it relies on being a repeatable calculation, I think
you'll agree - that was the property I was highlighting, you'll
notice. Anything distributed may want or need to be hashed, it
doesn't have to be in a security-sensitive context. Frame relay,
ATM, IP and TCP checks-sums are all hash functions that need to
be calculated identically everywhere.

Phil
 
R

Richard Bos

Phil Carmody said:
If you're thinking of only using a hash function as a way of
locally indexing a hash table, then no, but that's a pretty
restricted view of the possible uses of a hash function.

How apt your sig is for this thread...

Richard
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,576
Members
45,054
Latest member
LucyCarper

Latest Threads

Top