How to make it faster?


Peter Nilsson

Keith said:
Check your compiler's documentation. It's likely that "-O2"
is not the highest optimization level available. (gcc supports
at least "-O3", plus a number of other options for specific
optimizations that aren't necessarily included in "-ON" for
whatever value of N.)

The way I see it, implementations don't have optimisation
levels per se. They simply allow more techniques to be
applied in the hopes of achieving faster code. Optimisation
remains inherently trial and error. So-called 'higher' settings
can actually produce _less_ efficient code than 'lower'
settings in some cases.
 

Paul Hsieh

I tried using "#define ITER 64". Overflow occurred in this setting.
I got "1.#J". I don't know what "1.#J" stands for?

It's a kind of NaN which is in fact an overflow condition of some
kind. In that case, you should try 4 or 8. But the key is to use a
power of 2. This will allow your compiler to use an "as-if" rule to
reduce the cost of executing the "%" operator. Otherwise you can use
a target counter:

if (i >= tgtI) {
    tgtI += ITER;
    prodn /= prodd;
    prodd = 1.0;
}

That would probably work just as well.
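To illustrate the power-of-two point above: when ITER is a power of two and
i is non-negative, i % ITER is the same as i & (ITER - 1), which is what lets
the compiler drop the division. A minimal sketch (the loop body is only there
to exercise the identity):

#include <stdio.h>

#define ITER 8                       /* a power of two */

int main(void)
{
    unsigned long i;

    for (i = 0; i < 1000; i++) {
        /* For non-negative i these two expressions are equivalent, so a
           compiler can turn the '%' into a cheap bitwise AND. */
        if ((i % ITER) != (i & (ITER - 1)))
            printf("mismatch at %lu\n", i);
    }
    printf("done\n");
    return 0;
}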
[...] When I changed
the type of sump and sumd to long double, the problem disappeared.
The program saved 1 sec because of the larger ITER.

I also tried another way:

#define ITER 64

register double sum = 0.0;

if (i % ITER == 0) {
    sum += log(prodn/prodd); /* introducing log, but not so frequent */
    prodn = 1.0;
    prodd = 1.0;
}

In this way, the running time remained the same: 11 secs before setting
-O2, and 7 secs after setting -O2. However, the return value was a
little different.
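For context, here is a minimal self-contained sketch of that periodic-flush
approach, using the names ITER, prodn, prodd and sum from the snippets above;
the per-iteration terms are made up purely for illustration, since the real
computation isn't shown here:

#include <math.h>
#include <stdio.h>

#define ITER 64
#define N    1000000L

int main(void)
{
    double prodn = 1.0, prodd = 1.0;  /* running numerator/denominator */
    double sum = 0.0;
    long i;

    for (i = 1; i <= N; i++) {
        /* hypothetical terms; the real program computes something else */
        prodn *= 1.0 + 1.0 / (double)i;
        prodd *= 1.0 + 0.5 / (double)i;

        /* every ITER iterations, fold the partial products into the sum
           so prodn and prodd never grow large enough to overflow */
        if (i % ITER == 0) {
            sum += log(prodn / prodd);
            prodn = 1.0;
            prodd = 1.0;
        }
    }
    /* fold in whatever is left over from the last partial block */
    sum += log(prodn / prodd);

    printf("sum = %.15g\n", sum);
    return 0;
}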

Well, I would suggest trying other values of ITER first. Unfortunately,
long double sounds like a larger and likely more expensive data type,
which may well have an impact on the performance of the rest of your
code. So I would recommend against doing that.
 

christian.bau

And what was all that stuff about a = 0, 0.25, 0.5, 0.75?


In the following function foo, the values of arguments g0, g1 and g2
are limited to 0, 0.25, 0.5, or 0.75.  But I guess this property has
no use because exp1 and exp2 change in every call.  However, exp0 does
remain the same for all calls.


How big is sz? There are only 64 possible combinations of the values
of g0, g1, g2. If sz > 64, you can pre-calculate quite a
few things for each of the 64 possible combinations, and instead of
three floating-point values for each i you could pass in a single
unsigned char which specifies which combination is used.
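A hedged sketch of that idea: since each of g0, g1 and g2 can only take four
values, the three of them fit in a 6-bit code (two bits each), and anything
that depends only on (g0, g1, g2) can be precomputed once into a 64-entry
table. What is stored per combination is a placeholder here, not the real
computation:

#include <stdio.h>

/* Each g value is one of 0.0, 0.25, 0.5, 0.75 -> a 2-bit code 0..3. */
static unsigned encode_g(double g)
{
    return (unsigned)(g * 4.0);      /* 0.0->0, 0.25->1, 0.5->2, 0.75->3 */
}

/* 4*4*4 = 64 possible (g0, g1, g2) combinations. */
static double table[64];

int main(void)
{
    double exp0 = 1.5;               /* assumed constant across calls */
    unsigned c0, c1, c2;

    /* Precompute once: whatever expensive quantity depends only on
       (g0, g1, g2) and exp0 -- the formula here is just a placeholder. */
    for (c0 = 0; c0 < 4; c0++)
        for (c1 = 0; c1 < 4; c1++)
            for (c2 = 0; c2 < 4; c2++)
                table[(c0 << 4) | (c1 << 2) | c2] =
                    exp0 * (c0 + c1 + c2) * 0.25;

    /* Per element, pass a single code instead of three doubles ... */
    {
        double g0 = 0.25, g1 = 0.75, g2 = 0.0;
        unsigned char code = (unsigned char)
            ((encode_g(g0) << 4) | (encode_g(g1) << 2) | encode_g(g2));
        /* ... and look the precomputed value up instead of recomputing it. */
        printf("precomputed = %g\n", table[code]);
    }
    return 0;
}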
 

christian.bau

Glad to hear it.  Transcendental calls are the worst in terms of
performance.  After that come divides and modulos.  After that come
branch mispredictions.  If you can optimize all that, you will at
least be within 2x-4x of optimal (barring algorithm and IO
improvements) pretty much all the time.

Usually the total cost is computational cost plus cost of data access.
Since the computational cost has by now been reduced by a factor of ten,
it would now be a good idea to look at things like cache usage, but
that would require a bigger picture.

It was mentioned that g is always one of 0.0, 0.25, 0.5 and 0.75.
A trivial change (changing these arrays from double to float) would
save 25% of memory bandwidth. A simple experiment would be to measure
the time of one program run, then do the same number of calls to foo,
but always with the same set of data. If that is a lot faster, then
cache improvements can help.
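A rough way to set up that experiment, sketched with clock(); the real foo()
and its data layout aren't shown in this thread, so the stand-in function,
array sizes and call count below are all assumptions:

#include <stdio.h>
#include <time.h>

#define CALLS 2000000L
#define BIG   (1L << 20)             /* large enough not to stay in cache */

/* Stand-in for the real foo(): just touches three g values. */
static double foo(double a, double b, double c)
{
    return a + 2.0 * b + 3.0 * c;
}

static double g0[BIG], g1[BIG], g2[BIG];   /* hypothetical input arrays */

int main(void)
{
    double acc = 0.0;
    long i;
    clock_t t0, t1, t2;

    t0 = clock();
    for (i = 0; i < CALLS; i++)      /* varied data: streams through memory */
        acc += foo(g0[i & (BIG - 1)], g1[i & (BIG - 1)], g2[i & (BIG - 1)]);
    t1 = clock();
    for (i = 0; i < CALLS; i++)      /* same data every call: cache-resident */
        acc += foo(g0[0], g1[0], g2[0]);
    t2 = clock();

    printf("varied data: %.2f s, same data: %.2f s (acc=%g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, acc);
    return 0;
}

If the "same data" loop is much faster than the "varied data" loop, memory
access rather than computation is likely the remaining bottleneck.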
 

user923005


If the problem were scaled so that g contained characters 0, 1, 2, 3
(which represent multiples of 0.25), it would reduce the requirement
from a 4-byte float to a 1-byte char per element (assuming a 32-bit
float and an 8-bit char).
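A small sketch of that representation (the array contents here are made up):
each g is stored as a one-byte code 0..3, and the float value is recovered
either from a four-entry table or by multiplying by 0.25f:

#include <stdio.h>

/* The four possible g values, indexed by the 2-bit code 0..3. */
static const float g_value[4] = { 0.0f, 0.25f, 0.5f, 0.75f };

int main(void)
{
    /* Hypothetical data: g stored as 1-byte codes instead of 4-byte floats. */
    unsigned char g_code[8] = { 0, 1, 2, 3, 3, 2, 1, 0 };
    int i;

    for (i = 0; i < 8; i++) {
        float g = g_value[g_code[i]];        /* or: g_code[i] * 0.25f */
        printf("code %u -> g = %.2f\n", (unsigned)g_code[i], g);
    }
    return 0;
}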
 
