optimizing the integer rescaling


Vladimir Jovic

Hello,

I am trying to optimize integer rescaling (see example at the end).

There is a range of integer elements (in the example, the elements are of
type unsigned short), a current range maximum, and a desired maximum. The
straightforward way is to multiply by the new maximum and then divide by
the old maximum (taking care of overflow).

Are there better ways of rescaling the integers?



///////////////////////////////////
#include <iostream>

unsigned short a = 0x1234; // value in the range to rescale
unsigned short b = 0x3fff; // new maximum range value
unsigned short c = 0x0800; // current maximum range value


int main()
{
    const unsigned int d = a * b;
    const unsigned short e = d / c;
    std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
}
///////////////////////////////////
 

Victor Bazarov

> I am trying to optimize integer rescaling (see example at the end).

Optimize? As in "speed it up"?

> There is a range of integer elements (in the example, the elements are of
> type unsigned short), a current range maximum, and a desired maximum. The
> straightforward way is to multiply by the new maximum and then divide by
> the old maximum (taking care of overflow).
>
> Are there better ways of rescaling the integers?

Better? In what way? To make it more precise, perform the operation
using [long] doubles and then truncate or round the results.
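
For instance, a minimal sketch of that (rescale_fp is a name made up
just for illustration):

///////////////////////////////////
unsigned short rescale_fp(unsigned short a, unsigned short b, unsigned short c)
{
    // promote to double, scale, then round to nearest by adding 0.5
    // before the truncating conversion back
    const double scaled = static_cast<double>(a) * b / c;
    return static_cast<unsigned short>(scaled + 0.5);
}
///////////////////////////////////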

> ///////////////////////////////////
> #include <iostream>
>
> unsigned short a = 0x1234; // value in the range to rescale

It's *not* in the range. The "current maximum" is smaller than this value.

> unsigned short b = 0x3fff; // new maximum range value
> unsigned short c = 0x0800; // current maximum range value
>
>
> int main()
> {
>     const unsigned int d = a * b;
>     const unsigned short e = d / c;

It is better to store the intermediate value in a type that is
*guaranteed* to be larger. 'int' is not necessarily larger than
'short'. The actual solution apparently depends on the implementation
which controls the sizes of integral values. I would probably seek a
platform-specific solution.
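
A minimal sketch of that, assuming 16-bit shorts ('unsigned long' is
guaranteed to be at least 32 bits, so the 16-bit by 16-bit product
cannot overflow in it):

///////////////////////////////////
unsigned short rescale(unsigned short a, unsigned short b, unsigned short c)
{
    // the cast forces the multiplication to happen in unsigned long
    const unsigned long d = static_cast<unsigned long>(a) * b;
    return static_cast<unsigned short>(d / c);
}
///////////////////////////////////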

>     std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
> }
> ///////////////////////////////////

V
 

Vladimir Jovic

Victor said:
> Optimize? As in "speed it up"?

Yes, as in use operations which are going to be faster than one
multiplication + one division (as in the example).

>> Are there better ways of rescaling the integers?
>
> Better? In what way? To make it more precise, perform the operation
> using [long] doubles and then truncate or round the results.

To do that, you need to convert the integer to double, multiply and
divide, and convert back to integer. That is not going to be faster. It
might be more precise, but it doesn't need to be.

> It's *not* in the range. The "current maximum" is smaller than this value.

Yes, it is not a range, but let's imagine it is a randomly taken element
from a range :) Whether you have a whole range or a randomly chosen
element doesn't make much difference to *how* you rescale it.

My mistake about the current maximum; it should be:
unsigned short c = 0x2800; // current maximum range value

> It is better to store the intermediate value in a type that is
> *guaranteed* to be larger. 'int' is not necessarily larger than
> 'short'. The actual solution apparently depends on the implementation
> which controls the sizes of integral values. I would probably seek a
> platform-specific solution.

Good point. Which NG would you suggest for x86?
 

Alf P. Steinbach /Usenet

* Vladimir Jovic, on 08.09.2010 12:43:
> [snip problem description]
>
> Are there better ways of rescaling the integers?
>
> ///////////////////////////////////
> #include <iostream>
>
> unsigned short a = 0x1234; // value in the range to rescale
> unsigned short b = 0x3fff; // new maximum range value
> unsigned short c = 0x0800; // current maximum range value
>
>
> int main()
> {
>     const unsigned int d = a * b;
>     const unsigned short e = d / c;
>     std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
> }
> ///////////////////////////////////

As Victor noted, your 'a' is out of range. That means that you've not paid
attention to detail. The details may or may not be important depending on
the problem at hand, which you do not describe.

In the general case you need to first decide on the distribution you want.
For example, are your maximums inclusive or exclusive? The value 0x0800
seems like an exclusive maximum (a round power of two reads like a count),
while 0x3fff seems like an inclusive one (all ones reads like a mask).

When you have decided /what/ your rescaling should do, then you should implement
it correctly. That means e.g. avoiding overflow. The above code can easily
overflow, so as portable code it's generally incorrect no matter the details of
the desired scaling (it's like a car with square wheels, doesn't matter whether
it's meant to be a sports car or a lorry, it's just wrong).

So, yes, there are better ways.

The first step on the road to betterness is to define the desired effect, and
the second step is to implement that correctly. Only then worry about micro
efficiency. If at all (and if you do, first measure, Measure, MEASURE).
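
For one possible defined effect (map [0, c] onto [0, b] inclusive,
rounding to nearest), a sketch that also avoids the overflow, assuming
16-bit shorts:

///////////////////////////////////
unsigned short rescale(unsigned short a, unsigned short b, unsigned short c)
{
    // unsigned long is at least 32 bits, so with 16-bit inputs neither
    // the product nor the added half-divisor can overflow here
    const unsigned long num = static_cast<unsigned long>(a) * b + c / 2;
    return static_cast<unsigned short>(num / c);
}
///////////////////////////////////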


Cheers & hth.,

- Alf
 

Vladimir Jovic

Alf said:
> * Vladimir Jovic, on 08.09.2010 12:43:
> [snip old code]
>
> As Victor noted, your 'a' is out of range. That means that you've not
> paid attention to detail. The details may or may not be important
> depending on the problem at hand, which you do not describe.

Actually, the first part of the algorithm is calculating the histogram of
an image. Another part of the algorithm is using the cumulative sum of the
histogram as a lookup table.

Since the maximum value in the histogram is the number of points in the
image (one point is 16 bits), I need to rescale all values in the
histogram to the maximum value of 16 bits (which is 0xffff).

All parts of the algorithm are optimized (at least I think that is
true), except for this rescaling. I am doing it after I calculate the
cumulative sum, and I am doing it as shown in the code example (one
multiplication and one division). It works fine, but I was wondering if
there is a faster way.
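
Roughly, the rescaling pass looks like this (illustrative names, not my
real code; cum[] holds the cumulative histogram and numPixels is its
final entry):

///////////////////////////////////
#include <cstddef>
#include <vector>

void rescaleLut(std::vector<unsigned int>& cum, unsigned int numPixels)
{
    for (std::size_t i = 0; i < cum.size(); ++i)
    {
        // one multiplication and one division per entry; the 64-bit
        // intermediate keeps large pixel counts from overflowing
        cum[i] = static_cast<unsigned int>(
            static_cast<unsigned long long>(cum[i]) * 0xffffu / numPixels);
    }
}
///////////////////////////////////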

> In the general case you need to first decide on the distribution you
> want. For example, are your maximums inclusive or exclusive? The value
> 0x0800 seems like an exclusive maximum, while 0x3fff seems like an
> inclusive one.

Yes, as I said elsethread, that is a small mistake. For completeness,
here is the corrected version:

///////////////////////////////////
#include <iostream>

unsigned short a = 0x1234; // value in the range to rescale
unsigned short b = 0x3fff; // new maximum range value
unsigned short c = 0x2800; // current maximum range value

int main()
{
    const unsigned int d = a * b;
    const unsigned short e = d / c;
    std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
}
///////////////////////////////////

> When you have decided /what/ your rescaling should do, then you should
> implement it correctly. That means e.g. avoiding overflow. The above
> code can easily overflow, so as portable code it's generally incorrect
> no matter the details of the desired scaling (it's like a car with
> square wheels, doesn't matter whether it's meant to be a sports car or
> a lorry, it's just wrong).

I didn't really think about code portability, because my target is fixed
(x86, some Pentium processor). I also wasn't thinking about the
overflow, because that is easily caught by proper unit tests.

The problem I am facing is that the algorithm cannot be used, because it
uses too much CPU :(

> So, yes, there are better ways.
>
> The first step on the road to betterness is to define the desired
> effect, and the second step is to implement that correctly. Only then
> worry about micro efficiency. If at all (and if you do, first measure,
> Measure, MEASURE).

I did measure, and the profiler showed that 30% of the time is spent on
rescaling.

ps Is this rather a problem for comp.programming? Is there a NG where
optimization questions should be posted?
 

Victor Bazarov

Vladimir Jovic said:
[snip]
> Good point. Which NG would you suggest for x86?

comp.lang.asm.x86, maybe?

V
 

Victor Bazarov

Vladimir Jovic said:
> The problem I am facing is that the algorithm cannot be used, because
> it uses too much CPU :(

Consider utilizing SSE (that's what those are called, IIRC). There you
can vectorize your multiplications, divisions, additions... Ask in the
x86 newsgroup. Actually, a decent compiler should be able to vectorize
some of those for you. BTW, "too much CPU" is not a good description of
your code's deficiencies. Too much compared to WHAT?
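
Coming back to the vectorizing: give the compiler a plain counted loop
over contiguous arrays to chew on. A sketch, with hypothetical names
(and note a vectorizer will generally only drop the division when the
divisor is a compile-time constant):

///////////////////////////////////
void rescale_all(unsigned short* dst, const unsigned short* src,
                 unsigned int n, unsigned int b, unsigned int c)
{
    // with 16-bit inputs the product src[i] * b fits in 32 bits
    for (unsigned int i = 0; i < n; ++i)
        dst[i] = static_cast<unsigned short>(src[i] * b / c);
}
///////////////////////////////////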

V
 

Vladimir Jovic

Victor said:
> Consider utilizing SSE (that's what those are called, IIRC). There you
> can vectorize your multiplications, divisions, additions... Ask in the
> x86 newsgroup. Actually, a decent compiler should be able to vectorize
> some of those for you.

SSE doesn't have integer division. I will try the x86 group.

> BTW, "too much CPU" is not a good description of your code's
> deficiencies. Too much compared to WHAT?

IMO the description is OK if you are trying to optimize the code, but
that is not relevant to the question. If your algorithm spends too much
time in one loop, you know you have to optimize everything in that loop.

For example, your response elsethread to convert to double, multiply and
divide, and convert back to integer is going to slow my algorithm down.
 

tni

Vladimir Jovic said:
> SSE doesn't have integer division. I will try the x86 group.
>
> [...]
>
> For example, your response elsethread to convert to double, multiply
> and divide, and convert back to integer is going to slow my algorithm
> down.

You can multiply by the reciprocal.

Since your input is 16 bits only, float instead of double should be enough.
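
A sketch of that (hypothetical names; because of float rounding, the
result can differ from the exact integer quotient by one in rare cases):

///////////////////////////////////
void rescale_all(unsigned short* dst, const unsigned short* src,
                 unsigned int n, unsigned short b, unsigned short c)
{
    // one division up front, then only multiplies in the loop
    const float scale = static_cast<float>(b) / static_cast<float>(c);
    for (unsigned int i = 0; i < n; ++i)
        dst[i] = static_cast<unsigned short>(src[i] * scale + 0.5f);
}
///////////////////////////////////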

Floating point to int conversion is quite slow in C/C++ on x86, you
should look at SSE compiler intrinsics (that stuff is portable across
the major x86/x64 compilers). With SSE, you can also do 4 conversions in
parallel.
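
A sketch with SSE2 intrinsics (hypothetical names; n is assumed to be a
multiple of 4, and the values are assumed to fit in a signed 32-bit int,
which 16-bit data trivially does):

///////////////////////////////////
#include <emmintrin.h>

void rescale_sse(int* dst, const int* src, unsigned int n, float scale)
{
    const __m128 vscale = _mm_set1_ps(scale);
    for (unsigned int i = 0; i < n; i += 4)
    {
        // load 4 ints, convert to floats, scale, round-convert back
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128  f = _mm_mul_ps(_mm_cvtepi32_ps(v), vscale);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm_cvtps_epi32(f));
    }
}
///////////////////////////////////

_mm_cvtps_epi32 rounds under the current rounding mode (round to nearest
by default), so the slow x87-style conversion dance is avoided.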

\\

You seem to have compile-time constants for your scaling. If you make
those available to the compiler (declare them as integral const values),
the compiler will be able to optimize away the division (at least Visual
C++ and GCC do).
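
A sketch, with the values from the corrected example:

///////////////////////////////////
const unsigned int B = 0x3fff; // new maximum
const unsigned int C = 0x2800; // current maximum

unsigned short rescale_const(unsigned short a)
{
    // with C a compile-time constant, the compiler can turn the
    // division into a multiply-and-shift sequence
    return static_cast<unsigned short>(a * B / C);
}
///////////////////////////////////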
 

Daniel Pitts

Vladimir Jovic said:
[snip]
> I did measure, and the profiler showed that 30% of the time is spent
> on rescaling.

Depending on the size of the image, you can try generating a lookup
table. You can use something like Bresenham's line algorithm to build
that table using only integers.
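
A sketch of such a table builder (illustrative names; it computes
floor(v * b / c) for every possible input v by carrying the remainder,
Bresenham-style, so the loop body needs no multiply or divide):

///////////////////////////////////
#include <vector>

std::vector<unsigned short> buildLut(unsigned int b, unsigned int c)
{
    std::vector<unsigned short> lut(c + 1);
    const unsigned int q = b / c; // whole output step per input step
    const unsigned int r = b % c; // error accumulated per input step
    unsigned int value = 0, err = 0;
    for (unsigned int v = 0; v <= c; ++v)
    {
        lut[v] = static_cast<unsigned short>(value);
        value += q;
        err += r;
        if (err >= c) { err -= c; ++value; } // carry the error over
    }
    return lut;
}
// after that, each rescale is just a read: e = lut[a];
///////////////////////////////////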

> ps Is this rather a problem for comp.programming? Is there a NG where
> optimization questions should be posted?

comp.programming is a better place, or comp.graphics.algorithms.
 
