optimizing the integer rescaling

Discussion in 'C++' started by Vladimir Jovic, Sep 8, 2010.

  1. Hello,

    I am trying to optimize integer rescaling (see example at the end).

    There is a range of integer elements (in the example, the elements are
    of type unsigned short), the current range maximum, and the desired
    maximum. The straightforward way is to multiply by the new maximum and
    then divide by the old maximum (taking care about overflow).

    Are there better ways of rescaling the integers?



    ///////////////////////////////////
    #include <iostream>

    unsigned short a = 0x1234; // value in the range to rescale
    unsigned short b = 0x3fff; // new maximum range value
    unsigned short c = 0x0800; // current maximum range value


    int main()
    {
        const unsigned int d = a * b;
        const unsigned short e = d / c;
        std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
    }
    ///////////////////////////////////
     
    Vladimir Jovic, Sep 8, 2010
    #1

  2. On 9/8/2010 6:43 AM, Vladimir Jovic wrote:
    > I am trying to optimize integer rescaling (see example at the end).


    Optimize? As in "speed it up"?

    > There is a range or integer elements (in the example, elements are of
    > type unsigned short), current range maximum, and the desired maximum.
    > Straight forward way is to multiply by new maximum, and then to divide
    > by old maximum (taking care about the overflow).
    >
    > Are there better ways of rescaling the integers?


    Better? In what way? To make it more precise, perform the operation
    using [long] doubles and then truncate or round the results.

    > ///////////////////////////////////
    > #include <iostream>
    >
    > unsigned short a = 0x1234; // value in the range to rescale


    It's *not* in the range. The "current maximum" is smaller than this value.

    > unsigned short b = 0x3fff; // new maximum range value
    > unsigned short c = 0x0800; // current maximum range value
    >
    >
    > int main()
    > {
    > const unsigned int d = a * b;
    > const unsigned short e = d / c;


    It is better to store the intermediate value in a type that is
    *guaranteed* to be larger. 'int' is not necessarily larger than
    'short'. The actual solution apparently depends on the implementation
    which controls the sizes of integral values. I would probably seek a
    platform-specific solution.

    > std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
    > }
    > ///////////////////////////////////


    V
    --
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Sep 8, 2010
    #2

  3. Victor Bazarov wrote:
    > On 9/8/2010 6:43 AM, Vladimir Jovic wrote:
    >> I am trying to optimize integer rescaling (see example at the end).

    >
    > Optimize? As in "speed it up"?


    Yes, like using operations which are going to be faster than one
    multiplication + one division (as in the example)

    >
    >> There is a range or integer elements (in the example, elements are of
    >> type unsigned short), current range maximum, and the desired maximum.
    >> Straight forward way is to multiply by new maximum, and then to divide
    >> by old maximum (taking care about the overflow).
    >>
    >> Are there better ways of rescaling the integers?

    >
    > Better? In what way? To make it more precise perform the operation
    > using [long] doubles and then truncate or round the results.


    To do that, you need to convert the integer to double, multiply and
    divide, and convert back to integer. That is not going to be faster.
    It might be more precise, but it shouldn't need to be.

    >
    >> ///////////////////////////////////
    >> #include <iostream>
    >>
    >> unsigned short a = 0x1234; // value in the range to rescale

    >
    > It's *not* in the range. The "current maximum" is smaller than this value.


    Yes, it is not in the range, but let's imagine it is a randomly chosen
    element from the range :) Whether you have the whole range or one
    randomly chosen element doesn't make much difference to *how* you will
    rescale it.

    >
    >> unsigned short b = 0x3fff; // new maximum range value
    >> unsigned short c = 0x0800; // current maximum range value


    My mistake about the current maximum :
    unsigned short c = 0x2800; // current maximum range value

    >>
    >>
    >> int main()
    >> {
    >> const unsigned int d = a * b;
    >> const unsigned short e = d / c;

    >
    > It is better to store the intermediate value in a type that is
    > *guaranteed* to be larger. 'int' is not necessarily larger than
    > 'short'. The actual solution apparently depends on the implementation
    > which controls the sizes of integral values. I would probably seek a
    > platform-specific solution.


    Good point. Which NG would you suggest for x86?
     
    Vladimir Jovic, Sep 8, 2010
    #3
  4. * Vladimir Jovic, on 08.09.2010 12:43:
    > Hello,
    >
    > I am trying to optimize integer rescaling (see example at the end).
    >
    > There is a range or integer elements (in the example, elements are of type
    > unsigned short), current range maximum, and the desired maximum. Straight
    > forward way is to multiply by new maximum, and then to divide by old maximum
    > (taking care about the overflow).
    >
    > Are there better ways of rescaling the integers?
    >
    >
    >
    > ///////////////////////////////////
    > #include <iostream>
    >
    > unsigned short a = 0x1234; // value in the range to rescale
    > unsigned short b = 0x3fff; // new maximum range value
    > unsigned short c = 0x0800; // current maximum range value
    >
    >
    > int main()
    > {
    > const unsigned int d = a * b;
    > const unsigned short e = d / c;
    > std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
    > }
    > ///////////////////////////////////


    As Victor noted, your 'a' is out of range. That means that you've not paid
    attention to detail. The details may or may not be important depending on the
    problem at hand, which you do not describe.

    In the general case you need to first decide on the distribution you want. For
    example, are your maximums inclusive or exclusive? The value 0x0800 seems like
    an exclusive maximum, while 0x3fff seems like an inclusive one.

    When you have decided /what/ your rescaling should do, then you should implement
    it correctly. That means e.g. avoiding overflow. The above code can easily
    overflow, so as portable code it's generally incorrect no matter the details of
    the desired scaling (it's like a car with square wheels, doesn't matter whether
    it's meant to be a sports car or a lorry, it's just wrong).

    So, yes, there are better ways.

    The first step on the road to betterness is to define the desired effect, and
    the second step is to implement that correctly. Only then worry about micro
    efficiency. If at all (and if you do, first measure, Measure, MEASURE).


    Cheers & hth.,

    - Alf

    --
    blog at <url: http://alfps.wordpress.com>
     
    Alf P. Steinbach /Usenet, Sep 8, 2010
    #4
  5. Alf P. Steinbach /Usenet wrote:
    > * Vladimir Jovic, on 08.09.2010 12:43:
    >> Hello,
    >>
    >> I am trying to optimize integer rescaling (see example at the end).
    >>
    >> There is a range or integer elements (in the example, elements are of
    >> type
    >> unsigned short), current range maximum, and the desired maximum. Straight
    >> forward way is to multiply by new maximum, and then to divide by old
    >> maximum
    >> (taking care about the overflow).
    >>
    >> Are there better ways of rescaling the integers?
    >>
    >>


    [snip old code]

    > As Victor noted your 'b' is out of range. That means that you've not
    > paid attention to detail. The details may or may not be important
    > depending on the problem at hand, which you do not describe.
    >


    Actually, the first part of the algorithm calculates the histogram of
    an image. Another part of the algorithm uses the cumulative sum of the
    histogram as a lookup table.

    Since the maximum value in the histogram is the number of points in
    the image (one point is 16 bits), I need to rescale all values in the
    histogram to the maximum 16-bit value (which is 0xffff).

    All parts of the algorithm are optimized (at least I think that is
    true), except for this rescaling. I am doing it after I calculate the
    cumulative sum, and I am doing it as shown in the code example (one
    multiplication and one division). It works fine, but I was wondering if
    there is a faster way.

    > In the general case you need to first decide on the distribution you
    > want. For example, are your maximums inclusive or exclusive? The value
    > 0x0800 seems like an exclusive maximum, while 0x3fff seems like an
    > inclusive one.
    >


    Yes, as I said elsewhere in the thread, that is a small mistake. For
    completeness, here is the correct version:

    ///////////////////////////////////
    #include <iostream>

    unsigned short a = 0x1234; // value in the range to rescale
    unsigned short b = 0x3fff; // new maximum range value
    unsigned short c = 0x2800; // current maximum range value
    int main()
    {
        const unsigned int d = a * b;
        const unsigned short e = d / c;
        std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
    }
    ///////////////////////////////////

    > When you have decided /what/ your rescaling should do, then you should
    > implement it correctly. That means e.g. avoiding overflow. The above
    > code can easily overflow, so as portable code it's generally incorrect
    > no matter the details of the desired scaling (it's like a car with
    > square wheels, doesn't matter whether it's meant to be a sports car or a
    > lorry, it's just wrong).
    >


    I didn't really think about code portability, because my target is
    fixed (x86 - some Pentium processor). I also wasn't thinking about the
    overflow, because that is easily caught by proper unit tests.

    The problem I am facing is that the algorithm cannot be used, because
    it uses too much CPU :(

    > So, yes, there are better ways.
    >
    > The first step on the road to betterness is to define the desired
    > effect, and the second step is to implement that correctly. Only then
    > worry about micro efficiency. If at all (and if you do, first measure,
    > Measure, MEASURE).
    >


    I did measure, and the profiler showed that 30% of the time is spent
    on rescaling.

    PS: Is this another problem for comp.programming? Is there a NG where
    one can post optimization questions?
     
    Vladimir Jovic, Sep 8, 2010
    #5
  6. On 9/8/2010 9:36 AM, Vladimir Jovic wrote:
    > Victor Bazarov wrote:
    >> On 9/8/2010 6:43 AM, Vladimir Jovic wrote:
    >>> I am trying to optimize integer rescaling (see example at the end).

    [..]
    >>
    >> It is better to store the intermediate value in a type that is
    >> *guaranteed* to be larger. 'int' is not necessarily larger than
    >> 'short'. The actual solution apparently depends on the implementation
    >> which controls the sizes of integral values. I would probably seek a
    >> platform-specific solution.

    >
    > Good point. Which NG would you suggest for x86?


    comp.lang.asm.x86, maybe?

    V
    --
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Sep 8, 2010
    #6
  7. On 9/8/2010 10:57 AM, Vladimir Jovic wrote:
    > The problem I am facing is the algorithm can not be used, because it
    > uses too much CPU :(


    Consider utilizing SSE (that's what those are called, IIRC). There you
    can vectorize your multiplications, divisions, additions... Ask in the
    x86 newsgroup. Actually, a decent compiler should be able to vectorize
    some of those for you. BTW, "too much CPU" is not a good description of
    your code's deficiencies. Too much compared to WHAT?

    V
    --
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Sep 8, 2010
    #7
  8. Victor Bazarov wrote:
    > On 9/8/2010 10:57 AM, Vladimir Jovic wrote:
    >> The problem I am facing is the algorithm can not be used, because it
    >> uses too much CPU :(

    >
    > Consider utilizing SSE (that's what those are called, IIRC). There you
    > can vectorize your multiplications, divisions, additions... Ask in the
    > x86 newsgroup. Actually, a decent compiler should be able to vectorize


    SSE doesn't have integer division.
    I will try the x86 group.

    > some of those for you. BTW, "too much CPU" is not a good description of
    > your code's deficiencies. Too much compared to WHAT?


    IMO the description is ok if you are trying to optimize the code, but
    that is not relevant to the question. If your algorithm spends too
    much time in one loop, you know you have to optimize everything in
    that loop.

    For example, your suggestion elsewhere in the thread to convert to
    double, multiply and divide, and convert back to integer is going to
    slow down my algorithm.
     
    Vladimir Jovic, Sep 8, 2010
    #8
  9. On 2010-09-08 18:18, Vladimir Jovic wrote:
    > Victor Bazarov wrote:
    >> On 9/8/2010 10:57 AM, Vladimir Jovic wrote:
    >>> The problem I am facing is the algorithm can not be used, because it
    >>> uses too much CPU :(

    >>
    >> Consider utilizing SSE (that's what those are called, IIRC). There you
    >> can vectorize your multiplications, divisions, additions... Ask in the
    >> x86 newsgroup. Actually, a decent compiler should be able to vectorize

    >
    > The SSE doesn't have integer division.
    > Will try x86 group
    >
    >> some of those for you. BTW, "too much CPU" is not a good description
    >> of your code's deficiencies. Too much compared to WHAT?

    >
    > IMO the description is ok if you are trying to optimize the code, but
    > that is not relevant to the question. If your algorithm spends too much
    > time in one loop, you know you have to optimize everything in that loop.
    >
    > For example, your response in else thread to convert to double, multiply
    > and divide and convert back to integer is going to slow down my algorithm.


    You can multiply by the reciprocal.

    Since your input is 16-bits only, float instead of double should be enough.

    Floating point to int conversion is quite slow in C/C++ on x86, you
    should look at SSE compiler intrinsics (that stuff is portable across
    the major x86/x64 compilers). With SSE, you can also do 4 conversions in
    parallel.


    You seem to have compile-time constants for your scaling. If you make
    those available to the compiler (declare them as integral const
    values), the compiler will be able to optimize away the division (at
    least Visual C++ and GCC do).
     
    tni, Sep 8, 2010
    #9
  10. On 9/8/2010 7:57 AM, Vladimir Jovic wrote:
    > Alf P. Steinbach /Usenet wrote:
    >> * Vladimir Jovic, on 08.09.2010 12:43:
    >>> Hello,
    >>>
    >>> I am trying to optimize integer rescaling (see example at the end).
    >>>
    >>> There is a range or integer elements (in the example, elements are of
    >>> type
    >>> unsigned short), current range maximum, and the desired maximum.
    >>> Straight
    >>> forward way is to multiply by new maximum, and then to divide by old
    >>> maximum
    >>> (taking care about the overflow).
    >>>
    >>> Are there better ways of rescaling the integers?
    >>>
    >>>

    >
    > [snip old code]
    >
    >> As Victor noted your 'b' is out of range. That means that you've not
    >> paid attention to detail. The details may or may not be important
    >> depending on the problem at hand, which you do not describe.
    >>

    >
    > Actually, the first part of the algorithm is calculating histogram of an
    > image. Another part of the algorithm is using cumulative sum of the
    > histogram as a lookup table.
    >
    > Since the maximum value in the histogram is number of points in the
    > image (one point is 16 bits), I need to rescale all values in the
    > histogram to the maximum value of 16 bits (which is 0xffff).
    >
    > All parts of the algorithm are optimized (at least I think that is
    > true), except for this rescaling. I am doing it after I calculate the
    > cumulative sum, and I am doing it as shown in the code example (one
    > multiplication and one division). It works fine, but I was wondering if
    > there is a faster way.
    >
    >> In the general case you need to first decide on the distribution you
    >> want. For example, are your maximums inclusive or exclusive? The value
    >> 0x0800 seems like an exclusive maximum, while 0x3fff seems like an
    >> inclusive one.
    >>

    >
    > Yes, as I said else thread, that is a small mistake. For completeness,
    > here is the correct version :
    >
    > ///////////////////////////////////
    > #include <iostream>
    >
    > unsigned short a = 0x1234; // value in the range to rescale
    > unsigned short b = 0x3fff; // new maximum range value
    > unsigned short c = 0x2800; // current maximum range value
    > int main()
    > {
    > const unsigned int d = a * b;
    > const unsigned short e = d / c;
    > std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
    > }
    > ///////////////////////////////////
    >
    >> When you have decided /what/ your rescaling should do, then you should
    >> implement it correctly. That means e.g. avoiding overflow. The above
    >> code can easily overflow, so as portable code it's generally incorrect
    >> no matter the details of the desired scaling (it's like a car with
    >> square wheels, doesn't matter whether it's meant to be a sports car or
    >> a lorry, it's just wrong).
    >>

    >
    > I didn't really think about code portability, because my target is set
    > (x86 - some pentium processor). I also wasn't thinking about the
    > overflow, because that is easily fixed in proper unit tests.
    >
    > The problem I am facing is the algorithm can not be used, because it
    > uses too much CPU :(
    >
    >> So, yes, there are better ways.
    >>
    >> The first step on the road to betterness is to define the desired
    >> effect, and the second step is to implement that correctly. Only then
    >> worry about micro efficiency. If at all (and if you do, first measure,
    >> Measure, MEASURE).
    >>

    >
    > I did measure, and the profiler told that 30% time is spent on rescaling.

    Depending on the size of the image, you can try generating a lookup
    table. You can use something like Bresenham's Line Algorithm to build
    that table using only integers.
    >
    > ps Is this another problem for comp.programming ? Is there a NG where to
    > post optimization questions?

    comp.programming is a better place, or comp.graphics.algorithms.

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
     
    Daniel Pitts, Sep 8, 2010
    #10
