Row-wise vs. column-wise image processing

Discussion in 'C++' started by Enrique Cruiz, Jan 25, 2007.

  1. Hello all,

    I am currently implementing a fairly simple algorithm. It scans a
    grayscale image, and computes a pixel's new value as a function of its
    original value. Two passes are made, first horizontally and second
    vertically. The problem I have is that the vertical pass is 3 to 4
    times slower than the horizontal, although the code is _exactly_ the
    same in both cases?!

    The code itself is very simple. There are 2 loops to scan through rows
    and colums (horizontal pass), or columns and rows (vertical pass)
    depending on the pass. The processing part is simple and is contained
    in a single line. 'pixel' is a pointer to the current pixel. The new
    value of the current pixel is first updated, and the pointer is then
    incremented to the next pixel, which is either in the next column or in
    the next row depending on the pass. I should add that the image is
    stored as a large 1-D vector, i.e. the values of each rows are
    contiguous in memory. Here is a simplified version of the code:

    ####################
    // HORIZONTAL
    // loop for every row
    ....
    // then for every column
    for(col=firstCol+1 ; col<=lastCol-1 ; ++col)
    {
    *pixel = (*pixel) * 2 / 3;
    pixel++;
    }

    // VERTICAL
    // loop for every column
    ....
    // then for every row
    for(row=firstRow+1 ; row<=lastRow-1 ; ++row)
    {
    *pixel = (*pixel) * 2 / 3;
    pixel+=imgWidth;
    }
    ##################

    For this small amount of code, timings are as follow:
    - horizontal = 0.035 sec.
    - vertical =   0.135 sec.

    Now if we simply remove in each case the line updating the pixel
    pointer (i.e. "pixel++;" and "pixel+=imgWidth;"), the timings then
    becomes equal at 0.081. This simple instruction is responsible for the
    massive loss of time. And I have no idea why.

    My only guess relates to memory management issues. Since the image is
    stored row-wise, the current and next values are physically next in
    memory during the horizontal pass. On the other hand, for the vertical
    pass, the next value is stored in the next row, and the distance
    between them becomes 'image_width'. My guess is that the next pixel
    value in such a case is not close enough to be stored in the processor
    cache or register. The processor has to fetch it from memory, hence the
    massive loss in speed. This is however just a guess.

    I would really appreciate if anybody could enlighten me on this topic.

    Thanks in advance,


    Enrique
     
    Enrique Cruiz, Jan 25, 2007
    #1
    1. Advertising

  2. Enrique Cruiz

    peter koch Guest

    On Jan 25, 11:59 am, Enrique Cruiz <>
    wrote:
    > Hello all,
    >
    > I am currently implementing a fairly simple algorithm. It scans a
    > grayscale image, and computes a pixel's new value as a function of its
    > original value. Two passes are made, first horizontally and second
    > vertically. The problem I have is that the vertical pass is 3 to 4
    > times slower than the horizontal, although the code is _exactly_ the
    > same in both cases?!
    >

    [snip]
    Modern CPU's are complex beasts doing lots of stuff to improve
    performance. One such thing is caching of memory, and that is what
    happens here: the horisontal pass can exploit the cache much better
    than the vertical pass.

    /Peter
     
    peter koch, Jan 25, 2007
    #2
    1. Advertising

  3. Enrique Cruiz wrote:

    This is not a C++ question as such, so it's kind of off topic, but
    anyway...

    > // then for every column
    > for(col=firstCol+1 ; col<=lastCol-1 ; ++col)
    > {
    > *pixel = (*pixel) * 2 / 3;
    > pixel++;
    >
    > }


    Normally one would use the loop-variable inside the loop, I suspect
    that this is not the real code you are showing. In the real code there
    might be something else that makes the difference, though I doubt it.

    > My only guess relates to memory management issues. Since the image is
    > stored row-wise, the current and next values are physically next in
    > memory during the horizontal pass. On the other hand, for the vertical
    > pass, the next value is stored in the next row, and the distance
    > between them becomes 'image_width'. My guess is that the next pixel
    > value in such a case is not close enough to be stored in the processor
    > cache or register. The processor has to fetch it from memory, hence the
    > massive loss in speed. This is however just a guess.


    Probably yes, many factors come into play, the size of the image, the
    size of the caches or the CPU, how much speculation the CPU does and so
    on. A modern CPU often speculates that you don't only want to use the
    piece of memory you try to access but also those pieces close to it.
    This helps in the row-wise run since it allows the CPU to read in the
    memory before it is actually used, in the column-wise run it works
    against you by pulling in more than you need. If you have a smaller
    image you might be able to squeeze it all into the cache which should
    make both runs equally fast.

    --
    Erik Wikström
     
    =?iso-8859-1?q?Erik_Wikstr=F6m?=, Jan 25, 2007
    #3
  4. On 2007-01-25 12:02:42 +0000, "peter koch" <> said:
    >
    > Modern CPU's are complex beasts doing lots of stuff to improve
    > performance. One such thing is caching of memory, and that is what
    > happens here: the horisontal pass can exploit the cache much better
    > than the vertical pass.


    Thanks. Do you know any resources that discuss similar issues related
    to (agressive) optimization?

    Alexis
     
    Enrique Cruiz, Jan 25, 2007
    #4
  5. On 2007-01-25 12:05:36 +0000, "Erik Wikström"
    <> said:
    > Probably yes, many factors come into play, the size of the image, the
    > size of the caches or the CPU, how much speculation the CPU does and so
    > on. A modern CPU often speculates that you don't only want to use the
    > piece of memory you try to access but also those pieces close to it.
    > This helps in the row-wise run since it allows the CPU to read in the
    > memory before it is actually used, in the column-wise run it works
    > against you by pulling in more than you need. If you have a smaller
    > image you might be able to squeeze it all into the cache which should
    > make both runs equally fast.


    Thanks a lot for the informative answer. Do you know good resources
    that discuss similar issues related to optimisation?

    Enrique
     
    Enrique Cruiz, Jan 25, 2007
    #5
  6. Enrique Cruiz

    Jim Langston Guest

    "Enrique Cruiz" <> wrote in message
    news:2007012514482643658-jni6l03mdo6nvuu@jetableorg...
    > On 2007-01-25 12:05:36 +0000, "Erik Wikström" <>
    > said:
    >> Probably yes, many factors come into play, the size of the image, the
    >> size of the caches or the CPU, how much speculation the CPU does and so
    >> on. A modern CPU often speculates that you don't only want to use the
    >> piece of memory you try to access but also those pieces close to it.
    >> This helps in the row-wise run since it allows the CPU to read in the
    >> memory before it is actually used, in the column-wise run it works
    >> against you by pulling in more than you need. If you have a smaller
    >> image you might be able to squeeze it all into the cache which should
    >> make both runs equally fast.

    >
    > Thanks a lot for the informative answer. Do you know good resources that
    > discuss similar issues related to optimisation?
    >
    > Enrique


    Also, consider that
    pixel++;
    is less assembly than:
    pixel+=imgWidth;

    I mean, pixel++; may be something as simple as:
    INC [pixel]
    (although it may be more like)
    MOV AX,[pixel]
    INC AX;
    MOV [pixel],AX

    Where pixel+=imgWidth; would involve some memory swapping and an add.
    Some I've seen are like:
    MOV AX,[pixel];
    MOV BX,[imgWidth];
    ADD AX,BX
    MOV [pixel],DX

    or similar. without looking at the assembly I couldn't say. I learned
    assembly back in the 80x86 days and I"m sure it's changed a bit since then.
     
    Jim Langston, Jan 25, 2007
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Soumyadip Rakshit

    copy array - row-wise

    Soumyadip Rakshit, Feb 15, 2006, in forum: C Programming
    Replies:
    5
    Views:
    326
    Barry Schwarz
    Feb 17, 2006
  2. Enrique Cruiz

    Row-wise vs. column-wise image processing

    Enrique Cruiz, Jan 25, 2007, in forum: C Programming
    Replies:
    10
    Views:
    725
    christian.bau
    Jan 26, 2007
  3. Soumyadip Rakshit

    row-wise copying of 2D arrays

    Soumyadip Rakshit, Feb 15, 2006, in forum: C++
    Replies:
    2
    Views:
    283
    default
    Feb 15, 2006
  4. D
    Replies:
    0
    Views:
    220
  5. Chris Lowis

    Parsing a CSV file column-wise

    Chris Lowis, Sep 1, 2008, in forum: Ruby
    Replies:
    10
    Views:
    227
    James Gray
    Sep 8, 2008
Loading...

Share This Page