Scripting and block data manipulation- how to preserve performance

Discussion in 'C++' started by Code4u, Jul 17, 2005.

  1. Code4u

    Code4u Guest

    I have an application that manipulates large arrays of image data of
    various types, all the usual arithmetic operations on the data objects
    are supported. With careful design and tricks similar to those in
    UBLAS (http://www.boost.org/libs/numeric/ublas/doc/) I have been able
    to avoid temporaries where appropriate and carefully optimize code
    (C++) that evaluates expressions, for example

    Image A, B, C, D, E
    A=((B+C)/D)+E

    In this expression A is evaluated directly by looping through the
    elements of B,C,D,E.

    This all works fine and dandy if the image expressions are known at
    compile time, but when it comes to scripting at run-time the
    opimizations are lost. The only technique that preserves efficiency is
    to generate object code for the image expression at run-time. In other
    words build a simple expression compiler into the application and have
    it generate native assembler as needed.

    Is there precedent for this type of approach? Comments, ideas and
    pointers to existing implementations would be most welcome.
    Code4u, Jul 17, 2005
    #1
    1. Advertising

  2. Code4u wrote:

    > UBLAS (http://www.boost.org/libs/numeric/ublas/doc/) I have been able
    > to avoid temporaries where appropriate and carefully optimize code
    > (C++) that evaluates expressions, for example
    >
    > Image A, B, C, D, E
    > A=((B+C)/D)+E
    >
    > In this expression A is evaluated directly by looping through the
    > elements of B,C,D,E.
    >
    > This all works fine and dandy if the image expressions are known at
    > compile time, but when it comes to scripting at run-time the
    > opimizations are lost. The only technique that preserves efficiency
    > is to generate object code for the image expression at run-time.


    Not the only. Your scripting can do the same that the C++ compiler does,
    interpret adequately the expression and generate calls to the compose
    operator functions. The difference in speed from this approach to generated
    machine code will be very small, and the complexity and portability much
    better.

    --
    Salu2
    =?ISO-8859-15?Q?Juli=E1n?= Albo, Jul 17, 2005
    #2
    1. Advertising

  3. Code4u

    Code4u Guest

    On Sun, 17 Jul 2005 23:57:11 +0200, Julián Albo <>
    wrote:

    >Code4u wrote:
    >
    >> UBLAS (http://www.boost.org/libs/numeric/ublas/doc/) I have been able
    >> to avoid temporaries where appropriate and carefully optimize code
    >> (C++) that evaluates expressions, for example
    >>
    >> Image A, B, C, D, E
    >> A=((B+C)/D)+E
    >>
    >> In this expression A is evaluated directly by looping through the
    >> elements of B,C,D,E.
    >>
    >> This all works fine and dandy if the image expressions are known at
    >> compile time, but when it comes to scripting at run-time the
    >> opimizations are lost. The only technique that preserves efficiency
    >> is to generate object code for the image expression at run-time.

    >
    >Not the only. Your scripting can do the same that the C++ compiler does,
    >interpret adequately the expression and generate calls to the compose
    >operator functions. The difference in speed from this approach to generated
    >machine code will be very small, and the complexity and portability much
    >better.


    If I understand you correctly (perhaps not), I don't see how this
    would generate efficient code. It's pretty easy to generate a series
    of operator calls at run-time but this will not result in the loop
    unrolling that maximises efficiency. In the above example the
    following operators would be called in sequence:

    operator+
    operator/
    operator+

    In each operator call a loop iterates over the scalar values. The
    problem is it is much less efficient than a combined operation:

    for (size_t i=0; i<scalarCount; ++i)
    {
    a=((b+c)/d)+e;
    }

    In case the reason is not apparent- the above code will make much
    better use of CPU cache because of the higher locality of reference
    and uses a trivial amount of temporary storage. When tested with the
    Visual Studio 2003 compiler I measure an average slowdown of x2 for
    the version that uses composition of operators.

    But how do we generate the equivalent object code at run-time? One
    approach is to build an expression compiler into the application,
    generate the required assembler, and call it.
    Code4u, Jul 18, 2005
    #3
  4. Code4u wrote:

    >>> This all works fine and dandy if the image expressions are known at
    >>> compile time, but when it comes to scripting at run-time the
    >>> opimizations are lost. The only technique that preserves efficiency
    >>> is to generate object code for the image expression at run-time.

    >>
    >>Not the only. Your scripting can do the same that the C++ compiler does,
    >>interpret adequately the expression and generate calls to the compose
    >>operator functions. The difference in speed from this approach to
    >>generated machine code will be very small, and the complexity and
    >>portability much better.

    >
    > If I understand you correctly (perhaps not), I don't see how this
    > would generate efficient code. It's pretty easy to generate a series
    > of operator calls at run-time but this will not result in the loop
    > unrolling that maximises efficiency. In the above example the
    > following operators would be called in sequence:
    >
    > operator+
    > operator/
    > operator+
    >
    > In each operator call a loop iterates over the scalar values. The
    > problem is it is much less efficient than a combined operation:
    >
    > for (size_t i=0; i<scalarCount; ++i)
    > {
    > a=((b+c)/d)+e;
    > }


    And that is that I say. Your scripting engine can analyze the sequence and
    call the combined operation instead of each individual operation in
    sequence.

    The loop unrolling probably can't be done in the scripting without great
    effort. But if the operators are costly (and if you are working with image
    data I suppose they are) the benefits of the loop unrolling will be
    minimal.

    And if you want the maximum efficience at all cost... well, you can write
    the entire application in hand-optimized assembler.

    > But how do we generate the equivalent object code at run-time? One
    > approach is to build an expression compiler into the application,
    > generate the required assembler, and call it.


    Don't know how your existing scripting engine actually works. If it
    generates a stack machine, for example, a simple check os sequences of
    operations in the machine can probably do the work. If it is a hand-coded
    parser can be harder.

    --
    Salu2
    =?ISO-8859-15?Q?Juli=E1n?= Albo, Jul 18, 2005
    #4
  5. Code4u

    Me Guest

    Code4u wrote:
    > I have an application that manipulates large arrays of image data of
    > various types, all the usual arithmetic operations on the data objects
    > are supported. With careful design and tricks similar to those in
    > UBLAS (http://www.boost.org/libs/numeric/ublas/doc/) I have been able
    > to avoid temporaries where appropriate and carefully optimize code
    > (C++) that evaluates expressions, for example
    >
    > Image A, B, C, D, E
    > A=((B+C)/D)+E
    >
    > In this expression A is evaluated directly by looping through the
    > elements of B,C,D,E.
    >
    > This all works fine and dandy if the image expressions are known at
    > compile time, but when it comes to scripting at run-time the
    > opimizations are lost. The only technique that preserves efficiency is
    > to generate object code for the image expression at run-time. In other
    > words build a simple expression compiler into the application and have
    > it generate native assembler as needed.
    >
    > Is there precedent for this type of approach? Comments, ideas and
    > pointers to existing implementations would be most welcome.


    I'd profile this to see if it is really necessary (I'm assuming you're
    doing this to eliminate temporaries and not to increase precision). For
    example, instead of trying to convert that expression to:

    for (size_t i = 0; i < len; ++i)
    A=((B+C)/D)+E;

    somehow. Try converting it to:

    for (size_t i = 0; i < len; ++i)
    A = B + C;

    for (size_t i = 0; i < len; ++i)
    A = A / D;

    for (size_t i = 0; i < len; ++i)
    A = A + E;

    which doesn't involve any runtime code generation at all.
    Me, Jul 18, 2005
    #5
  6. Code4u

    Code4u Guest

    On Mon, 18 Jul 2005 08:34:53 +0200, Julián Albo <>
    wrote:

    >And that is that I say.


    Huh?

    >Your scripting engine can analyze the sequence and
    >call the combined operation instead of each individual operation in
    >sequence.
    >
    >The loop unrolling probably can't be done in the scripting without great
    >effort. But if the operators are costly (and if you are working with image
    >data I suppose they are) the benefits of the loop unrolling will be
    >minimal.
    >


    Actually it's not, the average is a 2x speedup. That's huge.

    >And if you want the maximum efficience at all cost... well, you can write
    >the entire application in hand-optimized assembler.


    I think there's a disconnect here. You don't seem to understand the
    issue at hand.
    Code4u, Jul 18, 2005
    #6
  7. Code4u wrote:

    > I think there's a disconnect here. You don't seem to understand the
    > issue at hand.


    Maybe, but it's also possible thay you don't undesrtand what I propose.

    --
    Salu2
    =?ISO-8859-15?Q?Juli=E1n?= Albo, Jul 18, 2005
    #7
  8. Code4u

    Code4u Guest

    On Mon, 18 Jul 2005 16:15:16 +0200, Julián Albo <>
    wrote:

    >Code4u wrote:
    >
    >> I think there's a disconnect here. You don't seem to understand the
    >> issue at hand.

    >
    >Maybe, but it's also possible thay you don't undesrtand what I propose.


    You proposed composing the expression at run-time, yes? Such
    composition is what I'm trying to avoid because it does not make
    efficient use of cache. If I'm wrong, please restate and help me
    understand what you propose, simple source code would help.

    Thanks.
    Code4u, Jul 18, 2005
    #8
  9. Code4u wrote:

    > You proposed composing the expression at run-time, yes? Such
    > composition is what I'm trying to avoid because it does not make
    > efficient use of cache.


    Then I misunderstand you, I thinked you were worried about the use of
    individual operations instead of the composites.

    --
    Salu2
    =?ISO-8859-15?Q?Juli=E1n?= Albo, Jul 18, 2005
    #9
  10. Code4u

    Calum Grant Guest

    Code4u wrote:
    > I have an application that manipulates large arrays of image data of
    > various types, all the usual arithmetic operations on the data objects
    > are supported. With careful design and tricks similar to those in
    > UBLAS (http://www.boost.org/libs/numeric/ublas/doc/) I have been able
    > to avoid temporaries where appropriate and carefully optimize code
    > (C++) that evaluates expressions, for example
    >
    > Image A, B, C, D, E
    > A=((B+C)/D)+E
    >
    > In this expression A is evaluated directly by looping through the
    > elements of B,C,D,E.
    >
    > This all works fine and dandy if the image expressions are known at
    > compile time, but when it comes to scripting at run-time the
    > opimizations are lost. The only technique that preserves efficiency is
    > to generate object code for the image expression at run-time. In other
    > words build a simple expression compiler into the application and have
    > it generate native assembler as needed.
    >
    > Is there precedent for this type of approach? Comments, ideas and
    > pointers to existing implementations would be most welcome.


    I imagine that Matlab would be quite good at this sort of thing. So try
    to find out how it does it.
    Calum Grant, Jul 19, 2005
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Showjumper
    Replies:
    1
    Views:
    690
    Showjumper
    Mar 19, 2005
  2. Ron Stephens
    Replies:
    23
    Views:
    2,788
    Ron Stephens
    Apr 12, 2004
  3. DaveInSidney
    Replies:
    0
    Views:
    399
    DaveInSidney
    May 9, 2005
  4. morrell
    Replies:
    1
    Views:
    933
    roy axenov
    Oct 10, 2006
  5. Grzegorz Chrupala
    Replies:
    2
    Views:
    192
    Grzegorz Chrupala
    Jun 30, 2003
Loading...

Share This Page