Multibyte add & subtract

Discussion in 'C Programming' started by valinor@linuxmail.org, Apr 28, 2006.

  1. Guest

    Hi guys,

    (rather lengthy...)

    I'm trying to speed up the time spent on a postfilter for video.
    YUV 4:2:0 data, each pixel is 1 byte (0-255)

    The basic idea is to filter one pixel on each side of an 8-pixel border.
    The filter used is a variant of (1,1,-4,1,1).

    In the example below I do a vertical filtering of line n and the
    diff for pixel c1 is calculated as
    diff(c1) = a1+b1+(c1<<2)+d1+e1 (1)
    c2 as
    diff(c2) = a2+b2+(c2<<2)+d2+e2
    etc.

    Pixel 1.2.3.4.
    --------------
    n-2 a1a2a3a4
    n-1 b1b2b3b4
    n c1c2c3c4
    ----- pixel border----
    n+1 d1d2d3d4
    n+2 e1e2e3e4

    The current implementation reads the values of a1,b1,c1,d1,e1 one byte
    at a time, does the calculation and writes back the filtered value for c1.
    I.e. something close to the code below:
    imdifftmp = *(ImageSrc_p-w2);
    imdiff2 = *(ImageSrc_p-w2+1);
    ...
    imdiff8 = *(ImageSrc_p-w2+7);
    imdifftmp += *(ImageSrc_p-width);
    imdiff2 += *(ImageSrc_p-width+1);
    ...
    imdiff8 += *(ImageSrc_p-width+7);
    imdifftmp -= (*(ImageSrc_p)) << 2;
    imdiff2 -= (*(ImageSrc_p+1)) << 2;
    ...
    imdiff8 -= (*(ImageSrc_p+7)) << 2;
    imdifftmp += *(ImageSrc_p+width);
    imdiff2 += *(ImageSrc_p+width+1);
    ...
    imdiff8 += *(ImageSrc_p+width+7);
    imdifftmp += *(ImageSrc_p+w2);
    imdiff2 += *(ImageSrc_p+w2+1);
    ...
    imdiff8 += *(ImageSrc_p+w2+7);

    Not very efficient on a 32-bit machine! What I'm trying to achieve is
    to read a 32-bit word containing 4 pixel values, do the calculation
    on a whole word and write back a word. After some googling I found
    the book "Hacker's Delight" by Henry S. Warren, Jr. He presents such
    a method implemented by the two macros below:

    //Multibyte Add of 4 1-byte integers packed into a word
    #define MBA(x, y, s)\
    do{\
    s = ((x)&0x7f7f7f7f)+((y)&0x7f7f7f7f); \
    s = (((x)^(y))&0x80808080)^s; \
    // printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
    }while(0)

    //Multibyte Subtract of 4 1-byte integers packed into a word
    #define MBS(x, y, d)\
    do{\
    d = ((x)|0x80808080)-((y)&0x7f7f7f7f); \
    d = ~((((x)^(y))|0x7f7f7f7f)^d); \
    // printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
    }while(0)

    He also states that the operation below gives the carry into each
    bit position (where ^ denotes bitwise exclusive or):
    (x+y)^x^y
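
The carry expression can be checked on a small case. The following is a minimal sketch, assuming plain 32-bit unsigned words; `carry_into` and `byte_boundary_carries` are hypothetical helper names, not something from the book.

```c
#include <stdint.h>

/* Bit i of the result is the carry that entered bit position i
   when x and y were added as plain 32-bit unsigned words. */
static uint32_t carry_into(uint32_t x, uint32_t y)
{
    return (x + y) ^ x ^ y;
}

/* Carries that crossed a byte boundary: the carry into bits 8, 16 and 24.
   The carry out of bit 31 is lost in 32-bit arithmetic. */
static uint32_t byte_boundary_carries(uint32_t x, uint32_t y)
{
    return carry_into(x, y) & 0x01010100u;
}
```

For example, adding 0xFF and 0x01 ripples a carry through bits 1 to 8, so `carry_into` yields 0x1FE and only bit 8 survives the byte-boundary mask.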

    These macros work great for small values! The problem is how to handle
    the carry so that the correct values after the calculations in (1)
    can be extracted. My question (finally!) is:
    How can I (if it is possible) handle the carry to recreate the correct
    signed integer value after the calculations above?

    Some sample code below:

    void main(void){
    long a1 = 0xc7c8c9ca;
    long b1 = 0xc8c9cacb;
    long c1 = 0xddc8c9ca;
    long d1 = 0xcacbcccd;
    long e1 = 0xcbcccdce;

    MBA(a1,b1,s1); //a+b
    MBA(s1,d1,s2); //+d
    MBA(s2,e1,s1); //+e
    MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?
    MBS(s2,c1,s1); //-c
    MBS(s1,c1,s2); //-c
    MBS(s2,c1,s1); //-c

    //Extract MSB Byte (B0) and add carry stuff...

    printf("\nvalue after macros %08lX, value after calc %08lX\n", s1,
    0xc7+0xc8-(0xdd<<2)+0xca+0xcb);
    }

    Gives:
    carry 9F939794
    carry 1F073F3A
    carry B7B9BF9C
    carry F8101000
    carry BF81879C
    carry FF313730
    carry 3B818384
    value after macros B0080808, value after calc FFFFFFB0
    The top byte of the macro result (B0) equals the low byte of the
    directly computed value (FFFFFFB0): same value for different methods.

    Cheers
    //Fredrik
    , Apr 28, 2006
    #1

  2. Thad Smith Guest

    wrote:

    > I'm trying to speed up the time spent on a postfilter for video.
    > YUV 4:2:0 data, each pixel is 1 byte (0-255)
    >
    > In the example below I do a vertical filtering of line n and the
    > diff for pixel c1 is calculated as
    > diff(c1) = a1+b1+(c1<<2)+d1+e1 (1)
    > ...
    > Not very efficient on a 32-bit machine! What I'm trying to achieve is
    > to read a 32-bit word containing 4 pixel values, do the calculation
    > on a whole word and write back a word. After some googling I found
    > the book "Hacker's Delight" by Henry S. Warren, Jr. He presents such
    > a method implemented by the two macros below:
    >
    > //Multibyte Add of 4 1-byte integers packed into a word
    > #define MBA(x, y, s)\
    > do{\
    > s = ((x)&0x7f7f7f7f)+((y)&0x7f7f7f7f); \
    > s = (((x)^(y))&0x80808080)^s; \
    > // printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
    > }while(0)
    >
    > //Multibyte Subtract of 4 1-byte integers packed into a word
    > #define MBS(x, y, d)\
    > do{\
    > d = ((x)|0x80808080)-((y)&0x7f7f7f7f); \
    > d = ~((((x)^(y))|0x7f7f7f7f)^d); \
    > // printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\
    > }while(0)


    Each of the 8-bit fields is added mod 2^8.
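
To make the mod 2^8 behaviour concrete, here is a rough sketch of the two macros rewritten as functions. It assumes `uint32_t` words instead of `long`; the names `mba` and `mbs` are just lower-case stand-ins for the macros.

```c
#include <stdint.h>

/* Add four packed bytes lane by lane; each lane wraps mod 2^8. */
static uint32_t mba(uint32_t x, uint32_t y)
{
    /* Add the low 7 bits of each lane; no carry can leave a lane. */
    uint32_t s = (x & 0x7f7f7f7fu) + (y & 0x7f7f7f7fu);
    /* Fold in the top bit of each lane without propagating a carry. */
    return ((x ^ y) & 0x80808080u) ^ s;
}

/* Subtract four packed bytes lane by lane; each lane wraps mod 2^8. */
static uint32_t mbs(uint32_t x, uint32_t y)
{
    /* Force the top bit of each lane of x so no borrow leaves a lane. */
    uint32_t d = (x | 0x80808080u) - (y & 0x7f7f7f7fu);
    /* Correct the top bit of each lane. */
    return ~(((x ^ y) | 0x7f7f7f7fu) ^ d);
}
```

With the sample words from the post, `mba(0xc7c8c9ca, 0xc8c9cacb)` gives 0x8f919395: the top lane is 0xc7 + 0xc8 = 0x18f, of which only 0x8f is kept.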

    > These macros work great for small values! The problem is how to handle
    > the carry so that the correct values after the calculations in (1)
    > can be extracted. My question (finally!) is:
    > How can I (if it is possible) handle the carry to recreate the correct
    > signed integer value after the calculations above?
    >
    > Some sample code below:
    >
    > void main(void){
    > long a1 = 0xc7c8c9ca;
    > long b1 = 0xc8c9cacb;
    > long c1 = 0xddc8c9ca;
    > long d1 = 0xcacbcccd;
    > long e1 = 0xcbcccdce;
    >
    > MBA(a1,b1,s1); //a+b
    > MBA(s1,d1,s2); //+d
    > MBA(s2,e1,s1); //+e
    > MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?
    > MBS(s2,c1,s1); //-c
    > MBS(s1,c1,s2); //-c
    > MBS(s2,c1,s1); //-c
    > ...
    > value after macros B0080808, value after calc FFFFFFB0
    > The top byte of the macro result (B0) equals the low byte of the
    > directly computed value (FFFFFFB0): same value for different methods.


    As you note, the 8 lsbs are correct. If you can guarantee that each of
    a, b, d and e differs from c by less than 32, the result lies within
    -124..124 and fits in a signed 8-bit field, so you can simply use the
    msb as the sign bit. In your example the msb of B0 is 1, so sign-extend
    the byte to get -80.
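
Sign-extending one byte lane could look like the sketch below. It assumes the true result really does fit in -128..127; `lane_as_signed` is a hypothetical helper name, and lane 0 is taken to be the least significant byte.

```c
#include <stdint.h>

/* Return byte lane `lane` (0 = least significant) of `word`, read as a
   signed 8-bit two's complement value in the range -128..127. */
static int lane_as_signed(uint32_t word, unsigned lane)
{
    unsigned b = (word >> (8u * lane)) & 0xffu;
    /* Flip the sign bit, then subtract its weight: a portable sign
       extension that does not rely on an unsigned-to-signed cast. */
    return (int)(b ^ 0x80u) - 0x80;
}
```

For the macro result B0080808, lane 3 comes out as -80, which matches 0xc7+0xc8-(0xdd<<2)+0xca+0xcb, and the other three lanes come out as 8.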

    If you make no assumptions about value range in the group, then the
    range of computed value is -4*255 to 4*255. That requires 11 bits to
    uniquely represent each value. You could represent each pixel as 11
    bits, with the initial 3 msbs = 0. You could thus pack 2 pixels in a
    32-bit word or 5 pixels in a 64-bit word. If you can guarantee a pixel
    value difference of less than 128 in each 5 point group, you could get by
    with 10 bits/pixel, packing 3 pixels per 32-bit word.

    If you choose to use two 11 bit pixels in a 32-bit word, you might as
    well pack 2 16-bit values per 32-bit word, which gives easier packing
    and unpacking.
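
One possible way to do the two-16-bit-lanes variant is sketched below. This is only an illustration under some assumptions: the four pixels sit in a 32-bit word with byte 0 least significant, and a per-lane bias of 1020 (4 * 255) is added so that no lane ever goes negative and no borrow crosses into the neighbouring lane. All names (`LANE_BIAS`, `filter2`, `filter4`, ...) are made up for the example.

```c
#include <stdint.h>

#define LANE_BIAS 1020u                      /* 4 * 255, keeps every lane >= 0 */
#define BIAS2     (LANE_BIAS * 0x00010001u)  /* bias replicated into both lanes */

/* Spread bytes 0 and 2 (even) or bytes 1 and 3 (odd) into two 16-bit lanes. */
static uint32_t even_bytes(uint32_t w) { return w & 0x00ff00ffu; }
static uint32_t odd_bytes(uint32_t w)  { return (w >> 8) & 0x00ff00ffu; }

/* a + b + d + e - 4*c for two pixels at once, each lane offset by LANE_BIAS.
   Inputs must already be in the two-lane layout with lanes <= 255. */
static uint32_t filter2(uint32_t a, uint32_t b, uint32_t c,
                        uint32_t d, uint32_t e)
{
    return a + b + d + e + BIAS2 - (c << 2);
}

/* Filter four packed pixels; out[i] is the exact signed result for byte i. */
static void filter4(uint32_t a, uint32_t b, uint32_t c,
                    uint32_t d, uint32_t e, int out[4])
{
    uint32_t ev = filter2(even_bytes(a), even_bytes(b), even_bytes(c),
                          even_bytes(d), even_bytes(e));
    uint32_t od = filter2(odd_bytes(a), odd_bytes(b), odd_bytes(c),
                          odd_bytes(d), odd_bytes(e));
    out[0] = (int)(ev & 0xffffu) - (int)LANE_BIAS;
    out[1] = (int)(od & 0xffffu) - (int)LANE_BIAS;
    out[2] = (int)(ev >> 16) - (int)LANE_BIAS;
    out[3] = (int)(od >> 16) - (int)LANE_BIAS;
}
```

With the sample words from the original post, `filter4` yields 8, 8, 8 and -80 for bytes 0 to 3, so the full range -1020..1020 is preserved at the cost of processing two pixels per word operation instead of four.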

    --
    Thad
    Thad Smith, Apr 29, 2006
    #2

  3. <> wrote in message
    news:...

    Valinor,

    I've made some corrections. Don't let those get to you. There are some
    useful non-correction related comments below.

    > I'm trying to speed up the time spent on a postfilter for video.
    > YUV 4:2:0 data, each pixel is 1 byte (0-255)
    >
    > The basic idea is to filter one pixel on each side of an 8-pixel border.
    > The filter used is a variant of (1,1,-4,1,1).
    >
    > In the example below I do a vertical filtering of line n and the
    > diff for pixel c1 is calculated as
    > diff(c1) = a1+b1+(c1<<2)+d1+e1 (1)
    > c2 as
    > diff(c2) = a2+b2+(c2<<2)+d2+e2
    > etc.
    >
    > Pixel 1.2.3.4.
    > --------------
    > n-2 a1a2a3a4
    > n-1 b1b2b3b4
    > n c1c2c3c4
    > ----- pixel border----
    > n+1 d1d2d3d4
    > n+2 e1e2e3e4
    >

    <snip>
    >
    > //Multibyte Add of 4 1-byte integers packed into a word
    > #define MBA(x, y, s)\
    > do{\
    > s = ((x)&0x7f7f7f7f)+((y)&0x7f7f7f7f); \
    > s = (((x)^(y))&0x80808080)^s; \
    > // printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\


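    The backslash at the end of a // comment line splices the next line into
    the comment, so GCC treats the rest of the macro, including the closing
    }while(0), as commented out. Rewrite like so: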

    /* printf("\ncarry %08lX", ((x)+(y))^(x)^(y)); */ \

    > }while(0)
    >
    > //Multibyte Subtract of 4 1-byte integers packed into a word
    > #define MBS(x, y, d)\
    > do{\
    > d = ((x)|0x80808080)-((y)&0x7f7f7f7f); \
    > d = ~((((x)^(y))|0x7f7f7f7f)^d); \
    > // printf("\ncarry %08lX", ((x)+(y))^(x)^(y));\


    The same line-splicing problem applies to this // comment. Rewrite like
    so:

    /* printf("\ncarry %08lX", ((x)+(y))^(x)^(y)); */ \

    > }while(0)
    >
    > He also states that the operation below gives the carry into each
    > bit position (where ^ denotes bitwise exclusive or):
    > (x+y)^x^y
    >
    > These macros work great for small values! The problem is how to handle
    > the carry so that the correct values after the calculations in (1)
    > can be extracted. My question (finally!) is:
    > How can I (if it is possible) handle the carry to recreate the correct
    > signed integer value after the calculations above?
    >


    The MBS macro _appears_ (you'll need to confirm) to be calculating two's
    complement correctly. This means that the values _should_ be correctly
    signed when you extract each byte and convert it from an unsigned variable
    to a signed one, provided the true result fits in 8 bits. This is because
    most implementations use two's complement for negative integers.

    > Some sample code below:
    >
    > void main(void){


    #include <stdio.h>
    #include <stdlib.h>
    int main(void) { /* corrected */

    > long a1 = 0xc7c8c9ca;
    > long b1 = 0xc8c9cacb;
    > long c1 = 0xddc8c9ca;
    > long d1 = 0xcacbcccd;
    > long e1 = 0xcbcccdce;
    >


    long s1,s2; /* missing */

    > MBA(a1,b1,s1); //a+b
    > MBA(s1,d1,s2); //+d
    > MBA(s2,e1,s1); //+e
    > MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?
    > MBS(s2,c1,s1); //-c
    > MBS(s1,c1,s2); //-c
    > MBS(s2,c1,s1); //-c
    >
    > //Extract MSB Byte (B0) and add carry stuff...
    >
    > printf("\nvalue after macros %08lX, value after calc %08lX\n", s1,
    > 0xc7+0xc8-(0xdd<<2)+0xca+0xcb);


    return(EXIT_SUCCESS); /* corrected */

    > }
    >


    In (1) above, you _add_ (c1<<2), but here you _subtract_ (c1<<2). Did you
    want MBS() or MBA()?

    > MBS(s1,c1,s2); //-c (is it possible to do the -(c<<2) part smarter?


    Yes, replace the four lines that compute (c1<<2), with (if you wanted MBS,
    otherwise change to MBA):

    MBS(s1,((c1&0x3f3f3f3f)<<2),s2); /* -(c1<<2) in one step, each byte mod 2^8 */
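
The mask clears the two top bits of every byte before the shift, so nothing spills into the neighbouring byte and each lane ends up holding 4*c mod 2^8. The sketch below checks that one masked-shift subtract gives the same word as four single subtracts. It uses a function form of MBS on `uint32_t`; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Lane-wise subtract of four packed bytes (function form of the MBS macro). */
static uint32_t mbs(uint32_t x, uint32_t y)
{
    uint32_t d = (x | 0x80808080u) - (y & 0x7f7f7f7fu);
    return ~(((x ^ y) | 0x7f7f7f7fu) ^ d);
}

/* 4*c in every byte lane, mod 2^8: clear the two bits that the shift
   would otherwise push into the neighbouring lane. */
static uint32_t times4_per_byte(uint32_t c)
{
    return (c & 0x3f3f3f3fu) << 2;
}

/* Subtract 4*c with a single lane-wise subtract. */
static uint32_t sub4c_once(uint32_t s, uint32_t c)
{
    return mbs(s, times4_per_byte(c));
}

/* Subtract c four times in a row, as in the original sample code. */
static uint32_t sub4c_four_times(uint32_t s, uint32_t c)
{
    return mbs(mbs(mbs(mbs(s, c), c), c), c);
}
```

With s = 0x24282c30 (the lane-wise sum a+b+d+e of the sample words) and c = 0xddc8c9ca, both versions give B0080808.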

    > Gives:
    > carry 9F939794
    > carry 1F073F3A
    > carry B7B9BF9C
    > carry F8101000
    > carry BF81879C
    > carry FF313730
    > carry 3B818384
    > value after macros B0080808, value after calc FFFFFFB0


    Sorry, I didn't check these.


    Rod Pemberton
    Rod Pemberton, Apr 29, 2006
    #3
