The container abstraction and parallel programming

Discussion in 'C Programming' started by jacob navia, Jan 6, 2012.

  1. jacob navia

    jacob navia Guest

    Today, more than 10 years after Intel introduced the first SIMD
    instructions it is still not possible to use those instructions to do
    SIMD programming when you use C and two mainstream compilers like gcc or
    MSVC.

    From the C side there is absolutely no support for this type of
    programming, even though plain arrays could do the trick obviously.


    The container abstraction, specifically the valarray container, could be
    used to *mean* SIMD using operator overloading.

    The use case could be

    ValArray operator+(ValArray left,ValArray right);

    This would allow an operation like adding two arrays element-wise
    to be performed using SIMD instructions.
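    In plain C, without operator overloading, the element-wise addition
    behind such an operator could be sketched as follows. The struct
    layout and the names ValArrayDouble / ValArray_Add are illustrative
    assumptions, not the actual C Containers API:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical double-valued ValArray; the field names are assumptions
   for illustration, not the real C Containers layout. */
typedef struct {
    size_t count;
    double *data;
} ValArrayDouble;

/* Element-wise addition: the plain serial loop that a SIMD-aware
   implementation would replace with vector instructions. */
ValArrayDouble *ValArray_Add(const ValArrayDouble *left,
                             const ValArrayDouble *right)
{
    if (left->count != right->count)
        return NULL;                      /* shapes must match */
    ValArrayDouble *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->count = left->count;
    r->data = malloc(r->count * sizeof *r->data);
    if (r->data == NULL) {
        free(r);
        return NULL;
    }
    for (size_t i = 0; i < r->count; i++)
        r->data[i] = left->data[i] + right->data[i];
    return r;
}
```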


    Obviously, the usual "optimizations" can be added to those
    basic operations so that

    ValArray a,b,c,d;

    a = b+c+d;

    doesn't have to produce an intermediate array but can be "compiled"
    by an overloaded assignment operator into a single element-wise loop,
    as is done in C++. Each overloaded operator returns a token object
    containing information about its arguments, and the overloaded
    assignment operator compiles correct element-wise code by re-parsing
    the generated tree.

    To prepare for this, the C Containers library proposes the ValArray
    container, for each of the basic types (char, short, int, long long and
    size_t).

    In the sample implementation there is a generic module that
    allows generating code for ANY type, including those left out like
    long (since it is always either equal to int or equal to long long in
    all implementations I know of). The type size_t was added to be able
    to correctly index any sequential container with a ValArray of size_t's.

    The code proposed NOW is plain serial code, but it can (and should) be
    replaced in implementations more sophisticated than the sample one
    to do SIMD programming, one of the most basic forms of parallelism,
    available in plain hardware for more than 10 years now...

    Obviously the four basic operations should be there, but other kinds of
    array operations could be interesting too, like:

    o REDUCE Apply a binary operation to each element and the running total:
    REDUCE + 1 2 3 4 5 --> 1+2+3+4+5 = 15

    You apply operator + to the identity element for "+" and to the
    first element: 0+1 --> 1
    You apply operator + to the second element and the result
    of the last operation: 1+2 --> 3
    And so on until the last element is reached.

    o EXPAND Apply a binary operation, producing a ValArray of the same
    length containing the intermediate results:
    EXPAND + 1 2 3 4 5 --> 1 3 6 10 15


    o SELECT Apply a boolean vector to a ValArray, keeping all elements that
    correspond to a 1 and eliminating those at the zeros.

    SELECT 1 1 0 1 0 0 / 10 20 30 40 50 60 --> 10 20 40

    o EXTEND Apply a boolean vector to a ValArray, leaving all elements at
    the 1 positions and introducing zero (or blank, for character arrays)
    at the zero positions:

    EXTEND 1 1 0 1 0 0 / 10 20 40 --> 10 20 0 40 0 0
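    As a sketch of what these four operations mean, here is one possible
    serial C implementation over plain double arrays. The function names
    and signatures are mine, for illustration; a real ValArray version
    would exist once per element type:

```c
#include <stddef.h>

/* An example binary operation to fold with. */
double add_op(double x, double y) { return x + y; }

/* REDUCE: fold op over the array, starting from init (the identity).
   REDUCE + over {1,2,3,4,5} gives 15. */
double Reduce(const double *a, size_t n,
              double (*op)(double, double), double init)
{
    double acc = init;
    for (size_t i = 0; i < n; i++)
        acc = op(acc, a[i]);
    return acc;
}

/* EXPAND: the same fold, but keeping every intermediate result (a scan).
   EXPAND + over {1,2,3,4,5} gives {1,3,6,10,15}. */
void Expand(const double *a, double *out, size_t n,
            double (*op)(double, double), double init)
{
    double acc = init;
    for (size_t i = 0; i < n; i++) {
        acc = op(acc, a[i]);
        out[i] = acc;
    }
}

/* SELECT: keep the elements whose mask position is 1; returns the
   new length.  SELECT 1 1 0 1 0 0 / 10 20 30 40 50 60 gives 10 20 40. */
size_t Select(const double *a, const int *mask, double *out, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (mask[i])
            out[k++] = a[i];
    return k;
}

/* EXTEND: spread the elements over the 1 positions of the mask,
   writing zero at the 0 positions.
   EXTEND 1 1 0 1 0 0 / 10 20 40 gives 10 20 0 40 0 0. */
void Extend(const double *a, const int *mask, double *out, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        out[i] = mask[i] ? a[k++] : 0.0;
}
```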

    The idea is to select a set of operations that makes it easy to work
    with arrays as the objects of operations instead of working with
    scalar data.

    What new operations should be in the set?
    How could those operations be incorporated into mainstream C?


    jacob
     
    jacob navia, Jan 6, 2012
    #1

  2. Ben Pfaff

    Ben Pfaff Guest

    jacob navia <> writes:

    > Today, more than 10 years after Intel introduced the first SIMD
    > instructions it is still not possible to use those instructions to do
    > SIMD programming when you use C and two mainstream compilers like gcc
    > or MSVC.
    >
    > From the C side there is absolutely no support for this type of
    > programming, even though plain arrays could do the trick obviously.


    "absolutely no support"? You exaggerate:
    http://gcc.gnu.org/onlinedocs/gcc-4...din-Functions.html#X86-Built_002din-Functions
    --
    "For those who want to translate C to Pascal, it may be that a lobotomy
    serves your needs better." --M. Ambuhl

    "Here are the steps to create a C-to-Turbo-Pascal translator..." --H. Schildt
     
    Ben Pfaff, Jan 6, 2012
    #2

  3. jacob navia

    jacob navia Guest

    Le 07/01/12 00:08, Ben Pfaff a écrit :
    > jacob navia<> writes:
    >
    >> Today, more than 10 years after Intel introduced the first SIMD
    >> instructions it is still not possible to use those instructions to do
    >> SIMD programming when you use C and two mainstream compilers like gcc
    >> or MSVC.
    >>
    >> From the C side there is absolutely no support for this type of
    >> programming, even though plain arrays could do the trick obviously.

    >
    > "absolutely no support"? You exaggerate:
    > http://gcc.gnu.org/onlinedocs/gcc-4...din-Functions.html#X86-Built_002din-Functions


    Hi Ben

    Look, those functions exist of course, but that is not standard C; those
    functions have no links to the rest of the language.

    Besides, they are tied to the Intel architecture (which
    is not wrong, of course), but the idea behind my post is to try to
    abstract a bit from a single architecture towards more *general* SIMD
    programming. Note that the notions of container and operations
    are NOT tied to the Intel, IBM or Motorola vector proposals but could
    be used with all three.

    Even I, with my limited resources, have some intrinsics for some
    operations, but that is really not *language* support, Ben.
     
    jacob navia, Jan 6, 2012
    #3
  4. Ben Pfaff

    Ben Pfaff Guest

    jacob navia <> writes:

    > Le 07/01/12 00:08, Ben Pfaff a écrit :
    >> jacob navia<> writes:
    >>
    >>> Today, more than 10 years after Intel introduced the first SIMD
    >>> instructions it is still not possible to use those instructions to do
    >>> SIMD programming when you use C and two mainstream compilers like gcc
    >>> or MSVC.
    >>>
    >>> From the C side there is absolutely no support for this type of
    >>> programming, even though plain arrays could do the trick obviously.

    >>
    >> "absolutely no support"? You exaggerate:
    >> http://gcc.gnu.org/onlinedocs/gcc-4...din-Functions.html#X86-Built_002din-Functions

    >
    > Look, those functions exist of course but that is not standard C, those
    > functions have no links to the rest of the language.


    If that's your purpose then I don't know why you started out by
    saying that you can't do it with "mainstream compilers like GCC".
    --
    int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
    \n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
    );while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
    );}return 0;}
     
    Ben Pfaff, Jan 6, 2012
    #4
  5. jacob navia

    jacob navia Guest

    Le 07/01/12 00:30, Ben Pfaff a écrit :
    > jacob navia<> writes:
    >
    >> Le 07/01/12 00:08, Ben Pfaff a écrit :
    >>> jacob navia<> writes:
    >>>
    >>>> Today, more than 10 years after Intel introduced the first SIMD
    >>>> instructions it is still not possible to use those instructions to do
    >>>> SIMD programming when you use C and two mainstream compilers like gcc
    >>>> or MSVC.
    >>>>
    >>>> From the C side there is absolutely no support for this type of
    >>>> programming, even though plain arrays could do the trick obviously.
    >>>
    >>> "absolutely no support"? You exaggerate:
    >>> http://gcc.gnu.org/onlinedocs/gcc-4...din-Functions.html#X86-Built_002din-Functions

    >>
    >> Look, those functions exist of course but that is not standard C, those
    >> functions have no links to the rest of the language.

    >
    > If that's your purpose then I don't know why you started out by
    > saying that you can't do it with "mainstream compilers like GCC".


    ... and MSVC, and ANY compiler now. It wasn't directed against gcc
    specifically.


    Because all those compilers propose
    the same thing the gcc/Intel compilers do: a series of machine-dependent
    intrinsics that make your code not portable at all, and maybe because of
    that they are not accepted by a large user base.

    What I wanted to propose is a more abstract and portable way of doing
    SIMD than just using pseudo-functions that map to single instructions of
    the Intel instruction set.

    Obviously the problem is there, since all that those intrinsics offer
    is very low-level operations (which amounts to more or less returning to
    assembly language in C, since all the intrinsics map to specific
    Intel/AMD instructions).

    My aim is to start a discussion about what kind of constructs
    could be used to abstract those operations into more general concepts.
     
    jacob navia, Jan 6, 2012
    #5
  6. Kaz Kylheku

    Kaz Kylheku Guest

    On 2012-01-06, jacob navia <> wrote:
    > Today, more than 10 years after Intel introduced the first SIMD
    > instructions it is still not possible to use those instructions to do
    > SIMD programming when you use C and two mainstream compilers like gcc or
    > MSVC.


    If anything, that may be an indicator that nobody cares that much about
    Intel's SIMD instructions. It's not hard to see why.

    SIMD instructions do not translate to a tangible benefit for the average
    consumer of Intel chips. (Marketing 101: features need to connect with
    benefits!)

    There was a time when games and video playback stretched the ability of the CPU
    to the core, and hardware for accelerating them was next to nonexistent. The
    processing-intensive tasks in those kinds of applications are handled by
    dedicated hardware these days.

    SIMD will not fix the major, user-visible performance issues in a Wintel PC,
    like long boot times, or browsers becoming unresponsive for seconds at a time.
     
    Kaz Kylheku, Jan 7, 2012
    #6
  7. Ian Collins

    Ian Collins Guest

    On 01/ 7/12 12:02 PM, jacob navia wrote:
    > Today, more than 10 years after Intel introduced the first SIMD
    > instructions it is still not possible to use those instructions to do
    > SIMD programming when you use C and two mainstream compilers like gcc or
    > MSVC.
    >
    > From the C side there is absolutely no support for this type of
    > programming, even though plain arrays could do the trick obviously.


    That may be the case for those compilers, but not for others. For
    example the Sun Studio compilers (I'd expect Intel to offer something as
    well) have a number of vector processing optimisations which include SIMD.

    I think this class of optimisation is best left as a compiler
    implementation detail. The range of vector processing support
    (including the likes of OpenMP) is simply too large (from nothing to a
    GPU) to be considered for standardisation. If the demand is there, the
    vendor will deliver.

    > The container abstraction, specifically the valarray container, could be
    > used to *mean* SIMD using operator overloading.


    A similar approach has been used by some compilers for many years,
    typically implemented as some form of "performance library" for vector
    type operations.

    --
    Ian Collins
     
    Ian Collins, Jan 7, 2012
    #7
  8. jacob navia

    jacob navia Guest

    Le 07/01/12 01:04, Kaz Kylheku a écrit :
    > On 2012-01-06, jacob navia<> wrote:
    >> Today, more than 10 years after Intel introduced the first SIMD
    >> instructions it is still not possible to use those instructions to do
    >> SIMD programming when you use C and two mainstream compilers like gcc or
    >> MSVC.

    >
    > If anything, that may be an indicator that maybe nobody cares that much about
    > Intel's SIMD instructions. It's not hard to see why.
    >
    > SIMD instructions do not translate to a tangible benefit for the average
    > consumer of Intel chips. (Marketing 101: features need to connect with
    > benefits!)
    >


    But... how could those benefits be *USED* unless you program in assembly?

    That is the whole point of my post. You can't say that users do not
    want those features when there are no high-level constructs they
    can use to make that happen!


    > There was a time when games and video playback stretched the ability of the CPU
    > to the core, and hardware for accelerating them was next to nonexistent. The
    > processing-intensive tasks in those kinds of applications are handled by
    > dedicated hardware these days.
    >


    But the problem *remains* since you can't address that dedicated
    hardware from a high level language anyway!


    > SIMD will not fix the major, user-visible performance issues in a Wintel PC,
    > like long boot times, or browsers becoming unresponsive for seconds at a time.


    SIMD needs a special kind of programming that is badly supported in
    current serial languages like C or C++. Many parallel languages have
    been proposed, but they are of too limited a scope to catch on. What is
    needed is a way to address SIMD programming from within a known
    general-purpose programming *context*.
     
    jacob navia, Jan 7, 2012
    #8
  9. jacob navia

    jacob navia Guest

    Le 07/01/12 01:50, Ian Collins a écrit :
    > On 01/ 7/12 12:02 PM, jacob navia wrote:
    >> Today, more than 10 years after Intel introduced the first SIMD
    >> instructions it is still not possible to use those instructions to do
    >> SIMD programming when you use C and two mainstream compilers like gcc or
    >> MSVC.
    >>
    >> From the C side there is absolutely no support for this type of
    >> programming, even though plain arrays could do the trick obviously.

    >
    > That may be the case for those compilers, but not for others. For
    > example the Sun Studio compilers (I'd expect Intel to offer something as
    > well) have a number of vector processing optimisations which include simd.
    >


    Yes, Intel and Sun, as hardware vendors, have a real interest in making
    people use the features of their hardware, and will try to propose
    either automatic SIMD deduction or inline assembly that supports
    their features directly.

    Problem is, even after dozens of years and millions and millions of
    dollars in research, automatic SIMD parallelization remains VERY
    difficult in the general case. The problem, basically, is as follows:

    Automatic parallelization must PROVE that no dependencies exist
    between the array members and that no unforeseen side effects can
    possibly occur; ONLY THEN can the constructs in question be parallelized.

    All this needs extensive program analysis, which is made even more
    difficult by the model of separate module compilation...

    High-level constructs designed to let the programmer assert that
    he/she wants parallel programming FOR THIS OBJECT simplify the
    task of the compiler, allowing for simpler compilers and more
    widespread use.

    > I think this class of optimisation is best left as a compiler
    > implementation detail. The range of vector processing support (including
    > the likes of OpenMP) is simply too large (from nothing to a GPU) to be
    > considered for standardisation. If the demand is there, the vendor will
    > deliver.
    >


    Well, SIMD processing is basically the same; only the SCALE changes
    between an Intel SSE4 unit and a GPU with some thousand nodes. Using a
    higher-level construct you LEAVE the details to the compiler, obviously:
    a compiler for Intel will make use of SSE4 and a compiler for a GPU will
    use the GPU. What you do NOT leave to the compiler is the ANALYSIS as to
    which constructs can be used with SIMD, allowing the compiler to
    specialize in code generation without burdening it with a complete
    analysis of all the possible data paths that could interfere with the
    SIMD optimization.


    >> The container abstraction, specifically the valarray container, could be
    >> used to *mean* SIMD using operator overloading.

    >
    > A similar approach has been used by some compilers for many years,
    > typically implemented as some form of "performance library" for vector
    > type operations.
    >


    Yes, I just wanted to propose that idea not for a single compiler but
    for the language in general.
     
    jacob navia, Jan 7, 2012
    #9
  10. Jens Gustedt

    Jens Gustedt Guest

    Am 01/07/2012 12:02 AM, schrieb jacob navia:
    > How could those operations be incorporated into mainstream C?


    macros and _Generic

    Jens
     
    Jens Gustedt, Jan 7, 2012
    #10
  11. jacob navia

    jacob navia Guest

    Le 07/01/12 09:39, Jens Gustedt a écrit :
    > Am 01/07/2012 12:02 AM, schrieb jacob navia:
    >> How could those operations be incorporated into mainstream C?

    >
    > macros and _Generic
    >
    > Jens


    Hi Jens

    "macros and _Generic"... can you maybe elaborate a bit?
    That was far too cryptic for my small brain.

    :)
     
    jacob navia, Jan 7, 2012
    #11
  12. Ian Collins

    Ian Collins Guest

    On 01/ 7/12 09:31 PM, jacob navia wrote:
    > Le 07/01/12 01:50, Ian Collins a écrit :
    >> On 01/ 7/12 12:02 PM, jacob navia wrote:
    >>> Today, more than 10 years after Intel introduced the first SIMD
    >>> instructions it is still not possible to use those instructions to do
    >>> SIMD programming when you use C and two mainstream compilers like gcc or
    >>> MSVC.
    >>>
    >>> From the C side there is absolutely no support for this type of
    >>> programming, even though plain arrays could do the trick obviously.

    >>
    >> That may be the case for those compilers, but not for others. For
    >> example the Sun Studio compilers (I'd expect Intel to offer something as
    >> well) have a number of vector processing optimisations which include simd.
    >>

    >
    > Yes, Intel and Sun, as hardware vendors, have a real interest in making
    > people use the features of their hardware and will try to propose
    > either automatically SIMD deduction or inline assembly that supports
    > their features directly.
    >
    > Problem is, even after dozens of years and millions and millions of
    > dollars in research, automatic SIMD parallelization remains VERY
    > difficult for the general case. The problem, basically, is as follows:
    >
    > Automatic parallelization must PROVE that no dependencies exist
    > between the array members, no unforeseen side effects can possibly
    > occur and ONLY THEN the constructs in question can be parallelized.
    >
    > All this needs extensive program analysis, what is made even more
    > difficult with the model of separate module compilation...


    > High level constructs designed to allow the programmer assert that
    > he/she wants parallel programming FOR THIS OBJECT simplify the
    > task of the compiler, allowing for simpler compilers and more
    > widespread use.


    I guess pragmas are another option, as used for OpenMP. We already have
    the restrict keyword to simplify analysis with pointer parameters, maybe
    something similar for parallelisation?
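    A sketch of how those two hints could combine on an element-wise loop.
    The #pragma omp simd shown is from OpenMP 4.0, which postdates this
    thread, and compilers built without OpenMP support simply ignore it;
    the function name vec_add is mine, for illustration:

```c
#include <stddef.h>

/* restrict promises the compiler that the three arrays do not overlap,
   removing the aliasing proof that otherwise blocks vectorization; the
   pragma additionally asserts that the loop iterations are independent. */
void vec_add(const double *restrict a, const double *restrict b,
             double *restrict out, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```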

    >>> The container abstraction, specifically the valarray container, could be
    >>> used to *mean* SIMD using operator overloading.

    >>
    >> A similar approach has been used by some compilers for many years,
    >> typically implemented as some form of "performance library" for vector
    >> type operations.
    >>

    >
    > Yes, I just wanted to propose that idea not for a single compiler but
    > for the language in general.


    I do see your point, but there are many CPU/OS-specific tricks used to
    get optimum performance on a particular platform, which is why vendors'
    compilers generally squeeze the best performance from a system.
    Would this on its own do enough to justify its inclusion in the language?

    --
    Ian Collins
     
    Ian Collins, Jan 7, 2012
    #12
  13. Jens Gustedt

    Jens Gustedt Guest

    Salut Jacob,

    Am 01/07/2012 09:54 AM, schrieb jacob navia:
    > Le 07/01/12 09:39, Jens Gustedt a écrit :
    >> macros and _Generic


    > "macros and _Generic"... can you maybe elaborate a bit?
    > That was far too cryptic for my small brain.


    probably a bit too terse, yes :)

    what I meant is that you need a type-generic interface for these things,
    not necessarily a language extension

    the idea would be to write a whole bunch of type-specific functions that
    cover the different cases (different primitive types, different
    operations). Then you can put type-generic glue around that by using
    macros and the brand-new _Generic from C11. This would only give you
    function-like syntax, but it would make it easily usable.
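    A minimal sketch of that glue, assuming one hand-written add function
    per element type (the names valarray_add, add_double and add_int are
    mine, for illustration):

```c
#include <stddef.h>

/* Type-specific element-wise additions, one per element type. */
void add_double(const double *l, const double *r, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = l[i] + r[i];
}

void add_int(const int *l, const int *r, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = l[i] + r[i];
}

/* C11 _Generic glue: one name, dispatched at compile time on the
   pointer type of the first argument. */
#define valarray_add(l, r, out, n)          \
    _Generic((l),                           \
        double *:       add_double,         \
        const double *: add_double,         \
        int *:          add_int,            \
        const int *:    add_int)((l), (r), (out), (n))
```

    With that in place, valarray_add(&a[0], &b[0], &c[0], n) picks the
    double or int version automatically: function-like syntax only, as
    noted, but type generic.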

    Then, after some time there could perhaps be some convergence on the
    functionalities and one could think of adding keywords and/or operator
    support.

    I don't know how far along you are with implementing _Generic in your
    compiler (if you have attacked it at all). Gcc already has equivalent
    features that are usable and that I am currently using to implement the
    set of type-generic atomic_... interfaces of C11. I think the interface
    complexity of that is about the same as what would be necessary for
    vector operations. (BTW, thinking about it, _Pragma can also be quite
    handy when programming such things.)

    Cheers
    Jens
     
    Jens Gustedt, Jan 7, 2012
    #13
  14. Stephen Sprunk

    Stephen Sprunk Guest

    On 07-Jan-12 02:31, jacob navia wrote:
    > Le 07/01/12 01:50, Ian Collins a écrit :
    >> On 01/ 7/12 12:02 PM, jacob navia wrote:
    >>> Today, more than 10 years after Intel introduced the first SIMD
    >>> instructions it is still not possible to use those instructions to do
    >>> SIMD programming when you use C and two mainstream compilers like gcc or
    >>> MSVC.
    >>>
    >>> From the C side there is absolutely no support for this type of
    >>> programming, even though plain arrays could do the trick obviously.

    >>
    >> That may be the case for those compilers, but not for others. For
    >> example the Sun Studio compilers (I'd expect Intel to offer something as
    >> well) have a number of vector processing optimisations which include
    >> simd.

    >
    > Yes, Intel and Sun, as hardware vendors, have a real interest in making
    > people use the features of their hardware and will try to propose
    > either automatically SIMD deduction or inline assembly that supports
    > their features directly.
    >
    > Problem is, even after dozens of years and millions and millions of
    > dollars in research, automatic SIMD parallelization remains VERY
    > difficult for the general case. The problem, basically, is as follows:
    >
    > Automatic parallelization must PROVE that no dependencies exist
    > between the array members, no unforeseen side effects can possibly
    > occur and ONLY THEN the constructs in question can be parallelized.
    >
    > All this needs extensive program analysis, what is made even more
    > difficult with the model of separate module compilation...


    There are no "dependencies" between array members. There is a possible
    aliasing problem when dealing with pointers rather than arrays, but that
    is easily enough solved with "restrict" qualifiers.

    I don't understand what you think the problem is.

    S

    --
    Stephen Sprunk "God does not play dice." --Albert Einstein
    CCIE #3723 "God is an inveterate gambler, and He throws the
    K5SSS dice at every possible opportunity." --Stephen Hawking
     
    Stephen Sprunk, Jan 7, 2012
    #14
  15. Gene

    Gene Guest

    On Jan 7, 2:39 pm, Stephen Sprunk <> wrote:
    > On 07-Jan-12 02:31, jacob navia wrote:
    > > Le 07/01/12 01:50, Ian Collins a écrit :
    > >> On 01/ 7/12 12:02 PM, jacob navia wrote:
    > >>> Today, more than 10 years after Intel introduced the first SIMD
    > >>> instructions it is still not possible to use those instructions to do
    > >>> SIMD programming when you use C and two mainstream compilers like gcc or
    > >>> MSVC.

    >
    > >>> From the C side there is absolutely no support for this type of
    > >>> programming, even though plain arrays could do the trick obviously.

    >
    > >> That may be the case for those compilers, but not for others. For
    > >> example the Sun Studio compilers (I'd expect Intel to offer something as
    > >> well) have a number of vector processing optimisations which include
    > >> simd.

    >
    > > Yes, Intel and Sun, as hardware vendors, have a real interest in making
    > > people use the features of their hardware and will try to propose
    > > either automatically SIMD deduction or inline assembly that supports
    > > their features directly.

    >
    > > Problem is, even after dozens of years and millions and millions of
    > > dollars in research, automatic SIMD parallelization remains VERY
    > > difficult for the general case. The problem, basically, is as follows:

    >
    > > Automatic parallelization must PROVE that no dependencies exist
    > > between the array members, no unforeseen side effects can possibly
    > > occur and ONLY THEN the constructs in question can be parallelized.

    >
    > > All this needs extensive program analysis, what is made even more
    > > difficult with the model of separate module compilation...

    >
    > There are no "dependencies" between array members.  There is a possible
    > aliasing problem when dealing with pointers rather than arrays, but that
    > is easily enough solved with "restrict" qualifiers.
    >
    > I don't understand what you think the problem is.
    >
    > S
    >
    > --
    > Stephen Sprunk         "God does not play dice."  --Albert Einstein
    > CCIE #3723         "God is an inveterate gambler, and He throws the
    > K5SSS        dice at every possible opportunity." --Stephen Hawking


    He has to be referring to loop-carried dependencies, as in

    for (i = 0; i < n; i++) a[i] = a[i-3] + 2;

    The main reason I'm writing is to point out that recent C
    compilers, including gcc, Intel, and (I believe, though am not sure)
    Microsoft, will all emit vector instructions for loops coded with
    commonsense idioms. I have a signal processing code that speeds up by
    a factor of 3 with gcc when this optimization is turned on.

    Cheers.
     
    Gene, Jan 7, 2012
    #15
  16. jacob navia

    jacob navia Guest

    Le 08/01/12 09:08, Gareth Owen a écrit :
    > jacob navia<> writes:
    >
    >> Today, more than 10 years after Intel introduced the first SIMD
    >> instructions it is still not possible to use those instructions to do
    >> SIMD programming when you use C and two mainstream compilers like gcc
    >> or MSVC.

    >
    > Nonsense. As well as the builtins already mentioned, GCC has
    > -ftree-vectorize which uses these SIMD instructions automatically when
    > the compiler can deduce that the aliasing rules allow it. Dot product
    > and matrix multiplication get a sizeable speed up when you use it.
    > The 'restrict' keyword helps a lot, also


    Nonsense, gcc doesn't have a real vectorizer. For instance:

    void doADD(double *L, double *R, double *result,size_t n)
    {
    for(size_t i=0; i<n;i++) result[n-i]=L[n-i]+R[n-i];
    }

    This is exactly the same program as

    void doADD(double *L, double *R, double *result,size_t n)
    {
    for(size_t i=0; i<n;i++) result[i]=L[i]+R[i];
    }

    But NO option (including -ftree-vectorize, as you suggest) will
    convince gcc to use the parallel instructions in the first case.

    Only Intel has a more stable vectorizer, since
    only Intel can afford to pay dozens of computer scientists
    for dozens of years, full time, to build it.

    Gcc has invested a considerable effort and the results are
    not very good, as you can see.

    But of course, why make it simple when you can make it complicated?

    Why is it not possible for the PROGRAMMER to declare a special
    data type that should be used in SIMD programming?

    That was the main argument of my post and nobody has given a sensible
    answer to it. Most of the answers are like yours:

    "Nonsense. gcc can do it automatically"
     
    jacob navia, Jan 8, 2012
    #16
  17. jacob navia

    jacob navia Guest

    Le 07/01/12 14:39, Stephen Sprunk a écrit :
    >
    > There are no "dependencies" between array members. There is a possible
    > aliasing problem when dealing with pointers rather than arrays, but that
    > is easily enough solved with "restrict" qualifiers.
    >
    > I don't understand what you think the problem is.
    >
    > S
    >


    The problem, Stephen, is as follows:

    gcc will vectorize this:

    void doADD(double *L, double *R, double *result,size_t n)
    {
    for(size_t i=0; i<n;i++) result[i]=L[i]+R[i];
    }

    but will NOT vectorize this:

    void doADD(double *L, double *R, double *result,size_t n)
    {
    for(size_t i=0; i<n;i++) result[n-i]=L[n-i]+R[n-i];
    }

    Why?

    As you (correctly) say:

    > There are no "dependencies" between array members.


    All this huge machinery for automatic vectorizing has never worked
    correctly in dozens of years, since it requires an incredible
    amount of program analysis. My proposition is:

    Why make complicated something that you can do in a simple way?

    Let the programmer declare the data that should be handled in an SIMD
    way.

    If I write:

    c = (a+b)/2;
    if (c&1) fn(1);

    You could deduce that c must be of integer type, couldn't you?
    LET THE COMPILER DO THAT. Let's eliminate declarations for
    integers and let the compiler deduce from the program text
    the type of each variable.

    Looks ridiculous, doesn't it?

    Well, for SIMD the problem is the same.
     
    jacob navia, Jan 8, 2012
    #17
  18. jacob navia

    jacob navia Guest

    Le 08/01/12 11:41, jacob navia a écrit :
    > void doADD(double *L, double *R, double *result,size_t n)
    > {
    > for(size_t i=0; i<n;i++) result[n-i]=L[n-i]+R[n-i];
    > }
    >


    that should have been:

    for(size_t i=0; i<n;i++) result[n-i-1]=L[n-i-1]+R[n-i-1];

    The results are the same for gcc with the following options:

    gcc -O3 -c -S -std=c99 -ftree-vectorize maybe.c

    inner loop:
    movsd -8(%rdi), %xmm0
    addsd -8(%rsi), %xmm0
    movsd %xmm0, -8(%rax)

    movsd = MOVe Scalar Double precision
     
    jacob navia, Jan 8, 2012
    #18
  19. jacob navia

    jacob navia Guest

    Le 08/01/12 11:41, jacob navia a écrit :
    >

    If you change the starting index to something other than zero,
    the vectorization fails:

    > void doADD(double *L, double *R, double *result,size_t n)
    > {
    > for(size_t i=1; i<n;i++) result[i]=L[i]+R[i];
    > }
     
    jacob navia, Jan 8, 2012
    #19
  20. ec429

    ec429 Guest

    On 08/01/12 10:41, jacob navia wrote:
    > Nonsense, gcc doesn't have a real vectorizer. For instance:
    >
    > void doADD(double *L, double *R, double *result,size_t n)
    > {
    > for(size_t i=0; i<n;i++) result[n-i]=L[n-i]+R[n-i];
    > }
    >
    > This is exactly the same program as
    >
    > void doADD(double *L, double *R, double *result,size_t n)
    > {
    > for(size_t i=0; i<n;i++) result[i]=L[i]+R[i];
    > }

    Actually, those programs aren't "exactly the same", due to the lack of
    the aforementioned "restrict" keyword. For instance:
    double array[3]={0,1,2};
    doADD(array+1, array+1, array, 2);
    With the first (reversed, corrected n-i-1) version you will get
    array={8,4,2}; with the second, array={2,4,2}.
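    The difference can be checked directly; a small sketch with both loop
    orders (using the corrected n-i-1 form of the reversed loop, with
    function names of my choosing):

```c
#include <stddef.h>

/* Forward element-wise add: visits index 0 first, so with overlapping
   arguments it reads each aliased slot before later writes touch it. */
void doADD_fwd(double *L, double *R, double *result, size_t n)
{
    for (size_t i = 0; i < n; i++)
        result[i] = L[i] + R[i];
}

/* Reversed element-wise add: same elements, opposite visit order, so
   with overlapping arguments it sees different intermediate values. */
void doADD_rev(double *L, double *R, double *result, size_t n)
{
    for (size_t i = 0; i < n; i++)
        result[n-i-1] = L[n-i-1] + R[n-i-1];
}
```

    On double array[3] = {0,1,2}, doADD_fwd(array+1, array+1, array, 2)
    leaves {2,4,2}, while doADD_rev on a fresh copy leaves {8,4,2}:
    exactly the two results above.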
    In general, the correct approach to parallel programming is giving the
    compiler as few constraints as possible. This can sometimes be achieved
    with 'restrict', but not much can be done in procedural languages unless
    they have some explicit means of telling the compiler that normal order
    constraints can be relaxed (which would be a /much/ more useful
    extension to C than processor-specific vectorisation primitives, since
    the latter would reduce the portable glory of C to the provincial
    insularity of assembler (not that assembler isn't valuable in
    appropriate circumstances, of course!)).

    In C as it stands, the canonical way to do this is by means of threads
    (at least in C11; '99 and its predecessors don't mention threaded
    execution, but many implementations support it).

    SIMD has its uses, of course; but unless your code is perversely
    written (and so long as you have used 'restrict' where appropriate) the
    compiler knows better than you do. Besides, your time as a programmer
    is worth enough that you generally shouldn't be expending it on
    non-portable performance hacks unless you /know/ your code will be run
    a very large number of times. YAGNI.

    If you really want to write strongly parallelisable programs (rather
    than merely hand-optimising with a few vector ops that probably don't
    gain you much anyway) try a functional language, such as LISP - with a
    properly designed data-flow model parallel processing becomes something
    that the implementation can do transparently.

    I think you just don't like gcc, because you know (and don't want to
    admit to yourself) that it's better than yours.
    -e
    --
    'sane', adj.: see 'unimaginative'
    on the web - http://jttlov.no-ip.org
     
    ec429, Jan 8, 2012
    #20