J
jacob navia
Today, more than 10 years tafter Intel introduced the first SIMD
instructions it is still not possible to use those instructions to do
SIMD programming when you use C and two mainstream compilers like gcc or
MSVC.
From the C side there is absolutely no support for this type of
programming, even though plain arrays could do the trick obviously.
The container abstraction, specifically the valarray container, could be
used to *mean* SIMD using operator overloading.
The use case could be
ValArray operator+(ValArray left,ValArray right);
This would allow an operation like adding two arrays element-wise
to be performed using SIMD instructions.
Obviously, the usual "optimizations" can be added to those
basic operations so that
ValArray a,b,c,d;
a = b+c+d;
doesn't have to produce an intermediate array but can be "compiled"
by an overloaded assignment operator to generate
a = b+c+d;
as is done in C++. Each overloaded operator returns a token operator
containing information about its arguments, and the overloaded
assignment operator compiles correct element-wise code by reparsing
the generated tree.
To prepare for this, the C Containers library proposes the ValArray
container, for each of the basic types (char, short, int, long long and
size_t).
In the sample implementation there is a generic module that
allows to generate code for ANY type, also those being left out like
long (since it is always either equal to int or equal to long long in
all implementations I know of). The type size_t was added to be able
to correctly index any sequantial container with a ValArray of size_t's.
The code proposed NOW is plain serial code, but can (and should) be
replaced in more sophisticated implementations than the sample
implementation to do SIMD programming, one of the most basic forms
of parallelism, already available in plain hardware since already 10
years...
Obviously the four basic operations should be there, but other kinds of
array operations could be interesting like
o REDUCE Apply binary operation to each element and the running total:
REDUCE + 1 2 3 4 5 --> 1+2+3+4+5 = 15
You apply operator + to the identity element for "+" and to the
first element: 0+1 --> 1
You apply the operator + to the second element and the result
of the last operation: 1+2-->3
ETC until the last element is reached.
o EXPAND Apply binary operation producing a ValArray of same length
containing the intermediate results:
EXPAND + 1 2 3 4 5 --> 1 3 6 10 15
o SELECT Apply boolean vector to ValArray selecting all elements that
correspond to 1 and eliminate those with zeros.
SELECT 1 1 0 1 0 0 / 10 20 30 40 50 60 --> 10 20 40
o EXTEND Apply boolean vector to ValArray leaving all elements with 1
and introducing the zero or blank (for character arrays) at the
zero positions:
EXTEND 1 1 0 1 0 0 / 10 20 40 --> 10 20 0 40 0 0
The idea is to select a set of operations that makes it easy to work
with arrays as objects of operations instead of working with scalar data.
What new operations should be in the set?
How could those operations be incoporated into mainstream C?
jacob
instructions it is still not possible to use those instructions to do
SIMD programming when you use C and two mainstream compilers like gcc or
MSVC.
From the C side there is absolutely no support for this type of
programming, even though plain arrays could do the trick obviously.
The container abstraction, specifically the valarray container, could be
used to *mean* SIMD using operator overloading.
The use case could be
ValArray operator+(ValArray left,ValArray right);
This would allow an operation like adding two arrays element-wise
to be performed using SIMD instructions.
Obviously, the usual "optimizations" can be added to those
basic operations so that
ValArray a,b,c,d;
a = b+c+d;
doesn't have to produce an intermediate array but can be "compiled"
by an overloaded assignment operator to generate
a = b+c+d;
as is done in C++. Each overloaded operator returns a token operator
containing information about its arguments, and the overloaded
assignment operator compiles correct element-wise code by reparsing
the generated tree.
To prepare for this, the C Containers library proposes the ValArray
container, for each of the basic types (char, short, int, long long and
size_t).
In the sample implementation there is a generic module that
allows to generate code for ANY type, also those being left out like
long (since it is always either equal to int or equal to long long in
all implementations I know of). The type size_t was added to be able
to correctly index any sequantial container with a ValArray of size_t's.
The code proposed NOW is plain serial code, but can (and should) be
replaced in more sophisticated implementations than the sample
implementation to do SIMD programming, one of the most basic forms
of parallelism, already available in plain hardware since already 10
years...
Obviously the four basic operations should be there, but other kinds of
array operations could be interesting like
o REDUCE Apply binary operation to each element and the running total:
REDUCE + 1 2 3 4 5 --> 1+2+3+4+5 = 15
You apply operator + to the identity element for "+" and to the
first element: 0+1 --> 1
You apply the operator + to the second element and the result
of the last operation: 1+2-->3
ETC until the last element is reached.
o EXPAND Apply binary operation producing a ValArray of same length
containing the intermediate results:
EXPAND + 1 2 3 4 5 --> 1 3 6 10 15
o SELECT Apply boolean vector to ValArray selecting all elements that
correspond to 1 and eliminate those with zeros.
SELECT 1 1 0 1 0 0 / 10 20 30 40 50 60 --> 10 20 40
o EXTEND Apply boolean vector to ValArray leaving all elements with 1
and introducing the zero or blank (for character arrays) at the
zero positions:
EXTEND 1 1 0 1 0 0 / 10 20 40 --> 10 20 0 40 0 0
The idea is to select a set of operations that makes it easy to work
with arrays as objects of operations instead of working with scalar data.
What new operations should be in the set?
How could those operations be incoporated into mainstream C?
jacob