Anything in the language to support better recognition of vector operations?

David Mathog · Nov 8, 2010

Do any of the current or upcoming C language standards provide support
to help a compiler recognize vectors and then generate SIMD
operations? For instance, consider this code snippet:

#define ALEN 128
unsigned char a[ALEN];
unsigned char b[ALEN];
int i;
/* arrays are initialized (not shown) */
for (i=0;i<ALEN;i++){
    b[i] += a[i];
}

One would hope that a compiler could recognize that as a vector
operation and generate code to take advantage of it on a given target
that supports SIMD operations. This gets trickier when converted to a
function:

void addABvector(unsigned char *a, unsigned char *b, int len){
    int i;
    for (i=0;i<len;i++){
        b[i] += a[i];
    }
}

Here, while a and b may be "vectors", there is nothing to show that they
are aligned optimally, or that len will be large, so the compiler
cannot easily know which case it should optimize for (long aligned
vectors, or short unaligned data). If we try to tell the compiler to
make a distinction by length, I am guessing that most compilers would
just optimize the test out of existence, as in this case:

void addABvector(unsigned char *a, unsigned char *b, int len){
    int i;
    if(len<ISMINLENGTHVECTOR){
        /* short, possibly unaligned data */
        for (i=0;i<len;i++){ b[i] += a[i]; }
    }
    else {
        /* long vectors */
        for (i=0;i<len;i++){ b[i] += a[i]; }
    }
}

Finally, what if saturating math is required? Then the loop becomes
something like this:

for (i=0;i<len;i++){
    b[i] += a[i];
    if(b[i] < a[i])b[i]=UCHAR_MAX; /* or any other test for saturation,
                                      all of which have a conditional */
}

I believe some processors have saturating math operations in their
"normal" instruction set, and there are definitely saturating add
operations in the SIMD instruction sets. But in order to use these
the compiler has to deduce from the logic of the loop that a
saturating add is the desired result.
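
As a minimal sketch of the saturating loop above, the clamp can be written as a single "minimum of the sum and UCHAR_MAX" expression, which is the shape of a SIMD saturating add. The names sat_add_u8 and add_saturating are hypothetical, and whether a particular compiler actually maps this to a native saturating instruction is an assumption that has to be checked in the generated assembler.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Saturating add of two bytes: compute the sum in a wider type,
   then clamp it to UCHAR_MAX. */
static unsigned char sat_add_u8(unsigned char x, unsigned char y)
{
    unsigned int sum = (unsigned int)x + y;  /* cannot overflow unsigned int */
    return (unsigned char)(sum > UCHAR_MAX ? UCHAR_MAX : sum);
}

/* b[i] = min(a[i] + b[i], UCHAR_MAX) for every element. */
void add_saturating(const unsigned char *a, unsigned char *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        b[i] = sat_add_u8(a[i], b[i]);
}
```

Calling add_saturating(a, b, len) gives the same result as the loop with the explicit if test; only the way the clamp is spelled differs.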

There are certainly other facets, but at a minimum the compiler would
seem to need these clues to tell it when it should use vector
instructions:

1. a method to indicate when math operations are saturating.
2. a method to indicate when memory structures must be aligned on
some 2^N byte boundary.
3. a method to indicate the allowed size of an array.

Is any of this present in a standard C language variant, or is the
only way to achieve this to use compiler specific pragmas and
intrinsics?
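
For points 2 and 3, here is a hedged sketch of what the standard language itself can express: the C1X draft (published as C11) has _Alignas for requesting object alignment, and C99 allows "static" inside an array parameter to promise a minimum element count. The names va, vb, add_fixed and is_aligned16 are hypothetical. Note that the alignment attaches to the objects, not to the pointer parameters, so the function body still cannot assume it, and nothing here forces a compiler to vectorize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALEN 128

/* C11: the array length is checked at compile time. */
static_assert(ALEN % 16 == 0, "ALEN should be a multiple of 16");

/* C11: request 16-byte alignment for the storage itself. */
_Alignas(16) unsigned char va[ALEN];
_Alignas(16) unsigned char vb[ALEN];

/* C99: "static ALEN" promises that the caller passes at least
   ALEN elements through each parameter. */
void add_fixed(const unsigned char a[static ALEN],
               unsigned char b[static ALEN])
{
    for (size_t i = 0; i < ALEN; i++)
        b[i] += a[i];
}

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

I am not aware of an equivalent in the core standard for point 1; saturation still has to be spelled out in the arithmetic.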

Thanks,

David Mathog

Ian Collins · Nov 8, 2010

While it isn't a direct solution, look up OpenMP and see what support
your compiler of choice offers. This will give you some idea what is
required to identify and specify vector operations.

Something so processor specific can't really be standardised, so this
type of optimisation is more of a compiler QoI issue.

David Mathog · Nov 8, 2010

Ian said:
While it isn't a direct solution, look up OpenMP and see what support
your compiler of choice offers. This will give you some idea what is
required to identify and specify vector operations.

Had a look, but that seems to be all about programming for multiple
cores, not so much (at all?) for SIMD optimization. So it looks like
OpenMP, on an N core machine with code that can be split perfectly,
will "automatically" run N times faster, but not M*N times faster,
where M is the extra speed factor that would result from using the
SIMD operations on each core. I never thought about it before, but
for just the right sort of code there could be a synergistic speedup
from using both methods: if M is 2 and N is 8, the code could run 16x
faster instead of 8x (multicore only) or 2x (SIMD only, single core)
vs 1x (no SIMD, single core).

Anyway, sounds like using compiler specific methods is currently the
only sure way to use SIMD.
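
A minimal sketch of the multicore half of that combination, assuming a compiler with OpenMP support: the "parallel for" directive splits the iterations across cores, and any SIMD use inside each thread's share is still left to the compiler's own vectorizer. The function name add_parallel is hypothetical. When OpenMP is not enabled, the pragma is skipped and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Adds a[i] into b[i]; iterations are divided among threads
   when the file is compiled with OpenMP enabled. */
void add_parallel(const unsigned char *a, unsigned char *b, int len)
{
    int i;
    assert(len >= 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < len; i++)
        b[i] += a[i];
}
```

Each iteration touches only its own element, so the split needs no synchronization; for arrays as small as ALEN the thread start-up cost would likely outweigh any gain.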

Thanks,

David Mathog

Rui Maciel · Nov 8, 2010

David said:
Anyway, sounds like using compiler specific methods is currently the
only sure way to use SIMD.

It would be a nice feature, particularly for us folk who happen to deal with numerical analysis.
Nonetheless, asking for this feature as a compiler-level optimization may be an unobtainable
goal, mainly due to its complexity and the lack of demand for it. It is far better (and also
desirable) to expect this sort of thing from other projects such as OpenCL.

Rui Maciel

Michael Angelo Ravera · Nov 8, 2010

I've seen the adjective "plural" used in some C compilers when an
array was designed to be split across multiple processors, but I am
not aware of any standard for this.

David Mathog · Nov 11, 2010

An earlier reply said:
You completely missed the case of overlapping arrays - if I call

addABvector (&array [0], &array [1], 100);

then vector operations would give a completely wrong result.

Well sure, but there are lots of functions where one cannot pass
pointers to overlapping memory areas and get the right results.
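
A hedged sketch of the overlap issue, using the hypothetical names add_plain and add_restrict: with overlapping arguments the scalar loop feeds each result into the next iteration, so the compiler must preserve that order. C99's restrict qualifier is the standard way to promise that the two arrays do not overlap, which removes that obstacle to vectorizing; it is a promise by the caller, not something the compiler checks.

```c
#include <assert.h>
#include <stddef.h>

/* No promise about overlap: called as add_plain(&x[0], &x[1], n),
   each b[i] depends on the b[i-1] just written (a running sum). */
void add_plain(const unsigned char *a, unsigned char *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        b[i] += a[i];
}

/* C99 restrict: the caller promises a and b do not overlap, so the
   iterations are independent. Passing overlapping arrays here is
   undefined behavior. */
void add_restrict(const unsigned char *restrict a,
                  unsigned char *restrict b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        b[i] += a[i];
}
```

For example, add_plain on the array {1, 1, 1, 1} with b pointing one element past a leaves {1, 2, 3, 4}, which a naive element-parallel add would not reproduce.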

The same reply said:
For saturating maths; this is easily recognised as long as you write
the code in a form that is actually equivalent to the saturated ops of
the processor.

I'll believe it when you post the source file and resulting assembler
where C code was compiled to use the native saturating operator. Even
if this works sometimes, I was trying to show that it is hard to code
so that it would work every time.

Regards,

David Mathog
