Anything in the language to support better recognition of vector operations?

David Mathog

Do any of the current or upcoming C language standards provide support
to help a compiler recognize vectors and then generate SIMD
operations? For instance, consider this code snippet:

#define ALEN 128
unsigned char a[ALEN];
unsigned char b[ALEN];
int i;
/* arrays are initialized (not shown) */
for (i=0;i<ALEN;i++){
  b[i] += a[i];
}

One would hope that a compiler could recognize that as a vector
operation and generate code to take advantage of it on a given target
that supports SIMD operations. This gets trickier when converted to a
function:

void addABvector(unsigned char *a, unsigned char *b, int len){
  int i;
  for (i=0;i<len;i++){
    b[i] += a[i];
  }
}

Here, while a and b may be "vectors", there is nothing to show that they
are aligned optimally, or that len will be large, so the compiler
cannot easily know which way it should optimize this (for long aligned
vectors, or short unaligned data). If we try to tell the compiler to
make a distinction by length, I am guessing that most compilers would
just optimize the test out of existence, as in this case:

void addABvector(unsigned char *a, unsigned char *b, int len){
  int i;
  if(len<ISMINLENGTHVECTOR){
    for (i=0;i<len;i++){ b[i] += a[i]; }
  }
  else {
    for (i=0;i<len;i++){ b[i] += a[i]; }
  }
}

Finally, what if saturating math is required? Then the loop becomes
something like this:

for (i=0;i<len;i++){
  b[i] += a[i];
  if(b[i] < a[i]) b[i]=UCHAR_MAX; /* or any other test for
                                     saturation, all of which have a conditional */
}

I believe some processors have saturating math operations in their
"normal" instruction set, and there are definitely saturating add
operations in the SIMD instruction sets. But in order to use these
the compiler has to deduce from the logic of the loop that a
saturating add is the desired result.
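
One way to make the saturating intent at least visible in the arithmetic is to do the sum in a wider type and clamp it. This is only a sketch: the name add_sat_u8 is made up for illustration, and whether any given compiler turns the clamp into a native saturating add would have to be checked in the generated assembler.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: b[i] = min(a[i] + b[i], UCHAR_MAX). */
void add_sat_u8(const unsigned char *a, unsigned char *b, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        /* The sum is formed in unsigned int, so it cannot wrap at UCHAR_MAX. */
        unsigned int sum = (unsigned int)a[i] + b[i];
        /* Clamp to the largest representable value instead of wrapping. */
        b[i] = (unsigned char)(sum > UCHAR_MAX ? UCHAR_MAX : sum);
    }
}
```

The clamp is still a conditional at the source level, so this does not remove the need for the compiler to recognize the pattern; it only states it in one expression.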

There are certainly other facets, but clues the compiler would need to
tell it when it should use vector instructions would seem to be
minimally:

1. a method to indicate when math operations are saturating.
2. a method to indicate when memory structures must be aligned on
some 2^N byte boundary.
3. a method to indicate the allowed size of an array.

Is any of this present in a standard C language variant, or is the
only way to achieve this to use compiler specific pragmas and
intrinsics?
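
For items 2 and 3, the closest things I know of in standard C are sketched below: "restrict" and "static" inside an array parameter (both C99), and alignas from <stdalign.h> (C11). The names add_fixed, vec_a and vec_b, and the 16-byte alignment, are assumptions for illustration. Note that the alignment is a property of the object declarations and is not carried by the pointer parameters, and I am not aware of any standard way to mark arithmetic as saturating (item 1).

```c
#include <stdalign.h>   /* C11: alignas */
#include <stddef.h>

#define ALEN 128

/* C11: request 16-byte alignment for these objects
   (16 is an assumed SIMD register width, not something the standard fixes). */
alignas(16) unsigned char vec_a[ALEN];
alignas(16) unsigned char vec_b[ALEN];

/*
 * C99: "restrict" promises that the two arrays do not overlap, and
 * "static ALEN" promises that each argument points to at least ALEN elements.
 * Both are promises made by the caller; the compiler is free to ignore them.
 */
void add_fixed(const unsigned char a[restrict static ALEN],
               unsigned char b[restrict static ALEN])
{
    size_t i;
    for (i = 0; i < ALEN; i++) {
        b[i] += a[i];
    }
}
```

A call would look like add_fixed(vec_a, vec_b); passing overlapping arrays would break the restrict promise and give undefined behavior.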

Thanks,

David Mathog
 
Ian Collins

While it isn't a direct solution, look up OpenMP and see what support
your compiler of choice offers. This will give you some idea what is
required to identify and specify vector operations.

Something so processor specific can't really be standardised, so this
type of optimisation is more of a compiler QoI issue.
 
David Mathog

Ian Collins said:
While it isn't a direct solution, look up OpenMP and see what support
your compiler of choice offers. This will give you some idea what is
required to identify and specify vector operations.

Had a look but that seems to be all about programming for multiple
cores, not so much (at all?) for SIMD optimization. So it looks like
OpenMP, on an N core machine with code that can be split perfectly
will "automatically" run N times faster, but not MN times faster,
where M is the extra speed factor that would result from using the
SIMD operations on each core. I never thought about it before, but
for just the right sort of code there could be a synergistic speedup
from using both methods - if M is 2 and N is 8, the code could run 16X
faster instead of 8x (multicore only) or 2x (SIMD only, single core)
vs 1x (no SIMD, single core).

Anyway, sounds like using compiler specific methods is currently the
only sure way to use SIMD.
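
For what it is worth, later OpenMP releases (4.0 onward) added a simd construct that can be combined with the multicore one, which is the "both methods" case described above. A minimal sketch follows; the function name add_omp_simd is made up, and the pragma is only a request that takes effect when the compiler is run in OpenMP mode.

```c
/* Hypothetical name. Without OpenMP enabled this is a plain scalar loop. */
void add_omp_simd(const unsigned char *a, unsigned char *b, int len)
{
    int i;
#ifdef _OPENMP
    /* Ask for the iterations to be split across threads (the N factor)
       and for each thread's share to be vectorized (the M factor). */
#pragma omp parallel for simd
#endif
    for (i = 0; i < len; i++) {
        b[i] += a[i];
    }
}
```

The directive still assumes that a and b do not overlap, and it says nothing about saturation, so it only addresses part of the original question.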

Thanks,

David Mathog
 
Rui Maciel

David said:
Had a look but that seems to be all about programming for multiple
cores, not so much (at all?) for SIMD optimization. So it looks like
OpenMP, on an N core machine with code that can be split perfectly
will "automatically" run N times faster, but not MN times faster,
where M is the extra speed factor that would result from using the
SIMD operations on each core. I never thought about it before, but
for just the right sort of code there could be a synergistic speedup
from using both methods - if M is 2 and N is 8, the code could run 16X
faster instead of 8x (multicore only) or 2x (SIMD only, single core)
vs 1x (no SIMD, single core).

Anyway, sounds like using compiler specific methods is currently the
only sure way to use SIMD.

It would be a nice feature, particularly for us folk who happen to deal
with numerical analysis stuff. Nonetheless, asking for this feature as a
compiler-level optimization may be an unobtainable goal, mainly due to
its complexity and the lack of demand for this sort of stuff. It is far
better (and also desirable) to expect this sort of stuff from other
projects such as OpenCL.


Rui Maciel
 
Michael Angelo Ravera

I've seen the adjective "plural" used in some C compilers when an
array was designed to be split across multiple processors, but I am
not aware of any standard for this.
 
David Mathog

You completely missed the case of overlapping arrays - if I call

addABVector (&array [0], &array [1], 100);

then vector operations would give a completely wrong result.
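
To make the overlap problem concrete, here is a small sketch that models a vector unit by loading a chunk of each array, adding, and storing the chunk back. The names add_scalar, add_chunked and CHUNK are made up for illustration; real SIMD code would use wider registers, but the ordering issue is the same.

```c
#include <stddef.h>
#include <string.h>

/* Element-by-element version: each store is visible to the next iteration. */
void add_scalar(const unsigned char *a, unsigned char *b, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        b[i] += a[i];
    }
}

#define CHUNK 4  /* assumed "vector width" in elements */

/* Models SIMD: load CHUNK elements of a and b, add them, then store. */
void add_chunked(const unsigned char *a, unsigned char *b, size_t len)
{
    size_t i, j;
    for (i = 0; i + CHUNK <= len; i += CHUNK) {
        unsigned char ta[CHUNK], tb[CHUNK];
        memcpy(ta, a + i, CHUNK);          /* "vector load" of a */
        memcpy(tb, b + i, CHUNK);          /* "vector load" of b */
        for (j = 0; j < CHUNK; j++) {
            tb[j] += ta[j];
        }
        memcpy(b + i, tb, CHUNK);          /* "vector store" into b */
    }
    for (; i < len; i++) {                 /* scalar tail */
        b[i] += a[i];
    }
}
```

With a five-element array of ones and a call of the form f(&array[0], &array[1], 4), add_scalar leaves 1 2 3 4 5 in the array while add_chunked leaves 1 2 2 2 2, because the chunked version reads all of a before any store to b lands.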

Well sure, but there are lots of functions where one cannot pass
pointers to overlapping memory areas and get the right results.

For saturating maths, this is easily recognised as long as you write
the code in a form that is actually equivalent to the saturated ops of
the processor.

I'll believe it when you post the source file and resulting assembler
where C code was compiled to use the native saturating operator. Even
if this works sometimes, I was trying to show that it is hard to code
so that it would work every time.

Regards,

David Mathog
 
