[geeks] Best Vista story I've seen

Wed Feb 21 12:06:11 CST 2007

On Wed, Feb 21, 2007 at 12:43:22PM -0500, Charles Shannon Hendrix wrote:

> I just hate x86 assembly, and wonder if there are techniques you can use in
> vanilla C that help SIMD compilers.

The following is GCC specific (well, IBM may copy these extensions, I
haven't looked into what Sun does).  It will generate code for SSE,
3DNow!, and AltiVec (which IBM calls VMX).  It is possible to do VIS,
but not for floating point.

typedef float v4sf __attribute__ ((vector_size (16)));

/* a=b+c, all three arrays must be the same length and 16-byte aligned */
void add_array(int len, float * a, float * b, float * c)
{
  int i;
  for (i=0; i<len - (len%4); i+=4)
    {
      v4sf *av, *bv, *cv;
      av=(v4sf *)&(a[i]);
      bv=(v4sf *)&(b[i]);
      cv=(v4sf *)&(c[i]);
      *av = *bv + *cv;
    }

  for (i=len - (len%4); i<len; i++)
    {
       a[i] = b[i] + c[i];
    }
}
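
The vector loads and stores in add_array go through v4sf pointers, so
each array is expected to start on a 16-byte boundary.  A minimal sketch
of one way to get such arrays, assuming a C11 libc that provides
aligned_alloc (the helper name alloc_floats_aligned16 is made up for
illustration and is not part of GCC or any library):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n floats on a 16-byte boundary.
   Returns NULL on failure; release the memory with free().
   Overflow of n * sizeof(float) is not checked in this sketch. */
float *alloc_floats_aligned16(size_t n)
{
  float *p;
  size_t bytes = n * sizeof(float);

  /* aligned_alloc traditionally wants a size that is a multiple of the
     alignment, so round up to the next multiple of 16 */
  bytes = (bytes + 15) / 16 * 16;
  if (bytes == 0)
    bytes = 16;

  p = aligned_alloc(16, bytes);

  /* sanity check on the property the vector code relies on */
  assert(p == NULL || (uintptr_t)p % 16 == 0);
  return p;
}
```

Arrays obtained this way can be passed straight to add_array.  Static
or stack arrays can instead be declared with
__attribute__ ((aligned (16))).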

That said, I haven't benchmarked the above code.  I can verify that it
is generating SSE code on this laptop.  However, without turning on a
lot of extra compiler options, I don't know how much of a boost you will
get.

-O3 -msse2 -funroll-loops -fprefetch-loop-arrays -fexpensive-optimizations
 -ffast-math -mfpmath=sse

was what I used.  I didn't test each option individually.  Turning on
the latter options didn't do anything significant without -O3, and -O3
by itself doesn't unroll or prefetch without the other flags.
(Prefetching makes a big difference in the other SSE assembly code I
have benchmarked.)
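
-fprefetch-loop-arrays asks GCC to insert the prefetch instructions
itself.  A rough sketch of giving the same kind of hint by hand with
GCC's __builtin_prefetch follows.  The function name add_array_prefetch
and the PREFETCH_AHEAD distance are assumptions made up for
illustration; the distance has not been tuned or measured on any
machine.

```c
#include <assert.h>

typedef float v4sf __attribute__ ((vector_size (16)));

/* Assumed prefetch distance, in floats; a guess, not a measured value. */
#define PREFETCH_AHEAD 64

/* a=b+c with explicit prefetch hints; arrays must be 16-byte aligned */
void add_array_prefetch(int len, float * a, float * b, float * c)
{
  int i;
  int vec_end;

  assert(len >= 0);
  vec_end = len - (len%4);

  for (i=0; i<vec_end; i+=4)
    {
      v4sf *av, *bv, *cv;

      /* Hint that inputs further along will be read soon.
         Arguments: address, 0 = read access, 0 = no temporal locality.
         The bounds check keeps the address inside the arrays. */
      if (i + PREFETCH_AHEAD < len)
        {
          __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 0);
          __builtin_prefetch(&c[i + PREFETCH_AHEAD], 0, 0);
        }

      av=(v4sf *)&(a[i]);
      bv=(v4sf *)&(b[i]);
      cv=(v4sf *)&(c[i]);
      *av = *bv + *cv;
    }

  /* scalar tail: the remaining len%4 elements */
  for (i=vec_end; i<len; i++)
    {
      a[i] = b[i] + c[i];
    }
}
```

Issuing a hint on every iteration touches the same cache line several
times, so a real version would probably prefetch once per cache line.
Whether any of this beats what the compiler flag does would have to be
benchmarked.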

I experimented a bit with manual loop unrolling, but judging from the
disassembly of the compiled file, it doesn't appear to make a big
difference.
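
For reference, manual unrolling here means something along these lines.
This is only an illustrative sketch, not the exact code that was tried;
the name add_array_unrolled and the unroll factor of two vectors per
iteration are assumptions.

```c
#include <assert.h>

typedef float v4sf __attribute__ ((vector_size (16)));

/* a=b+c, unrolled by hand; arrays must be 16-byte aligned */
void add_array_unrolled(int len, float * a, float * b, float * c)
{
  int i = 0;

  assert(len >= 0);

  /* main loop: two vectors (eight floats) per iteration */
  for (; i + 8 <= len; i += 8)
    {
      v4sf *av = (v4sf *)&(a[i]);
      v4sf *bv = (v4sf *)&(b[i]);
      v4sf *cv = (v4sf *)&(c[i]);
      av[0] = bv[0] + cv[0];
      av[1] = bv[1] + cv[1];
    }

  /* at most one full vector can be left over */
  for (; i + 4 <= len; i += 4)
    {
      v4sf *av = (v4sf *)&(a[i]);
      v4sf *bv = (v4sf *)&(b[i]);
      v4sf *cv = (v4sf *)&(c[i]);
      *av = *bv + *cv;
    }

  /* scalar tail: the remaining len%4 elements */
  for (; i < len; i++)
    {
      a[i] = b[i] + c[i];
    }
}
```

With -funroll-loops the compiler can do this transformation on its own,
which would fit with the disassembly looking much the same either way.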

-- 
Joshua D. Boyd
jdboyd at jdboyd.net
http://www.jdboyd.net/
http://www.joshuaboyd.org/