[geeks] Best Vista story I've seen
Joshua Boyd
jdboyd at jdboyd.net
Wed Feb 21 12:06:11 CST 2007
On Wed, Feb 21, 2007 at 12:43:22PM -0500, Charles Shannon Hendrix wrote:
> I just hate x86 assembly, and wonder if there are techniques you can use in
> vanilla C that help SIMD compilers.
The following is GCC specific (well, IBM may copy these extensions, I
haven't looked into what Sun does). It will generate code for SSE,
3DNow!, and AltiVec (which IBM calls VMX). It can also target VIS, but
not for floating point.
typedef float v4sf __attribute__ ((vector_size (16)));
/* a=b+c, all three arrays must be the same length and 16-byte aligned */
void add_array(int len, float * a, float * b, float * c)
{
    int i;
    /* vector part: four floats per iteration */
    for (i=0; i<len - (len%4); i+=4)
    {
        v4sf *av, *bv, *cv;
        av=(v4sf *)&(a[i]);
        bv=(v4sf *)&(b[i]);
        cv=(v4sf *)&(c[i]);
        *av = *bv + *cv;
    }
    /* scalar tail: the remaining len%4 elements */
    for (i=len - (len%4); i<len; i++)
    {
        a[i] = b[i] + c[i];
    }
}
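Dereferencing a v4sf pointer as above assumes the arrays start on a
16-byte boundary. If that can't be guaranteed, a variant along the
following lines may be safer. This is only a sketch: the name
add_array_unaligned is made up for illustration, and it relies on the
compiler turning the fixed-size memcpy calls into plain vector loads
and stores, which GCC usually does with optimization on, but I haven't
checked every target.

```c
#include <assert.h>
#include <string.h>

typedef float v4sf __attribute__ ((vector_size (16)));

/* a=b+c, no alignment requirement on a, b, or c.
   add_array_unaligned is a hypothetical name, not from the original post. */
void add_array_unaligned(int len, float *a, const float *b, const float *c)
{
    int i;
    int vec_len;

    assert(len >= 0);
    vec_len = len - (len % 4);   /* largest multiple of 4 that fits */

    for (i = 0; i < vec_len; i += 4)
    {
        v4sf av, bv, cv;
        /* copy through memcpy instead of casting pointers, so no
           alignment is assumed */
        memcpy(&bv, &b[i], sizeof bv);
        memcpy(&cv, &c[i], sizeof cv);
        av = bv + cv;
        memcpy(&a[i], &av, sizeof av);
    }
    /* scalar tail: the remaining len%4 elements */
    for (i = vec_len; i < len; i++)
    {
        a[i] = b[i] + c[i];
    }
}
```

The trade-off is that unaligned loads can be slower than aligned ones
on some CPUs, so if you control the allocation it is probably still
better to align the arrays and use the pointer version.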
That said, I haven't benchmarked the above code. I can verify that it
is generating SSE code on this laptop. However, without turning on a
lot of extra compiler options, I don't know how much of a boost you will
get.
-O3 -msse2 -funroll-loops -fprefetch-loop-arrays -fexpensive-optimizations
-ffast-math -mfpmath=sse
was what I used. I didn't specifically test each option individually,
but turning on the latter options didn't do anything significant without
the -O3, and the -O3 doesn't cause it to unroll or prefetch
(prefetching makes a big difference in the other SSE assembly code I
have benchmarked) without the other flags.
I experimented a bit with manual loop unrolling, but judging from the
disassembly of the compiled file, it doesn't appear to make a big
difference.
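Since none of this has been benchmarked, a rough timing harness like
the one below could be used to compare a vector version against a
plain scalar loop. Everything here is an assumption for illustration:
the names add_fn, add_array_scalar, and time_add are made up, and
clock() only gives coarse CPU time, so you would want large arrays and
many repetitions before trusting the numbers.

```c
#include <assert.h>
#include <time.h>

/* any function with the same signature as add_array */
typedef void (*add_fn)(int, float *, float *, float *);

/* plain scalar baseline to compare against (hypothetical name) */
void add_array_scalar(int len, float *a, float *b, float *c)
{
    int i;
    for (i = 0; i < len; i++)
    {
        a[i] = b[i] + c[i];
    }
}

/* returns the CPU seconds spent calling fn reps times (hypothetical helper) */
double time_add(add_fn fn, int reps, int len, float *a, float *b, float *c)
{
    clock_t start;
    int r;

    assert(reps > 0);
    start = clock();
    for (r = 0; r < reps; r++)
    {
        fn(len, a, b, c);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call it once with add_array_scalar and once with the vector version on
the same arrays, and compare the two results. Build both with the same
flags, or the comparison says more about the flags than about the code.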
--
Joshua D. Boyd
jdboyd at jdboyd.net
http://www.jdboyd.net/
http://www.joshuaboyd.org/