I've had success using C vector extensions in GCC and Clang [0]. With a simple typedef, you get a portable SIMD vector type with the basic arithmetic operators working. It's compatible with platform-specific intrinsics (SSE, NEON, etc.). Check out this small example [1], which has some basic arithmetic and a few uses of intrinsics, to see what kind of compiler output it produces (warning: I'm pretty sure the rcp/rsqrt/sqrt functions are wrong, this was just an experiment).
Here's the gist of it:
typedef float vec4f __attribute__((vector_size(4 * sizeof(float))));
vec4f a = { 1.0, 2.0, 3.0, 4.0 }, b = { 5.0, 6.0, 7.0, 8.0 };
vec4f c = (2.0 * a) + (a + b * b); // with -ffast-math, this will emit a fused multiply-and-add (FMA)
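For anyone who wants to compile the snippet on its own, here is a minimal sketch of the same expression wrapped in a function. The function name `combine` is made up for illustration, and I've written the scalar as `2.0f` so it matches the element type.

```c
/* Four packed floats; vector_size is given in bytes (16 here). */
typedef float vec4f __attribute__((vector_size(4 * sizeof(float))));

/* Same expression as in the snippet above. `combine` is a hypothetical
   name, not something from a library. */
static inline vec4f combine(vec4f a, vec4f b)
{
    /* The scalar 2.0f is applied to every lane; +, * work lane by lane. */
    return (2.0f * a) + (a + b * b);
}
```

Individual lanes can be read with array-style subscripts (e.g. `c[0]`), which is handy for printing or checking results.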
Note: if you look inside the intrinsics headers (xmmintrin.h, arm_neon.h, etc.) supplied by GCC, you'll find that they use these vector types internally. E.g. _mm_add_ps(a, b) is defined as a + b.
I work with basic 3d math and physics, so I don't need that much and just having 4-wide vectors is good enough for me.
I've also found out that you can use vector widths that are not available on the target machine. E.g. 4 x double vectors work fine even without 256-bit registers: the compiler will split the vector across two 128-bit registers and emit two instructions. This might also work for using 16 x float vectors for 4x4 matrices.
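A rough sketch of the wider-than-hardware case, assuming a target with only 128-bit registers (e.g. plain SSE2). The name `scale_add4d` is hypothetical. I pass the vectors through pointers here because, as far as I know, GCC can warn about the calling convention when 256-bit vectors are passed by value without AVX enabled.

```c
/* Four packed doubles: 32 bytes, wider than one 128-bit register. */
typedef double vec4d __attribute__((vector_size(4 * sizeof(double))));

/* out = a * s + b, lane by lane. The source code is the same whether or
   not the target has 256-bit registers; the compiler decides how to
   split the work. `scale_add4d` is a made-up name. */
static inline void scale_add4d(vec4d *out, const vec4d *a, const vec4d *b, double s)
{
    *out = (*a) * s + (*b);
}
```

Comparing the compiler output with and without -mavx should show whether the operation was split into two 128-bit halves.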
Some C++ overloading magic would be useful for naming things (e.g. no need for dot4f vs dot4d).
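If you want to stay in plain C, C11's _Generic can give a similar effect to C++ overloading. This is a different technique from what I mentioned above, and only a sketch: `dot4f`, `dot4d` and the `dot4` macro are names I made up, and I'm assuming the compiler treats the two vector typedefs as distinct types in _Generic (GCC and Clang do as far as I can tell).

```c
typedef float  vec4f __attribute__((vector_size(4 * sizeof(float))));
typedef double vec4d __attribute__((vector_size(4 * sizeof(double))));

/* Dot product: multiply lane by lane, then sum the four lanes. */
static inline float dot4f(vec4f a, vec4f b)
{
    vec4f p = a * b;
    return p[0] + p[1] + p[2] + p[3];
}

static inline double dot4d(vec4d a, vec4d b)
{
    vec4d p = a * b;
    return p[0] + p[1] + p[2] + p[3];
}

/* Pick the implementation from the type of the first argument (C11). */
#define dot4(a, b) _Generic((a), vec4f: dot4f, vec4d: dot4d)((a), (b))
```

With this, dot4(x, y) resolves to dot4f or dot4d at compile time depending on whether x is a vec4f or a vec4d.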
I've been trying to get some time to write an article about the ins and outs of using vector extensions, but haven't got there yet. Some effort would also be required to put together a decent library of basic arithmetic (dot, cross, quaternion product, matrix product & inverse, etc) as well as basic libm functions (sin, cos, log, exp). I haven't had the time to put together a comprehensive (and well tested) collection of these nor have I found any open source library that would do.
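To give an idea of what such a library function could look like, here is a minimal, untested-in-anger sketch of a 3D cross product on a 4-wide vector with the fourth lane unused. The name `cross3` is hypothetical. It builds the rotated operands with plain subscripts rather than shuffle builtins, since those differ between GCC and Clang versions; a tuned version would likely use shuffles.

```c
typedef float vec4f __attribute__((vector_size(4 * sizeof(float))));

/* Cross product of the xyz lanes; the w lane of the result is 0.
   cross(a, b) = a.yzx * b.zxy - a.zxy * b.yzx */
static inline vec4f cross3(vec4f a, vec4f b)
{
    vec4f a_yzx = { a[1], a[2], a[0], 0.0f };
    vec4f b_yzx = { b[1], b[2], b[0], 0.0f };
    vec4f a_zxy = { a[2], a[0], a[1], 0.0f };
    vec4f b_zxy = { b[2], b[0], b[1], 0.0f };
    return a_yzx * b_zxy - a_zxy * b_yzx;
}
```

For example, crossing the x axis with the y axis should give the z axis.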
[0] https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
[1] https://godbolt.org/g/N9VvXZ