> You've almost described ISPC verbatim.
Thanks, I've read about it before, but haven't spent too much time looking at it.
However, this "single program, multiple data" model isn't exactly what I'm looking for (it would solve the sin4f vs. sin8f issue mentioned above, though). I need explicit, low-level access to SIMD, coupled with genericity over vector widths. This means writing almost assembly-style SIMD code with explicit shuffles, blending, etc., as well as having access to intrinsics where needed.
I also need portability (ISPC is from Intel, so it probably doesn't support ARM NEON) and the ability to target GPUs.
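To make "explicit but width-generic" a bit more concrete, here is a rough sketch in plain C++ of the kind of interface I have in mind. Everything in it (`Vec`, `shuffle`, `blend`) is a made-up name for illustration, and the bodies are just a scalar fallback; the assumption is that a real backend would map each operation to SSE/AVX/NEON instructions instead of loops.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical width-generic vector type (illustration only).
// Scalar fallback; a real backend would specialize per target ISA.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> lane{};

    // Load N contiguous elements from memory.
    static Vec load(const T* p) {
        assert(p != nullptr);
        Vec v;
        for (std::size_t i = 0; i < N; ++i) v.lane[i] = p[i];
        return v;
    }
};

// Lane-wise add, written once for every vector width.
template <typename T, std::size_t N>
Vec<T, N> operator+(const Vec<T, N>& a, const Vec<T, N>& b) {
    Vec<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Explicit shuffle with compile-time indices: result lane i = a.lane[Idx_i].
// std::get rejects out-of-range indices at compile time.
template <std::size_t... Idx, typename T, std::size_t N>
Vec<T, sizeof...(Idx)> shuffle(const Vec<T, N>& a) {
    return Vec<T, sizeof...(Idx)>{{std::get<Idx>(a.lane)...}};
}

// Blend with a compile-time mask: if bit i is set, lane i comes from b,
// otherwise from a.
template <unsigned Mask, typename T, std::size_t N>
Vec<T, N> blend(const Vec<T, N>& a, const Vec<T, N>& b) {
    Vec<T, N> r;
    for (std::size_t i = 0; i < N; ++i)
        r.lane[i] = ((Mask >> i) & 1u) ? b.lane[i] : a.lane[i];
    return r;
}
```

With this, `shuffle<3, 2, 1, 0>(v)` reverses a 4-wide vector and `blend<0x5>(a, b)` takes lanes 0 and 2 from `b`. The shuffle pattern and mask stay explicit in the code, while `operator+` works unchanged for a `Vec<float, 4>` or a `Vec<float, 8>`.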
I'm well aware that my needs are very specific. I need to do math for 3D graphics and physics applications.
All I need is for a lot of free time to appear from out of nowhere and I can write a prototype compiler for this myself :)