I'm also a huge fan of Godbolt/Compiler Explorer. Highway is integrated there, so you can just copy in your functions. Here's the last throwaway test that I was using: https://gcc.godbolt.org/z/5azbK95j9
> things might get better in the future but for now we have to implement it another way
There are several possible answers. 1) For the issue you pointed to, that we cannot have arrays of N vectors, a reasonable workaround is to allocate an array of N vectors' worth of elements instead, and trust the compiler to elide the unnecessary Load/Store. This often works with clang, and would avoid having to unroll manually here. I do prefer to minimize the amount of compiler magic required for good performance, though, so we typically unroll manually, as shown in that code.
2) If there are compiler bugs, the workarounds typically have to stay in, because in the open-source world people are still using that compiler years later.
Automatically detecting when things get better is an interesting idea, but I am not yet aware of any such infrastructure.
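
To make the workaround in 1) concrete, here is a minimal sketch in plain C++. Everything in it (`Vec`, `Load`, `Store`, `Add`, `kLanes`, `kN`, `SumWithLaneBuffer`) is a hypothetical stand-in written for this example, not Highway's actual API, and the fixed lane count is an assumption. It only shows the shape of the idea: the N accumulators live in a buffer of N vectors' worth of elements, and each step does a Load, an operation, and a Store on one slot of that buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD vector type; NOT Highway's API.
// For the sake of the example, pretend it cannot be an array element.
constexpr std::size_t kLanes = 4;  // assumed number of lanes per vector
struct Vec {
  float lane[kLanes];
};

// Stand-in helpers that mimic the Load/Store/Add pattern.
inline Vec Load(const float* p) {
  Vec v;
  for (std::size_t i = 0; i < kLanes; ++i) v.lane[i] = p[i];
  return v;
}

inline void Store(const Vec& v, float* p) {
  for (std::size_t i = 0; i < kLanes; ++i) p[i] = v.lane[i];
}

inline Vec Add(const Vec& a, const Vec& b) {
  Vec r;
  for (std::size_t i = 0; i < kLanes; ++i) r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

constexpr std::size_t kN = 4;  // number of logical accumulator vectors

// Sums `count` floats using kN independent accumulators.
// `count` must be a multiple of kN * kLanes.
float SumWithLaneBuffer(const float* in, std::size_t count) {
  assert(count % (kN * kLanes) == 0);

  // Instead of `Vec acc[kN]`, keep kN vectors' worth of elements.
  float acc[kN * kLanes] = {};

  for (std::size_t i = 0; i < count; i += kN * kLanes) {
    for (std::size_t j = 0; j < kN; ++j) {
      float* slot = acc + j * kLanes;
      // Load the accumulator, update it, and Store it back. The hope is
      // that the compiler keeps it in a register and elides this traffic.
      const Vec sum = Add(Load(slot), Load(in + i + j * kLanes));
      Store(sum, slot);
    }
  }

  // Final horizontal reduction over all accumulator lanes.
  float total = 0.0f;
  for (std::size_t k = 0; k < kN * kLanes; ++k) total += acc[k];
  return total;
}
```

Calling `SumWithLaneBuffer` on the values 1 through 32 returns 528. Whether the Load/Store pairs on `acc` actually disappear depends on the compiler and flags, so it is worth checking the generated code in Compiler Explorer rather than assuming it.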