Yeah, benchmark-based compilation is already happening in the tinygrad compiler we use for openpilot: it's how we determine the local group size.
https://github.com/geohot/tinygrad/blob/caea34c52996cde2ed46...

Working on parameterizing a search space that includes more than the local group size. The end dream is some ML guided search to optimize the kernels :)
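
For anyone wondering what "benchmark based" means in practice, here's a rough sketch in plain Python. To be clear, this is not the tinygrad code behind the link: `candidate_local_sizes`, `time_call` and `pick_fastest` are made-up names, the power-of-two candidates and the 256 work-item limit are assumptions, and `measure` stands in for whatever actually launches the kernel and reports how long it took.

```python
import itertools
import time

def candidate_local_sizes(global_size, max_work_group=256):
    """Yield local sizes whose dims evenly divide the global dims.

    Assumption: only power-of-two dims are tried, and the product of the
    dims must not exceed max_work_group (a made-up device limit).
    """
    powers = (1, 2, 4, 8, 16, 32, 64, 128, 256)
    per_dim = [[d for d in powers if g % d == 0] for g in global_size]
    for local in itertools.product(*per_dim):
        total = 1
        for d in local:
            total *= d
        if total <= max_work_group:
            yield local

def time_call(run_kernel, repeats=3):
    """Wrap a hypothetical run_kernel(local_size) into a wall-clock measure.

    Takes the minimum over a few repeats to cut down on noise.
    """
    def measure(local_size):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_kernel(local_size)
            best = min(best, time.perf_counter() - start)
        return best
    return measure

def pick_fastest(measure, candidates):
    """Benchmark every candidate and return (best_candidate, best_time)."""
    best, best_time = None, float("inf")
    for cand in candidates:
        elapsed = measure(cand)
        if elapsed < best_time:
            best, best_time = cand, elapsed
    return best, best_time
```

Usage would be something like `pick_fastest(time_call(run_kernel), candidate_local_sizes((64, 6)))`. Nothing in `pick_fastest` cares that the candidates are local sizes, so a bigger search space would just mean yielding tuples with more parameters in them. Brute force stops scaling at some point, which is presumably where a guided search would come in.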