This is awesome, and it seems extremely valuable to me for use in performance-sensitive code. With that said...
> Our system requires a lot of trial and error, and remains too costly to operate at scale.
This makes me think we still need to find a method that is far more sample-efficient. Generating (up to) 1 million samples for each problem is very inefficient, and as the highest-quality models come to require more and more resources, my guess is that we'll want to push toward fewer samples from a smarter model (rather than the other way around).