For example: scaling up compute, creating and curating data (whether human or synthetically sourced), and building more resilient benchmarks. But on the algorithmic side, we already have a truly general-purpose, arbitrarily scalable, differentiable algorithm. So training it to do the right stuff is essentially the only missing ingredient. And models are catching up fast.