The long answer probably includes GATs (generic associated types), and the even longer answer starts with the amazing work that Niko et al. are doing to reinvent the internals of the Rust type checker. (Basically, if I remember correctly, the compiler team is currently refactoring big parts of rustc to librarify the type checker and replace it with a Prolog-ish library called "chalk", which is able to prove more things, which allows better handling of GATs, which in turn allows better handling of async, where you get "impl Future<Output = T>"-s everywhere.)
Somewhere in there is also the fact that the ergonomics of async/.await are still very much a work in progress. With better error messages, people will be able to forgo Box<>-ing, figure out what to use instead, and only heap allocate where they must.
It's possible to write low-level async code with hand-rolled, manually polled Futures, and to keep as much state on the stack as possible. But ... that takes time, and performance is already "good enough". (And/or there are probably bigger performance gains to be had in other areas, such as scheduling.)
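To make the "hand-rolled" part concrete, here is a rough sketch of a manually implemented Future that keeps all of its state inline and gets polled on the stack. The names Countdown, NoopWaker and block_on are made up for illustration, and the executor is a toy busy-loop, not how a real runtime schedules anything.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical hand-written future: returns Pending `remaining` times,
/// then resolves. Its whole state is one u32 stored inline in the struct.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // Signal that we want to be polled again. The toy executor below
            // ignores this and simply re-polls; a real one would rely on it.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing (assumption: good enough for a busy-polling loop).
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: pins the future on the stack and polls it until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut); // stack pinning, no Box involved
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Calling block_on(Countdown { remaining: 3 }) polls the future four times and never allocates for the future itself (the only allocation is the Arc behind the waker). Writing every future like this is exactly the "takes time" part.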
And, finally, boxing helps with compile times. (Because it's basically trivial for the compiler to prove that a simple heap-allocated trait object behaves well, compared to a stack-allocated concrete, but usually highly complex and unnameable, type.)
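A small sketch of the two styles side by side, in case it helps. The function names, the BoxedFuture alias and the toy block_on executor are all made up for illustration; the point is only that the boxed version has one short, nameable type, while each unboxed async block has its own anonymous type.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical alias for a type-erased, heap-allocated future.
type BoxedFuture<T> = Pin<Box<dyn Future<Output = T>>>;

/// Unboxed: the caller gets the compiler-generated state machine by value.
/// Its concrete type cannot be written down, only described as `impl Future`.
fn add_unboxed(a: u32, b: u32) -> impl Future<Output = u32> {
    async move { a + b }
}

/// Boxed: one heap allocation per call, but the return type is a plain
/// trait object, no matter how complex the future inside is.
fn add_boxed(a: u32, b: u32) -> BoxedFuture<u32> {
    Box::pin(async move { a + b })
}

/// Different async blocks have different types, so they can only share a
/// Vec once they are boxed into the same trait-object type.
fn mixed_futures() -> Vec<BoxedFuture<u32>> {
    vec![
        add_boxed(1, 2),
        Box::pin(async { 10 }),
        Box::pin(add_unboxed(20, 22)),
    ]
}

/// Waker that does nothing (assumption: fine for a busy-polling loop).
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls the future in a loop until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Runs every boxed future to completion and sums the results.
fn sum_all(futures: Vec<BoxedFuture<u32>>) -> u32 {
    futures.into_iter().map(block_on).sum()
}
```

Whatever is inside the box, the boxed handle is always just a fat pointer (data pointer plus vtable pointer), which is part of why it's so much less work for the compiler to carry around than the nested concrete types you get from composing unboxed futures.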