While you're correct that we still need a lot more, the advances of the past 5 years are more than I've seen in most of my life.
Just look at the speed at which we can teach a humanoid robot new things now. We can send out a human in a mo-cap rig, get some data, run a few hundred trillion simulations in a few hours, and publish a kernel that can do that task relatively well.
LLMs are what give us any perception at all. They feed vision into scene comprehension and then let the robot control side calculate a plan to achieve a goal. It's not very fast, and fine motor control has a long way to go, but it is possible.