1. Get it working.
2. Get it working well.
3. Get it working fast.
This puts "just get it working" as the first priority. Don't care about quality, just make it. Then, and only once you have something working, do you start caring about quality. This is about getting the code into something reasonable that would pass a review (e.g., architecturally sound). Finally, do an optimization pass.
This is the process I follow for PRs and projects alike. Sometimes you can fold all the steps into a single commit, if you understand the problem and solution domains well. But if you don't, you'll likely have to split it up.
Depending on how low-level your code is, this... may not work out in those terms.
In other words, I’d say that if you actually want good software—and that includes making sure its speed falls within a reasonable factor of the napkin-math theoretical maximum achievable on the platform—your three steps can easily constitute three entire rewrites or at least substantial refactors. You might well need to rearchitect if the “working well” version has multiple small loops split by domain-level concern when the hardware really wants a single large one, or if you’re doing a lot of pointer-chasing and need to flatten the whole thing into a single buffer in preorder, or if your interface assumes per-byte ops where SIMD can be applied.
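To make the pointer-chasing case concrete, here is a minimal sketch of flattening a tree into preorder buffers. The names (`Node`, `flatten_preorder`, `sum_pruned`) and the "subtree size" side array are my own illustrative assumptions, not anything from a particular codebase, and Python is only used to show the layout; the actual speedup comes from doing this in a language where the buffers are contiguous memory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # "Working well" representation: each node owns a list of child objects,
    # so a traversal hops from one heap allocation to the next.
    value: int
    children: list = field(default_factory=list)


def flatten_preorder(root):
    """Return (values, sizes): node values in preorder, plus the number of
    nodes in the subtree rooted at each position (including the node itself)."""
    values = []
    sizes = []

    def visit(node):
        index = len(values)
        values.append(node.value)
        sizes.append(0)  # placeholder, filled in once the subtree is emitted
        for child in node.children:
            visit(child)
        sizes[index] = len(values) - index

    visit(root)
    return values, sizes


def sum_pruned(values, sizes, skip):
    """Sum all values with a single linear scan, skipping every subtree whose
    root value equals `skip` by jumping ahead `sizes[i]` slots."""
    total = 0
    i = 0
    while i < len(values):
        if values[i] == skip:
            i += sizes[i]  # jump over the whole subtree, no recursion needed
        else:
            total += values[i]
            i += 1
    return total


tree = Node(1, [Node(2, [Node(3), Node(4)]), Node(5, [Node(6)])])
values, sizes = flatten_preorder(tree)
print(values)  # [1, 2, 3, 4, 5, 6]
print(sizes)   # [6, 3, 1, 1, 2, 1]
print(sum_pruned(values, sizes, skip=2))  # 12
```

Note that every consumer of the tree now has to speak "index plus subtree size" instead of "node with children", which is why this tends to be a rewrite of the interface rather than a local tweak.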
This is not a condemnation of the strategy, mind you. Crap code is valuable and I wish I were better at it. I just disagree that the transition from step 2 to step 3 can be described as an optimization pass. If that’s what you limit yourself to, you’ll quite likely be forced to leave at least an order of magnitude’s worth of performance on the table.
And yes, most consumer software is very much not good by that definition.
(For instance, I’m expecting that the Ladybird devs will be able to get their browser to work well for daily tasks—which I would count a tremendous achievement—but I’m not optimistic about it then becoming any faster than the state of the art even ten or fifteen years ago.)
Sometimes, it might even be completely separate people working on each step... separated by time and space.
In any case, most software stops at (2) simply because any effort towards (3) isn't worth it -- for example, there's very little point in spending two weeks optimizing a report generation job that runs in the middle of the night, once a month. At some point there may be, but usually not anytime soon.
Again, this is something I see bite enterprise-style applications quite often, as they can be pushed out piecemeal: you get things like the datastore, input APIs, and UI to the customer quickly, then over the next months things like reporting, auditing, and fine-grained access controls get put in. Suddenly you find yourself stuck working around major issues where a little bit of up-front thinking about the later steps would have saved you a lot of heartache.
I once joined a team where they knew they were going to do translations at some point ... and the way they decided to "prepare" for it was absolutely nonsensical. It was clear none of them had ever done translations before, so when it came time to actually do the work, it was a disaster. They had told product/sales "it was ready" but it didn't actually work -- and couldn't ever actually work. It required redesigning half the architecture and months of effort across a whole team to get it working. Even then, some aspects were completely untranslatable, and fixing those took an additional 6-8 months of refactoring.
So, another lesson is to not try to engineer something up front unless you need it to "get it working". If you don't need it yet, it is probably still better to wait until you do.
I much prefer a code base that is readable and straightforward (maybe at the expense of some missed perf gains) over code that is highly performant but hard to follow/too clever.
I've used a similar mantra of "make it work, make it pretty, make it fast" for two decades.
I think I've had to get to step 3 once and that was because the specs went from "one device" to "20 devices and two factories" after step 1 :D