You don't even have to continue there. People who should know better assume that 'modern cloud stuff' makes this trivial: just add some auto-scaling and it can handle anything. Until it grinds to a halt because it can't scale past a bottleneck (most likely the relational database), or the credit card runs dry trying to pull in even more resources on top of the ridiculous amount already being used for a (relatively) tiny number of users.
This will only get worse as people use 'premature optimization' (delivering software for launch is not premature!) and 'people are more expensive than more servers' (no they are not, once there's actual traffic and O(n^2)-performing crap) as excuses not to even try to understand this anymore. Same with storage space: with NoSQL, terabytes of data grow out of nowhere because 'we don't care, it works and it's fast to market; programmers are more expensive than more hardware!'. Just run a script to fire up 500 AWS instances backed by Dynamo and fall asleep.
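To make the "O(n^2)-performing crap" concrete, here's a hypothetical sketch (in Python, names my own) of the classic accidental-quadratic pattern: deduplicating with a list membership test instead of a set. No amount of auto-scaling fixes this, because both versions run on a single core.

```python
# Accidentally quadratic: `item not in seen` scans the whole list,
# so the loop does O(n^2) comparisons overall.
def dedupe_slow(items):
    seen, out = [], []
    for item in items:
        if item not in seen:   # O(n) scan per item
            seen.append(item)
            out.append(item)
    return out

# Same result in O(n): set membership is O(1) on average.
def dedupe_fast(items):
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out
```

At 10,000 items the slow version does on the order of 50 million comparisons; at 100,000 it's 5 billion. The fix is a one-line data-structure change, but only if someone bothers to look.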
I am not so worried about premature optimization; I am more worried about never optimizing. And beyond that, I'm really worried about my (mostly younger) colleagues simply not caring because they believe it's a waste of time.
I'm simultaneously worried about both, because I've had to deal with poor architecture and unnecessarily convoluted, difficult-to-work-with code that was only justified by completely misguided optimization attempts (with no experimentation or profiling to back any of it up; and indeed, performance in practice was terrible!). At the same time, there's a constant stream of "oh no, this convoluted mess has a bug in it" and "oh no, we need a new feature, it can't take long, right?" tickets, but never a ticket that says "profile and optimize the program because it's ridiculously slow; oh, and refactor away the convoluted mess while you're at it."
You may think that's a cop-out, but consider something like coz[1]. SQLite is managed and maintained by experts, and there's significant capital behind it funding engineering effort. Even so, better tooling still located a 25% performance improvement in SQLite[2], and a 9% improvement in memcached. Even experts have their limits, and of course these tools require expertise themselves, so something like coz is still an expert-only tool today. The underlying concept will successfully evolve toward mass adoption when "expert speak" can be translated into something easily and simply communicated to people who are not CPU or compiler experts, meeting users at their knowledge level so they can dig in as deep as they need or want to.
[1] https://github.com/plasma-umass/coz [2] https://arxiv.org/abs/1608.03676
But if every young or beginner programmer who asked a performance question on Reddit or Stack Overflow could get good answers, instead of a lecture on how what they are doing is "premature optimization" every single time, the world would accumulate quite a bit more expertise in making things perform well.
You have to plan and architect for it; you can't just tack it on after the fact by profiling a few hot codepaths (though you should do that too).
Performance can be different from "scalability" though. Sometimes, there is tension between the two.
I don't think the number of users is what matters, except for web services. If you make, for instance, photo or audio editing software, there will never be enough performance.
It isn't acceptable to tell your users that your software works fine until, say, 8192*8192 images. If you want to compete against other software, you have to consistently be the fastest at every task artists may throw at you, or you will get bad reviews in the specialist and prosumer press / forums / blogs, which can kill your business pretty efficiently... it takes hundreds of people saying "it's fast" to offset a single press article saying "it's slow as shit" in art communities.
And extensibility. It's not necessarily fun trying to add a new feature to someone else's "highly optimized" code.
And even that is not enough. You also have to know how to plan and architect for it and have a well-developed mental model of what can get you there, which means you have to practice doing high-performance work, follow research and high-performance ideas, and generally have a habit of building things that are fast. Few people actually do that.
But you design for performance. The proper time to address it is at design time. That's not premature, that's the right moment.
I wish we could reserve the word "optimization" for the kinds of things you can do after implementation to improve the performance without significantly changing the design.
That is, let's continue to optimize last, but not stretch the word "optimize" to mean "address performance in general." That's not what the word means, after all.
but does it ever happen? =)
If you want to call it a feature, it's closer to N features: one for each feature you already have. If you have 10 features and add performance, the effort involved isn't like having 11 features; it's like having 20. The effect is multiplicative.
This is because performance is a cross-cutting concern. Often, cross-cutting concerns are easy to inject or share effort on. Not so with performance: you can't just add an @OptimizeThis annotation to speed up your code. Performance tuning tends to be very specific to each chunk of code.
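To illustrate why a generic "@OptimizeThis" can't exist: the closest real thing is a memoization decorator, and it only helps code with a very specific shape. A hedged sketch in Python (`normalize` is a made-up example function):

```python
from functools import lru_cache

# Memoization pays off only when a pure function is called repeatedly
# with the same arguments. Here it turns an exponential recursion
# into linear work.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# The same decorator on a function that sees each input once buys
# nothing: every call is a cache miss, plus cache bookkeeping overhead.
@lru_cache(maxsize=None)
def normalize(record):
    return record.strip().lower()
```

The decorator is identical in both places; whether it speeds anything up depends entirely on the call pattern of that specific chunk of code, which is exactly the point: the tuning knowledge can't be injected from outside.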
That mention of regressions seems, IMO, a slightly out-of-left-field attempt at dismissing how the SQLite example shows that you can, in fact, "make it fast" later. Maybe he should've picked a different example entirely, because it undermined his point a little bit.[1]
All in all, his entire thesis comes from talking about a typechecker, which is indeed a piece of software in which each component generally contributes to the performance of the whole. It isn't a set of disparate moving parts (at least, from what I remember of my time studying parsers in college), so it's very hard to optimize section by section, because the components all feed into each other. Most software is not a typechecking tool; plenty (dare I say, most) of software does have specific bottlenecks.
Though I do agree that, even if we aren't focusing on it right away, we should keep performance in mind from the beginning. If nothing else, by making the application/system as modular as possible, so as to make it easier to replace the slowest moving parts.
[1] Which is a good thing IMO, as it highlights how this is all about trade-offs. "Premature optimization is the root of all evil", "CPU time is always cheaper than an engineer's time", etc. are, in fact, mostly true, at least when talking about consumer software/SaaS: it really doesn't matter how fast your application is if crafting fast software is slower than crafting slow software, and your very performant tool ends up used by no one because everyone is already using that other tool that is slower but came out first.
Important here is that for a user, "faster" means with respect to achieving the goal.
At work we've created a module where, instead of punching line items in by hand and augmenting the data from memory or web searches, the user can paste data from Excel (or import from OCR), and the system remembers mappings for the data augmentation.
After a couple of initial runs to build up the mapping table, our users can process thousands of lines in 10 minutes or less, a task that used to take the better part of a day.
Some follow-up support is common after new customers start using this module, so I often get to watch the transformation from before to after.
They also quickly get accustomed to it. We'll hear about it fast if those 10 minutes grow to 20 from one build to the next; not much thought is given to how 20 minutes is still a lot faster than they'd ever manage punching those 8000 lines by hand :)
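The "remembered mappings" idea above can be sketched roughly like this (a minimal hypothetical Python version; the class and method names are mine, not the actual system's):

```python
# Hypothetical sketch: the first time a pasted value is seen, the user
# supplies the augmented data by hand; from then on, the stored mapping
# is applied automatically, and only unknown values need manual input.
class MappingTable:
    def __init__(self):
        self._map = {}  # raw pasted value -> augmented data

    def learn(self, raw, augmented):
        """Record the user's manual augmentation for a raw value."""
        self._map[raw] = augmented

    def augment(self, rows):
        """Split pasted rows into auto-resolved and needs-manual-entry."""
        resolved, needs_input = [], []
        for raw in rows:
            if raw in self._map:
                resolved.append((raw, self._map[raw]))
            else:
                needs_input.append(raw)
        return resolved, needs_input
```

After the first couple of runs populate the table, thousands of pasted lines resolve in a single pass, which is where the day-to-ten-minutes speedup comes from.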
Nit: the link says it’s 10% faster than the previous release. It’s 50% faster than some arbitrary point in the past, perhaps the time when they began their CPU-based profile optimization.
I’ve come to believe really strongly that…