Up until recently universities pretty much had to overspend on hardware, though. (Virtualization/expandable-on-demand cloud computing may eventually get around to changing it, but the customers aren't ready for it and the programming costs to take advantage of it dwarf the benefits for our clients.)
Most of the important systems that need to scale have very, very bad usage patterns from a hardware buyer perspective: for example, take course registration. (Edit to clarify: They are probably not talking about course registration, because there is no way in heck that peak is only 50% more than the steady state.)
At a university with 10,000 students, about 360 days out of the year you can run the course registration on a laptop while it is being used to play World of Warcraft. Then there is course registration season, at which point your peak concurrency goes from 1 user per hour to generally a multiple of your student population all signed in at once. (Because, no matter what you do, they will open multiple windows/connections/etc because "the site is so slow, come on duuuuude, why the heck is this POS always so slow?")
All the accesses are dynamic, and most of them have writes attached. You have to get caching right: if you overbook a class and tell 15 students they have a confirmed seat in a room that seats 12 because your cache went stale for three minutes, your customer gets yelled at, and they will turn around and yell at you. The end users are also typically incompetent at using the system (about 1/4 of them have never used it before), and they will perform an impromptu fuzz test on it.
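That stale-cache overbooking failure is exactly why the seat count has to be checked inside the database at write time, never against a cache. A minimal sketch in Python with SQLite (the table and column names are invented for illustration, not from any real registration system):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sections (id INTEGER PRIMARY KEY, capacity INTEGER, enrolled INTEGER)"
)
conn.execute("INSERT INTO sections VALUES (1, 12, 0)")  # a room that seats 12
conn.commit()

def try_enroll(conn, section_id):
    """Atomically claim a seat; succeeds only if a seat is free at
    commit time, regardless of what any cache claimed a moment ago."""
    with conn:  # one transaction: the UPDATE fires only while enrolled < capacity
        cur = conn.execute(
            "UPDATE sections SET enrolled = enrolled + 1 "
            "WHERE id = ? AND enrolled < capacity",
            (section_id,),
        )
        return cur.rowcount == 1

# 15 students hammer the same section; only 12 get seats.
results = [try_enroll(conn, 1) for _ in range(15)]
print(sum(results))  # 12
```

The guard lives in the `WHERE` clause, so two concurrent requests can never both see the last seat as free the way they can with a cached count.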
(Oh the stories I can't tell, sadly.)
Those were good times. :)
Registration is just as large a system, but with higher peak loads at semester start, as you've indicated.
I think oftentimes in a startup, especially one that's bootstrapped, it's easy to get caught up in trying to optimize everything too early, scaling for a million users before you have your first thousand. Sometimes you have to optimize first for your development time and product development goals, and leave the optimization till you get a bit of runway.
> development goals, and leave the optimization till you get a bit of runway.
That's a typical way to postpone things and fall into a trap. Not all optimizations take huge amounts of time, and there are certainly some architectural decisions you can make early on that will save you a lot of wasted time later. The most common is preventing bottlenecks.
For example, it's not necessary to run a clustered DB from launch day, but writing the UI and middleware so they're ready for a switch to a clustered DB usually has very little impact. If you don't do this and the site becomes successful, re-engineering the whole site to support a new DB is usually close to impossible, or extremely expensive. The site ends up in the DB hardware feedback trap just like in the article.
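One low-cost way to stay ready for that switch is to keep all data access behind a thin interface, so moving from a single node to a clustered or sharded backend touches one adapter instead of the whole site. A hypothetical Python sketch (class and method names are invented for illustration, and plain dicts stand in for real DB nodes):

```python
from abc import ABC, abstractmethod

class CourseStore(ABC):
    """App code talks only to this interface, never to a driver directly."""
    @abstractmethod
    def get(self, course_id): ...
    @abstractmethod
    def put(self, course_id, data): ...

class SingleNodeStore(CourseStore):
    """Launch-day backend: one box (a dict standing in for one DB)."""
    def __init__(self):
        self._db = {}
    def get(self, course_id):
        return self._db.get(course_id)
    def put(self, course_id, data):
        self._db[course_id] = data

class ShardedStore(CourseStore):
    """Later backend: routes each key to one of N nodes. App code is unchanged."""
    def __init__(self, nodes):
        self._nodes = nodes
    def _pick(self, course_id):
        return self._nodes[hash(course_id) % len(self._nodes)]
    def get(self, course_id):
        return self._pick(course_id).get(course_id)
    def put(self, course_id, data):
        self._pick(course_id).put(course_id, data)

def register(store: CourseStore, course_id, student):
    """Application logic: depends on the interface, not the backend."""
    roster = store.get(course_id) or []
    store.put(course_id, roster + [student])

# Swapping backends is one line; register() never changes.
for store in (SingleNodeStore(), ShardedStore([SingleNodeStore() for _ in range(4)])):
    register(store, "CS101", "alice")
    print(store.get("CS101"))  # ['alice'] from either backend
```

The point isn't this particular sharding scheme; it's that the re-engineering cost later collapses to rewriting one adapter, instead of hunting down every query scattered through the site.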
Edit: format fix.
So first off, the software costs were far higher than the HW costs. Second, the possible savings of $750k would have come at the cost of features or development time, both of which have costs. If anything this just points out how expensive using Microsoft solutions can be when you need to scale. But even still, I expect that over 3 years the $750k "lost" is an insignificant cost vs. the cost of development and rollout.
PS: Reading between the lines, the HW costs were probably well under $150k. "up to 30% of 32 database cores all by themselves."
The customer purchased IBM x460s (16 CPUs, 64GB RAM) = $225k each in 2005, right after they came out. Three were required (active and passive in a cluster at the production site, plus one for the DR hot site). Upgrading them to dual-core x3950s six months later = another $200k (or so; the upgrade was to 16 dual cores and 128GB RAM for each of the three database servers). Plus all the odds and ends, like 60-amp 3-phase power, 70,000 BTUs of cooling, etc.
The software/licensing/support costs were because with > 8 sockets, you needed Microsoft Data Center Edition, and at that time, it came with an expensive support contract ($70k/yr).
Had the vendor been about 6 or 12 months ahead on their optimization, the 16-CPU x460s could have been 8 dual cores instead, and the 32-core upgrade probably wouldn't have been needed - or if it was, it could have been delayed until quad cores were available. The expensive OS support would not have been needed, half as many Microsoft SQL Enterprise per-CPU licenses would have been needed, the 24x7 hardware maintenance contracts would have cost half as much, and so on.
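A rough back-of-envelope tally of the figures quoted above (the 3-year support term is an assumption, and this leaves out SQL licenses, power, cooling, and maintenance, which weren't given exact prices):

```python
# All dollar figures come from the comment above; anything assumed is noted.
servers = 3 * 225_000             # three x460s at ~$225k each
upgrade = 200_000                 # dual-core x3950 upgrade ("or so")
dc_support = 70_000 * 3           # Data Center Edition support, assumed 3 years
total = servers + upgrade + dc_support
print(total)  # 1085000 -- already over $1M before licenses, power, or cooling
```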
I suspect that the author is looking at a problem that would require tons of hardware regardless of how well it was optimized. Evidence of this can be found in the fact that even after tons of optimization, his $1.5M setup is still running at 20% load steady state.
Our single <$5,000 box handles about 4M pageviews per day without moving the CPU above 5% steady state. That's the sort of baseline I'm used to from the Microsoft stack, so it makes me question whether the author is really looking at a mainstream case.
You realize how meaningless a statement like this is, right? You just can't go around talking about "pageviews" as if they were some uniform measure of workload.
The problem is obviously when you're not serving static pages.
My day job sells integrated solutions to Japanese universities. We would care very intensely how much the hardware costs in a circumstance like this. (Which we wouldn't be in, because we try not to deliver software which is an abomination against all that is good and holy in terms of database use, but still.)
The math is simple: the university typically has a budget of X million yen to Get This Done. It doesn't matter what the line items are on our invoice -- we can't charge them more than X million yen.
Given that constraint, what do you think we want to charge them? Software license fees for our solutions, where our margin is anywhere from... crikey, I can't tell you the numbers, but "high to higher". Software license fees for third party providers like a certain Enterprise Database, where the margin to the reseller (us) is from low to medium? Or the margin resellers get on hardware, which compared to our software is small enough we could use it to remove food from between our teeth?
In your situation, and indeed in most situations where the client can compare the software and hardware costs up front, it's logical to optimize the software.
So is the risk.
Nothing drives customers crazier than a slow system because someone else is having a busy day.
I spent my first 10 years decoupling applications from each other because independence was more important than economies of scale. Now we're swinging back the other way. Hopefully, we will have learned something this time.
The customer hardware is IBM xSeries (3950s), with modular, stackable components, much like a stack of switches. It could be broken up into smaller servers and reused on other projects.