The reason games have these launch problems is that publishers got away with it for too long. Having users who already paid for the game do the beta testing is much, much cheaper and more efficient than in-house testing.
I sincerely hope that Steam's move to grant refunds (http://store.steampowered.com/steam_refunds) will sort this out. "Oops, 30% of our pre-alpha-published-as-final purchasers want their money back". Have one or two publishers file for bankruptcy over that and it might be a very valuable, albeit painful, lesson for the industry.
Another issue the article failed to address is that many games fail catastrophically when servers cannot be contacted, even when those servers only provide online features that are not necessary for core gameplay, or exist purely for DRM purposes. This is a flaw in the front end that contributes to the same problem.
All those "causes" are primarily fueled by the following:
"It's a major threat to their business. So why does it keep happening?"
This should have clued him in to the larger context. It isn't a major threat to their business. Getting a refund on a videogame is becoming easier, but has historically been difficult. Try returning an opened copy of a game to any retailer. Even explaining that it is the game itself that is broken does not always result in an accepted return.
The consumer, rather than the business, is bearing the cost of a failed launch.
So Augustine is correct in a technical sense regarding how backend issues contribute to failed launches. However, it is the market as a whole that is the primary driver behind failed launches. There is little incentive to fix the failed launches regardless of the technical means by which they occur.
"So why does it keep happening? It comes down to the fact that most games are now online services, with updated content, special events, virtual economies, player interaction, etc"
As someone who plays both online and offline AAA games on console (PS4, Xbox One) and PC, I haven't seen any evidence that online-heavy games are more likely to ship with major problems than offline games.
For both types, it is routine these days to get a game on day one that consists of the base game plus the near-obligatory multi-gigabyte day-zero patch, applied to deal with problems found between gold master and actual release, and that still has obvious crash bugs and serious problems with animation systems, physics systems, etc. Connection issues with backend services are just one item on a very long list of game-breaking problems, many of which are seen in totally offline single-player games.
The whole "preorder" culture for gaming is a large factor in this problem, IMO. There's not that much incentive to ship a product that is solid when you've already gotten a large amount of the money you will get on release before the product even ships. This is even more evident in all the failed Kickstarters where people have paid for alpha-level games that were never released or never evolved much past where they were at the time of fundraising.
To a large degree the gamers are thus to blame for perpetuating this problem by buying into the preorder culture (presumably out of loss aversion, i.e. fear of missing out on preorder bonuses).
I'd say the main reason is that video game companies are usually run by a bunch of amateurs who are good at a few particular things (coding, design, creativity, and maybe even "hyping" things) but have shortcomings when it comes to engineering and delivering consistent, solid quality. Many companies don't have regular processes, and rely on employees forsaking weekends in order to deliver things on time.
I know it's usually not popular here to talk about the necessity of project managers, but the video games industry is a pretty good example of a situation where more (i.e. better) management could help make things better compared to the current mess.
For instance, we've had articles here about how beautiful the Quake 2 and Doom 3 codebases were. It's not that John Carmack isn't an amazing programmer (I am sure he is far better than me), but the techniques for readability and maintainability that he used in those games are just what anyone writing in a large codebase should be doing. The style he used is exactly what we demand when interviewing senior devs. Yet game programmers look at it as if he had discovered cold fusion. When all you see is games written on the expectation that nobody is going to touch the code again once the game has shipped, standards go down. That is very different from what we do in business programming, where we know some systems might end up living 15+ years, so we have to write things planning for the system to keep evolving for decades.
If anything, online games and MMOs should be written better, precisely because they are far longer lived, and will be edited more, than a throwaway title that will just see a few patches if users complain.
The other side of things is where you actually have a team capable of delivering a solid, scaling backend service, who simply do not get the chance to do so because of poor production planning and overzealous development schedules handed down from on high. This is not an industry where the engineers get to define the development schedule or feature set. They can influence both, but on many teams the managers expect that crunch will solve all of their scheduling mistakes.
I think you guys at PlayFab are in a great space as this is work that is very difficult to pull off right and many, many developers will (smartly) choose to pay for a solution instead of rolling their own.
One issue parts of the games industry struggle with is simply coding competency. There's a tendency to hire newer grads and folks who will work for lower salaries, because people always want to make games. (And why not? Making games is awesome.) This leads to turnover, poorly thought out (or over-engineered) designs, lack of "common sense" things like load testing before launch (remember the industry standards you learned at college? if your college was anything like mine, you didn't), language soup, etc. And in my experience, the best backend systems are built by those with at least a bit of that experience already - rather than gameplay programmers trying to teach themselves what the CAP theorem is. But what senior engineer wants to work with legacy spaghetti code, or unlaunched promises (and the threat of future layoffs), when they could work at Facebook or Google or whoever?
At a previous large tech company I worked at, we built a games team internally to work on large scale platform stuff (pretty similar to what PlayFab is doing but aligned with said company's products). It was really cool - we got folks who were solid engineers but also ex-games industry, or avid gamers, themselves, advertising team openings via the videogames@ internal mailing list. We tried to combine the culture/fun of the games industry without the baggage and conditions.
But platform isn't content, and I recently joined a similar kind of team in an actual games company (which operates much more like a tech company than most, since our game is operated as a live service rather than a one-off downloadable release). Being able to work alongside artists, designers, narrative writers, sound engineers, event producers, etc. creates a really creative environment, and though I'm working on MySQL performance tuning and internal monitoring data pipelines, I get to hang out and talk about the new champion releases with folks at lunch. A nice balance, but for those who can't afford their own platform team, or don't have the carrots to lure us in, PlayFab seems like a neat alternative. :)
Facebook, Twitter, etc. begin with a stable (code) base and then scale up their systems step by step. When something does not scale, they can roll back or quickly fix it. They don't have to ramp up from a few hundred players to hundreds of thousands in a few hours like in the gaming industry.
There's this thing called dedicated servers. The reality is that game company CEOs/execs don't want to pay for quality when they can push it out now, get the money, and patch later; in other words, they put in the least effort. The whole thing is intentional at big companies, and a combination of intentionality and incompetence at small ones.
I do not disagree on your point that it might be a combination of intention and incompetence, although I think most of them really do not want to see downtime and negative press at release date, when they sell the most. So next to intention and incompetence there is also a strong incentive for them to prevent this from happening.
But the original point I was making is that suddenly scaling up for an enormous amount of people/traffic is intrinsically very hard, so it's easy for them to underestimate the effort needed, and pay dearly for that on launch date (and we too as gamers).
Also: a game launch is like an explosion going off, with lots of pressure on the first day. Marketing usually wants to concentrate everything on that one single launch event. Demand on launch day and the two to three days after can easily be 100x to 1000x higher than in the second week. You now have to decide whether to somehow spin up many times the backend infrastructure that you normally need, with the associated cost (theoretically achievable with a cloud service, but see below). Even if it is technically possible to scale up smoothly against millions of players wanting to log in simultaneously, somebody needs to take the cost hit, and this is usually when the finger-pointing between management and tech begins. Sometimes it's easier to 'wait it out', since the storm will be over after a few hours.
As for cloud services: most game servers have soft-realtime requirements which many cloud providers can't meet (at least on the cheaper plans). For a web server running at a cloud provider, it isn't much of a problem if you have 250ms load spikes or delays from time to time; for an action game it is catastrophic.
And finally: players that can't log in will spend the time raging on the internet. If you can't get your iOS update on day 1: well, shrug, I'll try again tomorrow :)
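To put rough numbers on the launch-spike argument above, here is a minimal back-of-the-envelope sketch. Only the 100x to 1000x multipliers come from the comment; the steady-state player count, per-server capacity, and hourly price are made-up assumptions chosen for illustration.

```python
import math

# All constants below are illustrative assumptions, not measurements.
STEADY_STATE_PLAYERS = 5_000    # hypothetical week-two concurrency
PLAYERS_PER_SERVER = 500        # hypothetical capacity of one backend node
HOURLY_COST_PER_SERVER = 0.50   # hypothetical price in USD


def servers_needed(concurrent_players: int, players_per_server: int) -> int:
    """Round up: a partially filled server still has to be provisioned."""
    return math.ceil(concurrent_players / players_per_server)


def launch_plan(multiplier: int, peak_hours: int) -> dict:
    """Estimate the extra servers and cost needed to absorb a launch spike.

    multiplier -- launch demand relative to steady state (e.g. 100 or 1000)
    peak_hours -- how long the extra capacity must stay provisioned
    """
    baseline = servers_needed(STEADY_STATE_PLAYERS, PLAYERS_PER_SERVER)
    peak = servers_needed(STEADY_STATE_PLAYERS * multiplier, PLAYERS_PER_SERVER)
    extra = peak - baseline
    return {
        "baseline_servers": baseline,
        "peak_servers": peak,
        "extra_cost": extra * HOURLY_COST_PER_SERVER * peak_hours,
    }


if __name__ == "__main__":
    for multiplier in (100, 1000):
        print(multiplier, launch_plan(multiplier, peak_hours=72))
```

Under these assumed numbers, a 100x spike held for three days means provisioning 1000 servers instead of 10. Whatever the real figures are, the shape of the problem is the same: someone has to agree to pay for capacity that sits idle a week later.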
Except that now the player can spend the money twice in the race window...
That kind of sloppiness by game developers might be related to the launch problems they have.
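For readers unfamiliar with the kind of race window being described, here is a minimal sketch. It assumes the bug is a balance check and a debit that are not performed atomically, which is a guess, since the code under discussion is not shown. The `Wallet` class and its methods are hypothetical names used only for illustration.

```python
import threading
from typing import List, Tuple


class Wallet:
    """Hypothetical in-memory wallet; a real backend would rely on a
    database transaction or an atomic conditional update instead."""

    def __init__(self, balance: int) -> None:
        self._balance = balance
        self._lock = threading.Lock()

    def spend(self, amount: int) -> bool:
        # The check and the debit happen under one lock, so two concurrent
        # purchases cannot both pass the check against the same balance.
        with self._lock:
            if self._balance < amount:
                return False
            self._balance -= amount
            return True

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


def simulate_double_purchase() -> Tuple[int, int]:
    """Fire two simultaneous purchases that each cost the full balance."""
    wallet = Wallet(balance=100)
    results: List[bool] = []

    def buy() -> None:
        results.append(wallet.spend(100))

    threads = [threading.Thread(target=buy) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Number of purchases that succeeded, and the remaining balance.
    return sum(results), wallet.balance


if __name__ == "__main__":
    print(simulate_double_purchase())
```

Without the lock, both threads could read a balance of 100 before either debits it, and both purchases would succeed. With the check and the debit in one critical section, exactly one goes through.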
From what I've seen Agile does not work very well here, at least on the engineering side. Design might be doing something similar to Agile on a small scale.
The reason we don't see it everywhere, though, is that in AAA games, deadlines are seen as so important that large parts of the game are built simultaneously. In your typical modern story-based action game, you have different teams working on different levels, and some people might spend all their time in just one or two levels. To get that level of parallelism, with over 100 people working on a game, chances are that the development process will not have much to do with Agile.
But many of the games that most people would consider great come from much love, refinement and iteration, along with relatively long development periods. This is why Blizzard always takes forever, and we have the concept of Valve Time.
That and the obsessive desire for control of everything.
Scalability problems are not caused by the load of gaming itself but, as you say, by their obsession with stealing every piece of info they can to better funnel users into premium.
Remember when people could host dedicated servers? You would never hear of Quake having scalability issues or needing to be saved by EC2.
In recent years, as silicon reached its limits, all kinds of developers have started to worry about performance. The answer to the question "how to make a more advanced game?" is no longer to "git gud" on hardware. It's up to the software today (again).
Granted, I'm not sure why they included some games with client side problems in their list of examples.