Engineering from >10 years ago seems like it was a wild west. Some truly stunning pieces of technology, strung together with duct tape. Everything had its own configuration language, workflow engine, monitoring solution, etc. Deployments were infrequent, unreliable, and sometimes even done from a dev's machine. I don't want to disparage the engineers who worked there during that time; the systems were amazing. But everything around the edges seemed pretty disparate, and I suspect that gave rise to the "promo project" style of development meme.
Nowadays we've got Boq/Pod, the P2020 suite, Rollouts, the automated job sizing technologies, even BCID. None of these are perfect by any means, but the convergence is a really good thing. I switched team, PA, and discipline 6 months ago, and it was dead easy to get up and running because I already knew the majority of the tech, and most of that tech is pretty mature and capable.
Maybe Google has become more like other tech companies (although I doubt they have this level of convergence), but I think people glorify the old days at Google and miss that a lot of bad engineering was done. Just one example, but I suspect Google has some of the best internal security of any software company on the planet, and that's a very good thing, but it most certainly didn't have that back in the day.
> Deployments were infrequent, unreliable, and sometimes even done from a dev's machine.
Deployments were weekly and done from a dev machine because that way someone was watching it and could intervene in case of unexpected problems. Some teams didn't do that and tried to automate rollouts completely. I could always tell which products weren't doing enough manual work because I'd encounter obviously broken features live in production, do a bit of digging and discover end user complaints had been piling up in the support forums for months. But nobody was reading them, and the metrics didn't show any problem, and changes flowed into prod so the team just ... didn't realize their product wasn't working. There's no substitute for trying stuff out for yourself. I encounter clearly broken software that never seems to get fixed way too often these days and I'm sure it's partly because the teams in question don't use their own product much and don't even realize anything is wrong.
Additionally, rolling out from a dev machine brings so many risks – security, reproducibility, human error, and so on.
I'm glad this is not the way things work anymore, and for the most part things are more reliable as a result.
On the other hand, someone who started in 2015 missed out on the years when Google was mostly considered to be the good guys and given the benefit of the doubt. That’s about when the culture started turning against “big tech” in general (rather than specific companies like Microsoft).
It's been interesting to see the tech and public zeitgeist shift on this.
Lesson: if you're going to be popular for killing a king, be careful how similarly the thing you place on your head starts to look to a crown.
The culture turned on Google around the same time Google lost their innocence and dropped the "Don't be evil" clause (note the dropping of that clause was not the cause, just one of the symptoms).
I look at it as somewhat inevitable considering the path the company has taken, but it is certainly different. We are cranking out money and that’s fine, but it is a change.
That's not to disparage the latter: making things work at scale is hard engineering. But when people praise the glory days, it may be a preference for working on new ideas in small projects.
The way I like to look at it is that small things take a while, but big things happen remarkably quickly. For example, rolling out a "hello world" service might take a few days, but having that service serve 1M QPS is pretty much free (in terms of effort). At my previous place a new service might have taken an hour to set up a deployment for, but having it serve 1M QPS would have required overhauling several aspects of our infrastructure over months.
Could you elaborate on what these are for a non-googler? Ironically enough, Google isn't very helpful.
P2020 + Rollouts: This is for intent-driven deployment, where deployment configuration for jobs, networks, pubsub topics, and other resources is declaratively checked into source, and the Annealing system automatically finds diffs against production state and resolves them (aka a rollout).
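For non-Googlers, the core idea is a reconcile loop over declared intent. Here is a hypothetical toy sketch in Python; the resource names and data structure are invented for illustration and bear no relation to the actual Annealing API:

```python
# Toy sketch of intent-driven reconciliation: desired state is checked
# into source control, and a "rollout" is just the computed diff being
# applied to production. All names here are invented for illustration.

def diff(desired: dict, actual: dict) -> dict:
    """Return the changes needed to move `actual` toward `desired`."""
    changes = {}
    for resource, spec in desired.items():
        if actual.get(resource) != spec:
            changes[resource] = spec  # create or update
    for resource in actual:
        if resource not in desired:
            changes[resource] = None  # delete
    return changes

def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the diff and return the new production state."""
    for resource, spec in diff(desired, actual).items():
        if spec is None:
            actual.pop(resource)
        else:
            actual[resource] = spec
    return actual

# Checked-in intent vs. live production state:
desired = {"job/frontend": {"replicas": 12},
           "pubsub/events": {"retention_days": 7}}
actual = {"job/frontend": {"replicas": 10},
          "job/old-batch": {"replicas": 2}}
print(reconcile(desired, actual))
```

A real rollout system would of course apply the diff gradually across locations with health checks between steps, rather than all at once.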
Automated job sizing: load-trend driven automated capacity planning. Separate from autopilot, which is a more time-sensitive autoscaler. This will actually make configuration changes based on trends in traffic, and request quota for you automatically (with your approval).
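As a toy illustration of what "load-trend driven" sizing means (all constants and the formula here are invented for illustration, not the real system): fit a linear trend to recent weekly peak QPS, project a few weeks ahead, and size the job with headroom.

```python
# Hypothetical sketch of trend-driven capacity planning.
import math

def projected_replicas(weekly_peak_qps, qps_per_replica=1000,
                       weeks_ahead=4, headroom=1.3):
    n = len(weekly_peak_qps)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(weekly_peak_qps) / n
    # Least-squares slope and intercept of the traffic trend.
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, weekly_peak_qps))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    projected_qps = intercept + slope * (n - 1 + weeks_ahead)
    # Round up after adding headroom for spikes.
    return math.ceil(projected_qps * headroom / qps_per_replica)

# Traffic growing ~500 QPS/week; plan capacity a month out.
print(projected_replicas([8000, 8500, 9000, 9500, 10000]))  # → 16
```

The "with your approval" part would sit on top of this: the system files the computed quota request and a human signs off.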
BCID: this is for verifiable binaries and configs in prod. It requires two parties to make source changes, two parties to make config changes, and two parties to approve non-automated production changes, and only verified, checked-in binaries and configs can run in prod, not stuff you build on your desktop machine.
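Conceptually this is a provenance check at admission time. A minimal hypothetical sketch (field names and policy details invented for illustration): a binary runs only if it was built by a trusted builder from a checked-in source revision and its change had two distinct approvers.

```python
# Hypothetical sketch of a BCID-style admission policy check.

TRUSTED_BUILDERS = {"central-build-service"}

def admit(binary: dict) -> bool:
    """Return True if the binary's provenance passes policy."""
    prov = binary.get("provenance", {})
    built_by_ci = prov.get("builder") in TRUSTED_BUILDERS
    from_source = prov.get("source_commit") is not None
    two_party = len(set(prov.get("approvers", []))) >= 2
    return built_by_ci and from_source and two_party

# A desktop build fails the check; a verified CI build passes.
desktop_build = {"provenance": {"builder": "alice-workstation",
                                "source_commit": None,
                                "approvers": ["alice"]}}
verified_build = {"provenance": {"builder": "central-build-service",
                                 "source_commit": "abc123",
                                 "approvers": ["alice", "bob"]}}
print(admit(desktop_build), admit(verified_build))  # → False True
```

Publicly, the closest analogue to this idea is build provenance/attestation schemes like SLSA.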
history's biggest yakshave...? not surprising for a programmer-run company
I'm sure it still matters to bring down the power and server bills. But one can't help but feel like they could be doing much more.
Unified rollouts, unified versioning, universal dashboards, security compliance, standardized quota management, standard alert configs. It's opinionated, but I can drop into any team and hit the ground running. I don't want to learn your custom dashboards doing the exact same thing with different names.
The issue with PoD is that it's a great concept and implementation that's tight on resources, and doesn't have much of a plan to expand beyond its current paradigm. The P2020 team deserves way more recognition for all the work they have done.
It was a real mistake; SRE is hugely stressful and really unrewarding compared to SWE. Yes, you learn some skills and get some occasional glory, but year after year of fighting fires really didn't build any long-lasting career.
After switching back to SWE I've finally got promotions and pay rises again, as well as a good night's sleep and much less stress.
Back to the OP, I raise a glass to your sabbatical. Most SREs end up needing a healing period from repetitive stress injury (AKA burnout).
If I may offer some completely unsolicited advice, don't put too much pressure on yourself in the next few months. People who gravitate towards SRE work tend to thrive under short-term ambiguity and emergency/urgency. However, long/medium term ambiguity without a clear productive goal can quickly feel like a crisis. OP mentions this in closing, so I'm rooting for them to rest and "sit still" for a bit.
I cannot disagree more: our team is healthy, oncall is quite a fine activity to do (and compensated, of course), we have plenty of engineering work to do.
I've had five promotions (and tripled my salary) while working on plenty of rewarding things over time. I've done everything from deployment automation to capacity planning, distributed system design, large data migrations, and designing IETF standards for auth protocols; I've written client SDKs, and now we even do AI for different things (including model development).
I'd recommend to not generalize from "I didn't like it / the experience wasn't a match for me" to "the role is shitty".
> oncall is quite a fine activity to do (and compensated, of course),
Overnight on call is never compensated. I know some tech companies pay but I've never seen it.
> deployment automation, to capacity planning, distributed system design, large data migrations, designing ietf standards for auth protocols, wrote client sdks, now we even do AI for different things (including model development).
To me that is mostly SWE work (capacity planning and migrations perhaps are SRE). In regulated environments SREs are explicitly forbidden from making changes to the code base.
> I'd recommend to not generalize from "I didn't like it / the experience wasn't a match for me" to "the role is shitty".
Agreed.
I miss working in software that also utilizes my skills in infrastructure, but I do not miss the constant escalations, terrible on-call schedules, and only about 20% of my time being spent on the rewarding parts of the job.
I think in general that type of experience would help Developers be more empathetic to the operations side of things. Those fires often come from trade-offs made in development.
I think it made me a better developer because I've seen a lot of what can go wrong. It probably reduces my productivity, but ultimately my stuff is more likely to work.
The root cause of this disaster is that, when writing software, interruptions are the death of productivity. Having a software engineer wear too many different hats at one time, especially when some of those hats are largely real-time interrupt driven, can absolutely kill productivity.
To emphasize, I'm not at all in favor of "throwing things over the wall". Software engineers are responsible, for example, for making software that is easy to test and has good observability in place for when production problems show up. But just because you listed a bunch of things that are "critical for software development" doesn't mean that one person or role should be responsible for all of these things.
At the very least, e.g. for smaller teams I recommend that there is a rotating role so devs working on feature development aren't constantly interrupted by production issues, and instead each dev just gets a week-long stint where all they're expected to do is work on support issues and production tooling improvements.
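The rotation described above can be as simple as mapping the current ISO week onto the team round-robin. A minimal hypothetical sketch (team names invented):

```python
# Toy sketch of a week-long interrupt-handler rotation: one dev per ISO
# week, round-robin, so everyone else's feature work goes uninterrupted.
import datetime

def interrupt_handler(devs, date=None):
    """Return the dev on support duty for the week containing `date`."""
    date = date or datetime.date.today()
    week = date.isocalendar()[1]
    return devs[week % len(devs)]

team = ["ann", "ben", "cho"]
print(interrupt_handler(team, datetime.date(2024, 1, 15)))  # → ann
print(interrupt_handler(team, datetime.date(2024, 1, 22)))  # → ben
```

One caveat of keying off the ISO week number: the rotation can skip or repeat a person across the year boundary, so real schedules usually track the handoff explicitly.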
I ask this question here because there seem to be quite some (ex-)Google employees in this thread.
Due to "an inadvertent misconfiguration of the GCVE service by Google operators due to leaving a parameter blank"?
https://cloud.google.com/blog/products/infrastructure/detail...
If I take another software gig it will certainly be at a small company where my daily work contributes directly to the company’s central goals.
Programming is a superpower that can change the world. Yet the best paying jobs for programmers are at FAANGs building systems to peddle ads.
It got to a point of being almost farcical, where they were scheduling meetings at 9:30pm multiple times a week. After two years of it I had to leave, I was coming home catatonic and depressed, to a point where my wife was getting concerned.
What you're seeing from Google in the last ten years is a maybe predictable consequence of that culture, where some Googlers really do seem to think they're generically much smarter than everyone else, about everything. You started to see mass scale social engineering via manipulation of search results and products, driven apparently by the immense faith they have in their own wisdom. Is there any claim Googlers cannot immediately resolve as true or false given nothing more than a few ML models and a team of contractors in LatAm? Apparently some of them think that's all it takes.
This quasi-misanthropic culture is miles away from the trusting "make it universally available and useful" culture the company once had, but the seeds of that culture's end were clearly visible even at the start. You can't constantly validate people by telling them they're super smart before some of them come to actually believe it, and that leads naturally to the belief that if they're really the smartest people in the world then surely that means they should be running it.
it's the proximity, pedigree, and profile that you have to fit to get into Google
I'm happy for those that made it. Not everybody gets to work for Google. But the work they do is no less challenging or important than what the rest of us do.
If anything, FAANG has contributed greatly to the American Firewall of Algorithms and has destroyed an entire generation's ability to reason and value common sense.
I remember a quote, though I can't remember who said it: "if they are paying you a large salary, what they take from you is far greater"
Hmm, this seems like a nice thing to tell oneself to avoid feeling underpaid :)
It's like the fourth time I read/hear this. I understand that it's a tricky one to address.
I’ve worked for European and Asian owned companies, and they seem to be able to handle distributed authority much better. For the “land of the free”, it sure seems like US companies run on a feudal system.
Heaven is an American boss, Chinese cook, Russian wife and English house.
Hell is an English cook, Chinese house, Russian boss, and American wife.
scrum master: “points per sprint is going down guys!1!1 (after moving stories to next sprint). Velocity is gucci”
If you spent the last 9y at Google in SF/NYC (the top comp regions), you'd have a million dollars in stock alone. It doesn't go as far as it did 9 years ago, but it's still ahead of a whole lot of people in this economy.
I am one of those engineers who do not care about culture as long as I am getting paid for the efforts I put in. Google in that sense beat others by HUGE margin.
The engineering work was however very different. We focused on the right engineering solutions instead of just the business aspect. While that kind of attitude hurts in the short term, it pays off big in the long term.
Then large companies are not for you. Navigating the culture is key to advancement and long-term satisfaction. Otherwise you will feel like an outcast and likely let go during layoff rounds or kept around at lower compensation rates.
Whoah, it seems fantastic! That alone seems like a good reason to work for Google. Unfortunately, none of the companies I worked for was interested in less than 100%. I told them many times, you can keep your money, I just want to spend 20% or 30% less time at work, but they always insisted on 100%. I have a feeling they would go for 120% if legally allowed.
I'd love to know how you can improve those in a fast-paced office environment. None of the trainings are about this (even the leadership-oriented ones feel like standardized template stuff rather than an environment where you can practice social skills and get correct, timely feedback to improve).
How much, do you think?
They have a great business that prints money.
The problem Google has is making more money printing business lines - like Microsoft.
I do hope Google turns around and gets back to its roots.
There is nothing all that special about Google. Maybe there was twenty years ago, but that ship has long since sailed. It’s just another large US tech company. Like Microsoft and IBM before it.
This is just a hyperbolic statement that should not be taken seriously at all.
Look, Google isn't some fantasy land that some people might have lauded it as once upon a time, and it isn't unique in terms of pay or talent, but it is certainly at the top echelon.
I did an interview loop for a high-level IC role at both Azure and GCP simultaneously, and the difference in talent level (as well as pay) was astounding.
IBM has never been a company where engineers could rise to the same ranks as directors and higher on a solid IC track.
Is Google special compared to Apple/Netflix/Meta? No. Is it special compared to Microsoft, IBM, and any non FAANG or a company that isn't a top deca-corn? Yes.
It's a similar trajectory is what people are saying. When Google was small and everyone wanted to work there they could take their pick of the top talent. When you run a huge company you inevitably end up with something around the average. I.e. all those huge companies that pay similar wages and do similar work basically end up having similar talent +/- and within that there are likely some great people and some less than great people.
Never? Maybe if you’re talking the last 15-20 years, but IBM has been around a lot longer than that…
I personally know people who moved up the ranks there to director and above, so I can say with confidence you’re absolutely wrong.
This is maybe the third time I've heard this mentioned here on HN, so now I'm curious: What specific kinds of differences?
I imagine there might be a certain kind of prejudice against Microsoft and its employees, especially for "using Windows" or whatever, which I've found often unfairly coloring the opinions of people from Silicon Valley that are used to Linux.
If you don't mind sharing, what specific differences did you notice that gave you a bad impression of the Microsoft team and such a good impression of the Google team?
However, in recent years I have turned into a Google hater, and he does not mention any of those aspects. Google is an evil business IMHO. They are an advertising company. The challenge for this planet is sustainability, and the goal of advertising is to waste resources.

I can type this on a low-end phone that soon turns 10. It works perfectly, except that no recent Android version is supported. Google's business depends on cores and memory having doubled several times since then, for no benefit to mankind. And phones are far from the only category: advertising is about selling a lot of stuff that brings no true improvement in quality of life.

Video is one of the worst energy wasters in computing. 90% of Youtube is useless crap, not worth destroying the planet, and nobody would pay a realistic price for it. They are an ugly oligopolist. The list could go on and on...
The claim that more memory/computing power has no benefits at all seems nonsensical.
Video games, supposedly the Killer App for high-end computing hardware, are among the worst offenders: your average modern 2D side-scrolling platformer (Mario clone) locks up my computer (to say nothing of AAA games). Web browsers are the second-worst offenders. Chat clients (secretly web browsers) are third: they barely work, and interoperability has gone the way of the dodo. Operating systems are fourth: they (mostly) still let you use older versions of software, but they have mostly just grown new problems (e.g. built-in adware in Windows; GNOME… well, GNOME) without fixing long-standing ones (e.g. slow domain login in Windows; most systemd misfeatures).
> 90% of Youtube is useless crap
That is the way of all things: https://en.wikipedia.org/wiki/Sturgeon's_law
But the best of YouTube, people are clearly willing to pay for that.
Anyway, congratulations to those Googlers who "picked" GOOG by not selling their GSUs and who made out like bandits. You certainly got lucky, and nobody should expect their own employers' RSUs to do the same.
Googlers from the past ~15 years are comparatively filthy rich. Imagine having had an opportunity to miss out on 8 figures of stock returns and only be left with 7 figures because you followed sound financial advice. Bruce Willis is no doubt dabbing his eyes on Memegen.
The last 15 years were good in tech but the earlier Google employees did even better.
I'm guessing neither of us made money on bitcoin or gamestop... I don't lose sleep over that.
A bit of perspective may be warranted. There are single mothers busting ass raising kids on tips and food stamps, while you're asking for sympathy for making out with only seven figures from your Google stock options.
I'm pretty sure the luck of being born with sufficient intelligence to work at Google is orders of magnitude greater than the luck of making big ROI on Google stock.
> because you followed sound financial advice
Most financial advice is there to help the clueless masses to not lose their savings to get-rich-quick schemes and then default on their debt during their next unemployment period. Anyone smart enough to do Silicon Valley work has probably outgrown those training wheels and can think for themselves when investing.
> It’s a shallow post-mortem
I respectfully disagree. It’s an 8 minute read. Sure, it’s mostly in dot-point form, but personally I’d rather that than some massive 80,000 word blog post that I’m going to drop 1/8 of the way through.
Since when does a personal blog post need to be a well constructed and lengthy document?
And it is worth noting that a lot of the bullet point lists do start with "I made a ton of money" in as many words, which is also just not very interesting to most folks, though it is certainly very relevant and important to the writer.
But a well articulated, technically correct post that's evenly paced? Heavenly. It's a David Attenborough documentary for tech nerds. It's exactly what I am after.
I wouldn’t say anything against having an upfront summary for people who don’t have time/patience.
9 years working at a top tech company of bleeding edge work reduced down to “I made money. Oh, I made money. My stocks did well, so I made even more money” is the pinnacle of intellectual laziness. That’s not a postmortem by any stretch of imagination.
Slight OT: what do people recommend for simple server & db monitoring (for a small saas business)?
monit, nagios, victoria metrics, etc?
Lots maybe. But if it's under 8 figures before the dot, it's not tons.
Is Google really different from other companies? I talk to a lot of Amazonians (AWS Hero, FreeBSD/EC2 maintainer) and my general impression is that developers below L7 ought to be classified as "Junior" -- my mapping is basically L4-L6 = Junior Developer, L7/L8 = Developer, and L10 = Senior Developer. Anything which doesn't have L7+ involvement gives me major "these kids need adult supervision" vibes for all the newbie mistakes they make.
Fifteen years ago L5 was actually senior, L4 was a developer, L3 was a junior developer. L6+ = you owned major user-visible features with hundreds of millions of users. L9 = you did something world-class like invent BigTable or Google News, and L10 didn't exist.
This seems extremely surprising. I can believe that the 2010-engineers were more technically capable, but there was also a lot less non-technical complexity involved in getting things done in 2010 than there is today.
Amazon and Microsoft also have less "alignment" at various levels compared to silicon valley due to literal geographic and historical reasons. Principal SWE at AWS is probably ~L6 at G in my experience. Obviously there are always outliers in all directions.
As for L10, that's Distinguished Engineer. I think if they're managers they're also called VPs? I'm not exactly sure what the deal is there; but I know plenty of Amazon L10s who are fantastic engineers.