Why should I do production support?

(devenbhooshan.wordpress.com)

78 points | cdev | 5y ago | 130 comments

You don't get anything out of burning out. You just burn out. And that time is not coming back. This post seems like an apologist talking.

Production support alone is not that much of a problem. What the author skipped (conveniently? or forgot to mention?) is - it's really the "on call" phenomenon that's the problem.

The "typical" on-call - where when you are on-call you are magically on-call 24x7. Yes, during your sleeping hours as well; as if that's less important and the company can avoid spending money to hire dedicated support for those hours and instead make you suffer (yes, it's just that - there's no other name for it like "satisfaction", "learning", "growing" or any of those buzzwords).

You want engineers to do production support? Well, let them do it during normal office hours and only a few times a month. Or heck, let them do it for weeks but let them punch in and punch out normal office hours. Let them choose to do only one half of the day and have someone else willing to do the other half.

There's no excuse for burning out engineers (esp. unsuspecting youngsters) by pushing them into ungodly hours of work ruining their health among other things while trying to constantly tell them - "do you even realise what a service to humanity you are doing!".

It's just exploitation.

dopylitty 5y ago

The point about on-call is really critical and really on-point.

If a company thinks an application is important enough to run 24x7 then it should staff for 24x7 support. Stealing wages from workers by expecting them to be available 24x7 (on-call) is an absolute abuse.

It also leads to burn out, poor performance during the day (how is a dev's development ability when they were up at 2:30am on an incident call the night before?), and clouded thinking causing mistakes or impacting recovery time during incidents.

nineteen999 5y ago

> If a company thinks an application is important enough to run 24x7 then it should staff for 24x7 support

And where it really matters, they do. My team and I build and manage a large Emergency Services telecommunication network. We have Tier 1/2 operators on shift work 24/7. Tier 3 staff (programmers, system integrators and administrators) are their escalation point for critical issues outside of business hours.

> Stealing wages from workers by expecting them to be available 24x7 (on-call)

The Tier 3s that are on-call in our environment are on a rotating roster and are compensated nicely for being prepared to answer the phone outside business hours. Frequently they don't get called during their week at all and it's free money.

> how is a dev's development ability when they were up at 2:30am on an incident call the night before?

Easy, as well as the financial compensation, we give them time in lieu. Two hours callout in the middle of the night, two (paid) hours given back on their next working day, or whenever they prefer, subject to availability of other staff.

There are simple solutions to these problems, and where they matter, they are applied. Granted things are very black and white for us as lives are potentially at stake, but any company that wants to have 24/7 engineers available needs to pay for that kind of support.

HenryBemis 5y ago

The companies do this for the money. And the people who work in those companies have no real sense of the risks, or they just care more for the numbers and want to roll the dice.

I would not trust someone who I just woke up at 2am to do something. He/she is mid-sleep. They will be prone to errors, they will be super tired, and I just ruined their next 1.5 days that it will take them to recover from that.

This is not a job where you lift boxes, where intellect is not needed as much (though strength and stamina will also be affected by a mid-night alarm). You want your folks to be 100% on par, otherwise they may make things worse.

MattGaiser 5y ago

They should also allow time and budget for building an application that can run 24/7 without too many errors.

"We need [insert thing manager asks for here] immediately" has consequences.

burnt_husk 5y ago

Having worked in the tech sector for a while now, having burned out once, and having been on-call 24x7 far more than is healthy, I would say exploitation is the name of the game to a lot of managers.

You are less of a person and more of a means to an end. A tool to achieve something, and some tools are disposable. It can be of career advantage to a manager to burn out engineers. Maybe instead of spreading 24x7 on-call across 3 teams in three timezones, you put it on 1 team in 1 timezone. By doing so a manager can achieve a lot with fewer resources, and hopefully secure their own elevation up the corporate ladder before the cost of their strategy becomes evident.

The cost of burn out I think remains hidden. In technology there's a constant flux of staff anyway, teams being created and dissolved, and in all the noise a few people being exhausted and bailing from the company is hardly noticed. Perhaps they said something before they left, but it's best for everyone in middle management if the burnt out individual is labeled the problem: they were a bad culture fit you see, a grumbler who didn't have what it took.

danielheath 5y ago

Burning out from on-call comes from not being able to fix the underlying causes.

I'm happy to hold the pager if I've also got the right to block/rollback deploys until the system is stable - my current job has had two out-of-hours pages in the past year, and we're in the Alexa top 10k so it's not like there's no traffic.

taylodl 5y ago

Bingo! There's nothing more motivating to "prioritize" fixing a problem than being woken up at 2 a.m. being affected by it. I've worked in places where one team writes the code and another team supports it and it's always the same: the code is absolute garbage. It ends up being too complex and having way too many moving parts, and is impossible to diagnose. I've worked in places where the team who writes it, supports it - 24x7. Their code is always super simple, easy to diagnose, and easy to maintain.

I've been at this for 35 years at many companies and working with many teams and it's always the same: if you want good software then make the team creating it also support it. In every case I've experienced it leads to software requiring little to no support, easy to maintain, and easy to extend. Why? Because nobody wants to get up in the middle of the night or work weekends and moreover, they'd rather be adding features than limping along with existing features.

mjayhn 5y ago

I'm in ops and this is a huge, huge headache for those of us on our end at companies where we're not given Google SRE powers to control release trajectory based on failure budgets of some sort.

I've worked at too many places that had no SWEs on-call for the on-call alerts that I get, which the vast majority of the time involves throwing a bandaid (as in redirecting traffic, etc) in front of an internal bug that I hope eventually gets fixed once the RFO/etc has been submitted before it hits my NEXT on-call rotation or my poor coworkers.

Without SWEs on my rotation they don't understand the immediacy. They aren't the ones getting their Christmas week interrupted every 4 hours while ops keeps the house running. In Ops having your entire day ruined by various on-call alerts usually feels like you're working without any breaks and nobody even cares.

Anyone want a bad golang developer, wannabe ex-ops person who knows a lot about platform reliability and o11y and wants to focus on the golang end finally? I'll make your team's automation and o11y purr no matter where it is (bm, cloud, global pops, serverless..)..

afberg 5y ago

I think there's only one way to solve this — which I've been unsuccessfully advocating for at my current company — and that is paid voluntary on call schedules.

It creates an actual market for on call work where engineers can simply say no to the extra cash if they don't like work taking up their nights and weekends. If the company is having trouble with no engineers wanting to be on call the pay is simply too low and needs to be increased. It's a job like any other and should be compensated as such.

In the end I honestly believe it will be beneficial for the company not having engineers burn out so quickly. Compensation also clearly sets the expectations — if you're being paid to do it you'll take it more seriously.

Just my 2 cents

nimrody 5y ago

But does it give engineers a good incentive to improve product quality and reduce the number of production incidents?

Where I work we are not on-call. Nevertheless, I try to help the ops team when they encounter issues. This does make you improve logging and error handling since you know it takes a lot more time when it's difficult to filter logs for the interesting events.

Engineers not exposed to production issues and customers will never understand why you need these extra measures.

tybit 5y ago

I work somewhere that does this and it works surprisingly well. It’s rare to find an employer that’s willing to look after their staff though.

hinkley 5y ago

> "do you even realise what a service to humanity you are doing!".

I don’t know who needs to hear this besides me twenty years ago, but if you want to do charity then go home at 6pm and volunteer at a real charity. Don’t do it for a wannabe robber baron who will not share with you. Don’t do it for someone where even an emotional payoff is years away or may never come.

Find something else you care about and help some people just because. Not because you’re getting under-paid and over-guilted to do it.

tasogare 5y ago

Well even "real charities", including very famous ones, are quite shady too: volunteers do work for free while donations go towards C-exec 5-6 digit salaries.

m463 5y ago

Various places I've worked put certain people on-call. They were salaried but somehow paid extra for the time they were on-call. It seemed to be voluntary and the people who did it liked the extra cash.

On the other hand I had a friend at a very large and well known company. He got a job offer and was hired into one department, but he wanted to take a little time off between jobs before he started. They somehow convinced him to start saying there was a holiday coming up and he could take the time off then.

and as soon as he came in he started getting calls 1am 2am 3am etc...

so he left.

And they cajoled him back saying things were different and he finally bought it and went back.

same thing happened again, and he quit for a final time.

One part of the problem was that he was a US citizen working with a bunch of H1B visa folks and the company could get away with that sort of stuff. H1B folks will say yes sir, no sir because their dreams of living in the US are tied to keeping their job at all costs, and then the bad work culture festers.

throwaway0a5e 5y ago

I'm not gonna advocate for having engineers directly field customer calls at 3am (because that's a recipe for unhappy customers), but as someone who fields customer support calls/tickets as a large part of my job I feel very comfortable saying that a "sizeable enough to be a big problem" share of engineers will not build systems that are supportable without engineering escalations unless they get support trying to escalate an issue or conference them in at 3am from time to time.

fatnoah 5y ago

> The "typical" on-call - where when you are on-call you are magically on-call 24x7.

I ran the Engineering org for a startup and we had a small, 3 person Ops team that handled initial triage of events. About 75% of these issues were Engineering-related. My solution was to a) create an on-call rotation for Engineering and b) allow the Engineers to prioritize reliability work.

It sounds like a no-brainer, but I had to fight with the rest of the exec team to allow b) to happen, since it came at the expense of the product roadmap. I eventually won the fight and our nightly on-call volume went from 1-2 incidents per day to 1-2 every few months.

Vindication came about a year later when we were acquired by a large company. As part of the due diligence process (including 18 hours with me going over technical details in front of 30 senior folks from the acquiring company) we got major kudos for having a level of reliability that far exceeded what they typically saw for a company our size.

hinkley 5y ago

It’s usually a case of taking responsibility for things you have no power over.

When they ask it’s usually after a giant hole has been dug, and patterns have been set. If you knew from the outset that you would be the support team then you’d have prioritized some other tickets. You’d have increased the estimates on others. You would have refused to work on these three, you would have argued vigorously about these four decisions, and you would have insisted your boss fire “That Guy” months ago because his code is garbage and his only real skill is articulate deflection.

This group of folks wants several somethings for nothing. One of them is labor, another is somewhere to assign blame. They are grooming you for failure and we all deserve better.

mtberatwork 5y ago

> There's no excuse for burning out engineers (esp. unsuspecting youngsters) by pushing them into ungodly hours of work ruining their health among other things while trying to constantly tell them - "do you even realise what a service to humanity you are doing!".

In the US, this is happening across the board and not just in tech. The expectation to always be available [often without monetary compensation] is sadly the new normal. Without strong labor laws in place, this implicit form of exploitation will never cease.

rdtwo 5y ago

"Everyone is doing it" is the excuse.

lifeisstillgood 5y ago

The cure - Unionisation

seriously

throwaway0a5e 5y ago

Police officers have pretty much the strongest unions out there and the junior folks on the force (i.e. exactly the kind of people who get put on call in tech) generally wind up stacking absurd combinations of shifts in order to be paid competitively. Rail workers, another strongly unionized profession, have it no better.

I get that you like unions but just because you have a hammer doesn't make every problem a nail.

Galanwe 5y ago

The cure - proper labor law.

Seriously

bowmessage 5y ago

heh, I started searching on Spotify...

hanniabu 5y ago

You're talking as if they care about the exploitation. It's done purposely. They simply don't care when it means greater profits. They also believe there's an endless line of devs they can burn out and throw away.

zeckalpha 5y ago

If the engineers aren’t oncall, who is? Is it okay to exploit non-engineers? If anything, it is less exploitative to have those who are empowered to improve their situation oncall.

detaro 5y ago

Given that the parent comment is clear about the problem being the 24x7 expectation: Someone paid to work or be available that shift, engineer or not?

magicalhippo 5y ago

Our support folks handle the on-call support. They do one week each on rotation, they get some extra compensation and the following Friday off.

If it's a serious issue they can't handle they might wake up one of us programmers, but usually they can find some temporary fix or workaround until the next morning.

MattGaiser 5y ago

People seem to get stuck in support work. That would be my aversion to doing too much of it. At some of my prior workplaces, there have been people who have been so good at support that they never got assigned to do new development work. Engineers far more experienced and senior than I was (or currently am) were dealing with trivial issues as they were good at it while I got the nice greenfield project.

They had to quit to get out of support.

x87678r 5y ago

This has been a huge problem for me. I've always loved working with customers and dealing with real problems under pressure. So I have enjoyed my time on the support rota - but L1 guys quickly learn I'm better at support than other team members, so everyone contacts me directly. The grumpy devs who pretend they don't know anything about the live system get interesting projects, no interruptions and a much better resume. I've learned to say no. It's also a good reason to move teams as you're not the super experienced guy who has to fix the urgent gnarly support issues.

jasonlotito 5y ago

Of course, if they were the ones who originally built those products, they should support them. Why should people that create software that requires so much support they quit over it be entrusted with yet another greenfield project without fixing the stuff they built in the first place?

runawaybottle 5y ago

Some people have a switch that turns them into bug fixing demons. I know I get a rush out of fixing prod issues. People notice that, especially when you just jump into foreign code that you never touched and come out with a fix within an hour or two.

woutr_be 5y ago

I work in finance, where we have clearly defined processes for production support, simply because developers don’t have access to production environments.

However, production support teams don’t have a real understanding of our application and how it’s built. So most of the time you have engineers on call with production support, telling them how to debug the problem and come up with relevant logs.

It’s incredibly infuriating and time consuming, and I absolutely hate doing it this way.

90% of the time you also get incredibly vague bug reports with irrelevant logs, and a description of what they think the problem is. Most of the time you need to spend another day finding correct logs and somehow debugging it. Most teams log every single request with all parameters and payloads because they can just replicate the problem locally instead of relying on production support.

We’ve long advocated for either having dedicated support or having engineers on some sort of schedule that can do support.

pawelmi 5y ago

I'm curious, why can't the people that created the system take part in production support? You've mentioned finance, which I presume requires a high level of security, but at the same time there are also people on the other side, just not knowing the system first hand and perhaps having skills different from knowing how to debug software. They can see all the data and in theory modify system behavior, e.g. modify/install any binary. Why are developers less trusted, is it some kind of logic or regulation or just "the way it has always been done" in finance?

x87678r 5y ago

> just "the way it has always been done" in finance

Twenty years ago, for many systems, devs could do whatever they wanted in production. There were insider trading scandals and, combined with SOX, regulators cracked down on it, so now devs have lost at least write access. If you have an old system that relied on knowledgeable devs to fix stuff it's a terrible situation where people just quit and no one can support it.

woutr_be 5y ago

Part of it probably stems from that it's always been done this way, and the processes haven't evolved over time. Another part is also to protect the bank against rogue employees, it wouldn't be the first time a developer made changes against a production database.

You're right that production support has access to those systems, and could potentially make changes and install different binaries, but the number of people that can do that is extremely limited. Every change also requires a change request that needs several approvals, and to request data you need another data request.

goatinaboat 5y ago

They can, they just can’t have direct access to live systems due to separation of duties. But there are methods for dealing with this, like centralised logging so a developer never needs to see the original log file on the problematic box.

HenryBemis 5y ago

Seems like we were writing in parallel.

"We don't trust" a dev. The change management processes demand the existence of 1) Dev, 2) Librarian (we used to call them that)(that would review and transfer the code, or review and compile the code), 3) the prod sys admin.

Some orgs may have a slightly different setup, but in some form or another these general rules apply.

Today with tools like CyberArk it is easier to grant temporary privileged access to a dev for production support. We also have the tools to trace/monitor/record access, which makes the process auditor-friendly.

user5994461 5y ago

Also in finance with dedicated support teams. Our support was great, some of our guys in Asia were outright fantastic at debugging.

To be fair, being great wasn't enough, their job was only possible because the company had unified tooling. A single deployment solution that was deploying near 1M tasks a day in the company, allowing all employees to lookup what is running where and see logs.

This made me appreciate just how useful it is to have both dedicated support AND unified tooling. The average company couldn't benefit from having folks on rota because it's impossible to figure out where anything is running.

woutr_be 5y ago

We still don't have a single deployment solution, we don't even have a single hosting solution. We can choose between 3 different cloud providers, or dedicated servers. (They're pushing for cloud now)

The thing is, this all is pushed down from management. In my previous project, we tried to automate as much as possible, but at the end of the day, our production support still wanted to deploy manually. Our business still wanted to see manual end-to-end tests with screenshots.

Then there's also different regulations in certain countries where you need to host your application and database in the country itself, so that's another solution.

Working in finance can be a real eye opener sometimes.

arminiusreturns 5y ago

> However, production support teams don’t have a real understanding of our application and how it’s built.

This reeks of bad documentation to me (which finance is notorious for). If a dev has to be on to support normal prod ops that's largely due to errors in both documentation and often in poor tooling. Sometimes those errors aren't as much the devs' fault because of management decisions, usually related to understaffing, but I hate how prod support gets shit on so often for failing to fix an issue when it's not really their fault.

woutr_be 5y ago

You're not wrong, the entire thing is because of poor management decisions and poor processes. I don't really shit on specific people, more on the entire process.

> This reeks of bad documentation to me

Not necessarily, you can document your entire application, but production support only looks at the logs, and does a data extract based on what they see. It would be far more beneficial if you had someone who has a clear understanding of the application so that they can help with debugging and actually solving the problem.

At the end of the day, production support are teams who help with 10-20 applications, it's impossible for them to truly understand specific applications. They receive a bug report from the business, investigate and extract logs, then pass it to the relevant development teams. If you need extra info, well, tough luck, you can reply to the ticket and wait for it to be picked up again. It's no surprise companies like this move so slow.

HenryBemis 5y ago

Adding to the above 100% spot-on comment: "we" in finance/banking do this because segregation of duties is mandatory, none of that DevOps nonsense ;)

In the off chance that a dev has the unique knowledge to solve a problem, they may get the firefighter/temporary elevated access needed, but will have to document the reason and the dev's actions very very well, because both internal and external auditors will zero in on that.

sdevonoes 5y ago

I like to write software but I don't want to be on-call if the software I wrote breaks at 3am in the morning. I do take my job with professionalism, I do write tests for most of it (not 100% coverage, but 100% coverage of the critical parts), I do monitoring (and answer and fix alerts if they happen during working hours) and I don't deploy on Fridays (and don't allow people to deploy on Fridays).

My code will crash sooner or later. I already know that. I don't write 100% bug-free code. But I cannot accept to give 100% of my time one week per month or so to a company in exchange for money. I just don't understand why people can't understand that I can be a professional only during 8 hours per day, but not more.

0xbadcafebee 5y ago

This attitude almost always turns into the following:

On-call: "Hey devs, I'm being woken up at 3AM because your app sucks. Please fix it." Devs: "Sure, no problem."

4 months go by

On-call: "These alerts are still coming in at 3AM. Did you fix the issue?" Dev: "We have a lot of work, we can't dedicate all our time to some minor problems, we have a deadline."

Next week, Devs are put on-call.

The alerts are fixed in two weeks. Site reliability goes up. Apps suddenly become more resilient to failure.

Honestly, the whole attitude of not wanting to work more than 8 hours is privilege. Most of the rest of the world works long hours. As a dev, you get a good salary and a job you don't have to break your body to do. The least you can do is be completely responsible for your own code.

And it helps you as an engineer. Like the article points out, it creates empathy for the users and product support engineers, it helps you improve architecture and app design, and it helps you understand different failure domains. You won't learn all that on your own time, especially without the scale of production.

cabraca 5y ago

Putting the blame on the devs is too easy. There are a bunch of management layers between the on-call team and the devs in your example. If management does not prioritize those alerts, it's not the fault of the devs and it's wrong to punish the devs for it.

> Honestly, the whole attitude of not wanting to work more than 8 hours is privilege. Most of the rest of the world works long hours. As a dev, you get a good salary and a job you don't have to break your body to do. The least you can do is be completely responsible for your own code.

Unless I signed a contract that states I will do on-call, I'm not gonna do on-call. It doesn't matter how long the rest of the world works.

sdevonoes 5y ago

> This attitude almost always turns into the following: [...]

Well, that's another problem (the dev not being able to solve a bug that reappears at 3AM).

> The alerts are fixed in two weeks. Site reliability goes up. Apps suddenly become more resilient to failure.

I always wondered why DevOps has the "Dev" in its title. At least, in most of the companies I have worked at, it was the DevOps people who were on call (paid), but they were very picky regarding what they could touch/work on (they almost never touched application code... we should call them "Ops" then, no?).

> Honestly, the whole attitude of not wanting to work more than 8 hours is privilege.

And it's a privilege I'm thankful for. What's wrong with that?

> As a dev, you get a good salary and a job you don't have to break your body to do.

We do break our body to do software engineering (our brains, to be more specific). If you think physical work >>> brain work, well, that's relative. Every person is different, and for me, brain work is as taxing as physical work.

> And it helps you as an engineer. Like the article points out, it creates empathy for the users and product support engineers, it helps you improve architecture and app design, and it helps you understand different failure domains

I know I can become better by working harder and smarter (it's obvious), but I just want to be the best version of myself by putting at most 40h/week. Isn't that something honourable in itself? Or does that make me a "bad engineer"?

condercet 5y ago

This attitude is terrible and exploitative. Employees have a duty of fidelity and good faith to their employer, sure - but highly skilled and in-demand engineers should demand better of their employers.

stronglikedan 5y ago

> Most of the rest of the world works long hours.

Most of the world works labor jobs, and studies have shown that the body can work longer than the mind without burnout.

robmsmt 5y ago

enter devops

ocdtrekkie 5y ago

If you aren't doing production support, you don't actually know your product. You aren't connected to the pain points your users experience and you miss what is, to your support team and your customers, the glaringly obvious.

I would argue all developers should be required to do some support work.

ed25519FUUU 5y ago

Having skin in the game will make you a better engineer. You’ll get better at the non-sexy things: monitoring, alerting, testing, etc. You better believe a person who is paged for software at 3:00 am has an incentive to make that software more reliable.

ocdtrekkie 5y ago

I'm not even talking 24/7 pager work. But just tackling some support tickets as part of your job so you see where people are having issues with your product.

Too often I see BigCorp development teams seeming blatantly oblivious to where their pain points are, and it's because they aren't forcing their developers to do support. They're pushing code, but they aren't pushing code that solves real problems for people.

lostdog 5y ago

But the directors and VPs aren't getting paged at 3am, so the incentives of the organization still go to lower quality software.

cgrealy 5y ago

Sorry, this is a terrible argument.

No one expects customer support people to write code. Why? Because they don't have the skillset.

Yet people who make this argument seem to think any moron can do support.

The skillset for an engineer is not a superset of a customer support person.

Have your engineers sit in on support, by all means, but actually making them DO support will result in unhappy engineers and sub-par support.

Do not undervalue a good support person. They have a whole suite of skills engineers often don't have.

TuringNYC 5y ago

Don't most developers do level III support? Ultimately if the first and second lines of defense cannot solve the problem, from everything I've seen, it goes to an Engineer. If it is your product, I'd assume it comes to you. This has been the case at almost every company I've been at.

ocdtrekkie 5y ago

Don't dismiss the value of tier 1 and 2 support requests. Maybe 30% of your tier 1 requests are some confusion that support knows how to alleviate, but an engineer could make the product more straightforward to eliminate those support issues entirely.

You aren't looking for the hardest problems, you're looking for the problems your users hit the most that an engineer could reduce in the product.

jjirsa 5y ago

Why should a company hire two lines of support to shield an engineer from bugs they introduce?

axaxs 5y ago

I largely agree. My company has dedicated CS, a second tier 'triage', and well...me. I always prefer if customers just email me directly. CS is a frustrating, cost saving measure. And often I'll get tickets, 3 weeks later, like 'customer said they have an issue.' What's the point again?

seanwilson 5y ago

Another pro for consultancy work: you can choose contracts that don't require you to be on call for production problems so this isn't forced upon you.

I'm not saying you wouldn't learn from working on production, but whether it's worth the stress is another question. In terms of software development, it's hard to think of a worse feeling than when you do a production deploy, you hit refresh on the website or whatever it is, and it shows a fatal error, then there's a mad scramble to roll back the change and figure out quickly what went wrong before the consequences grow too great. Most of the time bosses + coworkers aren't that understanding about it either and get into finger-pointing.

user59944615y ago

Consultants charge 200% for out of hours work, if not more. I don't think they mind working extra hours.

They're never offered extra work though. Companies are always willing to wait for Monday when they are asked to put money on the table.

efitz5y ago

I really like the author's first point, that one of his learnings was empathy for customers. Being on support for a product you didn't develop really calibrates you to understand what a product needs in order to be supportable, and where customers have problems with products. My time years ago in support was invaluable in shaping me as an engineer; I regularly push back against features that I know will be difficult to support or difficult for customers to understand.

AkshatM5y ago

It sounds very much like the poster is describing an on-call rotation rather than "production support", which is a very different thing altogether.

Production support is customer support: responding to chat messages or communications from users.

An on-call rotation, on the other hand, involves responding to production incidents and mounting a proper incident response.

The Google SRE workbook has a great chapter on the subject: https://landing.google.com/sre/workbook/chapters/on-call/

johnbellone5y ago

I don’t know why you’re being downvoted; these are two entirely different roles.

brailsafe5y ago

I feel like the author is referring more specifically to troubleshooting issues when production goes down, which I'm fine with if I have no other people asking me for updates. But I burnt out at my last job trying to do support, because there was individual support baked into every contract for our SDK and not remotely enough people to handle it all. I was hired as a software developer, not a customer support person, and they are not the same thing. It's unfortunate, because it was my first and highest paying gig after realizing that I also have ADHD, and it was a good company. Thing is, if I have a problem to solve or task to complete, I'm not going to think about how long it's been since I replied to whoever about their pet issue. I'm just going to zone in on my thing, and if the guy next to me doesn't break me out of it by cracking eggs on the desk and burping, then I'll stay on that thread till it's done. That's how my brain works, and expecting otherwise is naive. Anyway, this constant context switching and battling my apparent insufficiency killed my spirit for the work and I turned into a blob of productivity. It's as stupid as expecting me to cook while programming, because either the food, myself, or the code will get burnt.

greesil5y ago

At some large tech companies, production support as a software engineer does not seem to be a path to promotion, unless you are a junior level engineer. And yet, solving some of the bugs encountered in production environments, especially with a heterogeneous set of users, requires expert level knowledge of a particular software library. It is probably a great way to coast, so I hear.

goatinaboat5y ago

> It is probably a great way to coast, so I hear.

Or to stagnate, depending on how you look at it.

hinkley5y ago

Coasting is just stagnation in a different reference frame.

greesil5y ago

One part of your life stagnates while the other parts move forward.

aprinsen5y ago

Teams I've worked on have not had dedicated support staff, but rather engineers rotate 24 hour support duties on a weekly basis.

I have always had mixed feelings about "on call". I dread my turn on the rotation because the imminent threat of a prod issue has a psychological impact on my entire week, even off hours, and usually for a day or two after.

If everybody on the team feels that way, maybe it can act as a forcing function for product quality. I've seen this work on teams that already cultivate a strong sense of ownership.

On the flip side, it really stresses me out, and I sometimes resent that I'm not getting paid overtime for 24hr on call days. Maybe that's just baked into an engineer's salary these days, though...

MattGaiser5y ago

The engineers at the organization I just departed (at least the ones in the support rotation which did not include me) got paid in both money and extra time off for their time spent on support tasks.

hinkley5y ago

I find myself both applauding that and shrinking away thinking “perverse incentives”.

What I want is to run an engineering organization as if you should never have to call us. And if you do you either get chewed out for making a frivolous call, or we’re falling all over ourselves because that thing that is happening should definitely not be happening and we’ll be looking at how to keep that from ever happening again, again.

kevinmchugh5y ago

Yeah, I had a job where only certain teams had on call rotations. Anyone in those rotations got an extra half day off per week on call.

I've also seen folks spotted extra time off for really gnarly oncall shifts. Folks should push to have such accommodations standardized.

jonpurdy5y ago

Anecdotal of course, but a couple of jobs ago I did on-call rotations for a week every few weeks. I got paid some additional money and time off, but the psychological impact was too great for me and it just wasn't worth it.

wisecoder5y ago

99% of companies don't pay for on-call rotation / production support. Exploiting H1Bs for on-call support is a very common practice in the IT industry.

jake_morrison5y ago

A lot of the pain around production support is easily solved by having staff in multiple time zones.

A good structure is to have first line support be relatively generic ops people. They can handle problems related to infrastructure, e.g. hardware failures, network problems, or issues that can be handled by adding resources. The deployment process should be consistent enough across applications that they can e.g. roll back to a previous release.

This covers the majority of production problems. After that, it's time to bring in someone who understands the details of how the application works. If the dev team is geographically distributed, then someone is available during working hours. Otherwise, we have to get someone out of bed.

If the dev team has done their job right, this should be a rare occasion. Making the dev team fully responsible for the reliability of the application means that they are motivated to make it reliable. Otherwise there is a tendency to have an underclass of ops people who get abused.

A fundamental mindset here is taking responsibility for the user experience, including reliability. If this is not owned by the product development team, then who?

mtberatwork5y ago

Convincing folks at the top of the food chain that more staff is needed is one of the most difficult things to do.

g0510515y ago

Our company is trying to move to "devops" and having dev teams on pager duty, doing manual reporting, etc. They seem surprised at the amount of pushback from the devs.

comeonseriously5y ago

On the one hand, a lot of devs see production support as beneath them. On the other, I think managers seem to think they'll get a net productivity increase doing this, but there isn't one. Development gets much harder when you have to context shift several times per day. That being said, I do feel devs should do prod support. It gives them a better feel for the apps they build, how they're used, and where the customer pain points are. But I also feel there should be layers of support below the devs; devs should only get the ticket when they're the only ones left who can figure it out.

g0510515y ago

It's not "beneath" them, but it's an entirely different skill set. If you want me involved, approach me like a partner and we can work together, don't try to suddenly give me a job I can't do.

hermitcrab5y ago

I am an independent developer living off 3 software products I created and sell. I have done all my own support over the last 15 years. While it can be frustrating (especially for B2C), it is also my superpower, as it gives me so much more insight into how I can improve my products. I only do support via email (not phone or chat).

jp0d5y ago

It depends on what and how much one is learning from it. I once interviewed a guy for an ETL developer role. He was supporting an ETL application. In four years, all he had done was restart the application in case of any issues and it somehow worked for him. I've seen my fair share of such cases.

bcbrown5y ago

This seems very telling:

> I no longer work at Gojek

MattGaiser5y ago

Why? Engineers switch jobs all the time.

momokoko5y ago

> Engineers switch jobs all the time.

This also seems very telling

hyko5y ago

...because your company wants you to work two jobs for the price of one?

comeonseriously5y ago

Engineers absolutely should do prod support. But... There should be layers below. It should only come to engineering when nobody else can figure it out.

