The trick with this perspective is that, after identifying the real risks, you can then link the risks and possible mitigations by looking at all 'things' and identifying the ways in which they might fail (and how this may be prevented). This way you can easily identify which mitigations help prevent risks and which risks are not sufficiently mitigated. It's a fair bit of work, but it's not complicated and often gives useful insights.
What this article basically does is note that you should first assess what risks a failed deployment carries, and it correctly states that in quite a few cases this risk is low, so the mitigations (of which there can be many) may not be necessary and may in fact be doing harm without actually sufficiently preventing any risk.
Say the company you work for is worth $10,000,000, and that you're hosted on GCP. Now take your best guess: what do you think the likelihood is of e.g. a fire or earthquake or something occurring in all relevant Google infrastructure simultaneously*, basically ushering in the end of all of your infrastructure, data, and backups? Frame that in a number of years. Is this kind of event something that may happen once in a thousand years? Once in ten thousand years? Let's say this is the sort of thing that might happen once in ten thousand years -- that's a long time!
Then the cost of this particular risk to your company is $1000 / year.
This kind of math isn't just a toy. When you have questions like "would maintaining actual physical backups in a safe somewhere outside of GCP be worth it?", you now have a framework to answer them ("if it would cost less than $1000 per year, then yes").
--
* or substitute in your favorite company-ending event.
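The annualized-risk arithmetic in the example above can be sketched in a few lines. The figures (company value, event frequency) are the hypothetical ones from the comment, not real data:

```python
# Sketch of the annualized-risk math: spread the damage of a rare
# event over its expected recurrence interval. Numbers below are the
# hypothetical ones from the example, not real actuarial data.

def annual_risk_cost(damage_usd: float, once_every_n_years: float) -> float:
    """Expected yearly cost of an event causing `damage_usd`
    roughly once every `once_every_n_years` years."""
    return damage_usd / once_every_n_years

# Total loss of a $10,000,000 company, once in ten thousand years:
cost = annual_risk_cost(10_000_000, 10_000)
print(cost)  # 1000.0 -- so a mitigation is worth at most ~$1000/year
```

The same function then answers the "physical backups outside GCP" question: if the mitigation costs less per year than `annual_risk_cost` of the event it prevents, it pays for itself.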
This avoids two nasty problems with trying to express risk as an expected value.
The first is that it is hard to express all kinds of probabilities and damages numerically: not all damages convert easily to money, and some probabilities are hard to guess (you quickly get uncertain probabilities, but expected values just flatten those into an average again). Even without those issues, pinning a number on it can lead to lots of discussion (good if you want discussion, not so good if you want to get shit done).
The second is that you easily fall into the trap of assuming everything has an average and that the law of large numbers applies. While physics kind of helps you there by putting hard limits on the maximum amount of damage possible, you may end up in a situation where all the nasty stuff is in the long, improbable tail. A good example is earthquakes: ground motion increases tenfold for every point on the Richter scale, but frequency also only decreases tenfold -- what, then, is the average?
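The earthquake point can be made concrete with toy numbers: if each extra magnitude point multiplies damage by 10 but divides frequency by 10, every magnitude bin contributes the same expected damage, so the "average" grows without bound as you include rarer, bigger events. (The numbers below are illustrative, not real seismology.)

```python
# Toy model of a heavy tail: tenfold rarer, tenfold worse per bin.
# Each bin contributes the same expected damage, so the sum never
# converges -- it just grows with however many bins you include.

def expected_damage(n_magnitude_bins: int,
                    base_freq_per_year: float = 1.0,
                    base_damage: float = 1.0) -> float:
    total = 0.0
    for k in range(n_magnitude_bins):
        freq = base_freq_per_year / 10**k   # tenfold rarer per bin
        damage = base_damage * 10**k        # tenfold worse per bin
        total += freq * damage              # each bin adds the same amount
    return total

print(expected_damage(5))   # roughly 5.0
print(expected_damage(10))  # roughly 10.0 -- no convergence in sight
```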
Something that's not really a big problem, but worth thinking about: some of these eventualities may very well cause you damage but are beyond your sphere of influence. Sure, you should try to avoid going bankrupt if someone knocks over a server rack, but if all Google data centres go down over an entire continent, you've got bigger fish to fry. So focusing on the things you can do something about is a helpful way to stay focused.
"We are spending $50 per month just for one test in our code. We could cut it down to $10 if we wanted."
"How many hours would it take to reduce the spend? If it's more than a couple of hours for a senior engineer, then it's not worth it."
We kept spending money on this inefficient test and it was the right choice.
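That conversation is just a break-even calculation. A quick sketch, with a made-up hourly rate (plug in your own numbers):

```python
# Back-of-the-envelope version of the "is it worth optimizing?"
# question. The $150/hour senior-engineer rate is an assumption
# for illustration, not a figure from the conversation above.

def breakeven_hours(current_monthly: float, reduced_monthly: float,
                    hourly_rate: float, horizon_months: int = 12) -> float:
    """Engineer-hours the optimization may cost before it stops
    paying for itself over the given horizon."""
    savings = (current_monthly - reduced_monthly) * horizon_months
    return savings / hourly_rate

# $50/month -> $10/month, assumed $150/hour, looking one year ahead:
print(breakeven_hours(50, 10, 150))  # 3.2 hours
```

If the fix takes longer than a few hours, the inefficient test wins, exactly as in the story.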
I had a similar conversation as a new-ish fractional CTO last year. One team was working on a new CRM product that was effectively alpha-level software used only internally. The team had become terrified of shipping and breaking something and was horrifically risk averse. For a new release that the team was going to delay again at the last minute, I got the CEO on the release call and asked him what would happen if the release completely failed and it took us an entire day to get the product working again. He replied "Not a big deal. The users would just write stuff down like they do today and key it in tomorrow. It's not like this has enough features to be critical or anything".
The team was completely stunned. It goes without saying we did the release, found a small mistake, fixed it, and life went on.
Teams really do have to understand who their users are and the criticality of the software.
We were working towards a business continuity plan which can include incidents like your main office and operations being destroyed and having to quickly relocate all services to 3rd-parties using off-site backups with minimal staff. While that was a worst-case, a primary focus was just getting a notification site up and running in the event of a network outage because that was vastly more frequent and had high visibility.
It was a very interesting project and I learned quite a bit about how to think comprehensively about the solutions we provided.
The biggest practical impediment to increasing velocity of delivery that I encounter is trying to convey this. People can visualize and estimate the risk and impact of a deployment gone wrong, but have a hard time estimating the impact of processes that slow down delivery. Therefore they overindex on heavy and "safe" processes (which often don't increase safety) at the cost of speed of iteration.
I'm not sure how to define this asymmetry, maybe some variation of loss aversion.
Sometimes not even that. We have seen many huge breakdowns in recent years that did hit bottom lines but didn't impact stock prices. Perhaps, at least for a publicly-traded company, the only real risks are those that might impact stock prices. That might include things that hurt other companies, if doing so might result in money leaving the entire sector.
For example, it's way easier/faster to implement observability and some sort of rollback of bad versions than to try and prevent every possible way an app could crash and trigger a bunch of problems. What's going to happen if the app crash is pretty simple : customers will be mad (CS/Marketing/PR can handle them), you'll notice the downtime quickly and rollback (or maybe even rollback automatically!). Then you'll be in a perfect position to handle what went wrong : systems will be back on a known stable position and all the stress of trying to fix something in a live production system will be gone.
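The "notice the downtime quickly and roll back automatically" part can be as simple as watching the error rate after a deploy. A minimal sketch, with arbitrary threshold and sample-window assumptions:

```python
# Minimal sketch of an automatic-rollback decision: look at recent
# requests after a deploy and revert if too many failed. The 10%
# threshold and 20-sample minimum are arbitrary assumptions.

def should_rollback(recent_requests: list[bool],
                    error_threshold: float = 0.1,
                    min_samples: int = 20) -> bool:
    """recent_requests: True = request failed, False = request OK."""
    if len(recent_requests) < min_samples:
        return False  # not enough data to judge the new version yet
    error_rate = sum(recent_requests) / len(recent_requests)
    return error_rate > error_threshold

healthy = [False] * 95 + [True] * 5   # 5% errors: leave it alone
broken = [False] * 50 + [True] * 50   # 50% errors: roll back
print(should_rollback(healthy))  # False
print(should_rollback(broken))   # True
```

Real systems would compare against the pre-deploy baseline rather than a fixed threshold, but the shape of the logic is the same.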
I think it's the definition of the black swan theory[1].
And one of those standards (and no, I don't give a shit about developer experience, software or otherwise) should be that you never, ever test on production. As soon as you work on real products for real customers, you'd better start behaving like a professional. Child's play is over as soon as someone is paying you to do stuff.
Not being confident in your test plan is a sign of immaturity, not maturity, because at some point you are going to need to validate how something behaves in production.
There is a wide range of processes, procedures, and software architectures to get you confident that your production testing is doing more good than harm for your customers, but in any environment where you can deploy new software, you are going to do some testing in production.
I've heard "feature flags" are popular these days, and I understand that that's where you commit code for a new way of doing things but hide it behind a flag so you don't have to turn it on right away.
Now, if I want to test in prod, couldn't I just make the flag for my new feature turn on if I log in on a special developer test account? And if everything goes well, I change the condition to apply to everyone?
As long as your code makes sure it takes account of that flag everywhere that it is used. Otherwise your new feature could "leak" into the system for everyone else.
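The per-account gating described above is only a few lines. A rough sketch, where all the names are invented for illustration (real systems usually use a feature-flag service rather than an in-process set):

```python
# Rough sketch of "flag on for a special developer account first".
# Account names, flag names, and the in-memory sets are all made up;
# a real setup would query a flag service or config store.

DEV_TEST_ACCOUNTS = {"dev-test@example.com"}
FULLY_ENABLED_FLAGS = set()  # flags rolled out to everyone

def is_enabled(flag: str, user_email: str) -> bool:
    if flag in FULLY_ENABLED_FLAGS:
        return True
    # During testing, only the special developer account sees it:
    return user_email in DEV_TEST_ACCOUNTS

print(is_enabled("new-checkout", "dev-test@example.com"))  # True
print(is_enabled("new-checkout", "customer@example.com"))  # False

# "If everything goes well, change the condition to apply to everyone":
FULLY_ENABLED_FLAGS.add("new-checkout")
print(is_enabled("new-checkout", "customer@example.com"))  # True
```

The "leak" warning from the comment above is exactly the case where some code path forgets to call `is_enabled` before using the new behavior.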
Plus, as systems grow in complexity, there's always a danger that features step on each other. We'd like to think that everything we write is nicely isolated and separated from the rest of the system, but it never works that way - plus we're just a group of squishy humans who make mistakes. There will be times when having Features A and C switched on, with B switched off, produces some weird interactions that don't happen if A, B and C are switched on together.
There ends up being code to deal with what happens when various combinations of flags are on/off, and that code doesn’t get tested much.
And teams spend a lot of time just removing flags.
This isn’t a safety-critical app - I really think they’d do better dropping the flags, and just deploying what they want when it’s ready.
You not only waste time with "Remove feature flag X" stories if all customers end up with the feature, you also slow down the response time of some categories of bugs, because you end up having to stop and check the combination of feature flags to reproduce a bug.
And if you end up with a feature that isn't popular except by one customer, not only are you now stuck supporting "Legacy feature Y", you're actually stuck supporting, "Optional legacy feature Y" which is worse.
Maybe I'm ranting about "misuse of feature flags", but I prefer not to pontificate about how things ought to be, only about how in my experience they actually are.
In my experience, feature flags work best if you aim to remove them as quickly as possible. They can be useful to allow continual deployment, and even for limited beta programs, but if you're using them to enable mature features for the whole customer base, they're no longer feature flags.
Definitely doesn't do anything like completely obviate the issue though.
But there are numerous ways to use feature flags incorrectly - typically once you have multiple long-lived flags that interact with each other, you've lost the thread. You no longer have one single application, you have 2 ^ n_flags applications that all behave in subtly different ways depending on the interaction of the flags.
There's no way around it - you have to test all branches of your code somehow. "Just let the users find the bugs" doesn't work in this case since each user can only test their unique combination of flags. I've regularly seen default and QA tester flag configurations work great, only to have a particular combination fail for customers.
The only solution is setting up a full integration test for every combination of flags. If that sounds tedious (and it is), the solution is to avoid feature flags, not to avoid testing them!
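The combinatorial cost is easy to see in numbers: exhaustively covering every on/off combination of n independent flags means 2**n runs. A sketch of generating those configurations (the flag names are made up, and the loop body is a stand-in for a real integration suite):

```python
# Enumerate every on/off combination of a set of feature flags.
# Flag names are invented; the loop body is a placeholder for
# running an integration suite against each configuration.

from itertools import product

def all_flag_combos(flags: list[str]):
    for values in product([False, True], repeat=len(flags)):
        yield dict(zip(flags, values))

flags = ["new_checkout", "dark_mode", "fast_search"]
combos = list(all_flag_combos(flags))
print(len(combos))  # 8 == 2**3

for combo in combos:
    pass  # run the integration suite with this flag configuration
```

Three flags is still tractable; ten flags is 1024 runs, which is why long-lived interacting flags get so expensive to test honestly.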
I've long been wondering whether there are tools that help with that. Something like measuring a test suite's code coverage, but for feature-toggle permutations: either you test those permutations explicitly or you rule them out explicitly.
It can also be a huge PITA. The fallacy is that a "feature" is an isolated chunk of code: you just wrap that in a thing that says "if feature is on, do the code!". But in reality, a single feature often touches numerous different code points, potentially across multiple codebases and services/APIs. So you have to intertwine that feature flag all over the place. Then write tests for each scenario (do the right thing when the feature is off, do the right thing when the feature is on). Then you have to remember to go back and clean up all that code once the feature is on for everyone and stabilized.
It's a good tool, but it's not an easy tool like a lot of folks think it is.
For example maybe the feature flag just shows/hides a new button on the UI. The rest of the code like the new backend endpoint and the new database column are "live" (not behind any flags) and just invisible to a regular user since they will never hit that code without the button.
As far as "remembering" to clean up the feature flag, teams I've been on have added a ticket for cleaning up the feature flag(s) as part of the project, so this work doesn't get lost in the shuffle. (And also to make visible to Product and other teams that there is some work there to clean up)
For example, the Microsoft Azure public cloud has a hierarchy of tenant -> subscription -> resource group -> resource.
It's possible to have feature flags at all four levels, but the most common one I see is rolling deployments where they pick customer subscriptions at random, and deploy to those in batches.
This means you can have a scenario where your tenant (company) is only partially enabled for a feature, with some departments having subscriptions with the feature on, but others don't have it yet.
This can be both good and bad. The blast radius of a bad update is minimised, but the users affected don't care how many other users aren't affected! Similarly, inconsistencies like the one above are frustrating. Even simple things like demonstrating a feature for someone else can result in accidental gaslighting where you swear up and down that they just need to "click here" and they can't find the button...
Not to mention it looks really awkward when an account manager has forgotten to enable some great new feature for you.
I'm not a fan of this article in general; a lot of what it talks about is an anti-pattern in my book. Take the bit about microservices as an example. They are excellent in small teams, even when you only have 2-5 developers. The author isn't wrong as such; it's just that the author seems to misunderstand why Conway's law points toward service architectures. Because even when you have 2-5 developers, the teams that actually "own" the various things you build in your organisation might make up hundreds of people. In which case you're still going to avoid a lot of complexity by using a service architecture, even if your developers sort of work on the same things.
Classic story: https://dougseven.com/2014/04/17/knightmare-a-devops-caution...
> Unfortunately there is no easy way to distinguish between people who are good and need a paycheck from people who just need a paycheck. But you sure as hell don’t want the latter in your team.
If you can't tell them apart, then the distinction is unimportant. So if among the group of people who need paychecks, good is indistinguishable from non-good, the comment serves no purpose other than needless elitism.
> If GitHub makes a mistake it can affect thousands of businesses but they’ll likely shrug and their DevOps team will just post “GitHub is down, nothing we can do” on some Slack channel.
Gonna try and read the rest of this on the lunch break, as it was surprisingly meaty for a clickbait title ;)
> That’s a terrible mistake and in the long run will be the cause of cost overruns, unmet deadlines, increased churn and overall bad vibes. And nobody wants bad vibes.
Of course, there are always exceptions to this rule. Adapt and modify the code as needed.
We keep three environments at work: Dev, Test and Prod. However, dev environments are sometimes neglected and some features land in Test only.
So, use Dev as a development playground. Use Test to test the changes made in Dev. If the change is approved in Test, it will go in Prod environment.
In this case, a good "Testing on Production" rule would be to not let customers test your software, period.
There's plenty of land and resources to construct towns and cities that simulate real-life commute very accurately.
In the case of self-driving (or even autopilot), you're not really testing a feature, you're researching a new product; the difference is vast.
A bug which must be fixed in production is much more expensive than a bug fixed during development.
People here complain when you bash Microsoft, but their philosophy was (and still is) to let the users test the product.
> Ask yourself a question: do you have any reason to think that your engineers will not do a good job? If the answer is no: why are they still there? If the answer is yes: let them do their damn job.
So well put. Just today I implemented a feature and kept asking myself whether I should extend the component (leaning more towards OOP) or just add an additional argument to it. The latter would have stuck more with the current style, but I also realized there's no obviously better way; extending made sense. Understanding the nuance and standing up for those design decisions is what I am here to do :)
Thanks for putting that in fewer words.
One of the things I've realized is that in most unregulated companies (read: non-healthcare/financial) the business side of the house is used to having little or no lower-environment lifecycle.
If they want to make a process change, they make it on production work.
Granted, they have change control approvals, etc. etc., but the whole dev-test-prod cycle looks extremely different for them, because you can't do certain things without lower environments.
I worked at a home remodeling company. Revenue was several million dollars a day. App handled sales, scheduling, logistics, everything. Breaking production was a big deal, it cost us millions per day and created logjams.
I would think that most online applications are the same. Even if a simple online web shop goes down you are costing money.
What kinds of experiences have you had where testing in production was the norm?
> because you can't do certain things without lower environments.
I agree that this is something many shops REALLY struggle with. One of the most challenging things is exporting or creating some kind of realistic data set for local development use. I think 99% of companies struggle with this.
Even at startups, the added initial costs yield more long term benefits with higher-quality products.
(Don't take it too seriously, like I said this is mostly a brain dump, I'm sure there's a lot of stuff that can be improved)
I see this challenge a lot in the industry. The young engineers truly are smart, even brilliant, but lack wisdom and experience.
Words change, they always have, they always will. Get over it.
And anyway, the article's usage is consistent with the well-established phrase "smart guy", within which the word "smart" carries a sarcastic and derisive tone.
While this is true, I think it is helpful to communication to resist changes to language. This isn't the same thing as opposing change entirely, but language needs to have a certain stability and common understanding to maximize its usefulness.
@dang any chance you could help here? :(