Principles for building and scaling feature flag systems (opens in new tab)

(docs.getunleash.io)

212 pointsferrantim2y ago112 comments

112 comments

> Make feature flags short-lived. Do not confuse flags with application configuration.

This is my current battle.

I introduced feature flags to the team as a means to separate deployment from launch of new features. For the sake of getting it working and used, I made the mis-step of backing the flags with config files with the intent to get Launch Darkly or Unleash working ASAP instead to replace them.

Then another dev decided that these Feature Flags look like a great way to implement permanent application configs for different subsets of entities in our system. In fact, he evangelized it in his design for a major new project (I was not invited to the review).

Now I have to stand back and watch as the feature flags are being used for long-term configurations. I objected when I saw the misuse- in a code review I said "hey that's not what these are for"- and was overruled by management. This is the design, there's no time to update it, I'm sure we can fix it later, someday.

Lesson learned: make it very hard to misuse meta-features like feature flags, or someone will use them to get their stuff done faster.

zellyn2y ago

Sadly, this is a battle you are destined to lose. I have almost completely given up. The best you can aim for is to use feature flags better rather than worse.

    - Some flags are going to stay forever: kill switches, load shedding, etc. (vendors are starting to incorporate this in the UI)
    - Unless you have a very-easy-to-use way to add arbitrary boolean feature toggles to individual user accounts (which can become its own mess), people are going to find it vastly easier to create feature flags with per-use override lists (almost all of them let you override on primary token). They will use your feature flags for:
      - Preview features: "is this user in the preview group?"
      - rollouts that might not ever go 100%: "should this organization use the old login flow?"
      - business-critical attributes that it would be a major incident to revert to defaults: "does this user operate under the alternate tax regime?"

You can try to fight this (indeed, especially for that last one, you most definitely should!), but you will not ever completely win the feature flag ideological purity war!

hamandcheese2y ago

Thank you for this great list of the immense business value derived from "misusing" feature flags!

1 more reply

sporkland2y ago

We have an interesting hybrid between the two that I'd like your take on. When we release new versions of our web client static assets we have a version number that we bump that moves folks over to the new version.

1. We could stick it in a standard conf system and serve it up randomly based on what host a client hits. (Or come up with more sophisticated rollouts)

2. Or we can put it as "perm" conf in the feature flag system and roll it out based on different cohorts/segments.

I'm leaning towards #2 but I'd love to understand why you want to prohibit long lived keys so I can make a more informed choice. The original blog posts main reasons were that FF systems favor availability over consistency so make a pour tool if you need fast converging global config, which somewhat becomes challenging here during rollbacks but is likely not the end of the world.

1 more reply

baq2y ago

Or... see them for what they are: runtime configuration. The name implies a use case scenario, but in reality it's just a configuration knob. With a good UI, it's a pretty damn convenient way to do runtime configuration.

So of course they'll be used for long-term configuration purposes, especially under pressure and for gradual rollouts of whole systems, not just A/B testing features.

rubicon332y ago

This hits the nail on the head.

The term "feature flag" has come to inherently have a time component because features are supposed to eventually be fulled GA'd.

What I've seen in practice is feature flags are never removed so a better way to think about them is as a runtime configuration.

2 more replies

mabbo2y ago

There is a need for runtime configurations, yes, but it's important to put them behind an interface intended for that, and not one intended for something else.

2 more replies

hinkley2y ago

Our FF system uses our config system as its system of record. There's some potential for misuse, and it's difficult to apply deadlines. On the plus side all our settings are captured in version control. Before they were spread out over several systems, one of which had an audit system that was pure tribal knowledge for years.

gastonfournier2y ago

The main challenge is when things go wrong. Feature flags are designed for high-rate evaluation with low latency responses. Configuration usually doesn't care that much about latency as it's usually read once at startup. This context leads to some very specific tradeoffs such as erring to availability over consistency, which in the case of configuration management could be a bad choice

zellyn2y ago

Yeah, and assuming they are done well, they probably have better analytics and insights attached to them than anything else except perhaps your experiments!

hinkley2y ago

Long lived features flags is a development process bug, I'm not sure we can solve it with the feature toggle system.

I'm at the point of deciding that Scrum is fundamentally incompatible with feature flags. We demo the code long before the flag has been removed, which leads to perverse incentives. If you want flags to go away in a timely manner you need WIP limits, and columns for those elements of the lifecycle. In short: Kanban doesn't (have to) have this problem.

And even the fixes I can imagine like the above, I'm not entirely sure you can stop your bad actor, because it's going to be months before anyone notices that the flags have long overstayed their welcome.

I'm partial to flags being under version control, where we have an audit trail. However time and again what we really need is a summary of how long each flag has existed, so they can be gotten rid of. The Kanban solution I mention above is only a 90% solution - it's easy to forget you added a flag (or added 3 but deleted 2)

gastonfournier2y ago

I faced something similar, and I think it's unavoidable. Give people a screwdriver and they'll find a way of using it as a hammer.

The best you can do is expect the feature flagging solution to give some kind of warning for tech debt. Then equip them with alternative tools for configuration management. Rather than forbidding, give them options, but if it's not your scope, I'd let them be (I know as engineers this is hard to do :P).

llbeansandrice2y ago

> Give people a screwdriver and they'll find a way of using it as a hammer.

I feel like feature flags aren't that far off though. They're fantastic for many uses of runtime configuration as mentioned in another comment.

There's multiple people in this thread complaining about "abuse" of feature flags but no one has been able to voice why it's abuse instead of just use beyond esoteric dogma.

2 more replies

Dirak2y ago

I feel like this is a solvable problem: 1) make feature flags be configured to have an expiration date. If over the expiration date, auto-generate a task to clean up your FF 2) If you want to be extra fancy, set up a codemod to automatically clean up the FF once it's expired

I don't see the problem with developers using flags for configuration as a stopgap until there's a better solution available.

tantalor2y ago

> automatically clean up the FF once it's expired

Um what? How could that ever work. It's like you are trying to find new exciting ways to break prod.

1 more reply

tantalor2y ago

Sounds like "other dev" found some business case they could unblock with existing system, and you thought the business was better off not solving that, or finding a more expensive solution.

Curious how you plan to justify cost to "fix it" to management. If it ain't broke...

strken2y ago

I think it's better to admit they actually are config, just a different kind of config that comes with an expiration date.

Accepting reality in this way means you'll design a config management system that lets you add feature flags with a required expiration date, and then notifies you when they're still in the system after the deadline.

timothyfcook2y ago

I agreed. My perspective is that there are two kinds of feature flags: temporary and permanent.

Temporary ones can be used to power experiments or just help you get to GA and then can be removed.

Permanent ones can be configs that serve multiple variations (e.g. values for rate limits), but they can also be simple booleans that manage long term entitlements for customers (like pricing tiers, regional product settings, etc.)

accountantbob2y ago

We did the same. We were early adopters of unleash and wrangled it to also host long term application configuration and even rule based application config.

The architecture of unleash made it so simple to do in unleash vs having to evaluate, configure, and deploy a separate app config solution.

gastonfournier2y ago

Victim of your own success. As others were saying, when it works for short-lived its easy/no effort to use it for long-lived configurations.

ivarconr2y ago

Thanks for sharing. I have seen systems grow in to thousands of flags, where most developers does not know what a particular flags do anymore.

brightball2y ago

It’s one of the main reasons to start with something like unleash because they have stale flag warnings built in. Plus, since you already have a UI it’s harder for it to be hijacked.

getrealyall2y ago

The solution is not to use feature flags. Or maybe have them expire. Oh, also, discipline the developers who do this.

dabeeeenster2y ago

For those that dont know about the project, check out Open Feature https://openfeature.dev/ which is sort of like Open Telemetry but for feature flags. Helps avoid vendor lock in. We're a young project and looking for help and to build the community!

triyambakam2y ago

Wow, part of the CNCF. That's awesome

staplung2y ago

This feels a bit like the dicta on 12 Factor: rules handed down from a presumed authority without any discussion of the tradeoffs. Engineering is tradeoff evaluation. Give me some discussion about the alternatives, when and why they're inferior and don't pretend like the proposed solution doesn't have shortcomings or pitfalls.

imiric2y ago

I agree with you that tradeoff evaluation is crucial in engineering, but I don't see the 12 Factor methodology as a set of strict rules. They're more like guidelines that are generally a good idea to follow for building modern applications or services. Some of the suggestions apply for any type of software, like having a single version controlled codebase, separate build/release/run stages, and using stateless processes.

So it's good to be aware of _why_ those guidelines are considered a good thing, but as with any methodology, an engineer should be pragmatic in deciding when to follow it strictly, and when to adapt or ignore some of it.

That said, I wouldn't want to work on software that completely ignores 12 Factor.

tiberriver2562y ago

Item #1 depends on the reason you're using feature flags.

For a more nuanced and careful discussion of the topic I like to reference: https://martinfowler.com/articles/feature-toggles.html

gastonfournier2y ago

It's true that there are more long-lived use cases, but if you have the ability to choose, runtime controlled ones cover both cases, while compile time only cover some use cases. But fair point

oddx2y ago

I dedicated a day to evaluating feature flag software based on specific criteria:

    - Must support multiple SDKs, including Java and Ruby.
    - Should be self-hosted with PostgreSQL database support.
    - Needs to enable remote configuration for arbitrary values (not just feature flags). I don't run two separate services for this.
    - Should offer some UI functionality.
    - it should cache flag values locally and, ideally, provide live data updates (though pooling is acceptable).

Here are the four options that met these basic criteria and underwent detailed evaluation:

    - Unleash: Impressive and powerful, but its UI is more complex than needed, and it lacks remote configuration.
    - Flagsmith: Offers remote configuration but appears less polished with some features not working smoothly; Java SDK error reporting needs improvement.
    - Flipt: Simple and elegant, but lacks remote configuration and local caching for Java SDK.
    - FeatureHub: Offers fewer features than Unleash and Flagsmith; its Java API seems somewhat enterprisly but supports remote configuration and live data updates.

Currently, I'm leaning towards FeatureHub. If remote configuration isn't necessary, Unleash offers more features, and if simplicity is key and local caching isn't needed, Flipt is an attractive option.

bullcitydev2y ago

Hey thanks for giving Flipt a look! I'm the creator of Flipt so would love to chat more about your needs to see how we could make it work for your use case! We're actively looking into providing local caching for all our SDKs btw and would love to learn more about what your requirements are for remote configuration as it's also on our radar! Feel free to send me an email at: mark (at) flipt.io.

rubicon332y ago

As an engineer, I am generally against feature flags.

They fracture your code base, are sometimes never removed, and add complexity and logic that at best is a boolean check and at worse is something more involved.

I'd love a world where engineers are given time to complete their feature in its entirety, and the feature is released when it is ready.

Sadly, we do not live in that world and hence: feature flags.

hn_throwaway_992y ago

This misses the point. A big point of feature flags is that you don't yet know how features will be perceived until you get them in front of real users.

I get what you'd like "as an engineer", but it ignores the needs of the business.

rubicon332y ago

Isn't that the job a product manager? There are other means and methodologies for gathering user sentiment before you go and build something.

You should get as close as you can, release the product, and iterate.

Todays world is release the product in some ramshackle form or fashion, collect feedback, iterate. To do that introduces a new construct of Feature Flags that would otherwise not be necessary.

1 more reply

PH95VuimJjqBqy2y ago

That is not what feature flags are typically used for.

They're typically used as a way of enabling a change for a subset of your services to allow for monitoring of the update and easier "rollback" if it becomes necessary.

They can be used for A/B testing, but this is not what they're typically used for.

jt21902y ago

I just item 1 (“Enable run-time control. Control flags dynamically, not using config files”) and it’s almost exclusively focused on what to do but not on why to do it.

It seems to be skipping past the use-cases and assumptions, in particular, describing what a system with feature flags looks and acts like, what the benefits and drawbacks are.

tmpX7dMeXU2y ago

Yeah, because it’s describing how a product works, being passed off as knowledge.

ivarconr2y ago

> It seems to be skipping past the use-cases and assumptions

This is a great feedback. Our intention was to describe how such a system work at scale, but I see we could do better in this section, thanks!

Do you have some use-cases in mind?

zellyn2y ago

Background: I work at Block/Square, on the team that owns (but didn't build) our internal Feature Flag system, and also have a lot of experience with using LaunchDarkly.

I like the idea of caching locally, although k8s makes that a bit more difficult since containers are typically ephemeral. People will use feature flags for things that they shouldn't, so eventually "falling back go default values" will cause production problems. One thing you can do to help with this is run proxies closer to your services. For example, LaunchDarkly has an open source "Relay".

Local evaluation seems to be pretty standard at this point, although I'd argue that delivering flag definitions is (relatively) easy. One of the real value-add of a product like LaunchDarkly is all the things they can do when your applications send evaluation data upstream: unused flags, only-ever-evaluated-to-the-default flags, only-ever-evaluated-to-one-outcome flags, etc.

One best practice that I'd love to see spread (in our codebases too) is always naming the full feature flag directly in code, as a string (not a constant). I'd argue the same practice should be taken with metrics names.

One of the most useful things to know (but seldom communicated clearly near landing pages) is a basic sketch of the architecture. It's necessary to know how things will behave if there is trouble. For instance: our internal system uses ZK to store (protobuf) flag definitions, and applications set watches to be notified of changes. LaunchDarkly clients download all flags[1] in the project on connection, then stream changes.

If I were going to build a feature flag system, I would ensure that there is a global, incrementing counter that is updated every time any change is made, and make it a fundamental aspect of the design. That way, clients can cache what they've seen, and easily fetch only necessary updates. You could also imagine annotating that generation ID into W3C Baggage, and passing it through the microservices call graph to ensure evaluation at a consistent point in time (clients would need to cache history for a minute or two, of course).

One other dimension in which feature flag services vary is by the complexity of the rules they allow you to evaluate. Our internal system has a mini expression language (probably overkill). LaunchDarkly's arguably better system gives you an ordered set of rules within which conditions are ANDed together. Both allow you to pass in arbitrary contexts of key/value pairs. Many open source solutions (Unleash, last I checked, some time ago) are more limited: some of them don't let you vary on inputs, some only a small set of prescribed attributes.

I think the time is ripe for an open standard client API for feature flags. I think standardizing the communication mechanisms would be constricting, but there's no reason we couldn't create something analogous to (or even part of) the Open Telemetry client SDK for feature flags. If you are seriously interested in collaborating on that, please get in touch. (I'm "zellyn" just about everywhere)

[1] Yes, this causes problems if you have too many flags in one project. They have a pretty nice filtering solution that's almost fully ready.

[Update: edited to make 70% of it not italics ]

zellyn2y ago

One more update. I spent a little time the other day trying to find all the feature flag products I could. I'm sure I missed a ton. Let me know in the comments!

LaunchDarkly Split Apptimize CloudBees ConfigCat DevCycle FeatBit FeatureHub Flagsmith Flipper Flipt GrowthBook Harness Molasses OpenFeature Posthog Rollout Unleash

Here's my first draft of the questions you'd want to ask about any given solution:

    Questionnaire
    
    - Does it seem to be primarily proprietary, primarily open-source, or “open core” (parts open source, enterprise features proprietary)?
      - If it’s open core or open source with a service offering, can you run it completely on your own for free?
    - Does it look “serious/mature”?
      - Lots of language SDKs
      - High-profile, high-scale users
      - Can you do rules with arbitrary attributes or is it just on/off or on/off with overrides?
    - Can it do complex rules?
    - How many language SDKs (one, a few, lots)
    - Do feature flags appear to be the primary purpose of this company/project?
      - If not, does it look like feature flags are a first-class offering, or an afterthought / checkbox-filler? (eg. split.io started out in experimentation, and then later introduced free feature flag functionality. I think it’s a first-class feature now.)
    - Does it allow approval workflows?
    - What is the basic architecture?
      - Are flags evaluated in-memory, locally? (Hopefully!)
      - Is there a relay/proxy you can run in your own environment?
      - How are changes propagated?
        - Polling?
        - Streaming?
      - Does each app retrieve/stream all the flags in a project, or just the ones they use?
      - What happens if their website goes down?
    - Do they do experiments too?
      - As a first-class offering?
    - Are there ACLs and groups/roles?
      - Can they be synced from your own source of truth?
    - Do they have a solution for mobile and web apps?
      - If so, what is the pricing model?
      - Do they have a mobile relay type product you can run yourself?
    - What is the pricing model?
      - Per developer?
      - Per end-user? MAU?

konradlekko2y ago

A few more: https://featurevisor.com/ https://configcat.com/

I will toss our hat in the ring but we are early in this space! https://lekko.com

blawson2y ago

Togglz is another option: https://www.togglz.org/

tpetr2y ago

Also https://prefab.cloud

kiitos2y ago

> Are flags evaluated in-memory, locally? (Hopefully!)

This seems like a MUST rather than a SHOULD, right?

1 more reply

vlovich1232y ago

Do you have the answers to that questionnaire for the services you mention?

1 more reply

vijayer2y ago

Could you add Statsig to your research?

daigoba662y ago

> One best practice that I'd love to see spread (in our codebases too) is always naming the full feature flag directly in code, as a string (not a constant).

Can you elaborate on this? As a programmer, I would think that using something like a constant would help us find references and ensure all usage of the flag is removed when the constant is removed.

zellyn2y ago

One of the most common things you want to do for a feature flag or metric name is ask, "Where is this used in code?". (LaunchDarkly even has a product feature that does this, called "Code References".) I suppose one layer of indirection (into a constant) doesn't hurt too much, although it certainly makes things a little trickier.

The bigger problem is when the code constructs metric and flag names programmatically:

    prefix = "framework.client.requests.http.{status%100}s"
    recordHistogram(prefix + ".latency", latency)
    recordCount(prefix + ".count", 1)

    flagName = appName + "/loadshed-percent"

    # etc...

That kind of thing makes it very hard to find references to metrics or flags. Sometimes it's impossible, or close to impossible to remove, but it's worth trying hard.

Of course, this is just, like, my opinion, man!

1 more reply

athenot2y ago

Not OP but multiple code bases may refer to the same flag by a different constant. Having a single string that can be searched accross all repos in an organization is quite handy to find all places where it's referenced.

1 more reply

grork2y ago

IME searching for the name of the flag name and getting 1 result is less helpful than 15 results that directly show point-of-use.

zellyn2y ago

After typing that, and realizing I have a lot more to say, I guess I should write a blog post on the subject

snorlaxmorlax2y ago

You definitely should! These questions are great, and could use some appropriate context for evaluation.

ferrantimOP2y ago

Yes please. Blog would be awesome.

gastonfournier2y ago

Yes, please!

zellyn2y ago

Oh, and one last(?) update.

If you create your own service to evaluate a bunch of feature flags for a given user/client/device/location/whatever and return the results, for use in mobile clients (everyone does this), PLEASE *make sure the client enumerates the list of flags it wants*. It's very tempting to just keep that list server-side, and send all the flags (much simpler requests, right?), but you will have to keep serving all those flags for all eternity because you'll never know which deployed versions of your app require which flags, and which can be removed.

[Edit: speling]

baq2y ago

You should be collecting metrics on used flags and their values if you’re rolling your own. A saas offering will do that for you.

1 more reply

zellyn2y ago

> I'd argue that delivering flag definitions is (relatively) easy.

I'd argue that coming up with good UI that nudges developers towards safe behavior, as well as useful and appropriate guard rails -- in other words, using the feature flag UI to reduce likelihood of breakage -- is difficult, and one of the major value propositions of feature flag services.

eximius2y ago

The system we're building now meets most of these but not necessarily in the way described.

First, we're building a runtime configuration system on top of AWS AppConfig. YAML/proto validation that pushes to AppConfig via gitops and bazel. Configurations are namespaced so the unique names is solved. It's all open in git.

Feature flags are special cases of runtime configuration.

We are distinguishing backend feature flags from experimentation/variants for users. We don't have (or want) cohorting by user IDs or roles. We have a separate system for that and it does it well.

The last two points - distinguishing between experimentation/feature variants and feature flags as runtime configuration are somewhat axiomatic differences. Folks might disagree but ultimately we have that separate system that solves that case. They're complimentary and share a lot of properties but ultimately it solves a lot of angst if you don't force both to be the same tool.

tvink2y ago

>Organizations who adopt feature flags see improvements in all key operational metrics for DevOps: Lead time to changes, mean-time-to-recovery, deployment frequency, and change failure rate.

Is this true? unfortunately there's no sources indicated, and a quick check on scholar doesn't show me anything of the sort.

gastonfournier2y ago

There are a few case studies listed in most of the feature flag solutions, of course, each organization is completely different and the maturity of each organization varies. But feature flags are a 2-way-door decision, meaning that you can adopt them at smaller scale, try it out and see if it works for you before making a decision.

Here's a list of case studies from some of the solutions referred in the comments, some focus on operational metrics, others in lead time to changes: https://www.getunleash.io/case-studies https://launchdarkly.com/case-studies/ https://www.flagsmith.com/case-studies

timothyfcook2y ago

The LaunchDarkly 2022 State of Feature Management Report has results from surveying 1000 software people and looks at impact on those DevOps metrics: https://launchdarkly.com/state-of-feature-management/

getrealyall2y ago

This was written by a company selling feature flag software. Use that information as thou wilt.

Lutger2y ago

We use Unleash. There are many things you can do with feature flags and Unleash helps with a lot of them. However, my feeling is 80% of the value comes from 20% of the features. Even a much simpler system provides a ton of benefit. For me, it is top of the list after having automated tests and automated deployments.

aranchelk2y ago

With regard to web-based services, once you’ve got the ability to do canary testing, IMO flags/toggles are less compelling — busier code and logic you’ll have to pull out later.

erik_seaberg2y ago

Canarying gets you a 1/n treatment group, but it might be skew geographically (all affected users are near the canary’s datacenter). You need a percentage in a feature flag if 1/n is too big and you want, e.g., 0.1% of traffic.

gastonfournier2y ago

I agree that if you have only a few changes going to prod, fast and doing canary testing, you should be covered. In my experience that's rarely the case because of multiple teams deploying changes at the same time, and even deployments in external services causing side effects in other services.

aranchelk2y ago

Emergent inter-service issues are challenging to deal with regardless.

I’ve absolutely seen canary testing work in large environments with a lot of teams doing frequent deploys. The teams need to have the tooling to conduct their own canary testing and monitoring.

As soon as you’re involving external services or anything persistent you may not be able to undo the damage of misbehaving software by simply disabling the offending code with a flag.

In practice the cost/benefit of feature flags has never proven out for me, better to just speed up your deploys/rollbacks, the caveat is I’ve only ever worked in web environments, I can imagine with software running on an end user device it could solve some difficult problems provided you have a way to toggle the flag.

baq2y ago

OTOH a flag gives you an ability to deploy and revert independent of the product’s release cycle.

sinuhe692y ago

I know features flags like the flags of FF, but they are in fact config files. How does one realize dynamic control of features flags?

Are they using a kind of logic to determine to turn on/off a feature or do they query a central database to know that?

Can someone explain its basic mechanism? Thanks

adasdasdas2y ago

More principles

- Require in code defaults for fault tolerance

- Start annoying the flag author to delete if the flag is over a month old

- Partial rollout should be by hash on user id

- Contextual flag features should always be supplied by client (e.g. only show in LA, the location should be provided by client)

hamandcheese2y ago

> Partial rollout should be by hash on user id

With a per-flag salt as well, otherwise the same user will always have bad luck and be subject to experiments first.

zellyn2y ago

Our in-house solution hashes flagname+key, and LD does the same but adds salt

stravant2y ago

Better yet require always the same default for boolean flags, so that it's easier to reason about lifecycle for them.

tantalor2y ago

> Start annoying the flag author to delete if the flag is over a month old

No problem, filter that email directly to spam folder.

flitzofolov2y ago

I couldn't find an easy link from these docs to the product page on mobile. Seems like a wasted opportunity. I had to edit the URL to get to the company website.

angarg122y ago

Offtopic but relevant:

TL;DR if you break long posts into pages, at least have an option to see the whole thing in a single page.

I use a browser extension to send websites to my Kindle. It's great for long-ish format blog posts that I want to read, but I don't have the time at the moment. However, whenever I see long blog posts that are broken into sections, each one in it's own page, it becomes a mess. It forces me to navigate each individual page and send it to my Kindle. Then in the Kindle I have a long list of unsorted files that I need to jump around to read in order.

I understand breaking long pieces of text into pages makes it neater and more organized, but at least have an option to see the whole thing in a single page, as a way to export it somewhere else for easy reading.

dgorton2y ago

Open Source knowledge sharing. I like it:

"Unleash is open-source, and so are these principles. Have something to contribute? Open a PR or discussion on our Github."

jacomoRodriguez2y ago

Is it just me or does this articles text structure and wording strongly indicate that is was written by gpt?

pests2y ago

The only thing worse than AI generated content is now everyone thinks everything is AI generated.

gingerrr2y ago

Definitely getting strong uncanny valley prose vibes.

Hard to tell if it's generated or written in an attempt to be as plain English as possible, but either way feels strangely vacuous for a technical opinion piece. There's no writer's voice.

flitzofolov2y ago

It's not really an opinion piece is it? It's docs. The language seems appropriate for articulating principles.

1 more reply

j / k navigate · click thread line to collapse

112 comments

mabbo2y ago

> Make feature flags short-lived. Do not confuse flags with application configuration.

This is my current battle.

Lesson learned: make it very hard to misuse meta-features like feature flags, or someone will use them to get their stuff done faster.

zellyn2y ago

Sadly, this is a battle you are destined to lose. I have almost completely given up. The best you can aim for is to use feature flags better rather than worse.

    - Some flags are going to stay forever: kill switches, load shedding, etc. (vendors are starting to incorporate this in the UI)
    - Unless you have a very-easy-to-use way to add arbitrary boolean feature toggles to individual user accounts (which can become its own mess), people are going to find it vastly easier to create feature flags with per-use override lists (almost all of them let you override on primary token). They will use your feature flags for:
      - Preview features: "is this user in the preview group?"
      - rollouts that might not ever go 100%: "should this organization use the old login flow?"
      - business-critical attributes that it would be a major incident to revert to defaults: "does this user operate under the alternate tax regime?"

You can try to fight this (indeed, especially for that last one, you most definitely should!), but you will not ever completely win the feature flag ideological purity war!

hamandcheese2y ago

Thank you for this great list of the immense business value derived from "misusing" feature flags!

1 more reply

sporkland2y ago

1. We could stick it in a standard conf system and serve it up randomly based on what host a client hits. (Or come up with more sophisticated rollouts)

2. Or we can put it as "perm" conf in the feature flag system and roll it out based on different cohorts/segments.

1 more reply

baq2y ago

So of course they'll be used for long-term configuration purposes, especially under pressure and for gradual rollouts of whole systems, not just A/B testing features.

rubicon332y ago

This hits the nail on the head.

The term "feature flag" has come to inherently have a time component because features are supposed to eventually be fulled GA'd.

What I've seen in practice is feature flags are never removed so a better way to think about them is as a runtime configuration.

2 more replies

mabbo2y ago

There is a need for runtime configurations, yes, but it's important to put them behind an interface intended for that, and not one intended for something else.

2 more replies

hinkley2y ago

gastonfournier2y ago

zellyn2y ago

Yeah, and assuming they are done well, they probably have better analytics and insights attached to them than anything else except perhaps your experiments!

hinkley2y ago

Long lived features flags is a development process bug, I'm not sure we can solve it with the feature toggle system.

gastonfournier2y ago

I faced something similar, and I think it's unavoidable. Give people a screwdriver and they'll find a way of using it as a hammer.

llbeansandrice2y ago

> Give people a screwdriver and they'll find a way of using it as a hammer.

I feel like feature flags aren't that far off though. They're fantastic for many uses of runtime configuration as mentioned in another comment.

There's multiple people in this thread complaining about "abuse" of feature flags but no one has been able to voice why it's abuse instead of just use beyond esoteric dogma.

2 more replies

Dirak2y ago

I don't see the problem with developers using flags for configuration as a stopgap until there's a better solution available.

tantalor2y ago

> automatically clean up the FF once it's expired

Um what? How could that ever work. It's like you are trying to find new exciting ways to break prod.

1 more reply

tantalor2y ago

Sounds like "other dev" found some business case they could unblock with existing system, and you thought the business was better off not solving that, or finding a more expensive solution.

Curious how you plan to justify cost to "fix it" to management. If it ain't broke...

strken2y ago

I think it's better to admit they actually are config, just a different kind of config that comes with an expiration date.

timothyfcook2y ago

I agreed. My perspective is that there are two kinds of feature flags: temporary and permanent.

Temporary ones can be used to power experiments or just help you get to GA and then can be removed.

accountantbob2y ago

We did the same. We were early adopters of unleash and wrangled it to also host long term application configuration and even rule based application config.

The architecture of unleash made it so simple to do in unleash vs having to evaluate, configure, and deploy a separate app config solution.

gastonfournier2y ago

Victim of your own success. As others were saying, when it works for short-lived its easy/no effort to use it for long-lived configurations.

ivarconr2y ago

Thanks for sharing. I have seen systems grow in to thousands of flags, where most developers does not know what a particular flags do anymore.

brightball2y ago

It’s one of the main reasons to start with something like unleash because they have stale flag warnings built in. Plus, since you already have a UI it’s harder for it to be hijacked.

getrealyall2y ago

The solution is not to use feature flags. Or maybe have them expire. Oh, also, discipline the developers who do this.

dabeeeenster2y ago

triyambakam2y ago

Wow, part of the CNCF. That's awesome

staplung2y ago

imiric2y ago

That said, I wouldn't want to work on software that completely ignores 12 Factor.

tiberriver2562y ago

Item #1 depends on the reason you're using feature flags.

For a more nuanced and careful discussion of the topic I like to reference: https://martinfowler.com/articles/feature-toggles.html

gastonfournier2y ago

It's true that there are more long-lived use cases, but if you have the ability to choose, runtime controlled ones cover both cases, while compile time only cover some use cases. But fair point

oddx2y ago

I dedicated a day to evaluating feature flag software based on specific criteria:

    - Must support multiple SDKs, including Java and Ruby.
    - Should be self-hosted with PostgreSQL database support.
    - Needs to enable remote configuration for arbitrary values (not just feature flags). I don't run two separate services for this.
    - Should offer some UI functionality.
    - it should cache flag values locally and, ideally, provide live data updates (though pooling is acceptable).

Here are the four options that met these basic criteria and underwent detailed evaluation:

    - Unleash: Impressive and powerful, but its UI is more complex than needed, and it lacks remote configuration.
    - Flagsmith: Offers remote configuration but appears less polished with some features not working smoothly; Java SDK error reporting needs improvement.
    - Flipt: Simple and elegant, but lacks remote configuration and local caching for Java SDK.
    - FeatureHub: Offers fewer features than Unleash and Flagsmith; its Java API seems somewhat enterprisly but supports remote configuration and live data updates.

bullcitydev2y ago

rubicon332y ago

As an engineer, I am generally against feature flags.

They fracture your code base, are sometimes never removed, and add complexity and logic that at best is a boolean check and at worse is something more involved.

I'd love a world where engineers are given time to complete their feature in its entirety, and the feature is released when it is ready.

Sadly, we do not live in that world and hence: feature flags.

hn_throwaway_992y ago

This misses the point. A big point of feature flags is that you don't yet know how features will be perceived until you get them in front of real users.

I get what you'd like "as an engineer", but it ignores the needs of the business.

rubicon332y ago

Isn't that the job a product manager? There are other means and methodologies for gathering user sentiment before you go and build something.

You should get as close as you can, release the product, and iterate.

Todays world is release the product in some ramshackle form or fashion, collect feedback, iterate. To do that introduces a new construct of Feature Flags that would otherwise not be necessary.

1 more reply

PH95VuimJjqBqy2y ago

That is not what feature flags are typically used for.

They're typically used as a way of enabling a change for a subset of your services to allow for monitoring of the update and easier "rollback" if it becomes necessary.

They can be used for A/B testing, but this is not what they're typically used for.

jt21902y ago

I just item 1 (“Enable run-time control. Control flags dynamically, not using config files”) and it’s almost exclusively focused on what to do but not on why to do it.

It seems to be skipping past the use-cases and assumptions, in particular, describing what a system with feature flags looks and acts like, what the benefits and drawbacks are.

tmpX7dMeXU2y ago

Yeah, because it’s describing how a product works, being passed off as knowledge.

ivarconr2y ago

> It seems to be skipping past the use-cases and assumptions

This is a great feedback. Our intention was to describe how such a system work at scale, but I see we could do better in this section, thanks!

Do you have some use-cases in mind?

zellyn2y ago

Background: I work at Block/Square, on the team that owns (but didn't build) our internal Feature Flag system, and also have a lot of experience with using LaunchDarkly.

[1] Yes, this causes problems if you have too many flags in one project. They have a pretty nice filtering solution that's almost fully ready.

[Update: edited to make 70% of it not italics ]

zellyn2y ago

One more update. I spent a little time the other day trying to find all the feature flag products I could. I'm sure I missed a ton. Let me know in the comments!

LaunchDarkly Split Apptimize CloudBees ConfigCat DevCycle FeatBit FeatureHub Flagsmith Flipper Flipt GrowthBook Harness Molasses OpenFeature Posthog Rollout Unleash

Here's my first draft of the questions you'd want to ask about any given solution:

    Questionnaire
    
    - Does it seem to be primarily proprietary, primarily open-source, or “open core” (parts open source, enterprise features proprietary)?
      - If it’s open core or open source with a service offering, can you run it completely on your own for free?
    - Does it look “serious/mature”?
      - Lots of language SDKs
      - High-profile, high-scale users
      - Can you do rules with arbitrary attributes or is it just on/off or on/off with overrides?
    - Can it do complex rules?
    - How many language SDKs (one, a few, lots)
    - Do feature flags appear to be the primary purpose of this company/project?
      - If not, does it look like feature flags are a first-class offering, or an afterthought / checkbox-filler? (eg. split.io started out in experimentation, and then later introduced free feature flag functionality. I think it’s a first-class feature now.)
    - Does it allow approval workflows?
    - What is the basic architecture?
      - Are flags evaluated in-memory, locally? (Hopefully!)
      - Is there a relay/proxy you can run in your own environment?
      - How are changes propagated?
        - Polling?
        - Streaming?
      - Does each app retrieve/stream all the flags in a project, or just the ones they use?
      - What happens if their website goes down?
    - Do they do experiments too?
      - As a first-class offering?
    - Are there ACLs and groups/roles?
      - Can they be synced from your own source of truth?
    - Do they have a solution for mobile and web apps?
      - If so, what is the pricing model?
      - Do they have a mobile relay type product you can run yourself?
    - What is the pricing model?
      - Per developer?
      - Per end-user? MAU?

konradlekko2y ago

A few more: https://featurevisor.com/ https://configcat.com/

I will toss our hat in the ring but we are early in this space! https://lekko.com

blawson2y ago

Togglz is another option: https://www.togglz.org/

tpetr2y ago

Also https://prefab.cloud

kiitos2y ago

> Are flags evaluated in-memory, locally? (Hopefully!)

This seems like a MUST rather than a SHOULD, right?

1 more reply

vlovich1232y ago

Do you have the answers to that questionnaire for the services you mention?

1 more reply

vijayer2y ago

Could you add Statsig to your research?

daigoba662y ago

> One best practice that I'd love to see spread (in our codebases too) is always naming the full feature flag directly in code, as a string (not a constant).

Can you elaborate on this? As a programmer, I would think that using something like a constant would help us find references and ensure all usage of the flag is removed when the constant is removed.

zellyn2y ago

The bigger problem is when the code constructs metric and flag names programmatically:

    prefix = "framework.client.requests.http.{status%100}s"
    recordHistogram(prefix + ".latency", latency)
    recordCount(prefix + ".count", 1)

    flagName = appName + "/loadshed-percent"

    # etc...

That kind of thing makes it very hard to find references to metrics or flags. Sometimes it's impossible, or close to impossible to remove, but it's worth trying hard.

Of course, this is just, like, my opinion, man!

1 more reply

athenot2y ago

1 more reply

grork2y ago

IME searching for the name of the flag name and getting 1 result is less helpful than 15 results that directly show point-of-use.

zellyn2y ago

After typing that, and realizing I have a lot more to say, I guess I should write a blog post on the subject

snorlaxmorlax2y ago

You definitely should! These questions are great, and could use some appropriate context for evaluation.

ferrantimOP2y ago

Yes please. Blog would be awesome.

gastonfournier2y ago

Yes, please!

zellyn2y ago

Oh, and one last(?) update.

[Edit: speling]

baq2y ago

You should be collecting metrics on used flags and their values if you’re rolling your own. A saas offering will do that for you.

1 more reply

zellyn2y ago

> I'd argue that delivering flag definitions is (relatively) easy.

eximius2y ago

The system we're building now meets most of these but not necessarily in the way described.

Feature flags are special cases of runtime configuration.

We are distinguishing backend feature flags from experimentation/variants for users. We don't have (or want) cohorting by user IDs or roles. We have a separate system for that and it does it well.

tvink2y ago

>Organizations who adopt feature flags see improvements in all key operational metrics for DevOps: Lead time to changes, mean-time-to-recovery, deployment frequency, and change failure rate.

Is this true? unfortunately there's no sources indicated, and a quick check on scholar doesn't show me anything of the sort.

gastonfournier2y ago

timothyfcook2y ago

getrealyall2y ago

This was written by a company selling feature flag software. Use that information as thou wilt.

Lutger2y ago

aranchelk2y ago

With regard to web-based services, once you’ve got the ability to do canary testing, IMO flags/toggles are less compelling — busier code and logic you’ll have to pull out later.

erik_seaberg2y ago

gastonfournier2y ago

aranchelk2y ago

Emergent inter-service issues are challenging to deal with regardless.

I’ve absolutely seen canary testing work in large environments with a lot of teams doing frequent deploys. The teams need to have the tooling to conduct their own canary testing and monitoring.

As soon as you’re involving external services or anything persistent you may not be able to undo the damage of misbehaving software by simply disabling the offending code with a flag.

baq2y ago

OTOH a flag gives you an ability to deploy and revert independent of the product’s release cycle.

sinuhe692y ago

I know features flags like the flags of FF, but they are in fact config files. How does one realize dynamic control of features flags?

Are they using a kind of logic to determine to turn on/off a feature or do they query a central database to know that?

Can someone explain its basic mechanism? Thanks

adasdasdas2y ago

More principles

- Require in code defaults for fault tolerance

- Start annoying the flag author to delete if the flag is over a month old

- Partial rollout should be by hash on user id

- Contextual flag features should always be supplied by client (e.g. only show in LA, the location should be provided by client)

hamandcheese2y ago

> Partial rollout should be by hash on user id

With a per-flag salt as well, otherwise the same user will always have bad luck and be subject to experiments first.

zellyn2y ago

Our in-house solution hashes flagname+key, and LD does the same but adds salt

stravant2y ago

Better yet require always the same default for boolean flags, so that it's easier to reason about lifecycle for them.

tantalor2y ago

> Start annoying the flag author to delete if the flag is over a month old

No problem, filter that email directly to spam folder.

flitzofolov2y ago

I couldn't find an easy link from these docs to the product page on mobile. Seems like a wasted opportunity. I had to edit the URL to get to the company website.

angarg122y ago

Offtopic but relevant:

TL;DR if you break long posts into pages, at least have an option to see the whole thing in a single page.

dgorton2y ago

Open Source knowledge sharing. I like it:

"Unleash is open-source, and so are these principles. Have something to contribute? Open a PR or discussion on our Github."

jacomoRodriguez2y ago

Is it just me or does this articles text structure and wording strongly indicate that is was written by gpt?

pests2y ago

The only thing worse than AI generated content is now everyone thinks everything is AI generated.

gingerrr2y ago

Definitely getting strong uncanny valley prose vibes.

Hard to tell if it's generated or written in an attempt to be as plain English as possible, but either way feels strangely vacuous for a technical opinion piece. There's no writer's voice.

flitzofolov2y ago

It's not really an opinion piece is it? It's docs. The language seems appropriate for articulating principles.

1 more reply

j / k navigate · click thread line to collapse