Rare to see though. I don't think being able to write code automatically means you can write decent tests. Skill needs to be developed.
As you change your codebase you will experience lots of “failures” that are not failures. You still have to burn your time investigating them.
Many checks will require elaborate mocking or other kinds of setup, which gives the lie to the claim that designing them is simple and straightforward.
That says more about you and the care you put into quality assurance than anything else, really.
If you can ship your hypothesis along with an effectively unaltered version of prod, the ability to test things without breaking other things becomes much more feasible. I've never been in a real business scenario where I wasn't able to negotiate a brief experimental window during live business hours for at least one client.
A good decom/cleanup strategy definitely helps
Personally I've also had a lot of success requiring "expiration" dates for all flags, and when passed they emit a highly visible warning metric. You can always just bump it another month to defer it, but people eventually get sick of doing that and clean it up so it'll go away for good. Make it mildly annoying, so the cleanup is an improvement, and it happens pretty automatically.
Another issue I've run into a few times: a feature flag starts as a simple thing, but as new features get added it evolves into a complex bifurcation of logic, and many code paths become dependent on it, which can add crippling complexity to what you're developing.
https://en.wikipedia.org/wiki/Knight_Capital_Group#2012_stoc...
If you work on fifty feature toggles a year, one of them is going to go wrong. If your team is doing a few hundred, you’re gonna have oopsies.
Most of the problematic cases are where the code is set up so that the old path and the new one can’t bypass each other cleanly. They get tangled up, and maybe the toggle gets implemented inverted, so it’s difficult to remove the old path without breaking the new one.
I like recording and replaying production traffic as well, so that you can do your tee-testing in an environment that doesn't affect latency for production, though that's not quite the same thing.
It is not very useful in giving you confidence that your changes won't cause unexpected side effects, which is usually the main problem when working with legacy code.
If you want confidence when working with legacy code, your best bet is the strangler fig pattern: find the boundaries of the module you want to work on, rewrite the module (or clone it and make your changes), run both at the same time in shadow mode, monitor and verify that your new module behaves the same as the old one, then switch over and eventually delete the old module.
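The "shadow mode" step above can be sketched very simply. Here's a hypothetical Python wrapper (the pricing functions are stand-ins, not from any real codebase) that keeps serving the legacy result while logging any divergence from the rewrite:

```python
import logging

log = logging.getLogger("shadow")

def legacy_price(order):   # the module being strangled (stand-in)
    return order["qty"] * order["unit_price"]

def new_price(order):      # the rewritten module (stand-in)
    return order["qty"] * order["unit_price"]

def price(order):
    """Serve the legacy result; run the rewrite in shadow mode and
    record any divergence before daring to switch over."""
    old = legacy_price(order)
    try:
        new = new_price(order)
        if new != old:
            log.warning("shadow mismatch for %r: old=%r new=%r", order, old, new)
    except Exception:
        # A crash in the shadow path must never affect production.
        log.exception("shadow call failed for %r", order)
    return old  # production behavior is still the old module's
```

Once the mismatch log stays quiet under real traffic for long enough, you flip the return to `new` and eventually delete `legacy_price`.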
Refactoring is generally useful for annealing code enough that you can reshape it into separate concerns. But when the work hardening has gone on far too long, it usually seems like there’s no way to get from A->D without just picking a day when you feel invincible, getting high on caffeine, putting on your uptempo playlist, and telling people not to even look at you until you file your +1012 -872 commit.
I used to be able to do those before lunch. I also found myself to be the new maintainer of that code afterward. That doesn’t work when you’re the lead and people need to use you to brainstorm getting unblocked or figuring out weird bugs (especially when calling your code). All the plates fall at that point.
It was less than six months after I figured out the workaround that I learned the term Mikado, possibly when trying to google if anyone else had figured out what I had figured out. I still like my elevator pitch better than theirs:
Work on your “top down” refactor until you realize you’ve found yet another whole call tree you need to fix, and feel overwhelmed/want to smash your keyboard. This is the Last Straw. Go away from your keyboard until you calm down. Then come back, stash all your existing changes, and just fix the Last Straw.
For me I find that I’m always that meme of the guy giving up just before he finds diamonds in the mine. The Last Straw is always 1-4 changes from the bottom of the pile of suck, and then when you start to try to propagate that change back up the call stack, you find 75% of that other code you wrote is not needed, and you just need to add an argument or a little conditional block here and there. So you can use your IDE’s local history to cherry pick a couple of the bits you already wrote on the way down that are relevant, and dump the rest.
But you have to put that code aside to fight the Sunk Cost Fallacy that’s going to make you want to submit that +1012 instead of the +274 that is all you really needed. Which, by the way, is easier to add more features to in the next sprint.
Something to realize is that every codebase is legacy. My best new feature implementations are always several commits that do no-op refactorings, with no changes to tests even with good coverage (or adding tests before the refactoring for better coverage), then one short and sweet commit with just the behavior change.
Mikado is more of a get-out-of-jail-free card for getting trapped in a “top down refactor”, which is an oxymoron.
In other words, I'd reword this to using the Mikado method to understand large codebases, or get a first glimpse of how things are connected and wired up. But to say it allows for _safe_ changes is stretching it a bit much.
Working with old code is tough, no real magic to work around that.
Then by definition you have the smallest, safest step you can take. It would be the leaf nodes on your graph?
Of course, working in a legacy codebase is also torture.
Software development is a hyper-rational endeavor, so we don't often talk about feelings. This article also does not talk much about feelings.
Reading between the lines, it looks like reverting the code is supposed to affect how you feel about the work. Knowing that failure is an explicit option can help to set an expectation; however, without a mature understanding of failure, that expectation may just be misery.
With a mature understanding of failure, the possibility of a forced rollback should help you "let go" of those changes. It's like starting a day of painting or drawing with one that you force yourself to throw away; or a writing session with a silly page.
----
If someone thinks that they are giving you good advice, but it sounds terrible, then maybe they are expecting you to do some more work to realize the value of that advice.
If you are giving someone advice and they push back, maybe you are implying some extra work or expectations that you have not actually said out loud.
Advice is plagued by the tacit knowledge problem.
It makes intuitive sense to me that this would be true in complex domains (e.g. legacy code) where you really need to find the right solution, even if it takes a bit longer. Our first ideas are rarely our best ideas, and it's easy to get too attached to your first solution and try to tweak it into shape when it would be better just to start fresh.
[1]: https://www.hytradboi.com/2025/03580e19-4646-4fba-91c3-17eab...
Maybe the software crashes when you write 42 in some field and you're able to tell it's due to a missing division-by-zero check deep down in the code base. Your gut tells you you should add the check but who knows if something relies on this bug somehow, plus you've never heard of anyone having issues with values other than 42.
At this point you decide to hard code the behavior you want for the value 42 specifically. It's nasty and it only makes the code base more complex, but at least you're not breaking anything.
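The hack described above might look something like this hypothetical guard (the function, the divisor, and the `0.0` fallback are all invented for illustration):

```python
def compute_ratio(field_value: int, divisor: int) -> float:
    # Long-standing code path: crashes when divisor ends up 0, which
    # (as far as anyone knows) only happens when the field holds 42.
    return field_value / divisor

def compute_ratio_patched(field_value: int, divisor: int) -> float:
    # Nasty targeted fix: special-case the one input known to crash,
    # rather than adding a general divide-by-zero check that might
    # change behavior something else silently relies on.
    if field_value == 42 and divisor == 0:
        return 0.0  # hypothetical fallback chosen for this one case
    return field_value / divisor
```

The general check would be two lines and obviously "cleaner", but the special case is the one whose blast radius you can actually reason about.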
Does anyone have experience with this mindset of embracing the mess?
Do you really know all of the expected behavior you're hardcoding in? What happens if your hardcoded behavior is just incorrect enough that it breaks something somewhere else? How can you be sure that your test for that specific value is even correct?
I think the better approach is to let things break naturally and open a bug with your findings. You'd be surprised how often someone else knows exactly what's going on and can fix it correctly. Your hacks are not just pouring gasoline onto the fire, but opening a well directly underneath that will keep it burning for a long time.
(seriously though, this book has answers for you: Working Effectively with Legacy Code, by Michael Feathers)
It gives a great way to visualise the work needed to achieve a goal, without ever mentioning time.
It is the best step by step guide I have ever seen to successfully work with legacy code.
Written by somebody who has been there and done that many, many times.
For this particular example, the first question I have is why are we upgrading the ORM? As a codebase grows and matures, the cost of an ORM change increases, and so too must the justification for upgrading it. Any engineer worth their salt needs to know this justification and have it in mind at all times so they can apply appropriate judgment as the discovered scope increases. Let's assume now the change is justified.
The next critical question is how do you know if you've broken anything? Right in the intro the author talks about an "untested and poorly documented codebase", but then in the example uses basic compilation as a proxy for success. I'm sorry to be harsh, but this completely hand-waves away the hard part of the problem. To have any confidence in a change like this you need to have a sense of what could go wrong and guard against those things, some of which could be subtle data corruption that could be extremely costly and hard to unwind later. This may involve logging, side-by-side testing, canary deployments, additional monitoring, static analysis, and/or any number of other techniques, applied based on an understanding of what the upgrade actually means under-the-hood for the ORM in question, combined with an analysis of what risks that entails for the system and business/process in question. Drawing a mindmap of your refactoring plan is barely more than IntelliJ (let alone Claude Code) can already do at the click of a button.
Then you’ll get several paths of action.
Choose one and tell the model to write it into a file you’ll keep around while the implementation is ongoing, so you won’t pollute the context and can start each chunk of work over in a clean prompt. Name the file refactor-<name>-plan.md, tell it to write the plan step by step, and dump a todo list that takes dependencies into account for tracking progress.
Review the plans and make fixes if needed. You need some sort of table resembling a todo list so it can track and make progress.
Open a new prompt, tell it to analyze the plan file, go to the todo list section, and proceed with the next task. Verify it's done, and update the plan.
Repeat until done.
I've spent blood, sweat, tears, and restless evenings scrolling and ctrl-F-ing huge build and test logs to finally accomplish the task.
But let's take a step back.
So they assign you to get that done. You're supposed to be careful, courageous, and precise while making those changes without regression. There's very little up-to-date documentation on the design or architecture, let alone any rationale for design choices. You're supposed to come up with methods like Mikado, TDD, shadowing, or anything that gets the job done.
Is this even fair to ask? Suppose you ask a contractor to re-factor a house with old style plumbing and electricity. Will they do it Mikado style, or, would they say - look - we're going to tear things down and rebuild it from the ground. You need to be willing to pay for a designer, an architect, new materials and a set of specialized contractors.
So why do we as sw engineers put up with the assignment? Are we rewarded so much more than the project manager of that house who subcontracts the work to many people to tear down and rebuild?
Does the project manager get paid more by the hour to refactor a house than to build one?
You don't "do" changes. This sounds lazy at best, unintelligent at worst, and fails to communicate what's happening. "Make changes" would be better diction and more appropriate vernacular. You could even "enact changes upon" or "effect changes to."
Alternatively, drop the extra verb altogether: "Use the Mikado Method to change complex codebases safely"
> After a couple of minutes, just stop and think. What’s missing? What would make it easier to do that change, like the previous one?
"Perform." Not "do." You would be _performing_ a change.
But hey at least I'm pretty sure an LLM didn't write this article =)
I have this configured to feed in to an agent for large changes. It’s been working pretty well, still not perfect though… the tricky part is that it is very tempting (and maybe even sometimes correct) to not fully reset between mikado “iterations”, but then you wind up with a messy state transfer. The advantage so far has been that it’s easy to make progress while ditching a session context “poisoned” by some failure.
Is that the Mikado method?
I think there are similar methods, such as nested todo-lists. But DAGs are exceptionally good for this use case of visualising work (Mikado graphs are DAGs).
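A toy sketch of that DAG view, in Python. The node names are invented; the point is that the "smallest safest next step" mentioned upthread falls out mechanically as the leaves whose prerequisites are all done:

```python
# Toy Mikado graph: each change maps to the prerequisite changes
# that must land first. Node names are made up for illustration.
mikado = {
    "upgrade ORM": ["remove raw SQL in reports", "wrap session handling"],
    "remove raw SQL in reports": ["add repository for reports"],
    "wrap session handling": [],
    "add repository for reports": [],
}

def next_safe_steps(graph, done=frozenset()):
    """Leaves of the DAG: changes whose prerequisites are all done.
    These are the smallest safe steps you can take right now."""
    return sorted(
        node for node, prereqs in graph.items()
        if node not in done and all(p in done for p in prereqs)
    )
```

Unlike a nested todo list, the DAG handles shared prerequisites cleanly: one leaf can unblock several parents at once.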
I assume something about this appeals to a certain psychology. Here is the essence of the method for people who dislike rituals: pick one little thing you can succeed at. Do that thing. Repeat as necessary.
Edit: I thought I read it was of Scandinavian origin, hence my comment. But Wikipedia says European origin. Well, well.
Poor code requires not coding but analysis and decisions: partitioning the code and its clients. So:
1. Stop writing code
2. Buy or write tools to analyze the code (modularity) and use-case (clients)
3. Make 3+ rough plans:
(a) leave it alone and manage quality;
(b) identify severable parts to fix and how (same clients);
(c) incrementally migrate (important) clients to something new
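The "write tools to analyze the code" step can start very small. For instance, a crude module-dependency map built from Python imports with the standard library's `ast` module (the directory layout and output shape are illustrative):

```python
import ast
from pathlib import Path

def import_graph(src_dir: str) -> dict[str, set[str]]:
    """Map each .py file under src_dir to the top-level packages it
    imports. Crude, but enough to spot highly coupled areas worth
    severing before you decide which of the rough plans to pursue."""
    graph = {}
    for path in Path(src_dir).rglob("*.py"):
        tree = ast.parse(path.read_text())
        deps = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module.split(".")[0])
        graph[str(path)] = deps
    return graph
```

Feeding the resulting graph into any DAG/graph viewer gives you the modularity picture that plans (a)-(c) depend on.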
The key lesson is that incremental improvements are sinking money (and worse, time) into something that might need to go, without any real context for whether it's worth it.
Using a programming language that has a compiler? Lucky you.