If anything, what should be warned against is big bangs. Netscape's problem isn't that they did a rewrite (which eventually became Firefox, mind you); it's that they abandoned their old code too early, and similarly, they announced the rewrite too soon.
If you're going to do a rewrite, do it quietly, and don't announce it until it's close to ready; even then you can roll it out slowly. The infamous Digg v4 debacle is another example, but the problem isn't that they did a rewrite, it's that the rewrite produced a product nobody wanted, and they burned any possibility of going back after they released it.
"But we are smarter than those other guys who wrote this mess" - if that is the case, you should be smart enough to fix the mess.
Classic Mac OS was written at a time when computers had kilobytes of memory and black and white graphics. It was not designed for multi-tasking, because at the time it was written, the world had insufficient hardware to make multi-tasking practical. It was not designed for networking, because the internet mostly did not exist at the time. It was not designed for security, because computers were a less important target for criminals, and no internet meant far fewer exploit vectors.
All of these features were post-facto bolted onto Mac OS, and the result was an unstable mess. It was ultimately a full OS rewrite that fixed the platform.
---
I'll make a separate case too, for a rewrite I wish would happen: it's all well and good for Slack to build their client in Electron as a small startup that needs to experiment and iterate quickly. However, Slack is now a reasonably-sized public company, and they (should) have a stable product that will not undergo rapid changes.
Now would be an excellent time for Slack to rewrite their client to be a native app on each of the major platforms. They have the resources to create a snappy, performant app that more customers will enjoy using.
Problems can include:
- lack of parallelism, designed with single core CPUs in mind
- lack of security, because software designed for single-user, offline operation has turned multi-user and online
- "wrong" optimizations, for example relying on a lot of precomputation when people now want to change everything on the fly and modern computers allow it if the software is properly designed
- relying on outdated tech, like Flash
None of the previous points involve bad design, but things change, including user expectations. Right now, security is a big one. It wasn't a big deal back then: in a game console, a buffer overflow in a game would just cause a crash and piss off the player in some extremely rare case. Now, because your console is online and so is your bank, the same relatively harmless bug in the same game can be used to siphon your bank account.
Bad reasons: "The code is messy", "It's written in language foo while all the cool kids are using language bar these days", "It's slow".
Good reasons: "The architecture is too tightly coupled now and it won't scale without untangling major pieces anyway", "Our lack of forward velocity is directly related to problem XYZ in the code, and here is how a new architecture would fix that."
In other words, I feel like I can sniff out bad rewrites now: they generally lack a sense of focus and a true, enumerated list of problems with the current code base. Good rewrites have clearly delineated, measurable benefits, and show that the rewrite will deliver them better and more cheaply than what's possible with the current codebase.
I think there are definitely counter-examples. Can you imagine using an OS in 2020 based on incremental improvements to Mac OS System 9 or Windows ME? Or browsing with a browser based on incremental improvements to Netscape 4.7?
The GEOM subsystem in FreeBSD is an example, though, of how sometimes the world changes in a way you can't accommodate and you have to go do a big-bang reset.
Because of that, FreeBSD has a much nicer set of methods for dealing with disks and hotplugging and resizing ... while Linux is still all very ad hoc.
Some of the problems only came to light because the original software was written and showed its weaknesses. Afterwards, of course, one always knows more.
Then there is the reason of software often not perfectly matching your use case: you could have better software, specialized for your use case. You might also have other ideas about how extensible and modifiable your software should be. There really is a lot of software out there that barely works for its use case. If you are asked to extend it, good luck doing so without introducing new bugs, due to inflexible design.
So there are very valid reasons for rewriting, and often you do know better how to write the software for your own use case.
There's an inherent complexity in the problem being solved.
And there is an "accidental complexity" in the implementation of the solution.
By throwing away everything, people typically believe they can avoid handling a lot of the "inherent complexity." But typically there is a good reason why the inherent complexity was addressed in the previous version of the program, and there's a big chance that the new "from scratch" designers will have to relearn and rediscover all of that, instead of transforming the already existing knowledge encoded in the previous version.
For anybody interested in the topic, I recommend the case studies presented in:
https://www.amazon.com/Search-Stupidity-Twenty-Marketing-Dis...
"In Search of Stupidity: Over Twenty Years of High Tech Marketing Disasters"
See the account of WordStar's rewrite simply not having the printer drivers the previous version had, along with other features people already expected, leading to WordStar's demise.
Or what Zawinski names the "Cascade of Attention-Deficit Teenagers" (search the internet for that; the link here wouldn't work!):
"I'm so totally impressed at this Way New Development Paradigm. Let's call it the "Cascade of Attention-Deficit Teenagers" model, or "CADT" for short."
"It hardly seems worth even having a bug system if the frequency of from-scratch rewrites always outstrips the pace of bug fixing. Why not be honest and resign yourself to the fact that version 0.8 is followed by version 0.8, which is then followed by version 0.8?"
Or an interview with Jamie Zawinski from Siebel's "Coders at Work."
https://www.amazon.com/Coders-Work-Reflections-Craft-Program...
... "even phrasing it that way makes it sound like there’s someone who’s actually in charge making that decision, which isn’t true at all. All of this stuff just sort of happens. And one of the things that happens is everything gets rewritten all the time and nothing’s ever finished. If you’re one of those developers, that’s fine because there’s always something to play around with if your hobby is messing around with your computer rather than it being a means to an end — being a tool you use to get whatever you’re actually interested in done."
If one is able to cover all the complexity, and it is not destructive to the goal, the rewrite is OK. Otherwise, one should be critical of the idea of a rewrite, as it could be secretly motivated by the simple fact that (jwz again): "rewriting everything from scratch is fun (because "this time it will be done right", ha ha)"
But if the architecture is wrong, a piecewise rewrite to a new architecture is very tough.
I think rewriting to a new architecture piecemeal is probably easier than in a big bang - assuming the thing is at all complex and you can't actually understand all of it at once. The difference is more one of perception. It's easier to see the costs of the Frankenstein architecture than the rewrite, so we overestimate the costs of the former and underestimate the latter.
The real case for a rewrite is if you honestly think all that old stuff really has mostly just sentimental value; i.e. that it's OK to break all kinds of workflows because there are enough alternatives. If you can sell actual users on relearning all their habits, you can get away with a lot.
I keep seeing the same optimistic failure mode over and over again at each employer. It's just too tempting to rewrite, say, the SDK package deployment system rather than trying to understand and fix up the messy old Python that some past engineers left behind. And it'll only take three or four months! A year later the messy old Python with unfixed bugs is still the only option because the rewrite isn't ready, and the rewrite consumed all the resources that would've been used to fix bugs.
Not to brag, but I recently spent a week re-writing some software I developed over 5 years (thanks covid), and it runs 100x faster. No joke. Better code, better SQL, better indexing, less list manipulation. 100x faster.
Of course, T.A.N.S.T.A.A.F.L, so caveat emptor. Needless to say, picking the right dependency is a big deal.
All I did was dump Microsoft's COM/DCOM and replace it with REST; I also replaced C++ and VB with C#. Rewriting from scratch totally made sense and was a big win with absolutely no downsides. Like, no downsides whatsoever, because otherwise I would have spent the same month fixing maybe 4 or 5 multithreading bugs in the old codebase.
Anyway, never say "never do X" because you are most likely wrong.
For projects where several devs will be working on the rewrite for a long time, it almost certainly is a sign that the 'legacy' code has way more knowledge embedded in it than it might seem. Having been on the death march of a rewrite before, I think the signs are usually obvious when you're in one scenario or the other.
Cannot agree more, but when knowledge is embedded only in the code and not in comments/commits and documentation, it is a problem even if you don't plan a rewrite.
It is very disappointing to hear from otherwise good programmers statements like: "good code doesn't need comments (and my code is good)". As a result we have code where a single line can capture hours of research and/or discussion, but there are no comments and the commit logs are too brief. After a year, even the author usually cannot say why it was done that way.
The implication is that it can be non-obvious whether something is potentially that sort of project, as it originally was a large undertaking. I mean, you can assume it's a fake story, but they said it wasn't originally a one-person, one-month thing.
I don't know that it implies the person who rewrites something is a super-genius, it's just that sometimes a new perspective or tools allow a really amazing improvement.
Of course, this doesn't apply in all cases, but "never rewrite" is a limiting strategy.
Sometimes the language is so old it's cripplingly expensive to hire programmers in. Sometimes the code is deeply tied to a hardware architecture that is extinct. Sometimes the legacy code actually is that bad.
For reference, I've done total rewrites on two different small-ish software projects - in both cases because the original author had made design choices that made the whole thing unsustainable in the long run (no blame implied, it's more about shifting goals).
For reference, Twitter has done a total rewrite of their large codebase, and lived to tell the tale.
But I think that dogmatically saying "You should never rewrite code from scratch" is just useful enough to catch the vast majority of instances when you should not rewrite code from scratch.
I'm willing to be called out for being totally wrong for my dogma in the rare instance that the application should be rewritten from scratch.
On every project I have ever joined, there has been someone who wants to start fresh. There have been a lot of projects. There was one instance where it was a good idea.
With the benefit of twenty years of hindsight, I think the programmers who performed the rewrite in question did a better job than the original Netscape implementation.
Come to think of it, I'm typing this message on the direct descendent of that rewrite, while the competitor to whom they gave "a gift of two or three years" has been forced to abandon their codebase in favour of a third-party one.
That said, I'm in favor of a rewrite when the use cases for the platform have diverged so significantly that you essentially have to bend it in half to make it do the thing users want it to do today. That is, at some point the core use cases have changed, perhaps via a pivot, and you need to build a product that is essentially new compared to the old.
Regardless, the cases where I've seen a rewrite take place are when the engineers are young, talented, and enthusiastic about new tech, and the engineering manager doesn't want to piss them off, so they let the devs play (most of them will be off to other jobs in 2-3 years, leaving behind a system that is debatably as ugly and busted as the old one).
That's why Eric Evans came up with the Autonomous Bubble pattern. You rebuild features one at a time and connect them to the old code through anti-corruption layers and translation layers. Over time, the older code will be set aside. In a year or two, the plug will have naturally been pulled on all of the older code.
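A minimal sketch of what such an anti-corruption/translation layer can look like (all names here are hypothetical, not from the comment): the rewritten feature only ever sees a clean model, and the legacy system's quirks are confined to one translation class, which is the single place you unplug when the old code is finally retired.

```python
# Hypothetical sketch of an anti-corruption layer between new and legacy code.

class LegacyOrderSystem:
    """Stand-in for the old code: odd field names, stringly-typed data."""
    def fetch(self, order_id):
        return {"ORD_ID": str(order_id), "AMT_CENTS": "1999", "CUST": "acme"}

class Order:
    """Clean model used by the rewritten feature."""
    def __init__(self, order_id, amount_cents, customer):
        self.order_id = order_id
        self.amount_cents = amount_cents
        self.customer = customer

class OrderAntiCorruptionLayer:
    """Translates legacy records into the new model in one place.

    When the legacy system is set aside, only this class has to change;
    the rewritten features built on Order are untouched."""
    def __init__(self, legacy):
        self._legacy = legacy

    def get_order(self, order_id):
        raw = self._legacy.fetch(order_id)
        return Order(int(raw["ORD_ID"]), int(raw["AMT_CENTS"]), raw["CUST"])

acl = OrderAntiCorruptionLayer(LegacyOrderSystem())
order = acl.get_order(42)
print(order.amount_cents)  # 1999
```

Each new feature gets built against the clean model; the bubble grows and the legacy surface shrinks, one translation at a time.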
So there's some truth to Spolsky's 2000 blog post. But it's not smart to maintain older code forever either. You have to be careful, thorough, and use the right tools.
Deus ex Agile: knock out one "thing" at a time and keep loading the backlog with the next features to be upgraded/rewritten. Tbh, in banking I see very few rewrites; most banks (that make their own applications) don't fix it unless it breaks down. They only bother with massive upgrades where modules are rewritten, or an application is replaced one module at a time.
In his book "Code Simplicity", Max has a checklist (summarized in point 4 of https://techbeacon.com/app-dev-testing/rewrites-vs-refactori...) -- and that's the checklist referred to in the InfoQ interview.
To me, this was nowhere more obvious than from my old co-worker who was constantly trying to rewrite the internal site we worked on. First it was angular and coffeescript, but that was apparently unmaintainable after 3 months (!!!). So they rewrote it and then I joined, and it apparently made it to 6 months. Then we tried to rewrite it in TypeScript and React which blew 10 man weeks away before we abandoned it. And then he tried to do TypeScript and Angular again but it never launched.
Meanwhile, after the React debacle, I learned my lesson and focused on rewriting and rearchitecting the old code to follow more modern Angular practices (1.5 components and such), and suddenly the code didn't seem so unworkable anymore! Quality improved and features were added that made it a very popular internal site, still in use 3 years after I left that team.
So I agree, I think you should always try to improve what you have instead of starting over. (If only our UX designers would go for incremental improvements too...)
Having said that, AngularJS and CoffeeScript is a terrible software stack in a dead ecosystem; the best course of action is probably an incremental rewrite. React is really good at that because it starts out as a small rendering library and you can incrementally replace stuff like routing etc.
Considering they're two entirely different frameworks, Google did a good job creating a usable compatibility layer. This allows a relatively painless "ship of Theseus"-style transition where you have a combination of Angular and AngularJS components interacting with each other, until eventually it's just Angular components and you can drop AngularJS from the build.
Several commenters have already said that they've rewritten projects successfully. I've also evolved several projects successfully. One benefit of the evolution approach is that when old bugs resurface, they're less likely to show up all at once, since you only changed part of the application. It should also be easy to compare the code with the bug to the previous code without the bug, because it is mostly similar.
There are times when an application is beyond repair and a rewrite is necessary, but I see those times as the exception rather than the norm.
2013 https://news.ycombinator.com/item?id=6327021
2012 https://news.ycombinator.com/item?id=3624830
2012 (a bit) https://news.ycombinator.com/item?id=3449953
While I'm a "never say never" kind of guy, I do think many well-intentioned engineers make a leap to the 100% solution too easily. There are lots of other tools in the toolbox that can de-risk the process, like becoming more service oriented, implementing facade patterns, etc., which all in some way or another work towards a "rewrite" usually without ever rewriting everything.
Sometimes you get the benefits of the 100% solution with 25% of the work, and without most of the risk. "Yeah, that part of the codebase is old and crufty, but we never have to touch it because it Just Works and there's no point in rewriting it."
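One way to read the facade idea from the comment above, as a sketch (the report engine and its flags are made up for illustration): new code depends only on a narrow modern interface, while the crufty-but-working legacy internals stay untouched behind it.

```python
# Hypothetical sketch: a facade over a crufty-but-working legacy module.
# New code depends only on the facade; the legacy internals stay untouched
# until (if ever) replacing them is worth the risk.

class _LegacyReportEngine:
    """Old, messy, but battle-tested code nobody wants to rewrite."""
    def run(self, raw_query, fmt_flag=7, legacy_mode=True):
        # Imagine a couple thousand lines of crufty logic here.
        return f"report[{raw_query}:fmt={fmt_flag}]"

class ReportFacade:
    """Narrow, modern interface that hides the legacy knobs."""
    def __init__(self):
        self._engine = _LegacyReportEngine()

    def monthly_report(self, month):
        # The facade pins the weird flags to known-good values,
        # so callers never have to learn them.
        return self._engine.run(f"month={month}", fmt_flag=7, legacy_mode=True)

facade = ReportFacade()
print(facade.monthly_report("2020-06"))
```

You get most of the ergonomic benefit of a rewrite for a fraction of the work, and the option to swap the engine out behind the facade later without touching callers.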
It's a cognitive bias, and we can try to compensate for it. Though as with any bias, you can usually see it in others but cannot correct your own behavior.
I'm not implying that's always the case. Besides, in some cases, past experience is deliberately not taken into account -- I'm aware of a handful of systems written in procedural languages which didn't perform that well and were replaced by naive object-oriented implementations. It went really badly.
Having a smaller, under-pressure team of newcomers without any knowledge of the underlying concepts tackle a rewrite, while also trying to understand and fix clunky existing code, is an ordeal.
Nobody has any appetite to fix the semi-abandoned framework, so if we want to process additional data from bigger workloads, we really have no choice but to rewrite our system.
Joel's right - there's a lot of pitfalls. We tried this once and it was a failure. Our 2nd attempt is going much better, though.
The problems with all these rewrites went well beyond just changing the codebases.
https://medium.com/@herbcaudill/lessons-from-6-software-rewr...
And this DHH talk (transcript) https://businessofsoftware.org/2015/10/david-heinemeier-hans...
Don’t re-write from scratch, refactor instead.