I'm not sure, but reading other articles[0] on the blog, it seems like they've jumped on bandwagons before, so it's probably good to revisit those decisions every now and again.
Edit: Not trying to come off as too snarky, though I've found that this type of thing is pretty common in startups where everyone from the CTO on down has some experience, but not a lot of it. I've fallen into that trap too, at some point saying "Sure, Scala will work great! It's future-proof and everyone will love it!" ...cue crickets
[0] https://tech.channable.com/posts/2017-02-24-how-we-secretly-...
Yes definitely. This thing was a big exercise in "You aren't going to need it".
> so in the end it's mostly...just an app talking to Postgres?
Yes. We solve problems for our customers. It's nice if we can do that without also creating problems for ourselves :)
> seems like they've been jumping on bandwagons before
Just wanted to point out that this system is also written in Haskell. We didn't really switch bandwagons.
Fair enough! In the end it's great that you've been able to take a step back and look at your ultimate goal (i.e., solving customer problems). That doesn't always happen enough, and it's a good sign for your IT teams :) So is the fact that you were able to get the time to actually do this.
So there's all of that stuff, and then you think, "Oh, and it gives us an easy scaling path if we ever find our data volumes growing at an unexpectedly rapid clip." And at first you think it's just a cherry on top.
It may seem too good to be true at first, but with everyone using it, and with most of the alternatives looking like they'll involve at least as much dev effort during the initial analysis, it's pretty easy to miss that.
And it's only a while later, when you're already pretty heavily invested in the ecosystem, that you start to really understand some things that the bloggers and book authors don't talk about in public. Like how Spark makes Big Data and scale-out a self-fulfilling prophecy. You will need to scale out, even if your data should fit in memory, because Spark wastes memory like a 22-year-old football player wastes money.
And maybe you're an appropriately cynical jerk, and can therefore spot from a mile away that it couldn't possibly live up to the hype. Congratulations. Hopefully you're the technical lead. Even if you are, though, too bad, because nobody else in the meeting room is as jaded about tech as you are. Certainly not the management folks who don't program or who have been out of the game for years. And, since you are cynical and jaded, there's still a good chance you'll go with Spark anyway. Because that's a hard argument to win - unless you've already been burned by Spark in the past, you aren't going to have enough intimate knowledge to make an argument that's concrete enough to sound convincing. And because this isn't the hill you want to die on. And because, being appropriately cynical, you realize that, at the end of the day, it's not really your problem. Spark will get the job done. It'll be inefficient, sure, but the extra server and development costs aren't coming out of your pocket, they're coming out of the pocket of the person who wants to be using Spark.
The company I work for is currently undergoing a similar rewrite from Firebase to Postgres (although we're doing a gradual migration), and it's amazing how much code we have been able to throw away, and how much more quickly we can move in the parts of the code that have been migrated.
I had a similar thought. Postgres is a great choice, but then they also went with Haskell... I look forward to another blog post in 2-3 years detailing all the ways Haskell failed them and how, at the end of the day, they should have just gone with an industry-standard language.
FWIW they have this post from almost 3 years ago about adding Haskell to their stack. I'm guessing the time for your prediction has come and gone.
On the other hand, I find it quite encouraging that Haskell was barely mentioned. It seems they viewed the risk as "changing the projects language" rather than "using a non industry standard language".
Scala is still the least-bad option for a JVM language.
Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.
It's funny that you gloss over one of the big productivity issues with Scala: somehow everyone who joins your team is supposed to be acutely aware of your flavor of Scala.
If the community decided whether it wanted to be a better Java or a worse Haskell, then I bet more people would be productive in Scala.
I'm sure if you are very disciplined when writing Scala then this will not happen, etc etc. But the fact that people need to decide upfront which parts of Scala to use and which ones to avoid seems like a red flag to me (and in fact was one, in my experience).
Source, please?
> Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.
Why is that?
1. "We are going to start over from scratch and rewrite the whole thing!"
Joel Spolsky famously said to "Never do this!"
2. "We are going to slowly refactor the whole codebase."
This can, eventually, lead you to a place where none of the original code is there so it's like a rewrite but much simpler.
3. "We are going to slowly add new places to replace the old system till there is no new system."
This is called the "Strangler App" model as described by Martin Fowler (https://martinfowler.com/bliki/StranglerFigApplication.html)
Granted, for some reason, "I retired a legacy system and rolled out a brand new one" seems to look better on a resume than "I refactored a legacy system into a better system," so YMMV.
Maybe that's because most jobs I apply for tend to have at least one 'legacy' system laying around that needs some help. But this is somewhat common outside of the startup scene.
Legacy in this case means an app that's maybe 7-10 years old, often written as if everything happened on the back of a napkin. Sprawling, inconsistent, monolithic stuff. But not true legacy, where it's written in Fortran or something from the 1500s.
At any rate, I think it's worth having refactors on your resume. I know I'd be interested in people who have done it - it's always such an educational experience.
I worked for a big bank for 5 years supporting large trading systems. After the 3rd attempt to retire a legacy trading system by rolling out a new system, someone pointed out that if you are a mid level manager you don't get any resume points for refactoring old systems.
Hence, you keep trying to roll out new systems even if those systems have bugs like "switch all buy orders to sells" (true story).
This just sounds like the sort of incremental Ship of Theseus [0] development that many of us are doing. The product I'm working on has had enough key internal portions rewritten over a long enough time (including interestingly the job management system) that you could say it's a rewrite compared to the product from 2 years ago.
Likely buggy, but buggy in different ways. The real test will be measuring the number and severity of bugs, as well as the ability to identify and repair them, in both systems. A testable system with good test code may still be 'buggy', but it will help foster a sense of confidence when making the required fixes.
Yeah, at my last place before I left, they were looking to do a rewrite of their main project, which was the result of years of work by hundreds of developers. The timescale was woolly, but it was expected to take between 5 and 10 years and probably require ~200 developers and testers.
This was rewriting millions of lines of code, very little would be reusable.
I don't know if they went ahead with it or not, but it was looking like it would descend into chaos.
> Examples abound: (...) using a distributed database when Postgres would do
This is the only part of the article that bugged me a little, because in my experience the choice between single-machine and distributed databases is not so much about scale as it is about availability and avoiding a single point of failure.
Even if your database server is fairly stable (a VM in a robust cloud, for instance), if you use Postgres or MySQL and you need to upgrade to a newer version of the database (say, for an urgent security update), you have no choice but to completely stop the service for a few seconds or minutes (assuming the service cannot work without its database).
Depending on the service and its users, this mandatory down-time might or might not be acceptable.
Anecdotally I suspect services requiring high SLAs are more common than ones requiring petabyte scale storage.
In this particular case, I'd take the single point of failure over the previous situation.
That being said: we have successfully used PostgreSQL's fail-overs multiple times. In my experience, they work quite alright.
[1]: https://tech.channable.com/posts/2018-04-10-debugging-a-long...
At $previous_job we had a "one service = one MySQL instance" policy. Every time a MySQL server went down, all clients would lose access to that service at the same time. It was stressful and much less robust than your setup.
However this is just a personal opinion that I might revisit at some point.
[1] https://github.blog/2018-10-30-oct21-post-incident-analysis/
Warnings about the risks of premature optimization should not stop people thinking about issues that are likely to arise in future, and what they might do about them. On the other hand, this does not mean that you should necessarily implement that solution now.
Also, Spark has some single points of failure that are not obvious at first.
https://gist.github.com/aseigneurin/3af6b228490a8deab519c6ae...
There's not a single approach that works for every case, but they all involve a replica and upgrading one at a time.
True. But if it is acceptable, then it may well be a good trade-off to make.
I've had situations where a client wanted a 99.95% availability SLA for our SaaS service instead of 99.5% as we were providing at the time.
He thought it was a minor change on our side, but it required completely rethinking our architecture from the ground up.
Years ago I was building SEO software. One of the products was originally written as an internal tool, and it handled our workload without skipping a beat. Then we decided to release it to the public, so I did a small refactor to implement accounts. We launched with it hosted on a small Dell PC under my desk (where it had been running as our internal tool). Within 2 hours of launch, it was completely overwhelmed and shutting down due to overheating.
It was "rewrite" time.
While doing that, I had to come up with SOME workaround. So I opened the case and stuck a box fan on it to try to exhaust some of the heat. That lasted about 8 more hours before the server shut down and I got a call from the boss.
I went into the office in the middle of the night and started profiling the application. I found a VPS host, quickly spun up the largest Windows VM they offered, and that helped for a few days while I rewrote large swaths of the application. Even after a ~80% rewrite and splitting the application in two, we had more users than we'd ever anticipated, and I was out of my depth with scaling. So we got a few (much larger) physical servers at SoftLayer.
This was the setup the website ran on for the next couple of years with minor tweaks: more space with an iSCSI array, more RAM, more CPU, etc., but all staying at SoftLayer. Eventually, when the hosting bill was getting into the high four figures a month, we reevaluated and decided a rewrite was in order to switch it to Microsoft Azure, utilizing Azure SQL, Azure Table Storage, Azure Queue Service, and offloading all of the complicated tasks from the web server onto the Azure infrastructure. For all I know it is still on Azure.
I can only think of a few places where a full rewrite is justified:
* You lost the source code
* The application is almost purely integration with some third-party platform or component, and you need to replace that platform. (E.g. you are developing a registry cleaner and need to port it to Mac.)
* You don't have any customers or users yet and time-to-market is not a concern.
* You are not a business and are writing the code purely for your own enjoyment
But these are business-level considerations. For individual developers, there may be compelling reasons to push for a rewrite:
* You find it more fun to work on green-field projects than to perform maintenance development.
* The new platform is more exciting or looks better on the CV than the old one
Anyway, it makes sense that they backed off from Spark and HDFS, given the size of their datasets.
The original poster mentioned that their data analytics software is written in Haskell. I would like to see a write-up on that.
EDIT: I see that they do have two articles on their blog on their use of Haskell.
Don't wait for a big rewrite. Constantly keep deleting and rewriting. Just make sure you are solving real problems while doing it.
So this seems to be the massive takeaway: if you need to operate on a whole dataset that is larger than one node's memory capacity, then you have to go distributed. Otherwise, it still seems like overhead barely worth the effort.
So Google: the dataset is all web pages on the internet - yes, that's too large; go distributed.
Tesco / Walmart: the dataset might be all the sales for a year. Probably too large. But could you do with sales per week? Per day?
Having the raw data of all your transactions etc. lying around waiting for your spiffo business query sounds good, but... is it?
I would be interested in hearing folks' cut-off points for going full Big Data vs "we don't really need this"
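For what it's worth, my own cut-off is a back-of-envelope calculation like the one below. Every number in it is a made-up assumption for illustration, not a real retailer figure:

```python
# Back-of-envelope check (sketch): does the dataset fit comfortably in RAM
# on one node, or do we actually need a cluster? All figures hypothetical.

def fits_on_one_node(rows: int, bytes_per_row: int, ram_gb: int) -> bool:
    """True if the dataset fits in half the node's RAM, leaving headroom
    for the OS, indexes and working memory."""
    dataset_gb = rows * bytes_per_row / 1e9
    return dataset_gb < ram_gb * 0.5

# A year of sales at ~5B rows x 100 bytes is ~500 GB: marginal on one node.
print(fits_on_one_node(5_000_000_000, 100, 768))  # False
# A week of the same data is ~10 GB: easily single-node territory.
print(fits_on_one_node(100_000_000, 100, 768))    # True
```

The interesting part is that the answer often flips just by narrowing the window (year vs week), which is exactly the Tesco/Walmart question above.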
So we immediately set out redesigning our system to use other, fully open source technologies. It also gave us an opportunity to reconsider architecture decisions that had not scaled well. In our case, moving from a monolith to microservices has had major benefits. Maybe the biggest is being able to quickly see which microservice is the bottleneck and needs to be scaled up to handle the load. With the monolith, if it got slow, it was very difficult to figure out which part of the workload was making it slow.
The biggest problems are trying to run/develop this code on machines made in the last ten years, and the fact that it's a horrible, horrible codebase. Code practices from the early '90s.
To make things worse, objects are 'evil', all HTML/SQL/XML is built by appending strings, and there are no data sanity checks...
I started a proof of concept replacement system that was written in Python and ran on the web.
It was met with "We can't go with a web-based system, since if anything changes in the browser we'll be up shit creek."
:-/
It is also natively integrated with K8s (https://github.com/dask/dask-kubernetes) and YARN (https://yarn.dask.org/en/latest/).
In theory, there is an easy answer to this question: If the cost of the rewrite, in terms of money, time, and opportunity cost, is less than the cost of fixing the issues with the old system, then one should go for the rewrite.
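That rule is just an inequality. A toy version, with all cost figures hypothetical and assumed to be in the same unit (say, engineer-months):

```python
# Toy version of the rewrite rule (sketch). The hard part in practice is
# estimating these numbers, not comparing them; all inputs are hypothetical.

def should_rewrite(money: float, time: float, opportunity: float,
                   cost_of_fixing_old: float) -> bool:
    """Go for the rewrite only if its total cost (money, time, and
    opportunity cost) undercuts the cost of fixing the old system."""
    return (money + time + opportunity) < cost_of_fixing_old
```

Of course, the uncertainty on each estimate usually dwarfs the comparison itself, which is why this is only easy "in theory".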
In our case, the answer to all of these questions was yes.
One of our original mistakes (back in 2014) had been that we had tried to “future-proof” our system by trying to predict our future requirements. One of our main reasons for choosing Apache Spark had been its ability to handle very large datasets (larger than what you can fit into memory on a single node) and its ability to distribute computations over a whole cluster of machines. At the time, we did not have any datasets that were this large. In fact, 5 years later, we still do not. Our datasets have grown by a lot for sure, both in size and quantity, but we can still easily fit each individual dataset into memory on a single node, and this is unlikely to change any time soon. We cannot fit all of our datasets in memory on one node, but that is also not necessary, since we can trivially shard datasets of different projects over different servers, because they are all independent of one another.
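Sharding fully independent datasets is the easy kind of sharding: no cross-server joins or rebalancing logic are needed. A minimal sketch of the idea, with hypothetical server names:

```python
import hashlib

# Because each project's dataset is self-contained, a stable hash of the
# project id is enough to pin the whole dataset to one server. A
# cryptographic hash is used instead of Python's built-in hash(), which is
# salted per process and not stable across restarts. Server names made up.

SERVERS = ["db-1", "db-2", "db-3"]

def server_for(project_id: str) -> str:
    """Deterministically map a project to the server holding its data."""
    digest = hashlib.sha256(project_id.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

The trade-off is that a single project's dataset still has to fit on one server, which is exactly the assumption the paragraph above says holds for them.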
With hindsight, it seems obvious that divining future requirements is a fool’s errand. Prematurely designing systems “for scale” is just another instance of premature optimization, which many development teams seem to run into at one point or another.