I'm not sure, but reading other articles[0] on the blog, it seems like they've jumped on bandwagons before, so it's probably good to revisit those decisions every now and again.
Edit: Not trying to come off as too snarky, though I've found that this type of thing is pretty common in startups where everyone from the CTO on down has some experience, but not a lot of it. I've fallen into that trap too, at some point saying "Sure, Scala will work great! It's future-proof and everyone will love it!" ...cue crickets
[0] https://tech.channable.com/posts/2017-02-24-how-we-secretly-...
Yes definitely. This thing was a big exercise in "You aren't going to need it".
> so in the end it's mostly...just an app talking to Postgres?
Yes. We solve problems for our customers. It's nice if we can do that without also creating problems for ourselves :)
> seems like they've been jumping on bandwagons before
Just wanted to point out that this system is also written in Haskell. We didn't really switch bandwagons.
Fair enough! In the end it's great that you've been able to take a step back and look at your ultimate goal (i.e., solving customer problems). That doesn't always happen enough, and it's a good sign for your IT teams :) So is the fact that you were able to get the time to actually do this.
So there's all of that stuff, and then you think, "Oh, and it gives us an easy scaling path if we ever find our data volumes growing at an unexpectedly rapid clip." And at first you think it's just a cherry on top.
It may seem too good to be true at first, but with everyone using it, and with most of the alternatives looking like they'll involve at least as much dev effort during the initial analysis, it's pretty easy to miss that.
And it's only a while later, when you're already pretty heavily invested in the ecosystem, that you start to really understand some things that the bloggers and book authors don't talk about in public. Like how Spark makes Big Data and scale-out a self-fulfilling prophecy. You will need to scale out, even if your data should fit in memory, because Spark wastes memory like a 22-year-old football player wastes money.
And maybe you're an appropriately cynical jerk, and can therefore spot from a mile away that it couldn't possibly live up to the hype. Congratulations. Hopefully you're the technical lead. Even if you are, though, too bad, because nobody else in the meeting room is as jaded about tech as you are. Certainly not the management folks who don't program or who have been out of the game for years. And, since you are cynical and jaded, there's still a good chance you'll go with Spark anyway. Because that's a hard argument to win - unless you've already been burned by Spark in the past, you aren't going to have enough intimate knowledge to make an argument that's concrete enough to sound convincing. And because this isn't the hill you want to die on. And because, being appropriately cynical, you realize that, at the end of the day, it's not really your problem. Spark will get the job done. It'll be inefficient, sure, but the extra server and development costs aren't coming out of your pocket, they're coming out of the pocket of the person who wants to be using Spark.
The company I work for is currently undergoing a similar rewrite from Firebase to Postgres (although we're doing a gradual migration), and it's amazing how much code we have been able to throw away, and how much more quickly we can move in the parts of the code that have been migrated.
I had a similar thought. Postgres is a great choice, but then they also went with Haskell... I look forward to another blog post in 2-3 years detailing all the ways Haskell failed them and how, at the end of the day, they should have just gone with an industry-standard language.
FWIW they have this post from almost 3 years ago about adding Haskell to their stack. I'm guessing the time for your prediction has come and gone.
On the other hand, I find it quite encouraging that Haskell was barely mentioned. It seems they viewed the risk as "changing the projects language" rather than "using a non industry standard language".
Scala is still the least-bad option for a JVM language.
Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.
It's funny that you gloss over one of the big productivity issues with Scala: somehow everyone who joins your team is supposed to be acutely aware of your flavor of Scala.
If the community decided whether it wanted to be a better Java or a worse Haskell, then I bet more people would be productive in Scala.
I'm sure if you are very disciplined when writing Scala then this will not happen, etc etc. But the fact that people need to decide upfront which parts of Scala to use and which ones to avoid seems like a red flag to me (and in fact was one, in my experience).
Source, please?
> Anyone who can't be productive in Scala (not "better java", not "worse haskell", Scala) isn't someone you want on your team anyway.
Why is that?
1. "We are going to start over from scratch and rewrite the whole thing!"
Joel Spolsky famously said to "Never do this!"
2. "We are going to slowly refactor the whole codebase."
This can, eventually, lead you to a place where none of the original code is there so it's like a rewrite but much simpler.
3. "We are going to slowly add new places to replace the old system till there is no new system."
This is called the "Strangler App" model as described by Martin Fowler (https://martinfowler.com/bliki/StranglerFigApplication.html)
Granted, for some reason, "I retired a legacy system and rolled out a brand new one" seems to look better on a resume than "I refactored a legacy system into a better system," so YMMV.
Maybe that's because most jobs I apply for tend to have at least one 'legacy' system laying around that needs some help. But this is somewhat common outside of the startup scene.
Legacy in this case means an app that's maybe 7-10 years old, often written as if everything happened on the back of a napkin. Sprawling, inconsistent, monolithic stuff. But not true legacy, where it's written in Fortran or something from the 1500s.
At any rate, I think it's worth having refactors on your resume. I know I'd be interested in people who have done it - it's always such an educational experience.
I worked for a big bank for 5 years supporting large trading systems. After the 3rd attempt to retire a legacy trading system by rolling out a new system, someone pointed out that if you are a mid level manager you don't get any resume points for refactoring old systems.
Hence, you keep trying to roll out new systems even if those systems have bugs like "switch all buy orders to sells" (true story).
This just sounds like the sort of incremental Ship of Theseus [0] development that many of us are doing. The product I'm working on has had enough key internal portions rewritten over a long enough time (including interestingly the job management system) that you could say it's a rewrite compared to the product from 2 years ago.
Likely buggy, but buggy in different ways. The real test will be measuring the number and severity of bugs, as well as the ability to identify and repair them, in both systems. A testable system with good test code may still be 'buggy', but it will help foster a sense of confidence when making the required fixes.
Yeah, at my last place before I left, they were looking to do a rewrite of their main project, which was the result of years of work by hundreds of developers. The timescale was woolly, but it was expected to take between 5 and 10 years and probably require ~200 developers and testers.
This was rewriting millions of lines of code, very little would be reusable.
I don't know if they went ahead with it or not, but it was looking like it would descend into chaos.
> Examples abound: (...) using a distributed database when Postgres would do
This is the only part of the article that bugged me a little, because in my experience the choice between single-machine and distributed databases is not so much about scale as it is about availability and avoiding a single point of failure.
Even if your database server is fairly stable (a VM in a robust cloud, for instance), if you use Postgres or MySQL and you need to upgrade to a newer version of the database (say, for an urgent security update), you have no choice but to completely stop the service for a few seconds or minutes (assuming the service cannot work without its database).
Depending on the service and its users, this mandatory down-time might or might not be acceptable.
Anecdotally I suspect services requiring high SLAs are more common than ones requiring petabyte scale storage.
In this particular case, I'd take the single point of failure over the previous situation.
That being said: we have successfully used PostgreSQL's fail-overs multiple times. In my experience, they work quite alright.
[1]: https://tech.channable.com/posts/2018-04-10-debugging-a-long...
At $previous_job we had a "one service = one MySQL instance" policy. Every time a MySQL server went down, all clients would lose access to that service at the same time. It was stressful and much less robust than your setup.
However this is just a personal opinion that I might revisit at some point.
[1] https://github.blog/2018-10-30-oct21-post-incident-analysis/
Warnings about the risks of premature optimization should not stop people thinking about issues that are likely to arise in future, and what they might do about them. On the other hand, this does not mean that you should necessarily implement that solution now.
Also, Spark has some single points of failure that are not obvious at first.
https://gist.github.com/aseigneurin/3af6b228490a8deab519c6ae...
There's not a single approach that works for every case, but they all involve a replica and upgrading one at a time.
True. But if it is acceptable, then it may well be a good trade-off to make.
I've had situations where a client wanted a 99.95% availability SLA for our SaaS service instead of 99.5% as we were providing at the time.
He thought it was a minor change on our side, but it required completely rethinking our architecture from the ground up.
Years ago I was building SEO software. One of the products was originally written as an internal tool, and it handled our workload without skipping a beat. Then we decided to release it to the public, so I did a small refactor to implement accounts. We launched with it hosted on a small Dell PC under my desk (where it had been running as our internal tool). Within 2 hours of launch, it was completely overwhelmed and shutting down due to overheating.
It was "rewrite" time.
While doing that, I had to come up with SOME workaround. So I opened the case and stuck a box fan on it to try to exhaust some of the heat. That lasted about 8 more hours before the server shut down and I got a call from the boss.
I went into the office in the middle of the night and started profiling the application. I found a VPS host, quickly spun up the largest Windows VM they offered, and that helped for a few days while I rewrote large swaths of the application. Even after a ~80% rewrite and splitting the application in two, we had more users than we'd ever anticipated, and I was out of my depth with scaling. So we got a few (much larger) physical servers at SoftLayer.
This was the setup the website ran on for the next couple of years with minor tweaks: more space with an iSCSI array, more RAM, more CPU, etc., but all staying at SoftLayer. Eventually, when the hosting bill was getting into the high four figures a month, we reevaluated and decided a rewrite was in order to switch it to Microsoft Azure, utilizing Azure SQL, Azure Table Storage, Azure Queue Service, and offloading all of the complicated tasks from the web server onto the Azure infrastructure. For all I know it is still on Azure.
I can only think of a few places where a full rewrite is justified:
* You lost the source code
* The application is almost purely integration with some third-party platform or component, and you need to replace that platform. (E.g. you are developing a registry cleaner and need to port it to Mac.)
* You don't have any customers or users yet and time-to-market is not a concern.
* You are not a business and are writing the code purely for your own enjoyment
But these are business-level considerations. For individual developers, there may be compelling reasons to push for a rewrite:
* You find it more fun to work on green-field projects than to perform maintenance development.
* The new platform is more exciting or looks better on the CV than the old one
Anyway, it makes sense that they backed off from Spark and HDFS, given the size of their datasets.
The original poster mentioned that their data analytics software is written in Haskell. I would like to see a write-up on that.
EDIT: I see that they do have two articles on their blog on their use of Haskell.
Don't wait for a big rewrite. Constantly keep deleting and rewriting. Just make sure you are solving real problems while doing it.
So this seems to be the massive takeaway: if you need to operate on a whole dataset that is larger than one node's memory capacity, then you have to go distributed. Otherwise, it still seems like overhead barely worth the effort.
So Google: the dataset is all web pages on the internet - yes, that's too large; go distributed.
Tesco / Walmart: the dataset might be all the sales for a year. Probably too large. But could you do with sales per week? Per day?
Having the raw data of all your transactions etc. lying around waiting for your spiffo business query sounds good, but... is it?
I would be interested in hearing folks' cut-off points for going full Big Data vs "we don't really need this"
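For what it's worth, my own cut-off is a back-of-envelope calculation like the one below. Every number in it is a made-up assumption for illustration, not a real retailer figure:

```python
# Back-of-envelope check (sketch): does the dataset fit comfortably in RAM
# on one node, or do we actually need a cluster? All figures hypothetical.

def fits_on_one_node(rows: int, bytes_per_row: int, ram_gb: int) -> bool:
    """True if the dataset fits in half the node's RAM, leaving headroom
    for the OS, indexes and working memory."""
    dataset_gb = rows * bytes_per_row / 1e9
    return dataset_gb < ram_gb * 0.5

# A year of sales at ~5B rows x 100 bytes is ~500 GB: marginal on one node.
print(fits_on_one_node(5_000_000_000, 100, 768))  # False
# A week of the same data is ~10 GB: easily single-node territory.
print(fits_on_one_node(100_000_000, 100, 768))    # True
```

The interesting part is that the answer often flips just by narrowing the window (year vs week), which is exactly the Tesco/Walmart question above.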
So we immediately set out redesigning our system to use other, fully open source technologies. It also gave us an opportunity to reconsider architecture decisions that had not scaled well. In our case, moving from a monolith to microservices has had major benefits. Maybe the biggest is being able to quickly see which microservice is the bottleneck and needs to be scaled up to handle the load. With the monolith, if it got slow, it was very difficult to figure out which part of the workload was making it slow.
The biggest problems are trying to run/develop this code on machines made in the last ten years, and the fact that it's a horrible, horrible codebase. Code practices from the early '90s.
To make things worse, objects are 'evil', all HTML/SQL/XML is built by appending strings, and there are no data sanity checks...
I started a proof of concept replacement system that was written in Python and ran on the web.
It was met with "We can't go with a web-based system, since if anything changes in the browser we'll be up shit creek."
:-/
It is also natively integrated with K8s (https://github.com/dask/dask-kubernetes) and YARN (https://yarn.dask.org/en/latest/).
In theory, there is an easy answer to this question: If the cost of the rewrite, in terms of money, time, and opportunity cost, is less than the cost of fixing the issues with the old system, then one should go for the rewrite.
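That rule is just an inequality. A toy version, with all cost figures hypothetical and assumed to be in the same unit (say, engineer-months):

```python
# Toy version of the rewrite rule (sketch). The hard part in practice is
# estimating these numbers, not comparing them; all inputs are hypothetical.

def should_rewrite(money: float, time: float, opportunity: float,
                   cost_of_fixing_old: float) -> bool:
    """Go for the rewrite only if its total cost (money, time, and
    opportunity cost) undercuts the cost of fixing the old system."""
    return (money + time + opportunity) < cost_of_fixing_old
```

Of course, the uncertainty on each estimate usually dwarfs the comparison itself, which is why this is only easy "in theory".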
In our case, the answer to all of these questions was yes.
One of our original mistakes (back in 2014) had been that we had tried to “future-proof” our system by trying to predict our future requirements. One of our main reasons for choosing Apache Spark had been its ability to handle very large datasets (larger than what you can fit into memory on a single node) and its ability to distribute computations over a whole cluster of machines. At the time, we did not have any datasets that were this large. In fact, 5 years later, we still do not. Our datasets have grown by a lot for sure, both in size and quantity, but we can still easily fit each individual dataset into memory on a single node, and this is unlikely to change any time soon. We cannot fit all of our datasets in memory on one node, but that is also not necessary, since we can trivially shard datasets of different projects over different servers, because they are all independent of one another.
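Sharding fully independent datasets is the easy kind of sharding: no cross-server joins or rebalancing logic are needed. A minimal sketch of the idea, with hypothetical server names:

```python
import hashlib

# Because each project's dataset is self-contained, a stable hash of the
# project id is enough to pin the whole dataset to one server. A
# cryptographic hash is used instead of Python's built-in hash(), which is
# salted per process and not stable across restarts. Server names made up.

SERVERS = ["db-1", "db-2", "db-3"]

def server_for(project_id: str) -> str:
    """Deterministically map a project to the server holding its data."""
    digest = hashlib.sha256(project_id.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

The trade-off is that a single project's dataset still has to fit on one server, which is exactly the assumption the paragraph above says holds for them.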
With hindsight, it seems obvious that divining future requirements is a fool’s errand. Prematurely designing systems “for scale” is just another instance of premature optimization, which many development teams seem to run into at one point or another.