staticassertion 7y ago

This is what I was saying though.

Yes, you have to handle the failure case of "network call". Microservices add this failure case. But you already had to handle the case of "code blew up because of a bug".

By forcing you to handle communication errors like network failure, microservices also implicitly force you to handle "code blew up because of a bug" errors. Even though that adds a second error case, you pretty much handle both the same way in the same place.
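A minimal sketch of what "the same way in the same place" could look like at a call boundary. The name `call_with_fallback` and the choice to return a fallback value are assumptions for illustration, not something described in the thread.

```python
def call_with_fallback(remote_call, fallback):
    """Run remote_call; on any failure return (fallback, reason) instead of raising."""
    try:
        return remote_call(), None
    except (ConnectionError, TimeoutError) as exc:
        # The failure case that the network boundary adds.
        return fallback, f"network: {exc}"
    except Exception as exc:
        # The failure case that was always there: the callee hit a bug.
        return fallback, f"bug: {exc}"
```

Both branches sit in one handler, so a caller that was forced to write the first branch gets an obvious place for the second.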

There was always an error case - the fact that code may blow up.

fauigerzigerk 7y ago

> But you already had to handle the case of "code blew up because of a bug".

I'm handling bugs very differently than network failures though, because network failures are usually temporary while bugs are usually (or even by definition) permanent.

Dealing with temporary outages in lots of places is extremely difficult. You may need retry logic. You may need temporary storage or queuing. You may need compensating transactions. You may need very carefully managed restarts to avoid DDoSing the service once it comes back online. There may be indirect, non-deterministic knock-on effects you can't even test for properly.
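To make the retry and "careful restart" points concrete, here is a rough sketch of retrying with exponential backoff and random jitter, one common way to keep many clients from hammering a service the moment it recovers. The function name, parameters, and default values are all illustrative assumptions; only temporary network-style errors are retried, and anything else (a bug) propagates immediately.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep, rng=random.random):
    """Call operation(), retrying temporary network errors with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: give up and let the caller decide
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads clients out so they don't retry in lockstep.
            sleep(rng() * ceiling)
```

Even this small helper covers only one item on the list; it does nothing for queuing, compensating transactions, or knock-on effects.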

Microservices cause huge complexity that is very hard to justify in my opinion.

staticassertion (OP) 7y ago

> I'm handling bugs very differently than network failures though, because network failures are usually temporary while bugs are usually (or even by definition) permanent.

Depends on the bug - there are transient bugs that are not networking related.

But let's assume it's a "hard error", i.e. a consistently failing bug. I would say where that bug is makes a huge difference.

If it's a critical feature, that bug should probably get propagated. If it's a non-critical feature, maybe you can recover.

By isolating your state across a network boundary, failure recovery is made much simpler (because you do not need to unwind to a 'safe point' - the safe point is your network boundary).

But it often depends how you do it. I personally prefer to write microservices that use queues for the vast majority of interactions. This makes fault isolation particularly trivial (as you move retry logic to the queue) and it scales very well.
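As a rough illustration of moving retry logic to the queue, the sketch below uses Python's in-process `queue.Queue` as a stand-in for a real message broker. The `drain` function, the `(message, attempts)` tuple format, and the dead-letter list are assumptions made for the example, not a description of any particular queueing system.

```python
import queue


def drain(work_queue, handler, max_attempts=3):
    """Process (message, attempts) items until the queue is empty.

    Returns (results, dead_letters). The handler holds no retry logic:
    a failed message is simply put back on the queue until it runs out
    of attempts, then it is set aside as a dead letter.
    """
    results, dead_letters = [], []
    while True:
        try:
            message, attempts = work_queue.get_nowait()
        except queue.Empty:
            return results, dead_letters
        try:
            results.append(handler(message))
        except Exception:
            if attempts + 1 < max_attempts:
                work_queue.put((message, attempts + 1))  # retry later
            else:
                dead_letters.append(message)  # isolate the failing message
```

A failing message never blocks the others: it goes to the back of the queue or into `dead_letters`, and the handler itself stays free of retry code.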

If you build lots of microservices with synchronous communications I think you'll run into a lot more complexity.

Still, I maintain that faults were already something to be handled, and that a network boundary encourages better fault handling by effectively forcing it upon you.
