What I can tell you is that the unbelievable bloat in the complexity of our systems is going to bite us in the ass. I'll never forget when I joined a hip fintech company, and the director of eng told us in orientation that we should think of their cloud of services as a thousand points of light, out in space. I knew my days were numbered at exactly that moment. This company had 200k unique users, and they were spending a million dollars a month on CRUD. Granted, banking is its own beast, but I had just come from a company of 10 people serving 3 million daily users at 10k requests a second for images drawn on the fly by GPUs. Our hosting costs never exceeded 20k per month, and the vast majority of that was Cloudflare.
Deploying meant compiling a static binary and copying it to the 4-6 hardware servers we ran in a couple racks, one rack on each side of the continent. We were drunk by 11am most of the time.
Today, it's apparently much more impressive if you need a team of earnest, bright-eyed Stanford grads constantly tweaking and fiddling with 100 knobs in order to keep systems running. Enter Kubernetes.
My favorite example of this right now is Vitess. Sure, it's a beautiful piece of technology. But for a use case my company is looking at, we'll be replacing one (exceptionally large) DB with in excess of 80 MySQL pods, managed by another opaque-through-complexity system running on top of Kubernetes (which already bites us regularly even though it's "managed").
The complexity and failure scenarios make my head ache, even though I should never have to interact with it myself.
Oh, and my current favorite PITA - having to change the API version of deployment objects from 'v1beta1' to 'v1' in over 160 microservice charts as part of a Kubernetes version upgrade. Helm 2 doesn't recognize the deployments as being identical, so we also have to do a Helm 3 migration on top of that, just to avoid taking down our entire ecosystem to do the API version upgrade. Wheeee!
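For anyone who hasn't done this dance: the edit itself is trivial, roughly one line per chart. A hedged illustration (made-up names; also note that apps/v1 made spec.selector mandatory, which the beta APIs would happily default for you):

```yaml
apiVersion: apps/v1        # was: apps/v1beta1, removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: some-microservice
spec:
  selector:                # required in apps/v1, optional in the beta APIs
    matchLabels:
      app: some-microservice
  template:
    metadata:
      labels:
        app: some-microservice
    spec:
      containers:
        - name: some-microservice
          image: example.com/some-microservice:1.2.3
```

The pain isn't the edit, it's repeating it 160+ times while Helm treats the result as a brand new resource.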
How is this a problem unique to Kubernetes? Don't you have to make similar changes when upgrading a library or dependency that was in beta?
But that's a moot point anyways, since Vitess doesn't use persistent volumes - it reloads the individual DBs from backups and binlogs when a pod is moved or restarted.
That said, a couple thoughts that came to mind:
1. having only 4 servers in 2 locations serving 3m customers a day seems crazy to me, at least in the context of current practices regarding highly available systems.
2. not sure your cost comparisons are fair: in the first case you're talking about cloud costs (so including hardware, 3rd-party service/API fees, etc.), but in the second you're just talking hosting fees.
If your first company had a relatively static, hardware-heavy (GPUs doing most of the work) workload, easily handled by a few servers -- then it would be crazy to pay for a cloud provider. And it wouldn't make much sense to bother with k8s or containers either (imo).
On the other hand, if the more recent company has a dynamic/spiky, software-heavy workload, with a ton of different services, orders of magnitude more infrastructure, and (being fintech) much more demanding SLAs... then it might make a lot of sense to use a cloud provider and take advantage of k8s. Especially if you're a startup that doesn't have the time/expertise to deal with datacenter design.
I agree that there's a lot of unnecessary fixation on the latest and greatest these days, but there are definitely situations where Kubernetes can be very valuable.
This was all for a weather radar app, and you are correct, there really weren't any SLAs, but we had to handle very high loads. We did make use of cloud services for some pieces of the system (there was a database and a small API for some minor bookkeeping, mostly around users). I included those costs in my estimate of monthly expenses. We had lots of caches, for all our JSON and for things like user authentication, which saved us from having to really figure out the database side. The caches were typically push-based, so we didn't let user requests get to the disk if we could help it.
The vast majority of requests were for those images though, which required moving lots of clumsy geographic data into the GPUs to render map tiles (at high-def and high zooms as well), so the requests were still somewhat costly to serve, even if they didn't hit a database. We were able to get away with a small footprint in the datacenter by making heavy use of CDN caching. Cache lifetimes for the latest weather images were often measured in seconds, and getting those timings right was crucial. Screwing up cache lifetimes would rapidly swamp the system with requests, but the software was good at continuing to keep latency low under heavy load, and degrading gracefully. In fact, the vast majority of bandwidth usage in the datacenters was actually not requests, but streaming geographic data from various government sources. We regularly had 50-100MB/s coming in, and we stored all of it in memory. The GPU machines had 100-200GB of memory, and we used all of it. We had to cycle through that memory pretty rapidly as well, so making sure allocations were low and memory was freed up on time was important.
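To put some flesh on the cache-lifetime point: most of the magic is just getting the response headers right so the CDN absorbs nearly everything. A minimal sketch in Go, assuming a hypothetical tile endpoint (the directive values are my guesses, not their actual numbers):

```go
package main

import "net/http"

// renderedTileBytes stands in for the GPU render pipeline (hypothetical).
func renderedTileBytes() []byte { return []byte{} }

// latestTile sets a cache lifetime of a few seconds: enough for the CDN
// to collapse thousands of identical tile requests into a single origin
// hit, while fresh radar imagery still shows up quickly.
func latestTile(w http.ResponseWriter, r *http.Request) {
	// s-maxage governs shared caches like the CDN; stale-while-revalidate
	// smooths over the moment the cached copy expires.
	w.Header().Set("Cache-Control", "public, max-age=5, s-maxage=15, stale-while-revalidate=30")
	w.Header().Set("Content-Type", "image/png")
	w.Write(renderedTileBytes())
}

func main() {
	http.HandleFunc("/tile", latestTile)
	http.ListenAndServe(":8080", nil)
}
```

Get those numbers wrong in either direction and you either swamp the origin or serve stale weather, which is exactly the tuning problem described above.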
It may not sound like we had much redundancy, but with all the caches, and each machine being quite powerful, we were in better shape than it sounds. We often took machines in and out of nginx. The way the graceful degradation worked, we would prioritize imagery from the lower zoom levels (more zoomed out), so the worst that would happen on a typical day is that some very zoomed-in images, in places few people were looking, might be slow or time out.
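The zoom-prioritized degradation is simpler than it might sound. A hedged sketch of the idea in Go; the handler, thresholds, and URL scheme are all invented for illustration:

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"sync/atomic"
)

var inFlight atomic.Int64 // requests currently being served

const (
	busyThreshold    = 512 // concurrent requests before shedding kicks in (invented)
	maxZoomUnderLoad = 8   // deepest zoom level still served while busy (invented)
)

// tileHandler degrades gracefully: when the box is saturated it keeps
// serving the zoomed-out tiles most users are looking at, and sheds the
// deep-zoom requests that only affect a handful of viewers.
func tileHandler(w http.ResponseWriter, r *http.Request) {
	n := inFlight.Add(1)
	defer inFlight.Add(-1)

	zoom, err := strconv.Atoi(r.URL.Query().Get("z"))
	if err != nil {
		http.Error(w, "bad zoom level", http.StatusBadRequest)
		return
	}

	if n > busyThreshold && zoom > maxZoomUnderLoad {
		w.Header().Set("Retry-After", "2")
		http.Error(w, "shedding load", http.StatusServiceUnavailable)
		return
	}

	fmt.Fprintf(w, "tile at zoom %d\n", zoom) // stand-in for the GPU render
}

func main() {
	http.HandleFunc("/tile", tileHandler)
	http.ListenAndServe(":8080", nil)
}
```

The nice property is that the popular zoomed-out tiles, which are also the most cacheable, never stop being served.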
So, in the end, you are correct, the situations are different. The bank had to store things for a lot longer, and had to uphold more stringent SLAs and the like. That said, I still think they were flushing a lot of cash down the toilet and overcomplicating things :).
We've had to work very hard to let developers/SRE/ops folks provision VMs and bare-metal machines in our datacenters the same way they would in the cloud provider that we use. Obviously it's not as fast, seamless, or feature-rich as it is with AWS/GCP/Azure et al., but I'm proud of the progress we've made.
What really kills me, though, is that a huge chunk of our engineers seem to think our work is a complete waste of time in the first place. We have several physical DCs and tens of thousands of machines... but since most engineers don't have to think about costs, or about workloads other than their own, they think of us as out of touch and clinging to the past.
Nothing worse than getting snark about our platform from an SRE who spends their days in a web app gluing together the ready-made services of Google and Amazon while acting as if they're building the world of tomorrow :)
If such a tool does not exist, do any of you feel that creating one is within the realm of possibility?
I would imagine all these knobs could have default configurations that 99% of all users would be okay with, and that a knob should only be exposed in a small number of cases.
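For what it's worth, that's roughly the shape such a tool would have to take. A toy sketch in Go of the "sane defaults, override only what you must" idea (all names invented):

```go
package main

import "fmt"

// Config holds the handful of knobs a user might plausibly touch;
// everything else stays a constant until proven otherwise.
type Config struct {
	Replicas       int
	MaxConnections int
}

// DefaultConfig is what the hypothetical 99% of users run with, untouched.
func DefaultConfig() Config {
	return Config{Replicas: 3, MaxConnections: 100}
}

// Apply lets the remaining 1% override only the knobs they care about.
func (c Config) Apply(overrides ...func(*Config)) Config {
	for _, o := range overrides {
		o(&c)
	}
	return c
}

func main() {
	cfg := DefaultConfig() // the common case: no knobs at all

	// The rare case: expose one knob, leave the rest alone.
	tuned := DefaultConfig().Apply(func(c *Config) { c.Replicas = 5 })

	fmt.Println(cfg, tuned)
}
```

The hard part isn't the mechanism, it's deciding which knobs make the 1% list.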
Don't get me wrong. I'd still probably build that as a monolith in Java instead of a thousand NodeJS services, but I can see how you end up with Kubernetes.
Let's be real, if you are old enough to get that reference without Googling, you probably would not have lasted that long at a hip fintech company anyways :-P