One thing that wasn't clear to me: if running npm to install dependencies on pod startup is slow, why not pre-build an image with the dependencies already installed and deploy that instead?
Loading the AWS SDK via `require` was slow, not installing it. As the sibling comment says, collapsing the many different SDKs into one helped reduce their loading time.
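(A rough sketch of how you'd verify this kind of thing. The thread is about Node's `require`, but the measurement is the same in any runtime; here it's Python, and the module names and timings are illustrative, not from the article.)

```python
import importlib
import time

def timed_import(name):
    """Time how long a module takes to import in this process."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"import {name}: {elapsed_ms:.2f} ms")
    return module

# A small stdlib module loads in well under a millisecond; a monolithic
# SDK (like the old all-in-one aws-sdk in Node) can take hundreds of ms,
# which is exactly the per-cold-start cost being discussed.
timed_import("json")
```

Profiling imports like this tells you which dependency dominates startup before you decide whether to split, lazy-load, or pre-warm.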
Pondering this question across every organization in the world and the countless opportunities for caching leads to dark places. It would be interesting to see CDN usage for Linux distributions before and after Docker builds became popular.
We moved shitloads to self-hosted Thanos. While this comes with its own drawbacks obv, I think it was worth it.
Is it possible the prior measurement happened during a high traffic period and the post measurement happened in a low traffic period?
Wouldn't it be cheaper to just keep a pod up with a service running?
If scalability is an issue, just plop a load balancer in front of it and scale up with load. But surely you can't need a whole pod for every single one of those millions of requests, right?
> Checkly is a synthetic monitoring tool that lets teams monitor their API’s and sites continually, and find problems faster.
>With some users sending *millions of request a day*, that 300ms added up to massive overall compute savings
No shit, right?
Spending serious engineering time to wrangle with the complexities of cloud orchestration is not something that should be taken lightly.
Cloud services should be required to have a black-box Surgeon’s General warning.
Until you get quite big, all necessary interactions with the cloud provider are just bills. It's just much easier, even though it is often expensive
I've spent the last decade or so wondering if the emperor was wearing clothes and not really getting what everyone else has been talking about. Which isn't to say that cloud is useless, but it's not the universal panacea that it was often sold as, and it seems that others are waking up to that.
Another past employer switched because we hit our scale-up limit and needed to start scaling out. A small refactor allowed us to scale out, and we moved to Azure's managed database, queue, and blob storage. The web frontend could scale based on connections. The queue and blob storage were slower than our previous approach, but it was better once we added autoscaling, since the slower speed was PER connection. Minimum scale was 5 instances, so that there was no bottleneck when scaling.
There are many reasons to go "cloud," but for most small businesses (or at least small departments of large organizations), cloud-first doesn't seem like a great option unless you have tens of thousands in credits each month. Just build your software and scale up first on-prem or at a datacenter - it is LOADS cheaper and more predictable.
Is that still true? In my experience it used to be; nowadays most organizations have an internal self-provisioning portal.
Bare metal and datacenter orchestration is leaps and bounds more complex. You're paying for the abstraction.
Managing software on bare metal is more complex in exactly one case: when engineers only know cloud abstractions.
Orchestration is dead simple and mostly automated using off the shelf, open source tools. If a server goes down, it’s a few minutes to replace it. The cloud based hosting is a fixed cost each month - no usage based surprises.
Meanwhile, for clients, I've spent huge amounts of time fixing broken Kubernetes setups and hit serious design constraints because of usage-based pricing on their PaaS infrastructure, like being unable to run complex queries against a database.
I wouldn’t think twice about the same query on our in house hosted DBs on $400 servers.
I’ll add that this is a really good write-up! Love this comment:
“There is no harm in using boring/simple methods if the results are right.”
Disclaimer: Checkly founder.
Of course, given stable demand and known requirements, bare metal can be a great option. But it’s not strictly better than public cloud hosting.
I think it’s just been long enough that people have forgotten the limitations of bare metal engineering.
The same 3s runtime startup cost (and need for more hardware) would happen if they were running their own servers.
Routinely: oops, our API usage slipped and we mistakenly paid more than it would have cost to staff someone to prevent it.
Keep fucking up, tech industry. My job role depends on it (SRE)
Cloud stuff is really alluring at first. It works for a while, then the costs climb above what it would cost to run it yourself. Cloud is not a "set it and forget it" sort of thing; you have to manage it too.
Now I write ML code and deploy it in Docker on GCP, and it's the same issues all over again. You import pandas-gbq and pretty much the entire Google BigQuery set of libraries becomes part of the build. Throw in a few standard ML libs and soon you are looking at upwards of 2 seconds of Cloud Run startup time. You pay a premium for autoscaling, for keeping one instance warm at all times, for your monitoring and metrics, on and on. I have yet to see startup times below 500ms. You can slice the cake any which way, you still pay the startup-cost penalty. Quite sad.
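One common mitigation (a minimal sketch, not anyone's actual code; the handler and names are hypothetical, assuming a pandas-gbq-style dependency in the image): defer the heavy imports so only requests that actually need them pay the cost, instead of every cold start.

```python
# Module-level imports of pandas_gbq and friends run on every cold
# start. Moving them inside the handler means health checks and
# scale-up probes never pay the import cost.

def handle_request(query=None):
    if query is None:
        return "ok"  # liveness/readiness checks skip the heavy path
    # Lazy import: only the first request that actually hits BigQuery
    # pays for loading pandas_gbq (assumption: it's installed in the image).
    import pandas_gbq
    return pandas_gbq.read_gbq(query)
```

This doesn't remove the import cost; it just moves it off the cold-start path that autoscaling keeps hitting, which pairs with (rather than replaces) keeping a warm instance around.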