*Services:*
Kubernetes expects DNS records like {service}.default.svc.cluster.local. To achieve this, they will have to put some custom DNS records on the "pod" (Fly Machine), resolved from its metadata. Not impossible, but something that has to be taken into account.
*StatefulSets:*
This has 2 major obstacles:
The first is dealing with disks. K8s expects that it can reattach disks to pods wherever they get rescheduled (e.g. mapping an EBS volume to whichever EC2 node the pod lands on). The problem here is that Fly has a fundamentally different model: a volume lives on one physical machine. It means the scheduler either has to decide not to schedule a pod because it can't get the machine the disk lives on, or it has to drop the guarantee that the disk is the same one. While this does exist as a setting currently, the former is a serious issue.
The second major issue is again DNS. StatefulSets have ordinal pod names (e.g. {ss-name}-{0..n}.default.svc.cluster.local). While this can be achieved with machine metadata and custom DNS on the machine, it means either running a local DNS server to "translate" DNS records to the Fly nomenclature, or constantly updating local services on machines to tell them about new records. Both will incur some penalty.
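To make the translation problem concrete, here's a small sketch (plain Python, illustrative names) of the stable DNS records Kubernetes expects a headless service to expose for a StatefulSet, and which any shim on Fly would have to synthesize from machine metadata:

```python
# Sketch: the per-replica DNS names Kubernetes guarantees for a
# StatefulSet behind a headless service. A translation layer would
# have to keep each of these resolving to the same logical pod
# across reschedules. Names here are illustrative, not a real app.

def statefulset_dns_names(ss_name, service, namespace="default",
                          replicas=3, zone="cluster.local"):
    """Return the stable, ordinal pod DNS names for a StatefulSet."""
    return [
        f"{ss_name}-{i}.{service}.{namespace}.svc.{zone}"
        for i in range(replicas)
    ]

names = statefulset_dns_names("db", "db-headless")
# The service itself would resolve as db-headless.default.svc.cluster.local;
# each replica additionally gets a stable ordinal name:
print(names[0])  # db-0.db-headless.default.svc.cluster.local
```

The hard part isn't generating these strings, of course; it's that each record has to stick to one logical pod no matter which machine it lands on, which is exactly the bookkeeping the comment describes.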
If so, this is very attractive. When using GKE, we had to do a lot of work to get our Node utilization (the percentage of resources we had reserved on a VM that was actually occupied by pods) higher than 50%.
Curious what happens when you run “kubectl get nodes”: does it lie to you, or call each region one Node?
I would think they should highlight that a lot more in the product announcement!
Interestingly, there are already multiple providers of virtual-kubelet. For example, Azure AKS has virtual nodes where pods are Azure Container Instances. There’s even a Nomad provider.
> So that’s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.
So probably a cluster per region. You could theoretically spin up multiple virtual-kubelets though and configure each one as a specific region.
> Because of kine, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.
This would mean the control plane is on a single server, without high availability? Although I suppose there really isn't any state stored, since they're just proxying requests to the Fly Machines API. But still, if that machine went down, your kubectl commands wouldn't work.
How is this the scheduler's fault? Is this not just your resource requests being wildly off? Mapping directly to a "fly machine" just means your "fly machine" utilization will be low.
Even if my Pods were perfectly sized, a large percentage of the VMs running the Pods was underutilized because the Pods were poorly distributed across the Nodes.
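The distribution problem described above is easy to reproduce: even perfectly sized pods can strand capacity depending on placement order. A toy first-fit sketch (all numbers illustrative):

```python
# Toy bin-packing: place pod CPU requests onto fixed-size nodes with
# first-fit, then measure node utilization. Numbers are illustrative;
# the real scheduler is more sophisticated, but the effect is the same.

NODE_CPU = 4.0  # vCPUs per node

def first_fit(pods, node_cpu=NODE_CPU):
    """Assign pod requests to nodes first-fit; return per-node usage."""
    nodes = []
    for req in pods:
        for i, used in enumerate(nodes):
            if used + req <= node_cpu:
                nodes[i] += req
                break
        else:
            nodes.append(req)  # no node had room; open a new one
    return nodes

pods = [1.5, 1.5, 1.5, 2.5, 2.5, 2.5]   # arrival order
nodes = first_fit(pods)                  # -> 4 nodes
utilization = sum(nodes) / (len(nodes) * NODE_CPU)  # 0.75

# The same pods packed largest-first fit on 3 nodes at 100%:
packed = first_fit(sorted(pods, reverse=True))      # -> 3 nodes
```

Same pods, same sizes, 75% vs. 100% utilization purely from placement order; which is why "your requests are wrong" isn't the whole story.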
Someone else linked GKE Autopilot which manages all of that for you. So if you’re using GKE I don’t see much improvement, since you lose out on k8s features like persistent volumes and DaemonSets.
Same. A while back you had to install cluster-autoscaler and set it to aggressive mode; GKE has this option on setup now. That said, I think anyone who's had to do this stuff knows that just using a cluster-autoscaler is never enough. I don't see this being different for any cluster; it's more a consequence of your workloads and how they're partitioned (without partitioning, you'll have real trouble getting high utilization).
> A Fly Volume is a slice of an NVMe drive on the physical server your Fly App runs on. It’s tied to that hardware.
Does the k8s have any kind of storage provisioning that allows pods with persistent storage (e.g. databases) to just do their thing without me worrying about it or do I still need to handle disks potentially vanishing?
I think this is the only hold-up that stops me actually using Fly. I don't know what happens if my machine crashes and is brought back on different hardware. Presumably the data is just not there anymore.
Is everyone else using an off-site DB like Planetscale? Or just hoping it's an issue that never comes up, w/ backups just in case? Or maybe setting up full-scale DB clusters on Fly so it's less of a potential issue? Or 'other'?
We back up volumes to off-net block storage, and, under the hood, we can seamlessly migrate a volume to another physical server (the way we do it is interesting, and we should write it up, but it's still also an important part of our work sample hiring process, which is why we haven't). So your app could move from one physical to another; the data would come with it.
On the other hand: Fly Volumes are attached storage. They're not a SAN system like EBS, and they're not backed onto a 50-9s storage engine like S3. If a physical server throws a rod, you can lose data. This is why, for instance, if you boot up a Fly Postgres cluster here and ask us to do it with only one instance, we'll print a big red warning. (When you run a multi-node Postgres cluster, or use LiteFS Cloud with SQLite, you're doing at the application layer what a more reliable storage layer would do at the block layer.)
But once again, for many of my projects, I still need my outbound IPs to resolve to a specific country. I can't have them all resolve to Chicago, US in nondeterministic ways.
I would be willing to pay an additional cost for this but even with reserved IPs, I am given IPs that are labelled as Chicago, US IPs by GeoIP providers even for non US regions.
What about them makes for a good trade-off when considering the many other vendors?
We were surprised at how FKS turned out, which is part of why we decided to launch it as a feature and all of why we wrote it up this way. That's all.
It might now be more accurate to say "You were not a k8s vendor, but now you are", based on:
> If K8s is important for your project, and that’s all that’s been holding you back from trying out Fly.io, we’ve spent the past several months building something for you.
If it's fundamentally different, maybe you shouldn't call it Kubernetes, perhaps a Kubernetes API compatible alternative?
fwiw/context, I use GKE and also many of the low-level services on GCP
Is Fly supposed to be simpler for the average developer?
I love Kubernetes because the .yaml gives you have the entire story, but I'd _really_ love to get that experience w/o having to run Kubernetes. (Even in most managed k8s setups, I've found the need to run lots of non-managed things inside the cluster to make it user-friendly.)
Sometimes you just want to run k8s without thinking too much about it, without having all the requirements that gcp have answers to.
Also, there is the market for talent, which is non-existent for fly.io's technology if it's not open source (I see what you did there, Google): you'll have to teach people how your solution works internally, and congratulations, now you have a global pool of 20 (maybe 100) people who can improve it (if you have really deep pockets, maybe you can get 5 PhDs). Damn, universities right now may already have classes about Kubernetes for undergrad students. Will they teach your internal solution?
So, if a big part of your problem is already solved by a gigantic corporation investing millions to create a pool of talented people, you'd better make use of that!
Nice move, fly.io!
I love fly.io for rethinking some of the problems.
The press release maps pods to machines, but provides no mapping of pod containers to a Fly.io concept.
Are multiple containers allowed? Do they share the same network namespace? Is sharing PID namespace optional?
Having multiple containers per pod is a core functionality of Kubernetes.
It seems that Fly.io Machines support multiple processes for a single container, but not multiple containers per Machine [0]. This means one container image per Machine and thus no shared network namespace across multiple containers.
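For reference, the feature at issue is that all containers in a pod share one network namespace, so an app can reach its sidecar on localhost. A minimal two-container pod spec, shown here as the dict equivalent of the YAML (image names are illustrative):

```python
# Minimal two-container pod spec (dict form of the YAML manifest).
# Both containers share a network namespace, so "app" can reach
# "proxy" at localhost:8080. This is the property that doesn't map
# onto one container image per Machine. Images are illustrative.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {"name": "app", "image": "example/app:1.0"},
            {"name": "proxy", "image": "example/proxy:1.0",
             "ports": [{"containerPort": 8080}]},
        ],
    },
}
```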
[0] https://kubernetes.io/blog/2023/08/25/native-sidecar-contain...
> When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.
Why a single machine? Is it because this single fly machine is itself orchestrated by your control plane (Nomad)?
> ...we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system based on eBPF). But the ideas are the same.
very cool, is this similar to how Cilium works?
There's so much power on the platform with Flycast, LiteFS and other clever ways to work with containers. If it was 90% stable I'd consider it a huge win.
Once you start deploying in SIN/CDG etc you start to get really weird instability (and this is on v2 machines).
To me, I'd imagine kubernetes on fly as running kind (kubernetes in docker) with fly converting the docker images to firecracker images OR "normal" kubernetes api server running on one machine then using CAPI/or a homegrown thing for spinning up additional nodes as needed.
So, what's the deal here? Why k3s + a virtual kubelet?
The thought here is: Fly.io already does a lot of the things any K8s distribution would do. If you were to boot up a complete K8s distribution on your own Fly Machines, running oblivious to the fact that they were on Fly.io, you'd be duplicating some of the work we'd already done (that's fine, maybe you like your way better, but still, bear with me).
So, rather than setting up a "vanilla" K8s that works the same way it would if you were on, like, Hetzner or whatever, you can instead boot up a drastically stripped down K8s (based on K3s and Virtual Kubelet) that defers some of what K8s does to our own APIs. Instead of a cluster of scheduling servers synchronized with Raft, you just run a single SQLite database. Instead of bin-packing VMs with Docker and a kubelet, you just run everything as an independent Fly Machine.
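A Virtual Kubelet is essentially a translation shim: it registers as a "node" with the control plane and turns pod lifecycle calls into calls against some other backend API. A rough sketch of the shape of that mapping (this is not Fly's actual code; the machine-side field names are simplified and hypothetical):

```python
# Sketch of the virtual-kubelet idea: translate a pod spec into a
# create request against a machines-style API. Field names on the
# machine side are simplified and hypothetical, not the real Fly
# Machines API schema.

def pod_to_machine(pod_spec, region="ord"):
    """Map a single-container pod spec onto a machine create request."""
    container = pod_spec["spec"]["containers"][0]
    return {
        "name": pod_spec["metadata"]["name"],
        "region": region,
        "config": {
            "image": container["image"],
            "env": {e["name"]: e["value"]
                    for e in container.get("env", [])},
        },
    }

pod = {
    "metadata": {"name": "web-0"},
    "spec": {"containers": [{"name": "web",
                             "image": "example/app:1.0",
                             "env": [{"name": "PORT",
                                      "value": "8080"}]}]},
}
machine = pod_to_machine(pod)
```

The point of the design is what the function *doesn't* do: no bin-packing, no node agent, no Raft; scheduling and placement are deferred to the platform underneath.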
We took the time to write about this because it was interesting to us (I think we expected a K8s to be more annoying for us to roll, and when it was easier we got a lot more interested). There are probably a variety of reasons to consider alternative formulations of K8s!
Not sure what the implications of that are in practice but sounds interesting.
Instead, I think the system components should expose themselves as independent entities, and grant other system components the ability to use them under criteria. With this model, any software which can use the system components' interfaces can request resources and use them, in whatever pattern they decide to.
But this requires a universal interface for each kind of component, loosely coupled. Each component then needs to have networking, logging, metrics, credentials, authn+z, configuration. And there needs to be a method by which users can configure all this & start/stop it. Basically it's a distributed OS.
We need to make a standard for distributed OS components using a loosely coupled interface and all the attributes needed. So, not just a standard for logging, auth, creds, etc, but also a standard for networked storage objects that have all those other attributes.
When all that's done, you could make an app on Fly.io, and then from GCP you could attach to your Fly.io app's storage. Or from Fly.io, send logs to Azure Monitor Logs. As long as it's a standard distributed OS component, you just attach to it and use it, and it'll verify you over the standard auth, etc. Not over the "Fly.io integration API for Log Export", but over the "Distributed OS Logging Standard" protocol.
We've got to get away from these one-off REST APIs and get back to real standards. I know corporations hate standards and love to make their own little one-offs, but it's really holding back technological progress.
K8s is popular because it's free, has a lot of bells and whistles, and was made by Google. Otherwise nobody would use it. It's basically a larger, slightly less crappy Jenkins.
Corporations create standards all the time, either directly or through standards bodies that they also fund. You can already push logs with syslog, or transform them with Beats and then push them; you can already attach storage from elsewhere, etc. It's just often a bad idea for performance and data-movement-cost reasons.
I don't see the major technological progress this holds back, and if you think technological progress is a measure of how much corporations hate standards, then by that logic, based on the last 50 years of utterly insane progress, they must love standards.
This aside, Fly is in a position to build its own alternative to K8s and Nomad from scratch, so maybe it will?
The ordinary way someone would boot up an app on Fly.io is to visit a directory in their filesystem with a Rails or Django or Express app or something, or a Dockerfile, and just type `flyctl launch`. No K8s will be involved in any way. You have to go out of your way to get K8s on Fly.io. :)
I don't know why anyone would be like "here's a container execution platform, let me go ahead and use their fake Pods API instead of their official API".
Right now, the immediate things you'd get out of using FKS are:
* The declarative K8s style of defining an app deployment, and some of the K8s mechanics for reconciling that declaration to what's actually running. We did most of this stuff before when we were backed on Nomad, but less of it now with Fly Machines. If you missed having a centralized orchestrator, here's one.
* Some compatibility with K8s tooling (we spin up a cluster, spit out a kubeconfig file, and you can just go to town with kubectl or whatever).
This is absolutely not going to let you do everything you can possibly do with K8s! Maybe we'll beef it up over time. Maybe not many people will use it, because people who want K8s want the entire K8s Cinematic Universe, and we'll keep it simple.
Mostly: we wrote about it because it was interesting, is all that's happening here.
I think you asked a super good question, and "I don't know, you might be right" is our genuine answer. Are there big things this is missing for you? (Especially if they're low-hanging fruit). I can (sort of) predict how likely we are to do them near term.
It is Kubernetes since they are running k3s as the control-plane. It’s not just an implementation of the Pod API, it’s an implementation of kubelet which handles logs/exec/etc APIs. The rest of the Kubernetes API is part of the control-plane on k3s.
The only major issue I see is persistent volume support, but persistent volumes in Kubernetes were always a bit flaky and I’ve always preferred to use an externally managed DB or storage solution.
Was there an internal project name for this? Fubernetes? f8s? :D
What a strange way to admit they were wrong.
My most important one is this: can I build a distributed k8s cluster with this?
I mean having fly machines in Europe, US and Asia acting as a solid k8s cluster and letting the kube scheduler do its job?
If yes then it is better than what the current cloud offerings are, with their region-based implementation.
My second question is obviously how is the storage handled when my workload migrates from the US to Europe: so I still profit from NVME speeds? Is it replicated synchronously?
Last but not least: does it support RWX (ReadWriteMany) semantics?
If all the answers are yes, kudos, you just solved many folk’s problems.
Stellar article, as usual.
> More to come! We’re itching to see just how many different ways this bet might pay off. Or: we’ll perish in flames! Either way, it’ll be fun to watch.
is like nails on a chalkboard for me.
I don't think it's in poor taste to acknowledge exactly what everyone should understand and be prepared for.
> But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn’t mean it’s not a great fit for what you’re building. We’ve been clear about that all along, right? Sure we have!
which already starts the post in a bad space for the reader. I have cognitive whiplash from what is intended: "We DON'T like Kubernetes UNTIL WE DO, but then WE MIGHT NOT IN THE FUTURE." Clear meaning is far more appreciated.
This was a bet. We're bullish about this bet! Even without K8s, having core scheduling be "less reliable" but with a simpler, more responsive interface puts us in a position to do some of the "move heaven and earth" work that K8s and Nomad do in simpler components (like: we can write Elixir code to drive the scheduler).
But it might not pay off! That's what makes it a bet.
† (see: comments on this thread asking why we overengineered and wrote our own version of stuff; the expectation that you'd run a platform like Fly.io on standard K8s or Nomad is pretty strong!)
That is quite the opposite of "simple". That is, in fact, overly complex and over-engineered.
That said, it's a questionable design choice when you get to a hyperscale environment, since all the primitives are extremely opinionated and have design and scalability issues with service discovery, networking, and so on. All the controllers had to be rewritten, we had to roll our own deployment system, our own service discovery system, our own load balancing, and so on. But if you reach this level, you're probably making a lot of money and can figure out how to solve your problems.
Kubernetes is a great fit for even extremely simple applications - assuming you have dozens to keep track of and dozens of developers who want to make changes to them.
The real problem is that the point it becomes attractive to have something like Kubernetes is not too far from the point where Kubernetes becomes an overly-complex mess of disparate parts.
Not needing a bloated black box sysadmin framework (aside from Linux itself, which is plenty bloated and over engineered) is a huge time saver. And the eBPF libs have a lot of eyes on them.
IMO sysadmin and devops are done for. They lasted this long to “create jobs”.
Time will tell if embracing the complexity of Kubernetes was a good play for them or not. But, in all honesty, I'm pretty sad to see this happening, although I'm sure they had their reasons.
At FarmLogs (yc 12) we had a pretty righteous gitops (homegrown) kube platform running dozens of microservices. We would not have been able to move as quickly as we did and roll out so many different features without it. This was back when people had just started to adopt it. Mesos was still a contender (lmao). We were polyglot too - python/clojure mixture. Heck, we even ran an ancient climate model called APSIM that was built in c#/mono, required all kinds of ancient fortran dependencies etc and it worked like a charm on kube thanks to containers. We had dedicated internal load balancers behind our VPN for raw access to services and endpoints, like "microservice.internal.farmlogs.com" (this was before istio, fabric networks, all the incredible progress that exists now)
I recall Brendan Burns asking me to write up a blog post for the Kube blog about our success story, but unfortunately was so saddled with product dev work and managing the team that I never found time for it.
I will absolutely adopt K8s again one day (very soon), but you need to know how to harness its capabilities and deploy it correctly. Build your own Heroku that fits your business. Use the Kube API directly. It's really not hard. It gets hard due to all the crap in the ecosystem (Helm, YAML files). Hitting the API directly means no YAML =)
I am stoked to see Fly offering this.
I've also used Chef, custom RPM packages and classic Unix startup scripts. And I'm probably forgetting some.
And honestly? Kubernetes can be really great. Especially if you:
- Read enough to understand the split between pods/replication controllers/deployments, which is a bit unusual, and the fact that "services" are basically a name lookup system. This split is weird, but it's not that hard to figure out.
- Pay someone for a quality managed Kubernetes.
- Don't get clever with the networking overlays.
I especially like the way that Kubernetes allows me to deploy almost anything with a short YAML file, and the fact that I never need to worry about individual servers at all.
Now, I wouldn't use Kubernetes if I could get away with a "Heroku like" system. But for anything more complicated than that, Kubernetes can be pretty simple and reliable. Certainly I'd take Kubernetes over a really complex Terraform setup.
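For what it's worth, the "short YAML file" experience described above really is short. A minimal Deployment, shown here as the equivalent Python dict (name, labels, and image are illustrative):

```python
# Minimal Deployment manifest (dict form of the short YAML you'd
# hand to `kubectl apply`). Name, labels, and image are illustrative.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [
                    {"name": "web", "image": "example/web:1.0",
                     "ports": [{"containerPort": 8080}]},
                ],
            },
        },
    },
}
```

Roughly twenty lines, and the cluster takes care of placement, restarts, and rollout; no individual servers in sight, which is the appeal being described.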