If you don't believe me, take it from someone who should know what they're talking about: https://twitter.com/kelseyhightower/status/96341350830081229...
What a lot of naysayers leave out, or choose to ignore, is that the challenges of running stateful apps on Kubernetes mirror those of running stateful apps anywhere. If you run Postgres on a VM, for example, you're completely reliant on that VM staying up -- this is no different on Kubernetes. Some will also point out the dangers of co-locating software like Postgres on the same machine as many other containers, since they compete for CPU and I/O; but this risk exists on any shared machine, and Kubernetes provides plenty of tools (affinities/anti-affinities, node selectors) to isolate containers onto dedicated machines. And so on. Containers bring some new challenges, but Kubernetes meets them quite well.
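To make the isolation point concrete, here is a minimal sketch of what those tools look like: a hypothetical Postgres pod that pins itself to dedicated database nodes via a nodeSelector and refuses to schedule next to another Postgres pod (the `workload-class` label is an assumption, not a built-in):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  nodeSelector:
    workload-class: database      # hypothetical node label for dedicated DB machines
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: postgres
          topologyKey: kubernetes.io/hostname   # never two postgres pods on one node
  containers:
    - name: postgres
      image: postgres:10
```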
What specific issues do you have? I'm not sure I understand the point about routing. I also don't understand what the "pain" of stateful sets refers to.
1. While "we already rely on the VM staying up", with k8s we rely on both the VM staying up and the Kubernetes infra on top of that VM staying up.
2. Maintaining a complex stateful system on k8s _requires_ having and maintaining an operator for that system.
3. You reduce your options when it comes to tweaking systems. E.g. local SSDs on GCP are available in SCSI and NVMe flavors, while GKE supports only SCSI; fine-tuning and other tasks on the underlying VMs that would be trivial with Chef or similar become harder.
4. Enterprise systems like Splunk explicitly state that their support does not cover Splunk clusters running on Kubernetes.
5. As mentioned, you can't even resize a disk without going through a dance of operations that can take days or weeks when you're working with something like Kafka at scale.
6. Some stateful services, like Zookeeper, require stable identities, and this is far from perfect on Kubernetes.
7. More complex traffic routing that involves additional fees, because to achieve (6) you sometimes need to expose things publicly.
That's just off the top of my head.
Disclaimer: We run 10+ stateful services on Kubernetes.
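On the disk-resize point (5): newer Kubernetes versions can expand a PVC in place if the StorageClass opts in, though the feature was alpha/beta for a long time and some volume types still need a pod restart for the filesystem to grow -- which is exactly the multi-day dance at Kafka scale. A sketch (class and claim names are assumptions):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: resizable-ssd            # hypothetical class name
provisioner: kubernetes.io/gce-pd
allowVolumeExpansion: true       # without this flag, PVCs of this class cannot grow
---
# Later, grow an existing claim by bumping the request; the controller
# resizes the underlying disk, and the filesystem is expanded online
# or on the next mount, depending on the volume type.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-data-0             # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: resizable-ssd
  resources:
    requests:
      storage: 2Ti               # previously 1Ti
```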
https://twitter.com/kelseyhightower/status/96347131657256140...
He said "You still need to worry about database backups and restores. You need to consider downtime during cluster upgrades."
These things are totally true. K8s doesn't automate backups by default (though it can), and if you need to take K8s down for upgrades, then everything is down. For its part, though, CockroachDB supports rolling upgrades with no downtime on Kubernetes.
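The rolling-upgrade mechanics here are plain StatefulSet features: a RollingUpdate strategy plus an optional partition, so you can canary the new version on one pod before rolling the rest. A sketch, with names and image tags assumed:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cockroachdb              # name assumed
spec:
  serviceName: cockroachdb
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2               # only pods with ordinal >= 2 get the new image;
                                 # lower the partition step by step to roll the rest
  selector:
    matchLabels:
      app: cockroachdb
  template:
    metadata:
      labels:
        app: cockroachdb
    spec:
      containers:
        - name: cockroachdb
          image: cockroachdb/cockroach:v2.0.0   # bump this tag to upgrade
```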
As for routing, that is a tough problem if you want to run K8s across multiple regions, though we have some folks who've done it.
And if one finds setting up StatefulSets challenging, we have a tutorial on how to do it written by a former Kubernetes engineer: https://www.cockroachlabs.com/docs/stable/orchestrate-cockro...
There are projects that help you run databases on Kubernetes and also back up much of what's hosted there:
- Automatic CephFS for your cluster -> https://rook.io/docs/rook/master/
- Backups for cluster resources and volumes -> https://github.com/heptio/ark
- Spin up dynamic postgres clusters -> https://github.com/zalando-incubator/postgres-operator
Databases are just applications with different resource needs. Please stop pushing the notion that they can't be run in containers or container orchestration systems. Databases are just programs. If the substrate running your containers doesn't reliably support flock or fsync or something else your database needs, then pick a better substrate that does -- container runtimes and Kubernetes don't stand in your way these days.
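If you want to sanity-check a substrate before trusting it with a database, the probes are cheap. A quick (Linux-only) Python sketch that verifies flock and fsync actually succeed on a given directory -- the function name is mine, not from any library:

```python
import fcntl
import os
import tempfile

def substrate_supports_locking_and_sync(directory: str) -> bool:
    """Probe whether files in `directory` support flock() and fsync().

    Some network/overlay filesystems return errors for one or both,
    which is a red flag for hosting a database there.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # exclusive advisory lock
        os.write(fd, b"probe")
        os.fsync(fd)                     # force the write to stable storage
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)

print(substrate_supports_locking_and_sync(tempfile.gettempdir()))
```

Point it at the mount backing your data volume; a `False` there is worth knowing before the database finds out for you.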
They just don't act like other services, and require more care. That's about it. I think that's what Kelsey is referring to, you can't just treat them the same as other pods.
5% seems like a surprisingly large overhead. What is k8s doing in this situation that would have that kind of impact?
We haven't yet evolved Kubernetes services to prefer specific cores and avoid app workloads (although CPU management is getting closer).
Docker is also somewhat hefty memory-wise, and you may contend on disk if you're not careful.
5% seems pretty reasonable to me in general, just as a consequence of having something heavier weight on the same node managing workloads.
I'll note, though, that the 5% number is when using host networking for both Cockroach and the client load generator. Using GKE's default cluster networking through the Docker bridge is closer to 15% worse than running directly on equivalent non-Kubernetes VMs.
For example, if you ran CockroachDB on a bare-metal cluster of 3 nodes with 30TB of raw capacity, 15TB is lost to RAID10 and another 10TB to database-level replication, leaving you with 5TB of effective capacity -- a 1/6 dilution of your initial capacity.
If you ran CockroachDB on a replicated network volume with a replication factor of three, it gets worse. Of 30TB of disks, you'd lose 20TB to volume replication and ~6.67TB to CockroachDB replication, leaving you with ~3.33TB of effective capacity, or a 1/9 dilution. If those disks were also configured with RAID10, your effective capacity would drop to a 1/18 dilution.
You could achieve a 1/3 dilution -- the effective minimum for a database with replication factor three -- by skipping RAID and volume replication entirely, but you increase the impact of a disk failure, in that it would take much, much longer to recover the cluster.
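The arithmetic above, spelled out (all numbers from the comment: 30TB raw, RAID10 halves capacity, replication factor 3 at both the volume and database layer; the helper function is just for illustration):

```python
RAW_TB = 30.0

def effective(raw_tb, raid10=False, volume_rf=1, db_rf=3):
    """Effective capacity after each layer of redundancy."""
    usable = raw_tb
    if raid10:
        usable /= 2          # RAID10 mirrors every disk
    usable /= volume_rf      # replicated network volume (e.g. RF=3)
    usable /= db_rf          # database-level replication (CockroachDB default RF=3)
    return usable

# Bare metal + RAID10 + CockroachDB: 30 -> 15 -> 5 TB, a 1/6 dilution
print(effective(RAW_TB, raid10=True))                 # 5.0

# Replicated network volume (RF=3) + CockroachDB: ~3.33 TB, a 1/9 dilution
print(effective(RAW_TB, volume_rf=3))

# Both RAID10 and a replicated volume: ~1.67 TB, a 1/18 dilution
print(effective(RAW_TB, raid10=True, volume_rf=3))

# No RAID, no volume replication: 10 TB, the 1/3 floor for RF=3
print(effective(RAW_TB))                              # 10.0
```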
I understood that a team at google developed k8s but google doesn't actually run it for their "google-scale" workloads. Am I misinformed?
> [kubernetes is] a simplified clone of Google’s internal borg system
https://medium.com/@steve.yegge/honestly-i-cant-stand-k8s-48...
The "original" Service Fabric is a high-level framework which requires invasive source code changes (you can't just drop an existing app on top of it), but gives you lots of benefits (scale, reliability etc) if you make the effort.
Recently container-based platforms - Docker, Kubernetes, etc - have come along with a different tradeoff: better compatibility with existing applications in exchange for less magical benefits. That approach is getting much more traction, and I think internally at Microsoft there is some infighting between the "Service Fabric camp" and the "Containers camp". One consequence of the infighting is that Service Fabric is extending its scope to include features like "container support". It's not clear to what extent that is done in collaboration with the "container people", or as a way to bypass them. I think they are still trying to decide whether to embrace Kubernetes or replicate the functionality in-house. My prediction is that the container-based approach will win, but it will take time for the politics to fully play out. In the meantime things will continue to be confusing.
Bottom line: when evaluating Service Fabric, watch out for confusing and inconsistent use of the brand. It's a common pattern with large vendors - for example IBM with "Bluemix", SAP with "Hana", etc.
As zapita said, Service Fabric now handles containers but I think it is just because containers became trendy and FOMO kicked in.
Where Service Fabric is decades ahead of the container orchestration solutions is as a framework for building truly stateful services, meaning the state is entirely managed by your code through SF, not externalized to a remote disk, Redis, some DB, etc.
It offers high-level primitives like reliable collections [0], as well as very low-level primitives like a replicated log for implementing custom replication between replicas [1]. I feel this is not advertised enough publicly, which is unfortunate because it is a key differentiator for Service Fabric that competitors won't have for a while, if ever, because it takes a completely opposite approach: containers are all about isolation, being self-contained and platform-independent, while SF stateful services are deeply integrated with Service Fabric.
[0] https://docs.microsoft.com/en-us/azure/service-fabric/servic...
[1] https://docs.microsoft.com/en-us/dotnet/api/system.fabric.fa...