Since some number of persistent connections will get force-terminated on scale-down or node-replacement events...
Cilium and eBPF look like a pretty good solution to this, though, since you can then advertise your pods directly on the network and load balance across those instead of across every node.
There can be a difference, if your LoadBalancer-type service integration is well implemented. The externalTrafficPolicy knob determines whether all nodes should attract traffic from outside, or only the nodes that run pods backing the service. For example, MetalLB (which attracts traffic by announcing /32 routes over BGP to configured external peers) will do this correctly.
Within the cluster itself, only nodes which have pods backing a given service will be part of the iptables/ipvs/... Pod->Service->Pod mesh, so you won't end up with scenic routes anyway. Same for Pod->Pod networking, as these addresses are already clustered by host node.
https://kubernetes.io/docs/concepts/services-networking/serv...
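A minimal sketch of the knob being discussed, as a Service manifest built as a plain Python dict (the same shape you'd write in YAML). The service name, labels, and ports here are made up for illustration:

```python
# Hypothetical LoadBalancer Service. externalTrafficPolicy controls which
# nodes attract external traffic:
#   "Cluster" (default): every node attracts traffic and may add an extra hop.
#   "Local": only nodes with ready backing pods attract traffic, which also
#            preserves the client source IP.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web"},               # hypothetical name
    "spec": {
        "type": "LoadBalancer",
        "selector": {"app": "web"},            # hypothetical pod label
        "ports": [{"port": 80, "targetPort": 8080}],
        "externalTrafficPolicy": "Local",
    },
}

print(service["spec"]["externalTrafficPolicy"])
```

With "Local", an implementation like MetalLB only announces the service address from nodes that actually have backing pods.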
May I ask what one might use in an AWS cloud environment to provide that load balancer within a Region?
Does IPv6 address any of these issues? It seems to me that IPv6 is capable of providing every component in the system its own globally routable address, identity (mTLS perhaps) and transparent encryption with no extra sidecars, eBPF pieces, etc.
Each "Service" object provides (by default; it can be disabled) a load-balanced IP address that uses kube-proxy as you described, a DNS A record pointing to said address, DNS SRV records pointing to actual direct connections (whether NodePorts or PodIP/port combinations), plus API access to get the same data out.
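The DNS names in question follow a fixed scheme, which a small sketch can make concrete (service "web" in namespace "prod" are made-up examples; the name format itself is the standard one kube-dns/CoreDNS publishes):

```python
# The A record resolves to the service's load-balanced ClusterIP; SRV records
# exist per named port and also carry the port number in their answer.
def service_a_name(svc, ns, zone="cluster.local"):
    return f"{svc}.{ns}.svc.{zone}"

def service_srv_name(port_name, proto, svc, ns, zone="cluster.local"):
    return f"_{port_name}._{proto}.{svc}.{ns}.svc.{zone}"

print(service_a_name("web", "prod"))
# web.prod.svc.cluster.local
print(service_srv_name("http", "tcp", "web", "prod"))
# _http._tcp.web.prod.svc.cluster.local
```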
There are even replacement kube-proxy implementations that route everything through F5 load-balancer boxes, but they are less well known.
If you didn’t have k8s and just used an autoscaling group of VMs you would have the same issue…
Per https://blog.dave.tf/post/new-kubernetes/ , the way that this was solved in Borg was:
> "Borg solves that complexity by fiat, decreeing that Thou Shalt Use Our Client Libraries For Everything, so there’s an obvious point at which to plug in arbitrarily fancy service discovery and load-balancing. "
Which seems like a better solution, even if it requires some reengineering of apps.
There are two projects that enable writing eBPF with Rust [1][2]. I'm sure there is an equivalent with nicer wrappers for C++.
But once you have more than one language or framework, you need to write more and more of these libraries. And what happens if it's not just HTTP? What if you need to speak Redis, MySQL, or some random binary protocol? Do you write client libraries for those too? Maybe a company like Google has the scale to do this, but most orgs do not. But even then, what if you have to run some vendor-supplied code that you don't even have source for?
I agree with you that shoving more of this into the kernel isn't desirable, but libraries aren't great. Been there, done that, don't feel like doing it again. I'd rather stick with sidecars.
The second aspect is that this can get extremely expensive if your applications are written in a wide number of language frameworks. That's obviously different at Google where the number of languages can be restricted and standardized.
But even then, you could also link a TCP library into your app. Why don't you?
There is a set of sanctioned languages, and that is about it.
Also, as an aside, I think WebAssembly has the potential to shift this back. In a world where libraries and programs are compiled to WebAssembly, it doesn't matter what their source language was, and as such, the client library based approach might swing back into vogue.
This probably dictates a lot of Google's famous "not invented here" behavior, but most organizations can't afford to just write their entire toolchain from scratch and need to use applications developed by third parties.
But that doesn't work when you're trying to sell enterprises the idea of 'just move your workloads to Kubernetes!'. :)
I like that approach. If you use client libraries, new RPC mechanisms are "free" to implement (until you need to troubleshoot upgrades). It's also an argument against statically linking.
For instance, if running services on the same machine, io_uring can probably be used? (I'm a noob at this.) eBPF for packet switching/forwarding between different hosts, etc.
Also, there's no need for it to be a .so (or .dll/.dylib): just some quick IPC to send messages around. It can actually be better. For one, if their app is still buffering messages, my app can exit while theirs still runs. Or for security reasons (or for not having to think about them), etc. So still statically linked, but processes talking to each other. (Granted, this does not always work for some special apps, like audio/video plugins, but I think it works fine for the case above.)
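The "statically linked processes talking over IPC" idea can be sketched in a few lines: instead of linking vendor code into the app, run it as a separate process and exchange length-prefixed JSON messages over a socket pair. The message shape here is invented for illustration:

```python
import json, os, socket, struct

def send_msg(sock, obj):
    # Length-prefixed JSON framing: 4-byte big-endian length, then the payload.
    data = json.dumps(obj).encode()
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length))

parent_sock, child_sock = socket.socketpair()
if os.fork() == 0:                 # "their" process: a tiny echo service
    parent_sock.close()
    send_msg(child_sock, {"echo": recv_msg(child_sock)["hello"]})
    os._exit(0)                    # it can exit independently of ours
child_sock.close()
send_msg(parent_sock, {"hello": "world"})
reply = recv_msg(parent_sock)
os.wait()                          # reap the child process
print(reply)                       # {'echo': 'world'}
```

Either side can be restarted or upgraded without relinking the other, which is the point being made above.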
The sockets layer is becoming a facade that can guarantee additional things to applications compiled against it, and you've got dependency injection, so the application layer can be written agnostically and not care about any of those concerns at all.
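A minimal sketch of that facade-plus-injection shape, with all names hypothetical: the application depends only on an abstract dialer, and the platform injects whichever transport it wants (plain TCP, mTLS, a mesh proxy, or an in-memory fake for tests).

```python
from typing import Callable

class Connection:
    """Abstract transport facade the application codes against."""
    def send(self, data: bytes) -> None: ...
    def recv(self) -> bytes: ...

class FakeConnection(Connection):
    """In-memory transport; a real injected dialer might wrap TLS or a mesh."""
    def __init__(self):
        self.buf = b""
    def send(self, data):
        self.buf += data
    def recv(self):
        return self.buf

# The platform injects this; the app never constructs sockets itself.
Dialer = Callable[[str, int], Connection]

def fetch_greeting(dial: Dialer, host: str) -> bytes:
    conn = dial(host, 443)   # app code doesn't care which transport this is
    conn.send(b"hello")
    return conn.recv()

fake = FakeConnection()
greeting = fetch_greeting(lambda host, port: fake, "svc.internal")
print(greeting)              # b'hello'
```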
If some developers want to use some new language, they have to first put in the effort by a) demonstrating the business case for the new language and allocating resources to integrate it into the ecosystem, and b) porting all of the shared codebase to that new language.
This is basically the same set of languages people were writing 20 years ago and will probably be the same set of languages people will write in 20 years from now.
For that reason, I really do think that this is a temporary hack while client libraries are brought up to speed in popular languages. It is really easy to sell stuff with "just add another component to your house of cards to get feature X", but eventually it's all too much and you'll have to just edit your code.
I personally don't use service meshes. I have played with Istio but the code is legitimately awful, so the anecdotes of "I've never seen it work" make perfect sense to me. I have, in fact, never seen it work. (Read the xDS spec, then read Istio's implementation. Errors? Just throw them away! That's the core goal of the project, it seems. I wrote my own xDS implementation that ... handles errors and NACKs correctly. Wow, such an engineering marvel and so difficult...)
I do stick Envoy in front of things when it seems appropriate. For example, I'll put Envoy in front of a split frontend/backend application to provide one endpoint that serves both the frontend and backend. That way production is identical to your local development environment, avoiding surprises at the worst possible time. I also put it in front of applications that I don't feel like editing and rebuilding to get metrics and traces.
The one feature that I've been missing from service meshes, Kubernetes networking plugins, etc. is the ability to make all traffic leave the cluster through a single set of services, which can see the cleartext of TLS transactions. (I looked at Istio specifically, because it does have EgressGateways, but it's implemented at the TCP level and not the HTTP level. So you don't see outgoing URLs, just outgoing IP addresses. And if someone is exfiltrating data, you can't log that.) My biggest concern with running things in production is not so much internal security, though that is a big concern, but rather "is my cluster abusing someone else". That's the sort of thing that gets your cloud account shut down without appeal, and I feel like I don't have good tooling to stop that right now.
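The TCP-level vs HTTP-level distinction can be made concrete with a toy parser (hostnames and paths below are invented): an HTTP-aware egress gateway sees the request line and Host header, while TLS passthrough via CONNECT only ever shows host:port.

```python
def egress_log_entry(request_head: bytes) -> str:
    # Split off the request line, e.g. b"GET /path HTTP/1.1".
    line, _, rest = request_head.partition(b"\r\n")
    method, target, _ = line.split(b" ", 2)
    if method == b"CONNECT":
        # TLS passthrough: only the destination host:port is visible,
        # which is why a TCP-level egress gateway can't log URLs.
        return f"tunnel to {target.decode()}"
    headers = dict(
        h.split(b": ", 1) for h in rest.split(b"\r\n") if b": " in h
    )
    host = headers.get(b"Host", b"?").decode()
    return f"{method.decode()} http://{host}{target.decode()}"

http_entry = egress_log_entry(
    b"GET /report?user=42 HTTP/1.1\r\nHost: example.net\r\n\r\n"
)
tls_entry = egress_log_entry(b"CONNECT example.net:443 HTTP/1.1\r\n\r\n")
print(http_entry)  # GET http://example.net/report?user=42
print(tls_entry)   # tunnel to example.net:443
```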
Why not? AFAIK traces are sent from the instrumented app to some tracing backend, and a trace-id is carried over via an HTTP header from the entry point of the request until the last service that takes part in that request. Why a sidecar/mesh would break this?
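The propagation being described is just header copying, which a few lines can show. The header name is borrowed from the W3C Trace Context convention, but the value format here is simplified, not spec-compliant, and the service functions are hypothetical:

```python
import uuid

TRACE_HEADER = "traceparent"

def handle_request(inbound_headers: dict) -> dict:
    # Reuse the caller's trace id, or mint one at the entry point.
    # (Simplified id format; the real traceparent value has a fixed layout.)
    trace = inbound_headers.get(TRACE_HEADER) or f"00-{uuid.uuid4().hex}-01"
    # Every outbound call from this service carries the same trace id,
    # so the tracing backend can stitch the hops together. No sidecar needed.
    return {TRACE_HEADER: trace}

entry = handle_request({})           # entry point starts the trace
downstream = handle_request(entry)   # next hop propagates it unchanged
print(entry[TRACE_HEADER] == downstream[TRACE_HEADER])  # True
```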
Kinda semi-offtopic, but I am curious to know if anyone has used the identity part of a WireGuard setup for this purpose.
So say you have a bunch of machines all connected in a WireGuard VPN. And then instead of your application knowing host names or IP addresses as the primary identifier of other nodes, your application refers to other nodes by their WireGuard public key?
I use WireGuard but haven’t tried anything like that. Don’t know if it would be possible or sensible. Just thinking and wondering.
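A toy sketch of what "address by WireGuard public key" might look like (keys and tunnel addresses below are made up): keep a table mapping peer public keys to their tunnel IPs, which is roughly the association WireGuard itself maintains via each peer's AllowedIPs, and have the application resolve a pubkey instead of a hostname.

```python
# Hypothetical peer table: WireGuard public key -> tunnel address.
PEERS = {
    "xTIBA5rboUvnH4htodjb6e697QjLERt1NAB4mZqp8Dg=": "10.0.0.2",
    "TrMvSoP4jYQlY6RIzBgbssQqY3vxI2Pi+y71lOWWXX0=": "10.0.0.3",
}

def dial_peer(pubkey: str) -> str:
    """Resolve a peer's identity (its public key) to a routable address.

    The cryptographic guarantee comes from WireGuard itself: traffic to
    that tunnel IP can only be decrypted by the holder of the private key.
    """
    try:
        return PEERS[pubkey]
    except KeyError:
        raise LookupError("unknown peer; not in our WireGuard config") from None

print(dial_peer("xTIBA5rboUvnH4htodjb6e697QjLERt1NAB4mZqp8Dg="))  # 10.0.0.2
```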
So yeah, it's a model that can work. It's straightforward for us because we have a lot of granular control over what can get addressed where. It might be trickier if your network model is chaotic.
I long for the day where Kubernetes services, virtual machines, dedicated servers and developer machines can all securely talk to each other in some kind of service mesh, where security and firewalls can be implemented with "tags".
Tailscale seems to be pretty much this, but while it seems great for the dev/user-facing side of things (developer machine connectivity), it doesn't seem like it's suited for the service-to-service communication side? It would be nice to have one unified connectivity solution with identity-based security rather than, e.g., Consul Connect for services, Tailscale / WireGuard for dev machine connectivity, etc.
That's exactly what Scalable Group Tags (SGTs) are -
https://tools.ietf.org/id/draft-smith-kandula-sxp-07.html
Cisco implements this as a part of TrustSec
In addition, it can also be used to enforce policy based on service-specific keys/certificates.
There's earlier projects like CJDNS which provide pubkey-addressed networking, but they're limited in usability as they route based on a DHT.
Disclosure: founder of company which sells SaaS on top of Ziti FOSS.
* https://ziti.dev/blog/bootstrapping-trust-part-5-bootstrappi...
Are they implementing an HTTP/2-capable proxy in native kernel C code and making APIs to it accessible via eBPF?
We are doing both. Aspect 2) is currently done for HTTP visibility and we will be working on connection splicing and HTTP header mutation going forward.
Sidecars are often useful for platform-centric teams that would like to help manage something like secrets, mTLS, or traffic shaping in the case of Envoy. The team that's responsible for that just needs to maintain a single sidecar rather than all of the potential SDKs for every team.
Especially if you have specific sidecars that only work on specific infrastructure, for example a Vault sidecar that deals with secrets for your service via EKS IAM permissions, you suddenly can't start your service without a decent amount of mocking and feature flags. It's nice not to have to burden your client code with all of that.
Also, there is a decent amount of work being done on gRPC to speak XDS which also removes the need for the sidecar [0].
It depends on your environment and architecture combined with how fast you can move especially with third party components. Having the microservice be 'dumb' can save everything.
Could you please elaborate on this? I don't fully understand what you mean. In particular, I don't understand whether "It's nice not to have to burden your client code with all of that" applies to a setup with or without sidecars.
Being able to pick up something generic rather than something language-specific.
Not having to do process supervision (which includes handling monitoring and logs) within your application.
Not making the application lifecycle subservient to needs such as log shipping and request rerouting. People get sig traps wrong surprisingly often.
Which is not bad. But that area is also often misconfigured for supervision. And trapping signals remains mostly broken in all sidecars.
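For reference, the shape people usually get wrong can be sketched in a few lines: trap SIGTERM, let in-flight work finish, then stop, rather than ignoring the signal or dying mid-request. This is a minimal sketch, with a made-up job loop:

```python
import os
import signal

shutting_down = False

def _on_term(signum, frame):
    # Handlers should do almost nothing: just record that a drain was
    # requested and let the main loop act on it.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, _on_term)

def worker_loop(jobs):
    done = []
    for job in jobs:
        if shutting_down:   # checked between jobs, never mid-job
            break
        done.append(job)    # stand-in for real work
    return done

# Simulate the supervisor (or kubelet) sending SIGTERM before work starts:
os.kill(os.getpid(), signal.SIGTERM)
drained = worker_loop(["a", "b", "c"])
print(drained)
```

Common mistakes are installing no handler at all (so the process dies mid-request) or forwarding signals incorrectly when a sidecar or supervisor sits in front of the app.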
Yea, if you're getting TLS termination at the load balancer prior to k8s ingress then it's pretty nice.
This is not too different from wpa_supplicant, used by several operating systems for key management on wireless networks. The complicated key negotiation and authentication can remain in user space; the encryption with the negotiated key can be done in the kernel (kTLS) or, when eBPF can control both sides, it can even be done without TLS by encrypting with a network-level encapsulation format, so it works for non-TCP as well.
Hint: We are hiring.
The work is still in the early phases, so the exact form this will take has yet to be hammered out, but there's broad agreement that this functionality will be first-class in k8s in the future. If you want to keep running proxies for the other features they provide, great: they'll be able to use the certificates provided by k8s for identity. If you'd like to know more, come to one of the SIG Auth meetings :)
The current security model in Istio delivers a pod specific SPIFFE cert to only that pod and pod identity is conveyed via that certificate.
That feels like a whole bunch of eggs in 1 basket.
https://github.com/grpc/proposal/blob/master/A27-xds-global-...