I think the biggest hurdle to implementing a solution like this is that your application MUST be a 12-factor app, which sadly is often not the case from what I've seen. Many devs hardcode values when shipping their code and containers, which makes it really difficult to add nodes to a DAG. If that changed, SDLC testing would be waaay easier.
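To make the 12-factor point concrete, here's a minimal sketch of factoring a hardcoded value out into environment config (the `DB_HOST`/`DB_PORT` names are purely illustrative, not from the thread):

```python
import os

# 12-factor style: read deploy-specific values from the environment
# instead of baking them into the image at build time.
DB_HOST = os.environ.get("DB_HOST", "localhost")
DB_PORT = int(os.environ.get("DB_PORT", "5432"))

def connection_url() -> str:
    # Built from environment-supplied config, so the same container
    # image works in dev, CI, and prod without a rebuild.
    return f"postgres://{DB_HOST}:{DB_PORT}/app"
```

Once values come from the environment like this, a tool sitting at the mesh level can re-point a service at a different backend without touching the image.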
In the same space, labeling all of your services matters. Annotations let you track individual resources, but if you have different gateways, destination rules, or other things that need to be tweaked at the mesh level, it can be daunting to keep track of them. In my case, I make sure the whole mesh representation of a version can be tracked by labels, so with app=vote1,version=release-4,feature=shiny you can track the shiny components across the whole cluster.
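The label-based tracking described above boils down to a selector over resource metadata. Here's a toy sketch in Python (the resource list is made up; against a real cluster you'd use something like `kubectl get all -l feature=shiny` or an equivalent client call):

```python
def select(resources, selector):
    """Return resources whose labels contain every key/value pair in
    selector, mimicking a Kubernetes label selector like feature=shiny."""
    return [
        r for r in resources
        if all(r.get("labels", {}).get(k) == v for k, v in selector.items())
    ]

# Illustrative resources spanning app workloads and mesh-level objects.
resources = [
    {"kind": "Deployment", "name": "vote",
     "labels": {"app": "vote1", "version": "release-4", "feature": "shiny"}},
    {"kind": "VirtualService", "name": "vote-vs",
     "labels": {"app": "vote1", "version": "release-4", "feature": "shiny"}},
    {"kind": "Deployment", "name": "db",
     "labels": {"app": "db", "version": "release-3"}},
]

shiny = select(resources, {"feature": "shiny"})
```

Because the mesh-level objects carry the same labels as the workloads, one selector pulls up every piece of the "shiny" version at once.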
Another hurdle is that you are tied to a service mesh implementation. Istio is OK, but it can be a beast. It also constrains your org's ability to adopt other networking alternatives, which is something we wanted to explore.
I do like the project uses Nix =).
As for the value prop, maybe emphasize multi-tenancy a bit more, as it has the most cost-saving potential at scale.
However, my primary challenge is chronic cluster under-utilization after having rewritten the bulk of the system in Rust, so virtualizing the cluster makes the most sense. I think Google Cloud conducted a study that found that the bulk, like 75%, of their K8s customers over-provision by a wide margin. So there is definitely a market for cluster virtualization, and even more so for multi-tenancy.
Luckily the docs site is just generated from markdown files in the repo, so feel free to browse around the various `page.mdx` files here if you prefer that: https://github.com/kurtosis-tech/kardinal/tree/main/website/...
Regarding how Kardinal compares with vcluster, both aim to provide multitenancy within a single Kubernetes cluster, but they approach it differently.
vcluster is designed to offer a full Kubernetes experience to each tenant, including their own API server, control plane, and other Kubernetes-level components. This makes vcluster an excellent choice if you need strong isolation or want to test Kubernetes-specific features like custom CRDs or operators.
Kardinal, on the other hand, focuses on application-level multitenancy rather than providing the full Kubernetes stack to each tenant. This makes Kardinal lighter-weight and more efficient if you're primarily concerned with deploying and testing application-level changes.
That said, the timing of your question is perfect, since we are about to push a Tilt integration that allows you to create an isolation flow and use Tilt to hot-reload and update that flow in place! :)
It's also such an interesting moment for you folks to show up on HN: I just shipped the first preview version of my Kubernetes operator (https://github.com/pier-oliviert/sequencer), which manages ephemeral environments. I can see some things that are similar between our two projects, as well as some things that are quite different.
If I had one question, it would be: what made you go for Istio as the main service mesh?
Good luck with the launch!
Nothing too specific on choosing Istio, but it seemed like a popular and battle-tested option to start the implementation with. Kardinal is a fairly young project (about 2 months old), and we expect it to adopt more "service mesh backends" in the future.
Sidenote: I noticed Istio (Envoy, actually) has some weird non-deterministic behavior when you hit pod resource limits (504 bad gateway, 0DC).
Namespace-based deploys (or Telepresence) are equally "lightweight" compared to Kardinal; it's just that (at least in our phrasing) those don't quite constitute a separate "environment", since state is shared between any developers working at the same time.
Kardinal matches those approaches in terms of lightness, but offers state isolation guarantees too (isolation for your DBs, queues, caches, managed services, etc.). So compared to "ephemeral environment" approaches that do give state isolation, we believe we're doing this in the most lightweight way possible by implementing that isolation at the layer of the network request rather than by duplicating deployed resources.
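To illustrate what "isolation at the layer of the network request" can look like, here's a toy sketch of header-based routing. The header name `x-flow-id` and the service names are made up for illustration; in practice this routing decision lives in mesh config (e.g. an Istio VirtualService header match), not in application code:

```python
def route(headers, flow_services, baseline):
    """Pick a backend for a request: requests tagged with a flow header
    go to that flow's fork of the service; untagged traffic hits the
    shared baseline deployment."""
    flow = headers.get("x-flow-id")
    return flow_services.get(flow, baseline)

# One developer's isolated flow forks only the service they changed.
flows = {"dev-abc": "checkout-dev-abc.svc"}
baseline_svc = "checkout.svc"
```

Only requests carrying the flow tag see the forked service; everything else in the cluster stays shared, which is where the savings over per-developer resource duplication come from.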