After a while, the more mature Linux engineers start going the other way. Ripping out as much as possible. Stripping down to the leanest build they can, for performance but also to reduce attack surface and overall complexity.
Very similar dynamic with k8s. Early days are often about scooping up every CNCF project like you're on a shopping spree. Eventually people get to shipping slim clusters running 30MB containers built on Alpine or Nix. Using it essentially as open-source clustering for Linux.
If you have direct access to Etcd (which may not be possible in a managed cloud version of Kubernetes?), putting a watch on / might scale better.
(As an aside, with the Go client API you have to jump through some hoops to even deserialize objects whose kinds' schemas are not already registered. You have to use the special "unstructured" deserializer. The Go SDK often has to deal with unknown types, e.g. for diffing, and all of the serializer/codec/conversion layers in the SDK seem incredibly overengineered for something that could have just assumed a simple nested map structure and then layered validation and parsing on top; the smell of Java programmers is pretty strong.)
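To illustrate the point: the "simple nested map structure" approach needs nothing beyond the standard library. This is a hedged sketch (the manifest and function name are made up for illustration, not the SDK's actual API), showing that any object of an unregistered kind can be decoded and inspected without a scheme or codec:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeManifest parses any Kubernetes object into a plain nested map,
// without needing the object's Go type to be registered anywhere.
func decodeManifest(data []byte) (map[string]interface{}, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil, err
	}
	return obj, nil
}

func main() {
	// A custom resource of a kind no client-side scheme knows about.
	manifest := []byte(`{
		"apiVersion": "example.com/v1",
		"kind": "Widget",
		"metadata": {"name": "foo", "namespace": "default"}
	}`)
	obj, err := decodeManifest(manifest)
	if err != nil {
		panic(err)
	}
	// Validation and typed accessors could be layered on top of the map.
	meta := obj["metadata"].(map[string]interface{})
	fmt.Println(obj["kind"], meta["name"])
}
```

This is essentially what the SDK's "unstructured" type wraps internally; the complaint is that reaching it requires going through the serializer/codec machinery first.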
The Java client does this with blocking, resulting in a large number of threads.
I truly like Kubernetes, and I think most detractors complaining about complexity simply don't want to learn it. But the K8s API, especially the Watch API, needs some rigorous standards.
Is this a question of Kubernetes just sticking everything into "standard" data structures instead of using a database?
- No concept of apiserver rate limiting, by design. I see there is now API Priority and Fairness (APF), but still no basic API / edge rate limiting.
- etcd has bad scalability. It's a very basic, highly consistent kv store with tiny limits (an 8GB storage limit in the latest docs, with a default of 2GB). It had large performance issues throughout the time I was using k8s; I still don't know if it's much better now.
This, obviously, isn't a scalable approach, but there's no "wrapper" you could write in order to mitigate the problem. The API itself is the problem.
Getting this to perform well required several optimizations at both the Go and Postgres levels. On the Go side, we use prioritized work queues and event de-duplication, and even switched to Rust for efficient JSON diffs. For Postgres, we leverage materialized views and trigger-based optimistic locking.
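The comment doesn't show its implementation, but event de-duplication via a keyed work queue typically looks something like this sketch (all names illustrative, not the actual code): repeated events for the same key are coalesced so workers only ever see the latest state.

```go
package main

import (
	"fmt"
	"sync"
)

// DedupQueue coalesces repeated events for the same key: if an event for a
// key is enqueued while an earlier one is still pending, only the latest
// payload is kept, so workers never process stale intermediate states.
type DedupQueue struct {
	mu      sync.Mutex
	pending map[string]string // key -> latest payload
	order   []string          // FIFO order of distinct pending keys
}

func NewDedupQueue() *DedupQueue {
	return &DedupQueue{pending: make(map[string]string)}
}

func (q *DedupQueue) Add(key, payload string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if _, exists := q.pending[key]; !exists {
		q.order = append(q.order, key)
	}
	q.pending[key] = payload // overwrite: de-duplication happens here
}

func (q *DedupQueue) Pop() (key, payload string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return "", "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	payload = q.pending[key]
	delete(q.pending, key)
	return key, payload, true
}

func main() {
	q := NewDedupQueue()
	q.Add("pod/foo", "v1")
	q.Add("pod/foo", "v2") // coalesced with the previous foo event
	q.Add("pod/bar", "v1")
	for {
		k, p, ok := q.Pop()
		if !ok {
			break
		}
		fmt.Println(k, p) // two items come out, not three
	}
}
```

Prioritization would layer on top of this by keeping several such queues and draining the high-priority one first.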
In fact, it is currently preparing to integrate Cyphernetes as a new search method. I believe this will be a fresh start!
I’d disagree and say that Kubernetes is much more relational than graph-based, and SQL is pretty good for querying graphs anyway, especially with some custom extensions.
This does make more sense though.
For example, I'd love to be able to just do this as the whole query:
metadata.name =~ "foo%"
or maybe: .. =~ "foo%" // Any field matches
or maybe: $pod and metadata.name =~ "foo%" // Shorthand to filter by type
I think a query language for querying Kubernetes ought to start with predicate-based filtering as the foundation. Having graph operators seems like a nice addition, but maybe not the first thing people generally need? It's not quite clear who this tool is for, so maybe this is not the intended purpose?
re. intended purpose: Initially I started writing this to help tackle bigger problems - stuff you'd normally need multiple nested kubectl commands for, or write a lot of code against the api-server.
Over time, I developed the shell environment around it and it became a daily driver for me as well. Indeed, there's a threshold where writing Cyphernetes becomes more economical than using kubectl, but for most of the simple day-to-day stuff, writing Cypher is too verbose.
The Cyphernetes shell has an early-stage feature that allows a syntax like you suggested - there's a tiny "macros" feature that lets you define custom procedures of one or more queries (currently shell only, not supported in the web client yet).
Macros are prefixed by ":" and you could define something like:
:pod condition
MATCH (p:Pod) WHERE $condition
RETURN p.metadata.name, p.status.phase; // and whatever other fields you'd like
Then use it like this: > :pod .metadata.name=~"foo%"
So it gives you a tiny way to customize how you do this day-to-day stuff. Ships out-of-the-box with common stuff you do with kubectl like :getpo, :getdeploy, :createdeploy, :expose and so on - definitely a feature that could be developed further to make this more of a daily driver.
brew install cyphernetes
at the top of the page is an immediate turn-off. Personally, I use a Mac with Nix, and so do many of my coworkers. Assuming Homebrew, even for a Mac user, leaves a bad impression on me.
go run github.com/avitaltamir/cyphernetes/cmd/cyphernetes@v0.14.0 --help

Targeting a tool at macOS users and omitting Linux instructions gives the impression that the tool isn't aimed at sysadmins or hackers (i.e. at us), but rather at beginners, frontend developers, etc.
They have a Kubernetes plugin at https://hub.steampipe.io/plugins/turbot/kubernetes and there are a couple of things I really like:
* It's super easy to query multiple Kubernetes clusters transparently: define one Steampipe "connection" for each of your clusters, plus an "aggregator" connection that aggregates all of them, then query the "aggregator" connection. You get a "context" column that indicates which Kubernetes cluster each row came from.
* It's relatively fast in my experience, even for large result sets. It's also possible to configure a caching mechanism inside Steampipe to speed up your queries.
* It also understands custom resource definitions, although you need to help Steampipe a bit (explained here: https://hub.steampipe.io/plugins/turbot/kubernetes/tables/ku...)
Last but not least: you can of course join multiple "plugins" together. I used it a couple of times to join content exposed only in GCP with content from Kubernetes, that was quite useful.
The things I don't like so much but can be lived with:
* Several columns are just exposed as plain JSON fields; you need to get familiar with PostgreSQL's JSON operators to get something useful out of them. There's a page in Steampipe's docs explaining how to use them better.
* Be familiar also with PostgreSQL's common table expressions: they are not so difficult to use and they make the SQL code much easier to read.
* It's SQL, so you have to know which columns you want to pick before selecting the table they come from; not ideal for autocompletion.
* The Steampipe "psql" client is good, but sometimes a bit counterintuitive; I don't have specific examples, but I have the feeling it behaves slightly differently from other CLI clients I've used.
All in all: I think Steampipe is a cool tool to know about, for Kubernetes but also other API systems.
I agree with your comment about JSON columns being more difficult to work with at times. On balance, we've found that approach more robust than creating new columns (names and formats) that effectively become Steampipe specific.
Our built-in SQL client is convenient, but it can definitely be better to run Steampipe in service mode and use any Postgres compatible SQL client you prefer [1].
You might also enjoy our open source mods for compliance scanning [2] and visualizing clusters [3]. They are Powerpipe [4] dashboards as code written in HCL + SQL that query Steampipe.
1 - https://steampipe.io/docs/query/third-party
2 - https://hub.powerpipe.io/mods/turbot/kubernetes_compliance
3 - https://hub.powerpipe.io/mods/turbot/kubernetes_insights
4 - https://github.com/turbot/powerpipe
Disclaimer: I am the creator of Karpor.
The example on the homepage is literally "give me deployments with more than 2 replicas with pods that are not Running, and give me the IP address of the service they're serving"...
Any idea how to do that with kubectl | jq? Their solution seems elegant to me.
Adjacent question, but there are lots of experts here: independent of Cyphernetes or specific tooling, what are you doing to secure the k8s API / kubectl / k8s control plane?
$ kubectl logs -n foo $(kubectl get pod -n foo | awk '/Running/{print $1}')
because one of their selling points is "no nested kubectl queries". I don't see how their queries can be more efficient than hitting the kube-apiserver multiple times, unless they have something that lives clusterside observing lifecycle events for all CRDs and answering queries with only one round-trip instead of multiple.
Or maybe they're selling "no nested kubectl queries" as an experience feature, saying that a query language is more ergonomic than bash command redirection. My brain has been warped into the shape of the shell, for better or for worse, so it's not a selling point for me.
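Whether Cyphernetes actually works this way I can't say, but the "clusterside observer" design described above is roughly an informer-style cache: consume watch events once, then answer queries from local state with zero extra API round-trips. A stdlib-only sketch (all types and names are illustrative, not any real client API):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Event mirrors the shape of a watch event: something changed clusterside.
type Event struct {
	Type string // "ADDED", "MODIFIED", or "DELETED"
	Kind string // e.g. "Pod"
	Name string
}

// Cache is a tiny informer-style store: it consumes lifecycle events and
// answers queries from local state instead of calling the apiserver.
type Cache struct {
	mu      sync.RWMutex
	objects map[string]map[string]bool // kind -> set of object names
}

func NewCache() *Cache {
	return &Cache{objects: make(map[string]map[string]bool)}
}

// Handle applies one watch event to the local state.
func (c *Cache) Handle(e Event) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.objects[e.Kind] == nil {
		c.objects[e.Kind] = make(map[string]bool)
	}
	switch e.Type {
	case "ADDED", "MODIFIED":
		c.objects[e.Kind][e.Name] = true
	case "DELETED":
		delete(c.objects[e.Kind], e.Name)
	}
}

// Query answers "names of <kind> starting with <prefix>" from the cache.
func (c *Cache) Query(kind, prefix string) []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	var out []string
	for name := range c.objects[kind] {
		if strings.HasPrefix(name, prefix) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	c := NewCache()
	c.Handle(Event{"ADDED", "Pod", "foo-1"})
	c.Handle(Event{"ADDED", "Pod", "bar-1"})
	c.Handle(Event{"DELETED", "Pod", "bar-1"})
	fmt.Println(c.Query("Pod", "foo"))
}
```

This is essentially what client-go informers do for controllers; the open question is whether a query tool maintains such a cache or just fans out live API calls.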