The article is talking about the scaling issues of SSH access via jump boxes or bastions. I'd argue that a better solution to scaling SSH access is to invest in tooling that makes SSH access unnecessary for most of your team. Centralized logs, error reporting, distributed traces, etc. are fairly well solved for most installations. Interactive access to prod (e.g. running DB migrations) requires a little more investment, but tools like ECS Exec make that fairly accessible without requiring SSH access.
The scenarios where you need access to the downstream resources are mainly for doing ops on them. Other things, such as viewing logs, are predictable and can be exposed to developers via some tool.
The idea of expiring SSH keys seems completely reasonable, but it's also a good idea to have enough centralized access control to actually grant someone temporary access. Key validity is evaluated locally, against system time.
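To make the system-time point concrete, here is a minimal Python sketch. It assumes the expiry is expressed as a validity window of Unix timestamps, the way OpenSSH certificates carry one. `ValidityWindow` and `is_valid` are hypothetical names for illustration, not part of any SSH library.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass(frozen=True)
class ValidityWindow:
    # Hypothetical container; both bounds are Unix timestamps in seconds.
    valid_after: int
    valid_before: int


def is_valid(window: ValidityWindow, now: Optional[float] = None) -> bool:
    """Return True if `now` falls inside [valid_after, valid_before)."""
    if now is None:
        now = time.time()  # the checking host's own clock, nothing central
    return window.valid_after <= now < window.valid_before


issued = 1_700_000_000  # arbitrary example timestamp
window = ValidityWindow(valid_after=issued, valid_before=issued + 8 * 3600)

print(is_valid(window, now=issued + 1 * 3600))  # True: inside the 8h window
print(is_valid(window, now=issued + 9 * 3600))  # False: expired an hour ago
# Same moment, but seen by a host whose clock runs two hours slow:
print(is_valid(window, now=issued + 9 * 3600 - 2 * 3600))  # True
```

The last call is the catch: a host with a slow clock still accepts a credential that has expired everywhere else, which is one reason to pair expiry with some central control.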
It's a pretty big architectural philosophy shift. All our production workload runs on spot instances. Unhealthy hosts come and go quickly enough, and our traffic rebalances fast enough, that we just don't spend that much time debugging low-level issues. Core dumps get shipped to S3. We do continuous profiling with Google Cloud Profiler. There's a lot of tooling required, but once that investment is made things run very well.
The overall point should be: figure out what access control permissions you actually need, then find and use tools that meet those needs. SSH might be the tool, it might not be.
Shifting from tail -f <logs> to any web UI doesn't remove the access control for logging, it just shifts it into the fancy web UI. That may or may not be better, it's just different. One could certainly argue the fancy web UI is a better UX, but that's a totally different reason for selecting that tool or not. Access Control happens either way.
Also there is a lot of low contrast text on that website - this shows that they really know what they are doing and you can trust them. I am sure it is a very good company and you should give all the login credentials of your org to them and also install their binary on all servers.
Isn't "trust in company" the reason why you are using Linux servers?
I also have set this up to restrict SSH access to particular hosts based on the LDAP record.
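Roughly, the per-host check boils down to something like the sketch below. This is not my actual setup: the directory is a plain dict standing in for LDAP, and the `allowed_hosts` field and `may_login` function are made-up names for illustration.

```python
from fnmatch import fnmatchcase

# Stand-in for LDAP records; `allowed_hosts` is a hypothetical attribute
# holding shell-style hostname patterns.
DIRECTORY = {
    "alice": {"allowed_hosts": ["web-*", "db-1"]},
    "bob": {"allowed_hosts": ["web-1"]},
}


def may_login(user: str, host: str, directory: dict = DIRECTORY) -> bool:
    """Allow SSH only if the user's record lists a pattern matching the host."""
    record = directory.get(user)
    if record is None:
        return False  # unknown users are denied by default
    return any(fnmatchcase(host, pattern) for pattern in record["allowed_hosts"])


print(may_login("alice", "web-7"))  # True: matches "web-*"
print(may_login("bob", "db-1"))     # False: bob is only allowed on web-1
print(may_login("carol", "web-1"))  # False: no record at all
```

The useful property is that the decision lives in one record per user, so revoking access is a directory change rather than a change on every host.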
The problem they are describing isn't a problem with Jump servers... it's a problem with distributed authorization.
You are right about the problem statement. It is a distributed authorization problem. And it is a very hard problem to solve, or even to visualize, for a fast-moving company until it has actually become a problem.
Where can I see this data? I would like to check your statements.
We are on the internet, so would you please add a source for your statements? There is a very basic and good feature called "URL", please use it!
Or do you just want to say "I have not seen many orgs with SSH + LDAP in my career because I have never worked in one and from that I conclude that the whole world works like this"?
Also easy to port-forward instances to your local machine.
I’m always surprised that other clouds don’t have this.
> Announcing a new phase of Runops. Launching hoop.dev
with a link to this https://hoop.dev/blog/launching-hoop/
It is far too easy to misconfigure network policies and grant them access to infrastructure that they shouldn't.
And with Tailscale you can run the agent within SaaS products like GitHub Actions or Terraform Cloud to securely manage their access into your systems.
Is wearing blue denim jacket and jeans an anti-pattern? Is vegemite an anti-pattern? Can't we just say "obsolete", or "bad idea"?
Yup. Use VPNs and firewall ACLs. Jump servers are leftovers of bad practices where good practices were too hard to implement.
> Burden of managing SSH keys of users throughout all nodes. Rotation is required when someone leaves or enter the organization;
That is extremely trivial if you have any sensible configuration management in place (and you should). We just store them in LDAP with the user data and distribute them where needed (GitLab, servers).
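A minimal sketch of what that distribution step amounts to, assuming the directory export gives you one record per user. The `USERS` list, its field names, and the key strings are placeholders for illustration, not real LDAP attributes or real keys.

```python
# Stand-in for a directory export; the keys are placeholders, not real keys.
USERS = [
    {"uid": "alice", "active": True,
     "ssh_keys": ["ssh-ed25519 EXAMPLEKEY1 alice@laptop",
                  "ssh-ed25519 EXAMPLEKEY2 alice@desktop"]},
    {"uid": "bob", "active": False,
     "ssh_keys": ["ssh-ed25519 EXAMPLEKEY3 bob@laptop"]},
]


def render_authorized_keys(users: list) -> dict:
    """Map each active uid to the full text of its authorized_keys file."""
    files = {}
    for user in users:
        if not user["active"]:
            continue  # leavers drop out on the next run, no manual rotation
        files[user["uid"]] = "".join(key + "\n" for key in user["ssh_keys"])
    return files


for uid, content in render_authorized_keys(USERS).items():
    print(f"--- {uid} ---")
    print(content, end="")
```

Because the files are regenerated from the directory on every run, offboarding someone is a single flag flip at the source.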
> Role management requires managing sudoers files, making sure file system permissions are properly configured and users are within their proper groups;
Ah yes, managing a text file, so fucking hard /s
> Nodes must be updated with the tooling necessary to interact with internal services.
> Keep a list of updated services (DNS) available to interact with it
see the point about CM
> Usually, infrastructure enginners are a scarce team and keeping all these components updated are hard to tackle. Over time, these nodes will onboard more users and tooling, which will increase the complexity over managing these resources.
Which is why you write it once and use automation. I don't think we've touched our sudoers or SSH key management module in years. It was written once, then had some small changes, but that's about it.
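For the sudoers half, the "write it once" part can be as small as the sketch below. It is illustrative only: the `ROLES` mapping, group names, and command paths are invented, and a real module would come from whatever CM tool you already run.

```python
# Hypothetical role-to-commands mapping; groups and paths are examples only.
ROLES = {
    "deploy": ["/usr/bin/systemctl restart app"],
    "dba": ["/usr/bin/pg_dump", "/usr/bin/psql"],
}


def render_sudoers(roles: dict) -> str:
    """Render one sudoers rule per group, sorted so output is deterministic."""
    lines = ["# Managed by automation; do not edit by hand."]
    for group in sorted(roles):
        commands = ", ".join(roles[group])
        # %group = Unix group, ALL = any host, (root) = run-as user
        lines.append(f"%{group} ALL=(root) {commands}")
    return "\n".join(lines) + "\n"


print(render_sudoers(ROLES), end="")
```

Before installing a generated file it is worth running it through `visudo -c -f <file>` so that a bad template cannot lock everyone out of sudo.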
We use Basti (https://github.com/BohdanPetryshyn/basti/issues) to set up and manage the jump host. The tool automatically starts/stops the instance, which is excellent for irregular access.
That said, thanks, Basti looks very useful!
Not normally, right? Isn't that the whole point of public-key cryptography?
See e.g. https://www.theregister.com/2016/01/14/openssh_is_wide_open_... for when a vulnerability in the OpenSSH client made this possible back in 2016.