Jump Servers

(hoop.dev)

100 points | andriosr | 3y ago | 60 comments

Over the last 7 years I've used SSH to get onto a prod box a few dozen times, less than once a month. When I do have to get on the box, we use signed keys that expire in 24 hours (a manager or SRE is required to sign the key if you need to get on a box).

The article is talking about the scaling issues of SSH access via jump boxes or bastions. I'd argue that a better solution to scaling SSH access is to invest in tooling that makes SSH access unnecessary for most of your team. Centralized logs, error reporting, distributed traces, etc. are fairly well solved for most installations. Interactive access to prod (e.g. running DB migrations) requires a little more investment, but tools like ECS Exec make that fairly accessible without requiring SSH access.
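For readers who have not used signed keys: a minimal sketch of what the 24-hour signing step could look like with OpenSSH certificates. It only assembles the `ssh-keygen` command line rather than running it. The file names, the identity string, and the `deploy` principal are made up for illustration, and the commenter's actual tooling is not described in the thread.

```python
import shlex

def signing_command(ca_key, user_pubkey, identity, principals, validity="+24h"):
    """Build the ssh-keygen argv that signs user_pubkey with a CA key."""
    return [
        "ssh-keygen",
        "-s", ca_key,                 # CA private key used for signing
        "-I", identity,               # certificate identity, recorded in sshd logs
        "-n", ",".join(principals),   # login names the certificate is valid for
        "-V", validity,               # validity interval: "+24h" means now until 24h from now
        user_pubkey,                  # public key to sign
    ]

# Hypothetical names: ca_key, id_ed25519.pub, alice-incident-123, deploy
cmd = signing_command("ca_key", "id_ed25519.pub", "alice-incident-123", ["deploy"])
print(shlex.join(cmd))
```

Whoever holds the CA key (the manager or SRE in the comment above) would run the printed command, and `ssh-keygen` writes the certificate next to the public key as `id_ed25519-cert.pub`.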

debarshri3y ago

I agree with you. But you are viewing it from the perspective of a web-application-centric organisation. There are a lot of other types of organisations out there, e.g. a company that uses vendor products and now has to give the vendor access to servers, databases, etc.

The scenarios where you need access to the downstream resources are mainly for doing ops on them. Other things, such as viewing logs, are predictable and can be exposed to developers via some tool.

xorcist3y ago

None of that takes away the need for tcpdump, strace, core dumps, etc. A surprising amount of troubleshooting needs to be done in production. I have yet to see views to the contrary survive contact with the real world.

The idea of expiring ssh keys seems completely reasonable, but it's also a good idea to have enough centralized access control to be able to actually give someone temporary access allocations. Key validity is evaluated locally, against system time.
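To make the last point concrete, here is a simplified model of the expiry check, assuming a certificate is accepted while the checking host's clock falls inside the validity window. The timestamps and the 12-hour skew are invented for the example, and this is not OpenSSH's actual code.

```python
from datetime import datetime, timedelta, timezone

def cert_is_valid(valid_after, valid_before, now):
    """Return True if `now` (the checking host's clock) is inside the window."""
    return valid_after <= now < valid_before

issued = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)

# Real time is 6 hours past expiry.
correct_clock = issued + timedelta(hours=30)
# A host whose clock runs 12 hours behind sees a time that is still inside the window.
skewed_clock = correct_clock - timedelta(hours=12)

print("host with correct clock accepts:", cert_is_valid(issued, expires, correct_clock))
print("host with skewed clock accepts: ", cert_is_valid(issued, expires, skewed_clock))
```

Under this model the host with the correct clock rejects the certificate and the skewed host still accepts it, which is why expiry alone is not a substitute for central revocation.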

phamilton3y ago

tcpdump and co do happen occasionally, but less than I had expected. This isn't a small installation. We support around 30M active users and routinely push 20k qps in web traffic. We run on a few thousand vCPUs in EC2.

It's a pretty big architectural philosophy shift. All our production workload runs on spot instances, unhealthy hosts come and go quickly enough and our traffic rebalances fast enough that we just don't spend that much time debugging low-level issues. Core dumps get shipped to S3. We do continuous profiling with Google Cloud Profiler. There's a lot of tooling required but once that investment is made things run very well.

vasco3y ago

DB migrations should definitely not be done interactively and definitely not in an SSH session. Write them beforehand, have them reviewed with the rest of the code, and have your deployment process run them.

PedroBatista3y ago

Agree, but in the real world that works 100% most of the time.

phamilton3y ago

Just to clarify our process: DB migrations are part of code review, they run as a script, but we don't have that script run automatically as part of CD. We've been bit by enough migration surprises that we require someone watching and able to interrupt and cancel the migration if needed. But that's the extent of action required. Run this command, Ctrl-C if necessary. Definitely not YOLO'ing in a psql in prod.
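A rough sketch of the "run this command, Ctrl-C if necessary" pattern, assuming the migration runs inside one transaction so that an interrupt can roll everything back. It uses Python's built-in `sqlite3` as a stand-in for the production database, and the table and index names are hypothetical. The commenter's real script is not shown in the thread.

```python
import sqlite3

def run_migration(conn, statements):
    """Apply statements inside one explicit transaction.

    Returns True on commit, False if the operator interrupted with Ctrl-C
    and the transaction was rolled back.
    """
    conn.execute("BEGIN")
    try:
        for sql in statements:
            print("applying:", sql)
            conn.execute(sql)
        conn.execute("COMMIT")
        return True
    except KeyboardInterrupt:
        conn.execute("ROLLBACK")
        print("interrupted, rolled back")
        return False
    except Exception:
        # Any other failure also leaves the schema untouched.
        conn.execute("ROLLBACK")
        raise

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN/COMMIT/ROLLBACK above are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
ok = run_migration(conn, [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)",
    "CREATE INDEX users_email ON users (email)",
])
```

This relies on the database supporting transactional DDL, which SQLite and PostgreSQL do. On engines that commit DDL implicitly, an interrupt would stop the script but not undo the statements already applied.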

zie3y ago

It's still all access control, I don't really see a difference. Yes SSH access could perhaps give you access to more things, with less fine-grained permissions, but it just depends on the replacement(s).

The overall point should be, figure out what your needed access control permissions are and then find and use tools to meet those needs. SSH might be the tool, it might not be.

Shifting from tail -f <logs> to any web UI doesn't remove the access control for logging, it just shifts it into the fancy web UI. That may or may not be better, it's just different. One could certainly argue the fancy web UI is a better UX, but that's a totally different reason for selecting that tool or not. Access Control happens either way.

yjftsjthsd-h3y ago

Er, so most of those supposed problems sound like you could fix them by just using unprivileged users jumping through the jump server via `ssh -J`/ProxyJump? Or a VPN, which is pretty close to what it sounds like the product they're trying to sell is anyways. And if you're going to try and sell a Teleport competitor, you really should do a better job convincing me that it's going to actually be secure in the first place. I don't see source code or audits anywhere on this website.
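For anyone unfamiliar with ProxyJump, a small sketch that renders the `ssh_config` stanza this comment is alluding to. The host pattern, the unprivileged `jumper` account, and the bastion name are placeholders, not values from the article.

```python
def proxyjump_stanza(host_pattern, jump_user, jump_host):
    """Render an ssh_config stanza that routes matching hosts through a jump host."""
    return "\n".join([
        f"Host {host_pattern}",
        f"    ProxyJump {jump_user}@{jump_host}",
    ])

# Placeholder names; the bastion deliberately does not match the Host pattern,
# otherwise ssh would try to jump through the bastion to reach the bastion.
stanza = proxyjump_stanza("*.internal.example", "jumper", "bastion.example.com")
print(stanza)
```

With that stanza in `~/.ssh/config`, `ssh app1.internal.example` behaves like `ssh -J jumper@bastion.example.com app1.internal.example`. The jump account only needs to be allowed to forward connections, not to open a shell.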

POPOSYS3y ago

But it is written on the website: "without security compromise" - so it must be secure!!!

Also there is a lot of low contrast text on that website - this shows that they really know what they are doing and you can trust them. I am sure it is a very good company and you should give all the login credentials of your org to them and also install their binary on all servers.

Isn't "trust in company" the reason why you are using Linux servers?

mbreese3y ago

Even when using a jump server, I've solved the issue of revoking access in the past by using LDAP to control access to servers. Instead of adding user accounts directly to each server, the account information is stored in LDAP, including public SSH keys. If you want to revoke a user's access across the entire infrastructure, you can do so in one fell swoop.

I also have set this up to restrict SSH access to particular hosts based on the LDAP record.

The problem they are describing isn't a problem with Jump servers... it's a problem with distributed authorization.
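A sketch of the central-lookup idea, under the assumption that sshd asks a helper program for a user's keys at login time (OpenSSH supports this through `AuthorizedKeysCommand`). The in-memory `DIRECTORY` dict stands in for LDAP, and the users, hosts, and key strings are placeholders. The commenter's real schema is not given in the thread.

```python
# Stand-in for the LDAP directory: one record per user (all values hypothetical).
DIRECTORY = {
    "alice": {"active": True, "hosts": {"web01", "db01"},
              "keys": ["ssh-ed25519 PLACEHOLDER-KEY-ALICE alice@laptop"]},
    "bob": {"active": True, "hosts": {"web01"},
            "keys": ["ssh-ed25519 PLACEHOLDER-KEY-BOB bob@laptop"]},
}

def authorized_keys(directory, username, hostname):
    """Return the public keys `username` may use on `hostname`, or [] if none."""
    entry = directory.get(username)
    if entry is None or not entry["active"]:
        return []          # unknown or revoked user: no keys anywhere
    if hostname not in entry["hosts"]:
        return []          # user exists but is not allowed on this host
    return list(entry["keys"])

print(authorized_keys(DIRECTORY, "alice", "db01"))   # alice's key
print(authorized_keys(DIRECTORY, "bob", "db01"))     # empty: bob is not allowed on db01

# Revoking access everywhere is one change in the directory.
DIRECTORY["alice"]["active"] = False
print(authorized_keys(DIRECTORY, "alice", "db01"))   # empty after revocation
```

A real helper would print the keys one per line on stdout and be wired in with something like `AuthorizedKeysCommand /usr/local/bin/lookup-keys %u` plus an `AuthorizedKeysCommandUser` in `sshd_config`. The script path there is hypothetical.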

debarshri3y ago

I haven't seen many organisations actually set SSH PAM via LDAP or another delegated authentication system.

You are right about the problem statement. It is a distributed authorization problem. And it is a very hard problem for a fast-moving company to solve, or even to visualize, unless it is already a problem for them.

POPOSYS3y ago

Do you have some data about how many orgs actually set SSH PAM via LDAP?

Where can I see this data? I would like to check your statements.

We are on the internet, so would you please like to add the source of knowledge to your statements - it is a very basic and good feature called "URL", please use it!

Or do you just want to say "I have not seen many orgs with SSH + LDAP in my career because I have never worked in one and from that I conclude that the whole world works like this"?

trey-jones3y ago

I set up LDAP with sshPublicKey extension in 2021 together with some scripts to configure new servers to use it and a small command line tool to add and remove GIDs per user (across any infrastructure). It's working great. The biggest effort was learning to manage LDAP, which is a bit anachronistic, but the payoff is worth it I think. Even with <20 users, managing public keys across all of our infrastructure was not sustainable.

ilyt3y ago

The problem the article is describing is frankly a problem of lacking good configuration management. Hell, you can use SSH keys to authorize via sudo and use a hardware token to store those keys, sidestepping the problem of "user left their id_rsa somewhere".

SomaticPirate3y ago

Google Cloud’s IAP tunneling has basically made this a non-issue. All access is logged and it's extremely easy to get around without any public IPs.

Also easy to port-forward instances to your local machine.

I’m always surprised that other clouds don’t have this.

drowsspa3y ago

Isn't it basically AWS SSM? You can even configure it in your .ssh/config.

debarshri3y ago

I am a bit confused about this product. I had once seen a product called Runops [1]; its customer list and product testimonials are exactly the same as those of this product, Hoop.dev [2].

[1] https://runops.io/

[2] https://hoop.dev/

renewiltord3y ago

At this moment, 6 minutes after your post, the top of runops.io reads:

> Announcing a new phase of Runops. Launching hoop.dev

with a link to this https://hoop.dev/blog/launching-hoop/

berkle44553y ago

AWS Session Manager (AWS-specific) or Tailscale (anything) have solved this problem for me.

threeseed3y ago

Jump hosts seem like an anti-pattern in the era of AWS SSM and Tailscale.

It is far too easy to misconfigure network policies and grant them access to infrastructure that they shouldn't.

And with Tailscale you can run the agent within SaaS products like GitHub Actions or Terraform Cloud to securely manage their access into your systems.

romanhn3y ago

I believe you still need a bastion host to query a database, for instance, unless you want to set up SSM on existing hosts - my current project is fully serverless, so I had to set up an EC2 instance to serve as the bastion. The beauty of SSM is that the host can be fully on a private subnet, not exposed to the wider internet as commonly suggested.

theteapot3y ago

> anti-pattern

Is wearing blue denim jacket and jeans an anti-pattern? Is vegemite an anti-pattern? Can't we just say "obsolete", or "bad idea"?

nunez3y ago

Agreed. Tailscale is absolutely fantastic and unbelievably easy to use while being significantly more secure than a jump host.

ilyt3y ago

> Jump Servers must be able to reach to a certain private network and this requires specific configuration for each environment;

Yup. Use VPNs and firewall ACLs. Jump servers are leftovers of bad practices where good practices were too hard to implement.

> Burden of managing SSH keys of users throughout all nodes. Rotation is required when someone leaves or enter the organization;

That is extremely trivial if you have (and you should) any sensible configuration management in place. We just store them in LDAP with the user data and distribute them where needed (GitLab, servers).

> Role management requires managing sudoers files, making sure file system permissions are properly configured and users are within their proper groups;

Ah yes, managing a text file, so fucking hard /s

> Nodes must be updated with the tooling necessary to interact with internal services.

>Keep a list of updated services (DNS) available to interact with it

see the point about CM

> Usually, infrastructure enginners are a scarce team and keeping all these components updated are hard to tackle. Over time, these nodes will onboard more users and tooling, which will increase the complexity over managing these resources.

Which is why you write it once and use automation. I don't think we touched our sudoers or ssh key management module in years, it was written once then had some small changes but that's about it

BohdanPetryshyn3y ago

AWS Session Manager is great. However, to connect to an RDS/Aurora/ElastiCache instance, you must still create an intermediate EC2 instance to run SSM commands against.

We use Basti (https://github.com/BohdanPetryshyn/basti/issues) to set up and manage the jump host. The tool automatically starts/stops the instance, which is excellent for irregular access.

migf3y ago

This is what I was looking for. It seems weird to always have a jump box sitting up, waiting for someone to come mess with it. Part of the tooling should be to spin up / boot an ephemeral instance.
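The "spin up only when needed" idea can be sketched as a context manager that always stops the instance again, even if the work in between fails. The `start` and `stop` callables are placeholders for whatever cloud API call is used. This is not how Basti is implemented, only an illustration of the pattern.

```python
from contextlib import contextmanager

@contextmanager
def ephemeral_jump_host(start, stop):
    """Start a jump host for the duration of the block, then stop it."""
    start()
    try:
        yield
    finally:
        stop()   # runs on normal exit and on exceptions alike

# Hypothetical stand-ins for real start/stop calls; they only record events.
events = []
with ephemeral_jump_host(lambda: events.append("start"),
                         lambda: events.append("stop")):
    events.append("work")   # e.g. open a tunnel and run a query

print(events)
```

In practice `start` would also need to wait until the instance is reachable before yielding, which this sketch leaves out.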

TheHappyOddish3y ago

It's probably also worth mentioning that you _build_ Basti as well as using it, otherwise you may come across as shilling.

That said, thanks, Basti looks very useful!

fpanzer3y ago

I'd rather patch a bunch of openssh servers than a bunch of proprietary software agents that essentially keep reverse ssh tunnels open for me. Or did I miss something?

vbernat3y ago

Use of SSH agent forwarding is dangerous as it allows an attacker to gain access to more key materials to access more servers. Using it casually in an article about SSH security is a bit worrying.

pritambaral3y ago

Not with the confirm option of ssh-add. I've had agent forwarding on for every host (trusted and untrusted) for a decade now, without worry, because my ssh agent confirms with me each use of any ssh key.

remram3y ago

Interesting. However in practice, I don't ssh-add my keys, they get loaded on first use by the ssh client. Is there a way to make ssh load keys into the agent with that option set?

_joel3y ago

Umm what's wrong with ssh -AJ ?

larschdk3y ago

You need to fully trust your target server, so you need to manage your known_hosts diligently and make sure you trust the host you connect to. If you just accept the host key without checking, you allow any host to use your SSH key for authentication. Any SSH server can accept your private key as authentication. Also, if the target host is infiltrated, it can use your private SSH key for authentication elsewhere without your knowledge.

tpmx3y ago

> Also, if the target host is infiltrated, it can use your private SSH key for authentication elsewhere without your knowledge.

Not normally, right? Isn't that the whole point of public-key cryptography?

See e.g. https://www.theregister.com/2016/01/14/openssh_is_wide_open_... for when a vulnerability in the openssh client made this possible back in 2016.

_joel3y ago

If I'm getting host-changed violations and I know the infra hasn't changed, I already know there's something wrong. I'm not getting what this provides.

jesprenj3y ago

Why even use -A? Isn't -J enough?
