- Ideally, you avoid it in your application design.
- If you need it, set up SIGTERM handling in the application so that it waits for all connections to close before the process exits. Also set up "connection draining" at the load balancer, which keeps existing sessions to terminating Pods open but sends new sessions to the new Pods. The tradeoff is that rollouts take much longer: if the session time is unbounded, you may need to enforce a deadline to break connections eventually.
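A rough sketch of the application side, assuming a Python service that counts its own open connections. `ConnectionTracker`, `run`, and the 30-second default are made-up names and values for illustration, not part of any framework. Whatever deadline you pick has to be shorter than the Pod's `terminationGracePeriodSeconds`, otherwise the kubelet sends SIGKILL before the drain finishes.

```python
import signal
import threading


class ConnectionTracker:
    """Counts open connections so shutdown can wait for them to finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._open = 0

    def opened(self):
        with self._cond:
            self._open += 1

    def closed(self):
        with self._cond:
            self._open -= 1
            if self._open == 0:
                self._cond.notify_all()

    def wait_for_drain(self, deadline_seconds):
        """Return True if every connection closed before the deadline."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._open == 0, timeout=deadline_seconds
            )


shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    # Only flip a flag here; the main thread does the actual draining.
    shutting_down.set()


def run(tracker, drain_deadline_seconds=30.0):
    """Hypothetical main loop: block until SIGTERM, then drain."""
    signal.signal(signal.SIGTERM, handle_sigterm)
    shutting_down.wait()  # keep serving until SIGTERM arrives
    # Application-specific: stop accepting new connections at this point.
    drained = tracker.wait_for_drain(drain_deadline_seconds)
    # False means the deadline hit and the remaining connections get cut.
    return drained
```

The request handlers would call `tracker.opened()` and `tracker.closed()` around each session; `run` returning `False` is the "enforce a deadline" case.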
To do it properly, you want a Maglev-style layer that lets you withdraw or drain application servers with minimal disruption, using a Maglev-style consistent hash with draining support. That lets you first drain a given application server (keep maintaining its existing connections, but send new ones to the secondaries for that shard) before fully taking down the instance.
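To make the drain idea concrete, here is a toy sketch of a Maglev-style lookup table combined with a connection table. It is not how Cilium or any real load balancer implements it; `build_table`, `Balancer`, the SHA-256 based hash, and the table size of 251 are all assumptions chosen for illustration.

```python
import hashlib

TABLE_SIZE = 251  # must be prime so each backend's permutation covers every slot


def _h(value, salt):
    """Stable 64-bit hash of a string, salted so we can derive several hashes."""
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_table(backends, size=TABLE_SIZE):
    """Fill the lookup table round-robin, each backend taking its next preferred slot."""
    backends = sorted(backends)
    if not backends:
        return []
    offsets = [_h(b, "offset") % size for b in backends]
    skips = [_h(b, "skip") % (size - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)
    table = [None] * size
    filled = 0
    while True:
        for i, b in enumerate(backends):
            # Walk backend i's permutation until it hits a free slot.
            while True:
                slot = (offsets[i] + next_idx[i] * skips[i]) % size
                next_idx[i] += 1
                if table[slot] is None:
                    break
            table[slot] = b
            filled += 1
            if filled == size:
                return table


class Balancer:
    def __init__(self, backends):
        self.active = set(backends)
        self.table = build_table(self.active)
        self.conns = {}  # connection id -> backend (connection tracking)

    def drain(self, backend):
        """Stop sending new connections to `backend`; existing ones stay put."""
        self.active.discard(backend)
        self.table = build_table(self.active)

    def route(self, conn_id):
        # Known connections keep their backend, even if it is draining.
        if conn_id not in self.conns:
            slot = _h(conn_id, "flow") % len(self.table)
            self.conns[conn_id] = self.table[slot]
        return self.conns[conn_id]
```

After `drain("app-1")`, connections already in `conns` still go to `app-1`, while new ones are hashed over the rebuilt table. Because each backend keeps its own slot preferences, most slots that did not belong to the drained backend typically keep their owner, which is the minimal-disruption property. Once the old connections finish (or hit the deadline), the instance can be taken down.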
[1] MetalLB only really announces IPs, so the "behind" is probably just the CNI, which is what actually handles the traffic.

[2] https://cilium.io/blog/2020/11/10/cilium-19#maglev