Here's my bpftrace[0] SYN backlog tool from BPF Performance Tools (2019 book; tools are online[1]):
    # tcpsynbl.bt
    Attaching 4 probes...
    Tracing SYN backlog size. Ctrl-C to end.
    ^C
    @backlog[backlog limit]: histogram of backlog size

    @backlog[128]:
    [0]                    2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

    @backlog[500]:
    [0]                 2783 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [1]                    9 |                                                    |
    [2, 4)                 4 |                                                    |
    [4, 8)                 1 |                                                    |
The source:

    #!/usr/local/bin/bpftrace

    #include <net/sock.h>

    BEGIN
    {
        printf("Tracing SYN backlog size. Ctrl-C to end.\n");
    }

    kprobe:tcp_v4_syn_recv_sock,
    kprobe:tcp_v6_syn_recv_sock
    {
        $sock = (struct sock *)arg0;
        @backlog[$sock->sk_max_ack_backlog & 0xffffffff] =
            hist($sock->sk_ack_backlog);

        if ($sock->sk_ack_backlog > $sock->sk_max_ack_backlog) {
            time("%H:%M:%S dropping a SYN.\n");
        }
    }

    END
    {
        printf("\n@backlog[backlog limit]: histogram of backlog size\n");
    }
This bpftrace tool is only 24 lines. The BCC tools in this post are >200 lines (and more complex: they need to worry about bpf_probe_read() etc.). The bpftrace version can also be easily modified to include extra details. I'm summarizing backlog length as a histogram since our prod hosts can accept thousands of connections per second.

[0] https://github.com/iovisor/bpftrace

[1] https://github.com/brendangregg/bpf-perf-tools-book
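As an aside (my addition, not one of the book's tools): if you can't run BPF on a host, the kernel's aggregate SNMP counters give a much coarser view of the same problem. This sketch assumes the standard Linux /proc/net/netstat layout, where a "TcpExt:" header row of counter names is followed by a "TcpExt:" row of values:

```shell
# Print the kernel's listen-queue overflow/drop totals from
# /proc/net/netstat. Unlike the bpftrace tool, these are counters
# since boot, with no per-backlog-limit breakdown and no histogram.
awk '
    $1 == "TcpExt:" && !seen { for (i = 1; i <= NF; i++) col[$i] = i; seen = 1; next }
    $1 == "TcpExt:" {
        print "ListenOverflows:", $col["ListenOverflows"]
        print "ListenDrops:",     $col["ListenDrops"]
    }
' /proc/net/netstat
```

ListenOverflows counts SYNs arriving when the accept queue was already full; ListenDrops includes those plus other drops on listen sockets.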
That said, there might well be a case for automatic backlog scaling. Or, for that matter, for increasing the default.
1: https://github.com/dspinellis/unix-history-repo/blob/0f4556f...
Are the default settings here reasonable for most cases, or are they more like something you should tune even if you're not really pushing any limits?
/etc/sysctl.conf:
net.core.wmem_max = 12582912
net.core.rmem_max = 12582912
net.ipv4.tcp_rmem = 10240 87380 12582912
net.ipv4.tcp_wmem = 10240 87380 12582912
fs.file-max = 1000000
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_max_orphans = 262144
net.core.somaxconn = 1000000
nginx.conf (just the relevant directives):

    worker_rlimit_nofile 102400;

    events {
        worker_connections 102400;
        multi_accept on;
    }

    http {
        server {
            listen 80 default_server reuseport backlog=102400;
            ...
        }
    }
As you can see, the socket and backlog-related values have been cranked way up, and I've never had any problems with this configuration. Because these servers are behind an ALB, I don't know how relevant the numbers are: the SYN/SYN-ACK round trip is between the server and the load balancer, not the remote clients. But I could be wrong; maybe there's something I'm missing. I've never had any performance problems related to TCP connections in the kernel or NGINX. Of course, YMMV: high-latency networks would reduce those numbers.
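One interaction worth calling out (my note, not from the config above): the kernel silently caps the backlog passed to listen() at net.core.somaxconn, so with these values the effective accept-queue limit is min(1000000, 102400) = 102400 — the nginx backlog, not somaxconn. A minimal sketch of that clamp:

```shell
# Sketch of the kernel's behavior: listen(fd, backlog) is capped at
# net.core.somaxconn, so the effective accept-queue limit is the
# smaller of the two values.
effective_backlog() {
    somaxconn=$1
    requested=$2
    if [ "$requested" -lt "$somaxconn" ]; then
        echo "$requested"
    else
        echo "$somaxconn"
    fi
}

# With the config above: somaxconn=1000000, nginx backlog=102400
effective_backlog 1000000 102400    # prints 102400
```

So here the somaxconn of 1000000 is effectively headroom: raising it further changes nothing unless the listen backlog is raised too.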
Anyway, I don't see why the numbers aren't 100 times larger by default, but there's probably a reason.