Delayed ACKs are a win only in certain circumstances - mostly character echo for Telnet. (When Berkeley installed delayed ACKs, they were doing a lot of Telnet from terminal concentrators in student terminal rooms to host VAX machines doing the work. For that particular situation, it made sense.) The delayed ACK timer is scaled to expected human response time. A delayed ACK is a bet that the other end will reply to what you just sent almost immediately. Except for some RPC protocols, this is unlikely. So the ACK delay mechanism loses the bet, over and over, delaying the ACK, waiting for a packet on which the ACK can be piggybacked, not getting it, and then sending the ACK, delayed. There's nothing in TCP to automatically turn this off. However, Linux (and I think Windows) now have a TCP_QUICKACK socket option. Turn that on unless you have a very unusual application.
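A minimal sketch of turning it on under Linux (the helper name is mine; note the kernel can quietly clear the flag again, so latency-sensitive code usually re-sets it after each read):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Enable TCP_QUICKACK on a connected socket (Linux-specific).
       The kernel may reset the flag after some protocol events, so
       re-arm it after each recv() if you depend on immediate ACKs. */
    static int enable_quickack(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
    }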
Turning on TCP_NODELAY has similar effects, but can make throughput worse for small writes. If you write a loop which sends just a few bytes (worst case, one byte) to a socket with "write()", and the Nagle algorithm is disabled with TCP_NODELAY, each write becomes one IP packet. This increases traffic by a factor of 40, with IP and TCP headers for each payload. Tinygram prevention won't let you send a second packet if you have one in flight, unless you have enough data to fill the maximum sized packet. It accumulates bytes for one round trip time, then sends everything in the queue. That's almost always what you want. If you have TCP_NODELAY set, you need to be much more aware of buffering and flushing issues.
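To make the buffering point concrete, here is a sketch of the failure mode and the fix, assuming a connected socket fd (function names are illustrative; error and partial-write handling omitted):

    #include <stddef.h>
    #include <unistd.h>

    /* Bad with TCP_NODELAY: each 1-byte write() can leave as its own
       packet, roughly 40 header bytes (IP + TCP) per byte of payload. */
    static void send_byte_at_a_time(int fd, const char *msg, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            write(fd, &msg[i], 1);
    }

    /* Better: assemble the message first and hand the kernel a single
       buffer, so it can go out as one segment if it fits. */
    static void send_in_one_write(int fd, const char *msg, size_t len)
    {
        write(fd, msg, len);
    }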
None of this matters for bulk one-way transfers, which is most HTTP today. (I've never looked at the impact of this on the SSL handshake, where it might matter.)
Short version: set TCP_QUICKACK. If you find a case where that makes things worse, let me know.
John Nagle
I wish you hadn't signed your comment, so we could have had the "I am John Nagle" moment when someone inevitably tried to pedantically correct you. :)
Ford Aerospace got out of networking, then computer science, then closed the Palo Alto facility. I was out long before then.
What am I doing now? Robotics, again. Most recent GitHub commit: [1]
Apparently Greg Minshall proposed alterations to tinygram prevention 15 years ago to fix this problematic interaction: https://tools.ietf.org/html/draft-minshall-nagle-01
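If I'm reading the draft right, the change is roughly this (pseudocode, names illustrative): classic Nagle delays a small segment whenever any unacked data is outstanding, while the Minshall variant delays it only if the previously sent segment was itself small and is still unacked:

    /* Pseudocode of the Minshall modification (not a real API): */
    if (segment_len >= MSS || !small_segment_still_unacked())
        send_now();                      /* full segments always go out */
    else
        delay_until_ack_or_more_data();  /* at most one tinygram in flight */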
OSX seems to have implemented this in 2007, and to be less sensitive (or not sensitive at all) to the issue; e.g. http://neophob.com/2013/09/rpc-calls-and-mysterious-40ms-del... notes that there was no delay on OSX:
> it took around 40ms until my application gets the data. I tested the application on a regular Linux system (Ubuntu) with the same result, so it’s not an RPi limitation. On my OSX MacBook Air however the RPC call needed only 3ms!
In quickack mode, acks are sent immediately, rather than delayed if needed in accordance to normal TCP operation.
So "normal TCP operation" is to delay ACKs "if needed". Not sure if "needed" is the right word to use, but whatever.
Looks like RHEL has a system-wide fix: https://access.redhat.com/documentation/en-US/Red_Hat_Enterp...
OTOH, bottom-up thinkers take much longer to become productive in an environment with novel abstractions.
Swings and roundabouts. Top-down is probably better in a startup context - it's more conducive to broad and shallow generalists. Bottom-up is great when you have a breakdown of abstraction through the stack, or when you need a new solution that's never been done quite the same way before.
Usually something is done to mitigate these inefficiencies only when they become egregious. And that is when even basic knowledge of the inner workings of underlying layers really pays off (see also: mechanical sympathy).
It’s really crazy to think about.
By setting TCP_NODELAY, they removed a series of 40ms delays, vastly improving performance of their web app.
(Alternatively, turn Nagle off entirely and buffer writes manually or using MSG_MORE or TCP_CORK.)
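For instance, a Linux sketch with MSG_MORE (function and parameter names are illustrative): disable Nagle, then flag every chunk except the last so the kernel keeps batching:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stddef.h>
    #include <sys/socket.h>

    /* Nagle off, batching done explicitly: MSG_MORE on every chunk
       except the last tells the kernel not to send a partial segment yet. */
    static void send_two_part_message(int fd, const char *hdr, size_t hlen,
                                      const char *body, size_t blen)
    {
        int one = 1;
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        send(fd, hdr, hlen, MSG_MORE);  /* more coming: hold it */
        send(fd, body, blen, 0);        /* last piece: flush everything */
    }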
One thing I haven't understood fully is why this only seems to be a problem on Linux; Mac OS X didn't exhibit this behaviour.
For all I know, they believe everything is held together by magic. I guess I don't trust people who don't have a natural urge to understand at least the basics of our foundations.
Don't judge people based on which components of networks they happened to take an interest in and dive into.
Note the final sentence from tcp(7):
TCP_NODELAY
If set, disable the Nagle algorithm. This means that segments are always sent as soon as possible, even if there is
only a small amount of data. When not set, data is buffered until there is a sufficient amount to send out, thereby
avoiding the frequent sending of small packets, which results in poor utilization of the network. This option is
overridden by TCP_CORK; however, setting this option forces an explicit flush of pending output, even if TCP_CORK is
currently set.

My proposed flushHint() is also quite different to TCP_NODELAY. Let's say you do 100 writes of 1 byte to a socket. If TCP_NODELAY is set, 100 packets would be sent. However, if you do 100 writes to the socket, then one flushHint() call, only one packet would be sent.
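flushHint() doesn't exist yet, but on Linux TCP_CORK gives approximately those semantics today (with a 200 ms ceiling on how long output stays corked). A sketch of the 100-writes-then-flush pattern:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Emulate the proposed flushHint() with TCP_CORK: queue 100
       one-byte writes while corked, then uncork, which flushes the
       pending data as (close to) a single packet. */
    static void corked_writes(int fd, const char bytes[100])
    {
        int on = 1, off = 0;
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
        for (int i = 0; i < 100; i++)
            write(fd, &bytes[i], 1);
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)); /* flush */
    }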