Intro: https://james.darpinian.com/blog/latency
Techniques to improve latency in your applications: https://james.darpinian.com/blog/latency-techniques
Platform-specific considerations: https://james.darpinian.com/blog/latency-platform-considerat...
According to the docs, there are extensions WGL_EXT_swap_control_tear/GLX_EXT_swap_control_tear [0] that cause late frames to tear instead of waiting a full frame. They don't work on my machine (my Intel HD 4000 reports support and then silently fails), but this should be the ideal swap mechanism.
[0]: https://registry.khronos.org/OpenGL/extensions/EXT/GLX_EXT_s...
Tearing is a pretty bad artifact so I wouldn't say that enabling it is ideal. I'm a stickler for low latency but even I don't think it's worth it in most cases. It's possible to achieve great latency without tearing. The ideal swap mechanism would be VRR when available.
Those GL extensions likely predate modern compositing window managers and fail to work when compositing is enabled. As I discuss in the platform-specific considerations section, tearing on Windows requires either full screen or Multiplane Overlay, which is not supported by OpenGL.
~2 ms (mouse)
8 ms (average time we wait for the input to be processed by the game)
16.6 ms (game simulation)
16.6 ms (rendering code)
16.6 ms (GPU is rendering the previous frame, current frame is cached)
16.6 ms (GPU rendering)
8 ms (average for missing the vsync)
16.6 ms (frame caching inside of the display)
16.6 ms (redrawing the frame)
5 ms (pixel switching)
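The budget above sums as follows (a quick sanity check; the stage names and values just restate the list):

```python
# Latency budget from the breakdown above, all values in milliseconds.
stages_ms = {
    "mouse": 2,
    "avg wait for input processing": 8,
    "game simulation": 16.6,
    "rendering code": 16.6,
    "GPU busy with previous frame": 16.6,
    "GPU rendering": 16.6,
    "avg wait for vsync": 8,
    "frame caching in the display": 16.6,
    "redrawing the frame": 16.6,
    "pixel switching": 5,
}

total_ms = sum(stages_ms.values())
print(f"total: {total_ms:.1f} ms")  # total: 122.6 ms
```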
I'm not very familiar with graphics pipelines, but some of this seems wrong. If a game is rendering at 60fps, the combined compute time for simulation + rendering should be 16.6 ms. You can't start simulating the next tick while rendering the previous one unless you do some kind of copy-on-write memory management for the entire game state. And with double buffering, the GPU should be writing frame n to the display cable at the same time as it's computing frame n+1, and the display writing the frame to its cache buffer should happen at the same time as the GPU writes it to the cable. By my count that's a whole 50 ms that shouldn't be there.
From the linked article:
> One thread is calculating the physics and logic for frame N while another thread is generating rendering commands based on the simulation results of frame N-1.
Maybe modern games do use CoW memory?
> [The GPU] might collect all drawing commands for the whole frame and not start to render anything until all commands are present.
It might, but is this typical behavior? This implies that the GPU would just sit idle if it finished rendering a frame before the CPU finished sending commands to draw the next one — why would it do that?
> Most monitors wait until a new frame was completely transferred before they start to display it adding another frame of latency.
Maybe this is what is meant by the "16.6 (frame caching inside of the display)" item? That might be real then.
“I can send an IP packet to Europe faster than I can send a pixel to the screen. How f’d up is that?”
https://mobile.twitter.com/id_aa_carmack/status/193480622533...
These days game programmers have gotten experienced enough to get closer to fully saturating all cores in both the simulation and render steps, so you sometimes no longer see the two full frames of latency there.
> 16.6 (GPU is rendering the previous frame, current frame is cached)
Not entirely sure what this is about. Maybe some sort of triple buffering is being employed to reduce hitches? If you push the engine really close to the 16 ms limit for each stage of the pipeline, sometimes something out of your control, like the OS deciding to do heavy background work, will push you over the limit. Without the extra buffer, you miss your vsync and the user perceives a very disturbing judder.
The total time in this breakdown is in line with the measured total, so if the source overstates how long the game takes, it must also understate some other stage by roughly the same amount. I'd bet on the monitor, but I don't have much reason to think they're wrong to begin with.
[0]: http://renderingpipeline.com/2013/09/measuring-input-latency...
It can work this way (e.g. Nvidia exposes an 'ultra low latency mode' in their driver that caps prerendered frames at zero), but typically, for smoother animation and higher average fps, GPUs will keep a queue of several frames that they're working on, and this is irrespective of how many render targets you have in your swapchain. Danluu's breakdown above is actually correct for the typical case.
---
Thought I'd clarify how this works since there's a lot of confusion in this thread. In the early days you would write pixels directly to memory and they'd be picked up by a RAMDAC and beamed out to the screen. So if you wanted to invert the color of the bottom-right pixel, it would take at most two frames, or about 33 ms of latency, running at 60fps double buffered: first you set your pixel in the back buffer, wait up to 16.6 ms for the current front buffer to finish drawing, flip buffers, then wait up to 16.6 ms for the electron gun to make its way down to the bottom-right corner and finally draw the inverted pixel.
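That worst case is easy to write down directly (a toy calculation, assuming a 60 Hz refresh and a change landing at the bottom-right pixel):

```python
REFRESH_HZ = 60
frame_ms = 1000 / REFRESH_HZ  # ~16.67 ms per refresh

# Worst case for a double-buffered change to the bottom-right pixel:
wait_for_flip = frame_ms  # back buffer waits out the current scanout
wait_for_beam = frame_ms  # the beam reaches the bottom-right corner last
worst_case_ms = wait_for_flip + wait_for_beam
print(f"{worst_case_ms:.1f} ms")  # 33.3 ms
```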
With modern GPUs, the situation is very similar to sending commands to a networked computer far away. You have a bit of two-way negotiation at the beginning to allocate GPU memory, upload textures/geometry/shaders, etc., and then a mostly one-way stream of commands. The GPU driver can queue these commands to an arbitrary depth, regardless of your vsync settings, double/triple buffering, etc., and is actually free to work on things out of order. You have to explicitly mark dependencies, and a 'present' call isn't intrinsically tied to when that buffer will actually end up displayed on screen. So there's no actual upper bound on latency here: even at 360hz, if the GPU is perpetually 10 frames behind the CPU, each frame only takes 2.77ms to simulate and 2.77ms to render but the overall input lag could still be ~30ms. (In practice, though, drivers will typically only render 2-3 frames ahead.)
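The "no upper bound" point falls out of a toy model: queue-induced lag scales with queue depth, not with frame time alone. (These numbers just restate the hypothetical above.)

```python
def queued_latency_ms(refresh_hz: float, frames_behind: int) -> float:
    """Extra input lag from the GPU running N frames behind the CPU."""
    frame_ms = 1000 / refresh_hz
    return frames_behind * frame_ms

# 360 Hz with the GPU perpetually 10 frames behind:
print(f"{queued_latency_ms(360, 10):.1f} ms")  # 27.8 ms of queue lag alone

# Drivers typically cap the queue at 2-3 frames:
print(f"{queued_latency_ms(360, 3):.1f} ms")   # 8.3 ms
```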
I will say though, whatever the numbers are, after running on a 144hz monitor with adaptive sync, 60 FPS feels painfully jerky for gaming.
The Apple //e, with its 1 MHz clock and 8-bit CPU, had an average latency from keypress to character display of 30 ms. Modern computers are dramatically slower at keypress-to-text display. There are reasons, but end users see a slower system.
Terminal latency (2017) - https://news.ycombinator.com/item?id=19443076 - March 2019 (109 comments)
Terminal and shell performance - https://news.ycombinator.com/item?id=14798211 - July 2017 (204 comments)
I upgraded to a 9600 baud modem in 1992, I think. That was finally when things felt "fast."
Dropping into a real terminal on Linux feels so weird when typing. I swear sometimes I see a letter on the screen before I actually touch the key. Similar to playing an Atari on a CRT, paddle games, like Breakout, feel like you're physically attached to the on-screen paddle with a sturdy rod as opposed to the mushy feel you get from a mouse in modern games.
These days I could probably switch to another terminal, since I'm in tmux most of the time anyway, and tmux creates its own compatibility issues no matter which "backend" terminal you use.
Xterm really is a terminal emulator. Most other "modern" TEs are more like shitty xterm emulators.