Intro: https://james.darpinian.com/blog/latency
Techniques to improve latency in your applications: https://james.darpinian.com/blog/latency-techniques
Platform-specific considerations: https://james.darpinian.com/blog/latency-platform-considerat...
According to the docs, there are extensions WGL_EXT_swap_control_tear/GLX_EXT_swap_control_tear [0] that cause late frames to tear instead of waiting a full frame. They don't work on my machine (my Intel HD 4000 reports support and then silently fails), but this should be the ideal swap mechanism.
[0]: https://registry.khronos.org/OpenGL/extensions/EXT/GLX_EXT_s...
Tearing is a pretty bad artifact so I wouldn't say that enabling it is ideal. I'm a stickler for low latency but even I don't think it's worth it in most cases. It's possible to achieve great latency without tearing. The ideal swap mechanism would be VRR when available.
Those GL extensions likely predate modern compositing window managers and fail to work when compositing is enabled. As I discuss in the platform-specific considerations section, tearing on Windows requires either full screen or Multiplane Overlay, which is not supported by OpenGL.
~2 ms (mouse)
8 ms (average time we wait for the input to be processed by the game)
16.6 ms (game simulation)
16.6 ms (rendering code)
16.6 ms (GPU is rendering the previous frame, current frame is cached)
16.6 ms (GPU rendering)
8 ms (average for missing the vsync)
16.6 ms (frame caching inside of the display)
16.6 ms (redrawing the frame)
5 ms (pixel switching)
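The budget above sums as follows (a quick sanity check; the stage names and values just restate the list):

```python
# Latency budget from the breakdown above, all values in milliseconds.
stages_ms = {
    "mouse": 2,
    "avg wait for input processing": 8,
    "game simulation": 16.6,
    "rendering code": 16.6,
    "GPU busy with previous frame": 16.6,
    "GPU rendering": 16.6,
    "avg wait for vsync": 8,
    "frame caching in the display": 16.6,
    "redrawing the frame": 16.6,
    "pixel switching": 5,
}

total_ms = sum(stages_ms.values())
print(f"total: {total_ms:.1f} ms")  # total: 122.6 ms
```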
I'm not very familiar with graphics pipelines, but some of this seems wrong. If a game is rendering at 60fps, the combined compute time for simulation + rendering should be 16.6 ms. You can't start simulating the next tick while rendering the previous one unless you do some kind of copy-on-write memory management for the entire game state. And with double buffering, the GPU should be writing frame n to the display cable at the same time as it's computing frame n+1, and the display writing the frame to its cache buffer should happen at the same time as the GPU writes it to the cable. By my count that's a whole 50 ms that shouldn't be there.
From the linked article:
> One thread is calculating the physics and logic for frame N while another thread is generating rendering commands based on the simulation results of frame N-1.
Maybe modern games do use CoW memory?
> [The GPU] might collect all drawing commands for the whole frame and not start to render anything until all commands are present.
It might, but is this typical behavior? This implies that the GPU would just sit idle if it finished rendering a frame before the CPU finished sending commands to draw the next one — why would it do that?
> Most monitors wait until a new frame was completely transferred before they start to display it adding another frame of latency.
Maybe this is what is meant by the "16.6 (frame caching inside of the display)" item? That might be real then.
“I can send an IP packet to Europe faster than I can send a pixel to the screen. How f’d up is that?”
https://mobile.twitter.com/id_aa_carmack/status/193480622533...
These days game programmers have gotten experienced enough to get closer to fully saturating all cores in both the simulation and render steps, so you sometimes no longer see the two full frames of latency there.
> 16.6 (GPU is rendering the previous frame, current frame is cached)
Not entirely sure what this is about. Maybe some sort of triple buffering is being employed to reduce hitches? If you push the engine really close to the 16 ms limit for each stage of the pipeline, sometimes something out of your control, like the OS deciding to do heavy background work, will push you over the limit. Without the extra buffer, you miss your vsync and the user perceives a very disturbing judder.
The total time in this breakdown is in line with the measured total, so if the source overstates how long the game takes, it must also understate some other stage by roughly the same amount. I'd bet on the monitor, but I don't have much reason to think they're wrong to begin with.
[0]: http://renderingpipeline.com/2013/09/measuring-input-latency...
It can work this way (e.g. Nvidia exposes an 'ultra low latency mode' in their driver that caps prerendered frames at zero), but typically, for smoother animation and higher average fps, GPUs will keep a queue of several frames that they're working on, and this is irrespective of how many render targets you have in your swapchain. Danluu's breakdown above is actually correct for the typical case.
---
Thought I'd clarify how this works since there's a lot of confusion in this thread. In the early days you would write pixels directly to memory and they'd be picked up by a RAMDAC and beamed out to the screen. So if you wanted to invert the color of the bottom-right pixel, it would take at most two frames, or about 33 ms of latency, running at 60fps double buffered: first you set your pixel in the back buffer, wait up to 16.6 ms for the current front buffer to finish drawing, flip buffers, then wait up to 16.6 ms for the electron gun to make its way down to the bottom-right corner and finally draw the inverted pixel.
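That worst case is easy to write down directly (a toy calculation, assuming a 60 Hz refresh and a change landing at the bottom-right pixel):

```python
REFRESH_HZ = 60
frame_ms = 1000 / REFRESH_HZ  # ~16.67 ms per refresh

# Worst case for a double-buffered change to the bottom-right pixel:
wait_for_flip = frame_ms  # back buffer waits out the current scanout
wait_for_beam = frame_ms  # the beam reaches the bottom-right corner last
worst_case_ms = wait_for_flip + wait_for_beam
print(f"{worst_case_ms:.1f} ms")  # 33.3 ms
```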
With modern GPUs, the situation is very similar to sending commands to a networked computer far away. You have a bit of two-way negotiation at the beginning to allocate GPU memory, upload textures/geometry/shaders, etc., and then a mostly one-way stream of commands. The GPU driver can queue these commands to an arbitrary depth, regardless of your vsync settings, double/triple buffering, etc., and is actually free to work on things out of order. You have to explicitly mark dependencies, and a 'present' call isn't intrinsically tied to when that buffer will actually end up displayed on screen. So there's no actual upper bound on latency here: even at 360hz, if the GPU is perpetually 10 frames behind the CPU, each frame only takes 2.77ms to simulate and 2.77ms to render but the overall input lag could still be ~30ms. (In practice, though, drivers will typically only render 2-3 frames ahead.)
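The "no upper bound" point falls out of a toy model: queue-induced lag scales with queue depth, not with frame time alone. (These numbers just restate the hypothetical above.)

```python
def queued_latency_ms(refresh_hz: float, frames_behind: int) -> float:
    """Extra input lag from the GPU running N frames behind the CPU."""
    frame_ms = 1000 / refresh_hz
    return frames_behind * frame_ms

# 360 Hz with the GPU perpetually 10 frames behind:
print(f"{queued_latency_ms(360, 10):.1f} ms")  # 27.8 ms of queue lag alone

# Drivers typically cap the queue at 2-3 frames:
print(f"{queued_latency_ms(360, 3):.1f} ms")   # 8.3 ms
```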
I will say though, whatever the numbers are, after running on a 144hz monitor with adaptive sync, 60 FPS feels painfully jerky for gaming.
The Apple //e, with its 1 MHz clock and 8-bit CPU, had an average latency from keypress to character display of 30 ms. Modern computers are dramatically slower at keypress-to-text display. There are reasons, but end users see a slower system.
Terminal latency (2017) - https://news.ycombinator.com/item?id=19443076 - March 2019 (109 comments)
Terminal and shell performance - https://news.ycombinator.com/item?id=14798211 - July 2017 (204 comments)
I upgraded to a 9600 baud modem in 1992, I think. That was finally when things felt "fast."
Dropping into a real terminal on Linux feels so weird when typing. I swear sometimes I see a letter on the screen before I actually touch the key. Similar to playing an Atari on a CRT, paddle games, like Breakout, feel like you're physically attached to the on-screen paddle with a sturdy rod as opposed to the mushy feel you get from a mouse in modern games.
These days I could probably switch to another terminal, since I'm in tmux most of the time anyway, and tmux creates its own compatibility issues no matter which "backend" terminal you use.
Xterm really is a terminal emulator. Most other "modern" TEs are more like shitty xterm emulators.