"By executing this command, you are effectively replacing the first byte of the `NtUserSetLayeredWindowAttributes` function with a `ret` instruction. This means that any call to `NtUserSetLayeredWindowAttributes` will immediately return without executing any of its original code. This can be used to bypass or disable the functionality of this function"
(Thanks to GitHub Copilot for that)
Also see https://learn.microsoft.com/en-us/windows-hardware/drivers/d...
- eb[0] "enters bytes" into memory at the specified location;
- The RETN[1] instruction is encoded as C3 in x86 opcodes; and
- Debuggers will typically load debug symbols (PDB on Windows, rather than ELF) so you can refer to memory locations by name, i.e. a function name resolves to its entry point.
Putting those three together, we almost get the author's command. I'm not sure about the "win32u!NtUser" name prefix, though. Is it name-munging performed on the compiler side? Maybe some debugger syntax thrown in to select the dll source of the name?
[0]:https://learn.microsoft.com/en-us/windows-hardware/drivers/d...
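Putting those pieces together, the patch would presumably look like this WinDbg command (a sketch, not necessarily the author's exact invocation; `eb` writes the single byte 0xC3 at the symbol's address):

```
eb win32u!NtUserSetLayeredWindowAttributes c3
```

As for the prefix: `module!symbol` is standard WinDbg syntax for selecting which loaded module a name resolves in, rather than compiler-side name mangling.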
And if you are wondering what the difference between win32u.dll and user32.dll is:
> win32u.dll is a link for System calls between User mode (Ring 3) and Kernel mode (Ring 0) : Ring 3 => Ring 0 https://imgbb.com/L8FTP2C [0]
[0] - https://learn.microsoft.com/en-us/answers/questions/213495/w...
In fact, keeping something preloaded and ready to go is quite common; these two examples are off the top of my head:
- The Emacs server way - https://ungleich.ch/u/blog/emacs-server-the-smart-way/
- SSH connection reuse.
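For the SSH case, connection reuse is typically a few lines of ControlMaster config in `~/.ssh/config` (a sketch; the socket path and the 10-minute persistence window are arbitrary choices):

```
Host *
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
```

With this, the first `ssh` to a host opens a master connection and subsequent ones multiplex over it, skipping the TCP and key-exchange handshakes.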
So if anything lags by up to that amount, our brain will compensate and make it feel instantaneous.
There was an interesting experiment that I reproduced at university: create an app that slowly builds up a delay on clicks, letting the brain adapt, and then remove the delay completely. The result is that the app feels like it reacts just before you actually click, until the brain adapts again to the new timing.
If the delay is long enough, the output does not just feel delayed, but entirely unrelated to the input.
A latency perception test involving a switch can easily be thrown off by a disconnect between the actual point of actuation and the end-user's perceived point of actuation. For example, the user might feel, especially if exposed to high system latency, that the switch actuates only after the button has physically bottomed out and been squeezed with increased force, as if they were trying to mechanically force the action, and later be surprised to realize that the actuation point was at less than half the key travel once the virtual latency is removed.
Without knowing the details of the experiment, I think this is a more likely explanation for a perception of negative latency: Not intuitively understanding the input trigger.
The journey was very useful, even if the destination may be pretty specific to your needs. The process of debugging minor annoyances like this is really hard to learn.
My very unscientific methodology was to run
$ echo hello && foot
in a terminal and measure the time between the "hello" text appearing and the new window appearing. Looking at my video, the time from physical key press to the "hello" text appearing might be 20ish ms, but that is less clear, so about 100 ms total from key press to shell prompt. This is a pretty much completely untuned setup; I haven't done any tweaks to improve the figures. Enabling foot's server mode might shave off some milliseconds, but tbh I don't feel that's necessary.
It'd be fun to do this with a better camera, and also with better monitors. Idk how difficult it would be to mod an LED into the keyboard to capture the exact moment the key is activated; just trying to eyeball the key movement is not very precise.
But you also have to account for the fact that wf-recorder might interfere with the results: capturing the screen is not free, and it might even push some part of the pipeline onto less optimal paths. With a video camera you can be fairly confident that the measurement isn't interfering with anything.
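If you're counting frames in a recording, converting a frame delta to milliseconds is a one-liner (a trivial helper; the 240 fps figure below is just an example frame rate, not the camera actually used):

```shell
# latency in ms = frame_delta / fps * 1000
frames_to_ms() { awk -v f="$1" -v r="$2" 'BEGIN { printf "%.1f", f / r * 1000 }'; }
frames_to_ms 24 240   # -> 100.0 (24 frames at 240 fps is 100 ms)
```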
However I was interested in knowing whether it does for the author.
Assuming the author does suffer this 1300 ms delay "hundreds" of times a day (let's say 200), and for the sake of argument they use their computer 300 days a year and have 20 years of such work ahead of them with this config, then this inefficiency totals 1300 x 200 x 300 x 20 / 1000 / 60 / 60 hours over the author's lifetime: some 430 hours.
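That back-of-the-envelope figure checks out (shell integer arithmetic truncates, hence 433 rather than 433.3):

```shell
# 1300 ms x 200 times/day x 300 days/year x 20 years, converted to hours
total_ms=$((1300 * 200 * 300 * 20))
echo $((total_ms / 1000 / 60 / 60))   # 433
```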
So well worth the effort to fix!
When I used to use Windows 10+ years ago, I had decent luck using xming + cygwin + Cygwin/X + bblean to run xterm in a minimal latency/lag environment.
I also launch Chrome/Spotify/Slack desktop using:
$ open -a Google\ Chrome --args --disable-gpu-vsync --disable-smooth-scrolling
Also if you are using a miniLED M-class MBP, its pixel response is abysmal.
Too bad vscode doesn't support higher refresh rates. It's locked to 60 for some reason I haven't been able to grasp.
Though I've used the Apple Magic Keyboard w/ Touch ID exclusively for a while, I'm also thinking about upgrading to the new Wooting 80HE keyboard this fall, since it has an 8 kHz polling rate, analog Hall effect switches, and is designed to be ultra low latency w/ tachyon mode enabled.
[0]: https://support.apple.com/guide/mac-help/use-adaptive-sync-w...
Anyway, this also made me think about the general bloat we have in new OSes and programs. I'm still on an old OS running spinning rust, and bash here starts instantly when the cache is hot. I think GUI designers lost the engineer's touch...
Great debugging work to come up with a solution!
$ hyperfine 'alacritty -e true'
Benchmark 1: alacritty -e true
Time (mean ± σ): 84.1 ms ± 4.9 ms [User: 40.1 ms, System: 30.8 ms]
Range (min … max): 80.5 ms … 104.4 ms 32 runs
$ hyperfine 'xterm -e true'
Benchmark 1: xterm -e true
Time (mean ± σ): 81.9 ms ± 2.6 ms [User: 21.7 ms, System: 7.9 ms]
Range (min … max): 74.9 ms … 87.1 ms 37 runs
$ hyperfine 'wezterm -e true'
Benchmark 1: wezterm -e true
Time (mean ± σ): 211.7 ms ± 13.4 ms [User: 41.4 ms, System: 60.0 ms]
Range (min … max): 190.5 ms … 240.5 ms 15 runs

$ hyperfine -L arg '1,2,3' 'sleep {arg}'
…
Summary
sleep 1 ran
2.00 ± 0.00 times faster than sleep 2
3.00 ± 0.00 times faster than sleep 3
If your commands don't share enough in common for that approach then you can declare them individually, as in "hyperfine 'blib 1' 'blob x y' 'blub --arg'", and still get the summary.

But I have a very different solution to this problem: have just one terminal window and use and abuse `tmux`. I only use new windows (or tabs, if the terminal app has those) to run `ssh` to targets where I use `tmux`. I even nest `tmux` sessions, so essentially I've two levels of `tmux` sessions, and I title each window in the top-level session to match the name of the session running in that window -- this helps me find things very quickly. I also title windows running `vi` after the `basename` of the file being edited. Add in a simple PID-to-tmux window resolver script, scripts for utilities like `cscope` to open new windows, and this gets very comfortable, and it's fast. I even have a script that launches this whole setup should I need to reboot. Opening a new `tmux` window is very snappy!
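A minimal sketch of the window-titling trick, assuming tmux with its allow-rename option enabled (the file path is just an example, not from the original setup):

```shell
# emit the screen/tmux escape sequence that renames the current window
set_tmux_title() { printf '\033k%s\033\\' "$1"; }

# title the window after the basename of the file being edited
set_tmux_title "$(basename /home/me/src/editor.c)"   # window becomes "editor.c"
```

A shell hook (or a `vi` wrapper) can call this before launching the editor, and again to restore the title afterwards.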
Even 80ms seems unnecessarily slow to me. 300ms would drive me nuts ...
I'm using a tiling window manager (dwm) and interestingly the spawning time varies depending on the position that the terminal window has to be rendered to.
The fastest startup time I get on the fullscreen tiling mode.
hyperfine 'st -e true'
Benchmark 1: st -e true
Time (mean ± σ): 35.7 ms ± 10.0 ms [User: 15.4 ms, System: 4.8 ms]
Range (min … max): 17.2 ms … 78.7 ms 123 runs
The non-fullscreen one ends up at about 60 ms, which still seems reasonable.

To prove it to myself: I'm using river² and I can see a doubling-ish of startup time with foot³, iff I allow windows from heavier apps to handle the resize event immediately. If the time was a little longer (or more common) I'd be tempted to wrap the spawn along the lines of "kill -STOP <other_clients_in_tag>; <spawn & hold for map>; kill -CONT <other_clients_in_tag>" to delay the resize events until my new window was ready. That way the frames still resize, but their content resize is delayed.
¹ https://tools.suckless.org/tabbed/
Benchmark 1: st -e true
Time (mean ± σ): 35.4 ms ± 6.9 ms [User: 15.1 ms, System: 3.8 ms]
Range (min … max): 24.2 ms … 65.2 ms 114 runs
This is on awesome-wm with the window opening as the 3rd tiled window on a monitor, which means it has to redraw at least the other two windows. I'm also running xfs on top of luks/dm-crypt for my filesystem, which shouldn't matter too much on this benchmark thanks to the page cache, but is a relatively common source of performance woes on this particular system. I really ought to migrate back to unencrypted ext4 and use my SSD's encryption but I haven't wanted to muck with it.