Perhaps, but I am not under this misunderstanding and never expressed it.
> Only specialized software/scenarios will tangibly benefit from the parasitic gains of SMT-induced extra parallelization
In my experience it also speeds up C++/Rust compilation, which is the main thing I care about. I can't find any benchmarks now but I have definitely seen a benefit in the past.
> video encoders like x264 or CPU-bound raytracers to name a few examples. These gains typically amount to about 15-20% at the very extreme end.
Normally those types of compute heavy processes, data streamlined, processes don’t see much benefit from SMT. After all SMT only provides a performance benefit by allowing the CPU to pull from two distinct chains of instructions, and fill the pipeline gaps from one thread, with instructions from the other thread. It’s effectively instruction-by-instruction scheduling of two different threads.
But if you’re running an optimised and efficient process that doesn’t have significant unpredictable branching, or significant unpredictable memory operations. Then SMT offers you very little because the instruction pipeline for each thread is almost fully packed, offering few opportunities to schedule instructions from a different thread.
The only real constraint is that you can actually leverage multiple threads. For protos as an example, that requires a modified version of the format with checkpoints or similar (which nobody does) or having many to work on concurrently (very common in webservers or whatever).