As compared to hand-written assembly or the tailcall technique you describe. But (for the benefit of onlookers) a threaded switch, especially one built on computed gotos, is still generally faster than a traditional function dispatch table.
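For onlookers who haven't seen it, a minimal sketch of computed-goto dispatch might look like the following. The opcodes and the `run` function are made up for illustration, and it relies on the GNU "labels as values" extension (`&&label` and `goto *ptr`), so it is not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator VM (illustration only). */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Executes a bytecode program that must end in OP_HALT and returns the
   accumulator. Uses the GNU "labels as values" extension. */
static int run(const unsigned char *pc)
{
    /* One label address per opcode, indexed by opcode value. */
    static void *const dispatch[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    int acc = 0;

    assert(pc != NULL);

/* Each handler ends with its own indirect jump instead of returning to a
   central switch, which is what makes the dispatch "threaded". */
#define NEXT() goto *dispatch[*pc++]
    NEXT();

do_inc:    acc += 1; NEXT();
do_dec:    acc -= 1; NEXT();
do_double: acc *= 2; NEXT();
do_halt:   return acc;
#undef NEXT
}
```

Note that this variant still indexes through a table on every instruction, so it is token threading rather than direct threading.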
Has there been any movement in GCC wrt the tailcalls feature?
One of the limitations of computed gotos is that you can't take the address of a label from outside the function. You always end up with some amount of superfluous conditional code for selecting the address inside the function, or with indexing through a table. Several years ago, when exploring this space, I discovered a hack, albeit one that only works with GCC (IIRC), and at least did as of ~10 years ago. GCC supports nested (inline) function definitions, nested functions have visibility into the enclosing function's goto labels (notwithstanding that you're not supposed to make use of them), and, most surprisingly, GCC also supports attaching __attribute__((constructor)) to those nested function definitions. This means you can export a map of goto labels that can be used to initialize VM data structures, permitting (in theory) more efficient direct threading.
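To make the limitation concrete (this is the conventional workaround, not the constructor hack), here is a rough sketch of the usual pattern: the interpreter function doubles as an accessor for its own label table, at the cost of an extra branch on entry. The names `vm`, `run_bytecode`, and the opcodes are hypothetical, and it again assumes the GNU "labels as values" extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes (illustration only). */
enum { OP_INC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* Called with code == NULL: returns the label table and executes nothing.
   Otherwise: runs direct-threaded code (an array of label addresses that
   must end with the OP_HALT handler) and stores the accumulator in *result. */
static void *const *vm(void *const *code, int *result)
{
    static void *const labels[OP_COUNT] = { &&do_inc, &&do_double, &&do_halt };

    /* The superfluous conditional: the only way for a caller to learn the
       label addresses is to ask the function that owns them. */
    if (code == NULL)
        return labels;

    assert(result != NULL);
    int acc = 0;
    void *const *pc = code;

    goto **pc++;                      /* jump straight to the first handler */
do_inc:    acc += 1; goto **pc++;
do_double: acc *= 2; goto **pc++;
do_halt:   *result = acc; return NULL;
}

/* Translates opcodes into label addresses once, then executes them with no
   table lookup per instruction. Returns -1 on invalid input. */
static int run_bytecode(const unsigned char *ops, size_t n)
{
    void *threaded[16];
    void *const *labels = vm(NULL, NULL);
    int result = 0;

    if (n == 0 || n > 16 || ops[n - 1] != OP_HALT)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (ops[i] >= OP_COUNT)
            return -1;
        threaded[i] = labels[ops[i]];
    }
    vm(threaded, &result);
    return result;
}
```

Exporting the label map at load time, as the hack above does, would in principle let you drop the `code == NULL` branch and the accessor call.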
The tailcall technique is a much more sane and profitable approach, of course.