For these types of applications, vendors sell proprietary implementations that are better suited to latency-sensitive workloads.
The next level of optimization after kernel bypass is to build or buy FPGAs that implement the wire protocol and transport directly in hardware.