Not fully. 8 bits gives only 256 values, so it's easy to keep a lookup table in the L1 cache of any CPU or the constant cache of any GPU. For ASICs and FPGAs, it's a simple 256-entry LUT. It's not ideal, yes, but not a deal breaker, especially considering LLM inference is memory bound. GGML dequantizes weights on the fly and still gets near-linear scaling on GPUs.
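To make the LUT idea concrete, here is a rough sketch in Python. The bit layout is an assumption for illustration only: a simplified E4M3-style minifloat (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7) with no NaN or infinity handling. The names `decode_minifloat8`, `LUT`, and `dequantize` are made up for this example and are not GGML's API or its actual quantization format.

```python
def decode_minifloat8(byte):
    """Decode one byte of a simplified, hypothetical E4M3-style format."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F  # 4 exponent bits
    mantissa = byte & 0x07         # 3 mantissa bits
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        return sign * (mantissa / 8.0) * 2.0 ** (1 - 7)
    # Normal: implicit leading 1, exponent bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# Build the table once. 256 floats is about 1 KiB at fp32,
# small enough to sit in L1 or GPU constant memory.
LUT = [decode_minifloat8(b) for b in range(256)]


def dequantize(packed, scale=1.0):
    """Turn packed 8-bit weights into floats with one table lookup each."""
    return [LUT[b] * scale for b in packed]


weights = dequantize(bytes([0x38, 0xB8, 0x40, 0x3C]))
```

The point is that the decode logic only runs 256 times up front. After that, dequantizing a weight is a single indexed load, which is cheap next to the cost of pulling the weights from memory in the first place.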