Memory bandwidth is the dominant factor in token generation speed. Hardware support for FP8 or FP4 probably does not matter much there: you should be able to do the arithmetic on the CPU in FP32 while storing the weights in memory as FP4/FP8, converting between formats in the CPU's registers (although to be honest, I have not looked into exactly how those conversions would work). That is how llama.cpp supports BF16 on CPUs that have no native BF16 support. Prompt processing would benefit from hardware FP4/FP8 support, since prompt processing is compute bound rather than memory bandwidth bound.
As for how well those CPUs do with LLMs: token generation speed will be close to memory bandwidth divided by model size, in tokens per second. At least, that is what I have learned from local experiments:
https://github.com/ryao/llama3.c
Note that prompt processing is the phase where the LLM is reading the conversation history and token generation is the phase where the LLM is writing a response.
By the way, you can get an Ampere Altra motherboard + CPU for $1,434.99:
https://www.newegg.com/asrock-rack-altrad8ud-1l2t-q64-22-amp...
I would be shocked if you could get any EPYC CPU with similar or better memory bandwidth for anything close to that price. As for Strix Halo, anyone doing local inference would love it if it were priced like a gaming part. Four of them could run Llama 3.1 405B, on paper. I look forward to seeing its pricing.