If I'm right about that, then if you're willing to go in for somewhere in the vicinity of $30k (24 of the Max 385 machines), you should be able to achieve ChatGPT-level performance.
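Back-of-envelope on the cluster idea (a rough sketch in Python; the 128 GB-per-unit figure is my assumption and depends on the configuration — everything here except the $30k/24 split is guesswork):

    # Rough math for a 24-unit Ryzen AI Max cluster at ~$30k total.
    units = 24
    total_cost_usd = 30_000
    mem_per_unit_gb = 128  # assumption: top memory config; smaller SKUs exist

    print(f"per-unit budget:  ${total_cost_usd / units:,.0f}")  # $1,250
    print(f"aggregate memory: {units * mem_per_unit_gb} GB")    # 3072 GB
    # ~3 TB of unified memory on paper, but splitting one model across
    # 24 boxes makes the interconnect the real bottleneck, not capacity.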
Even with a cloud-based LLM where the response is pretty snappy, I still find that I wander off and return when I am ready to digest the entire response.
But the real kicker here is the 90 s TTFT: you ask a question and see nothing for a full minute and a half.
This is a good list. I like my Beelink a lot; my Minisforum likes to turn itself off every couple of weeks, and I'm not sure why yet.
https://www.techradar.com/pro/there-are-15-amd-ryzen-ai-max-...
---
Performance is pretty bad (<10 tok/s) and context is quite limited. Still, it's good to see progress.
Prompt size (tokens) | TTFT (s), flash attention disabled | TTFT (s), flash attention enabled
4096                 | 53.7                               | 39.7
8192                 | Out of memory (OOM)                | 90.5
16384                | Out of memory (OOM)                | 239.1
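For intuition, the flash-attention-enabled column implies an effective prefill throughput; a quick sketch (numbers taken straight from the table above):

    # Prefill speed implied by TTFT: prompt_tokens / TTFT.
    rows = [(4096, 39.7), (8192, 90.5), (16384, 239.1)]  # (tokens, TTFT s)
    for tokens, ttft in rows:
        print(f"{tokens:>6} tokens -> {tokens / ttft:5.1f} prefill tok/s")
    # ~103, ~90, ~69 tok/s: throughput drops as the prompt grows, so TTFT
    # scales worse than linearly with context length.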
AFAICT, the answer is "because Minisforum". I don't know if they have a design principle that says to run their systems near the edge of the thermal envelope or what, but Minisforum is the only brand I've had consistent stability trouble with. My last one got to the point where it stopped booting altogether; it just looped. Since then I've written off Minisforum as a brand; it's just not worth the hassle.
Though only 5 GbE? Can't they do USB-C/Thunderbolt 40 Gb/s connections like Macs?
Does the network speed matter that much when TFA talks about outputting a few tens of tokens per second? Ain't 5 Gbit/s plenty for that? (I understand the need to load the model, but that'd be local already, right?)
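Quick sanity check on that (a sketch; the 50 tok/s and per-token byte counts are assumptions, not measurements):

    # How much of a 5 Gbit/s link does streaming LLM output actually use?
    tok_per_s = 50       # assumption: generous for this class of hardware
    payload_bytes = 4    # assumption: ~4 UTF-8 bytes of text per token
    framing_bytes = 200  # assumption: JSON/SSE overhead per streamed chunk

    mbit_per_s = tok_per_s * (payload_bytes + framing_bytes) * 8 / 1e6
    print(f"{mbit_per_s:.2f} Mbit/s of 5000 Mbit/s")  # ~0.08 Mbit/s
    # Token streaming uses a tiny fraction of 5 GbE; faster links only
    # matter for shuttling model weights around.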
How much is one of these gonna run me?
Mine sees more use as a Steam machine, but it can run decently large models. Ollama was trivial to get working, and qwen3-coder-next spits out paragraphs of text/code in seconds. I don't really do anything with that, but it's fun to mess around with. (LLMs are still pretty bad at assembly language.)
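For anyone curious how trivial the Ollama part is: with the daemon running and the model pulled, generation is one HTTP call to the local API. A minimal sketch in Python (the model tag is whatever `ollama list` shows on your machine):

    import requests

    # Ollama serves on localhost:11434 by default; /api/generate is its
    # one-shot completion endpoint.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen3-coder-next",  # substitute your local tag
            "prompt": "Write an x86-64 assembly function that adds two ints.",
            "stream": False,  # one JSON object instead of a token stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])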
https://frame.work/products/framework-desktop-mainboard-amd-...