827a 8mo ago

Exactly, yeah. My point is that there's a lot more to running these models than raw memory bandwidth and GPU-available memory size, and the difference between a $6000 M4 Ultra Mac Studio and a $2000 AI Max+ 395 isn't actually as big as the raw numbers would suggest.

On the flip side, though: running GPT-OSS-120b locally is "cool", but have people found useful, productivity-enhancing use cases that justify doing this over just loading $2000 into your OpenAI API account? That, I'm less sure of.

I think we'll get to the point where running a local-first AI stack is obviously an awesome choice; I just don't think the hardware or models are there yet. Next year's Medusa Halo, combined with another year of open source model improvements, might be the inflection point.

vid 8mo ago

I use local AI fairly often for innocuous queries (health, history, etc.) that I don't want to feed the spy machines, plus I like the hands-on aspect. I would use it more if I had more time, and while I hear the 120b is pretty good (I mostly use qwen 30b), I would use it a lot more if I could run some of the really great models. Hopefully Medusa Halo will be all that.
