Better HN

butILoveLife 2mo ago (0 points)

>Time to first token measured with an 8K-token prompt using a 14-billion parameter model with 4-bit quantization

Oh dear, 14B and 4-bit quant? There are going to be a lot of embarrassed programmers who need to explain to their engineering managers why their MacBook can't reasonably run LLMs like they said it could. (This already happened at my Fortune 20 company lol)
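For scale, here is a rough back-of-the-envelope sketch of what "14B at 4-bit" means in memory terms. It counts the weights only and assumes a flat number of bits per weight; real quantization formats add some overhead for scales, and the KV cache and activations come on top. The function name is made up for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10**9 bytes).

    Assumption: every weight is stored at exactly `bits_per_weight` bits,
    ignoring per-block scales and other quantization metadata.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Compare the same 14B-parameter model at different precisions.
    for bits in (16, 8, 4):
        size = weight_footprint_gb(14e9, bits)
        print(f"14B params @ {bits}-bit: ~{size:.1f} GB of weights")
```

Under these assumptions the benchmark model needs roughly 7 GB for its weights, which is why it fits comfortably on most current laptops.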


VladVladikoff 2mo ago

I don’t really get why people are smack-talking this. Are there other laptops available that can do better?

diffeomorphism 2mo ago

Wrong question. If you sell a 6k€ machine "for AI", then you are judged on your own merits.

Replies like "but, but other laptops" are very weak attempts at deflection.

HSO 2mo ago

At 6k you can get 128 GB of RAM, so you can use bigger models.

butILoveLife (OP) 2mo ago

My 2023 Nvidia 3060 laptop I spent $700 on?

john_alan 2mo ago

you can't run models that are bigger than 16GB, not comparable.

nickthegreek 2mo ago

Sure you can. System RAM will be your limiter here.
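For readers wondering how a model larger than the GPU's memory can run at all: runtimes such as llama.cpp can keep some layers in VRAM and leave the rest in system RAM (its `--n-gpu-layers` option). Below is a rough sketch of that split, assuming equal-sized layers. The function name, the `reserve_gb` headroom for KV cache and scratch buffers, and all numbers are hypothetical, not taken from the thread.

```python
def gpu_layer_split(model_gb: float, n_layers: int, vram_gb: float,
                    reserve_gb: float = 1.0) -> tuple[int, int]:
    """Return (layers_on_gpu, layers_on_cpu) for a partially offloaded model.

    Assumptions: all layers are the same size, and `reserve_gb` of VRAM is
    held back for the KV cache and temporary buffers.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(n_layers, int(usable_gb // per_layer_gb))
    return on_gpu, n_layers - on_gpu


if __name__ == "__main__":
    # Hypothetical 20 GB model with 40 layers on a GPU with 6 GB of VRAM.
    gpu, cpu = gpu_layer_split(model_gb=20.0, n_layers=40, vram_gb=6.0)
    print(f"{gpu} layers on GPU, {cpu} layers in system RAM")
```

The layers left in system RAM run at system-memory speed, so a setup like this works but is usually much slower than a model that fits entirely in VRAM.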


piokoch 2mo ago

Nope, but other producers do not claim that their hardware "can run AI".

knicholes 2mo ago

I wonder if Apple has foresight into locally running LLMs becoming sufficiently useful.

DiscourseFan 2mo ago

It won’t handle serious tasks, but I have Gemma 3 installed on my M2 Mac and it is good for most of my needs, esp. data I don’t want a corporation getting its hands on.

andai 2mo ago

What kind of tasks are you using it for? I haven't really found any uses for small models.

fnordpiglet 2mo ago

I run Qwen 3.5 30B MoE and it’s reasonable at most tasks I would use a local model for, including summarizing things. For instance, I update all my toolchains automatically in the background when I log in, and when that finishes I use my local model to summarize everything that was updated and any errors or issues on the next prompt rendering. It’s quite nice because everything stays updated, I know what’s been updated, and I am immediately aware of issues. I also use it for a variety of “auto correct” tasks, “give me the command for,” “summarize the man page and explain X,” and a bunch of tasks that I would rather not copy and paste, etc.

DiscourseFan 2mo ago

Nothing like coding, just relatively basic stuff. Idk, it’s hard to explain, but I use AI so frequently for work that I have a sense for what it is capable of.


b112 2mo ago

They do! "You're holding it wrong"

velcrovan 2mo ago

This wasn’t a statement about capability. It’s just a detail about what model they used to compare the speed of two chips for this purpose. You want a bigger model, run a bigger model.

bbshfishe 2mo ago

Yeah no it didn’t. If you have a fully specced-out M3/4 MacBook with enough memory you’re running pretty decent models locally already. But no one is using local models anyway.

razster 2mo ago

I run a local model on the daily. I have it making tickets when certain emails come in and made a small that I can click to approve ticket creation. It follows my instructions and has a nice chain of thought process trained. Local LLMs are starting to become very useful. Not OpenClaw crap.

pylotlight 2mo ago

How much VRAM are you running to allow both a capable model and everything else the device needs to run?

weird-eye-issue 2mo ago

> Yeah no it didn’t

What is "it" and what didn't it do?

me551ah 2mo ago

If your company can afford a fully specced-out M3/4 MacBook, then it can also afford cloud AI costs.

imoverclocked 2mo ago

Perhaps, but sending everything to the cloud might get them in (very expensive) trouble. Depending on who we are talking about, of course.

zmmmmm 2mo ago

cost isn't even close to the main motivating factor for my context

jordhy 2mo ago

With OpenClaw and powerful local models like Kimi 2.5, these specs make a lot of sense.

bdavbdav 2mo ago

I’m not sure what model I’d trust locally with anything meaningful in OpenClaw. The smaller/simpler the model is, the greater the chance of fluff answers.

john_alan 2mo ago

GPT-OSS-120 works well.

jbellis 2mo ago

K2.5 isn't remotely a local model

zozbot234 2mo ago

Technically you can get most MoE models to execute locally, because RAM requirements are limited to the active experts' activations (which are on the order of the active parameter size); everything else can either be mmap'd in (the read-only params) or cheaply swapped out (the KV cache, which grows linearly per generated token and is usually small). But that gives you absolutely terrible performance, because almost everything is bottlenecked by storage transfer bandwidth. So good performance is really a matter of "how much more do you have than just that bare minimum?"
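The two quantities in this argument can be put into a rough sketch. It assumes a conventional attention layout for the KV cache (one key and one value vector per layer, per KV head, per token) and the worst case for streaming, where every active weight is read from storage for each generated token with no page-cache hits. The function names and all model shapes below are hypothetical, not the specs of any model named in the thread.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: grows linearly with the number of tokens.

    The factor 2 covers one key and one value vector per layer and KV head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


def streamed_tokens_per_sec(active_params: float, bits_per_weight: float,
                            storage_gb_per_s: float) -> float:
    """Upper bound on decode speed when active weights stream from storage.

    Assumption: each generated token requires reading all active weights
    from storage once, so bandwidth divided by bytes-per-token is the cap.
    """
    bytes_per_token = active_params * bits_per_weight / 8
    return storage_gb_per_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical shapes: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
    per_token = kv_cache_bytes(1, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")

    # Hypothetical MoE with 32B active params at 4-bit on a 4 GB/s SSD.
    rate = streamed_tokens_per_sec(32e9, 4, storage_gb_per_s=4.0)
    print(f"Storage-bound decode speed: at most {rate:.2f} tokens/s")
```

With these made-up numbers the cache costs 128 KiB per token while the storage-bound decode rate is a fraction of a token per second, which matches the point above: the model runs, but bandwidth rather than capacity decides whether it is usable.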

razster 2mo ago

Oh sure it is! I’ve helped set up an AI cluster rack with four K2.5s.

With some custom tooling, we built our own local enterprise setup:

- Support ticketing system
- Custom chat support powered by our trained software-support model
- Resolved repository with detailed step-by-step instructions
- User-created reports and queries
- Natural language-driven report generation (my favorite: no more dragging filters into the builder; our (Secret) local model handles it for clients)
- In-application tools (C#/SQL/ASP.NET) to support users directly, since our software runs on-site and offline due to PPI
- A cool repair tool: an import/export “support file packet patcher” that lets us push fixes live to all clients or target niche cases

Qwen3 with LoRA fine-tuning is also incredible; we’re already seeing great results training our own models.

There’s a growing group pushing K2.5s to run on consumer PCs (with 32GB RAM + at least 9GB VRAM) — and it’s looking very promising. If this works, we’ll be retooling everything: our apps and in-house programs. Exciting times ahead!

KPGv2 2mo ago

of course it's not remotely local: remote and local are literally antonyms

oofbey 2mo ago

You can totally run it locally. If you have 500GB of RAM.
