0 points · rahen · 11mo ago

I would have thought the same, but EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second. The X-MP was in the same ballpark, with the added benefit of native vector processing (not just some extension bolted onto a scalar CPU) which performs very well on matmul.

https://www.tomshardware.com/tech-industry/artificial-intell...

John Carmack was also hinting at this: we might have had AI decades earlier. Obviously not large GPT-4-class models, but useful language reasoning at a small scale was possible. The hardware wasn't that far off. The software and incentives were.

https://x.com/ID_AA_Carmack/status/1911872001507016826

adwn · 11mo ago

> EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second

50 tokens/s is completely useless if the tokens themselves are useless. Just look at the "story" generated by the model presented in your link: each individual sentence is somewhat grammatically correct, but the sentences have next to nothing to do with each other and make absolutely no sense. Take this, for example:

"I lost my broken broke in my cold rock. It is okay, you can't."

Good luck tuning this for turn-based conversations, let alone for solving any practical task. This model is so restricted that you couldn't even benchmark its performance, because it wouldn't be able to follow the simplest of instructions.

rahen (OP) · 11mo ago

You're missing the point. No one is claiming that a 300K-param model on a Pentium II matches GPT-4. The point is that it works: it parses input, generates plausible syntax, and does so using algorithms and compute budgets that were entirely feasible decades ago. The claim is that we could have explored and deployed narrow AI use cases decades earlier, had the conceptual focus been there.

Even at that small scale, you can already do useful things like basic code or text autocompletion, and with a few million parameters on a machine like a Cray Y-MP, you could reasonably attempt tasks like summarizing structured or technical documentation. It's constrained in scope, granted, but it's a solid proof of concept.

The fact that a functioning language model runs at all on a Pentium II, with resources not far off from a 1982 Cray X-MP, is the whole point: we weren’t held back by hardware, we were held back by ideas.

alganet · 11mo ago

> we weren’t held back by hardware

Llama 3 8B took 1.3M GPU hours to train on H100-80GB GPUs.

Of course, it didn't take 1.3M wall-clock hours (~150 years), so many 80GB machines were used in parallel.

Let's do some napkin math: 150 machines with a total of 12TB VRAM for a year.

So, what would be needed to train a 300K parameter model that runs on 128MB RAM? Definitely more, much more than 128MB RAM.

Llama 3 runs on 16GB VRAM. Let's imagine that's our Pentium II of today. To train it, you need at least 750 times what is needed to run it (12TB / 16GB). So, you would have needed ~100GB RAM back then (128MB × 750), running for a full year, to get that 300K model.
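For anyone who wants to check the napkin math, here is a rough sketch in Python. Every input is a figure quoted in this comment, not a measurement, and the variable names are made up for illustration. It also takes on board the comment's own assumption that the training-to-inference memory ratio carries over unchanged from Llama 3 8B to a 300K-parameter model, which is the part most open to dispute.

```python
# Napkin math only: inputs are the figures quoted in the comment above.
GPU_HOURS = 1.3e6            # quoted training cost of Llama 3 8B (H100-80GB GPU hours)
HOURS_PER_YEAR = 24 * 365    # 8760

# One GPU running alone would need this many years (~148, rounded to ~150).
single_gpu_years = GPU_HOURS / HOURS_PER_YEAR

# Rounded to 150 machines running for one year, 80 GB of VRAM each.
machines = 150
train_vram_gb = machines * 80            # 12,000 GB, i.e. ~12 TB

# Quoted inference footprint of Llama 3 8B.
infer_vram_gb = 16
ratio = train_vram_gb / infer_vram_gb    # training needs ~750x the inference memory

# ASSUMPTION (the comment's, not a fact): the same ratio applies to the
# 300K-parameter model that runs in 128 MB on the Pentium II.
pentium_ram_mb = 128
train_ram_gb = pentium_ram_mb * ratio / 1024   # ~94 GiB, i.e. roughly 100 GB

print(f"single-GPU years: {single_gpu_years:.0f}")
print(f"train/infer memory ratio: {ratio:.0f}x")
print(f"RAM needed to train the small model: ~{train_ram_gb:.0f} GB")
```

The ratio is the only thing doing real work here; change `infer_vram_gb` or the machine count and the "~100GB" conclusion moves with it.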

How many computers with 100GB+ RAM do you think existed in 1997?

Also, I only did RAM. You also need raw processing power and massive amounts of training data.

rahen (OP) · 11mo ago

You’re basically arguing that because A380s need millions of liters of fuel and a 4km runway, the Wright Flyer was impossible in 1903. That logic just doesn’t hold. Different goals, different scales, different assumptions. The 300K model shows that even in the 80s, it was both possible and sufficient for narrow but genuinely useful tasks.

We simply weren’t looking, blinded by symbolic programming and expert systems. This could have been a wake-up call, steering AI research in a completely different direction and accelerating progress by decades. That’s the whole point.
