I have a used workstation I got for $2k (with 768GB of RAM). Using the Q4 model I get about 1.5 tokens/sec and can use very large contexts. It's pretty awesome to be able to run it at home.
You perhaps forgot to mention that for their AMX optimizations to even be feasible, you'd need to spend ~$10k on a single CPU, let alone the whole system, which probably runs ~$100k.
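For anyone wondering whether their own CPU is in that boat: on Linux the kernel exposes AMX support as cpuinfo flags (`amx_tile`, `amx_int8`, `amx_bf16`), so a quick sketch like this tells you if the AMX path is even an option before buying anything:

```shell
# Check whether this CPU advertises the AMX tile extension (Linux only;
# /proc/cpuinfo lists amx_tile / amx_int8 / amx_bf16 on supporting chips).
if grep -qw amx_tile /proc/cpuinfo 2>/dev/null; then
    echo "AMX supported"
else
    echo "no AMX on this CPU"
fi
```

On most consumer and older server parts this prints "no AMX on this CPU"; only recent Xeon Scalable (Sapphire Rapids and later) chips have it.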