embedding-shape
6mo ago
How many tok/s are you getting (with any runtime) with either the Kimi-Linear-Instruct or Kimi-Linear-Base on your RTX 3070?
samus
6mo ago
With Qwen3-30B-A3B (Q8) I'm getting 10-20 t/sec on KoboldAI, i.e., llama.cpp. That's faster than I can read, so good enough for hobby use. I expect this model to be significantly faster, but llama.cpp-based software probably doesn't support it yet.