Better HN

adrian_b 1mo ago

For conversational purposes, that may be too slow, but as a coding assistant it should work, especially if many tasks are batched so that they all progress simultaneously through a single pass over the SSD data.
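A minimal sketch of why batching helps when weights are streamed from storage, assuming a toy model made of stacked dense layers. `LayerStore` is a hypothetical stand-in for weights kept on an SSD that simply counts layer reads. Running tasks one at a time reads every layer once per task, while the batched version reads each layer once and applies it to every task before moving on.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class LayerStore:
    """Hypothetical stand-in for layer weights kept on an SSD; counts layer reads."""

    def __init__(self, layers: List[Matrix]) -> None:
        self._layers = layers
        self.reads = 0

    def __len__(self) -> int:
        return len(self._layers)

    def load(self, i: int) -> Matrix:
        self.reads += 1  # each call models one slow read of layer i from storage
        return self._layers[i]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def forward_one_at_a_time(store: LayerStore, inputs: List[Vector]) -> List[Vector]:
    """Runs each task through all layers before starting the next: one full pass per task."""
    outputs = []
    for x in inputs:
        for i in range(len(store)):
            x = relu(matvec(store.load(i), x))
        outputs.append(x)
    return outputs


def forward_batched(store: LayerStore, inputs: List[Vector]) -> List[Vector]:
    """Loads each layer once and applies it to every task: one pass for the whole batch."""
    xs = list(inputs)
    for i in range(len(store)):
        w = store.load(i)
        xs = [relu(matvec(w, x)) for x in xs]
    return xs


if __name__ == "__main__":
    layers = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[2.0, -1.0], [0.5, 0.5]],
        [[1.0, 1.0], [-1.0, 1.0]],
    ]
    tasks = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5], [-2.0, 4.0]]
    serial, batched = LayerStore(layers), LayerStore(layers)
    assert forward_one_at_a_time(serial, tasks) == forward_batched(batched, tasks)
    print(serial.reads, batched.reads)  # → 12 3
```

With 4 tasks and 3 layers, the one-at-a-time loop performs 12 layer reads and the batched loop only 3, with identical outputs. The trade-off is that every in-flight task's state has to stay in memory at the same time.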


QuantumNomad_ 1mo ago

Three-hour coffee break while the LLM prepares scaffolding for the project.

pbhjpbhj 1mo ago

Like computing used to be. When I first compiled a Linux kernel, it ran overnight on a Pentium-S. I had little idea what I was doing and probably compiled all the modules by mistake.

stingraycharles 1mo ago

I remember that time, when compiling a Linux kernel was measured in hours. Then multi-core computing arrived, and within a few years it was down to 10 minutes.

With LLMs it feels more like the old punchcards, though.

drowsspa 1mo ago

At least the compiler was free.


tempoponet 1mo ago

Rather, imagine you have 2-3 of these working 24/7 on top of what you're doing today. What does your backlog look like a month from now?

zozbot234 1mo ago

Batching many disparate tasks together is good for compute efficiency, but it makes it harder to keep the full KV-cache for each task in RAM. In an emergency you could handle this by dumping some of that KV-cache to storage (this is how prompt caching works too, AIUI) and loading it back from storage as needed, but that adds far more overhead than just offloading sparsely used experts, because the KV-cache is accessed much more heavily.
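A back-of-the-envelope sketch of why batching strains RAM. For standard attention, each task's KV-cache is 2 (K and V) × layers × KV heads × head dimension × bytes per element × context tokens. The `ModelShape` values, the 32K-token context and the 24 GiB budget below are hypothetical numbers chosen for illustration, not figures from the thread. The sketch also ignores weights, activations and any cache compression.

```python
from dataclasses import dataclass
from typing import Tuple

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer dimensions; not taken from any specific model."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    bytes_per_elem: int  # e.g. 2 for fp16/bf16


def kv_bytes_per_token(m: ModelShape) -> int:
    # One K and one V vector (factor 2) per KV head, per layer, for every token.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * m.bytes_per_elem


def kv_bytes_per_task(m: ModelShape, context_tokens: int) -> int:
    # The cache grows linearly with the task's context length.
    return kv_bytes_per_token(m) * context_tokens


def plan_batch(m: ModelShape, context_tokens: int, n_tasks: int,
               ram_budget_bytes: int) -> Tuple[int, int, int]:
    """Returns (tasks whose KV-cache fits in RAM, tasks to spill, bytes to spill)."""
    per_task = kv_bytes_per_task(m, context_tokens)
    resident = min(n_tasks, ram_budget_bytes // per_task)
    spilled = n_tasks - resident
    return resident, spilled, spilled * per_task


if __name__ == "__main__":
    shape = ModelShape(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)
    print(kv_bytes_per_token(shape))  # → 131072 (128 KiB per token)
    print(kv_bytes_per_task(shape, 32_768) / GIB)  # → 4.0
    resident, spilled, spill_bytes = plan_batch(shape, 32_768, 16, 24 * GIB)
    print(resident, spilled, spill_bytes / GIB)  # → 6 10 40.0
```

Under these assumptions each task holds 4 GiB of KV-cache, so only 6 of 16 batched tasks stay resident and 40 GiB would have to go to storage. With full attention, every decoding step reads a task's entire cache, so the spilled bytes would be re-read on every step, unlike an expert that is only rarely routed to.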
