Knowing the specific multiplies and QKV and how attention works doesn't develop your intuition for how LLMs work. Knowing that the effective output is a list of tokens with associated probabilities is of marginal use. Knowing about rotary position embeddings, temperature, batching, beam search, the various techniques for preventing repetition and so on doesn't really develop intuition about behavior either; those mostly improve the worst cases (babbling, repeating nonsense in the absolute worst), and you wouldn't know that at all from first principles without playing with the things.
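To make the "list of tokens with associated probabilities" and the temperature knob concrete, here is a minimal sketch of temperature sampling over a toy logit vector. The function name and the example logits are illustrative, not taken from any real inference stack:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    # Scale logits by 1/temperature: low T sharpens the distribution
    # toward the top token, high T flattens it toward uniform.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index according to those probabilities.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i, probs
    return len(probs) - 1, probs
```

At temperature 0.5 a logit vector like `[2.0, 1.0, 0.1]` concentrates almost all mass on the first token; at high temperature the same logits sample nearly uniformly, which is why temperature mostly trades determinism against variety rather than changing what the model "knows".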
The truth is that the inference implementation is more like a VM, and the interesting thing is the model, the set of learned weights. It's like a program being executed one token at a time. How that program behaves is the interesting thing. How it degrades. What circumstances it behaves really well in, and what its failure modes are. That's where you want to be able to swap a dozen models in and out and get a feel for things, have forking conversations, and so on. It's what LM Studio is decent at.
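The "program executed one token at a time" framing can be sketched as a bare greedy decode loop. Here `model_step` and the toy model are hypothetical stand-ins for a real forward pass; the point is only the shape of the loop, the VM-like part that stays the same while the weights you swap in change the behavior:

```python
def generate(model_step, prompt_tokens, max_new_tokens=16, eos_token=None):
    """Greedy autoregressive decoding: run the 'program' one token at a
    time, feeding each output token back in as input. `model_step` is a
    stand-in for a real forward pass; it maps a token sequence to
    next-token logits."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model_step(tokens)
        next_token = max(range(len(logits)), key=logits.__getitem__)
        if next_token == eos_token:
            break
        tokens.append(next_token)
    return tokens

# Toy "model" over a 4-token vocabulary: always prefers the token after
# the last one, mod 4. Purely illustrative, not a learned model.
def toy_step(tokens):
    target = (tokens[-1] + 1) % 4
    return [1.0 if i == target else 0.0 for i in range(4)]
```

Everything interesting lives inside `model_step`: swap in different weights and the same loop produces a different "program", which is why comparing models side by side teaches you more than studying the loop itself.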