But in a sense, the 300 lines of Llama code are essentially just lines of math, and anyone who has read through a math proof knows that a single line can hide a large amount of complexity.
The same can be true of code dominated by more tedious operations, but there the dense, math-heavy lines are by definition a smaller fraction of the overall codebase.
Even the "tedious" parts of the Llama code can hide a lot of complexity. Setting a learning rate with a schedule might require reading a paper or two for your particular architecture.
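To give a sense of what that one "tedious" line can stand in for, here's a minimal sketch of a common schedule shape: linear warmup, then cosine decay down to a floor. The function name `lr_at_step` and every number in it (peak LR, 2000 warmup steps, total steps, 10% floor) are placeholders I picked for illustration. They are not values taken from the Llama code.

```python
import math

def lr_at_step(step, peak_lr=3e-4, warmup_steps=2000,
               total_steps=100_000, min_ratio=0.1):
    """Hypothetical helper: linear warmup, then cosine decay to min_ratio * peak_lr."""
    if step < warmup_steps:
        # Warmup: ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1] so the LR
    # stays at the floor if training runs past total_steps.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    min_lr = min_ratio * peak_lr
    return min_lr + (peak_lr - min_lr) * cosine
```

Every one of those knobs (warmup length, decay shape, floor) is something you'd normally look up or tune for your setup, which is kind of the point.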
But yes, once you've parsed all the math and the theory, the lines themselves are kinda just simple matmuls and forward passes lol.