I mean… that’s not really fair, is it?
We’ve been able to build NN libraries for 30 years, but it’s the transformer algorithm on top of them, and the stacked layers forming a coherent network, that are the complex parts, right?
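To make that concrete: the core primitive of a transformer, scaled dot-product attention, really is just a few matrix operations — the complexity lives in everything stacked around it. A minimal NumPy sketch (shapes and names here are illustrative, not any particular library’s API):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)  # shape (4, 8)
```

Ten lines of primitives — and yet nothing in them tells you how to assemble multi-head blocks, positional encodings, training schedules, or sampling into a working model.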
Implement Stable Diffusion in Clojure (the Python code for it is all open source) and you quickly see that there is a lot of complexity, once you’re doing something useful, that the primitive operations don’t support.
It’s not really any different from OpenCV: the basic matrix operations, and then paper-by-paper implementations of various algorithms on top. Building a basic pixel matrix library in Clojure wouldn’t give you an equivalent to OpenCV either.
Is there really a clear meaningful bridge between building low level operations and building high level functions out of them?
When you implement sqrt, you’ve learnt a thing… but it doesn’t help you build a rendering engine.
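For the sqrt analogy, here’s roughly what that exercise teaches you — Newton’s iteration, a genuinely instructive primitive (this is a sketch, not production numerics):

```python
def my_sqrt(x, tol=1e-12):
    # Newton's method for sqrt(x): iterate g <- (g + x/g) / 2
    # starting guess: x itself for x >= 1, else 1.0
    g = x if x > 1 else 1.0
    while abs(g * g - x) > tol * max(x, 1.0):
        g = (g + x / g) / 2.0
    return g
```

You learn fixed-point iteration and convergence — valuable — but none of it transfers to the architecture of a rendering engine, which is exactly the point.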
Hasn’t this always been the problem with learning ML “from scratch”?
You start with basic operations, do MNIST… and then… uh, well, no. Now you clone a Python repo that implements the paper you want to work on and modify it, because implementing it from scratch with your primitives isn’t really feasible.
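And the “basic operations, do MNIST” stage really is that small. A toy sketch of what those tutorials amount to — softmax regression via gradient descent, here on synthetic Gaussian clusters standing in for MNIST (everything below is illustrative, not real data):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for MNIST: three well-separated 2-D Gaussian "classes"
X = np.concatenate([rng.normal(c, 0.5, size=(50, 2))
                    for c in ((0, 0), (3, 0), (0, 3))])
y = np.repeat(np.arange(3), 50)
Y = np.eye(3)[y]  # one-hot labels

W = np.zeros((2, 3))
b = np.zeros(3)
lr = 1.0
for _ in range(200):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)      # softmax probabilities
    grad = (p - Y) / len(X)                # softmax cross-entropy gradient
    W -= lr * X.T @ grad
    b -= lr * grad.sum(axis=0)

acc = (np.argmax(X @ W + b, axis=1) == y).mean()
```

That whole loop is primitives. The gap between it and a modern paper implementation — schedulers, mixed precision, attention variants, checkpointing — is the part the from-scratch path never covers.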
(If anyone has suggestions for a book to read after that one, I'm all ears.)