I mean… that’s not really fair, is it?
We’ve been able to build NN libraries for 30 years, but it’s the transformer algorithm on top of them, and the stacked layers forming a coherent network, that are the complex parts, right?
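To make that concrete: the core primitive of a transformer, scaled dot-product attention, really is just a few matrix operations — the complexity lives in everything stacked around it. A minimal NumPy sketch (shapes and names here are illustrative, not any particular library’s API):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)  # shape (4, 8)
```

Ten lines of primitives — and yet nothing in them tells you how to assemble multi-head blocks, positional encodings, training schedules, or sampling into a working model.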
Implement Stable Diffusion in Clojure (the Python code for it is all open source) and you quickly see that there is a lot of complexity, once you’re doing something useful, that the primitive operations don’t support.
It’s not really any different from OpenCV: the basic matrix operations, and then paper-by-paper implementations of various algorithms on top. Building a basic pixel matrix library in Clojure wouldn’t give you an equivalent to OpenCV either.
Is there really a clear meaningful bridge between building low level operations and building high level functions out of them?
When you implement sqrt, you’ve learnt a thing… but it doesn’t help you build a rendering engine.
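For the sqrt analogy, here’s roughly what that exercise teaches you — Newton’s iteration, a genuinely instructive primitive (this is a sketch, not production numerics):

```python
def my_sqrt(x, tol=1e-12):
    # Newton's method for sqrt(x): iterate g <- (g + x/g) / 2
    # starting guess: x itself for x >= 1, else 1.0
    g = x if x > 1 else 1.0
    while abs(g * g - x) > tol * max(x, 1.0):
        g = (g + x / g) / 2.0
    return g
```

You learn fixed-point iteration and convergence — valuable — but none of it transfers to the architecture of a rendering engine, which is exactly the point.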
Hasn’t this always been the problem with learning ML “from scratch”?
You start with basic operations, do MNIST… and then… uh, well, no. Now you clone a Python repo that implements the paper you want to work on and modify it, because implementing it from scratch with your primitives isn’t really feasible.
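And the “basic operations, do MNIST” stage really is that small. A toy sketch of what those tutorials amount to — softmax regression via gradient descent, here on synthetic Gaussian clusters standing in for MNIST (everything below is illustrative, not real data):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for MNIST: three well-separated 2-D Gaussian "classes"
X = np.concatenate([rng.normal(c, 0.5, size=(50, 2))
                    for c in ((0, 0), (3, 0), (0, 3))])
y = np.repeat(np.arange(3), 50)
Y = np.eye(3)[y]  # one-hot labels

W = np.zeros((2, 3))
b = np.zeros(3)
lr = 1.0
for _ in range(200):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)      # softmax probabilities
    grad = (p - Y) / len(X)                # softmax cross-entropy gradient
    W -= lr * X.T @ grad
    b -= lr * grad.sum(axis=0)

acc = (np.argmax(X @ W + b, axis=1) == y).mean()
```

That whole loop is primitives. The gap between it and a modern paper implementation — schedulers, mixed precision, attention variants, checkpointing — is the part the from-scratch path never covers.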
(If anyone has suggestions for a book to read after that one, I'm all ears.)