The matrix calculus you need for deep learning (2018) - https://news.ycombinator.com/item?id=26676729 - April 2021 (40 comments)
Matrix calculus for deep learning part 2 - https://news.ycombinator.com/item?id=23358761 - May 2020 (6 comments)
Matrix Calculus for Deep Learning - https://news.ycombinator.com/item?id=21661545 - Nov 2019 (47 comments)
The Matrix Calculus You Need for Deep Learning - https://news.ycombinator.com/item?id=17422770 - June 2018 (77 comments)
Matrix Calculus for Deep Learning - https://news.ycombinator.com/item?id=16267178 - Jan 2018 (81 comments)
Ultimately, the point of using matrix calculus (or matrices in general) is not just concision of notation but also understanding that matrices are operators acting on members of some spaces, i.e. vectors. It is this higher level abstraction that makes matrices powerful.
For people who are familiar with the concepts but need a concise refresher, the Wikipedia page serves well:
Matrices themselves form non-commutative rings too; and based on this, you can think of a 4N x 4N matrix as a 4x4 matrix whose elements are NxN matrices [1] :D
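The block view above is easy to check numerically: multiplying two 4N x 4N matrices "block-wise", treating them as 4x4 matrices whose entries are NxN matrices, gives the same result as ordinary matrix multiplication. A quick NumPy sketch (N chosen arbitrarily):

```python
import numpy as np

N = 3
rng = np.random.default_rng(0)

# Two 4N x 4N matrices, viewed as 4x4 grids of NxN blocks.
A = rng.standard_normal((4 * N, 4 * N))
B = rng.standard_normal((4 * N, 4 * N))

def block(M, i, j):
    """Extract the (i, j) NxN block of a 4N x 4N matrix."""
    return M[i * N:(i + 1) * N, j * N:(j + 1) * N]

# Multiply as a "4x4 matrix of NxN blocks": C_ij = sum_k A_ik @ B_kj,
# the usual ring multiplication, just with NxN matrices as the entries.
C_blocks = np.block([
    [sum(block(A, i, k) @ block(B, k, j) for k in range(4)) for j in range(4)]
    for i in range(4)
])

# Matches ordinary multiplication of the full matrices.
assert np.allclose(C_blocks, A @ B)
```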
[1] https://youtu.be/FX4C-JpTFgY?list=PL49CF3715CB9EF31D&t=1107
You already know whose lecture it is :D
I love math.. I should have become a mathematician ...
[1]: https://pdfs.semanticscholar.org/2e43/477e26a54b2d1a046c2140...
https://arxiv.org/abs/1802.01528
---
EDIT: It turns out explained.ai is the personal website of one of the authors, so there's no need to change the link. See comment below.
I prefer reading on the web unless I'm offline. The LaTeX is super handy for printing a nice document.
Thank you for doing this with Jeremy and sharing it with the world!
I tried deriving these formulas from scratch using what I learned from OP's post, but it felt like there was something missing. I think it boils down to me not knowing how to aggregate those element-wise derivatives into a matrix form. In the end, the Matrix Cookbook and certain notes from Stanford's CS231n were what helped me grok it fully.
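The aggregation step mentioned above can be made concrete with the simplest case, y = Wx: each element-wise derivative dy_i/dx_j is just W[i, j], and stacking them (rows indexed by outputs, columns by inputs) gives the Jacobian in numerator layout, which is exactly W. A minimal NumPy sketch, with the shapes chosen arbitrarily, checks this against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

# y = W x, so the element-wise derivatives are dy_i/dx_j = W[i, j].
# Aggregated into a matrix (numerator layout), the Jacobian is just W.
def f(x):
    return W @ x

# Central finite differences, one input coordinate (column) at a time.
eps = 1e-6
J_numeric = np.column_stack([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

assert np.allclose(J_numeric, W, atol=1e-5)
```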
Always wanted to dip my toes into ML, but I've never been convinced of its usefulness to the average solo developer, in terms of things you can build with this new knowledge. Likely I don't know enough about it to make that call, though.
https://github.com/williamcotton/chordviz
Labeling software in React, CNN in PyTorch, prediction in an app in SwiftUI. 12,000 (and counting) hand-labeled images of my hand on a guitar fretboard!
You don't need math to make a model perform well, but you do need math to know why your model is wrong.
It seems that these topics are covered by the first one or two semesters of a math degree. Of course, the university treatment is a bit more advanced.
I only see pictures; I'm curious about the extent of the interaction in the linear algebra/matrix calculus specifically.
I'd like more resources geared for people that are done with Khan Academy and want something as well made for more advanced topics.
If you're REALLY rusty (maybe you've been out of school for a while, say 5+ years), or maybe you just never learned the material that well in the first place, then you might want to start with one of our Mathematical Foundations courses that will scaffold you up to the level where you can handle the content in Mathematics for Machine Learning. More info can be found here: https://mathacademy.com/courses
The Mathematics for Machine Learning course would be ideal for anyone who majored in a STEM subject like CS (or at least has a solid mathematical foundation) and is interested in doing work in machine learning.
If you take the result of transforming the column vectors of the C matrix by A and B (i.e., forming ACB) and vectorize it, you get the same thing as first vectorizing C and then transforming it by a block matrix obtained as the Kronecker product of B transposed and A.
The significance is that it performs a reduction of matrix calculus to vector calculus (i.e., it shows that you can convert any matrix calculus operation/formula/statement into a vector calculus operation/formula/statement).
I wouldn't say that is everything, but it is a useful trick.
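The vec/Kronecker identity described above, vec(ACB) = (Bᵀ ⊗ A) vec(C), can be verified numerically. A minimal NumPy sketch, with the shapes chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# vec() stacks the columns of a matrix; for NumPy's row-major arrays,
# that means flattening in Fortran ("F") order.
def vec(M):
    return M.flatten(order="F")

# vec(A C B) == (B^T kron A) vec(C)
lhs = vec(A @ C @ B)
rhs = np.kron(B.T, A) @ vec(C)
assert np.allclose(lhs, rhs)
```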