I agree with your general point that we don't need insane levels of math, but I would say college-level calculus, linalg and probability is the baseline.
A basic benchmark off the top of my head:
Being able to pick up, without stumbling on the fundamentals:
- what LoRA is doing
- how an RBF-kernel SVM works
- why KL and reverse-KL are different
- why minimizing mean squared error is equivalent to MLE under Gaussian noise
Not saying the four items above are all necessary, but that you should be able to learn them on demand without needing to revisit what a basis vector is.
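To make two of those bullets concrete, here is a rough NumPy sketch (not a rigorous treatment). The distributions `p` and `q`, the toy 1-D linear model, and the fixed noise level `sigma` are all made-up assumptions for illustration. It checks numerically that KL(p||q) and KL(q||p) disagree, and that the Gaussian negative log-likelihood with a fixed variance is just a positive rescaling of MSE plus a constant, so both pick the same parameter.

```python
import numpy as np

# --- KL vs reverse KL --------------------------------------------------
def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical example: a bimodal target p and an approximation q
# that sits mostly on one of the two modes.
p = np.array([0.49, 0.02, 0.49])
q = np.array([0.90, 0.05, 0.05])

forward_kl = kl(p, q)  # large when q puts little mass where p has a lot
reverse_kl = kl(q, p)  # large when q puts mass where p has little
print("KL(p||q) =", forward_kl, " KL(q||p) =", reverse_kl)

# --- MSE vs Gaussian MLE (fixed, known sigma) ----------------------------
rng = np.random.default_rng(0)
x = rng.normal(size=200)
sigma = 0.5                                   # assumed known noise std
y = 3.0 * x + rng.normal(scale=sigma, size=200)

ws = np.linspace(2.0, 4.0, 201)               # candidate slopes
resid = y[None, :] - ws[:, None] * x[None, :]
n = len(x)

mse = np.mean(resid ** 2, axis=1)
nll = 0.5 * n * np.log(2 * np.pi * sigma ** 2) \
    + np.sum(resid ** 2, axis=1) / (2 * sigma ** 2)

# NLL = const + n / (2 sigma^2) * MSE, so the two curves share a minimizer.
offset = nll - n / (2 * sigma ** 2) * mse
print("offset is constant:", np.allclose(offset, offset[0]))
print("same argmin:", np.argmin(mse) == np.argmin(nll))
```

The equivalence only holds as stated when the noise variance is treated as fixed; if sigma is also fit, the NLL gains an extra term that MSE does not have.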
"Working out derivatives of arbitrary functions" is school level.
We aren't talking about doing cutting-edge research, just educating people on the basics of how ML does what it does. I agree that the things you list should follow at some point in the sequence for any rigorous education. But it's a question of when those things should come up and how deep the education needs to go at that point.
For the initial introduction I think everything you listed is entirely out of scope. You don't need any of that to get a basic MLP working using a for loop and naive gradient descent.
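For reference, this is roughly the kind of thing meant by "a basic MLP using a for loop and naive gradient descent". It is only a sketch: the XOR data, the hidden size, the learning rate and the step count are arbitrary choices for illustration, not tuned or recommended values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (XOR), chosen only because a linear model cannot fit it.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

hidden = 8                       # arbitrary hidden width
W1 = rng.normal(size=(2, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, 1))
b2 = np.zeros(1)

lr = 0.05                        # arbitrary small step size
losses = []

for step in range(5000):
    # Forward pass: one tanh hidden layer, linear output.
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Backward pass: chain rule written out by hand.
    d_pred = 2.0 * err / len(X)          # dLoss/dpred for mean squared error
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_h = d_pred @ W2.T
    d_z = d_h * (1.0 - h ** 2)           # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ d_z
    db1 = d_z.sum(axis=0)

    # Naive full-batch gradient descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("first loss:", losses[0], "last loss:", losses[-1])
```

Whether this counts as needing "no math" is debatable, since the backward pass is the chain rule spelled out, but nothing in it goes beyond derivatives of tanh and a squared error.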
Who are we giving an intro to who doesn’t have 2 years of STEM education?
Well sure. Your initial statement was about "most applied ML".
> Rate of change -> it is flat -> that is not a useful signal. I don't see the issue?
It's not going to be zero if you sample in your practicum setting. You're gonna get RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. So yeah.