What's the question this method is attempting to answer? What does an answer look like? How does this method lead to it?
> If you have ever tried to minimize a function with gradient descent
"and if otherwise, go kick sand," I guess.
I wonder what would happen with this analysis if a momentum term were added to the gradient descent. It seems that it would fix the specific failure modes in the examples, but I wonder if there's a corresponding mathematical way of categorizing what kinds of functions can(not) be quickly optimized with GD + momentum.
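For a concrete feel of what momentum buys, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration and not taken from the article: the objective is an ill-conditioned quadratic with curvatures 1 and 100, the function name `run` is made up, and the step sizes are the textbook tunings for quadratics (plain GD with `lr = 2 / (L + mu)`, Polyak heavy-ball momentum with its standard `lr` and `beta`).

```python
import numpy as np

def run(lams, lr, beta, steps=100):
    """Heavy-ball gradient descent on f(x) = 0.5 * sum(lams * x**2).

    beta = 0 gives plain gradient descent. Returns the final loss.
    """
    x = np.ones_like(lams)   # arbitrary starting point (assumption)
    v = np.zeros_like(lams)  # velocity, starts at rest
    for _ in range(steps):
        grad = lams * x          # gradient of the quadratic
        v = beta * v - lr * grad
        x = x + v
    return 0.5 * np.sum(lams * x**2)

lams = np.array([1.0, 100.0])    # curvatures; condition number is 100
mu, L = lams.min(), lams.max()

# Plain GD with the best fixed step size for a quadratic.
loss_plain = run(lams, lr=2.0 / (L + mu), beta=0.0)

# Polyak heavy-ball tuning for a quadratic.
lr_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta_hb = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
loss_momentum = run(lams, lr=lr_hb, beta=beta_hb)

print(f"plain GD loss after 100 steps: {loss_plain:.3e}")
print(f"momentum loss after 100 steps: {loss_momentum:.3e}")
```

On quadratics like this one the categorization is known: with condition number kappa, tuned plain GD contracts by about (kappa - 1) / (kappa + 1) per step, while tuned heavy-ball momentum contracts by about (sqrt(kappa) - 1) / (sqrt(kappa) + 1). That covers ill-conditioning only. It says nothing about non-convex cases, so it may not carry over to the examples under discussion.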
The neural net's answer is to be able to spawn a wavelet at any position, as opposed to tweaking the position of an existing one.