> They can theoretically model any function, but the number of parameters needed means in practice they can't.
No, not even theoretically. They can theoretically model any continuous function.
Plus, even for continuous functions, the theorem is only an existence result: for any such function (on a compact domain), there exists some NN that approximates it to arbitrary precision. It is not known whether there is some base NN + finite training set from which some algorithm could arrive at that target NN in a finite number of steps.
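
To make the "there exists some NN" part concrete, here is a rough sketch in plain Python. It hand-builds a one-hidden-layer ReLU network that linearly interpolates a 1-D continuous function on an interval. The choice of ReLU, the function names (`build_relu_net`, `max_error`) and the use of `sin` as the target are all my own assumptions for illustration, not anything taken from the theorem's statement. Note that the weights are computed directly from the target function rather than learned, which is exactly the gap described above.

```python
import math


def relu(z):
    return z if z > 0.0 else 0.0


def build_relu_net(f, a, b, n_hidden):
    """Hand-construct a one-hidden-layer ReLU net that interpolates f on [a, b].

    Illustrative only: the weights are derived from f itself, not trained.
    """
    knots = [a + (b - a) * i / n_hidden for i in range(n_hidden + 1)]
    values = [f(x) for x in knots]
    slopes = [
        (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        for i in range(n_hidden)
    ]
    # Hidden unit i computes relu(x - knots[i]); its output weight is the
    # change in slope of the piecewise-linear interpolant at that knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    hidden_biases = [-k for k in knots[:-1]]
    out_bias = values[0]

    def net(x):
        return out_bias + sum(
            w * relu(x + h) for w, h in zip(out_weights, hidden_biases)
        )

    return net


def max_error(f, net, a, b, samples=2001):
    """Largest absolute error on an evenly spaced grid over [a, b]."""
    grid = (a + (b - a) * i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - net(x)) for x in grid)


if __name__ == "__main__":
    for n in (4, 16, 64):
        net = build_relu_net(math.sin, 0.0, math.pi, n)
        print(n, max_error(math.sin, net, 0.0, math.pi))
```

The error shrinks as hidden units are added, so a good approximator exists at every precision. Nothing here says gradient descent from a random init and a finite sample would find those weights.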