If an SVM kernel can replicate a 2 layer NN, why couldn't there be a kernel for a X layer NN, and then autoderive the architecture just like SVMs can autoderive the correct number of neurons? Then there'd also be a more robust theoretical understanding of what's happening.