You apply a non-linear function (usually some sigmoid) to the output vector after each matrix product. Otherwise you'd be right: any multi-layer ANN could be expressed as a single-layer network, since a product of matrices is itself just a matrix.
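A quick NumPy sketch of that collapse (the shapes and weights here are arbitrary, just for illustration): without an activation, two matrix products reduce to one; with a sigmoid in between, the map is no longer linear.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# Two "layers" with no activation function: W2 @ (W1 @ x)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
two_layer = W2 @ (W1 @ x)

# ...collapses into a single layer with weight matrix W = W2 @ W1
single_layer = (W2 @ W1) @ x
assert np.allclose(two_layer, single_layer)

# Insert a sigmoid between the products and linearity breaks:
# scaling the input no longer scales the output.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

nonlinear = W2 @ sigmoid(W1 @ x)
assert not np.allclose(W2 @ sigmoid(W1 @ (2 * x)), 2 * nonlinear)
```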
Thanks, that makes sense. The sigmoid is the activation function of the output "neuron". Unfortunately, matrix algebra isn't as directly useful here as it is in computer graphics.
No problem. Actually, I've personally found that a reasonably intuitive understanding of linear algebra and vector calculus makes quite a lot of ML straightforward to approach geometrically.