Yeah, and it will happen. I am all for self-driving cars, simply because they increase the quality of life of our civilization (more independence, cheaper transportation, fewer fatalities), but there will be accidents, and then you have to analyze them (which is good).
In the end you want to know what went wrong (the public will demand it, and rightly so), and it might be a misclassification.
But that is not enough. You want to know why it classified situation X wrong. The answer is: because the inference network (which was created by the training pipeline) computed its weights the way it did because the input was Y. Now you might throw your hands in the air and say "oh, it's complicated, the network is non-deterministic, blame NVidia", but you can also go further and build your networks deterministically (which is possible and, AFAIK, without a performance penalty). Compiler research helps by at least guaranteeing that certain parts of the code are deterministic, which makes it easier to debug and maybe avoid complex NN misclassification scenarios; but the way to do that doesn't have much to do with NNs themselves, and more with language design and (real-time, in particular deterministic) OS research.
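To make the determinism point concrete, here is a minimal sketch (a toy stand-in for a full training stack, not any real framework's API): if every source of randomness is driven by one seeded RNG and the operation order is fixed, two training runs produce bit-identical weights.

```python
import hashlib
import numpy as np

def train(seed):
    """Train a tiny logistic-regression 'network' deterministically.

    Toy stand-in for a real training stack: weight init and data
    generation are the only random sources, and both come from a
    single seeded RNG, so the run is fully reproducible.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 4))           # toy inputs
    y = (X[:, 0] > 0).astype(float)        # toy labels
    w = rng.normal(size=4)                 # seeded weight init
    for _ in range(100):                   # plain batch gradient descent
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid activation
        w -= 0.1 * X.T @ (p - y) / len(y)  # gradient step
    return w

w1 = train(seed=42)
w2 = train(seed=42)

# Same seed, same op order: the weight bytes hash identically.
assert hashlib.sha256(w1.tobytes()).hexdigest() \
    == hashlib.sha256(w2.tobytes()).hexdigest()
```

On a GPU this is harder (parallel reductions can reorder floating-point sums), which is exactly where the deterministic-stack research comes in.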
So the statement "oh, that's so complex, we don't know why we misclassified" is no excuse; we can do better.
For starters, we have to publish NN papers with implementations that describe how to build a particular NN from given training data (and provide that data too). We already publish the code and network structure (see caffe, etc.), but often with pre-trained models that were built on a cluster, with many forms of training data going through the network structure, etc.
At the moment you read a paper, head to the published code (often available, again a desirable property of the ML community), and try to reproduce some examples by training on the data.
However, in the end it is hard to say whether your network is really as good as the published one, because a simple
$ diff my-net-binary-blob.dat tesla-net-binary-blob.dat
might fail if the stack (training etc.) used to build the network is non-deterministic. However, if you have a good (i.e. deterministic) stack, you might be able to reproduce NNs
bit-for-bit, which makes it simpler to answer the question "why did we misclassify".
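The bit-for-bit check itself is cheap; a sketch (the file names here are hypothetical placeholders for two independently trained nets) that compares checksums instead of running a raw diff:

```python
import hashlib
import numpy as np

def sha256_of(path):
    """Hash a weight file, so a reproduction check is one comparison."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical weight blobs: with a deterministic stack, your own
# training run and the published run would emit the same bytes.
# Here we simulate that by writing the same array to both files.
rng = np.random.default_rng(0)
w = rng.normal(size=16)
w.tofile("my-net-binary-blob.dat")
w.tofile("reference-net-binary-blob.dat")

print(sha256_of("my-net-binary-blob.dat")
      == sha256_of("reference-net-binary-blob.dat"))  # True
```

If the hashes differ, you know immediately that something in the stack (data order, initialization, floating-point reduction order) was not pinned down.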