I think this was a failure. The gold standard should be how well the system would function if every human driver were replaced with an AI. This makes it look like the result would be catastrophic, which shows that humans continue to be much more versatile and capable than AI.
I suppose if you lower the standards for what you hope AI can accomplish, it wouldn't be considered a failure.