I guess what the grandparent means is that there is some good old "discrete logic" on top of the various sensor inputs that ultimately turns things like a detected red light into the car stopping.
But of course, as you say, that system does not consume the actual raw (camera) sensor data. Instead, there are lots of intermediate networks that turn the camera images (and other sensor data) into red lights, lane curvatures, objects, and so on. Those are all very vulnerable to making up things that aren't there, or to not seeing what is plain to see, with no one quite able to explain why.
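
To make the split concrete, here is a toy sketch of what I mean. It is not how any real vehicle stack is written: `Perception`, `plan`, and the confidence threshold are all names and values I made up for illustration. The point is only that the rule-based part is easy to read and verify, yet it never sees pixels, only whatever the networks claim they saw.

```python
from dataclasses import dataclass


@dataclass
class Perception:
    """Hypothetical per-frame output of the intermediate networks."""
    red_light: bool
    red_light_confidence: float  # 0.0 to 1.0, as reported by the network


def plan(p: Perception, threshold: float = 0.5) -> str:
    """Plain discrete logic: it acts only on the network's claims."""
    if p.red_light and p.red_light_confidence >= threshold:
        return "stop"
    return "proceed"


# Network reports a red light with high confidence: the rule stops the car.
print(plan(Perception(red_light=True, red_light_confidence=0.9)))

# Network reports no red light: the rule proceeds, whether or not
# a real red light was actually in front of the camera.
print(plan(Perception(red_light=False, red_light_confidence=0.9)))
```

You can audit `plan` line by line, but that tells you nothing about whether `red_light` was right in the first place.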