This is a discussion that comes up a lot in medical technology, and if I had to guess, this form of rhetoric fails not just because it's easier to empathize with a human, but also because when the failure case seems "simple", the issue looks a lot more like an oversight and a systemic problem. That in turn makes the probability of failure look much higher than it really is, while also implying further undiscovered oversights.
That also kinda addresses point 2: specific failure case or not, it implies a weak system with obvious oversights. That definitely doesn't help the political case for full FSD approval.
I'm generally skeptical of this utilitarian, rationalist form of rhetoric, if only because it's overly optimistic about the ability to overcome issues in some amorphous future. Sure, a future with full FSD is probably a net good, and we can even say it's probably within our lifetimes. But the claim that future mitigated harms outweigh all the current harms of live-testing FSD won't win enough people over, and it drowns out other possible policies, like advocating for adapting our infrastructure to support FSD rather than having cars attempt to read signs and signals designed for humans.