The problem is that while the camera requirements are defined, presumably to set some lower bound on the quality of the visuals, there is no requirement that the audio sound good on stereo speakers, which is probably what the majority of consumers are using.
The idea that Netflix have such a strict definition of camera quality is also a bit of a farce, given how woeful the image looks after it has been through their incredibly overenthusiastic compression, but that's neither here nor there.