The guideline is that the newer your model, the more likely it is to have been trained on diverse voice recognition datasets, since that addresses the earlier problems caused by non-representative data. The trend is moving toward better recognition for outliers. The models are trained on very specific data, not just whatever recordings someone has collected in an S3 bucket. Given the amount of post-recording work, diarization, and QA we had to do on every single recording, I can’t imagine wanting to YOLO in bulk data.