I think it was found around the time of the BLOOM models (2022) that if you train on English only, the model performs worse than if you include even a small mixture of other languages.
There were also other papers: "One Epoch Is All You Need" showed that diverse data is better than training for multiple epochs, and "Textbooks Are All You Need" (the paper behind the well-known Phi model) concluded that high-quality data > lots of data.
None of this is direct proof for your specific question, but you can extrapolate.