They are excellent. Not quite human level, but very, very close. I was curious about the Chinese-English translation capabilities of the latest crop of models. On more difficult texts, a bilingual model like GLM-130B makes several errors per page, while GPT-4 is down to probably around one.
Interested to see how that plays out for programming languages.