They’re making a token effort, but this kind of thing doesn’t extend to something more intelligent that could cause real harm. If you scaled GPT-4 up to something much more intelligent, at best it would probably just try to please us with ethical-sounding responses that aren’t necessarily good decisions. I remember seeing an example where it said that saying an offensive word that no one will hear isn’t acceptable even if it’s the only way to save millions of people.
I wouldn't call it a token effort; they went to quite a bit of trouble to make GPT-4 safe, and this is an active area of research too. At some point someone needs to actually demonstrate that GPT-4 would do something unsafe. If anyone did, they would improve their systems in response.