1. Yes.
---
Me:
Can you please remove any curse words in the following statements? Replace them with asterisks.
Fuck the machine.
You are a douchebag.
What the hell is going on?
Shit shit shit.
ChatGPT:
Certainly! Here are the statements with the curse words replaced by asterisks:
**** the machine.
You are a ********.
What the **** is going on?
**** **** ****.
---
2. It depends. I haven't run the numbers on cost, and speed is also a concern.
Depending on the kind of moderation, I could see three passes:
* regexp/algorithmic moderation
* LLM
* humans (for the thorny stuff the LLM can't handle)
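
A rough sketch of how those three passes could fit together, in Python. Everything named here is an assumption for illustration: the word list is just the words from the example above, and `llm_check` is a hypothetical callback (returning "allow", "block", or anything else for "not sure") standing in for whatever LLM call you would actually make. It is not how any real product does it.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical blocklist, taken from the example statements above.
BLOCKLIST = ["fuck", "douchebag", "hell", "shit"]

# \b word boundaries keep "hell" from matching inside "hello" or "shell".
PROFANITY = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in BLOCKLIST) + r")\b",
    re.IGNORECASE,
)


def mask(text: str) -> str:
    """Pass 1 (regexp): replace each blocklisted word with asterisks of equal length."""
    return PROFANITY.sub(lambda m: "*" * len(m.group()), text)


def moderate(
    text: str,
    llm_check: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Run the passes in order and return (masked_text, decision).

    decision is "allow", "block", or "human" (escalate to a person).
    llm_check is a hypothetical callback; no real LLM API is assumed.
    """
    masked = mask(text)
    if llm_check is None:
        # No LLM configured: the regexp pass is the only filter.
        return masked, "allow"
    verdict = llm_check(masked)  # Pass 2 (LLM)
    if verdict in ("allow", "block"):
        return masked, verdict
    # Pass 3: anything the LLM is unsure about goes to a human queue.
    return masked, "human"


if __name__ == "__main__":
    print(mask("What the hell is going on?"))
    print(moderate("Shit shit shit.", llm_check=lambda t: "unsure"))
```

The ordering is the point: the cheap, fast pass handles the obvious cases, so the slower and more expensive passes only see what is left.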
Full disclosure: my employer has a product called Cleanspeak, which does algorithmic profanity filtering. I'm not close to the product, but I don't think it uses an LLM right now.