I understand that in multiplayer with strangers it would be a problem, because you could affect other players' experiences. In a single-player game, though, I don't see this as a big issue, as long as the NPC doesn't bring immersion-breaking topics into the conversation on its own, without the player starting it (which I suppose could be achieved with a suitable system prompt and some fine-tuning on in-lore text).
If it's the player who wants to troll the game and break immersion by "jailbreaking" the NPCs, that's on them, just like when they use a cheat code and make the game trivial.