LOL some basic kind of embodiement/autonomy is not that hard to do on these kinds of AI models if you're willing to write some more code and a prompt more carefully. I've tested it and it works quite well.
"{prompt} After you reply to this, indicate an amount of time between 0 and X minutes from now that you would like to wait before speaking again".
Then detect the amount of time it specifies, and have a UI that automatically sends an empty input prompt after the amount of time specified elapses when this is triggered (assuming the user doesn't respond first).
I'm gonna knock this out as a weekend project one of these weekends to prove this.