With Chatterbox this finally feels almost possible, though I'm sensitive to the pacing issues it often has. Kokoro was just alright. I'm using a tool I hacked together that runs Minimax Speech-02-HD, which is still on a whole other level, IMO, but not that cheap. Inworld-TTS-1-max is cheaper, and I'm trialing it these days. async.ai seems promising too.
Thanks for the tool! I'm also quite interested in this space.