PyTorch nightly (which I use for CUDA 12) doesn't work with Python 3.12, but if you stick with 3.11 or 3.10 you should be OK. The rest of the requirements were listed without version numbers, so on a clean venv they should install fine. However, there's a bug in the Utils lib that needs a one-line fix if you're trying to run inference (also linked). nltk was the only dependency not listed, so not bad compared to most code drops!
Thanks for writing up your experience! Good to know it works! And it's fast!
PHONEMIZER_ESPEAK_LIBRARY = c:\Program Files\eSpeak NG\libespeak-ng.dll
PHONEMIZER_ESPEAK_PATH = c:\Program Files\eSpeak NG
Edit: Got it working. It sounds really great and is super fast, as advertised. Amazing! I just tried modifying the code to make it speak more quickly, and it worked first try and still sounds good too! This is way better than using Coqui TTS. It just needs a few more pretrained models and the voice cloning that was in the paper, and this will become super popular very quickly.
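In case it helps anyone else on Windows: the two eSpeak NG variables quoted above can also be set from Python rather than in the system settings. This is only a rough sketch, assuming the phonemizer package picks them up from the process environment when it is imported. The helper name `configure_espeak` is made up for illustration, and the paths are simply the ones from this post, so adjust them to wherever eSpeak NG is installed on your machine.

```python
import os

# Install directory copied from the post above; change it if yours differs.
ESPEAK_DIR = r"c:\Program Files\eSpeak NG"


def configure_espeak(espeak_dir=ESPEAK_DIR):
    """Set the two eSpeak NG variables for the current process.

    Hypothetical helper, not part of any library. Call it before
    importing phonemizer (assumption: the variables are read at
    import/initialisation time).
    """
    library = espeak_dir + r"\libespeak-ng.dll"
    os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = library
    os.environ["PHONEMIZER_ESPEAK_PATH"] = espeak_dir
    return library, espeak_dir


configure_espeak()
# Import phonemizer (or the TTS code that depends on it) after this point.
```

Setting the variables this way only affects the running Python process, so it is handy for a quick test without touching the global environment.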