Thanks!!
A few areas for improvement in the clip you posted:
I need to add better duration estimation. The clip is unfortunately truncated.
A lot of the community-trained voices don't fully leverage phonetic annotation, so some of the words fall flat.
I think the synthesizer output has too much noise in it (you can see this in the image). The person who trained it probably used noisy training data.
Finally, the universal vocoder isn't handling James Earl Jones' deep voice very well. It should be fine-tuned.