I find all the ones like eSpeak, Piper and Festival to be awful. The voices are OK-ish, but the intonation and pronunciation are very bad. Tortoise is OK, but it is slow and not suited to long texts. Paid services like Google, AWS or ElevenLabs are miles ahead. There are a number of CUDA-based engines (listed in the comments of the post I linked) that you can supposedly use if you have an NVIDIA GPU available. I don’t, so they are not for me.