Zero-shot voice cloning sounds great if you get a lucky generation.
#1, opened by Gapeleon
Depending on the voice, zero-shot voice cloning works pretty well if you provide [reference_text + new_text] as the text prompt and then pre-fill the response with the encoded reference audio.
https://huggingface.co/spaces/Gapeleon/KaniTTS_Voice_Cloning
Sometimes it needs a couple of re-generations.
KaniTTS and SparkTTS are the only models I've tried that can get my accent right.
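For anyone who wants to see the prompt layout spelled out, here is a minimal sketch of the idea described above. It is not the code from the Space: the special-token IDs, the function names `build_cloning_prompt` and `strip_prefill`, and the exact ordering of markers are all assumptions made for illustration, and the real model's tokenizer and audio codec will differ.

```python
from typing import List, Sequence

# Hypothetical special-token IDs, chosen only for this illustration.
# The real model's vocabulary will use different values.
START_OF_TEXT = 1
END_OF_TEXT = 2
START_OF_SPEECH = 3


def build_cloning_prompt(
    reference_text_ids: Sequence[int],
    new_text_ids: Sequence[int],
    reference_audio_codes: Sequence[int],
) -> List[int]:
    """Build one token sequence: [reference_text + new_text], then a
    pre-filled response that already contains the reference audio codes.

    The model is then asked to continue the sequence, so the audio it
    generates for the new text follows on from the reference speaker.
    """
    # Text side: the transcript of the reference clip followed by the new text.
    text_part = [START_OF_TEXT, *reference_text_ids, *new_text_ids, END_OF_TEXT]
    # Response side: start the speech segment and pre-fill it with the
    # encoded reference audio, leaving the sequence open for continuation.
    prefill = [START_OF_SPEECH, *reference_audio_codes]
    return text_part + prefill


def strip_prefill(generated: Sequence[int], prompt_len: int) -> List[int]:
    """Drop the prompt (including the pre-filled reference audio) so that
    only the newly generated audio codes remain for decoding."""
    return list(generated[prompt_len:])
```

In use, the prompt would go to the model's generate call, and `strip_prefill(output, len(prompt))` would keep only the codes for the new text before they are decoded back to audio. Since results vary between samples, the generate step is the part to repeat when a take sounds off.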
Wow!
That's awesome! We will release a stable base model next week. I hope it will work smoothly.