As someone who's tried this repo extensively with politician sound clips (I wanted to troll buddies on Discord) - it kinda blows. Don't get me wrong - it's really neat, but the results are far worse than one might expect.
Sometimes it almost works, and then it's just totally absurd. Long pauses, voices that don't sound compelling, total failures on female voices. It's great in theory but it showed me that there's a ton of work to be done with voice cloning.
Props to the author for using UMAP to separate voices though
Also lol at the demand being so high that there are open issues of people offering to pay others to install this on their machine. Freelancing opportunities show up in the strangest of places...
Real Time Voice Cloning certainly has iffy output, but it's probably the most popular because it provides the easiest plug-and-play experience with even a simple UI to get started.
The author says he's working on a more polished toolkit called Resemble.AI, but I've never tried it. https://www.resemble.ai/
There's certainly a market out there for just beautifying existing repos to make it easier for non-scholars to get going. Even having a Colab Notebook ready to click-and-start is quite powerful -- probably a big reason why First Order Model (source paper to the original story) got so much traction so quickly.