> The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.

I'm more interested in this. I wonder how it performs compared to competing models, or even to open-source ones.