
Here's a question for folks who work on DL for audio: what are folks using for vocoders these days?

I feel like that's where a lot of artifacts are introduced (at least for TTS) and the best methods a while ago were slow and autoregressive.



Vocoders for DL audio have improved a lot in recent years. GAN-based models such as WaveGAN and MelGAN generate high-fidelity audio without an autoregressive sampling loop, and Parallel WaveGAN and HiFi-GAN go further, with faster inference while keeping audio quality high.
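To make the speed point concrete, here's a toy sketch. It is not any real vocoder: the function names, the hop size of 4, and the arithmetic are all made up for illustration. The autoregressive version has to emit samples one at a time because each one depends on the previous one, while the parallel version computes every output sample from the conditioning frames alone, which is roughly why a feed-forward GAN generator can be so much faster at inference.

```python
# Toy illustration only: neither function is a real vocoder.
HOP = 4  # hypothetical number of audio samples per conditioning frame


def autoregressive_decode(frames, hop=HOP):
    """Each sample depends on the previous one, so the loop must run in order."""
    samples = []
    prev = 0.0
    for frame in frames:
        for _ in range(hop):
            # Stand-in for one network call per output sample.
            prev = 0.5 * prev + 0.5 * frame
            samples.append(prev)
    return samples


def parallel_decode(frames, hop=HOP):
    """Each sample depends only on its frame, so all could be computed at once."""
    # Stand-in for a single feed-forward upsampling pass.
    return [frame for frame in frames for _ in range(hop)]


frames = [1.0, 0.0, -1.0]  # made-up conditioning values (think: one mel bin)
ar = autoregressive_decode(frames)
par = parallel_decode(frames)
```

Both produce `len(frames) * HOP` samples, but only the second has no dependency between output samples, so it can be batched across time on a GPU.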


Thanks!!



