For a software engineer with no experience in machine learning / AI, what does it mean to build your own language model? Does it require coding? Hundreds of hours of audio data from your own voice? A significant amount of computing power?
With the tools available, you only need to provide a list of normal sentences, and they should include the words you'd like it to know about.
In my case I only wanted to train it on about 30 different sentences, which took less than a second. For a general assistant à la Google Home you'd want a much larger set of sentences, and I hear that can take a while (an hour or a few?).
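For reference, a sketch of what generating such a language model can look like with the KenLM tools, which Mozilla's DeepSpeech used (file names are placeholders, and exact integration with the decoder varies by DeepSpeech version):

```shell
# sentences.txt holds one plain sentence per line, containing the
# words you want the model to know about.

# Build a 3-gram ARPA language model. With a tiny corpus (a few dozen
# sentences) KenLM's default discounting fails, hence --discount_fallback.
lmplz -o 3 --discount_fallback < sentences.txt > lm.arpa

# Convert to KenLM's binary format so the decoder can load it quickly.
build_binary lm.arpa lm.binary
```

On a corpus this small both steps finish in well under a second, which matches my experience above.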
Because it works with probabilities it can still match words outside the sentences you give it, but from my understanding it will be partial to the ones you fed it, so it can recover when DeepSpeech misclassifies a character or two.
> Hundreds of hours of audio data from your own voice?
I should clarify this. As I mentioned, training the neural net part requires tons of audio and the corresponding text (and people should totally contribute[1]; the resulting data sets are released to the public). The neural net in DeepSpeech is then run on an audio stream and outputs a stream of characters.
Turning that stream of characters into sentences is what the language model is for.
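To make that concrete, here's a toy sketch (not DeepSpeech's actual decoder) of how a language model rescues a transcript when the neural net gets a character wrong. The shape of the scoring, acoustic score plus a weighted LM score, mimics what CTC beam-search decoders do; the sentences, acoustic scores, and the `alpha` weight are all made up for the demo:

```python
import math
from collections import Counter

# The "training" sentences you feed the language-model builder.
sentences = [
    "turn on the light",
    "turn off the light",
    "play some music",
]

# Build a unigram model with add-one smoothing (real tools use n-grams).
words = [w for s in sentences for w in s.split()]
counts = Counter(words)
vocab = set(words)

def lm_score(text):
    """Log-probability of a transcript under the toy unigram model."""
    total = sum(counts.values()) + len(vocab) + 1  # +1 for unknown words
    return sum(math.log((counts.get(w, 0) + 1) / total)
               for w in text.split())

# Two hypotheses decoded from the character stream; the acoustic model
# slightly prefers the misspelled one. Scores are invented for the demo.
hypotheses = {
    "turn on the lihgt": -1.0,  # acoustic log-score (made up)
    "turn on the light": -1.2,
}

alpha = 1.0  # LM weight (made up)
best = max(hypotheses, key=lambda h: hypotheses[h] + alpha * lm_score(h))
print(best)  # the LM penalty on the unknown word "lihgt" flips the result
```

The unknown word "lihgt" gets the smoothed floor probability, so the correctly spelled hypothesis wins despite its slightly worse acoustic score, which is the "partial to the sentences you feed it" behavior described above.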
Training the neural net is very data and compute intensive, but fortunately Mozilla provides pre-trained models.
Generating the language model is relatively cheap. And if your target language shares sounds with English, you may get away with using the English-trained neural net but with a non-English language model.