
I develop Kaldi Active Grammar [1], which is mainly intended for use with strict command grammars. Compared to normal language models, these can provide much better accuracy, assuming you can describe (and speak) your command structure exactly. (This is probably more acceptable for a voice assistant aimed at a more technical audience.) The grammar can be specified as an FST, or you can use KaldiAG through Dragonfly, which lets you specify grammars (and their resulting actions) in Python. However, KaldiAG can also do plain dictation if you want.
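
For a sense of what that looks like, here's a minimal sketch of a Dragonfly grammar, roughly following dragonfly2's usual standalone pattern (the command phrases here are just made-up examples):

    from dragonfly import get_engine, Grammar, MappingRule, Key, Text, Dictation

    class EditingRule(MappingRule):
        mapping = {
            "save file": Key("c-s"),            # press Ctrl+S
            "say <text>": Text("%(text)s"),     # type the dictated words
        }
        extras = [Dictation("text")]

    engine = get_engine("kaldi")    # KaldiAG backend
    engine.connect()
    grammar = Grammar("editing")
    grammar.add_rule(EditingRule())
    grammar.load()
    engine.do_recognition()         # block and recognize until interrupted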

KaldiAG has an English model available, but other models could be trained. Although you can't just drop in and use a standard Kaldi model with KaldiAG, the modifications required are fairly minimal and don't require any training or modification of its acoustic model. All recognition is performed locally and off line by default, but you can also selectively choose to do some recognition in the cloud, too.

Kaldi generally performs at the state of the art. As a hybrid engine, its training can be more complicated, but it generally requires far less training data to achieve high accuracy than "end-to-end" engines.

[1] https://github.com/daanzu/kaldi-active-grammar



Too late to edit, but I should probably have noted that KaldiAG also makes it easy to define "contexts" in which (groups of) commands are active for recognition. For example, if the TV is on, you could have commands for adjusting the volume, etc. But if it is off, those commands are disabled, so they can't be recognized; further, the engine knows this and can therefore better recognize the other commands that remain active.
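
If it helps, here's a rough sketch of how a context-gated grammar can look through Dragonfly (assuming dragonfly2's FuncContext; the TV state flag is hypothetical):

    from dragonfly import Grammar, MappingRule, FuncContext, Function

    tv_state = {"on": False}    # hypothetical flag you'd update elsewhere

    def volume_up():
        print("volume up")      # would call your TV's API here

    class TvRule(MappingRule):
        mapping = {"volume up": Function(volume_up)}

    # The grammar's phrases are only recognizable while the context
    # function returns true; inactive rules are excluded entirely,
    # which also helps the engine recognize the remaining commands.
    grammar = Grammar("tv", context=FuncContext(lambda: tv_state["on"]))
    grammar.add_rule(TvRule())
    grammar.load()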


Could Home Assistant use such commands by running this in a Docker container?

Also, the video demo is rather impressive in how accurately (and predictably) it recognises speech.


I don't know much about Home Assistant, but that should certainly be possible to set up. The KaldiAG API is pretty low level: you define a set of rules, send in audio data along with a bit mask of which rules are active at the beginning of each utterance, and receive back the recognized rule and text. The easier route is probably to go through Dragonfly, which makes it easy to define the rules, contexts, and actions. It might be a little hacky, but you should be able to wire it up with Home Assistant somehow.
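
For example, a rough sketch of wiring a Dragonfly action to Home Assistant's REST API (the URL, token, and entity id are placeholders):

    import requests
    from dragonfly import Grammar, MappingRule, Function

    HA_URL = "http://homeassistant.local:8123"
    HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

    def lights_on():
        # Home Assistant service call: POST /api/services/<domain>/<service>
        requests.post(
            f"{HA_URL}/api/services/light/turn_on",
            headers={"Authorization": f"Bearer {HA_TOKEN}"},
            json={"entity_id": "light.living_room"},
        )

    class HomeRule(MappingRule):
        mapping = {"lights on": Function(lights_on)}

    grammar = Grammar("home")
    grammar.add_rule(HomeRule())
    grammar.load()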

Although I mainly use it for computer control as demonstrated in the video, I do have many commands akin to home automation, like adjusting the lights, HVAC, etc.



