Don’t they do it already? There are a lot of languages where intonation is absol...

bobsmooth · on Sept 25, 2023

In English, intonation changes how a word is meant but not which word it is. From what I understand, in tonal languages the tone determines which word is being said. I don't think ML understands that difference yet.

fragmede · on Sept 26, 2023

Yeah, they do. I was able to get ChatGPT-4 to transcribe 我哥哥高過他的哥哥, which shows that they can. I did have to set the app to Chinese, and the original didn't work, so I had to modify what I said slightly.

https://www.tiktok.com/t/ZT86psPxY/

Roughly translated: "My older brother is taller than that other guy's older brother."

modeless · on Sept 26, 2023

Of course speech recognition works for Chinese. What it doesn't do is transcribe intonation and prosody in non-tonal languages. It's not even clear how one would transcribe such a thing, as I'm not aware of a standard notation.

fragmede · on Sept 29, 2023

IPA format should cover that, no?

modeless · on Sept 30, 2023

Maybe? I thought IPA was just phonetic but I see that it does have some optional prosody stuff that could in theory cover some of it. I'm not sure how standard or complete it really is in practice.

I haven't heard of any large datasets of IPA transcripts of speech with the detail necessary to train a fully realistic STT->LLM->TTS system. If you know of some, that would be interesting to look at.
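
For a concrete picture of the optional IPA prosody marks mentioned above, here is a minimal Python sketch. The symbol table lists suprasegmental marks from the IPA chart; the function names and the sample string are made up for illustration, and the sample is not an authoritative transcription of anything. It only shows that a transcript could carry stress and intonation marks alongside the phonetic segments, and that they can be pulled out or stripped.

```python
# Suprasegmental (prosody) marks from the IPA chart, keyed by Unicode character.
IPA_PROSODY_MARKS = {
    "\u02c8": "primary stress",            # ˈ
    "\u02cc": "secondary stress",          # ˌ
    "|": "minor (foot) group",
    "\u2016": "major (intonation) group",  # ‖
    "\u2197": "global rise",               # ↗
    "\u2198": "global fall",               # ↘
    "\ua71b": "upstep",                    # ꜛ
    "\ua71c": "downstep",                  # ꜜ
}


def prosody_marks(transcript):
    """Return (index, character, description) for each prosody mark found."""
    return [
        (i, ch, IPA_PROSODY_MARKS[ch])
        for i, ch in enumerate(transcript)
        if ch in IPA_PROSODY_MARKS
    ]


def strip_prosody(transcript):
    """Drop prosody marks and collapse whitespace, leaving only the segments."""
    kept = "".join(ch for ch in transcript if ch not in IPA_PROSODY_MARKS)
    return " ".join(kept.split())


# Hypothetical sample: "yes?" with a rise, then "yes." with a fall.
# Renders as: ↗ˈjes ‖ ↘ˈjes
SAMPLE = "\u2197\u02c8jes \u2016 \u2198\u02c8jes"

if __name__ == "__main__":
    for index, char, name in prosody_marks(SAMPLE):
        print(index, char, name)
    print(strip_prosody(SAMPLE))
```

Stripping the marks collapses both utterances to the same segments, which is roughly the information a plain-text transcript throws away.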