If this is voice chat, in-band might be good enough. There is no technology (yet?) that can recognize a spoken emoji description in real time, like "weird cucumber with mouth... wait, I think it's an alligator or maybe even a crocodile?", and then retroactively replace it with a different one while keeping the timing correct.