Could you watch a music video and say "that's the snare drum, that's the lead singer, keyboard, bass, that's the truck that's making the engine noise, that's the crowd that's cheering, oh and that's a jackhammer in the background"? So can AI.
Could you point out who is lead guitar and who is rhythm guitar? So can AI.
I thought about it. Still seems kind of pointless.
That doesn't seem any better than typing "rhythm guitar". In fact, it seems worse and with extra steps. Sometimes the thing making the sound is not pictured. This thing is going to make me scrub through the video until the bass player is in frame instead of just typing "bass guitar". Then it will burn some power inferring that the thing I clicked on was a bass.