Ironically, this might be an area where machine learning could be beneficial.
I’ve been watching a few projects that are attempting to live-translate videos. We are very close.
Live is great, but I don’t think a real 1:1 translation is feasible live for most languages.
Even a 10s delay allows the whole sentence or phrase to be captured and translated in its entirety. In a lot of languages, a word at the other end of the sentence can drastically change the meaning.
It’s already a thing with near-zero delay. MS Teams does it (dunno about the translation) and the QSMP Minecraft server has a bunch of livestreamers from different countries who use it for realtime translation.
[EDIT: Live demo from today. Shit’s impressive.]
What actually happens is that the current sentence gets “corrected” several times as you keep speaking. It’s a bit jittery, and if the word order differs significantly the translated sentence might be a bit wonky for a few seconds. There are a few misses, but overall it works really well; at least well enough that people who don’t speak each other’s language can have a conversation in their native tongues with essentially no more delay than reading speed. I can easily follow a livestream in a foreign language with the live subtitles (which was not the case a mere 6 months ago for any language other than English).
Live shouldn’t be used in a home setup anyway, except where interaction is required, like a Teams call or Twitch stream. Anything else can take a delay for the sake of preserving the meaning.
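To make the “corrected several times” behaviour concrete, here is a minimal sketch of the re-translate-on-every-update idea. It is not how Teams or any particular product does it. The names `live_captions` and `toy_translate` and the canned German examples are hypothetical stand-ins for a real speech recognizer and translation model.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading words that two captions share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def live_captions(
    partials: Iterable[str], translate: Callable[[str], str]
) -> Iterator[Tuple[str, int]]:
    """Re-translate the whole sentence each time the transcript grows.

    Yields (caption, rewritten), where `rewritten` counts words that were
    already on screen and had to be replaced or dropped (the "jitter").
    """
    shown: List[str] = []
    for partial in partials:
        new = translate(partial).split()
        kept = common_prefix_len(shown, new)
        rewritten = len(shown) - kept
        shown = new
        yield " ".join(new), rewritten


# Hypothetical canned outputs standing in for a real translation model.
CANNED = {
    "Ich habe": "I have",
    "Ich habe den Film": "I have the film",
    "Ich habe den Film nicht gesehen": "I did not see the film",
}


def toy_translate(text: str) -> str:
    return CANNED[text]


PARTIALS = list(CANNED)  # the sentence as it grows while the speaker talks

for caption, rewritten in live_captions(PARTIALS, toy_translate):
    print(f"{caption!r} (rewrote {rewritten} word(s))")
```

In this toy run the caption stays stable until the verb arrives at the end of the German sentence, at which point most of the line is rewritten. That is the few seconds of wonkiness when word order differs.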
I absolutely hate to watch subtitles appear word for word. So no, please no live captions.
It doesn’t have to be live as in synced with the player, but I imagine the audio could be loaded into the program at the same time and have it produce CC for the entire movie as you watch it.
How does that work for people with non-US/UK accents? I ask because all of the transcription software I’ve seen works fantastically on even the most garbled and redneck American accents, and the vast majority of British ones too, but as soon as you get to Scottish/Welsh/German/Australian/really anywhere else’s accents, it has a complete breakdown and you can’t make sense of it at all.
https://m.youtube.com/watch?v=h5JNnvXjmmY Looks like they actually solved it a while ago; this video shows multiple base languages. Sorry, I can’t speak to specifics, but I do know my next project.
Whisper AI is pretty darn good. I’ve used it to make subtitles for MST3K vids where nothing good exists and maybe only had to spend 10 minutes doing some clean up. It even recognizes when different people are speaking and breaks up the subs accordingly.
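For anyone wanting to try the same thing, a rough sketch of the last step: turning timed segments into an .srt file. It assumes you already have a list of segments with `start`, `end` (in seconds) and `text`, which is the shape the open-source `whisper` package returns under `result["segments"]`. The helper names and sample lines here are made up for illustration, and the transcription call itself is left out.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Build SRT text: numbered cues separated by a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(float(seg["start"]))
        end = srt_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Made-up sample segments; real ones would come from the transcription step.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello."},
    {"start": 2.5, "end": 4.0, "text": " Bye."},
]
print(segments_to_srt(sample))
```

The manual cleanup then mostly comes down to fixing the text and nudging timestamps in the resulting file.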
Imagine the next step though, soon AI will generate actors’ voices speaking in any language you want.
I don’t think I would use this actually, because I don’t see how an AI could capture the performance. I’m a sub over dub guy anyway, but at least someone making a dub has a sporting chance to make an interesting performance.