StyleTalker takes one photo of you and produces a video of you talking
Mar. 18, 2024.
Korean researchers recently presented a new AI model that takes one photograph as input and outputs a video of that person saying any arbitrary speech, with realistic lip movement
Scientists from the Korea Advanced Institute of Science and Technology (KAIST) in South Korea have described a new model called StyleTalker, which takes a single image of a person as input and produces a video of them talking “with accurately audio-synced lip shapes, realistic head poses, and eye blinks”.
StyleTalker combines AI techniques for “audio-driven generation” (generating realistic lip movements from audio) with “motion-controllable” generation, which can, for example, take the head movements and gestures from one video and apply them to a new video with a new face.
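The one-shot idea behind this kind of model can be sketched in a few lines: a single identity code extracted from the input photo stays fixed, while a sequence of motion codes (lip shapes, head pose) driven by the audio varies per frame, and the two are fused to produce one latent per output frame. The sketch below is purely illustrative; the function and the toy element-wise fusion are assumptions for exposition, not the StyleTalker implementation.

```python
# Illustrative sketch only: a fixed identity latent (from one photo)
# is fused with per-frame motion latents (hypothetically derived from
# audio) to yield one latent per output frame. Names and the additive
# fusion are stand-ins, not the authors' actual method.

def combine(identity_latent, motion_latents):
    """Fuse the fixed identity code with each frame's motion code."""
    return [
        [i + m for i, m in zip(identity_latent, motion)]  # toy fusion
        for motion in motion_latents
    ]

identity = [0.5, 0.5]                   # latent from the single input photo
motions = [[0.25, 0.0], [0.5, 0.25]]    # per-frame motion codes from audio

frames = combine(identity, motions)
# → [[0.75, 0.5], [1.0, 0.75]]  (one fused latent per video frame)
```

In a real system each fused latent would then be fed to an image generator to render a frame; here the point is only that the identity comes from one photo while the motion sequence comes from the audio.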
The work builds on a recent boom in neural lip-synced video generation, a research field that aims at “transforming the lip region of the person in the target video, generating new videos with the lip shapes that match the input audio.” (This could be used, for example, when movies are dubbed from one language into another.)
The authors report that StyleTalker “can generate more natural and robust talking head videos” than previously described models. It is a step towards more realistic fake videos, but is that a good thing? Let us know in the comments how you think this technology could be used for good and for bad.
Citation: Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang. StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation. arXiv preprint (2022). https://arxiv.org/abs/2208.10922 (open access)
Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter.
4 Comments
beautiful drawing
Learned a lot and the drawing is also beautiful thanks
good info??
good one