
April 19, 2024 07:30 pm

Microsoft's VASA-1 Can Deepfake a Person With One Photo and One Audio Track

Microsoft Research Asia earlier this week unveiled VASA-1, an AI model that can create a synchronized animated video of a person talking or singing from a single photo and an existing audio track. Ars Technica reports:

In the future, it could power virtual avatars that render locally and don't require video feeds -- or allow anyone with similar tools to take a photo of a person found online and make them appear to say whatever they want. "It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors," reads the abstract of the accompanying research paper, titled "VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time." It's the work of Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, and Baining Guo.

The VASA framework (short for "Visual Affective Skills Animator") uses machine learning to analyze a static image along with a speech audio clip. It is then able to generate a realistic video with precise facial expressions, head movements, and lip-syncing to the audio. It does not clone or simulate voices (like other Microsoft research) but relies on an existing audio input that could be specially recorded or spoken for a particular purpose.



Original Link: https://slashdot.org/story/24/04/19/1931210/microsofts-vasa-1-can-deepfake-a-person-with-one-photo-and-one-audio-track?utm_source=rss1.0mainlinkanon&u

Slashdot

Slashdot was originally created in September of 1997 by Rob "CmdrTaco" Malda. Today it is owned by Geeknet, Inc.
