ByteDance’s new AI model transforms photos, videos with deepfake technology

Clips from the TikTok owner’s new OmniHuman-1 multimodal model have gone viral for their lifelike appearance and audio synchronisation.

Women wearing masks walk past the headquarters of ByteDance, owner of TikTok, in Beijing, China, on Aug 7, 2020. (File photo: AP/Ng Han Guan)

ByteDance, the tech giant behind TikTok, has introduced an artificial intelligence (AI) model that is gaining widespread attention for its ability to transform photos and sound bites into realistic videos, underscoring China’s growing capabilities in the field.

The company’s OmniHuman-1 multimodal model can create vivid videos of people speaking, singing and moving with a quality “significantly outperforming existing audio-conditioned human video-generation methods”, the ByteDance team behind the product said in a paper. AI-generated images, videos and audio of real people are often referred to as deepfakes, a technology increasingly seen in cases of fraud as well as in more harmless entertainment uses.

ByteDance has become one of the hottest AI companies in China, and its Doubao app is currently the country’s most popular consumer-facing AI app. The company has not yet released OmniHuman-1 to the public, but sample clips have gone viral.

One notable demo features a 23-second video of Albert Einstein delivering a speech. TechCrunch’s Kyle Wiggers described the model’s output as “shockingly good” and “perhaps the most realistic deepfake videos to date”.

The model highlights the advances Chinese developers are making despite Washington’s efforts to curb the country’s AI progress. The launch follows OpenAI’s wider release of its video-generation tool Sora, which was made publicly available to ChatGPT Plus and Pro users in December.

In the technical paper published on Tuesday (Feb 4), ByteDance researchers Lin Gaojie, Jiang Jianwen, Yang Jiaqi, Zheng Zerong and Liang Chao detailed a novel training strategy that mixes diverse data sets of text, audio and movement, in a bid to address the challenges global researchers face in scaling up video-generation models.
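ByteDance has not released the training code, so the following is only a minimal, hypothetical sketch of what such mixed-condition training could look like in PyTorch. The model architecture, the condition names ("text", "audio", "pose") and the drop probabilities are all illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of mixed-condition training for a conditioned
# video model. Names and probabilities are illustrative assumptions,
# not ByteDance's actual implementation.
import random
import torch
import torch.nn as nn

class TinyConditionedDenoiser(nn.Module):
    """Stand-in for a diffusion-style model conditioned on text/audio/pose."""
    def __init__(self, dim=64):
        super().__init__()
        self.video_proj = nn.Linear(dim, dim)
        self.cond_proj = nn.ModuleDict({
            "text": nn.Linear(dim, dim),
            "audio": nn.Linear(dim, dim),
            "pose": nn.Linear(dim, dim),
        })
        self.out = nn.Linear(dim, dim)

    def forward(self, noisy_video, conds):
        h = self.video_proj(noisy_video)
        for name, feat in conds.items():
            if feat is not None:  # a condition may be missing or dropped
                h = h + self.cond_proj[name](feat)
        return self.out(h)

# Assumed drop rates: weaker conditions (text) are kept more often than
# stronger ones (pose), so clips lacking pose labels still contribute.
DROP_PROB = {"text": 0.1, "audio": 0.5, "pose": 0.8}

def mixed_condition_batch(batch):
    """Randomly drop conditions so the model learns from partial supervision."""
    conds = {}
    for name, feat in batch["conds"].items():
        drop = feat is None or random.random() < DROP_PROB[name]
        conds[name] = None if drop else feat
    return conds

model = TinyConditionedDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(3):  # toy loop on random stand-in data
    dim = 64
    batch = {
        "video": torch.randn(4, dim),
        "conds": {
            "text": torch.randn(4, dim),
            "audio": torch.randn(4, dim),
            "pose": None,  # e.g. this clip was never pose-annotated
        },
    }
    noise = torch.randn_like(batch["video"])
    pred = model(batch["video"] + noise, mixed_condition_batch(batch))
    loss = nn.functional.mse_loss(pred, noise)  # simplified denoising objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: loss={loss.item():.4f}")
```

The point the sketch illustrates is the scaling idea described in the paper: clips missing a stronger signal such as pose can still be trained on through their weaker text or audio labels, enlarging the usable data mix.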

ByteDance said its method improves on conventional video generation, without specifically naming competing AI tools. The team said its data-mixing approach allows for the generation of realistic videos with varying aspect ratios and body proportions, from close-ups of faces to full-body shots.

The generated clips feature detailed facial expressions matched to audio and natural head and gesture movements, potentially unlocking broader real-world applications, the team said.

Among the released sample clips is one of a man delivering a TED Talk-style speech, with realistic hand gestures and lip movements synchronised to the audio, making the clip difficult to distinguish from a live recording.

Chinese tech firms have been making significant strides in video generation since OpenAI first previewed its Sora model in February 2024. ByteDance leads the pack with its Jimeng AI platform, powered by its flagship video models, PixelDance and Seaweed, which have been receiving regular updates with new capabilities.

The November update to Jimeng incorporated the S2.0 Pro and P2.0 Pro versions of the models. These updates enable Jimeng to produce clips that consistently match images uploaded by users, giving subjects “new life and liveliness”, ByteDance said in a statement at the time.

Other Chinese tech companies are also competing in this space, including ByteDance’s rival Kuaishou Technology with its Kling app, and AI start-ups such as Beijing-based Zhipu AI and Shengshu Tech, as well as Shanghai-based MiniMax.

This article was first published on SCMP.

Source: South China Morning Post/cm