ByteDance’s new AI model transforms photos, videos with deepfake technology

Clips from the TikTok owner’s new OmniHuman-1 multimodal model have gone viral for their lifelike appearance and audio synchronisation.

Women wearing masks walk past the headquarters of ByteDance, owner of TikTok, in Beijing, China, on Aug 7, 2020. (File photo: AP/Ng Han Guan)

ByteDance, the tech giant behind TikTok, has introduced an artificial intelligence (AI) model that is gaining widespread attention for its ability to transform photos and sound bites into realistic videos, underscoring China’s growing capabilities in the field.

The company’s OmniHuman-1 multimodal model can create vivid videos of people speaking, singing and moving with a quality “significantly outperforming existing audio-conditioned human video-generation methods”, the ByteDance team behind the product said in a paper. AI-generated images, videos and audio of real people are often referred to as deepfakes, a technology increasingly seen in cases of fraud as well as in more harmless entertainment uses.

ByteDance has become one of the hottest AI companies in China, and its Doubao app is currently the country’s most popular consumer-facing AI app. The company has not yet released OmniHuman-1 to the public, but sample clips have gone viral.

One notable demo features a 23-second video of Albert Einstein delivering a speech. TechCrunch’s Kyle Wiggers described the model’s output as “shockingly good” and “perhaps the most realistic deepfake videos to date”.

The model highlights the advances Chinese developers are making despite Washington’s efforts to curb the country’s AI progress. The launch follows OpenAI’s wider release of its video-generation tool Sora, which was made publicly available to ChatGPT Plus and Pro users in December.

In the technical paper published on Tuesday (Feb 4), ByteDance researchers Lin Gaojie, Jiang Jianwen, Yang Jiaqi, Zheng Zerong and Liang Chao detailed a novel training strategy that mixes diverse data sets of text, audio and movement, in a bid to address the challenges global researchers face in scaling up video-generation models.
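ByteDance has not released the training code, so the following is only a minimal, hypothetical sketch of what such mixed-condition training could look like in PyTorch. The model architecture, the condition names ("text", "audio", "pose") and the drop probabilities are all illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of mixed-condition training for a conditioned
# video model. Names and probabilities are illustrative assumptions,
# not ByteDance's actual implementation.
import random
import torch
import torch.nn as nn

class TinyConditionedDenoiser(nn.Module):
    """Stand-in for a diffusion-style model conditioned on text/audio/pose."""
    def __init__(self, dim=64):
        super().__init__()
        self.video_proj = nn.Linear(dim, dim)
        self.cond_proj = nn.ModuleDict({
            "text": nn.Linear(dim, dim),
            "audio": nn.Linear(dim, dim),
            "pose": nn.Linear(dim, dim),
        })
        self.out = nn.Linear(dim, dim)

    def forward(self, noisy_video, conds):
        h = self.video_proj(noisy_video)
        for name, feat in conds.items():
            if feat is not None:  # a condition may be missing or dropped
                h = h + self.cond_proj[name](feat)
        return self.out(h)

# Assumed drop rates: weaker conditions (text) are kept more often than
# stronger ones (pose), so clips lacking pose labels still contribute.
DROP_PROB = {"text": 0.1, "audio": 0.5, "pose": 0.8}

def mixed_condition_batch(batch):
    """Randomly drop conditions so the model learns from partial supervision."""
    conds = {}
    for name, feat in batch["conds"].items():
        drop = feat is None or random.random() < DROP_PROB[name]
        conds[name] = None if drop else feat
    return conds

model = TinyConditionedDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(3):  # toy loop on random stand-in data
    dim = 64
    batch = {
        "video": torch.randn(4, dim),
        "conds": {
            "text": torch.randn(4, dim),
            "audio": torch.randn(4, dim),
            "pose": None,  # e.g. this clip was never pose-annotated
        },
    }
    noise = torch.randn_like(batch["video"])
    pred = model(batch["video"] + noise, mixed_condition_batch(batch))
    loss = nn.functional.mse_loss(pred, noise)  # simplified denoising objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: loss={loss.item():.4f}")
```

The point the sketch illustrates is the scaling idea described in the paper: clips missing a stronger signal such as pose can still be trained on through their weaker text or audio labels, enlarging the usable data mix.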

ByteDance said its method improves on conventional video generation, without specifically naming competing AI tools. The team said its data-mixing approach allows for the generation of realistic videos with varying aspect ratios and body proportions, from close-ups of faces to full-body shots.

The generated clips feature detailed facial expressions matched to audio and natural head and gesture movements, potentially unlocking broader real-world applications, the team said.

Among the released sample clips is one of a man delivering a TED Talk-style speech, with realistic hand gestures and lip movements synchronised to the audio, making the clip difficult to distinguish from a live recording.

Chinese tech firms have been making significant strides in video generation since OpenAI first previewed its Sora model in February 2024. ByteDance leads the pack with its Jimeng AI platform, powered by its flagship video models, PixelDance and Seaweed, which have been receiving regular updates with new capabilities.

The November update to Jimeng incorporated the S2.0 Pro and P2.0 Pro versions of the models. These updates enable Jimeng to produce clips that consistently match images uploaded by users, giving subjects “new life and liveliness”, ByteDance said in a statement at the time.

Other Chinese tech companies are also competing in this space, including ByteDance’s rival Kuaishou Technology with its Kling app, and AI start-ups such as Beijing-based Zhipu AI and Shengshu Tech, as well as Shanghai-based MiniMax.

This article was first published on SCMP.

Source: South China Morning Post/cm