
CHINESE startup Shengshu AI rolled out its text-to-video tool Vidu for global users yesterday, rivaling OpenAI’s Sora. Vidu supports both Chinese and English text prompts, the South China Morning Post reported. The video generation model is accessible through its official website, making Shengshu the latest Chinese startup to offer text-to-video services to the public, following players such as Zhipu AI and Kuaishou Technology. Registered users will be able to generate 1080p video clips four or eight seconds long.
The Beijing-based company first unveiled Vidu in April, just two months after OpenAI announced its Sora video model. By showing a few selected preview clips at the time, Shengshu became the first firm in China to take on Sora. Shengshu said in a statement that Vidu can generate a four-second clip in 30 seconds. That would make it one of the fastest tools on the market, as similar tools usually take longer to generate a video of comparable length.
Shengshu exemplifies how China’s prestigious Tsinghua University has emerged as a main force backing the country’s AI ambitions. Behind Vidu is the firm’s self-developed architecture, U-ViT, first detailed in a September 2022 research paper by a team led by Zhu Jun, Shengshu AI’s chief scientist and a computer science professor at Tsinghua University. Another Tsinghua author of the paper, Bao Fan, currently serves as Shengshu’s chief technology officer. Shengshu’s chief executive, Tang Jiayu, is a graduate of Tsinghua’s Department of Computer Science and Technology.
In an interview in April, Tang told local media that it would be easier for Chinese firms to catch up with Sora than with GPT-4, OpenAI’s advanced large language model that powers ChatGPT. He did not elaborate.
In addition to text-to-video and image-to-video generation, Vidu has added a function that lays the foundation for commercializing the technology, given its potential uses in the animation and content industries, said Zhang Xudong, product director at Shengshu AI, as quoted by the Post. The new character-to-video function lets users upload an image of a real person or an animated character and use simple text prompts to bring it to life.
“In the future we hope (users) could upload multiple characters and (describe) scenes, and have them act in those scenes, similar to how a film is being produced,” Zhang said. “Our goal is to integrate AI tools with traditional sectors.”
Shengshu, which has raised tens of millions of U.S. dollars, counts Qiming Venture Partners, search giant Baidu, Alibaba Group Holding’s fintech affiliate Ant Group, and the Beijing AI Industry Investment Fund among its backers. (SD-Agencies)