In January last year, renowned AI researcher Li Fei-Fei took a leave of absence from Stanford to trade academia for startup life. Nearly two years later, her venture World Labs has unveiled its first commercial product: a world model called Marble.

Marble can create 3D virtual worlds from text, images, video, or even rough layouts. It builds on an earlier World Labs prototype that generated 3D scenes from 2D images, but it overcomes that prototype's limitations, such as restricted interactive areas.

So-called world models like Marble are central to Li’s vision for the future of AI. Because these models can reason about and interact with complex environments, they are essential for building AI that understands not just language but the physical world itself. World Labs aims to imbue its systems with spatial intelligence, teaching them physical concepts that humans grasp intuitively, such as parking a car without bumping the curb, catching a tossed object, or pouring a drink without looking.

“Today, leading AI technology such as large language models (LLMs) have begun to transform how we access and work with abstract knowledge,” Li wrote in a blog post Monday. “Yet they remain wordsmiths in the dark; eloquent but inexperienced, knowledgeable but ungrounded.”

An emphasis on visual and spatial intelligence has long been Li’s “North Star,” said the researcher, who in 2006 began building ImageNet, a database of 15 million images that spurred the rise of deep learning.

Backed by Radical Ventures, Andreessen Horowitz and Nvidia, World Labs has raised US$230 million to pursue its spatial intelligence vision.

Marble has been in beta for a few months and is now publicly available. It can create a full 3D world from a single image or text prompt. Users can also merge multiple environments by uploading several images within a prompt. According to World Labs, the model can combine photos or short videos of real-world spaces to generate immersive, realistic virtual worlds.
The model includes a range of editing tools that let users customize their creations. A feature called Chisel allows users to sketch out a coarse 3D layout, while other tools make it possible to expand worlds or build entirely new scenes within the same environment.

Looking ahead, World Labs plans to develop world models with more interactive capabilities for both humans and AI agents. Google DeepMind and Nvidia have explored similar technologies with their Genie and Cosmos models, respectively. Yann LeCun, Meta’s chief AI scientist, is reportedly in the early stages of fundraising for his own world model startup.

Li said the applications of spatial intelligence tools like Marble will “span varying timelines.” The model is already being used by filmmakers, game designers and architects to enhance creative workflows. In the medium term, Li expects such technology to advance robotics, while future applications in science, healthcare, and education could enable breakthroughs in experiment simulation, drug discovery and immersive learning.

“This is AI’s next frontier,” she said. (SD-Agencies)