Secrets of DeepSeek AI model revealed in Nature paper
    2025-09-19  08:53    Shenzhen Daily

THE success of DeepSeek’s powerful artificial intelligence (AI) model R1, which made the U.S. stock market plummet when it was released in January, did not hinge on being trained on the output of its rivals, researchers at the Chinese firm have said. The statement came in documents released alongside a peer-reviewed version of the R1 model, published Wednesday in Nature.

R1 is designed to excel at “reasoning” tasks such as mathematics and coding, and is a cheaper rival to tools developed by U.S. technology firms. As an “open weight” model, it is available for anyone to download and is the most popular such model on the AI community platform Hugging Face to date, having been downloaded 10.9 million times.
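Because R1 is an open-weight model, its checkpoint can be fetched programmatically from Hugging Face. The lines below are a minimal sketch, assuming the hub repository ID “deepseek-ai/DeepSeek-R1” and the huggingface_hub Python client; neither detail appears in the article, so verify the exact repository name on the hub before use.

# Sketch only: the repository ID and client library are assumptions, not from the article.
from huggingface_hub import snapshot_download

# Download the open-weight files (weights, config, tokenizer) to the local cache
# and return the path; the full R1 checkpoint is very large, so expect a long download.
local_path = snapshot_download(repo_id="deepseek-ai/DeepSeek-R1")
print(local_path)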

The paper updates a preprint released in January, which describes how DeepSeek augmented a standard large language model (LLM) to tackle reasoning tasks. Its supplementary material reveals for the first time how much R1 cost to train: the equivalent of just US$294,000. This comes on top of the US$6 million or so that the company, based in Hangzhou, spent to make the base LLM that R1 is built on, but the total amount is still substantially less than the tens of millions of dollars that rival models are thought to have cost.

DeepSeek says R1 was trained mainly on Nvidia’s H800 chips, which in 2023 were barred from sale to China under U.S. export controls.

R1 is thought to be the first major LLM to undergo the peer-review process. “This is a very welcome precedent,” says Lewis Tunstall, a machine-learning engineer at Hugging Face who reviewed the Nature paper. “If we don’t have this norm of sharing a large part of this process publicly, it becomes very hard to evaluate whether these systems pose risks or not.”

In response to peer-review comments, the DeepSeek team reduced anthropomorphizing in its descriptions and added clarifications of technical details, including the kinds of data the model was trained on, and its safety.

DeepSeek’s major innovation was to use an automated form of trial and error, known as pure reinforcement learning, to create R1. The process rewarded the model for reaching correct answers, rather than teaching it to follow human-selected reasoning examples. The company says that this is how its model learnt its own reasoning-like strategies, such as how to verify its workings, without following human-prescribed tactics. To boost efficiency, the model also scored its own attempts using estimates rather than employing a separate algorithm to do so, a technique known as group relative policy optimization.
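The group-relative scoring mentioned above can be illustrated in a few lines. This is a deliberately simplified sketch of the idea behind group relative policy optimization, not DeepSeek’s implementation, and it assumes a plain 1-or-0 correctness reward; every name in it is illustrative.

# Sketch of group-relative scoring: each sampled answer is judged against the
# average reward of its own group, with no separately trained critic model.
import statistics

def group_relative_advantages(rewards):
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero spread
    return [(r - mean) / std for r in rewards]

# Example: four attempts at the same maths problem, two of them correct (reward 1.0).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # correct answers score positive, wrong ones negative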

Media reports in January suggested that researchers at OpenAI, the San Francisco, California-based company that created ChatGPT and the “o” series of reasoning models, thought DeepSeek had used outputs from OpenAI models to train R1, a method that could have accelerated a model’s abilities while using fewer resources.

DeepSeek has not published its training data as part of the paper. But, in exchanges with referees, the firm’s researchers stated that R1 did not learn by copying reasoning examples that were generated by OpenAI models. However, they acknowledged that, like most other LLMs, R1’s base model was trained on the web, so it will have ingested any AI-generated content already on the Internet.

Tunstall says that although he can’t be 100% sure R1 wasn’t trained on OpenAI examples, replication attempts by other labs suggest that DeepSeek’s recipe for reasoning is probably good enough not to need such data. “I think the evidence now is fairly clear that you can get very high performance just using pure reinforcement learning,” he says.

Other researchers are now trying to apply the methods used to create R1 to improve the reasoning-like abilities of existing LLMs, as well as extending them to domains beyond mathematics and coding, says Tunstall. In that way, he adds, R1 has “kick-started a revolution.” (SD-Agencies)
