The Race to Reproduce DeepSeek's Market-Breaking AI Has Begun
Companies like Hugging Face are working to rebuild DeepSeek's R1 model from scratch.
- Chinese startup DeepSeek shook the tech world and markets when it released R1, its new AI model.
- The West is now trying to reproduce R1 on its own terms and cut out Chinese servers.
- Recreating R1 from scratch can help researchers build better models and validate DeepSeek's claims.
Silicon Valley doesn't want to get caught out again. It's scrambling to replicate DeepSeek's AI model, the cheaper Chinese tech that shook Wall Street and is freely available for anyone to adopt.
Companies like Microsoft and Amazon have already made versions of DeepSeek's R1 models available on their cloud platforms. This allows people to use the models, which appear to match the capabilities of rivals like OpenAI, while keeping data from being sent to servers in China.
But there are also attempts to replicate DeepSeek's cost-efficient AI from the ground up — and see if all of the Chinese AI lab's market-moving claims hold up.
One major effort is being led by Hugging Face, a platform where researchers in AI's open-source community collaborate and share research notes and ideas for free.
Leandro von Werra, head of research at Hugging Face, told Business Insider that the company expected to complete its replication efforts within "weeks." He described the mood at Hugging Face as "kind of like Avengers assemble" as they dissect the inner workings of R1.
DeepSeek released its model under the open-source MIT license, which means many of the vital components of the recipe needed to build R1 have been laid out in the company's publicly available technical paper.
However, there are some elements of R1 that remain unclear.
In a December paper on V3, DeepSeek's earlier model, the Chinese company said training cost $5.6 million in total. The cost was calculated based on its use of H800 GPUs, a less powerful version of Nvidia's top chips, at a rental price of $2 per GPU hour.
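Taken at face value, those two figures pin down the implied compute. A quick back-of-the-envelope check, using only the numbers DeepSeek reported (the 2,048-GPU cluster size below is an illustrative assumption, not a figure from the paper):

```python
# Back-of-the-envelope check of DeepSeek's reported V3 training cost.
# Figures from the paper as cited in the article: $5.6M total,
# at a rental price of $2 per H800 GPU-hour.
total_cost_usd = 5_600_000
price_per_gpu_hour = 2.0

gpu_hours = total_cost_usd / price_per_gpu_hour
print(f"Implied compute: {gpu_hours:,.0f} GPU-hours")  # 2,800,000 GPU-hours

# On a hypothetical 2,048-GPU cluster (an assumption for illustration),
# that works out to roughly two months of wall-clock training time.
cluster_size = 2048
days = gpu_hours / cluster_size / 24
print(f"~{days:.0f} days on a {cluster_size}-GPU cluster")
```

The arithmetic only recovers what the $5.6 million figure implies; whether that figure captures the full development cost is exactly what replication efforts hope to test.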
Right now, no one can be quite sure what the actual development cost of R1 was, von Werra told BI.
DeepSeek's research paper also did not share what was required to bake reasoning capabilities into V3 to then produce R1.
That said, von Werra thinks it won't remain a mystery for long. "I don't know about the compute number, we can only guess at this time," he said. "I think one thing that's exciting about our reproduction is we're going to find out pretty quickly if the numbers hold up."
"We're just a few weeks away from having a fully open pipeline of R1 and everybody who can rent some GPUs can train their own version. Follow along and contribute to open-r1! https://t.co/yFxsFzAZSH"
— Leandro von Werra (@lvwerra) January 28, 2025
Learning from DeepSeek
Some quarters of Silicon Valley responded swiftly to the launch of DeepSeek. This week, Meta set up "war rooms" for its researchers to analyze DeepSeek, The Information reported. Sam Altman, the CEO of OpenAI, said Tuesday his company would accelerate the release of "better models."
DeepSeek's decision to publish its findings and make its R1 model open gives researchers worldwide an insight into its novel approach.
The main technique used to make R1 so capable was "pure reinforcement learning," DeepSeek's paper said. This, Hugging Face researchers said in a blog on Tuesday, can "teach a base language model how to reason without any human supervision."
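The core idea of learning "without any human supervision" is that the model is rewarded automatically when its final answer can be verified, rather than being shown human-written reasoning. The toy sketch below illustrates that reward-driven loop with a one-parameter policy and the classic REINFORCE update; it is an illustration of the general technique, not DeepSeek's actual training code, and all names in it are invented for the example:

```python
# Toy illustration of reward-driven ("pure") reinforcement learning:
# no labeled reasoning traces, just an automatic reward when the
# sampled answer checks out. This is NOT DeepSeek's method in detail.
import math
import random

random.seed(0)

def sample_answer(p_correct):
    """Stand-in for a model sampling an answer; correct with prob p."""
    return "correct" if random.random() < p_correct else "wrong"

def reward(answer):
    """Rule-based reward: 1 for a verifiably correct answer, else 0."""
    return 1.0 if answer == "correct" else 0.0

theta = 0.0  # single policy parameter: logit of answering correctly
lr = 0.5
for _ in range(200):
    p = 1 / (1 + math.exp(-theta))       # current success probability
    answer = sample_answer(p)
    r = reward(answer)
    # REINFORCE gradient of log-prob for a Bernoulli policy: (a - p)
    a = 1.0 if answer == "correct" else 0.0
    theta += lr * r * (a - p)            # reinforce rewarded behavior

p_final = 1 / (1 + math.exp(-theta))
print(f"P(correct) after training: {p_final:.2f}")
```

Because only rewarded samples update the parameter, the policy drifts toward producing verifiable answers with no human in the loop, which is the intuition behind the approach the paper describes.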
The researchers also know more specific technical details about why R1 caused such a stir in Silicon Valley and wiped $1 trillion from US stocks on Monday. For instance, the reasoning model is what's known as a "mixture of experts" model, industry-speak for a model that activates only a fraction of its parameters for any given input, allowing it to be "pre-trained with far less compute." It also makes subtle changes to its architecture, introducing techniques like "multi-token prediction," first introduced by Meta, that make models more efficient.
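The "mixture of experts" idea can be sketched in a few lines: a router scores a set of expert sub-networks and only the top-scoring few actually run, so most of the model's parameters sit idle on any given token. The sketch below is a minimal illustration of that routing pattern, not DeepSeek's architecture; the toy experts and scores are invented for the example:

```python
# Minimal mixture-of-experts routing sketch (illustrative only):
# a router picks the top-k experts per input and mixes their outputs.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_scores, k=2):
    """Run only the k highest-scoring experts and blend their outputs."""
    weights = softmax(router_scores)
    top_k = sorted(range(len(experts)), key=lambda i: weights[i],
                   reverse=True)[:k]
    # Renormalize over the selected experts, then combine their outputs.
    norm = sum(weights[i] for i in top_k)
    return sum(weights[i] / norm * experts[i](x) for i in top_k)

# Toy "experts": simple scaling functions standing in for feed-forward
# sub-networks. Only 2 of the 4 run for this input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out = moe_forward(10.0, experts, router_scores=[0.1, 0.9, 0.3, 0.8], k=2)
```

Since only k of the experts execute per input, compute per token scales with k rather than with the total parameter count, which is why such models can be pre-trained far more cheaply than dense models of the same size.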
Hugging Face's von Werra notes that details like this from DeepSeek have helped the industry better understand how a reasoning model like OpenAI's closed-source o1 was built. "Everybody thought this is the secret that is going to take a while to crack," he said.
DeepSeek comes to platforms
Days after its launch, DeepSeek flew to the top of Apple's Top Free Apps chart. While everyone wanted to try the latest AI tool, DeepSeek's policy of storing user data in China has prompted security concerns.
This spurred US companies to make R1 available on their own platforms so customers could use the Chinese AI model while cutting out China's servers.
Lin Qiao, CEO of Fireworks AI and former head of the PyTorch team at Meta, told BI that one clear reason for doing so was to ensure AI developers and users continue to get access to top model innovations.
"The approach we have been taking is always to enable state-of-the-art models for developers the fastest," she said. "DeepSeek is one example."
Her company, founded in 2022, made R1 available on its platform after congratulating DeepSeek for "pushing the boundaries of what's possible in open models." R1 can be accessed through Fireworks' serverless service, on demand, and via enterprise plans.
Others have followed suit. On Wednesday, Microsoft announced that it was making R1 available in its model catalog on its AI development platform Azure AI Foundry to make it "accessible on a trusted, scalable, and enterprise-ready platform."
Asha Sharma, corporate vice president at the tech giant, wrote in a blog that R1 "offers a powerful, cost-efficient model," but one that it had done "rigorous red teaming and safety evaluations" on before introducing as a new model to its library.
Amazon Web Services is making a similar move. Swami Sivasubramanian, vice president of AI and data at AWS, said this week that the company's "commitment to AI accessibility" meant R1 was being made available on its platforms like SageMaker and Bedrock.
Perplexity CEO Aravind Srinivas, whose AI startup made R1 available to users of its search platform this week, said in a Tuesday X post that downloading the model onto its servers could also help control the way it responds to user queries as well as ensure those queries don't go to servers in China.
DeepSeek's AI appears to censor sensitive information about China, such as refusing to answer questions about the 1989 Tiananmen Square protests. Srinivas said that Perplexity's version of R1 had no censorship and shared its accurate response to what happened in Tiananmen Square.
DeepSeek declined to answer a question about Tiananmen Square.
David Sacks, the White House's AI czar, offered one reason why Perplexity's R1 integration was an important way to reproduce R1 in the West. "This is one of several ways that you can try DeepSeek R1 without downloading the app or sharing any data with a Chinese company," he said on X.
Many see DeepSeek as an example of China challenging American AI hegemony using a tried-and-tested playbook. OpenAI, which has a lot to lose from DeepSeek making its technology freely available, said on Wednesday it's investigating whether the Chinese firm "inappropriately" replicated its models for training.
For von Werra, it's a full-circle moment. The whole field started as open-source, so seeing efforts to make a leading reasoning model available for free is welcome, he said.
"I think in the end, everybody's going to get better models and do cooler things," he said. "I feel like it's a win-win situation."