An open-source reasoning model from Chinese artificial intelligence startup DeepSeek has created a stir in the tech industry, with investors uncertain about its potential impact. The model, R1, released on January 20, has drawn praise and skepticism alike and helped send U.S. technology stocks tumbling in Monday's trading.
The R1 model, which DeepSeek claims performs on par with OpenAI’s o1 reasoning model, quickly became a sensation, topping the download charts on the Apple App Store and ranking highly on Chatbot Arena, an AI benchmarking platform. However, the startup faced challenges when it warned users of “large-scale malicious attacks” that led to a slowdown in sign-ups.
DeepSeek was founded in 2023 by Chinese entrepreneur Liang Wenfeng and is funded by his quantitative hedge fund, High-Flyer. R1's success has raised eyebrows because of its comparatively low reported development cost of about $5.6 million, a figure that stands in stark contrast to the hundreds of millions typically spent by leading American AI companies and has sparked debate about the efficiency of DeepSeek's approach.
Despite the skepticism, prominent figures in the tech industry have lauded DeepSeek’s achievement. Venture capitalist Marc Andreessen described the R1 model as “one of the most amazing and impressive breakthroughs” he has ever seen, highlighting the company’s innovative training approach using reinforcement learning without supervised fine-tuning.
The cost efficiency of DeepSeek's models is particularly notable against the backdrop of U.S. export restrictions that bar the sale of the most advanced AI chips to Chinese entities. DeepSeek says it trained V3, the base model that R1 refines, on a cluster of 2,048 Nvidia H800 chips, a part designed as an export-compliant alternative to the flagship H100.
While some industry analysts remain skeptical that the reported figure captures the full cost of training V3, others, like Y Combinator CEO Garry Tan, see DeepSeek's advances as a boon for the tech industry. Tan argues that cheaper, faster model training will accelerate demand for AI applications and drive the sector's growth.
Meta Chief AI Scientist Yann LeCun emphasized the role of open-source innovation in DeepSeek's success, noting that the company built on open research and open-source code such as PyTorch and Meta's Llama models. LeCun underscored the power of collaboration and knowledge-sharing in driving technological progress, suggesting that DeepSeek's achievements can benefit the entire industry.
DeepSeek's R1 model has generated both excitement and skepticism within the tech community, showcasing the potential for innovation and cost efficiency in AI development. As the industry continues to evolve, the lessons from DeepSeek's approach may shape the future of artificial intelligence and machine learning.