Silicon Valley Startup Inception Labs Creates Faster LLM

A Silicon Valley startup founded by professors from Stanford, UCLA and Cornell has created a new type of large language model that its founders say is faster, cheaper and higher quality than existing models.
Inception Labs developed diffusion large language models that apply a technique normally used for images, audio and video to text. The startup claims the result is an LLM that is five to 10 times faster and five to 10 times cheaper while performing competitively with top artificial intelligence (AI) models.
Companies today struggle with scaling generative AI in part due to the cost of deploying it, which is why Chinese AI startup DeepSeek became wildly popular for its low-cost, open-source LLMs that could hold their own with the best AI models on industry benchmarks.
Inception Labs has introduced a family of diffusion LLMs called Mercury, starting with a coding assistant called Mercury Coder, which can also do text responses. Coming soon is an AI chatbot and API for developers.
“It’s a fundamentally different approach,” Stefano Ermon, co-founder and CEO of Inception Labs, said in an interview with PYMNTS.
“All existing large language models are autoregressive, meaning that they generate text or code one word at a time in a left-to-right way, and this is pretty slow because you cannot generate something until you generated everything that comes before it.”
Inception Labs takes the diffusion technique, typically used for generating images, video and audio, and applies it to text. “We start with a rough guess for what the answer should be, and then this answer is then refined through a neural network until we get the final answer,” said Ermon, who is on leave as a professor of computer science at Stanford University.
“The key advantage is that the neural network is able to modify multiple tokens, multiple words, in parallel,” he said.
Diffusion is faster, he said, because it processes tokens simultaneously, while autoregressive models like OpenAI’s GPT-4.5, Google’s Gemini 2.0 or Anthropic’s Claude 3.7 Sonnet generate one word at a time sequentially. Being faster also makes diffusion models cheaper, he said, because less processing time means fewer GPU hours.
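Ermon’s contrast can be sketched with a toy simulation. This is illustrative only and not Inception Labs’ actual method: the token picks are random placeholders for a real model’s output, and the point is purely the step count, since an autoregressive decoder needs one sequential model call per token while a diffusion-style decoder runs a fixed number of refinement passes that each update every position at once.

```python
import random

random.seed(0)  # reproducible toy output

VOCAB = ["the", "cat", "sat", "on", "mat"]

def autoregressive_generate(n_tokens):
    """Toy autoregressive decoding: token i cannot be produced until
    tokens 0..i-1 exist, so sequential steps grow with output length."""
    tokens, sequential_steps = [], 0
    for _ in range(n_tokens):
        sequential_steps += 1                     # one model call per token
        tokens.append(random.choice(VOCAB))       # stand-in for a model's pick
    return tokens, sequential_steps

def diffusion_generate(n_tokens, n_refinement_steps=3):
    """Toy diffusion-style decoding: start from a rough guess, then refine
    every position in parallel; sequential steps stay fixed regardless
    of output length."""
    draft = [random.choice(VOCAB) for _ in range(n_tokens)]  # rough guess
    sequential_steps = 0
    for _ in range(n_refinement_steps):
        sequential_steps += 1                     # one call refines ALL tokens
        draft = [random.choice(VOCAB) for _ in draft]  # stand-in for denoising
    return draft, sequential_steps

# A 50-token answer: 50 sequential model calls vs. a fixed 3.
_, ar_steps = autoregressive_generate(50)
_, diff_steps = diffusion_generate(50)
print(ar_steps, diff_steps)  # 50 3
```

In the toy version the per-call work differs (the diffusion call touches all positions), but on parallel hardware like a GPU that per-call work is spread across cores, which is why cutting sequential steps translates into speed.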
Inception Labs hit a record per-user throughput of 1,000 tokens per second on Nvidia H100 AI chips. The startup said it ranked first on speed and second on quality on Copilot Arena, a VSCode extension that lets developers compare and evaluate code completions from LLMs in real time. Inception claims it outperformed OpenAI’s GPT-4o and Google’s Gemini-1.5-Flash.
Diffusion models are also easier to control, Ermon added. For example, when generating images, a diffusion model can more closely reproduce a hand-drawn, uploaded sketch. Inception plans to bring this capability to text and code, with images, video and audio coming in the future.
“It now becomes possible to have a single type of generative AI model that can handle all different modalities, and you can share knowledge,” Ermon said. In robotics, for example, “what you learn from videos can be used to produce better tasks and vice versa.”
OpenAI co-founder Andrej Karpathy posted on X that it’s long overdue for LLMs to use diffusion, saying, “this model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses.”
Less than one year old, Inception Labs is already “well-funded” and is not currently fundraising, Ermon said. He declined to disclose who invested.
A Future of Diffusion LLMs?
Ermon has had the idea of using diffusion models for content generation since 2019. Back then, image generation models used GANs, or generative adversarial networks, and Ermon and his team at Stanford thought the results were “not that good.” So they applied the diffusion technique, which became the key method used in image generation models like Midjourney and OpenAI’s DALL-E and Sora.
Ermon’s team also thought about using diffusion for text and code, but it was a tougher problem to solve and took years of research. “It’s been challenging to figure out how to do it, but we had a breakthrough,” he said.
They documented their findings in a paper submitted to the International Conference on Machine Learning (ICML) in 2024 — and won an award for best paper.
“It was recognized as a major contribution to the field,” Ermon said. Around that time, he and two of his former Ph.D. students, Aditya Grover and Volodymyr Kuleshov, decided to found the startup. Grover and Kuleshov are computer science professors at UCLA and Cornell, respectively.
The founders decided that they “really need to build something that people can use, and we’re going to change the way generative AI techniques work, not only for images, but also for language,” Ermon said.
The result was Mercury. Ermon said it is a proprietary, not open-source, model. Nor is it a reasoning model like OpenAI’s o1 or o3, which go back over their answers using the chain-of-thought technique. Mercury still hallucinates, like the generative AI models developed by its competitors. The model sizes and training datasets used were not disclosed.
Ermon said several Fortune 500 companies are testing Mercury for use cases that include customer support chatbots, but declined to name names. Mercury is free to test. He said his team has built the software infrastructure around Mercury to enable customers to fine-tune and deploy the models for specific uses in a business.
For Inception Labs, this is just the start of a new LLM paradigm.
“We’re envisioning a future where all LLMs are going to be based on the diffusion paradigm,” Ermon said. “That’s important because it will make generative AI solutions better, will make them cheaper, will make them faster, will improve the quality of the answers.”
“Our vision is to revolutionize the way people build generative AI solutions, and LLMs in particular, by bringing in these new ideas,” he said.
The post Silicon Valley Startup Inception Labs Creates Faster LLM appeared first on PYMNTS.com.