The Low-Down: Why Venture Firms Have Invested $2.9 Billion In Open Source AI Startups

Venture investors are trying to figure out how best to invest in an AI market dominated by a few big players who don't really need their expertise or money.

The answer, increasingly, is open-source AI startups. This sounds counterintuitive because 'open source' traditionally meant giving everything away, including source code. But open-source AI companies make money by selling services alongside their models to businesses that lack the people, the time or the money to develop their own solutions. Given the shortage of skilled talent in the field and the immediate demand for operationally useful services, this may turn out to be the fastest, most efficient means of building the AI business ecosystem. JL

Belle Lin reports in the Wall Street Journal:

Venture-capital funding for open-source AI startups rose from $900 million in 2022 to $2.9 billion last year. Investors are attracted by open source's rapid innovation through the independent developers it draws. Open-source AI models appeal to businesses because they offer large language models without paying and sharing data with a vendor like Microsoft. And open-source models include their inner workings, which companies can use to build their own models. Open-source AI companies turn a profit by selling business services and apps on top of their models, as businesses pay for enterprise features and support, ready-to-use apps and the convenience of outsourcing the building of custom models. Open source also creates an ecosystem of software and services which other startups sell.
As the market for artificial intelligence models consolidates around Microsoft, OpenAI and a handful of other proprietary systems and players, some companies are aiming to compete by offering their AI models free.
The open-source approach of freely distributing technology for the public to use, share and modify helped create the modern internet, cloud-computing and billion-dollar companies. There is no guarantee it will work for open-source large language models, but some of tech’s biggest players are betting it can help them chip away at OpenAI’s dominance.
OpenAI accounted for nearly 80% of the global generative AI market in 2023, according to an estimate from market research firm Valuates Reports.
Elon Musk last week said his xAI startup will open-source its chatbot Grok, which hasn’t captured as much interest as OpenAI’s ChatGPT or Anthropic’s Claude.
Meta Platforms last year released its Llama 2 open-source model, intensifying the competition between private and open-source models. Google released its Gemma open-source models in February, though it also sells access to its more powerful Gemini offerings.

Plenty of AI startups are betting on open source, too, with some of the most well-funded including Mistral AI, Hugging Face, Runway ML, Together AI, Writer, Cerebras and Databricks.
Open-source AI models appeal to businesses because they offer a means of using large language models without paying and sharing data with a vendor like Microsoft. And because they are shared for public dissection, open-source models typically include their inner workings, which companies can use to build their own models.
But AI companies face a number of challenges—namely the initial costs required to train their models, which can run into the hundreds of millions of dollars, licensing their technologies and maintaining the loyalty of independent developers who propelled their technologies to popularity.
A callback to open source’s freemium model
Open-source software firms have traditionally made money by charging for services that make their technologies easier for businesses to use, said Joseph Jacks, founder of OSS Capital, a venture-capital firm that funds open-source startups. The approach has paid dividends for companies like MongoDB, which sells an enterprise version of its open-source database.
Open-source AI companies say they also can turn a profit by selling business-grade services and applications on top of their open models—betting on businesses’ willingness to pay for enterprise features and support, ready-to-use apps like chatbots, and the convenience of outsourcing work to AI firms to build custom models for them.
AI models on their own are useless to enterprises, giving companies like Databricks the opportunity to sell capabilities that help enterprises use them, said Ali Ghodsi, the company's co-founder and CEO. Databricks's new AI group has released open models and helps businesses build low-cost models with proprietary data.
Another strategy involves building a collection of AI models—some of them free, and the more powerful requiring a fee or paid subscription.
French AI startup Mistral AI last month announced a partnership with Microsoft and a more powerful, proprietary AI system customers must pay to use. Its free models, which are considered advanced, are less sophisticated than its paid model.
Another well-known startup, Stability AI, which built the popular open-source image-generation model Stable Diffusion alongside AI startup Runway ML, began charging a subscription fee for commercial use of some of its advanced models in December. A subscription also comes with business features, a spokesperson said.
Still, it's not easy to make open-source pay. Online furniture seller Wayfair used Stable Diffusion to create an interior design tool that helps customers generate images of their homes with new decor, said Chief Technology Officer Fiona Tan. But the company is using a version of the model that is free for commercial use.
Similarly, the San Francisco-based AI startup Writer has released open-source and proprietary language models aimed at helping businesses write content. Its open models are less powerful than its paid models, said Writer co-founder and Chief Executive May Habib.
Sim Simeonov, chief technology officer of healthcare technology services and marketing firm Real Chemistry, said the choice isn’t about open or closed models from vendors like Writer, but selecting technology that meets a company’s technical, legal and business needs.
The ecosystem effect
Global venture-capital funding for open-source AI startups rose from $900 million in 2022 to $2.9 billion last year, according to PitchBook. That’s partly because investors are attracted to the fact that open-source companies can rapidly innovate through the legions of independent developers they draw, said Madrona Venture Group partner Jon Turow.
Cerebras, a chip startup that has raised $750 million, makes money by selling AI chips and model-building services. It develops open-source models not because they are profitable, but because they can help woo developers, said co-founder and Chief Executive Andrew Feldman.
Open-source also helps create a platform or ecosystem of software and services for the technology, which other startups can sell.
Among them, Together AI this month raised $106 million from Salesforce Ventures and other investors, valuing the firm at $1.25 billion. The startup has released its own open models, but primarily sells tools that make it cheaper and faster for businesses to use open-source models developed by others, said co-founder and CEO Vipul Ved Prakash.
Hugging Face, which has raised nearly $400 million from tech giants including Alphabet and Nvidia, helped develop the open-source BLOOM and StarCoder models. Rather than focusing on developing models, however, the startup sells compute power and enterprise support for other open models and its open-source model-sharing platform, said Jeff Boudier, head of product at Hugging Face.
“Open models tend to create an ecosystem, whereas closed models just tend to find customers,” he said. “That’s a ten, one hundred times multiplier.”
‘Uncharted territory’
Unlike open-source software, which was developed decades ago, the commercialization of open-source AI is “very much uncharted territory,” Boudier added.
For one, AI models require more capital than software. Big tech firms like Microsoft and Alphabet have the deep pockets to buy Nvidia graphics processing units, or GPUs, and put billions into building their AI.

Another challenge is the difficulty of porting software licenses, which were designed for computer code, to AI models, which consist of numerical parameters or “weights,” training data, and other components—largely what are considered “math,” said Mark Shuttleworth, founder and CEO of Canonical, a company that sells commercial services for the open-source Ubuntu technology.
While efforts are being made, there’s also not yet a standard definition or set of licenses for open-source AI. Some companies have chosen to release only certain elements of their AI models, which limits their technical usefulness for others.
For instance, companies like Meta call their models open source while restricting certain usage, said Lawrence Lessig, a professor at Harvard Law School who studies tech and policy. Llama 2’s community license lets developers use it commercially, but requires seeking Meta’s permission for use in a product with over 700 million monthly users. A Meta representative said the company’s open approach is to help resource-constrained companies and developers access AI, and allows the AI community to “advance the state-of-the-art in safety and overall model performance while balancing responsibility concerns.”
So far, licensing has prevented some companies from open-sourcing their AI models, Shuttleworth said. “Once you have a license that lots of people understand,” he said, “there’s broad understanding of what we can do that’s not going to get taken away.”