Tuesday, July 23, 2024

Reddit Has Made $203 Million Via Data Licensing By Now

Reddit’s prospects for a stock market listing depend more than one might expect on its partnerships with AI vendors like OpenAI.

In its initial public offering (IPO) prospectus, submitted to the US Securities and Exchange Commission yesterday, Reddit repeatedly emphasized how much it believes it can profit, and already has profited, from data licensing agreements (DLAs) with companies that train artificial intelligence models on its more than 16 billion comments and 1 billion posts.

“In January 2024, we entered into certain data licensing arrangements with an aggregate contract value of $203.0 million and terms ranging from two to three years,” the prospectus reads on page one. Reddit adds that it expects to recognize at least $66.4 million of that revenue in the year ending December 31, 2024, with the remainder recognized in the years that follow.

It is not yet clear which AI vendors have licensed Reddit data. Bloomberg and Reuters reported earlier this week that Reddit had signed a licensing agreement worth approximately $60 million per year with a “large unnamed AI company” (possibly Google). OpenAI would not be an unlikely client either, particularly because OpenAI CEO Sam Altman, a former Reddit board member, holds an 8.7 percent stake in Reddit, making him its third-largest shareholder.


What makes Reddit data useful? To train their models, companies like OpenAI scrape the web for millions, if not billions, of examples of various types of text, code, emails, articles, and more (including from Reddit). Some of these examples are in the public domain. Others are not, or, like Reddit content, are subject to restrictive licenses that require attribution or some form of payment.

Reddit previously did not restrict access to its data for AI training. Its position changed last year, when CEO Steve Huffman stated that the company’s data shouldn’t be “[given] to some of the biggest companies in the world for free.”

The prospectus maintains that Reddit’s data APIs can provide real-time access to ever-changing subjects such as fashion, sports, news, and the newest trends. When it comes to training and improving large language models, it says, “we think Reddit’s massive corpus of conversational data and knowledge will be useful. We expect models to want to update their training using Reddit data to reflect these new ideas as our content refreshes and grows daily.”

In response to competition from chatbots such as OpenAI’s ChatGPT and Google’s Gemini, content owners, including news publishers and stock media libraries, are increasingly entering into DLAs with AI vendors. The Atlantic recently published a model predicting that if a search engine like Google built AI into search, it would answer user queries 75 percent of the time without sending users through to The Atlantic’s website.


Vendors, for their part, are increasingly turning to licensing agreements to stave off a flood of lawsuits claiming that they have no legal right to train their models on paywalled or unlicensed data. The New York Times, for example, has alleged that OpenAI was hurting its business by essentially using the paper’s works to create competitors to news publishers.

Examples include OpenAI’s partnerships with companies like Shutterstock and publishers like Axel Springer (which owns Politico and Business Insider, among others). These licenses are said to be relatively small, however, with annual caps of $5 million.

Editorial Staff
Editorial Staff at AI Surge is a dedicated team of experts led by Paul Robins, boasting a combined experience of over 7 years in Computer Science, AI, emerging technologies, and online publishing. Our commitment is to bring you authoritative insights into the forefront of artificial intelligence.

