
Reddit Sues Perplexity AI and Data Scrapers for Stealing Content

Reddit has sued Perplexity AI and three data-scraping companies, accusing them of stealing millions of user posts to train artificial intelligence models. The lawsuit, filed October 22, 2025, in New York federal court, could set a major precedent for how human conversations online are treated in the age of AI.


Update: Perplexity AI has responded to Reddit’s lawsuit, saying it does not train its models on Reddit data. The company claims it only summarizes Reddit discussions with citations, similar to how users share links. Reddit maintains that Perplexity and its scraping partners bypassed protections to access content, citing a fortyfold increase in Reddit references after a cease-and-desist order.

Reddit built its reputation as the internet’s massive discussion board, a place where millions of people trade advice, share experiences, and argue about everything from cooking to coding. That same content has now become the focus of a major legal battle.

This week, Reddit filed a lawsuit against Perplexity AI and three companies (Oxylabs from Lithuania, AWMProxy from Russia, and SerpApi from Texas) accused of supplying it with scraped data.

The lawsuit claims these companies secretly copied Reddit’s user-generated posts through automated systems and sold that information to help train AI products without paying for it.

The complaint describes the process as “industrial-scale data laundering.”

Reddit says the companies pretended to be ordinary web users, bypassed protective barriers, and gathered vast amounts of its data through Google’s search results. The suit asks the court for damages and an injunction to stop the defendants from using Reddit’s data again.

Ben Lee, Reddit’s chief legal officer, said the race to build powerful AI systems has pushed some developers to ignore boundaries. “There’s enormous pressure to find high-quality human content,” he said. “That pressure has created an underground market that thrives on stolen data.”

The Hidden Economy of Data Scraping

Scraping has existed for decades. It helped search engines like Google organize the web in the early days of the internet.

Over time, a smaller group of companies started scraping Google itself, using the results to sell marketing insights or improve how other sites performed in search rankings. Then came the explosion of artificial intelligence.

Suddenly, the kind of data Reddit holds became gold.

AI systems need human language to learn how people think and talk. Companies that could supply that data discovered a lucrative business.

SerpApi was one of them. Based in Austin, Texas, it built tools that scraped Google’s results at a massive scale. Others followed, including Oxylabs and AWMProxy.

According to Reddit’s lawsuit, these firms shifted their focus to AI clients once tools like ChatGPT and Gemini made natural language data valuable. Their data packages allegedly included scraped Reddit posts that could be resold to companies like Perplexity.

Reddit argues that this activity went far beyond what’s acceptable.

The company has banned scraping for years and now charges for data access through licensing deals.

Google and OpenAI are among those who agreed to pay. Perplexity, Reddit says, did not.

Perplexity Pushes Back

Perplexity rejected Reddit’s accusations.

Perplexity said it had not yet been served with the lawsuit and that its methods were ethical.

“Our approach remains principled and responsible as we provide factual answers with accurate AI,” Perplexity said in a public statement. “We will not tolerate threats against openness and the public interest.”

SerpApi’s response was defiant as well. It claimed to have received no formal notice from Reddit and promised to fight the allegations in court.

Denas Grybauskas from Oxylabs argued that no one should be allowed to claim ownership of public information. AWMProxy offered no comment.

Reddit’s legal team says it can prove the scraping happened.

In its filing, the company describes setting a trap: a hidden Reddit post that only Google’s crawler could find. Within hours, the post appeared in Perplexity search results. To Reddit, that was the smoking gun.

When the Web’s “Sharing Culture” Meets the AI Gold Rush

For much of the internet’s history, scraping was considered part of the bargain. Websites got visibility, while search engines organized information and sent readers back. That balance has eroded.

AI systems often take content without returning traffic or credit. They generate answers directly, leaving publishers and creators out of the loop.

Doug Leeds, co-founder of the nonprofit Really Simple Licensing, has watched that shift unfold. He said what once looked like a mutually beneficial system has become something else entirely. “It used to work because everyone involved made money somehow,” he explained. “Now, AI tools are consuming content without giving anything back.”

Media companies and publishers have started drawing their own lines.

The New York Times has sued OpenAI and Microsoft for using its reporting to train models. Major book publishers, including Simon & Schuster, have launched similar cases. Reddit’s lawsuit joins that growing list, signaling that online communities are no longer willing to give away their data for free.

What Makes Reddit’s Data So Valuable

More than 416 million people use Reddit each week. Its content spans nearly every human interest imaginable, from niche hobbies to personal struggles to global news. Those authentic exchanges are what make Reddit data so appealing to AI developers. It captures how real people communicate, argue, and ask questions.

[Figure: Reddit Daily Active Users]

Reddit began charging for access to its data in 2023. It now earns revenue through licensing deals with major tech firms, using those funds to support its operations and protect user content.

But the company says it also spends tens of millions every year to stop unauthorized scraping.

The lawsuit paints Perplexity as one of the worst offenders.

It accuses the company of claiming compliance with robots.txt, a standard file that tells web crawlers what they can access, while continuing to scrape content in violation of Reddit’s terms.
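For readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved crawler might consult the file before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the `ExampleAIBot` user agent, and the `example.com` URLs are hypothetical and are not taken from Reddit's actual file or from the complaint.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only (not Reddit's real file).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named bot is blocked everywhere; other crawlers only lose /private/.
    print(is_allowed("ExampleAIBot", "https://example.com/r/cooking"))
    print(is_allowed("OtherBot", "https://example.com/r/cooking"))
    print(is_allowed("OtherBot", "https://example.com/private/notes"))
```

The file is advisory: it only works when a crawler chooses to run a check like this one and honor the result, which is why the dispute turns on what allegedly happened in practice rather than on what the file says.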

After Reddit sent a cease-and-desist notice in May 2024, citations to Reddit content on Perplexity “rose fortyfold,” according to the complaint.

The Unsettled Law of Data Ownership

No court has yet drawn a clear boundary around how public web data can be used to train AI. Some judges have leaned toward allowing scraping if the information is publicly visible.

Others have sided with content owners, arguing that copying and repurposing data at scale crosses into infringement.

That uncertainty makes Reddit’s case especially significant. A win could strengthen the ability of platforms and publishers to control their data and demand payment. A loss might encourage more aggressive scraping, reinforcing the idea that “publicly available” equals “free to use.”

The defendants are spread across multiple countries, which complicates enforcement even further. Still, Reddit has signaled it intends to pursue the case to the end, saying it has a duty to protect its users’ contributions.

What Reddit Users Should Take From This

People who post on Reddit may not think much about who reads their comments. But those comments help train AI systems that generate profit. The debate is about fairness. Should everyday conversations be treated as a free training ground for commercial products?

Reddit’s position is that licensing deals allow both sides to benefit. The company gets paid for its data, and partners receive high-quality content with clear permissions. Unauthorized scraping, it says, erases that balance and disrespects the time and creativity of its community.

Critics counter that Reddit’s move toward tighter control goes against its roots as an open platform. They see this as part of a broader trend of corporatizing the internet, where everything is fenced off and monetized. That tension between openness and ownership isn’t going away.

What to Watch Next

The case will likely move slowly through the courts, but its impact could arrive quickly.

AI companies, social platforms, and regulators are watching closely. However the judge rules, this lawsuit will influence how others handle data collection and licensing.

Reddit’s leadership has made its stance clear. “We support innovation,” a spokesperson said, “but respect for creators and communities isn’t optional.”

If nothing else, the case has forced a public reckoning. The casual posts that people make every day now have measurable value in the AI economy. What happens to that value is what this lawsuit will decide.

Practical Advice

To protect your content and navigate potential legal challenges, consider implementing the following strategies:

  1. Review your website’s robots.txt settings to manage which crawlers can access your pages.
  2. Monitor server traffic for automated scraping behavior.
  3. Consider offering licensed data access if your content attracts commercial interest.
  4. Keep records of scraping incidents, since they may become evidence in future disputes.
  5. Educate contributors or users about how their posts might be reused by third parties.
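
As a rough illustration of the traffic-monitoring step, the sketch below counts requests per client IP in each one-minute bucket and flags clients that exceed a limit. The function name `flag_heavy_clients`, the threshold of 120 requests per minute, and the sample addresses are all assumptions made for this example. It expects timestamps and IPs that have already been parsed out of your server logs.

```python
from collections import Counter
from datetime import datetime


def flag_heavy_clients(requests, max_per_minute=120):
    """Return a sorted list of client IPs that exceed max_per_minute in any minute.

    requests: iterable of (timestamp, client_ip) pairs, where timestamp is a
    datetime already parsed from an access log (log parsing is out of scope).
    """
    counts = Counter()
    for timestamp, client_ip in requests:
        # Truncate to the start of the minute so requests share a bucket.
        minute = timestamp.replace(second=0, microsecond=0)
        counts[(client_ip, minute)] += 1
    return sorted({ip for (ip, _), n in counts.items() if n > max_per_minute})


if __name__ == "__main__":
    # Hypothetical sample: one client sends five requests in the same minute.
    sample = [(datetime(2025, 10, 22, 9, 0, s), "203.0.113.7") for s in range(5)]
    sample.append((datetime(2025, 10, 22, 9, 0, 30), "198.51.100.2"))
    print(flag_heavy_clients(sample, max_per_minute=3))
```

A per-IP count is only a first pass. Scrapers that route traffic through rotating proxies spread their requests across many addresses, so the output is best treated as a starting point for review and record-keeping, not as proof of scraping.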

Key Takeaways

  • Reddit is suing Perplexity AI and three data-scraping firms for allegedly copying and reselling its user content.
  • The case raises unresolved questions about who owns public online data used for AI training.
  • Perplexity and others deny wrongdoing, arguing that public data shouldn’t be restricted.
  • A court ruling could define new limits on how AI companies collect and use information.
  • Reddit’s users and creators are at the center, as their conversations have become valuable digital assets.
Zulekha

Author

Zulekha is an emerging leader in the content marketing industry from India. She began her career in 2019 as a freelancer and, with over five years of experience, has made a significant impact in content writing. Recognized for her innovative approaches, deep knowledge of SEO, and exceptional storytelling skills, she continues to set new standards in the field. Her keen interest in news and current events, which started during an internship with The New Indian Express, further enriches her content. As an author and continuous learner, she has transformed numerous websites and digital marketing companies with customized content writing and marketing strategies.
