**Adobe is once again under scrutiny as a proposed class action lawsuit accuses the company of using pirated books to train an AI model, adding to a growing list of legal challenges facing the generative AI industry.**

![Adobe Faces New Lawsuit Over AI Training Data](https://www.stanventures.com/news/wp-content/uploads/2025/12/ChatGPT-Image-Dec-18-2025-06_47_24-PM-300x200.png)

Adobe’s push into artificial intelligence has helped it roll out tools like Firefly and other AI-powered features across its software lineup. Now, that expansion has landed the company in legal trouble.

A proposed class action [lawsuit](https://fingfx.thomsonreuters.com/gfx/legaldocs/movabyadapa/ADOBE%20AI%20COPYRIGHT%20LAWSUIT%20complaint.pdf) filed on behalf of Oregon-based author Elizabeth Lyon claims Adobe used unauthorized copies of books to train [SlimLM](https://arxiv.org/html/2411.09944v1), a small language model designed to support document-related tasks on mobile devices. 

The case was first reported by Reuters and centers on whether copyrighted material was used without permission during the model’s development.

## How SlimLM Became the Focus

According to Adobe, SlimLM was trained using SlimPajama-627B, an open-source dataset released by Cerebras in June 2023. 

Adobe has described SlimPajama as a deduplicated dataset compiled from multiple sources to support efficient AI training.

The lawsuit challenges that description. 

Lyon alleges that SlimPajama is derived from RedPajama, another well-known dataset that has already drawn legal scrutiny. 

RedPajama is widely believed to include Books3, a massive collection of around 191,000 books that critics say contains copyrighted works shared without authorization.

The complaint argues that because SlimPajama is built from RedPajama, it also includes Books3. That would mean SlimLM was trained, at least in part, on copyrighted books written by Lyon and other authors, without consent, credit, or compensation.

## A Familiar Pattern for AI Companies

Adobe is not alone in facing these accusations. Books3 and RedPajama have appeared repeatedly in lawsuits involving major tech companies.

Earlier this year, Apple was sued over claims that its Apple Intelligence model relied on copyrighted material pulled from similar datasets. 

Salesforce faced a comparable lawsuit months later, also tied to RedPajama. Each case points to the same underlying issue: the use of large, loosely vetted datasets to train AI systems at scale.

As [generative AI has spread](https://www.stanventures.com/news/ai-traffic-surges-527-in-2025-seo-strategies-face-a-radical-rewrite-3836/) quickly, lawsuits like these have become more common.

Training modern language models requires enormous volumes of text, and many companies have relied on open-source or publicly available datasets that may include material of uncertain origin.

## Why This Case Matters

Adobe has often emphasized its commitment to responsible AI and creator-friendly practices. This lawsuit puts those claims under pressure by questioning whether the company fully understood what was inside the datasets it used.

The outcome could have implications beyond Adobe. 

If courts hold companies responsible for copyrighted material in third-party datasets, AI developers may need to rethink data sourcing. That shift could lead to deeper audits, clearer documentation, or new licensing deals with publishers and authors.

## The Broader Legal Backdrop

In September, Anthropic agreed to pay $1.5 billion to settle claims from authors who accused the company of using pirated books to train its Claude chatbot. That settlement was widely seen as a signal that copyright disputes over AI training data can carry serious financial consequences.

For authors, these lawsuits represent an effort to regain control over how their work is used. For AI companies, they highlight the growing cost of moving fast without clear guardrails around data use.

## Guidance for Businesses Working With AI

Companies that build or deploy AI systems should start by closely examining the data sources behind their models. As legal scrutiny increases, knowing exactly where training data comes from is no longer optional.

At the same time, relying on open-source datasets does not automatically eliminate risk. Many widely used datasets still include material with unclear ownership or uncertain licensing terms.

Because of this, businesses should focus on tighter safeguards. Clear documentation of training data is essential. Stronger vetting processes can catch issues early. In some cases, licensing content from rights holders may further reduce legal exposure.
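For teams that want to see what a basic vetting step might look like, here is a minimal Python sketch of a dataset lineage check. It assumes a company keeps a simple manifest of which datasets were built from which sources. The manifest format, the `NEEDS_REVIEW` list, and the function names are hypothetical, and the sample entries only mirror the simplified chain alleged in the complaint; they are not a verified inventory of any dataset.

```python
from collections import deque

# Hypothetical lineage manifest: each dataset maps to the sources it was built from.
# These entries mirror the chain alleged in the lawsuit and are illustrative only.
LINEAGE = {
    "SlimPajama-627B": ["RedPajama"],
    "RedPajama": ["Books3"],
    "Books3": [],
}

# Hypothetical list of sources that a legal or compliance team has marked for review.
NEEDS_REVIEW = {"Books3"}


def upstream_sources(dataset, lineage):
    """Return every direct and indirect source of `dataset`."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        source = queue.popleft()
        if source in seen:
            continue  # already visited; also guards against circular entries
        seen.add(source)
        queue.extend(lineage.get(source, []))
    return seen


def flagged_sources(dataset, lineage, needs_review):
    """Return the upstream sources of `dataset` that appear on the review list."""
    return sorted(upstream_sources(dataset, lineage) & needs_review)


print(flagged_sources("SlimPajama-627B", LINEAGE, NEEDS_REVIEW))  # ['Books3']
```

The point of walking the full chain is that a flagged source can sit several steps upstream of the dataset a team actually downloads, which is the kind of indirect inheritance the complaint describes. A real audit would also need license metadata and human legal review; this sketch only shows the bookkeeping.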

## Key Takeaways

- Adobe is facing a proposed class action over the alleged use of pirated books in AI training
- The lawsuit focuses on SlimLM and its reliance on the SlimPajama dataset
- SlimPajama is accused of containing copyrighted books through RedPajama and Books3
- Similar claims have targeted Apple, Salesforce, and other AI developers
- The case could influence how AI companies handle training data going forward