Adobe Faces New Lawsuit Over AI Training Data

Adobe is once again under scrutiny as a proposed class action lawsuit accuses the company of using pirated books to train an AI model, adding to a growing list of legal challenges facing the generative AI industry.

Adobe’s push into artificial intelligence has helped it roll out tools like Firefly and other AI-powered features across its software lineup. Now, that expansion has landed the company in legal trouble.

A proposed class action lawsuit filed on behalf of Oregon-based author Elizabeth Lyon claims Adobe used unauthorized copies of books to train SlimLM, a small language model designed to support document-related tasks on mobile devices.

The case was first reported by Reuters and centers on whether copyrighted material was used without permission during the model’s development.

How SlimLM Became the Focus

According to Adobe, SlimLM was trained using SlimPajama-627B, an open-source dataset released by Cerebras in June 2023.

Adobe has described SlimPajama as a deduplicated dataset compiled from multiple sources to support efficient AI training.

The lawsuit challenges that description.

Lyon alleges that SlimPajama is derived from RedPajama, another well-known dataset that has already drawn legal scrutiny.

RedPajama is widely believed to include Books3, a massive collection of around 191,000 books that critics say contains copyrighted works shared without authorization.

The complaint argues that because SlimPajama is built from RedPajama, it also includes Books3. That would mean SlimLM was trained, at least in part, on copyrighted books written by Lyon and other authors, without consent, credit, or compensation.

A Familiar Pattern for AI Companies

Adobe is not alone in facing these accusations. Books3 and RedPajama have appeared repeatedly in lawsuits involving major tech companies.

Earlier this year, Apple was sued over claims that its Apple Intelligence model relied on copyrighted material pulled from similar datasets.

Salesforce faced a comparable lawsuit months later, also tied to RedPajama. Each case points to the same underlying issue: the use of large, loosely vetted datasets to train AI systems at scale.

As generative AI has spread quickly, lawsuits like these have become more common.

Training modern language models requires enormous volumes of text, and many companies have relied on open-source or publicly available datasets that may include material of uncertain origin.

Why This Case Matters

Adobe has often emphasized its commitment to responsible AI and creator-friendly practices. This lawsuit puts those claims under pressure by questioning whether the company fully understood what was inside the datasets it used.

The outcome could have implications beyond Adobe.

If courts hold companies responsible for copyrighted material in third-party datasets, AI developers may need to rethink data sourcing. That shift could lead to deeper audits, clearer documentation, or new licensing deals with publishers and authors.

The Broader Legal Backdrop

In September, Anthropic agreed to pay $1.5 billion to settle claims from authors who accused the company of using pirated books to train its Claude chatbot. That settlement was widely seen as a signal that copyright disputes over AI training data can carry serious financial consequences.

For authors, these lawsuits represent an effort to regain control over how their work is used. For AI companies, they highlight the growing cost of moving fast without clear guardrails around data use.

Guidance for Businesses Working With AI

Companies that build or deploy AI systems should start by closely examining the data sources behind their models. As legal scrutiny increases, knowing exactly where training data comes from is no longer optional.

At the same time, relying on open-source datasets does not automatically eliminate risk. Many widely used datasets still include material with unclear ownership or uncertain licensing terms.

Because of this, businesses should focus on tighter safeguards. Clear documentation of training data is essential. Stronger vetting processes can catch issues early. In some cases, licensing content from rights holders may further reduce legal exposure.
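One way to picture the documentation and vetting step is a small lineage check: record which datasets each training set was derived from, then walk that chain to see whether any flagged source sits upstream. The sketch below is illustrative only. The manifest format, the `LINEAGE` and `FLAGGED` names, and the `flagged_ancestors` helper are all assumptions made for this example, and the lineage entries simply mirror the allegations described in this article rather than any verified record.

```python
from collections import deque

# Hypothetical provenance manifest: dataset name -> datasets it was derived from.
# These entries mirror the complaint's allegations as reported above; they are
# assumptions for illustration, not a verified description of any dataset.
LINEAGE = {
    "SlimPajama-627B": ["RedPajama"],
    "RedPajama": ["Books3"],
    "Books3": [],
}

# Hypothetical list of sources a legal or compliance team has flagged for review.
FLAGGED = {"Books3"}


def flagged_ancestors(dataset, lineage, flagged):
    """Return the sorted flagged sources reachable from `dataset` (itself included)."""
    seen = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles in the manifest
        seen.add(current)
        # Datasets missing from the manifest are treated as having no known parents.
        queue.extend(lineage.get(current, []))
    return sorted(seen & flagged)


if __name__ == "__main__":
    print(flagged_ancestors("SlimPajama-627B", LINEAGE, FLAGGED))
```

A check like this only surfaces what the manifest already records, so it is a complement to careful documentation, not a substitute for legal review of the underlying data.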

Key Takeaways

  • Adobe is facing a proposed class action over the alleged use of pirated books in AI training
  • The lawsuit focuses on SlimLM and its reliance on the SlimPajama dataset
  • SlimPajama is accused of containing copyrighted books through RedPajama and Books3
  • Similar claims have targeted Apple, Salesforce, and other AI developers
  • The case could influence how AI companies handle training data going forward
Zulekha

Author

Zulekha is an emerging leader in the content marketing industry from India. She began her career in 2019 as a freelancer and, with over five years of experience, has made a significant impact in content writing. Recognized for her innovative approaches, deep knowledge of SEO, and exceptional storytelling skills, she continues to set new standards in the field. Her keen interest in news and current events, which started during an internship with The New Indian Express, further enriches her content. As an author and continuous learner, she has transformed numerous websites and digital marketing companies with customized content writing and marketing strategies.
