Copywrong: Authors Miss the Mark(et Harm) when Arguing Meta Didn’t Engage in Fair Use
- June 30, 2025
- Snippets
Weighing in just two days after Judge Alsup of the U.S. District Court for the Northern District of California issued his fair use summary judgment opinion in Bartz v. Anthropic, Judge Chhabria (also of the Northern District of California) handed down another fair use summary judgment decision in Kadrey v. Meta Platforms, Inc. The two judges both reached the conclusion that the accused infringers should be granted summary judgment (at least in part in Bartz and in full in Kadrey) that their use of copyrighted material to train generative AI (genAI) models did not infringe under the fair use doctrine. But the judges’ reasoning was substantially different.
The difference in outcomes centered on the defendants’ use of pirated material for model training: Judge Alsup distinguished that material from legally-obtained publications, while Judge Chhabria did not. The difference in reasoning used to reach those outcomes was primarily in the judges’ treatment of the “market harm” factor of fair use. When viewed as the beginning of a dialogue between courts on the novel copyright issues raised by genAI, these decisions shed light on how the many current disputes between copyright holders and genAI companies may unfold.
In Kadrey, Judge Chhabria faced dueling summary judgment motions on whether Meta’s use of the plaintiffs’ works to train its Llama large language models (LLMs) constituted fair use.[1] On the record before him – an issue that he stressed repeatedly – he denied the plaintiffs’ motion and granted Meta’s. But Judge Chhabria’s analysis betrayed his skepticism over whether such training is generally lawful, and he repeatedly scolded the plaintiffs for making the wrong arguments. He did so with such detail and specificity that his opinion may serve as a roadmap for current and future litigants not to make the same mistakes.
Thirteen authors sued Meta for copyright infringement, alleging that Meta downloaded their books from “shadow libraries” without permission and used them as training data for its Llama models. Shadow libraries are online repositories that provide pirated content for free. Meta conceded copying from the shadow libraries but argued that its copying and training were a transformative fair use, while the plaintiffs contended that no fair use defense could apply. Meta also asserted that it had made some effort to license copyrighted works from publishers for training purposes, but had been stymied by the publishers’ lack of responsiveness to its efforts. In the end – unlike Anthropic – Meta simply resorted to downloading millions of books and articles from sources it knew were not licensed. Both sides moved for summary judgment on the fair use issue.
The court applied the four-factor fair use test of 17 U.S.C. § 107. The first factor considers the purpose and character of the use, including whether the use is commercial or for nonprofit educational purposes, and whether it is transformative. A transformative work adds new expression, meaning, or purpose to the original work rather than merely superseding it. The second factor looks at the nature of the copyrighted work, recognizing that fair use is more difficult to establish when the work is highly creative or expressive and less so when the work is factual. The third factor evaluates the amount and substantiality of the portion of the copyrighted work used, both quantitatively and qualitatively, in relation to the original work as a whole and the purpose of the copying. Generally, copying smaller or less significant portions is more likely to favor fair use, unless the copying encompasses the “heart” of the work. The fourth factor examines the effect of the use upon the potential market for or value of the copyrighted work, asking whether the use serves as a substitute or otherwise causes market harm that would undermine the incentive for creation that copyright law is intended to protect.
Purpose and character: Here, Judge Chhabria deemed Meta’s copying from shadow libraries to be “highly transformative.” Meta’s purpose in training LLMs to predict language patterns was fundamentally different from the entertainment or educational purpose of the plaintiffs’ books. However, he explicitly rejected analogies equating LLM training with teaching children to write, criticizing that aspect of Judge Alsup’s decision in Bartz. In particular, the court appeared to combine aspects of the fourth factor into the first, stating that “using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take.” In doing so, Judge Chhabria noted that while the training of Meta’s LLMs was ultimately intended for a commercial purpose (Meta having forecast that its genAI business may yield up to $1.4 trillion over 10 years), that purpose was too attenuated from the copying itself to outweigh the highly transformative nature of the use. As a result, he found the first factor to weigh in Meta’s favor.
Judge Chhabria also departed from Judge Alsup in refusing to distinguish between “pirated” copies of works downloaded from shadow libraries and copies of legally-obtained works used in teaching an LLM. That issue was outcome determinative for Judge Alsup with regard to the first factor; for Judge Chhabria it was a distinction without a difference. Judge Alsup essentially analyzed the building of a central library as an independent step for copyright purposes; Judge Chhabria collapsed the two steps into one.
The difference in the treatment of pirated works reflected a difference in attitude between the judges with regard to the impact of bad faith. As Judge Chhabria noted, bad faith would enter the analysis, if anywhere, under the first factor. While he was unsure whether bad faith should have any impact, he found that the subjective bad faith involved in piracy would not tilt the first factor against Meta here. Judge Alsup, in contrast, concluded that bad faith has no role in fair use analysis. But he distinguished between Anthropic’s good faith use of licensed copies and its use of pirated copies. So while Judge Alsup expressly disclaimed considering bad faith, he seemed to make a distinction based on it; Judge Chhabria was open to considering it, but did not let it change his analysis.
Nature of the work: The books, mostly novels, plays, and memoirs, were highly creative and at the core of copyright protection. This factor favored plaintiffs, as in Bartz, but was afforded little weight in the overall analysis.
Amount and substantiality of copying: While Meta copied entire books, the court found this reasonable and necessary for the transformative purpose of LLM training. The court noted that full copying was likely needed to achieve the models’ functionality, echoing the approach in Bartz. Thus, the transformative purpose of the copying heavily influenced the weighing of this factor in Meta’s favor.
Market effect: This was the principal divergence from Bartz. Judge Chhabria emphasized that fair use primarily aims to prevent market substitution. But he found no evidence in the record that would suggest Meta’s use caused market harm to the plaintiffs. Notably, Llama was not shown to reproduce plaintiffs’ text meaningfully – the plaintiffs were unable to get it to reproduce more than 50 words of their works, even with “adversarial” prompting – and the market for licensing books as LLM training data was not a relevant one under fair use principles in Judge Chhabria’s eyes. The court stressed that rightsholders cannot claim harm simply because they were not paid for a transformative use. In contrast, Bartz placed less weight on market harm, viewing the transformative nature of training as outweighing any potential substitution risk.
While Meta prevailed on its fair use defense with respect to reproduction, the court underscored that this ruling binds only these plaintiffs upon this record and does not foreclose claims by other authors whose works were used for training LLMs. The decision also does not establish that Meta’s model training practices are lawful per se; rather, the ruling reflects the plaintiffs’ failure to present evidence of actual or likely market harm from Meta’s use.
While both decisions acknowledged the improper acquisition of copyrighted works (pirated copies) by the genAI company, Kadrey treated this as potentially relevant but not dispositive. Indeed, Judge Chhabria seemed to be looking for the plaintiffs to provide some evidence that Meta’s piracy provided some financial or operational benefit to the shadow libraries, but such evidence was not made available to him. He noted that he could speculate that such a benefit existed, but his job was to weigh the proofs put in front of him, not to speculate. In contrast, Judge Alsup found the act of downloading pirated content to be separable from its use in training.
Nonetheless, this decision was not a full win for Meta. The court left open the plaintiffs’ separate distribution claims (for example, alleged reuploading via torrenting), postponing any ruling on the broader class certification as well.
As with Bartz, this decision will not be the final word on fair use in the context of generative AI training. These opinions reflect only the views of two district judges grappling with complex, novel issues at the poorly-defined intersection of copyright law and genAI. Given the high stakes for both rightsholders and AI developers, and the inconsistent reasoning across these cases, more nuanced and authoritative rulings should be expected in the coming years. These future decisions should focus on issues such as the weight of transformative purpose, the scope of market harm, and the legal significance of using pirated or unlicensed materials as training data. But Judge Alsup in Bartz and Judge Chhabria here have at least started placing landmarks on the roadmap for litigants in these cases.
Toward that end, we leave the reader with a few choice quotes from Judge Chhabria on what, in his view, prevented him from ruling for the plaintiffs:
“As for the potentially winning argument—that Meta has copied their works to create a product that will likely flood the market with similar works, causing market dilution—the plaintiffs barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta’s models would dilute the market for their own works.”
“The upshot is that in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission. Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials. But that brings us to this particular case. The above discussion is based in significant part on this Court’s general understanding of generative AI models and their capabilities. Courts can’t decide cases based on general understandings. They must decide cases based on the evidence presented by the parties.”
“[T]his ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”
While these words may not help all plaintiffs litigating genAI cases before other judges – even other judges in the Northern District of California – they certainly start to define the issues for how to make the case before Judge Chhabria (or any like-minded judge) that training an LLM on copyrighted works is not a fair use.
[1] Meta also filed a motion for summary judgment on the plaintiffs’ claim under the Digital Millennium Copyright Act (DMCA), which Judge Chhabria indicated would be granted in a separate opinion.