But if you're not intimately familiar with the AI industry and copyright, you might wonder: Why would a company spend millions of dollars on
books to destroy them? Behind these odd legal maneuvers lies a more fundamental driver: the AI industry's insatiable hunger for high-quality
text.

To understand why Anthropic would want to scan millions of books, it's important to know that AI researchers build large language models (LLMs) like those that power ChatGPT and Claude by feeding billions of words into a neural network. During training, the AI system processes the text repeatedly, building statistical relationships between words and concepts in the process.
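To make "statistical relationships" a little more concrete, here is a deliberately tiny sketch, nothing like the neural-network training Anthropic actually does: it just counts which words tend to follow which in a sample of text. Real LLMs learn far subtler patterns from billions of words, but the basic idea of extracting word-to-word statistics from example text is the same.

```python
from collections import Counter, defaultdict

# Toy "training data": in reality this would be billions of words.
training_text = "the cat sat on the mat the cat slept on the sofa"
words = training_text.split()

# Count how often each word is followed by each other word.
follows = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    follows[current_word][next_word] += 1

# The learned "statistical relationship": which word most often follows "the"?
most_likely = follows["the"].most_common(1)[0][0]
print(f"After 'the', the most likely next word is: {most_likely}")  # prints 'cat'
```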
The quality of training data fed into the neural network directly impacts the resulting AI model's capabilities. Models trained on well-edited books and articles tend to produce more coherent, accurate responses than those trained on lower-quality text like random YouTube comments.

Publishers legally control content that AI companies desperately want, but AI companies don't always want to negotiate licensing deals for it. That meant buying physical books outright, rather than licensing the text, offered a legal workaround. And yet buying things is expensive, even if it is legal.
So like many AI companies before it, Anthropic initially chose the quick and easy path. In the quest for high-quality training data, the court filing states, Anthropic first chose to amass digitized versions of pirated books. But by 2024, Anthropic had become "not so gung ho about" using pirated ebooks "for legal reasons" and needed a safer source.