AI Startup Scanned and Destroyed Millions of Books to Train Its Models

Court records and internal documents show that Anthropic, the AI company behind the chatbot Claude, carried out a project to buy, cut open, and scan millions of print books to feed its artificial intelligence training systems. The project was called Project Panama and was kept secret until legal filings in a copyright lawsuit were unsealed. The details paint a picture of a company racing to build one of the largest book repositories ever assembled for training AI.
In early 2024, Anthropic executives activated Project Panama with the goal of “destructively scanning all the books in the world,” according to internal planning documents. The company purchased physical books in large batches, often from wholesalers and used-book sellers. Workers then used industrial cutting machines to remove the spines before scanning the pages into digital formats. After scanning, the original books were recycled or discarded. The cost and the total number of books purchased are redacted in court filings, but Anthropic is said to have processed millions of titles before declaring the project complete.
Anthropic’s strategy was rooted in a belief shared across the technology industry that high-quality text from novels, academic works, and nonfiction books makes AI models better at understanding and generating language. Internal emails from Anthropic and from other AI firms such as Meta describe access to books as essential for teaching models to write well and reason more deeply than they could from internet text alone. Rather than negotiate licensing agreements with authors and publishing houses, the companies found it more practical to obtain book data through bulk purchases or digital shadow libraries.
Those shadow libraries are collections of digitized books made available online without authorization. One such site, LibGen, was used directly by an Anthropic cofounder to download large amounts of content. Other internal messages show Meta employees debating whether to download pirated books before receiving approval from senior leadership.
The copyright lawsuit was brought by thousands of authors who argued that Anthropic and similar companies violated copyright law by acquiring, scanning, and storing these books without permission or payment. Anthropic settled the case in 2025 for $1.5 billion. Although the settlement did not require the company to admit wrongdoing, the documents made public this month show how it collected its data and how it weighed the risks of doing so.
Legal experts say key questions remain unsettled about how AI companies can use books and other copyrighted material. Some judges have ruled that training models on books may qualify as fair use because the material is transformed into new outputs. The arguments in this case turned on two distinct issues: the methods used to obtain the data, and the company’s decision to download pirated copies.
For authors and their advocates, the case underscores the need for clear rules that guarantee fair compensation and transparency about how AI systems use their original work. Its outcome will help shape how future AI training practices develop alongside copyright law.
