Recently disclosed documents in the Authors Guild's lawsuit against OpenAI indicate that the company deleted two large datasets, known as “books1” and “books2,” that were used to train its GPT-3 AI model.
Lawyers for the Authors Guild said in court documents that the datasets likely contained “more than 100,000 published books” and are central to the Guild's claims that OpenAI used copyrighted material to train its AI models.
The Guild has sought information about these datasets from OpenAI for several months. OpenAI initially resisted disclosure, citing confidentiality concerns. It eventually acknowledged that it had deleted all copies of the data, according to legal documents reviewed by Business Insider.