In response to mounting legal efforts to rein in its data collection, OpenAI is arguing that creating advanced generative AI (genAI) tools is infeasible without using copyrighted content to train them.
In a submission to the UK’s House of Lords Communications and Digital Select Committee, OpenAI said that training large language models (LLMs) such as GPT-4, the technology underlying ChatGPT, would be impossible without the use of copyrighted materials.
“Because copyright today covers virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today’s leading AI models without using copyrighted materials,” OpenAI said in its submission.