The New York Times has sued OpenAI and Microsoft for copyright infringement, alleging that the companies' AI technology illegally copied millions of Times articles to train ChatGPT and other services to provide people with instant access to information, technology that now competes with the Times.
The complaint is the latest in a series of lawsuits that seek to limit the alleged scraping of large swaths of online content, without compensation, to train so-called large language AI models. Actors, writers, journalists and other creative types who publish their work online fear that artificial intelligence will learn from their material and power competing chatbots and other sources of information without adequate compensation.
But the Times' lawsuit is the first by a major news publisher to take on OpenAI and Microsoft, the two best-known AI brands. Microsoft (MSFT) has a seat on OpenAI's board of directors and a multibillion-dollar investment in the company.
In a complaint filed Wednesday, the Times said it had a duty to inform its subscribers, but that “Microsoft and OpenAI's unlawful use of the Times' work to create artificial intelligence products that compete with it threatens the Times's ability to provide that service.” The newspaper noted that OpenAI and Microsoft used other sources for the “large-scale copying,” but they “gave the Times’ content special focus,” seeking to “free-ride on the Times’s massive investment in its journalism by using it to build alternative products without permission or payment.”
“We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models,” OpenAI said in a statement from spokesperson Lindsey Held. “Our ongoing conversations with The New York Times have been productive and moving forward constructively, so we are surprised and disappointed by this development. We hope to find a mutually beneficial way to work together, as we do with many other publishers.”
Microsoft did not respond to a request for comment on the lawsuit.
In its complaint, the Times said it objected when it discovered months ago that its work had been used to train the companies' large language models. The Times said it began negotiating with OpenAI and Microsoft in April to obtain fair compensation and set the terms of an agreement.
But the Times says it has been unable to reach a resolution with the companies. Microsoft and OpenAI claim that their use of the Times' works qualifies as “fair use,” which lets them use copyrighted material for “transformative purposes,” the complaint states.
The Times strongly disputed this claim, saying that ChatGPT and Microsoft's Bing chatbot (also known as “Copilot”) can provide a service similar to that of The New York Times.
“There is nothing 'transformative' about using Times content without compensation to create products that replace The Times and steal audiences from it,” the newspaper said in its complaint. “Because the outputs of the defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for this purpose is not fair use.”
The Times is among a number of leading newsrooms, including CNN, that earlier this year added code to their websites that blocks OpenAI's web crawler, GPTBot, from scanning their platforms for content.
In a separate but related case, comedian Sarah Silverman and two authors sued Meta and OpenAI in July, alleging that the companies' AI language models were trained on copyrighted material they had written, without their knowledge or consent. Neither company commented on the lawsuit. In November, a judge dismissed most of the lawsuit's claims.
A group of prominent fiction writers joined the Authors Guild in filing a separate lawsuit against OpenAI in September, alleging that the company's technology illegally uses their copyrighted works.
The Times alleges in its lawsuit that the datasets used to train OpenAI's latest large language models, which power its AI tools, likely included millions of Times-owned works. In a 2019 English-language snapshot of one such dataset (called Common Crawl and known as “a copy of the Internet”), the New York Times website is the third most highly represented source of information, after Wikipedia and a database of U.S. patent documents, according to the complaint.
The Times claims that because the companies' AI tools were trained on its content, they can “generate output that recites the Times’ content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by dozens of examples… These tools also falsely attribute false information to the Times,” the complaint states.
In one case cited in the complaint, ChatGPT provided a user with the first three paragraphs of the 2012 Pulitzer Prize-winning article “Snow Fall: The Avalanche at Tunnel Creek,” after the user complained in the chat about hitting the Times' paywall and being unable to read the article.
The news outlet also claims that Microsoft's Bing search engine, which was upgraded earlier this year using OpenAI technology, “transcribes and sorts” the Times’ content to produce longer, more detailed responses than traditional search engines.
“By making Times content available without permission or authorization from The Times, Defendants undermine and harm The Times’ relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenues,” the complaint said.
But fighting AI is like sticking a finger in a dam. It's coming, and publishers like The New York Times know they'll have to embrace the future. They just want to make sure it's a future in which they get fair compensation, the New York Times said.
“We recognize the potential of [generative AI] for the public and for journalism,” New York Times executive vice president and general counsel Diane Brayton told the newspaper's staff in a memo Wednesday morning.
“But at the same time, we believe that the success of GenAI and the companies working to develop it should not come at the expense of news organizations,” according to the memo obtained by CNN. “Using our work to create GenAI tools must come with permission and an agreement that reflects the fair value of that work, as the law states.”
In its lawsuit, the newspaper seeks billions of dollars in damages, but it does not specify the amount of compensation it seeks for the alleged infringement of its copyrighted materials. It is also seeking a permanent injunction preventing Microsoft and OpenAI from continuing the alleged infringement. The Times also asks the court to order the destruction of GPT and any other artificial intelligence models or training datasets that incorporate its journalism.
The Times' lawsuit could ultimately set a precedent for the broader industry, because whether using copyrighted materials to train AI models violates the law is an unsettled legal question, according to Dina Blikshteyn, a partner in the Artificial Intelligence and Deep Learning practice group at the law firm Haynes Boone.
“I think there will be a lot of these types of suits that come out, and I think eventually [the issue will] get up to the Supreme Court, and at that point we'll have some specific case law,” Blikshteyn said, adding that right now, “there's nothing specific for large language models and AI just because it's so new.”
This story has been updated with additional developments and context.