Sitting among the upper echelon of Canadian startups, artificial intelligence (AI) company Cohere is at the centre of a legal battle. 

Co-founded by U of T alumnus Aidan Gomez and last valued at $5.5 billion USD, Cohere is facing a lawsuit from a coalition of major news organizations for systematic copyright and trademark infringement. The plaintiffs in the case include the Toronto Star, Forbes, The Atlantic, The Guardian, Vox Media, and Politico.

Their lawsuit, filed in the Southern District of New York on February 13, alleges that Cohere unlawfully used the organizations’ content and trademarks without compensation.

Understanding the AI landscape 

The AI industry is facing a wave of lawsuits over intellectual property (IP) infringement, which occurs when creative or protected work is used without permission. The Cohere case is not the only recent IP lawsuit against an AI company; in November 2024, a coalition of major news organizations sued OpenAI for scraping. The allegations in both cases highlight a broader issue: whether AI model training qualifies as ‘fair use’ under copyright law.

Fair use refers to the legally accepted use of copyrighted works without the owner’s permission, typically when the use benefits the public and does not harm the copyright holder economically. In essence, if the use is fair, it is legal. 

However, applying these principles to generative AI is complicated. Large language models (LLMs) such as OpenAI’s must be fed content from across the web to build their knowledge base. That content is gathered through scraping, which uses software to extract and export data from websites. It remains unclear whether this practice violates copyright law or constitutes fair use.

In an interview with The Varsity, Nisarg Shah, an associate professor at the Department of Computer Science and the research lead for ethics of AI at the Schwartz Reisman Institute for Technology and Society, explained that current copyright frameworks were not designed to handle the nature of AI systems.

“A lot of the current laws are written with deterministic processes in mind,” he said. He added that laws built on that assumption break down when systems such as LLMs operate using randomized processes, making it hard to tell whether their use of copyrighted information violates the law.

“[How] do you deal with that in a legal framework?”

The lawsuit against Cohere

The lawsuit accuses Cohere of two breaches: systematic copyright infringement and trademark infringement. According to the plaintiffs, Cohere’s AI models were trained using scraped versions of their articles “without permission or compensation.”

Their legal documents specifically emphasize that Cohere’s models were trained on the C4 dataset, a free resource created by a group of Google engineers that contains the plaintiffs’ copyrighted material. Many of the plaintiffs’ articles are behind paywalls, which makes unauthorized usage particularly problematic.

The lawsuit highlights concerns within the publishing industry over AI companies’ business models that rely on large-scale data scraping of proprietary content. In their complaint, the plaintiffs argued that Cohere exploits publishers’ creative efforts and investments to boost its own profits.

A key allegation distinguishing this case from the OpenAI lawsuit is the claim that Cohere’s LLM commits trademark infringement by manufacturing articles and falsely attributing them to these news organizations. The plaintiffs argued that this misleads the public and damages the organizations’ credibility. 

The plaintiffs are seeking up to $150,000 USD per infringed work and a court order preventing Cohere from using their copyrighted works.

Cohere’s response

In a statement to the Financial Post, Cohere spokesperson Josh Gartner called the lawsuit “misguided and frivolous.” He defended the company’s AI training practices, emphasizing its commitment to respecting IP:

“Cohere strongly stands by its practices for responsibly training its enterprise AI. We have long prioritized controls that mitigate the risk of IP infringement and respect the rights of holders,” Gartner said, adding that Cohere “expects the matter to be resolved in our favor.”

Cohere’s co-founder and chief executive officer, Aidan Gomez, also weighed in on the matter. On February 8, just days before the lawsuit was filed, he posted on X:

“AI is only as useful as the data it can access, and the systems it can control. If it can’t see the data that answers your question, or can’t control the systems needed to automate, then value is left on the table.”

Next steps

The case is still in its early stages, with court dates yet to be set. Given the high stakes and the number of similar lawsuits, the outcome could set an important precedent for both AI companies and media organizations.

Will Cohere have to strike a deal with news organizations to use their work, as OpenAI did with The Atlantic and Vox Media? Or will the courts side with AI developers, ensuring that Western AI companies remain dominant, especially amid the ongoing ‘AI arms race’ with China?