How Traditional Media and AI Firms Are Clashing Over Content Use

Tribune Publishing Company LLC has stepped into federal court in New York with a lawsuit against Perplexity AI, Inc., claiming the startup uses its news content without permission. The case highlights a growing rift between traditional media outlets and artificial intelligence firms that rely on vast amounts of online material to power their tools. What started as a mid-October inquiry from Tribune lawyers has escalated into allegations of verbatim copying, even after Perplexity insisted it only handles factual summaries.

Picture this: a newsroom in Chicago reaches out to a San Francisco AI company, asking whether its articles fuel the machine. Perplexity responds that no training data comes from Tribune work and that only non-verbatim facts might appear. Yet Tribune lawyers point to search results that mirror their stories word for word, raising questions about what counts as fair use in the age of instant answers. This exchange, detailed in the complaint, underscores the core dispute. Media companies argue their journalism forms the backbone of AI outputs, while tech firms maintain they transform data into something new.

These tensions extend far beyond one skirmish. The New York Times launched its own suit against Perplexity just days ago in the same New York federal court, accusing the firm of pulling entire articles to compete directly with its subscriptions. Earlier, Dow Jones, which owns The Wall Street Journal, filed similar claims against Perplexity in August 2024. A coalition of outlets including Condé Nast, The Atlantic, and The Guardian targeted Cohere, Inc., another AI developer, alleging that scraped content trains models that mimic paywalled news. Then there is the high-profile New York Times case from December 2023 against OpenAI and Microsoft, in which millions of articles allegedly trained ChatGPT without compensation. Raw Story Media joined the fray early in 2024, focusing on the removal of copyright markers from AI training data. More than 40 such cases simmer nationwide, and similar disputes have spilled into Canada and beyond, where publishers are suing over web-scraped journalism.

For business readers, the stakes involve more than legal briefs. Publishers face eroded revenue as AI search engines offer free summaries that sidestep ads and subscriptions. Perplexity, founded in 2022 by ex-OpenAI talent, built a tool blending ChatGPT-style responses with web citations, drawing millions of users. Yet critics say this convenience comes at journalism’s expense, with “hallucinations” sometimes falsely tagging real outlets as sources. On the flip side, AI companies contend public web data fuels innovation, much like search engines did decades ago. Courts must now decide if generating answers counts as infringement or protected transformation.

Some media giants pivot to deals instead of disputes. The New York Times inked a multi-year pact with Amazon in May to license content for AI training, starting with recipes and sports from The Athletic. OpenAI and Microsoft have similar arrangements with other newsrooms. These pacts signal a path forward, blending compensation with tech collaboration. Still, smaller outlets like Tribune lack such leverage, amplifying calls for clearer rules on data use.

Perplexity and its peers bet on defenses rooted in fair use precedents, but early rulings show mixed results. A New York court dismissed some claims against OpenAI for lacking proof of intent, yet let others proceed on detailed scraping allegations. As these battles play out, businesses watch closely. Media firms risk obsolescence without safeguards, while AI startups face uncertain costs that could slow growth. The outcomes will shape how content creators and innovators coexist in a data-driven world. 

 
