Why top publishers are opting out of Apple Intelligence AI data scraping

Apple’s recent efforts to train its AI system, Apple Intelligence, have met growing resistance from major publishers and platforms, despite the tech giant’s attempts to operate more transparently than competitors like Google.

Applebot, the web crawler originally built to power Siri and Spotlight, has been repurposed to gather data for Apple Intelligence and now comes with an extension called Applebot-Extended. This extension lets website owners block their content from being used for AI training while still allowing it to be indexed for search.


In response to Apple’s introduction of this tool, several high-profile companies, including The New York Times, Facebook, Instagram, and The Financial Times, have chosen to block Applebot-Extended, signaling a clear pushback against AI data scraping. The use of the robots.txt file, a simple text file that tells web crawlers which parts of a site to avoid, has become increasingly common among publishers wary of having their content used without direct compensation.
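To show how the opt-out works in practice, here is a minimal sketch. It assumes a hypothetical site (example.com) whose robots.txt disallows the Applebot-Extended token everywhere and allows all other crawlers. It also uses Python’s standard-library urllib.robotparser as a stand-in for a compliant crawler, so Apple’s own handling of robots.txt may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Applebot-Extended token is disallowed
# everywhere, while every other crawler (including the regular Applebot
# search crawler, which falls under "*") may fetch any path.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/news/some-article"  # hypothetical article URL

# Search indexing stays allowed; the AI-training token is refused.
print("Applebot:", parser.can_fetch("Applebot", url))
print("Applebot-Extended:", parser.can_fetch("Applebot-Extended", url))
```

Run as-is, it reports True for Applebot and False for Applebot-Extended. That mirrors the split the tool is meant to offer: the page can still appear in search while being excluded from AI training.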

While Apple’s approach to AI training, which includes offering publishers millions of dollars for the right to use their content, seems more ethical than Google’s stance that all data should be freely available, it hasn’t stopped a notable share of websites from opting out. Reports indicate that around 6% to 7% of high-traffic websites currently block Applebot-Extended. By comparison, studies have found that more than half of the sites checked block OpenAI’s bot, and nearly 43% block Google’s AI-specific crawler, Google-Extended.
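Because blocking happens in robots.txt, percentages like these can be estimated by inspecting each site’s file. The sketch below assumes a survey simply checks whether a site’s robots.txt disallows a given crawler token from the site root. The studies’ actual methodology isn’t described here. The four sites and their rules are made up for illustration. GPTBot is the user-agent token OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# AI-specific crawler tokens discussed in the article.
AI_AGENTS = ["Applebot-Extended", "GPTBot", "Google-Extended"]

# Hypothetical robots.txt files for four made-up sites (illustration only).
SAMPLE_SITES = {
    "site-a.example": "User-agent: GPTBot\nUser-agent: Google-Extended\nDisallow: /\n",
    "site-b.example": (
        "User-agent: Applebot-Extended\n"
        "User-agent: GPTBot\n"
        "User-agent: Google-Extended\n"
        "Disallow: /\n"
    ),
    "site-c.example": "User-agent: *\nDisallow: /admin\n",
    "site-d.example": "User-agent: GPTBot\nDisallow: /\n",
}


def blocks_site(robots_txt: str, agent: str) -> bool:
    """Return True if this robots.txt disallows the agent from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


def blocking_share(sites: dict[str, str], agent: str) -> float:
    """Percentage of sites whose robots.txt blocks the given agent."""
    blocked = sum(blocks_site(txt, agent) for txt in sites.values())
    return 100 * blocked / len(sites)


for agent in AI_AGENTS:
    print(f"{agent}: {blocking_share(SAMPLE_SITES, agent):.0f}%")
```

With this made-up sample, the script prints 25% for Applebot-Extended, 75% for GPTBot and 50% for Google-Extended. These numbers say nothing about the real web. A real survey would also need to fetch each file and handle sites with missing or malformed robots.txt.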

This resistance is partly due to concerns about copyright infringement and the potential misuse of content. The New York Times, for instance, is not only blocking Applebot-Extended but also suing OpenAI over the unauthorized use of its content. The newspaper has made it clear that any commercial use of its content requires prior written permission, reflecting a broader trend among publishers to assert control over their intellectual property.

Despite the relatively low percentage of sites currently blocking Applebot-Extended, the numbers are expected to rise as awareness of the tool increases. The current trend suggests that more publishers may follow suit, especially those holding out for potential financial deals with Apple. This situation highlights the growing tension between tech giants and content creators as AI continues to evolve and raise new ethical and legal challenges.

(via Wired)

About the Author

Asma is an editor at iThinkDifferent with a strong focus on social media, Apple news, streaming services, guides, mobile gaming, app reviews, and more. When not blogging, Asma loves to play with her cat, draw, and binge on Netflix shows.