Make WordPress Core

Opened 4 months ago

#60805 new feature request

Reading Settings: add option to discourage AI services from crawling the site

Reported by: jeherve
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version:
Component: Privacy
Keywords:
Focuses: privacy
Cc:

Description

I'd like to suggest a new addition to the bottom of the Reading Settings screen in the dashboard:

https://cldup.com/p6xw24IFff.png

This new section would help site owners indicate whether or not they would like their content to be indexed by AI services and used to train future AI models.

There have been a lot of discussions about this in the past 2 years: content creators and site owners have asked whether their work could and should be used to train AI. Opinions vary, but I believe most would agree that site owners should be able to choose for themselves, for their own sites.

In practice, I imagine the feature would work much like the Search Engines feature just above: when toggled, it would add rules to the site's robots.txt file that disallow a specific list of AI services from crawling the site.


There are typically four main approaches to discouraging AI services from crawling your site:

  1. You can add robots.txt entries matching the different User Agents used by AI services, asking them not to crawl the site via a Disallow: / rule.
    • This seems to be the cleanest approach, and the one that AI services are the most likely to respect.
    • This also has an important limitation: it relies on a list of AI User Agents that would have to be kept up to date, and that list could hardly ever be fully exhaustive. See an example of the user agents we would have to support below.
  2. You can add an ai.txt file to your site, as suggested by Spawning AI.
    • However, we have no guarantee AI services currently recognize and respect this file.
  3. You could add a meta tag to your site's head: <meta name="robots" content="noai, noimageai" />. This was apparently first implemented by DeviantArt.
    • I do not know whether AI services actually respect this tag. It is not part of any HTML standard today. In fact, discussions about a new standard are still in progress, and they suggest a different tag (reference).
    • If such a standard were accepted, and if AI services agreed to honor it, it might be the best implementation in the future, since we would not have to maintain a list of AI services.
  4. You can completely block specific User Agents from accessing the site.
    • I believe we may not want to implement something that drastic in WordPress Core, since it could end up blocking real visitors. This is better left to plugins.
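
To make the first approach more concrete, here is a rough, language-neutral sketch in Python of the rules such a setting could append to robots.txt. Everything in it is an assumption for illustration: the helper names `build_ai_rules` and `is_allowed` are hypothetical, the three user agent tokens are only examples, and an actual Core patch would be written in PHP (most likely around the existing `robots_txt` filter). The check at the end uses Python's standard `urllib.robotparser` to confirm that a compliant crawler would read the rules as intended.

```python
from urllib.robotparser import RobotFileParser

# Example tokens only (assumption): not the list proposed in this ticket.
AI_USER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_ai_rules(user_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    lines = []
    for agent in user_agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # a blank line separates rule groups
    return "\n".join(lines)


def is_allowed(robots_txt, user_agent, url="https://example.com/a-post/"):
    """Ask a standards-following parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = build_ai_rules(AI_USER_AGENTS)
print(rules)
print(is_allowed(rules, "GPTBot"))       # False: the listed crawler is asked to stay out
print(is_allowed(rules, "Mozilla/5.0"))  # True: regular visitors are unaffected
```

Note that this only expresses a request: unlike the fourth approach, nothing here prevents a crawler that ignores robots.txt from fetching the site.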

Some plugins already exist that implement some of the approaches above. This suggests there may be interest in including such a feature in Core.


If we were to go with the first option, here are some examples of the User Agents we would have to support:

This list could be made filterable so folks can extend or modify it as they see fit.
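
As a minimal sketch of what "filterable" could mean here, the Python snippet below imitates a filter hook with a tiny registry. All names are hypothetical: the hook name `ai_crawler_user_agents`, the helpers `add_filter`, `apply_filters` and `get_ai_user_agents`, and the `ExampleAIBot` token are made up for illustration, and the default tokens are placeholders rather than the list proposed in this ticket. In Core this would presumably rely on the real PHP hooks API instead.

```python
# Tiny stand-in for a filter hook registry (assumption, for illustration only).
_filters = {}


def add_filter(hook, callback):
    """Register a callback that receives and returns the filtered value."""
    _filters.setdefault(hook, []).append(callback)


def apply_filters(hook, value):
    """Pass value through every callback registered on hook, in order."""
    for callback in _filters.get(hook, []):
        value = callback(value)
    return value


# Placeholder defaults (assumption): not the list proposed in this ticket.
DEFAULT_AI_USER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def get_ai_user_agents():
    """Return the filtered list, de-duplicated while preserving order."""
    agents = apply_filters("ai_crawler_user_agents", list(DEFAULT_AI_USER_AGENTS))
    return list(dict.fromkeys(agents))


# A site owner or plugin could extend the list...
add_filter("ai_crawler_user_agents", lambda agents: agents + ["ExampleAIBot"])
# ...or remove an entry they are happy to allow.
add_filter("ai_crawler_user_agents", lambda agents: [a for a in agents if a != "CCBot"])

print(get_ai_user_agents())  # ['GPTBot', 'Google-Extended', 'ExampleAIBot']
```

Passing a copy of the defaults into the filter keeps the built-in list intact, so each call starts from the same baseline regardless of what earlier callbacks did.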

Attachments (1)

options-reading.png (154.6 KB) - added by jeherve 4 months ago.
Mockup of how such an option would look in the WordPress dashboard


Change History (1)

@jeherve
4 months ago

Mockup of how such an option would look in the WordPress dashboard
