How to Stop ChatGPT Indexing Your Content

When running a website, it is inevitable that a range of bots and crawlers will visit your site.

Web crawlers, which are used to populate search engines, will allow your content to show up in search engine results. If you want your site to show up in search engines, you’ll want these crawlers to pay you a visit.

The growth in AI tools, such as ChatGPT, has created a new type of web crawler: one which learns from your site and gives information to users without them ever needing to visit your site themselves. For sites which rely on advertising revenue, this is very worrying.

While the ethics of such bots are up for debate, there is fortunately a way to prevent ChatGPT from using data from your website. Note that this only applies to future iterations of the ChatGPT tool. Current versions have already been trained on existing data, which may include content from your site.

To control which bots can access your site, you need to edit (or create) a file named robots.txt. This file should exist in the root directory of your website. Each entry in the file names a bot and lists the paths that bot may or may not crawl.
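As a rough illustration of what "the root directory" means in practice, the sketch below works out where a crawler would look for robots.txt given any page on a site. The function name robots_url and the example.com address are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site that serves page_url.

    robots_url is a hypothetical helper; it keeps the scheme and host
    and replaces the path with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# example.com is a placeholder domain, not a real site configuration.
print(robots_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
```

Whatever page a bot starts from, the file it checks is always at the top level of the same host.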

Access control relies on part of a web request known as a user agent. The user agent tells the server which software is making the request. While it's possible to spoof (fake) a user agent, non-malicious bots will generally identify themselves honestly.
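To make the idea of a user agent concrete, here is a minimal sketch that builds (but does not send) a request carrying a User-Agent header. The name ExampleBot/1.0 and the helper build_request are invented for illustration. It also shows why spoofing is easy: the value is just a string chosen by the client.

```python
from urllib.request import Request


def build_request(url: str, user_agent: str) -> Request:
    """Create a request that identifies itself with the given user agent.

    Nothing is sent over the network; this only builds the request object.
    """
    return Request(url, headers={"User-Agent": user_agent})


# ExampleBot/1.0 is a made-up name; a client can put any string here.
req = build_request("https://example.com/", "ExampleBot/1.0")

# urllib stores header names in capitalised form, hence "User-agent".
print(req.get_header("User-agent"))
# ExampleBot/1.0
```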

Luckily, OpenAI's web crawler identifies itself in this way, and it will honour the content of your robots.txt file. It's therefore possible to block the bot behind ChatGPT from accessing your site. Note that the following information is taken from the official OpenAI bot documentation.

To do this, add the following entry to your robots.txt file.

User-agent: GPTBot
Disallow: /
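If you would rather check the rule locally before uploading it, a sketch along these lines uses Python's standard urllib.robotparser module to evaluate the entry above. The helper is_allowed, the bot name SomeOtherBot and the example.com URL are assumptions made for this example, and the parser's verdict is only an approximation of how any particular crawler will behave.

```python
from urllib.robotparser import RobotFileParser

# The same entry as shown above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders for illustration.
print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/any-page"))
# False
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/any-page"))
# True
```

The second call shows that the entry only affects GPTBot; bots with other user agents are still allowed.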

You can then test your site configuration using an online robots.txt testing tool.

Simply enter your site name, and select ‘OpenAI’ as the user agent. OpenAI is the name of the company behind ChatGPT. We use this technique on this very site, the result of which can be seen below.

Figure: Result of the robots.txt test for this site

With this entry in place, OpenAI's crawler should no longer collect data from your site to train future versions of ChatGPT. Be aware, though, that this doesn't affect any other AI-powered tools, each of which will need its own entry.
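Since each crawler needs its own entry, one way to keep the file manageable is to generate the entries from a list of user agents. The sketch below assumes a helper called build_block_rules. ExampleAIBot is a made-up name standing in for whichever other crawlers you decide to block, so check each vendor's documentation for the real user agent string.

```python
def build_block_rules(user_agents: list[str]) -> str:
    """Build robots.txt entries that block each listed user agent site-wide."""
    entries = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    # Entries are separated by a blank line.
    return "\n\n".join(entries) + "\n"


# GPTBot comes from the article; ExampleAIBot is a hypothetical placeholder.
print(build_block_rules(["GPTBot", "ExampleAIBot"]))
```

The printed text can be pasted into robots.txt alongside any rules you already have.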
