
There is no doubt that AI is a fantastic tool, with the main AI platforms, ChatGPT and Google's Gemini & Vertex AI, causing huge disruption and change across the web.

However, these large language model tools crawl the internet, harvesting information from websites in order to train themselves and accumulate information.

That includes your business's website! You may be fine with this information being collected; however, like all things, it is good to know you have the option of stopping it should you not want your site to participate.

Solution: Use robots.txt to stop crawlers

Thankfully, these AI platforms allow you to block them from crawling your website. To do so, you need to edit your robots.txt file, which only takes a few minutes (plus you don't need to be a web developer to do it!).

First off, take a look at your robots.txt file to see how it is currently set up. The robots.txt file lives in the root directory of your domain.
To view it, simply enter your domain name followed by /robots.txt in your browser’s address bar.
For instance, you would type https://yourdomain.com/robots.txt. Once you do this, your robots.txt file will be displayed.
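If you prefer a script over the browser, the same check can be sketched in Python. The domain below is a placeholder, and the actual fetch is left commented out so you can run it against your own site when ready:

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Placeholder: replace with your own domain.
domain = "https://yourdomain.com"

# robots.txt always lives at the root of the domain.
robots_url = urljoin(domain, "/robots.txt")
print(robots_url)

# Uncomment to fetch and display the live file:
# print(urlopen(robots_url).read().decode("utf-8"))
```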

WordPress website? RankMath

If you have a WordPress website, then we highly recommend the SEO plugin RankMath. It allows easy editing of the robots.txt file directly from the plugin's settings. You can find a full step-by-step guide on the RankMath website.

If you don’t have this set up, you can create a new robots.txt file in a text editor on your computer, then upload it via FTP to your website’s root folder on its web hosting server.
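For reference, a typical starting point for a WordPress site's robots.txt looks something like the following (WordPress generates rules along these lines by default; your file may differ):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

Any AI-blocking rules you add will sit alongside entries like these, each under its own User-agent line.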

Google's Gemini and Vertex AI models


If you prefer that Google does not utilize your website’s content to train its Gemini and Vertex AI models, you can disallow its user-agent. However, please be aware that this action will not stop Google from crawling and indexing your website’s content for search results.

Additionally, it will not prevent your content from being featured in Google’s AI Overviews within search results.

See below for the text to add to your robots.txt file:

User-agent: Google-Extended
Disallow: /

OpenAI’s ChatGPT and others


As with Google's AI, if you wish to prevent your content from being used to train an AI model, you can disallow the relevant user agent. For example, you can block OpenAI from using your content to train its AI models, including ChatGPT, by implementing the rule provided below.

See below for the text to add to your robots.txt file:

User-agent: GPTBot
Disallow: /

Note: This rule applies to OpenAI's GPTBot training crawler. User-initiated browsing from within ChatGPT uses a separate agent, ChatGPT-User, which can be blocked with its own Disallow rule if you want to stop that access as well.
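Putting it all together, a robots.txt that blocks AI training for both platforms might look like this (the ChatGPT-User entry is optional, only if you also want to block user-initiated ChatGPT browsing):

```
# Block Google's AI training crawler
User-agent: Google-Extended
Disallow: /

# Block OpenAI's AI training crawler
User-agent: GPTBot
Disallow: /

# Optional: also block user-initiated ChatGPT browsing
User-agent: ChatGPT-User
Disallow: /
```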

Lastly: Modify & verify your robots.txt file

It is important to treat edits to the robots.txt file with caution, as a mistake here can block search engines from crawling your entire website!

After editing the robots.txt file, you can simulate how Google's bots crawl the pages on your website and verify whether they can access specific pages by using a robots.txt tester.
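You can also sanity-check your rules locally with Python's standard-library `urllib.robotparser` before uploading. This sketch parses the rules shown above and confirms the AI crawlers are blocked while other crawlers are unaffected (the page URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# The AI-blocking rules from this article.
robots_txt = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The AI training crawlers are denied everywhere...
print(rp.can_fetch("GPTBot", "https://yourdomain.com/any-page"))       # False
print(rp.can_fetch("Google-Extended", "https://yourdomain.com/"))      # False

# ...while ordinary crawlers, which match no rule, remain allowed.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/any-page"))    # True
```

Because there is no `User-agent: *` group in this snippet, agents that don't match a listed group fall through to "allowed", which is why Googlebot still passes.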

If you have any questions about editing your robots.txt file or facing issues while editing, please get in touch and we can help you out.

Interested in our services?
Book a discovery call now