Free SEO Tool

Robots.txt Generator & Validator

Generate and validate robots.txt files. Block AI crawlers, control search engine access, and protect your content — all for free.

Block AI Training Crawlers

Protect your content from being used to train AI models. Block these crawlers to opt out of AI training datasets.

11 AI crawlers
GPTBot

OpenAI

Used by OpenAI to crawl content for training ChatGPT models

ChatGPT-User

OpenAI

ChatGPT browsing mode — fetches pages in real time

ClaudeBot

Anthropic

Anthropic crawler for Claude AI training data

CCBot

Common Crawl

Common Crawl dataset used by many AI companies for training

Google-Extended

Google

Controls content used to train Google Gemini AI models

FacebookBot

Meta

Meta AI crawler for training Llama and other models

Bytespider

ByteDance

ByteDance/TikTok crawler for AI training data

Applebot-Extended

Apple

Controls content used to train Apple Intelligence features

PerplexityBot

Perplexity

Perplexity AI search and answer engine crawler

Amazonbot

Amazon

Amazon crawler for Alexa AI and product training

cohere-ai

Cohere

Cohere AI model training data crawler
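
Selecting every crawler in the list above produces a single shared rule group in the generated file. This is a sketch of the expected output (the Robots Exclusion Protocol allows multiple User-agent lines to share one set of rules); the generator's exact ordering may differ:

```
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: FacebookBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Amazonbot
User-agent: cohere-ai
Disallow: /
```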

Live Preview

robots.txt (7 lines)

# robots.txt generated by Hand On Web
# https://www.handonweb.com/tools/robots-txt-generator
# 2026-02-24

User-agent: *
Allow: /

How to Use This Robots.txt Generator

  1. Choose which crawlers to block. Start with the AI Crawlers section to block bots like GPTBot and ClaudeBot from training on your content. Then review the Search Engine Crawlers to ensure Google, Bing, and others can access your site.
  2. Add custom rules. Use the Custom Rules section to block specific paths like /admin/, /api/, or /private/ from all bots or specific user-agents.
  3. Add your sitemap URL. Enter the full URL of your XML sitemap to help search engines discover your pages more efficiently.
  4. Copy or download. Use the live preview on the right to review your robots.txt in real time. When you're happy, click Copy or Download and upload the file to your website's root directory.
  5. Validate. Switch to the Validator tab to paste and check any existing robots.txt for errors and warnings.
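Step 5 can also be checked programmatically. This is a minimal sketch using Python's standard-library `urllib.robotparser`, which parses robots.txt rules and answers "can this user-agent fetch this URL?" queries; the rules and URLs below are illustrative placeholders, not the tool's output:

```python
from urllib import robotparser

# Example rules; substitute the contents of your own robots.txt.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot is blocked everywhere; other bots only from /admin/.
print(rp.can_fetch("GPTBot", "https://example.com/blog/"))      # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/"))   # True
print(rp.can_fetch("Googlebot", "https://example.com/admin/"))  # False
```

Note that `robotparser` only evaluates rules; it will not warn about malformed directives the way the Validator tab does.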

What is Robots.txt and Why It Matters

The robots.txt file is one of the most fundamental yet overlooked parts of technical SEO. It sits at the root of your website and acts as a gatekeeper, telling web crawlers — including Google, Bing, and AI bots — which parts of your site they can and cannot access.

A properly configured robots.txt helps you manage your crawl budget — the number of pages search engines will crawl on your site within a given timeframe. By blocking access to low-value pages like admin panels, search result pages, and staging environments, you ensure crawlers spend their time on the content that matters most for your rankings.
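
A crawl-budget-focused file typically looks like the sketch below; the blocked paths are placeholders, and you should substitute the low-value sections of your own site:

```
User-agent: *
Disallow: /admin/
Disallow: /search
Disallow: /staging/
Allow: /
```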

Without a robots.txt, crawlers will attempt to access every URL they discover, which can lead to wasted crawl budget, duplicate content issues, and even sensitive pages being exposed. For any serious website — whether a small business site or a large e-commerce platform — having a well-maintained robots.txt is essential.

Should You Block AI Crawlers?

The rise of large language models like ChatGPT, Claude, and Gemini has created a new category of web crawlers specifically designed to collect training data. Unlike traditional search engine crawlers that index your pages to show them in search results, AI crawlers harvest your content to train machine learning models — often without direct attribution or compensation.

Many website owners and publishers are choosing to block AI crawlers to protect their intellectual property. The New York Times, Reddit, and thousands of other publishers now block GPTBot and similar bots. If your website contains original content, research, or creative work, blocking AI training crawlers is a reasonable step to protect your investment.

However, it's worth noting that blocking ChatGPT-User (the browsing agent) will prevent ChatGPT from fetching your pages during live conversations, which could mean missed referral traffic. Similarly, blocking Google-Extended only affects Gemini AI training — it does not impact your Google Search rankings. Our generator makes it easy to selectively block the crawlers you want while keeping others active.
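
Selective blocking of this kind is expressed with separate rule groups. For example, the following sketch opts out of Gemini training while leaving Google Search crawling untouched:

```
# Opt out of Gemini AI training; does not affect Search rankings
User-agent: Google-Extended
Disallow: /

# Googlebot (Search indexing) remains fully allowed
User-agent: Googlebot
Allow: /
```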

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain text file placed at the root of your website (e.g. example.com/robots.txt) that tells web crawlers which pages or sections of your site they can or cannot access. It follows the Robots Exclusion Protocol and is the first file most crawlers check before crawling your site.
Free consultation

Need Technical SEO Help?

Our SEO experts can audit your robots.txt, fix crawl issues, and optimise your site for search engines and AI visibility.

30-day money-back · +44 7471 487274 · No contracts