Free SEO Tool

Robots.txt Generator & Validator

Generate and validate robots.txt files. Block AI crawlers, control search engine access, and protect your content — all for free.

Block AI Training Crawlers

Protect your content from being used to train AI models. Block these crawlers to opt out of AI training datasets.

11 AI crawlers
GPTBot

OpenAI

Used by OpenAI to crawl content for training ChatGPT models

ChatGPT-User

OpenAI

ChatGPT browsing mode — fetches pages in real time

ClaudeBot

Anthropic

Anthropic crawler for Claude AI training data

CCBot

Common Crawl

Common Crawl dataset used by many AI companies for training

Google-Extended

Google

Controls content used to train Google Gemini AI models

FacebookBot

Meta

Meta AI crawler for training Llama and other models

Bytespider

ByteDance

ByteDance/TikTok crawler for AI training data

Applebot-Extended

Apple

Controls content used to train Apple Intelligence features

PerplexityBot

Perplexity

Perplexity AI search and answer engine crawler

Amazonbot

Amazon

Amazon crawler for Alexa AI and product training

cohere-ai

Cohere

Cohere AI model training data crawler
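
Selecting every crawler in the list above produces a single shared rule group in the generated file. This is a sketch of the expected output (the Robots Exclusion Protocol allows multiple User-agent lines to share one set of rules); the generator's exact ordering may differ:

```
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: FacebookBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Amazonbot
User-agent: cohere-ai
Disallow: /
```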

Live Preview

robots.txt (7 lines)

# robots.txt generated by Hand On Web
# https://www.handonweb.com/tools/robots-txt-generator
# 2026-02-24

User-agent: *
Allow: /

How to Use This Robots.txt Generator

  1. Choose which crawlers to block. Start with the AI Crawlers section to block bots like GPTBot and ClaudeBot from training on your content. Then review the Search Engine Crawlers to ensure Google, Bing, and others can access your site.
  2. Add custom rules. Use the Custom Rules section to block specific paths like /admin/, /api/, or /private/ from all bots or specific user-agents.
  3. Add your sitemap URL. Enter the full URL of your XML sitemap to help search engines discover your pages more efficiently.
  4. Copy or download. Use the live preview on the right to review your robots.txt in real time. When you're happy, click Copy or Download and upload the file to your website's root directory.
  5. Validate. Switch to the Validator tab to paste and check any existing robots.txt for errors and warnings.
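Step 5 can also be checked programmatically. This is a minimal sketch using Python's standard-library `urllib.robotparser`, which parses robots.txt rules and answers "can this user-agent fetch this URL?" queries; the rules and URLs below are illustrative placeholders, not the tool's output:

```python
from urllib import robotparser

# Example rules; substitute the contents of your own robots.txt.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot is blocked everywhere; other bots only from /admin/.
print(rp.can_fetch("GPTBot", "https://example.com/blog/"))      # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/"))   # True
print(rp.can_fetch("Googlebot", "https://example.com/admin/"))  # False
```

Note that `robotparser` only evaluates rules; it will not warn about malformed directives the way the Validator tab does.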

What is Robots.txt and Why It Matters

The robots.txt file is one of the most fundamental yet overlooked parts of technical SEO. It sits at the root of your website and acts as a gatekeeper, telling web crawlers — including Google, Bing, and AI bots — which parts of your site they can and cannot access.

A properly configured robots.txt helps you manage your crawl budget — the number of pages search engines will crawl on your site within a given timeframe. By blocking access to low-value pages like admin panels, search result pages, and staging environments, you ensure crawlers spend their time on the content that matters most for your rankings.
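
A crawl-budget-focused file typically looks like the sketch below; the blocked paths are placeholders, and you should substitute the low-value sections of your own site:

```
User-agent: *
Disallow: /admin/
Disallow: /search
Disallow: /staging/
Allow: /
```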

Without a robots.txt, crawlers will attempt to access every URL they discover, which can lead to wasted crawl budget, duplicate content issues, and even sensitive pages being exposed. For any serious website — whether a small business site or a large e-commerce platform — having a well-maintained robots.txt is essential.

Should You Block AI Crawlers?

The rise of large language models like ChatGPT, Claude, and Gemini has created a new category of web crawlers specifically designed to collect training data. Unlike traditional search engine crawlers that index your pages to show them in search results, AI crawlers harvest your content to train machine learning models — often without direct attribution or compensation.

Many website owners and publishers are choosing to block AI crawlers to protect their intellectual property. The New York Times, Reddit, and thousands of other publishers now block GPTBot and similar bots. If your website contains original content, research, or creative work, blocking AI training crawlers is a reasonable step to protect your investment.

However, it's worth noting that blocking ChatGPT-User (the browsing agent) will prevent ChatGPT from fetching your pages during live conversations, which could mean missed referral traffic. Similarly, blocking Google-Extended only affects Gemini AI training — it does not impact your Google Search rankings. Our generator makes it easy to selectively block the crawlers you want while keeping others active.
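
Selective blocking of this kind is expressed with separate rule groups. For example, the following sketch opts out of Gemini training while leaving Google Search crawling untouched:

```
# Opt out of Gemini AI training; does not affect Search rankings
User-agent: Google-Extended
Disallow: /

# Googlebot (Search indexing) remains fully allowed
User-agent: Googlebot
Allow: /
```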

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain text file placed at the root of your website (e.g. example.com/robots.txt) that tells web crawlers which pages or sections of your site they can or cannot access. It follows the Robots Exclusion Protocol and is the first file most crawlers check before crawling your site.
Free consultation

Need Technical SEO Help?

Our SEO experts can audit your robots.txt, fix crawl issues, and optimise your site for search engines and AI visibility.

30-day money-back · +44 7471 487274 · No contracts