Robots.txt for AI
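Robots.txt is a plain-text file at the root of your domain (for example, https://example.com/robots.txt) that tells web crawlers which paths they may and may not fetch. In GEO, robots.txt configuration determines which AI platforms can discover and cite your content. Blocking an AI crawler in robots.txt effectively makes your content invisible to that platform's crawl: compliant crawlers will not fetch blocked pages, so the platform cannot retrieve or cite their current content. Each AI platform identifies its crawler with its own user-agent token, and rules are applied per token. Robots.txt is permissive by default, so a crawler may fetch any path that no rule disallows for it. A crawler needs its own explicit group only when a broader rule, such as a site-wide Disallow under "User-agent: *", would otherwise block it.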
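To see the default-allow behavior in practice, the sketch below parses a small hypothetical robots.txt with Python's standard-library `urllib.robotparser` and asks whether several AI crawler tokens may fetch a placeholder URL. Only GPTBot is named, so only GPTBot is blocked; every other token matches no group and falls back to "allowed". One caveat: Python's parser applies the first matching rule in a group, whereas RFC 9309 specifies the most specific (longest) match, so treat this as a quick sanity check rather than a reference implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that names only GPTBot, and only to block it.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(ROBOTS_TXT)
url = "https://example.com/guide"  # placeholder URL

for agent in ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"):
    # GPTBot matches its own group; the others match no group and default to allowed.
    status = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {status}")
```

The same check works against a live site by calling `set_url()` and `read()` on the parser instead of `parse()`.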
AI Crawlers and Their Robots.txt User-Agents
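- GPTBot and OAI-SearchBot (OpenAI): GPTBot collects content that may be used to train OpenAI's models. OAI-SearchBot fetches pages so they can be surfaced and linked in ChatGPT search results. The two tokens are controlled independently; allow both for maximum ChatGPT visibility, and allow at least OAI-SearchBot if you want to be cited in ChatGPT search.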
- Bingbot (Microsoft): Powers Bing search and Microsoft Copilot, and Bing's index has also supplied web results to ChatGPT and Meta AI. Blocking Bingbot can therefore cut you off from three major AI platforms at once.
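- ClaudeBot (Anthropic): Anthropic documents ClaudeBot as collecting web content that may be used for model training, and lists separate user-agents (Claude-SearchBot and Claude-User) for search indexing and user-initiated fetches. Claude is reported to weigh source quality heavily when selecting citations.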
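- PerplexityBot (Perplexity AI): Crawls and indexes pages so they can be surfaced and linked in Perplexity's answers. Because Perplexity always shows source links in its responses, it is among the easiest AI platforms to measure through referral traffic.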
- Googlebot (Google): Powers traditional search and Google AI Overviews. Already allowed on most sites.
- Meta-ExternalAgent (Meta): Crawls for Llama model training and Meta’s emerging proprietary search index. Can be aggressive with crawl volume.
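- CCBot (Common Crawl): Builds the Common Crawl open dataset, which multiple AI companies use as training data. Blocking CCBot keeps your pages out of future crawls and therefore reduces your presence in the training data of several AI systems.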
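Because robots.txt allows any path it does not disallow, naming these crawlers is only strictly required when a broader rule would otherwise block them, though an explicit group also documents intent. One subtlety under RFC 9309: a crawler that matches a named group ignores the "User-agent: *" group entirely, so any site-wide Disallow must be repeated inside the named group. The sketch below checks a hypothetical policy with Python's `urllib.robotparser`; the `/private/` path and the `access_report` helper are illustrative assumptions, and the rules are ordered so that Python's first-match parser and RFC 9309's longest-match rule give the same answers.

```python
from urllib.robotparser import RobotFileParser

# The crawler tokens listed above.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "Bingbot", "ClaudeBot",
    "PerplexityBot", "Googlebot", "Meta-ExternalAgent", "CCBot",
]

# Hypothetical policy: every listed AI crawler may fetch the whole site except
# /private/ (a placeholder path). Named groups do not inherit the "*" group,
# so the shared Disallow is repeated in both groups.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: Bingbot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Googlebot
User-agent: Meta-ExternalAgent
User-agent: CCBot
Disallow: /private/
Allow: /

User-agent: *
Disallow: /private/
"""


def access_report(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return whether each user-agent token may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


public = access_report(ROBOTS_TXT, AI_CRAWLERS, "https://example.com/blog/post")
private = access_report(ROBOTS_TXT, AI_CRAWLERS, "https://example.com/private/report")
print("public page allowed for all:", all(public.values()))
print("private page blocked for all:", not any(private.values()))
```

Running a check like this before deploying a robots.txt change is a cheap way to catch a named group that accidentally drops a rule the "*" group enforces.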
Configuration Best Practices
The default GEO recommendation is to allow all known AI crawlers unless you have a specific reason to block one. Many AI crawlers do not execute JavaScript, so key content should be present in the HTML the server returns rather than rendered client-side. If a specific crawler causes excessive server load (Meta-ExternalAgent is known for high request volumes), rate-limit it at the CDN level rather than blocking it entirely. Monitor server logs by filtering on user-agent strings to see which AI crawlers access your content and how often.
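The log monitoring described above can start as a simple per-crawler request count. The sketch below assumes access logs in the common Combined Log Format, where the user-agent is the last double-quoted field, and matches the crawler tokens from the list above as case-insensitive substrings. The sample log lines are hypothetical and abbreviated, not real crawler user-agent strings. Keep in mind that user-agent strings are self-reported and can be spoofed, so for high-volume traffic it is worth verifying requests against the IP ranges or reverse-DNS hostnames an operator publishes, where available.

```python
import re
from collections import Counter

# Crawler tokens from the list above, matched case-insensitively as substrings.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "Bingbot", "ClaudeBot",
    "PerplexityBot", "Googlebot", "Meta-ExternalAgent", "CCBot",
]

# In the Combined Log Format the user-agent is the last double-quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawler_hits(lines):
    """Count requests per AI crawler token; lines with no known token are ignored."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_FIELD.search(line)
        if match is None:
            continue
        user_agent = match.group(1).lower()
        for crawler in AI_CRAWLERS:
            if crawler.lower() in user_agent:
                counts[crawler] += 1
                break  # attribute each request to a single crawler
    return counts


# Hypothetical, abbreviated log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.9 - - [10/Oct/2024:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [10/Oct/2024:13:57:12 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '192.0.2.44 - - [10/Oct/2024:13:58:40 +0000] "GET / HTTP/1.1" 200 1024 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for crawler, hits in count_ai_crawler_hits(SAMPLE_LOG).most_common():
    print(f"{crawler}: {hits}")
```

Because the function accepts any iterable of lines, a real log file can be passed directly, for example `count_ai_crawler_hits(open("access.log"))` with a hypothetical path; the trailing `\s*$` in the pattern tolerates the newline at the end of each file line.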
For the complete technical optimization framework, see the Generative Engine Optimization guide.
Related: AI Crawler · Bing Index · Meta AI · ChatGPT Browse


