Ready to refine the crawl controls on your site? Understanding the robots.txt file format gives you clear boundaries for bots. In this guide you’ll map every directive, test for errors, and lock in best practices. By the end you’ll have a bulletproof file guiding crawlers to your key pages.
Understand robots.txt basics
Define file role
The robots.txt file sits at your domain root and applies only to the host it is served from, so each subdomain needs its own copy (Google Developers). It tells crawlers which sections they may access and which to skip.
Locate your file
Serve your robots.txt at https://yourdomain.com/robots.txt so bots can find it. Learn how to verify your file path in our robots.txt file location post.
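Crawlers derive that location mechanically from a page's origin: keep the scheme and host, replace the path with /robots.txt. A minimal sketch of that derivation in Python (the blog.example.com URL is a hypothetical example):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin serving page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; robots.txt always lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://blog.example.com/posts/1?x=2"))
# https://blog.example.com/robots.txt
```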
Explain file format structure
Required commands
According to seoClarity, every robots.txt rule group needs a User-agent line and at least one Disallow line.
- User-agent: names the crawler this rule applies to.
- Disallow: lists paths you want to block.
Optional commands
Add Allow to permit specific paths inside a blocked directory. Include Sitemap to point crawlers to your index.
- Allow: lets bots access subfolders under a disallowed path.
- Sitemap: declares your sitemap URL for faster discovery.
Avoid Noindex in robots.txt; Google stopped honoring it in 2019. Use a meta robots noindex tag instead.
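Putting the required and optional commands together, a minimal file might look like this (the sitemap URL is a hypothetical placeholder):

```
User-agent: *
Disallow: /private/
Allow: /private/public-report.html

Sitemap: https://www.example.com/sitemap.xml
```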
Detail user-agent directive
Name specific crawlers
Define each crawler group with a User-agent line.
User-agent: Googlebot
Understand case sensitivity
Major crawlers match the User-agent value itself case-insensitively, but the path values in your rules are case-sensitive: Disallow: /Private/ will not block /private/.
Explore disallow and allow rules
Block entire directories or pages
Use Disallow to stop bots from crawling resource paths.
Disallow: /private/
Permit specific paths
Use Allow to override a Disallow pattern for subpaths.
Allow: /private/public-report.html
| Directive | Purpose | Example |
|---|---|---|
| Disallow | Block resource path | Disallow: /secret/ |
| Allow | Permit resource path | Allow: /secret/open/ |
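You can sanity-check this precedence locally with Python's standard-library urllib.robotparser. One assumption to flag: robotparser applies rules in file order (first match wins) rather than Google's longest-match rule, so the Allow line is listed before the Disallow it overrides; the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Googlebot
Allow: /private/public-report.html
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The directory is blocked, but the allowed report inside it is not.
print(rp.can_fetch("Googlebot", "https://www.example.com/private/notes.html"))          # False
print(rp.can_fetch("Googlebot", "https://www.example.com/private/public-report.html"))  # True
```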
Add sitemaps and extras
Include sitemap location
Add a Sitemap line, typically at the bottom of the file, with the full URL of your sitemap.
Sitemap: https://www.example.com/sitemap.xml
Use comments and wildcards
Prefix lines with # to add notes without affecting rules. Use * to match any sequence and $ to mark URL ends.
- Comments: write # before any text to document your file.
- Wildcards: use * to cover multiple URL patterns.
- End markers: use $ for exact path matches.
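Python's urllib.robotparser does not implement * and $ matching, so here is a minimal sketch of Google-style wildcard matching via translation to a regular expression (the function name is my own):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check a URL path against a robots.txt rule with * and $ support."""
    anchored = rule.endswith("$")          # trailing $ pins the match to the URL end
    body = rule[:-1] if anchored else rule
    # Escape every literal character; expand * into "any run of characters".
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    regex = "^" + pattern + ("$" if anchored else "")
    return re.match(regex, path) is not None

print(rule_matches("/private/*.pdf$", "/private/report.pdf"))       # True
print(rule_matches("/private/*.pdf$", "/private/report.pdf?dl=1"))  # False
```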
Follow size limits
Keep your robots.txt under 500 kibibytes; Google ignores any rules beyond that limit (Conductor).
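A quick way to guard that limit in a build or deploy script (the 500 KiB constant mirrors Google's published cap; the helper name is my own):

```python
GOOGLE_ROBOTS_LIMIT = 500 * 1024  # 500 KiB, measured in bytes after UTF-8 encoding

def within_google_limit(robots_txt: str) -> bool:
    """Return True if the file fits inside Google's 500 KiB parse limit."""
    return len(robots_txt.encode("utf-8")) <= GOOGLE_ROBOTS_LIMIT

print(within_google_limit("User-agent: *\nDisallow:\n"))  # True
```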
Apply format in practice
Draft your own file
Open a plain text editor and paste your directives. Save the file as UTF-8 encoded robots.txt.
- Open Notepad, TextEdit, vi, or emacs.
- Paste directives following your plan.
- Save and upload to your domain root.
Check our how to create robots.txt file tutorial for detailed steps.
Integrate in WordPress
Use plugin settings or a manual upload to set your file. Explore our robots.txt file WordPress guide for tips.
Track and test your file
Use Search Console
In Google Search Console, open the robots.txt report (under Settings) to confirm Google fetched your file and to review any parse errors. The legacy robots.txt Tester has been retired, so pair the report with the URL Inspection tool to check individual pages.
Validate syntax
Run your file through a validator or try our robots.txt file generator to auto-check patterns.
- Check for typos in directive names.
- Confirm wildcard behavior matches your plan.
- Ensure no URLs are blocked by mistake.
Checkpoint: Confirm each directive produces the intended crawl outcome.
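One way to automate this checkpoint is a small table of expected outcomes, again using urllib.robotparser (remember its first-match ordering) with hypothetical example.com URLs:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /secret/open/
Disallow: /secret/
"""

# url -> should the crawler be allowed to fetch it?
expectations = {
    "https://www.example.com/secret/plan.html": False,
    "https://www.example.com/secret/open/faq.html": True,
    "https://www.example.com/blog/": True,
}

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for url, should_allow in expectations.items():
    actual = rp.can_fetch("*", url)
    status = "OK" if actual == should_allow else "MISMATCH"
    print(f"{status}: {url} -> {'allowed' if actual else 'blocked'}")
```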
You’ve built a rock-solid file guiding every crawler. Plan your next crawl test and track any changes. Take charge by uploading your file and testing it in Google Search Console within 24 hours.