Robots.txt Tester
Test if URLs are allowed or blocked by robots.txt. Check user-agent rules, wildcards, and crawl permissions instantly.
📝 Batch Testing
Test multiple URLs at once:
📚 Example Robots.txt
🤖 How Robots.txt Works
- User-agent: Specifies which bot the rules apply to (* = all bots).
- Disallow: Blocks bots from accessing specified paths.
- Allow: Explicitly permits access (overrides Disallow for more specific paths).
- Wildcards: * matches any sequence, $ matches end of URL.
- Priority: Most specific (longest) matching path wins.
- Case-Sensitive: Paths are case-sensitive (/Admin/ ≠ /admin/).
- Sitemap: Tells bots where to find your XML sitemap.
- Crawl-delay: Sets delay between requests (not supported by Googlebot).
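Putting these directives together, a small sample file might look like this (the domain and paths are placeholders):

    # Rules for all crawlers
    User-agent: *
    Disallow: /admin/           # block the admin area
    Allow: /admin/public/       # more specific, so this subfolder stays crawlable
    Disallow: /*.pdf$           # block any URL ending in .pdf
    Crawl-delay: 10             # ignored by Googlebot

    # Rules for Googlebot only
    User-agent: Googlebot
    Disallow: /temp/

    Sitemap: https://example.com/sitemap.xml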
How to Use Robots.txt Tester to Check Crawl Permissions
Test whether URLs are allowed or blocked by robots.txt. Check user-agent-specific rules, wildcard patterns, and crawl permissions for Googlebot, Bingbot, and other major search engines with this free robots.txt validation tool.
Getting Started
Test robots.txt rules in seconds.
- Paste Robots.txt: Copy your robots.txt content into the text area.
- Enter URL: Type the path you want to test (e.g., /admin/page.php).
- Select User-Agent: Choose which bot to test (Googlebot, Bingbot, etc.).
- Click Test: Tool analyzes and shows if URL is allowed or blocked.
- Check Result: Green = allowed, Red = blocked.
- View Matched Rule: See exactly which rule applies.
- See All Rules: Review all rules for selected user-agent.
- Batch Test: Test multiple URLs at once for efficiency.
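A single test run might look like this (hypothetical rules and path, shown only to illustrate the steps above):

    Robots.txt:  User-agent: *
                 Disallow: /admin/
    URL tested:  /admin/page.php
    User-agent:  Googlebot
    Result:      Blocked (red), matched rule: Disallow: /admin/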
Understanding Robots.txt
How robots.txt controls search engine crawlers.
- Purpose: Tells search bots which pages they can or cannot crawl.
- Location: Must be at website root: https://example.com/robots.txt
- Not Security: Robots.txt is not access control - it's a request to bots.
- Respectful Bots: Major search engines follow robots.txt rules.
- Public File: Anyone can view your robots.txt file.
- Syntax Matters: Small errors can block or expose unintended pages.
- Case Sensitive: /Admin/ and /admin/ are different paths.
- Priority Rules: Most specific (longest) matching path wins.
User-Agent Directive
Specifying which bots rules apply to.
- Syntax: User-agent: Googlebot
- Wildcard: User-agent: * applies to all bots.
- Case Insensitive: Googlebot = googlebot = GOOGLEBOT.
- Specific Bots: Can target individual crawlers.
- Multiple Agents: Separate user-agent blocks for different bots.
- Group Matching: The tool picks the best-matching user-agent group first, then applies only that group's rules (see the example after this list).
- Common Bots: Googlebot, Bingbot, Slurp (Yahoo), DuckDuckBot.
- Fallback: If no specific match, uses * rules if present.
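For example, with the two groups below, Googlebot uses only its own group, while every other bot falls back to the * group:

    User-agent: Googlebot
    Disallow: /no-google/

    User-agent: *
    Disallow: /private/

Here Googlebot may still crawl /private/, because once a specific group matches, the generic * group no longer applies to that bot.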
Allow and Disallow
Core directives for controlling access.
- Disallow: Blocks bots from specified paths.
- Allow: Explicitly permits access (overrides Disallow).
- Disallow: / blocks entire site.
- Disallow: (empty) allows everything.
- Allow: / allows entire site.
- Path Prefix: Disallow: /admin/ blocks /admin/page.php
- Specificity Wins: Allow: /admin/public/ overrides Disallow: /admin/
- Case Sensitive: Disallow: /Admin/ does not block /admin/
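A minimal group showing both directives together (paths are placeholders):

    User-agent: *
    Disallow: /admin/        # /admin/page.php is blocked
    Allow: /admin/public/    # /admin/public/index.html is allowed (longer match)

/Admin/page.php would still be crawlable here, because paths are case-sensitive.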
Wildcard Patterns
Using * and $ for flexible matching.
- Asterisk (*): Matches any sequence of characters.
- Example: Disallow: /*.pdf$ blocks all PDF files.
- Example: Disallow: /temp/* blocks everything in /temp/
- Example: Disallow: /*?utm_ blocks URLs with utm_ parameters.
- Dollar Sign ($): Matches end of URL.
- Example: Disallow: /file.pdf$ blocks exactly /file.pdf
- Without $: Disallow: /file.pdf blocks /file.pdf.html too.
- Combined: /*admin* blocks any path containing "admin"
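The same patterns in context (each comment notes what the rule matches):

    User-agent: *
    Disallow: /*.pdf$    # any URL ending exactly in .pdf
    Disallow: /temp/*    # everything under /temp/ (equivalent to Disallow: /temp/)
    Disallow: /*?utm_    # any URL containing ?utm_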
Rule Priority
How conflicting rules are resolved.
- Longest Match Wins: Most specific path takes precedence.
- Example: Allow: /admin/public/ overrides Disallow: /admin/
- Path Length: /admin/settings/ (16 chars) beats /admin/ (7 chars).
- Allow vs Disallow: If same length, Allow takes precedence.
- No Match: If no rule matches, default is allow.
- Wildcards and Length: Specificity is measured by the pattern's character length, not counting wildcard characters.
- Example Priority: /admin/settings/ > /admin/* > /admin/ > /*
- Tool Shows: Which rule matched and why.
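As a rough illustration of the longest-match logic, here is a simplified Python sketch; it ignores * and $ handling and is not this tool's actual implementation:

    # Simplified longest-match resolution: the longest matching pattern wins,
    # and on a tie Allow beats Disallow. Wildcards are not handled here.
    def evaluate(path, rules):
        best = ("allow", "")  # default: allowed when no rule matches
        for directive, pattern in rules:
            if path.startswith(pattern) and len(pattern) >= len(best[1]):
                if len(pattern) > len(best[1]) or directive == "allow":
                    best = (directive, pattern)
        return best

    rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
    print(evaluate("/admin/public/page.html", rules))  # ('allow', '/admin/public/')
    print(evaluate("/admin/settings/", rules))         # ('disallow', '/admin/')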
Sitemap Directive
Telling bots where to find your sitemap.
- Syntax: Sitemap: https://example.com/sitemap.xml
- Multiple Sitemaps: Can list multiple sitemap URLs.
- Full URL: Must be complete URL including https://
- Not User-Agent Specific: Applies to all bots.
- Location: Can be anywhere in robots.txt.
- Not Mandatory: Optional but recommended for SEO.
- XML Format: Typically points to XML sitemap file.
- Tool Displays: All sitemap URLs found.
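For example (the sitemap URLs are placeholders):

    User-agent: *
    Disallow: /admin/

    Sitemap: https://example.com/sitemap.xml
    Sitemap: https://example.com/news-sitemap.xml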
Batch URL Testing
Test multiple URLs at once.
- Input Format: One URL per line in batch text area.
- Efficiency: Test 10, 20, or 100 URLs in one click.
- Same User-Agent: All URLs tested with selected user-agent.
- Quick Overview: See allowed/blocked count at top.
- Color Coded: Green for allowed, red for blocked.
- Use Cases: Audit entire site sections, verify rule changes.
- Export Ready: Copy results for documentation.
- Example: Test all /admin/, /api/, /private/ paths together.
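The same batch idea can be scripted outside the tool. Here is a minimal sketch using Python's standard-library parser; note that urllib.robotparser does not implement Google-style * and $ wildcards, so results can differ from this tool on complex files:

    from urllib.robotparser import RobotFileParser

    # Hypothetical rules and paths, for illustration only.
    parser = RobotFileParser()
    parser.parse([
        "User-agent: *",
        "Allow: /admin/public/",
        "Disallow: /admin/",
        "Disallow: /private/",
    ])

    urls = ["/admin/login.php", "/admin/public/faq.html",
            "/private/report", "/blog/post-1/"]
    for url in urls:
        status = "Allowed" if parser.can_fetch("Googlebot", url) else "Blocked"
        print(f"{status:8} {url}")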
Common Robots.txt Patterns
Frequently used configurations.
- Block Admin: Disallow: /admin/
- Block Private: Disallow: /private/
- Allow Public in Admin: Allow: /admin/public/ then Disallow: /admin/
- Block PDFs: Disallow: /*.pdf$
- Block Parameters: Disallow: /*?sort= blocks sorted and filtered URL variants.
- Block Temp Files: Disallow: /temp/*
- Block Search Results: Disallow: /search?
- Allow Images: User-agent: Googlebot-Image then Allow: /
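Several of these patterns combined into one file (keep only the rules that apply to your site; all paths are placeholders):

    User-agent: *
    Allow: /admin/public/
    Disallow: /admin/
    Disallow: /private/
    Disallow: /*.pdf$
    Disallow: /*?sort=
    Disallow: /search?

    User-agent: Googlebot-Image
    Allow: /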
Syntax Validation
Checking for robots.txt errors.
- Missing User-Agent: Allow/Disallow without User-agent is invalid.
- Unknown Directives: Tool warns about unrecognized commands.
- Line Numbers: Warnings show exact line number.
- Typos: Common mistakes like "Dissallow" are caught.
- Order Issues: Allow/Disallow before User-agent is wrong.
- Empty Values: Disallow: (empty) means allow everything.
- Comments: Lines starting with # are ignored.
- Tool Validates: Checks syntax and shows warnings.
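A deliberately broken file showing the kinds of issues listed above (the comments describe the warnings to expect):

    Disallow: /secret/       # rule appears before any User-agent line
    User-agent: *
    Dissallow: /private/     # typo: unknown directive "Dissallow"
    Disallow:                # empty value: allows everything, often unintended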
Testing Strategy
How to effectively test robots.txt.
- Test Key Paths: Admin areas, private folders, API endpoints.
- Test Both: Check paths you want blocked AND allowed.
- Multiple User-Agents: Test Googlebot, Bingbot, etc. separately.
- Edge Cases: Test paths with special characters, parameters.
- After Changes: Always test after modifying robots.txt.
- Before Deploy: Validate rules before uploading to live site.
- Batch Testing: Use batch mode for comprehensive audit.
- Document Results: Keep record of tested URLs and outcomes.
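A sample batch list for such an audit, mixing paths that should be blocked with pages that must stay crawlable (all paths are placeholders):

    /admin/
    /admin/login.php
    /api/internal/status
    /private/2024-report.pdf
    /search?q=test
    /blog/                  (should remain allowed)
    /products/widget-123/   (should remain allowed)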
Common Mistakes
Frequent robots.txt errors to avoid.
- Blocking CSS/JS: Google needs these for rendering pages.
- Blocking Entire Site: Disallow: / prevents all crawling.
- Missing Trailing Slash: Disallow: /admin also blocks /admin.html and /admin-tools/; use /admin/ to target only the directory.
- Redundant Wildcards: Rules are prefix matches, so Disallow: /temp/ already blocks everything in the folder; /temp/* adds nothing.
- Case Errors: /Admin/ does not block /admin/
- Security Misconception: Robots.txt is not access control.
- Exposing Secrets: Disallow lines publicly list the very paths you want hidden, since robots.txt is readable by anyone.
- Wrong Location: robots.txt must be at domain root.
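A before-and-after sketch of two of these mistakes (paths are placeholders):

    # Risky: blocks rendering resources and uses the wrong case
    User-agent: *
    Disallow: /assets/css/   # Google needs CSS/JS to render pages
    Disallow: /Admin/        # does not block /admin/ (case-sensitive)

    # Safer: block only what should not be crawled
    User-agent: *
    Disallow: /admin/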
FAQ
How accurate is this robots.txt tester?
Can this tool fetch robots.txt from my live website?
What does "longest matching path wins" mean?
Why does * (all bots) not apply when I select Googlebot?
Does robots.txt block users from accessing pages?
What is the difference between Allow and Disallow?
Can I use robots.txt to remove pages from Google?
Should I block /admin/ in robots.txt?
What happens if I have syntax errors in robots.txt?
Can I test URLs with query parameters?
Related tools
Pro tip: pair this tool with Robots.txt Generator and Canonical Pagination Conflict Checker for a faster SEO workflow.