
Robots.txt Tester

Test if URLs are allowed or blocked by robots.txt. Check user-agent rules, wildcards, and crawl permissions instantly.


📝 Batch Testing

Test multiple URLs at once, one path per line.

📚 Example Robots.txt

  • Basic: Simple allow/disallow rules
  • Wildcards: Using * and $ patterns
  • Multi-Agent: Different rules per bot
  • E-commerce: Online store setup

🤖 How Robots.txt Works

  • User-agent: Specifies which bot the rules apply to (* = all bots).
  • Disallow: Blocks bots from accessing specified paths.
  • Allow: Explicitly permits access (overrides Disallow for more specific paths).
  • Wildcards: * matches any sequence, $ matches end of URL.
  • Priority: Most specific (longest) matching path wins.
  • Case-Sensitive: Paths are case-sensitive (/Admin/ ≠ /admin/).
  • Sitemap: Tells bots where to find your XML sitemap.
  • Crawl-delay: Sets delay between requests (not supported by Googlebot).
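
A minimal annotated file that exercises each directive above (example.com and all paths are placeholders):

```
# Comments start with '#' and are ignored by bots.
User-agent: *             # the rules below apply to all bots
Disallow: /admin/         # block the admin area...
Allow: /admin/public/     # ...except this more specific path
Disallow: /*.pdf$         # wildcard rule: block URLs ending in .pdf
Crawl-delay: 10           # honored by some bots, not by Googlebot

Sitemap: https://example.com/sitemap.xml
```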

How to Use Robots.txt Tester to Check Crawl Permissions

Test if URLs are allowed or blocked by robots.txt. Check user-agent-specific rules, wildcard patterns, and crawl permissions. Supports Googlebot, Bingbot, and all major search engines. Free robots.txt validation tool.

Getting Started

Test robots.txt rules in seconds.

  • Paste Robots.txt: Copy your robots.txt content into the text area.
  • Enter URL: Type the path you want to test (e.g., /admin/page.php).
  • Select User-Agent: Choose which bot to test (Googlebot, Bingbot, etc.).
  • Click Test: The tool analyzes the rules and shows whether the URL is allowed or blocked.
  • Check Result: Green = allowed, Red = blocked.
  • View Matched Rule: See exactly which rule applies.
  • See All Rules: Review all rules for selected user-agent.
  • Batch Test: Test multiple URLs at once for efficiency.

Understanding Robots.txt

How robots.txt controls search engine crawlers. A scripted spot-check follows the list.

  • Purpose: Tells search bots which pages they can or cannot crawl.
  • Location: Must be at website root: https://example.com/robots.txt
  • Not Security: Robots.txt is not access control - it's a request to bots.
  • Respectful Bots: Major search engines follow robots.txt rules.
  • Public File: Anyone can view your robots.txt file.
  • Syntax Matters: Small errors can block or expose unintended pages.
  • Case Sensitive: /Admin/ and /admin/ are different paths.
  • Priority Rules: Most specific (longest) matching path wins.
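
Outside the browser, you can reproduce a basic check with Python's standard library. A minimal sketch (the domain and paths are placeholders; note that urllib.robotparser follows the original robots.txt convention and does not understand * or $ wildcards the way Googlebot does):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt of a (placeholder) site.
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

# Ask whether a given user-agent may crawl a given URL.
print(rp.can_fetch("Googlebot", "https://example.com/admin/page.php"))
print(rp.can_fetch("*", "https://example.com/blog/post.html"))
```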

User-Agent Directive

Specifying which bots the rules apply to. An example file follows the list.

  • Syntax: User-agent: Googlebot
  • Wildcard: User-agent: * applies to all bots.
  • Case Insensitive: Googlebot = googlebot = GOOGLEBOT.
  • Specific Bots: Can target individual crawlers.
  • Multiple Agents: Separate user-agent blocks for different bots.
  • Order Matters: Tool matches user-agent, then applies its rules.
  • Common Bots: Googlebot, Bingbot, Slurp (Yahoo), DuckDuckBot.
  • Fallback: If no specific match, uses * rules if present.
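
For example, a file with one specific block and a wildcard fallback (paths are placeholders):

```
User-agent: Googlebot
Disallow: /no-google/     # applies to Googlebot only

User-agent: *
Disallow: /private/       # applies to every other bot
```

Because Googlebot matches its own block, it ignores the * group entirely: /private/ remains crawlable for Googlebot, while all other bots fall back to the * rules.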

Allow and Disallow

Core directives for controlling access. A minimal example follows the list.

  • Disallow: Blocks bots from specified paths.
  • Allow: Explicitly permits access (overrides Disallow).
  • Disallow: / blocks entire site.
  • Disallow: (empty) allows everything.
  • Allow: / allows entire site.
  • Path Prefix: Disallow: /admin/ blocks /admin/page.php
  • Specificity Wins: Allow: /admin/public/ overrides Disallow: /admin/
  • Case Sensitive: Disallow: /Admin/ does not block /admin/
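
A minimal illustration of the override (placeholder paths):

```
User-agent: *
Disallow: /admin/         # block the whole admin area
Allow: /admin/public/     # except this subfolder (longer path wins)
```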

Wildcard Patterns

Using * and $ for flexible matching. A matching sketch follows the list.

  • Asterisk (*): Matches any sequence of characters.
  • Example: Disallow: /*.pdf$ blocks all PDF files.
  • Example: Disallow: /temp/* blocks everything in /temp/
  • Example: Disallow: /*?utm_ blocks URLs with utm_ parameters.
  • Dollar Sign ($): Matches end of URL.
  • Example: Disallow: /file.pdf$ blocks exactly /file.pdf
  • Without $: Disallow: /file.pdf blocks /file.pdf.html too.
  • Combined: /*admin* blocks any path containing "admin".
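
Wildcard matching can be approximated by translating a rule path into a regular expression. A sketch of the idea (the function name and test paths are illustrative, not the tool's internals):

```python
import re

def rule_matches(rule_path: str, url_path: str) -> bool:
    # '*' matches any character sequence; a trailing '$' anchors the end.
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    # Rules match from the start of the path (prefix semantics).
    return re.match(pattern, url_path) is not None

print(rule_matches("/*.pdf$", "/docs/file.pdf"))     # True: blocked PDF
print(rule_matches("/file.pdf$", "/file.pdf.html"))  # False: '$' anchors the end
print(rule_matches("/file.pdf", "/file.pdf.html"))   # True: no '$', prefix match
```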

Rule Priority

How conflicting rules are resolved. A simplified resolver follows the list.

  • Longest Match Wins: Most specific path takes precedence.
  • Example: Allow: /admin/public/ overrides Disallow: /admin/
  • Path Length: /admin/settings/ (16 chars) beats /admin/ (7 chars).
  • Allow vs Disallow: If same length, Allow takes precedence.
  • No Match: If no rule matches, default is allow.
  • Wildcards and Length: Wildcard characters do not count toward a pattern's length.
  • Example Priority: /admin/settings/ > /admin/* > /admin/ > /*
  • Tool Shows: Which rule matched and why.
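
A simplified resolver that ignores wildcards but shows the longest-match and Allow-on-tie logic (names and rules are illustrative):

```python
def resolve(rules, url_path):
    """rules: list of (directive, path) pairs, e.g. ("Disallow", "/admin/")."""
    # Keep rules whose path is a prefix of the URL, then take the longest;
    # on equal length, Allow (True) sorts above Disallow (False).
    matches = [(len(path), directive == "Allow")
               for directive, path in rules if url_path.startswith(path)]
    if not matches:
        return True  # no matching rule: crawling is allowed by default
    return max(matches)[1]

rules = [("Disallow", "/admin/"), ("Allow", "/admin/public/")]
print(resolve(rules, "/admin/secret.html"))    # False: blocked
print(resolve(rules, "/admin/public/a.html"))  # True: longer Allow wins
```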

Sitemap Directive

Telling bots where to find your sitemap. An example follows the list.

  • Syntax: Sitemap: https://example.com/sitemap.xml
  • Multiple Sitemaps: Can list multiple sitemap URLs.
  • Full URL: Must be complete URL including https://
  • Not User-Agent Specific: Applies to all bots.
  • Location: Can be anywhere in robots.txt.
  • Not Mandatory: Optional but recommended for SEO.
  • XML Format: Typically points to XML sitemap file.
  • Tool Displays: All sitemap URLs found.
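
For example (URLs are placeholders):

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml
```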

Batch URL Testing

Test multiple URLs at once.

  • Input Format: One URL per line in batch text area.
  • Efficiency: Test 10, 20, or 100 URLs in one click.
  • Same User-Agent: All URLs tested with selected user-agent.
  • Quick Overview: See allowed/blocked count at top.
  • Color Coded: Green for allowed, red for blocked.
  • Use Cases: Audit entire site sections, verify rule changes.
  • Export Ready: Copy results for documentation.
  • Example: Test all /admin/, /api/, /private/ paths together.

Common Robots.txt Patterns

Frequently used configurations. A combined example follows the list.

  • Block Admin: Disallow: /admin/
  • Block Private: Disallow: /private/
  • Allow Public in Admin: Allow: /admin/public/ then Disallow: /admin/
  • Block PDFs: Disallow: /*.pdf$
  • Block Parameters: Disallow: /*?sort= for filtering URLs.
  • Block Temp Files: Disallow: /temp/*
  • Block Search Results: Disallow: /search?
  • Allow Images: User-agent: Googlebot-Image then Allow: /
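
Assembled into one file, an e-commerce-style setup might look like this (all paths are placeholders):

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/     # longer path overrides the /admin/ block
Disallow: /cart/
Disallow: /*?sort=        # block sort/filter parameter URLs
Disallow: /*.pdf$

User-agent: Googlebot-Image
Allow: /
```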

Syntax Validation

Checking for robots.txt errors. A toy linter follows the list.

  • Missing User-Agent: Allow/Disallow without User-agent is invalid.
  • Unknown Directives: Tool warns about unrecognized commands.
  • Line Numbers: Warnings show exact line number.
  • Typos: Common mistakes like "Dissallow" are caught.
  • Order Issues: Allow/Disallow before User-agent is wrong.
  • Empty Values: Disallow: (empty) means allow everything.
  • Comments: Lines starting with # are ignored.
  • Tool Validates: Checks syntax and shows warnings.
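
A toy linter illustrating these checks (the directive list and messages are illustrative, not the tool's actual validator):

```python
KNOWN = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def lint(robots_txt: str) -> list[str]:
    warnings, seen_agent = [], False
    for n, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # comments are ignored
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN:
            warnings.append(f"line {n}: unknown directive '{field}' (typo?)")
        elif field == "user-agent":
            seen_agent = True
        elif field in ("allow", "disallow") and not seen_agent:
            warnings.append(f"line {n}: '{field}' appears before any User-agent")
    return warnings

print(lint("Dissallow: /admin/\nUser-agent: *\nDisallow: /tmp/"))
# ["line 1: unknown directive 'dissallow' (typo?)"]
```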

Testing Strategy

How to effectively test robots.txt.

  • Test Key Paths: Admin areas, private folders, API endpoints.
  • Test Both: Check paths you want blocked AND allowed.
  • Multiple User-Agents: Test Googlebot, Bingbot, etc. separately.
  • Edge Cases: Test paths with special characters, parameters.
  • After Changes: Always test after modifying robots.txt.
  • Before Deploy: Validate rules before uploading to live site.
  • Batch Testing: Use batch mode for comprehensive audit.
  • Document Results: Keep record of tested URLs and outcomes.

Common Mistakes

Frequent robots.txt errors to avoid.

  • Blocking CSS/JS: Google needs these for rendering pages.
  • Blocking Entire Site: Disallow: / prevents all crawling.
  • Forgetting the Slash: Disallow: /admin also matches /admin.html (prefix match), use /admin/ to target only the directory.
  • Redundant Wildcards: Disallow: /temp/ already blocks everything under /temp/, a trailing * adds nothing.
  • Case Errors: /Admin/ does not block /admin/
  • Security Misconception: Robots.txt is not access control.
  • Exposing Secrets: Disallow reveals what you want hidden.
  • Wrong Location: robots.txt must be at domain root.

FAQ

How accurate is this robots.txt tester?
90-95% accurate for standard robots.txt testing. Parses syntax, matches patterns, and applies rules using the official robots.txt specification. Handles wildcards, user-agents, and rule priority correctly.
Can this tool fetch robots.txt from my live website?
No, browser security (CORS) prevents fetching from external domains. Instead, visit yoursite.com/robots.txt, copy the content, and paste it here. This keeps validation private and fast.
What does "longest matching path wins" mean?
If multiple rules match a URL, the most specific (longest) path takes precedence. Example: URL /admin/page.php matches both Disallow: /admin/ and Allow: /admin/page.php - the longer Allow wins.
Why does * (all bots) not apply when I select Googlebot?
If robots.txt has a specific User-agent: Googlebot block, those rules apply instead of *. Specific user-agent rules override the wildcard. Tool uses exact match first, then falls back to *.
Does robots.txt block users from accessing pages?
No! Robots.txt only asks bots not to crawl. Users can still access pages directly. For security, use proper authentication, not robots.txt. Disallow prevents crawling, not access.
What is the difference between Allow and Disallow?
Disallow: /path/ tells bots not to crawl. Allow: /path/ explicitly permits crawling (used to override more general Disallow rules). If no rule matches, default is allow.
Can I use robots.txt to remove pages from Google?
Disallow prevents future crawling but does not remove already-indexed pages. To remove, use Google Search Console removal tool or add noindex meta tag. Robots.txt blocks crawl, not index.
Should I block /admin/ in robots.txt?
Only if you want to prevent search engines from indexing admin pages. For security, use server-side authentication instead. Blocking in robots.txt reveals the path exists and can attract attackers.
What happens if I have syntax errors in robots.txt?
Bots may ignore invalid lines or misinterpret rules. Tool validates syntax and shows warnings with line numbers. Fix errors before deploying to ensure bots follow your intended rules.
Can I test URLs with query parameters?
Yes! Enter full path including parameters: /search?q=test. Tool handles patterns like /*?utm_ to block all URLs with utm parameters. Test both with and without parameters.

Related tools

Pro tip: pair this tool with Robots.txt Generator and Canonical Pagination Conflict Checker for a faster SEO workflow.