February 10, 2006
Earlier this week, we told you about a feature we made available through the Sitemaps program that
analyzes the robots.txt file for a site.
Here are more details about that feature.
What the analysis means
The Sitemaps robots.txt tool reads the robots.txt file in the same way Googlebot does. If the tool
interprets a line as a syntax error, Googlebot doesn't understand that line. If the tool shows
that a URL is allowed, Googlebot interprets that URL as allowed.
This tool provides results only for Google user-agents (such as Googlebot). Other bots may not
interpret the robots.txt file in the same way. For instance, Googlebot supports an extended
definition of the standard. It understands Allow: lines, as well as
* and $. So while the tool shows lines that include these extensions as
understood, remember that this applies only to Googlebot and not necessarily to other bots that
may crawl your site.
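As an illustration (the file and paths below are made up for this example, not taken from a real site), a robots.txt file that uses these extensions might include lines like:

User-agent: Googlebot
Disallow: /private/
Allow: /private/public-page.html
Disallow: /*.pdf$

Googlebot would read this as blocking everything under /private/ except /private/public-page.html, and blocking any URL that ends in .pdf. Bots that don't support Allow:, *, or $ may ignore or misinterpret these lines.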
Subdirectory sites
A robots.txt file is valid only when it's located in the root of a site. So, if you are looking at
a site in your account that is located in a subdirectory (such as
https://www.example.com/mysite/), we show you information on the robots.txt file at
the root (https://www.example.com/robots.txt). You may not have access to this file,
but we show it to you because the robots.txt file can impact crawling of your subdirectory site
and you may want to make sure it's allowing URLs as you expect.
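Suppose, for example, that the root robots.txt file at https://www.example.com/robots.txt contained these lines (a hypothetical example, not taken from the tool):

User-agent: *
Disallow: /mysite/

Even though you manage only the subdirectory site, those lines would block Googlebot from crawling everything under https://www.example.com/mysite/, which is why the tool shows you the root file.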
Testing access to directories
If you test a URL that resolves to a file (such as
https://www.example.com/myfile.html), this tool can determine if the robots.txt file
allows or blocks that file. If you test a URL that resolves to a directory (such as
https://www.example.com/folder1/), this tool can determine if the robots.txt file
allows or blocks access to that URL, but it can't tell you about access to the files inside that
folder. The robots.txt file may set restrictions on URLs inside the folder that are different from the restrictions on the folder URL itself.
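Consider this robots.txt file:

User-Agent: *
Disallow: /folder1/

User-Agent: *
Allow: /folder1/myfile.html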
If you test https://www.example.com/folder1/, the tool will say that it's blocked.
But if you test https://www.example.com/folder1/myfile.html, you'll see that it's not
blocked even though it's located inside of folder1.
Syntax not understood
You might see a "syntax not understood" error for a few different reasons. The most common one is
that Googlebot couldn't parse the line. However, some other potential reasons are:
- The site doesn't have a robots.txt file, but the server returns a status of 200 for pages that aren't found. If the server is configured this way, then when Googlebot requests the robots.txt file, the server returns a page. However, this page isn't actually a robots.txt file, so Googlebot can't process it.
- The robots.txt file isn't a valid robots.txt file. If Googlebot requests a robots.txt file and receives a different type of file (for instance, an HTML file), this tool won't show a syntax error for every line in the file. Rather, it shows one error for the entire file.
- The robots.txt file contains a rule that Googlebot doesn't follow. Some user-agents obey rules beyond those in the robots.txt standard. If Googlebot encounters one of the more common additional rules, the tool lists them as syntax errors (see the example after this list).
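For example, some crawlers support a non-standard Crawl-delay rule that isn't part of the robots.txt standard and that Googlebot doesn't follow. A line like the following (shown purely as an illustration; the value is made up) could therefore be flagged by the tool as a syntax error:

Crawl-delay: 10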
Known issues
We are working on a few known issues with the tool, including how it handles capitalization and how it analyzes Google user-agents other than Googlebot. We'll keep you posted as we get these issues resolved.
Posted by Vanessa Fox