Monday, September 21, 2020
Last year we released the
robots.txt parser and matcher that we use in
our production systems to the open source world. Since then, we've seen people build new tools
with it,
contribute to the
open source library (effectively improving our production systems, thanks!), and release
ports to new languages like Go and
Rust, which make it easier for
developers to build new tools.
With the intern season ending here at Google, we wanted to highlight two new releases related to
robots.txt that were made possible by two interns working on the Search Open Sourcing team,
Andreea Dutulescu and
Ian Dolzhanskii.
Robots.txt Specification Test
First, we are releasing a
testing framework for robots.txt
parser developers, created by Andreea. The project provides a testing tool that checks
whether, and to what extent, a robots.txt parser follows the Robots Exclusion Protocol. Until now
there has been no official, thorough way to assess the correctness of a parser, so Andreea built a
tool that developers can use to verify that their robots.txt parsers follow the protocol.
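To make the idea of checking a parser "to what extent" concrete, here is a minimal sketch of a conformance check. It is not the API of the released testing framework: the `Matcher` interface, the `Case` class, and the four cases are hypothetical names and data chosen for illustration. The cases encode widely documented protocol behavior (an empty file allows everything, rules match by path prefix, and the longest matching rule wins).

```java
import java.util.List;

/** Hypothetical sketch of a conformance check; not the robotstxt-spec-test API. */
final class SpecCheckSketch {

    /** The one operation this sketch assumes a parser under test exposes. */
    interface Matcher {
        boolean isAllowed(String robotsTxt, String userAgent, String path);
    }

    /** One expectation: for this file, agent and path, may the page be crawled? */
    static final class Case {
        final String robotsTxt;
        final String userAgent;
        final String path;
        final boolean expected;

        Case(String robotsTxt, String userAgent, String path, boolean expected) {
            this.robotsTxt = robotsTxt;
            this.userAgent = userAgent;
            this.path = path;
            this.expected = expected;
        }
    }

    /** Illustrative cases only; a real suite would cover far more of the protocol. */
    static final List<Case> CASES = List.of(
        // An empty robots.txt allows everything.
        new Case("", "FooBot", "/any/page", true),
        // Disallow rules match by path prefix.
        new Case("User-agent: *\nDisallow: /private\n", "FooBot", "/private/a.html", false),
        // Paths not covered by any rule stay allowed.
        new Case("User-agent: *\nDisallow: /private\n", "FooBot", "/public/a.html", true),
        // The longest matching rule wins.
        new Case("User-agent: *\nDisallow: /shop\nAllow: /shop/sale\n", "FooBot", "/shop/sale/1", true)
    );

    /** Counts passing cases, so partial conformance shows up as a score. */
    static int countPassing(Matcher matcher) {
        int passed = 0;
        for (Case c : CASES) {
            if (matcher.isAllowed(c.robotsTxt, c.userAgent, c.path) == c.expected) {
                passed++;
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        // A deliberately naive "parser" that allows everything.
        Matcher allowAll = (robotsTxt, userAgent, path) -> true;
        System.out.println(countPassing(allowAll) + " of " + CASES.size() + " cases passed");
        // prints "3 of 4 cases passed"
    }
}
```

Reporting a count rather than a single pass or fail is what lets a check like this say how closely a parser follows the protocol, not just whether it does.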
Java robots.txt parser and matcher
Second, we are releasing an official
Java port of the C++ robots.txt parser,
created by Ian. Java is the
third most popular programming language
on GitHub and it's extensively used at Google as well, so it's no wonder it's been the most requested
language port. The parser is a 1-to-1 translation of the C++ parser in terms of functions and
behavior, and it's been thoroughly tested for parity against a large corpus of robots.txt
rules. Teams are already planning to use the Java robots.txt parser in Google production
systems, and we hope that you'll find it useful, too.
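For readers new to what a robots.txt parser and matcher does, the following is a simplified sketch in Java. It is not the API of the C++ library or of the Java port: the class `RobotsSketch` and its `isAllowed` method are hypothetical, and the sketch assumes exact, case-insensitive user agent names and plain prefix rules with no wildcard support. It shows two core ideas: rules are grouped by `User-agent` lines, and the longest matching rule decides the outcome.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration only; not the API of the C++ or Java robots.txt libraries. */
final class RobotsSketch {

    /** A single Allow or Disallow rule with its path prefix. */
    private static final class Rule {
        final boolean allow;
        final String prefix;

        Rule(boolean allow, String prefix) {
            this.allow = allow;
            this.prefix = prefix;
        }
    }

    /** Returns true if userAgent (a product token such as "FooBot") may fetch path. */
    static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        List<Rule> specific = new ArrayList<>(); // rules from groups naming this agent
        List<Rule> global = new ArrayList<>();   // rules from "User-agent: *" groups
        boolean inSpecific = false;
        boolean inGlobal = false;
        boolean sawSpecificGroup = false;
        boolean lastWasAgent = false;

        for (String raw : robotsTxt.split("\\R")) {
            int hash = raw.indexOf('#');
            String line = hash >= 0 ? raw.substring(0, hash) : raw; // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "key: value" line
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();

            if (key.equalsIgnoreCase("user-agent")) {
                if (!lastWasAgent) { // a User-agent line after rules starts a new group
                    inSpecific = false;
                    inGlobal = false;
                }
                if (value.equals("*")) {
                    inGlobal = true;
                } else if (value.equalsIgnoreCase(userAgent)) {
                    inSpecific = true;
                    sawSpecificGroup = true;
                }
                lastWasAgent = true;
            } else if (key.equalsIgnoreCase("allow") || key.equalsIgnoreCase("disallow")) {
                lastWasAgent = false;
                if (value.isEmpty()) {
                    continue; // an empty pattern matches nothing
                }
                Rule rule = new Rule(key.equalsIgnoreCase("allow"), value);
                if (inSpecific) specific.add(rule);
                if (inGlobal) global.add(rule);
            }
        }

        // A group naming the agent takes precedence over the "*" group.
        List<Rule> rules = sawSpecificGroup ? specific : global;
        boolean allowed = true; // no matching rule means the path is allowed
        int bestLength = -1;
        for (Rule rule : rules) {
            if (!path.startsWith(rule.prefix)) continue;
            int length = rule.prefix.length();
            // Longest match wins; on a tie the Allow rule wins.
            if (length > bestLength || (length == bestLength && rule.allow)) {
                bestLength = length;
                allowed = rule.allow;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /shop\nAllow: /shop/sale\n\n"
                + "User-agent: FooBot\nDisallow: /\n";
        System.out.println(isAllowed(robots, "BarBot", "/shop/cart"));   // false
        System.out.println(isAllowed(robots, "BarBot", "/shop/sale/1")); // true
        System.out.println(isAllowed(robots, "FooBot", "/index.html"));  // false
    }
}
```

A production parser handles much more than this sketch, such as wildcards and malformed input, which is why testing a port for parity against real-world robots.txt files matters.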
As usual, we welcome your contributions to these projects. If you've built something with the
C++ robots.txt parser or with these new
releases, let us know so we can potentially help you spread the word! If you find a bug, help
us fix it by opening an issue on GitHub or by contributing a pull request directly. If you
have questions or comments about these projects, catch us on
Twitter!
It was our genuine pleasure to host Andreea and Ian, and we're sad that their internships are
ending. Their contributions help make the Internet a better place and we hope that we can
welcome them back to Google in the future.
Posted by Edu Pereda and Gary Illyes, Google Search Open Sourcing team