July 1, 2019 05:20 pm

Google's Robots.txt Parser is Now Open Source

From a blog post: For 25 years, the Robots Exclusion Protocol (REP) was only a de facto standard, which sometimes had frustrating implications. For webmasters, it meant uncertainty in corner cases, like when their text editor included BOM characters in their robots.txt files. For crawler and tool developers, it also brought uncertainty; for example, how should they deal with robots.txt files that are hundreds of megabytes in size?

Today, we announced that we're spearheading the effort to make the REP an internet standard. While this is an important step, it means extra work for developers who parse robots.txt files. We're here to help: we open sourced the C++ library that our production systems use for parsing and matching rules in robots.txt files. This library has been around for 20 years and contains pieces of code that were written in the '90s. Since then, the library has evolved; we learned a lot about how webmasters write robots.txt files and about corner cases we had to cover, and, where it made sense, we added what we learned over the years to the internet draft as well.
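For developers who want to try the library, the basic flow is to hand the matcher the raw robots.txt contents, a user-agent name, and a URL, and it reports whether the rules allow the fetch. Below is a minimal sketch of what that might look like; the class and method names (googlebot::RobotsMatcher, OneAgentAllowedByRobots) are taken from the open-sourced repository's example usage and should be verified against the current source tree before relying on them.

// Minimal sketch: checking one URL against robots.txt rules with
// Google's open-sourced parser (https://github.com/google/robotstxt).
// Requires building against the library and its dependencies (e.g. Abseil).
#include <iostream>
#include <string>

#include "robots.h"  // googlebot::RobotsMatcher

int main() {
  // Raw robots.txt contents as fetched from a site. Per the blog post,
  // the library handles corner cases such as a stray UTF-8 BOM left
  // at the start of the file by a text editor.
  const std::string robots_txt =
      "User-agent: FooBot\n"
      "Disallow: /private/\n";

  const std::string url = "https://example.com/private/page.html";

  googlebot::RobotsMatcher matcher;
  const bool allowed =
      matcher.OneAgentAllowedByRobots(robots_txt, "FooBot", url);

  std::cout << (allowed ? "allowed" : "disallowed") << "\n";  // "disallowed"
  return 0;
}

The example user agent ("FooBot"), the rules, and the URL are illustrative; in practice the robots.txt body would come from an HTTP fetch of the target site's /robots.txt.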

Read more of this story at Slashdot.


Original Link: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/eIlavNv2Zxc/googles-robotstxt-parser-is-now-open-source


Slashdot

Slashdot was originally created in September 1997 by Rob "CmdrTaco" Malda. Today it is owned by Geeknet, Inc.
