ūü§Ė On Robots.txt

I have spent a lot of mental energy thinking about how to be more defensive with the robots.txt files in my projects.

robots.txt is the filename used to implement the Robots Exclusion Protocol, a standard websites use to tell visiting web crawlers and other robots which parts of the site they may visit.


In theory, this file controls what search engines and AI scrapers are allowed to visit, but I don't have much confidence in its effectiveness in our post-AI-apocalypse world.
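For crawlers that do honor the protocol, the standard library shows how the rules get interpreted. A small sketch using Python's `urllib.robotparser` (the rules and URLs here are illustrative, not any real site's file):

```python
from urllib import robotparser

# Parse an example robots.txt that blocks one AI crawler
# and allows everyone else.
rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines())

# A compliant crawler checks before fetching.
print(rp.can_fetch("GPTBot", "https://example.com/posts/"))       # False
print(rp.can_fetch("Mozilla/5.0", "https://example.com/posts/"))  # True
```

The catch, of course, is that nothing forces a scraper to make that check.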

Over the last few weeks, I have added and updated a static robots.txt file on several projects. Since then, I have noticed the number of known AI scrapers has doubled, and then some. See Dark Visitors for a comprehensive list of known AI agents.
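For context, here is a trimmed example of the kind of static file I mean. GPTBot, CCBot, and Google-Extended are real agents from lists like Dark Visitors, but this subset is illustrative, not a complete blocklist:

```
# Block a few known AI crawlers (illustrative subset)
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Everyone else may crawl the whole site
User-agent: *
Allow: /
```

Grouping several User-agent lines over one Disallow rule is valid under the Robots Exclusion Protocol, which keeps the file short even as the list of agents grows.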

Today, I decided to switch to the django-robots project because I can update it from the Django admin. Since django-robots' rules are stored in a database, I can automate updating them.

My research so far

These websites and articles have been helpful.

Django resources

Jeff Triplett @webology