🤖 On Robots.txt

I have spent a lot of mental energy thinking about how to be more defensive with the robots.txt files in my projects.

robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.

https://en.wikipedia.org/wiki/Robots.txt
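
For reference, a minimal robots.txt looks something like this. The paths are placeholders, and keep in mind that compliance is entirely voluntary on the crawler's part:

```
# Applies to every crawler that honors the protocol
User-agent: *
# Keep crawlers out of these sections
Disallow: /admin/
Disallow: /private/
# Everything else is implicitly allowed
```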

In theory, this file helps control which parts of a site search engines and AI scrapers may visit, but I don't have much confidence in its effectiveness in the post-AI apocalyptic world.

Over the last few weeks, I have added and updated a static robots.txt file on several projects. Since then, I have noticed the number of known AI scrapers has doubled, and then some. See Dark Visitors for a comprehensive list of known AI agents.
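
A static file that blocks a few of the better-known AI crawlers looks roughly like this. The user agents below are examples pulled from lists like Dark Visitors, not a complete inventory, which is exactly why a hand-maintained file falls behind so quickly:

```
# Block a couple of known AI crawlers outright (examples only)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else can still see everything
User-agent: *
Disallow:
```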

Today, I decided to switch to the django-robots project because I can update my rules from the Django admin. Since django-robots stores its rules in the database, I can also automate updating them.
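
Here is a rough sketch of what that automation could look like. I'm assuming django-robots' Rule and Url models (double-check the fields against the version you install), and the agent list is a placeholder for whatever source you pull the names from:

```python
# Sketch: sync disallow rules for AI crawlers into django-robots.
# Assumes django-robots' Rule and Url models; verify against your installed version.
from django.contrib.sites.models import Site
from robots.models import Rule, Url

# Placeholder list -- in practice this would come from a source like Dark Visitors.
AI_USER_AGENTS = ["GPTBot", "CCBot"]


def block_ai_agents():
    site = Site.objects.get_current()
    # A "/" pattern disallows the whole site for that user agent.
    everything, _ = Url.objects.get_or_create(pattern="/")

    for agent in AI_USER_AGENTS:
        rule, _ = Rule.objects.get_or_create(robot=agent)
        rule.disallowed.add(everything)
        rule.sites.add(site)
```

Wrapped in a management command and run on a schedule, something like this keeps the database-backed rules current without touching the admin by hand.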

My research so far

These websites and articles have been helpful so far.

Django resources

Jeff Triplett @webology