🤖 On Robots.txt
I have spent a lot of mental energy thinking about how to be more defensive with the robots.txt files in my projects.
robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.
In theory, this file controls which parts of a site search engines and AI scrapers may visit, but I have little confidence in its effectiveness in the post-AI apocalyptic world.
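For anyone who has not looked at one recently, the protocol itself is just a plain-text file of User-agent and Disallow directives. A minimal, illustrative example (the paths and agents here are only placeholders):

```
# Allow most crawlers but keep them out of the admin,
# and block one AI scraper (GPTBot is OpenAI's crawler) entirely.
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
```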
Over the last few weeks, I have added and updated a static robots.txt file on several projects. Since then, I have noticed that the number of known AI scrapers has more than doubled. See Dark Visitors for a comprehensive list of known AI agents.
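For reference, this is roughly what the "static" approach looks like in a Django project. It is a sketch that assumes a hand-maintained robots.txt template in your templates directory; the URL wiring is mine, not from any particular project:

```python
# urls.py: serve a hand-maintained robots.txt template as plain text.
# Assumes templates/robots.txt exists; the route shown is illustrative.
from django.urls import path
from django.views.generic import TemplateView

urlpatterns = [
    path(
        "robots.txt",
        TemplateView.as_view(template_name="robots.txt", content_type="text/plain"),
    ),
]
```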
Today, I decided to switch to the django-robots project because I can update it from the Django admin. Since django-robots stores its rules in the database, I can automate updating them.
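Here is a rough sketch of what that automation could look like, assuming django-robots' Rule and Url models; the management command and the AGENTS list are my own stand-ins for whatever source (such as Dark Visitors) you pull user agents from:

```python
# A hypothetical management command that ensures every known AI user agent
# gets a "Disallow: /" rule via django-robots' Rule and Url models.
from django.contrib.sites.models import Site
from django.core.management.base import BaseCommand

from robots.models import Rule, Url

# Stand-in list; in practice this would come from a maintained source
# such as Dark Visitors.
AGENTS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai"]


class Command(BaseCommand):
    help = "Ensure every known AI scraper has a Disallow: / rule."

    def handle(self, *args, **options):
        site = Site.objects.get_current()
        disallow_all, _ = Url.objects.get_or_create(pattern="/")
        for agent in AGENTS:
            rule, created = Rule.objects.get_or_create(robot=agent)
            rule.disallowed.add(disallow_all)
            rule.sites.add(site)
            if created:
                self.stdout.write(f"Added rule for {agent}")
```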
My research so far
These websites and articles have been helpful.
- Block the Bots that Feed “AI” Models by Scraping Your Website
- Go ahead and block AI web crawlers
- Dark Visitors
- Your Pika robots.txt File
Wednesday March 20, 2024