Facebook's Fascination with My Robots.txt
2 days ago
- #Web Crawling
- #Robots.txt
- Facebook has been repeatedly accessing the author's /robots.txt file on their self-hosted Forgejo instance for the past 4 days.
- The requests are coming from Meta's IP ranges and use the user-agent 'facebookexternalhit/1.1'.
- Only the robots.txt file is being accessed, with no other files or paths requested.
- Facebook's documentation states that the crawler gathers metadata for links shared on its platforms, but the author doubts their site is being shared widely enough to explain the traffic.
- The author speculates that this may be a bug or misconfiguration on Meta's end, and questions how much bandwidth and energy such repetitive requests consume at global scale.
- Compared to the author's previous experience with AI bot traffic, this is relatively benign, but it remains an odd and interesting observation.
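Spotting this kind of pattern in your own logs is straightforward. The sketch below counts which paths a given crawler requested, matching on the `facebookexternalhit` user-agent string mentioned above. It assumes the web server writes Combined Log Format (common for Nginx and Apache); the sample log lines and IP addresses are fabricated for illustration.

```python
import re
from collections import Counter

# Fabricated sample lines in Combined Log Format; in practice you would
# read your server's access log (path varies by setup).
SAMPLE_LOG = """\
198.51.100.1 - - [01/Jan/2025:12:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 24 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
203.0.113.9 - - [01/Jan/2025:12:00:05 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
198.51.100.2 - - [01/Jan/2025:12:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 24 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
"""

# Combined Log Format: request method and path are in the first quoted
# field, the user-agent is the final quoted field.
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"'
)

def count_bot_requests(log_text, ua_substring="facebookexternalhit"):
    """Count requested paths for lines whose user-agent contains ua_substring."""
    paths = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and ua_substring in m.group(4):
            paths[m.group(3)] += 1
    return paths

print(count_bot_requests(SAMPLE_LOG))  # → Counter({'/robots.txt': 2})
```

Grouping by path rather than by IP makes the "only robots.txt, nothing else" pattern the author describes immediately visible.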