You cannot have our user's data
a year ago
- #Open source
- #LLM crawlers
- #Data privacy
- SourceHut has deployed Anubis to protect against aggressive LLM crawlers.
- Terms of service allow automated tools for archival or open-access research but prohibit use for recruiting, solicitation, or profit.
- Proposed updates to terms include stricter rules for crawlers, requiring clear User-Agent headers and adherence to robots.txt.
- SourceHut's robots.txt explicitly disallows marketing crawlers, crawlers that feed ML models, and aggressive bots.
- LLM scrapers are criticized for ignoring copyright and causing performance issues.
- Some argue that sysadmins should optimize their services or negotiate with LLM companies instead, but SourceHut rejects this.
- SourceHut believes LLM companies are not entitled to user data, which is meant for open-source contributors.
- No special arrangements will be made to share data with LLM companies, even for payment.
- SourceHut is funded by subscriptions, not by selling user data.
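
As a rough illustration of the crawler rules summarized above (a clear User-Agent and adherence to robots.txt), here is a minimal sketch of how a well-behaved tool might check permissions before fetching, using Python's standard `urllib.robotparser`. The robots.txt content, the bot names, and the URLs are all hypothetical and are not taken from SourceHut's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT SourceHut's real file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleMLBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if robots.txt permits this user agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# A descriptive User-Agent: tool name, version, and a contact URL (all assumed).
ARCHIVER_UA = "example-archiver/1.0 (+https://example.org/bot-info)"
ML_BOT_UA = "ExampleMLBot/2.0"

archiver_public = may_fetch(parser, ARCHIVER_UA, "https://git.example.org/project/repo")
archiver_private = may_fetch(parser, ARCHIVER_UA, "https://git.example.org/private/notes")
ml_bot_public = may_fetch(parser, ML_BOT_UA, "https://git.example.org/project/repo")

print(archiver_public, archiver_private, ml_bot_public)
```

In this sketch the archival tool is allowed on public paths but not under `/private/`, while the ML crawler is refused everywhere. A crawler that skips this check, or hides behind a generic browser User-Agent, is the kind of behavior the summarized post objects to.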