You cannot have our user's data

a year ago
  • #Open source
  • #LLM crawlers
  • #Data privacy
  • SourceHut has deployed Anubis to protect against aggressive LLM crawlers.
  • Terms of service allow automated tools for archival or open-access research but prohibit use for recruiting, solicitation, or profit.
  • Proposed updates to terms include stricter rules for crawlers, requiring clear User-Agent headers and adherence to robots.txt.
  • SourceHut's robots.txt explicitly disallows marketing crawlers, crawlers that feed ML models, and aggressive bots.
  • The post criticizes LLM scrapers for ignoring copyright and causing performance issues.
  • Some argue that sysadmins should optimize their services or negotiate with LLM companies, but SourceHut rejects this view.
  • SourceHut believes LLM companies are not entitled to user data, which is meant for open-source contributors.
  • No special arrangements will be made to share data with LLM companies, even for payment.
  • SourceHut is funded by subscriptions, not by selling user data.
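
The crawler rules summarized above (a clear User-Agent header and adherence to robots.txt) can be illustrated with a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the bot names, and the `example.org` URLs below are all hypothetical and chosen only for illustration; they are not SourceHut's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example (not SourceHut's real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleMLFeeder
Disallow: /

User-agent: *
Disallow: /private/
"""

# Hypothetical User-Agent: it names the bot and says how to reach its operator.
ARCHIVE_BOT_UA = "ExampleArchiveBot/1.0 (+https://example.org/bot; admin@example.org)"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if robots.txt permits this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    for agent in (ARCHIVE_BOT_UA, "ExampleMLFeeder/2.0"):
        for url in ("https://example.org/projects/repo",
                    "https://example.org/private/notes"):
            print(agent.split("/")[0], url, may_fetch(parser, agent, url))
```

A well-behaved crawler would run a check like `may_fetch` before every request and send the same identifying string in its `User-Agent` header. Passing this check covers only the robots.txt part of the rules; the terms of service still decide what the collected data may be used for.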